Gesture Hero is a hand gesture classification system built to differentiate between hand gestures representing the numbers 0 to 5.
1. Installing required packages
pip install -r requirements.txt
2. To run for a specific dataset
python ./main.py
Technologies used in the project:
- Python
- scikit-learn
- OpenCV
- skimage
Gesture Hero is a machine learning based tool, trained on a dataset of almost 2000 pictures. It combines thorough preprocessing, the HOG feature descriptor, and an SVM classifier to classify gestures accurately.
Gesture Hero's strongest point is its preprocessing, which applies classical image processing techniques to remove shadows, enhance colors, clip the area surrounding the hand, rotate the image so that all hands point in the same direction, and resize it for efficiency.
The main preprocessing approach taken is to ignore or eliminate the channels that carry misleading information, mainly the ones that represent illumination.
Segmentation is done in the YCrCb color space using basic thresholding while ignoring the Y (luminance) channel. The photo is then cropped to contain only the hand by finding the largest contour surrounding it.
To unify hand orientations, the preprocessed image is passed to a function that flips it so that the fingers point to the left. The decision is based on the histogram of the image: the denser half represents the palm, while the other half represents the fingers.
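One possible reading of the flipping rule is sketched below. It assumes the "histogram" is the count of foreground pixels per column of a binary hand mask, and the name `point_fingers_left` is hypothetical; the project may compute this differently.

```python
import numpy as np

def point_fingers_left(mask):
    """Flip a binary hand mask horizontally so the denser (palm) half is on the right."""
    mask = np.asarray(mask)
    column_counts = (mask > 0).sum(axis=0)  # foreground pixels per column
    half = mask.shape[1] // 2
    left_density = column_counts[:half].sum()
    right_density = column_counts[mask.shape[1] - half:].sum()
    # A denser left half means the palm is on the left and the fingers
    # point right, so mirror the image.
    if left_density > right_density:
        return np.fliplr(mask)
    return mask
```

The same decision can be applied to the colour image by computing it on the mask and flipping both arrays together.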
Raw image | Preprocessed |
---|---|
The main feature descriptor used is the Histogram of Oriented Gradients (HOG), chosen because it is robust to variations in appearance, computationally efficient, discriminative (it captures the distinguishing features of an object well), and compatible with machine learning algorithms.
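As a rough illustration, HOG features can be extracted with `skimage.feature.hog`. The parameter values below (9 orientations, 8x8 cells, 2x2 blocks) and the 64x64 input size are assumptions for the sketch, not the settings used in this project.

```python
import numpy as np
from skimage.feature import hog

def extract_hog(gray_image):
    """Return a flat HOG feature vector for a 2-D grayscale image."""
    return hog(
        gray_image,
        orientations=9,           # assumed number of gradient bins
        pixels_per_cell=(8, 8),   # assumed cell size
        cells_per_block=(2, 2),   # assumed block size for normalisation
        block_norm="L2-Hys",
    )

# Stand-in for a preprocessed, resized hand image.
example = np.random.default_rng(0).random((64, 64))
features = extract_hog(example)
```

Because the vector length depends on the image size, every image has to be resized to the same dimensions before extraction so that the classifier receives fixed-length inputs.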
Preprocessed image | Visualized HOG |
---|---|
The chosen model is a support vector machine (SVM) with an RBF kernel, trained on the extracted HOG features.
The resulting confusion matrix is as follows:
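A minimal sketch of training such a model with scikit-learn is shown below. The data is synthetic stand-in data with three classes rather than real HOG vectors for the six gestures, and the hyperparameters are scikit-learn's defaults, which may differ from the ones used here.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for HOG feature vectors: 3 classes, 30 samples each,
# 20 features per sample, clustered around different centres.
labels = np.repeat(np.arange(3), 30)
centers = np.array([0.0, 5.0, 10.0])
features = rng.normal(loc=centers[labels][:, None], scale=0.5, size=(90, 20))

model = SVC(kernel="rbf")  # default C and gamma, assumed for illustration
model.fit(features, labels)
predictions = model.predict(features)
```

In practice the model would be evaluated on a held-out split rather than on the training data.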
Because there is considerable confusion between the 2, 3 and 4 gestures, a two-layer classification method is used to improve performance.
The first layer is an SVM model trained on the whole dataset. If the resulting label is 2, 3 or 4, the image is passed to a second SVM model trained only on the 2, 3 and 4 samples. This has significantly improved the classification accuracy.
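The two-layer prediction logic could look roughly like this. The function name `predict_two_layer` and its arguments are hypothetical; the sketch only assumes that both models expose a scikit-learn style `predict` method (for example two fitted `sklearn.svm.SVC` instances).

```python
import numpy as np

def predict_two_layer(general_model, refine_model, features, confused=(2, 3, 4)):
    """Predict with the general model, then re-classify the confusable labels.

    general_model: classifier trained on all gestures.
    refine_model: classifier trained only on the `confused` gestures.
    """
    features = np.asarray(features)
    labels = np.asarray(general_model.predict(features)).copy()
    # Samples whose first-layer label falls in the confusable group.
    needs_refinement = np.isin(labels, confused)
    if needs_refinement.any():
        labels[needs_refinement] = refine_model.predict(features[needs_refinement])
    return labels
```

Only the samples that the first layer labels as 2, 3 or 4 reach the second model, so the other predictions are left untouched.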
Sarah Elzayat | Ahmed ata abdallah | Doaa Magdy | Rufaida Kassem |
---|---|---|---|