<img src="https://github.com/dc-aihub/dc-aihub.github.io/blob/master/img/ai-logo-transparent-banner.png?raw=true" 
alt="Ai/Hub Logo"/>

<h1 style="text-align:center;color:#0B8261;"><center>Artificial Intelligence</center></h1>
<h1 style="text-align:center;"><center>Hand Gesture Recognition & The Importance of Image Pre-Processing </center></h1>

<center>***Code and Original Tutorial written by Sadaival Singh:*** <br/>https://www.youtube.com/watch?v=v-XcmsYlzjA</center>

<hr/>

<center><a href="#OVERVIEW">Overview</a></center>
<center><a href="#PURPOSE">Purpose</a></center>
<center><a href="#IMAGE-PREPROCESSING">Image Pre-Processing</a></center>
<center><a href="#GETTING-STARTED">Getting Started</a></center>
<center><a href="#CONCLUSION">Conclusion</a></center>

<hr/>

<div style="background-color:#0B8261; width:100%; height:38px; color:white; font-size:18px; padding:10px;" id="OVERVIEW">
OVERVIEW
</div>


In this exercise we will create a program that identifies hand gestures in real time by streaming video from a webcam. The program outlines a hand within a given region of the screen and then determines the number of fingers showing, which gives us our output. The possible hand gesture classes are as follows:


- 1 finger
- 2 fingers
- 3 fingers 
- 4 fingers
- 5 fingers
- Ok 
- Good Job

![](images/goodjob.png)

<div style="background-color:#0B8261; width:100%; height:38px; color:white; font-size:18px; padding:10px;" id="PURPOSE">
PURPOSE
</div>


Although in this exercise we will not be using an artificial intelligence network directly, our program will be able to capture and predict hand gestures with a high degree of accuracy using a programmatic solution. The value of the exercise in the context of AI is found in the practice and understanding of several image pre-processing techniques that could be invaluable when training a specialized neural network.


As you will see below, our hard-coded approach to gesture recognition is limited by the methods available in the OpenCV package. Although the program performs admirably in this context, it must be noted that only a neural network would allow for a greater range of classes and more abstract recognition. For example, programming logic that captures and translates sign language from raw video data would be a nearly impossible task, whereas a neural network would be far better suited to the challenge.



<div style="background-color:#0B8261; width:100%; height:38px; color:white; font-size:18px; padding:10px;" id="IMAGE-PREPROCESSING">
IMAGE PRE-PROCESSING
</div>


So why bother pre-processing visual data in the first place, rather than just feeding a neural network an unfiltered image? The answer lies in how neural networks make their decisions: through layers of nodes, each with its own learned weights and biases. For our model to work efficiently, the nodes must respond strongly to data that is relevant to our problem and ignore what is not needed.


In this sense, training a neural network involves teaching it both what to recognize and what not to recognize; in our case, what is a hand and what is not a hand. We certainly do not want the neurons of our network to be activated by random background imagery that appears in our data but is not the intended focus. This is why pre-processing image data matters: there are steps we can take to feed our model data that is more relevant and specific to our problem, giving it only data that has a stake in determining the final output. By eliminating noise and unwanted information, we can potentially reduce the size of the model and the resources needed to train it while maintaining higher accuracy.


Specifically in our case we take the following image pre-processing steps:
 
 
- Select a boundary of the input image within which we will scan for the presence of a human hand. 
- Create a mask by selecting only pixels that match a specified colour range.
- Blur the mask image to fill in missing data points.
- Draw the contour of the hand and identify the fingers showing using tools from OpenCV.
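
The hole-filling idea behind the third step can be pictured on a tiny hand-made mask. The sketch below is plain Python written only for illustration: the function name `dilate3x3` and the sample grid are assumptions, and the tutorial code itself relies on `cv2.dilate` and `cv2.GaussianBlur` instead.

```python
def dilate3x3(mask):
    """Return a copy of a binary mask in which a pixel becomes 255
    if it or any of its 8 neighbours is 255 (a 3x3 dilation)."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Look at the 3x3 neighbourhood, skipping positions outside the image
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and mask[rr][cc] == 255:
                        out[r][c] = 255
    return out


# Hypothetical 5x5 mask: a blob of "skin" pixels with one missed pixel in the middle
mask = [
    [0,   0,   0,   0, 0],
    [0, 255, 255, 255, 0],
    [0, 255,   0, 255, 0],
    [0, 255, 255, 255, 0],
    [0,   0,   0,   0, 0],
]

filled = dilate3x3(mask)
for row in filled:
    print(row)
```

The missed pixel in the centre is filled in, but the blob also grows by one pixel on every side, which is the trade-off of using dilation to close gaps.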


<div style="background-color:#0B8261; width:100%; height:38px; color:white; font-size:18px; padding:10px;" id="GETTING-STARTED">
GETTING STARTED
</div>


Start by reviewing the packages that are being imported and ensure you have the required dependencies installed. Also note that this program needs a working webcam.


Before running the code, understand that this program works by recognizing pixels of the image that fall within a specified colour range. We set this range in the code below using HSV values. If you are unfamiliar with HSV colour values please check out the site here to visualize the colour range we have chosen. For best performance, please ensure that the background your webcam sees around your hand is a different colour from anything that falls within our range.
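
To get a feel for the range without a webcam, the following sketch checks whether a single RGB colour falls inside the bounds used in the code. It uses only Python's standard `colorsys` module and assumes OpenCV's 8-bit HSV convention (hue 0 to 179, saturation and value 0 to 255). The helper names and the sample colours are illustrative assumptions, and OpenCV's own rounding may differ by one unit.

```python
import colorsys

# Bounds copied from the tutorial code, on OpenCV's 8-bit HSV scale
LOWER_SKIN = (0, 20, 70)
UPPER_SKIN = (20, 255, 255)


def rgb_to_opencv_hsv(r, g, b):
    """Convert 0-255 RGB values to approximate OpenCV-style 8-bit HSV."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    # colorsys returns all three channels in [0, 1]; rescale to OpenCV's ranges
    return (round(h * 180) % 180, round(s * 255), round(v * 255))


def in_skin_range(r, g, b):
    """Return True if the colour lies inside the tutorial's HSV bounds."""
    hsv = rgb_to_opencv_hsv(r, g, b)
    return all(lo <= x <= hi for lo, x, hi in zip(LOWER_SKIN, hsv, UPPER_SKIN))


# Hypothetical sample colours: a light skin tone and pure blue
print(rgb_to_opencv_hsv(224, 172, 105), in_skin_range(224, 172, 105))
print(rgb_to_opencv_hsv(0, 0, 255), in_skin_range(0, 0, 255))
```

The skin tone lands inside the range while pure blue does not, which is why a blue or green background tends to work better than a wooden or beige one.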


In [1]:
import cv2
import numpy as np
import math
cap = cv2.VideoCapture(0)

In [None]:
while(1):
        
    try:  # max() raises an error when no contour is found in the window,
          # so the loop body is wrapped in a try block
          
        ret, frame = cap.read()
        frame=cv2.flip(frame,1)
        kernel = np.ones((3,3),np.uint8)
        
        #define region of interest
        roi=frame[100:300, 100:300]
        
        
        cv2.rectangle(frame,(100,100),(300,300),(0,255,0),0)    
        hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
        
        
         
        # define range of skin color in HSV
        lower_skin = np.array([0,20,70], dtype=np.uint8)
        upper_skin = np.array([20,255,255], dtype=np.uint8)

        # extract the skin-coloured pixels as a binary mask
        mask = cv2.inRange(hsv, lower_skin, upper_skin)
        
   
        
        # dilate the mask to fill dark spots within the hand
        mask = cv2.dilate(mask,kernel,iterations = 4)

        # blur the mask to smooth its edges
        mask = cv2.GaussianBlur(mask,(5,5),100)
        
        
        
        # find contours; [-2:] copes with OpenCV 3 (three return values) and OpenCV 4 (two)
        contours, hierarchy = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)[-2:]

        # find contour of max area (hand)
        cnt = max(contours, key=cv2.contourArea)
        
        # approximate the contour a little
        epsilon = 0.0005*cv2.arcLength(cnt,True)
        approx= cv2.approxPolyDP(cnt,epsilon,True)


        # make convex hull around hand
        hull = cv2.convexHull(cnt)

        # define area of hull and area of hand
        areahull = cv2.contourArea(hull)
        areacnt = cv2.contourArea(cnt)

        # find the hull area not covered by the hand, as a percentage of the hand area
        arearatio=((areahull-areacnt)/areacnt)*100

        # find the defects in convex hull with respect to hand
        hull = cv2.convexHull(approx, returnPoints=False)
        defects = cv2.convexityDefects(approx, hull)

        # l = no. of defects
        l=0

        # code for finding no. of defects due to fingers
        for i in range(defects.shape[0]):
            s,e,f,d = defects[i,0]
            start = tuple(approx[s][0])
            end = tuple(approx[e][0])
            far = tuple(approx[f][0])
            pt= (100,180)
            
            
            # find length of all sides of triangle
            a = math.sqrt((end[0] - start[0])**2 + (end[1] - start[1])**2)
            b = math.sqrt((far[0] - start[0])**2 + (far[1] - start[1])**2) 
            c = math.sqrt((end[0] - far[0])**2 + (end[1] - far[1])**2)
            s = (a+b+c)/2
            ar = math.sqrt(s*(s-a)*(s-b)*(s-c))
            
            #distance between point and convex hull
            d=(2*ar)/a
            
            # apply the cosine rule to get the angle at the far point, in degrees
            angle = math.degrees(math.acos((b**2 + c**2 - a**2)/(2*b*c)))
            
        
            # ignore angles > 90 and ignore points very close to convex hull(they generally come due to noise)
            if angle <= 90 and d>30:
                l += 1
                cv2.circle(roi, far, 3, [255,0,0], -1)
            
            #draw lines around hand
            cv2.line(roi,start, end, [0,255,0], 2)
            
            
        l+=1
        
        #print corresponding gestures which are in their ranges
        font = cv2.FONT_HERSHEY_SIMPLEX
        if l==1:
            if areacnt<2000:
                cv2.putText(frame,'Put hand in the box',(0,50), font, 2, (0,0,255), 3, cv2.LINE_AA)
            else:
                if arearatio<12:
                    cv2.putText(frame,'0',(0,50), font, 2, (0,0,255), 3, cv2.LINE_AA)
                elif arearatio<17.5:
                    cv2.putText(frame,'Good Job',(0,50), font, 2, (0,0,255), 3, cv2.LINE_AA)
                   
                else:
                    cv2.putText(frame,'1',(0,50), font, 2, (0,0,255), 3, cv2.LINE_AA)
                    
        elif l==2:
            cv2.putText(frame,'2',(0,50), font, 2, (0,0,255), 3, cv2.LINE_AA)
            
        elif l==3:
            if arearatio<27:
                cv2.putText(frame,'3',(0,50), font, 2, (0,0,255), 3, cv2.LINE_AA)
            else:
                cv2.putText(frame,'ok',(0,50), font, 2, (0,0,255), 3, cv2.LINE_AA)
                    
        elif l==4:
            cv2.putText(frame,'4',(0,50), font, 2, (0,0,255), 3, cv2.LINE_AA)
            
        elif l==5:
            cv2.putText(frame,'5',(0,50), font, 2, (0,0,255), 3, cv2.LINE_AA)
            
        elif l==6:
            cv2.putText(frame,'reposition',(0,50), font, 2, (0,0,255), 3, cv2.LINE_AA)
            
        else :
            cv2.putText(frame,'reposition',(10,50), font, 2, (0,0,255), 3, cv2.LINE_AA)
            
        #show the windows
        cv2.imshow('mask',mask)
        cv2.imshow('frame',frame)
    except Exception:
        # skip frames in which no usable contour or defect was found
        pass
        
    
    # exit the loop when the Esc key (code 27) is pressed
    k = cv2.waitKey(5) & 0xFF
    if k == 27:
        break
    
cv2.destroyAllWindows()
cap.release()
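
The finger-counting test inside the loop can be tried in isolation, without OpenCV or a webcam. The sketch below repeats the same triangle geometry (Heron's formula for the depth, the cosine rule for the angle) on hand-picked points. The function names and the sample coordinates are illustrative assumptions, and the points are assumed to be distinct so that no side has zero length.

```python
import math


def side(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def defect_geometry(start, end, far):
    """Return (angle_in_degrees, depth) for one convexity-defect triangle."""
    a = side(start, end)  # hull edge that spans the gap
    b = side(start, far)
    c = side(end, far)

    # Heron's formula; max() guards against tiny negative rounding errors
    p = (a + b + c) / 2
    area = math.sqrt(max(p * (p - a) * (p - b) * (p - c), 0.0))
    depth = 2 * area / a  # height of the triangle from 'far' onto the hull edge

    # Cosine rule for the angle at 'far'; clamp to acos's valid domain
    cos_angle = (b ** 2 + c ** 2 - a ** 2) / (2 * b * c)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return angle, depth


def is_finger_gap(start, end, far, max_angle=90, min_depth=30):
    """Apply the thresholds used in the tutorial loop."""
    angle, depth = defect_geometry(start, end, far)
    return angle <= max_angle and depth > min_depth


# Hypothetical points: two fingertips 60 px apart with a valley 40 px below them
print(defect_geometry((0, 0), (60, 0), (30, 40)))
print(is_finger_gap((0, 0), (60, 0), (30, 40)))
```

With these sample points the gap is 40 pixels deep with an angle of roughly 74 degrees, so it counts as a gap between two fingers. A shallow dent such as `far = (30, 10)` fails both thresholds and is ignored as noise.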

<div style="background-color:#0B8261; width:100%; height:38px; color:white; font-size:18px; padding:10px;" id="CONCLUSION">
CONCLUSION
</div>

In the above code we have taken a live video stream and extracted a set of data points that serve as a reliably accurate representation of the hand's position within a given frame. We can now use this data to train a neural network, knowing that the information is highly specific to the problem we need to solve. This should help our model perform well and decrease training time.

From this example we now have two primary options for training a neural network. One option is to take the series of coordinates produced by the convex hull wrapped around the hand in our image, together with the number and positions of the defects that mark visible fingers, and train a standard network using a straightforward multi-dimensional array as input. This solution would offer the simplest and smallest dataset to train our network on; however, it may not be detailed enough to represent and recognize complex or nuanced hand gestures.
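
One way to picture this first option is to pack the hull and defect coordinates into a fixed-length list of numbers. The sketch below is only an assumption about how such an input could be laid out: the function name, the limits of 20 hull points and 5 defects, and the zero padding are all hypothetical choices, while the 200-pixel scale matches the region of interest used in the code.

```python
def build_feature_vector(hull_points, defect_points,
                         roi_size=200, max_hull=20, max_defects=5):
    """Flatten hull and defect (x, y) points into one fixed-length list
    of floats scaled to the range [0, 1]."""

    def pack(points, limit):
        flat = []
        for x, y in points[:limit]:
            flat += [x / roi_size, y / roi_size]
        # Pad with zeros so every frame yields a vector of the same length
        flat += [0.0] * (2 * limit - len(flat))
        return flat

    # Last entry: fraction of the defect slots that are actually used
    defect_fraction = min(len(defect_points), max_defects) / max_defects
    return pack(hull_points, max_hull) + pack(defect_points, max_defects) + [defect_fraction]


# Hypothetical values for a single frame
vector = build_feature_vector([(0, 0), (200, 100)], [(50, 50)])
print(len(vector))
```

Every frame then produces a vector of the same length regardless of how many hull points or defects were detected, which is what a standard fully connected network expects as input.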

Our second approach would be to train a convolutional neural network using the black and white 'mask' image that our program produces and displays at runtime. The benefit of training our neural network on this mask is that the filters of the convolutional network will be able to clearly tell the difference between significant and insignificant data simply by the shade of each pixel.

 