# Hand recognition learning project

## Import necessary packages

```python
import cv2
import numpy as np
import mediapipe as mp
import tensorflow as tf
from tensorflow.keras.models import load_model
```

## Initialize models with MediaPipe

### MediaPipe detects the hand and its key points, returning 21 landmarks for each detected hand
![HANDS](media/hand-landmarks.jpg)
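The 21 landmarks always come back in the same order. The list below is a hand-written copy of the index layout from MediaPipe's `HandLandmark` enum (available as `mpHands.HandLandmark` when MediaPipe is installed), included only to make the numbering in the image concrete. `HAND_LANDMARKS`, `LANDMARK_INDEX` and `fingertip_indices` are illustrative names, not part of MediaPipe.

```python
# Hand-written copy of the landmark order used by MediaPipe Hands
# (0 = wrist, 20 = pinky tip). Names here are illustrative only.
HAND_LANDMARKS = [
    "WRIST",
    "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_FINGER_MCP", "INDEX_FINGER_PIP", "INDEX_FINGER_DIP", "INDEX_FINGER_TIP",
    "MIDDLE_FINGER_MCP", "MIDDLE_FINGER_PIP", "MIDDLE_FINGER_DIP", "MIDDLE_FINGER_TIP",
    "RING_FINGER_MCP", "RING_FINGER_PIP", "RING_FINGER_DIP", "RING_FINGER_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
]

# Map each name to its index so landmarks can be looked up by name.
LANDMARK_INDEX = {name: i for i, name in enumerate(HAND_LANDMARKS)}


def fingertip_indices():
    """Return the indices of the five fingertips, thumb to pinky."""
    return [LANDMARK_INDEX[n] for n in HAND_LANDMARKS if n.endswith("_TIP")]


print(len(HAND_LANDMARKS))   # 21
print(fingertip_indices())   # [4, 8, 12, 16, 20]
```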

```python
mpHands = mp.solutions.hands
hands = mpHands.Hands(max_num_hands=1, min_detection_confidence=0.7)
mpDraw = mp.solutions.drawing_utils
```

- **mp.solutions.hands** is MediaPipe's hand-tracking solution; I store a reference to it in **mpHands**.
- **mpHands.Hands** creates the detector. *max_num_hands* is the maximum number of hands the model will detect in a single frame, and *min_detection_confidence* (0.7 here) is the minimum confidence for a detection to be considered successful.
- **mp.solutions.drawing_utils** draws the detected key points and the connections between them.
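Each landmark returned by **mpHands.Hands** stores `x` and `y` normalized to [0, 1] by the image width and height (plus a relative depth `z`), so they are scaled to pixels before being passed to the classifier, as the webcam loop in this notebook does. A minimal sketch of that step, assuming the classifier takes a batch shaped (1, 21, 2): `Landmark` is a hypothetical stand-in for MediaPipe's landmark objects so the example runs without a camera, and `to_pixel_batch` is an illustrative helper name.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for a MediaPipe landmark: x and y are normalized
# to [0, 1] by image width and height; z is a relative depth (unused here).
Landmark = namedtuple("Landmark", ["x", "y", "z"])


def to_pixel_batch(landmarks, frame_shape):
    """Convert normalized landmarks to a (1, N, 2) array of pixel coords.

    frame_shape follows NumPy/OpenCV order: (height, width, channels).
    """
    height, width = frame_shape[:2]
    points = [[int(lm.x * width), int(lm.y * height)] for lm in landmarks]
    # Add a leading batch dimension, since Keras models predict on batches.
    return np.array([points])


hand = [Landmark(0.5, 0.25, 0.0), Landmark(0.1, 0.9, -0.02)]
batch = to_pixel_batch(hand, (480, 640, 3))
print(batch.shape)         # (1, 2, 2)
print(batch[0].tolist())   # [[320, 120], [64, 432]]
```

A real hand yields 21 landmarks, so the same call produces the (1, 21, 2) batch.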

## Initialize the TensorFlow model

### Load the gesture recognizer model

```python
# Load the pretrained gesture classifier saved under 'mp_hand_gesture'
model = load_model('mp_hand_gesture')
```

### Load class names

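```python
# Read the comma-separated gesture labels, trimming stray whitespace
with open('gesture.names', 'r') as f:
    classNames = [name.strip() for name in f.read().split(',')]
classNames
```

```
['okay',
 'peace',
 'thumbs up',
 'thumbs down',
 'call me',
 'stop',
 'rock',
 'live long',
 'fist',
 'smile']
```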
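`model.predict` returns one score per class for every input row, and the notebook picks the class with the highest score via `np.argmax`. The sketch below wraps that step in a hypothetical `decode_prediction` helper, assuming the output has shape (1, 10) with scores in the same order as `gesture.names`. The optional `min_confidence` threshold is an illustrative addition, not part of the original code; it lets a low-scoring frame show no label instead of a guess.

```python
import numpy as np

CLASS_NAMES = ['okay', 'peace', 'thumbs up', 'thumbs down', 'call me',
               'stop', 'rock', 'live long', 'fist', 'smile']


def decode_prediction(prediction, class_names, min_confidence=0.0):
    """Return (label, confidence) for a batch-of-one prediction.

    prediction has shape (1, num_classes). If the best score is below
    min_confidence, the label is returned as an empty string.
    """
    scores = np.asarray(prediction)[0]
    index = int(np.argmax(scores))       # index of the highest score
    confidence = float(scores[index])
    label = class_names[index] if confidence >= min_confidence else ""
    return label, confidence


fake = np.zeros((1, 10))
fake[0, 2] = 0.8
fake[0, 8] = 0.2
print(decode_prediction(fake, CLASS_NAMES))        # ('thumbs up', 0.8)
print(decode_prediction(fake, CLASS_NAMES, 0.9))   # ('', 0.8)
```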

## Read frames from the webcam and process the data

### Initialize the webcam, read each frame, and draw the landmarks with MediaPipe

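```python
# Open the default webcam (device index 0)
cap = cv2.VideoCapture(0)

while True:
    # Read a frame; stop if the camera returns nothing
    success, frame = cap.read()
    if not success:
        break

    # frame.shape is (height, width, channels)
    h, w, _ = frame.shape

    # Flip the frame horizontally so the output behaves like a mirror
    frame = cv2.flip(frame, 1)

    # MediaPipe expects RGB, while OpenCV delivers BGR
    framergb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    result = hands.process(framergb)

    className = ""

    # Post-process the result
    if result.multi_hand_landmarks:
        for handslms in result.multi_hand_landmarks:
            # Convert the 21 normalized landmarks to pixel coordinates
            landmarks = []
            for lm in handslms.landmark:
                lmx = int(lm.x * w)
                lmy = int(lm.y * h)
                landmarks.append([lmx, lmy])

            # Draw the landmarks on the frame
            mpDraw.draw_landmarks(frame, handslms, mpHands.HAND_CONNECTIONS)

            # Predict the gesture from a batch of one (shape 1 x 21 x 2)
            prediction = model.predict(np.array([landmarks]), verbose=0)

            # Take the index of the highest score as the predicted class
            className = classNames[np.argmax(prediction)]

    # Show the prediction on the frame
    cv2.putText(frame, className, (10, 50), cv2.FONT_HERSHEY_SIMPLEX,
                1, (0, 0, 255), 2, cv2.LINE_AA)

    # Show the final output; press 'q' to quit
    cv2.imshow("Output", frame)
    if cv2.waitKey(1) == ord('q'):
        break

# Release the webcam and close the window
cap.release()
cv2.destroyAllWindows()
```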



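- MediaPipe expects RGB images, while OpenCV reads frames in BGR order, so each frame is converted with **cv2.cvtColor** before it is processed.
- **hands.process(framergb)** processes the frame and returns a result object. If **result.multi_hand_landmarks** is not empty, a hand was detected: its landmarks are converted to pixel coordinates, the model returns a score for each gesture class, and **np.argmax** picks the index of the highest score, which is mapped to a name in **classNames**.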
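Both the color conversion and the mirror flip are simple axis reversals on the frame array. This NumPy-only sketch shows the equivalents of `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)` and `cv2.flip(frame, 1)` for a 3-channel image, using a made-up 1x2 pixel frame so it runs without a webcam.

```python
import numpy as np

# A made-up 1x2 "frame" in OpenCV's BGR channel order (height x width x 3).
bgr = np.array([[[10, 20, 30], [40, 50, 60]]], dtype=np.uint8)

# Reversing the channel axis swaps B and R, matching
# cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) for a 3-channel image.
rgb = bgr[..., ::-1]

# Reversing the width axis mirrors the image left to right,
# matching cv2.flip(frame, 1).
mirrored = bgr[:, ::-1]

print(rgb[0, 0].tolist())        # [30, 20, 10]
print(mirrored[0, 0].tolist())   # [40, 50, 60]
```

NumPy slicing returns a view with negative strides rather than a copy, so if you use it instead of the OpenCV calls, wrap the result in `np.ascontiguousarray` before passing it to functions that require contiguous memory.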

### The result is as follows
![RESULT](media/result.png)