# **Gesture-Recognition System For Drones**

## **Goal:**
Efficiently using a machine learning model and Mediapipe for hand-landmarking to create a system that can reliably predict hand gestures that pass on movement data to the drone.

## **The gestures:**
We will be using a total of 8 gestures to make the tello run
Gestures used are:
<div style="display: inline-block; margin-right: 10%; max-width: 200px; float: right;">
<img
  src="https://s-cdn.ryzerobotics.com/stormsend/uploads/13433930-d1e1-0135-d3c1-12530322f90d/guava-%E7%99%BD-pc-160_154_2x.png"
  alt="Drone Image"
  title="Fig.1 Drone"
  style="border-radius:5%;"><p align="center">Fig.1 Drone</p>
</div>

1. ```Up     - Point Upwards                         (2)```
2. ```Down   - Point Downwards                       (2)```
3. ```Left   - Point to Left                         (2)```
4. ```Right  - Point to Right                        (2)```
5. ```Front  - Flatten hand and Point Forward        (2)```
6. ```Back   - Thumb and Pinky Finger out            (2)```
7. ```Land   - Okay sign                             (2)```
8. ```Flip   - Yo! sign                              (4)```


#### This is  a **four-part project** and the dataset of this is available on kaggle.
The github repository of the project is:<br>
[https://github.com/RumbleJack56/HandGestureRecognition-P](https://github.com/RumbleJack56/HandGestureRecognition-P)


## **Part 1: Data-Collection**

We collect data using ```opencv``` library and use ```cv2.VideoCapture()``` for accessing camera to take images of different gestures as training examples.

* We check for webcams
* Then select the webcam
* and then we click picture every button press (s) and save it in dataset folder

In [1]:
#importing dependencies
import cv2
import os
import time
import pandas as pd
import numpy as np
print(os.getcwd())

e:\College\S2-even\ML\Project\HandGestureRecognition-P


In [19]:
available_cameras = list(filter(lambda x:cv2.VideoCapture(x) and cv2.VideoCapture(x).isOpened(),range(6)))
gesture_list = os.listdir(".dataset/")
maxEntries = 100
print("Available Cameras at ports : ",*available_cameras)
print("Gestures to record are :", *gesture_list)

Available Cameras at ports :  0 2
Gestures to record are : back1 back2 down1 down2 flip1 flip2 flip3 flip4 front1 front2 land1 land2 left1 left2 right1 right2 up1 up2


In [17]:
cap = cv2.VideoCapture(available_cameras[0])
mainFrame = np.zeros(500*480*3,dtype=np.uint8).reshape(500,480,3)
entryNum = 1

for gesture in gesture_list:
    mainFrame[0:20,:200,:] = np.zeros(20*200*3).reshape(20,200,3)
    cv2.putText(mainFrame,f"Save with S | Quit with Q",[200,15],0,0.5,[255,255,255])
    cv2.putText(mainFrame,f"{gesture}  Img:{entryNum}",[20,15],0,0.5,[255,255,255])
    while entryNum<=maxEntries:
        ret , frame = cap.read()
        mainFrame[20:,:,:] = frame[:,79:559,:]
        cv2.imshow("frame",mainFrame)

        inp = cv2.waitKey(5) & 0xFF

        if inp == ord("s"):
            mainFrame[0:20,:200,:] = np.zeros(20*200*3).reshape(20,200,3)
            cv2.putText(mainFrame,f"{gesture}  Img:{entryNum}",[20,15],0,0.5,[255,255,255])
            cv2.imwrite(f".dataset/{gesture}/{entryNum}.jpg", frame[:,79:559,:])
            entryNum+=1
        if inp == ord("q"):
            break
    entryNum=1
    mainFrame[0:20,:200,:] = np.zeros(20*200*3).reshape(20,200,3)
    cv2.putText(mainFrame,f"waiting 3 sec",[20,15],0,0.5,[255,255,255])
    cv2.imshow("frame",mainFrame)
    time.sleep(3)

cap.release()
cv2.destroyAllWindows()

## **Part 2: Data Preprocessing**

* Now we have the dataset containing 200imgs/gesture for a total for 1800 images
* Using MediaPipe, we can implement a program to convert these images into points on the hand
* We take the point, and the detail whether the hand is left or right hand as columns of a dataframe
* We save the Dataframe as a csv file

##### **First we import the necessary libraries :**

In [2]:
import pandas as pd
import numpy as np
import os
import cv2
from mediapipe import tasks,Image,solutions
from mediapipe.framework.formats import landmark_pb2
print(os.getcwd())


e:\College\S2-even\ML\Project\HandGestureRecognition-P


In [7]:
BaseOptions = tasks.BaseOptions
HandLandmarker = tasks.vision.HandLandmarker
HandLandmarkerOptions = tasks.vision.HandLandmarkerOptions
VisionMode_IMAGE = tasks.vision.RunningMode.IMAGE

# solution_landmark_style = solutions.drawing_styles.get_default_hand_landmarks_style
# solution_connection_style = solutions.drawing_styles.get_default_hand_connections_style

#define conversion Function
def convertToCords(img):
    landmarker_options = HandLandmarkerOptions(base_options=BaseOptions(model_asset_path="handlandmarker/hand_landmarker.task"),
                                           num_hands=1,
                                           running_mode=VisionMode_IMAGE)
    detector = HandLandmarker.create_from_options(landmarker_options)
    image = Image.create_from_file(img)
    rawOutput = detector.detect(image)
    
    if len(rawOutput.hand_landmarks)==0:
        return [0]*43 , rawOutput
    cords = [[pt.x,pt.y] for h in rawOutput.hand_landmarks for pt in h]
    hands = [x.category_name for y in rawOutput.handedness for x in y]
    hands = [0 if a.lower()=="left" else 1 for a in hands]
    cords = np.array(cords).reshape(-1)
    return np.concatenate([hands,cords]) , rawOutput


#create dataframe
df = pd.DataFrame(columns = ["Gesture","Specific","Hand"]+[a+str(b) for b in range(1,22)for a in "xy" ])

In [11]:
gesture_list = os.listdir(".dataset/")
errors = []
for gesture in gesture_list:
    for img in os.listdir(".dataset/"+gesture+"/"):
        coords , raw = convertToCords(f".dataset/{gesture}/{img}")
        if list(coords).count(0) > 10:
            errors.append([gesture,img,coords])
            continue
        print(coords, gesture, img)
        pd.concat([df,pd.DataFrame(coords)])
print(errors)
df

[0.         0.41414559 0.61219144 0.32820159 0.50949925 0.27439365
 0.37865618 0.23536134 0.280007   0.17261365 0.23247555 0.41758937
 0.30532795 0.47827506 0.24093845 0.44756183 0.32751256 0.41307917
 0.38455352 0.49244174 0.33957362 0.54515946 0.2862159  0.49117041
 0.38350204 0.4505741  0.43626019 0.55789894 0.37646991 0.61538696
 0.32783559 0.54938018 0.41298971 0.49626786 0.46253875 0.61134785
 0.41459703 0.67752737 0.33006907 0.72091234 0.28676805 0.76538628
 0.23765478] back1 1.jpg
[0.         0.37064987 0.61004949 0.29860824 0.50436479 0.24313119
 0.3833071  0.19821328 0.27697939 0.13023338 0.22163123 0.393534
 0.30755335 0.45379877 0.26899666 0.41793236 0.35267043 0.37766123
 0.39706632 0.46274257 0.34474587 0.51473421 0.31211543 0.45212993
 0.40336218 0.41719514 0.43495235 0.5226745  0.38791996 0.57972127
 0.3565661  0.50848621 0.43501094 0.45379198 0.4720746  0.5668878
 0.43004519 0.64214939 0.36560017 0.69294143 0.33370227 0.74519277
 0.29363739] back1 10.jpg
[0.         0.

Unnamed: 0,Gesture,Specific,Hand,x1,y1,x2,y2,x3,y3,x4,y4,x5,y5,x6,y6,x7,y7,x8,y8,x9,y9,x10,y10,x11,y11,x12,y12,x13,y13,x14,y14,x15,y15,x16,y16,x17,y17,x18,y18,x19,y19,x20,y20,x21,y21


In [12]:
df

Unnamed: 0,Gesture,Specific,Hand,x1,y1,x2,y2,x3,y3,x4,y4,x5,y5,x6,y6,x7,y7,x8,y8,x9,y9,x10,y10,x11,y11,x12,y12,x13,y13,x14,y14,x15,y15,x16,y16,x17,y17,x18,y18,x19,y19,x20,y20,x21,y21
