# Play Trex Game on Chrome By Gesture Using OpenCV and Mediapipe
Hello there, surfer!
For the past few days, I have been thinking about cool projects that can be built within a few hours using Mediapipe and OpenCV in Python.
In this blog, I am writing about how we can play the popular Trex game just by moving our fingers in front of the camera. Many of us have played this game, but none of us ever actually meant to. 🤣🤦‍♂️🤦‍♂️ Well, in this blog, we are going to play it with full intention.

## Prerequisites
* Mediapipe: Install it using `pip install mediapipe`.
* OpenCV: It is installed automatically as a dependency of `mediapipe`.
* Keyboard: Not the physical one, because we are simulating key events using gestures. Install it using `pip install keyboard`.

Once installed, make sure you can use them: import them and see if any error pops up.

In [1]:
import mediapipe as mp
import cv2
import numpy as np
import keyboard

Using the `keyboard` package is pretty easy, just like the `mouse` package. As a test, we are going to simulate typing `!echo hey` into the next cell. I am using Jupyter Notebook, hence I have to use `!` to run Windows commands.

In [2]:
# let's simulate typing "!echo hey", one key at a time
keyboard.press_and_release("!,e,c,h,o,space,h,e,y")

In [3]:
# !ECHO HEY

We can even use keys like `control`.

In [4]:
# let's type "hello" and then press ctrl+a
keyboard.press_and_release("h,e,l,l,o,ctrl+a")

HELLO
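To make the string format a bit clearer: as far as I understand it, `keyboard` treats a comma as a separator between steps that are pressed one after another, and `+` as joining keys that are held together. Below is a tiny sketch that splits such a string in the same spirit. `split_hotkey` is a hypothetical helper written only for illustration; it is not part of the `keyboard` package and it does not send any key events.

```python
def split_hotkey(hotkey):
    """Split a hotkey string into steps; each step is a list of keys held together."""
    steps = []
    for step in hotkey.split(","):
        # Keys joined by "+" belong to the same step (pressed together).
        keys = [key.strip() for key in step.split("+")]
        steps.append(keys)
    return steps


steps = split_hotkey("h,e,l,l,o,ctrl+a")
print(steps)
```

So `"h,e,l,l,o,ctrl+a"` reads as five single-key steps followed by one combined step of `ctrl` and `a`.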

The focus is not on the `keyboard` package but on playing the dino game. We will do a simple kind of gesture recognition based on the distance between certain landmarks. So let's define a method to find the Euclidean distance between two points.


In [5]:
def euclidean(pt1, pt2):
    d = np.sqrt((pt1[0]-pt2[0])**2+(pt1[1]-pt2[1])**2)
    return d
euclidean((4, 3), (0, 0))

5.0
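If you would rather not depend on NumPy for this one function, the standard library can do the same job. This is only an alternative sketch, assuming Python 3.8 or newer (that is when `math.dist` was added); `euclidean_std` is a name I made up for it.

```python
import math


def euclidean_std(pt1, pt2):
    # math.dist computes the Euclidean distance between two points
    # given as sequences of the same length.
    return math.dist(pt1, pt2)


print(euclidean_std((4, 3), (0, 0)))  # 5.0, same as the NumPy version above
```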

## Writing the Code

### Step By Step
It is necessary to look at the landmark positions before deciding on any gestures. Please refer to the image below.

![img](landmarks.png)
Source: [Official Hands Page](https://google.github.io/mediapipe/solutions/hands.html)

* Start by opening the camera.
```python 
cam = cv2.VideoCapture(0)
```
* Define a frame size, in our case 520 rows and 720 columns.
```python
fsize = (520, 720)
```
* Take the `drawing_utils` and `hands` modules from Mediapipe's solutions. As the names suggest, `drawing_utils` will draw the landmarks and `hands` will let us work with the detection models.
```python
mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands
```
* Define a variable to count frames and a constant that says after how many frames the events should be checked.
```python
check_every = 10
check_cnt = 0
```
* Prepare a variable to hold the name of the last event.
```python
last_event = None
```
* Now prepare a Mediapipe `Hands` object by giving arguments like `max_num_hands`, `min_detection_confidence` and so on. As the names suggest, `max_num_hands` is the maximum number of hands to search for and `min_detection_confidence` is the minimum confidence threshold for detection; hands detected below it are discarded.
```python
with mp_hands.Hands(
static_image_mode=True,
max_num_hands = 2,
min_detection_confidence=0.6) as hands:
```
* Read a Camera frame.
```python
    while cam.isOpened():
        ret, frame = cam.read()
        if not ret:
            continue
```
* Flip the frame so that it looks like a selfie camera.
```python
        frame = cv2.flip(frame, 1)
```
* Resize frame to our desired size.
```python
        frame = cv2.resize(frame, (fsize[1], fsize[0]))
```
* Extract the height and width of the frame.
```python        
        h, w,_ = frame.shape
```
* Convert the frame from BGR to RGB because the `Hands` object expects an image in RGB format.
```python
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
```
* Pass the RGB image to the `process` method of the `Hands` object to get the result.
```python
        res = hands.process(rgb)
```
* Now for each hand, we will be extracting the landmarks of the fingers, like the index finger's tip, the thumb's tip, the middle finger's tip and so on. There are 21 landmarks in total for each hand. Mediapipe returns them as normalized coordinates, so after extracting, we need to convert them back to pixel coordinates.
```python
        if res.multi_hand_landmarks:
            for hand_landmarks in res.multi_hand_landmarks:
                
                index_tip = mp_drawing._normalized_to_pixel_coordinates(
                    hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].x, 
                    hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].y, 
                    w, h)
                
                thumb_tip = mp_drawing._normalized_to_pixel_coordinates(
                    hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_TIP].x, 
                    hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_TIP].y, 
                    w, h)
                
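                middle_tip = mp_drawing._normalized_to_pixel_coordinates(
                    hand_landmarks.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_TIP].x, 
                    hand_landmarks.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_TIP].y, 
                    w, h)
                
                # index finger's PIP joint, used later for the duck gesture
                index_pip = mp_drawing._normalized_to_pixel_coordinates(
                    hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_PIP].x, 
                    hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_PIP].y, 
                    w, h)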
```
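It helps to know roughly what the conversion does, because it explains all the `is not None` checks in the code. As far as I can tell, the helper gives back `None` when a landmark falls outside the frame, and otherwise scales the normalized value by the frame width and height. Here is a simplified sketch of that idea. `to_pixel` is my own hypothetical stand-in, not Mediapipe's code, and since `_normalized_to_pixel_coordinates` is a private function (note the leading underscore), its exact behaviour may differ between versions.

```python
import math


def to_pixel(norm_x, norm_y, width, height):
    """Simplified sketch: normalized (0..1) coordinates -> pixel coordinates."""
    # A landmark outside the frame has a coordinate outside [0, 1]; give up on it.
    if not (0.0 <= norm_x <= 1.0 and 0.0 <= norm_y <= 1.0):
        return None
    # Scale to the frame and clamp so that 1.0 maps to the last valid pixel.
    x_px = min(math.floor(norm_x * width), width - 1)
    y_px = min(math.floor(norm_y * height), height - 1)
    return x_px, y_px


print(to_pixel(0.5, 0.5, 720, 520))  # (360, 260)
print(to_pixel(1.2, 0.5, 720, 520))  # None, the point is outside the frame
```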
* Now if the current frame count is equal to the value we defined earlier, check for events.
```python
                if index_tip is not None:
                    if check_cnt==check_every:
```
* If the distance between the index finger's tip and the middle finger's tip is less than 40, consider that *space* is pressed, i.e. touch the index finger and the middle finger together to jump. **The value 40 is in pixels, so it should be chosen relative to the frame size.** Otherwise, if the last event was jump, set the last event to none.
```python
                        if index_tip is not None and middle_tip is not None:

                            if euclidean(index_tip, middle_tip)<40: 
                                last_event = "jump"
                            else:
                                if last_event=="jump":
                                    last_event=None
```
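Since the threshold is measured in pixels, it only makes sense for the frame size it was tuned on. One way to keep it meaningful at other sizes is to scale it with the frame height. This is just a sketch of that idea: `scaled_threshold` and its `ref_height` parameter are hypothetical names, and I am assuming the 40 was tuned on our 520-row frame.

```python
def scaled_threshold(base_px, frame_height, ref_height=520):
    """Scale a pixel threshold tuned at ref_height to another frame height."""
    return base_px * frame_height / ref_height


print(scaled_threshold(40, 520))   # 40.0, unchanged at the reference size
print(scaled_threshold(40, 1040))  # 80.0, a frame twice as tall doubles the threshold
```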
* If the distance between the index finger's PIP joint and the thumb's tip is less than 60, consider that the *down* key is pressed, i.e. move the thumb near the bottom of the index finger to duck. Otherwise, if the last event was duck, set the last event to none.
```python
                        if thumb_tip is not None and index_pip is not None:
                            if euclidean(thumb_tip, index_pip) < 60: # 60 should be relative to height/width of frame
                                last_event = "duck"
                            else:
                                if last_event=="duck":
                                    last_event=None
```
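The two checks above are easier to try out when they live in one small function that takes pixel points and returns the new event. Here is a minimal sketch under that assumption. `update_event`, `JUMP_DIST` and `DUCK_DIST` are hypothetical names of mine, and I use `math.dist` so the snippet runs without a camera or Mediapipe.

```python
import math

JUMP_DIST = 40  # index tip to middle tip, in pixels
DUCK_DIST = 60  # thumb tip to index PIP, in pixels


def update_event(last_event, index_tip, middle_tip, thumb_tip, index_pip):
    """Return the new event ("jump", "duck" or None) from pixel landmarks."""
    if index_tip is not None and middle_tip is not None:
        if math.dist(index_tip, middle_tip) < JUMP_DIST:
            last_event = "jump"
        elif last_event == "jump":
            last_event = None
    if thumb_tip is not None and index_pip is not None:
        if math.dist(thumb_tip, index_pip) < DUCK_DIST:
            last_event = "duck"
        elif last_event == "duck":
            last_event = None
    return last_event


# Index and middle tips touching, thumb far away
print(update_event(None, (100, 100), (110, 100), (300, 300), (100, 160)))  # jump
# Fingers apart, thumb resting near the index PIP
print(update_event(None, (100, 100), (200, 100), (105, 165), (100, 160)))  # duck
```

Note that the duck check runs second, so when both gestures are detected in the same check, duck wins.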
* After checking all events, set frame count to 0.
```python
                    check_cnt=0
```
* Finally, if the frame count has just been reset, apply the event. Then increase the frame count.
```python
            if check_cnt==0:
                if last_event=="jump":
                    keyboard.press_and_release("space")
                elif last_event=="duck":
                    keyboard.press("down")
                else:
                    keyboard.release("down")
                print(last_event)


            check_cnt+=1
```
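To see how often an event is really applied, we can simulate just the counter without any camera. The sketch below copies the reset, apply and increment order from the snippets above; `firing_frames` is a hypothetical helper, and it assumes exactly one hand is visible in every frame.

```python
def firing_frames(n_frames, check_every):
    """Return the indices of frames on which the event would be applied."""
    check_cnt = 0
    fired = []
    for frame_idx in range(n_frames):
        # Gestures are checked and the counter is reset every `check_every` frames.
        if check_cnt == check_every:
            check_cnt = 0
        # The key event is applied only right after a reset (or on the first frame).
        if check_cnt == 0:
            fired.append(frame_idx)
        check_cnt += 1
    return fired


print(firing_frames(12, 5))  # [0, 5, 10]
```

So with `check_every = 5` a key event is sent once every five frames. A smaller value reacts faster but sends more key presses.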
* Draw the landmarks.
```python
            mp_drawing.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)
```
* Show the frame.
```python
        cv2.imshow("Controller Window", frame)
```
* If Escape is pressed then close.
```python
        if cv2.waitKey(1)&0xFF == 27:
            break
```

## Final Code


In [7]:
cam = cv2.VideoCapture(0)
fsize = (520, 720)

last_event = None
check_cnt = 0
check_every = 5

mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands




with mp_hands.Hands(
static_image_mode=True,
max_num_hands = 1,
min_detection_confidence=0.6) as hands:
    while cam.isOpened():
        ret, frame = cam.read()
        if not ret:
            continue
        frame = cv2.flip(frame, 1)
        frame = cv2.resize(frame, (fsize[1], fsize[0]))
        
        h, w,_ = frame.shape
        
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        # mark the frame read-only while Mediapipe processes it
        rgb.flags.writeable = False
        
        res = hands.process(rgb)
        rgb.flags.writeable = True
        
        
        if res.multi_hand_landmarks:
            for hand_landmarks in res.multi_hand_landmarks:
                
                index_dip = mp_drawing._normalized_to_pixel_coordinates(
                    hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_DIP].x, 
                    hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_DIP].y, 
                    w, h)
                
                index_tip = mp_drawing._normalized_to_pixel_coordinates(
                    hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].x, 
                    hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].y, 
                    w, h)
                
                index_pip = np.array(mp_drawing._normalized_to_pixel_coordinates(
                    hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_PIP].x, 
                    hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_PIP].y, 
                    w, h))
                
                thumb_tip = mp_drawing._normalized_to_pixel_coordinates(
                    hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_TIP].x, 
                    hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_TIP].y, 
                    w, h)
                
                middle_tip = mp_drawing._normalized_to_pixel_coordinates(
                    hand_landmarks.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_TIP].x, 
                    hand_landmarks.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_TIP].y, 
                    w, h)
                
                if index_tip is not None:
                    if check_cnt==check_every:
                        if index_tip is not None and middle_tip is not None:

                            
                            if euclidean(index_tip, middle_tip)<40: # 40 should be relative to the height of frame
                                last_event = "jump"
                            else:
                                if last_event=="jump":
                                    last_event=None
                        
                        if thumb_tip is not None and index_tip is not None:
                            print(euclidean(index_tip, middle_tip))
                            if euclidean(thumb_tip, index_tip) < 60:
                                last_event="duck"
                            else:
                                if last_event == "duck":
                                    last_event = None
                        check_cnt=0
                
                if check_cnt==0:
                    if last_event=="jump":
                        keyboard.press_and_release("space")
                    elif last_event=="duck":
                        keyboard.press("down")
                    else:
                        keyboard.release("down")
                    print(last_event)

                check_cnt+=1

                mp_drawing.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)
        cv2.imshow("Controller Window", frame)
        
        if cv2.waitKey(1)&0xFF == 27:
            break
cam.release()
cv2.destroyAllWindows()
                        

None
53.03772242470448
None
50.21951811795888
None
45.34313619501854
None
46.22769732530488
None
41.340053217188775
duck
33.421549934136806
duck
33.52610922848042
duck
30.01666203960727
duck
14.317821063276353
duck
17.11724276862369
duck
48.08326112068523
None
69.85699678629192
duck
48.82622246293481
None
46.87216658103186
None
52.20153254455275
None
40.311288741492746
None
31.11269837220809
jump
47.4236228055175
None
60.0
None
51.40038910358559
None
46.69047011971501
None
48.08326112068523
None
48.79549159502341
None
54.56189146281496
None
58.60034129593445
None
56.0357029044876
None
27.80287754891569
jump
64.07027391856539
None
25.0
jump
29.206163733020468
jump
65.86349520030045
None
67.60177512462228
None
61.032778078668514
None
29.0
jump
57.42821606144492
None
54.817880294662984
None
61.587336360651285
None
61.91122676865643
None
61.91122676865643
None
63.812224534175265
None
58.83026432033091
None
61.032778078668514
None
61.05735008989499
None
64.66065264130884
None
41.03656905736

## Finally
This is the end of our blog, and I hope you learned something valuable from it. Please let me know if you find any problems or errors. There is also a video version of this blog that you can watch on YouTube.

* [GitHub]()
* [YouTube]()