# Thai Hand Gesture Recognition

<!-- ![numbers](./assets/numbers.png) -->
<p align="center">
  <img src="./assets/numbers.png" width="90%">
</p>

## Pipeline
To implement hand gesture recognition, the pipeline is split into `3` steps:

- **Hand Detection**: `MediaPipe`
  - To detect hands in the frame
- **Hand Tracking**: `Algorithm`
  - To handle multiple hands at the same time
- **Gesture Recognition**: `Algorithm`
  - To overcome the limited number of distinct gestures by combining digits over time
  - To recognize gesture
  - To handle both `right-hand` and `left-hand`

## Hand Detection
Fortunately, [MediaPipe](https://mediapipe.dev/) provides `MediaPipe Hands`, which not only detects hands in a frame but also returns the `x`, `y` and `z` coordinates of each of the `21 landmarks` on every detected hand. It also classifies whether each hand is a `right` or a `left` hand.
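
MediaPipe Hands reports `x` and `y` normalized to `[0, 1]` by the image width and height, so they are scaled to pixels before drawing a bounding box or measuring distances between hands. A minimal sketch of that conversion, using plain `(x, y)` tuples as a stand-in for MediaPipe's landmark objects (the function name `hand_box` is hypothetical); it computes the same box and center that `main()` derives further below:

```python
def hand_box(landmarks, width, height):
    """Return the pixel bounding box (min_x, min_y, max_x, max_y) and its center.

    `landmarks` is a list of 21 normalized (x, y) tuples, standing in for
    MediaPipe's landmark objects (an assumption made to keep this self-contained).
    """
    # Scale normalized coordinates to pixel coordinates
    xs = [x * width for x, _ in landmarks]
    ys = [y * height for _, y in landmarks]
    box = (int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys)))
    # The box center is what the tracking step compares between frames
    center = ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)
    return box, center
```

For example, on a 640x480 frame a hand spanning `x` in `[0.25, 0.75]` and `y` in `[0.25, 0.5]` gives the box `(160, 120, 480, 240)` and the center `(320.0, 180.0)`.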

In the [original project](https://github.com/NatthanonNon/HGR-TH/tree/main), they used the `Python Solution API` for hand detection; you can read more details on [the MediaPipe website](https://google.github.io/mediapipe/solutions/hands).

## Hand Tracking
Only `3` events can happen in each frame: the number of hands in the frame is `increasing`, `decreasing` or `equal`. Each event is handled by a different process.
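
The three events come down to comparing the hand count of the previous frame with that of the present frame. A minimal sketch (the function name `hand_count_event` and its return labels are hypothetical; the notebook's `main()` sets the flags `isIncreased` and `isDecreased` instead):

```python
def hand_count_event(previous_count, current_count):
    """Classify how the number of detected hands changed between two frames."""
    if current_count > previous_count:
        return "increasing"
    if current_count < previous_count:
        return "decreasing"
    return "equal"
```

One detail of `main()`: it only compares counts when at least one hand was present in the previous frame (`current_hand != 0`), so the first frame in which hands appear is handled by the "same number of hands" branch.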

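The main idea of this algorithm is to compare the center of each hand's bounding box in the `previous frame` with that in the `present frame`, using the `Euclidean Distance`. In the code below, this comparison decides which tracked hand has left the frame when the number of hands decreases.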
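
When two tracked hands become one, the hand still in view is matched to the nearest previous center, and the other one is treated as the hand that left. A minimal sketch of that comparison, equivalent in logic to the notebook's `find_vanishing()` but written with `math.dist` (Python 3.8+) instead of NumPy and without touching global state; `find_vanished_hand` is a hypothetical name:

```python
import math


def find_vanished_hand(previous_centers, remaining_center):
    """Return the index (0 or 1) of the tracked hand that left the frame.

    `previous_centers` holds the two bounding-box centers from the previous
    frame; `remaining_center` is the center of the hand that is still visible.
    """
    # Euclidean distance from the remaining hand to each previous center
    distances = [math.dist(center, remaining_center) for center in previous_centers]
    # The nearest previous hand is assumed to be the one that stayed
    stayed = distances.index(min(distances))
    return 1 - stayed
```

As in `find_vanishing()`, a tie is resolved in favor of hand 0 staying, so hand 1 is reported as vanished.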

## Gesture Recognition
Although `MediaPipe Hands` returns the `x`, `y` and `z` coordinates of all `21 landmarks` on each hand, only `x` and `y` are used to recognize the gesture, because the `z` (depth) estimate is not robust enough.

This process has `2` steps:
1. Use the `x` and `y` coordinates to decide whether each finger is `on` (extended) or `off` (folded).
2. Map the status of the five fingers to a gesture by following the `Hand Gestures Definition`.
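
The two steps can also be written as a lookup table instead of the nested `if`/`elif` chain in `gesture_recognition()`. A minimal sketch, assuming the landmarks are given as 21 `(x, y)` tuples in MediaPipe's index order (the wrist, then four points per finger from thumb to pinky, base to tip) rather than as MediaPipe landmark objects; `FINGERS`, `DIGITS`, `finger_states` and `digit_from_points` are hypothetical names. The on/off rules mirror the notebook's `is_on()`:

```python
# Landmark index ranges per finger, in MediaPipe's order (base to tip)
FINGERS = {
    "thumb": range(1, 5),
    "index": range(5, 9),
    "middle": range(9, 13),
    "ring": range(13, 17),
    "pinky": range(17, 21),
}

# (thumb, index, middle, ring, pinky) on/off pattern -> Thai digit
DIGITS = {
    (False, False, False, False, False): 0,
    (False, True, False, False, False): 1,
    (False, True, True, False, False): 2,
    (False, True, True, True, False): 3,
    (False, True, True, True, True): 4,
    (True, True, True, True, True): 5,
    (True, False, False, False, False): 6,
    (True, True, False, False, False): 7,
    (True, True, True, False, False): 8,
    (True, True, True, True, False): 9,
}


def finger_states(points, handedness):
    """Return the on/off state of (thumb, index, middle, ring, pinky).

    `points` holds 21 (x, y) tuples in image coordinates (y grows downward);
    `handedness` is "Right" or "Left".
    """
    states = []
    for name, indices in FINGERS.items():
        finger = [points[i] for i in indices]
        tip, below_tip, base = finger[-1], finger[-2], finger[0]
        if name == "thumb":
            # Thumb: compare the tip with the joint below it along x;
            # the direction flips between right and left hands
            if handedness == "Right":
                states.append(tip[0] < below_tip[0])
            else:
                states.append(tip[0] > below_tip[0])
        else:
            # Other fingers: extended if the tip is above the base joint
            states.append(tip[1] < base[1])
    return tuple(states)


def digit_from_points(points, handedness):
    """Look up the digit for a hand pose, or "?" for an unknown pattern."""
    return DIGITS.get(finger_states(points, handedness), "?")
```

A table makes the gesture definition easy to audit: each of the ten finger patterns appears exactly once, and any other pattern falls through to `"?"`.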

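To handle numbers with more than one digit, `timing` comes into play: the gesture recognized in each frame is buffered, and a digit is accepted only if it appears in more than `TIMING` frames (the `threshold`). When the hand leaves the frame, the accepted digits are joined into one number.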
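
A minimal sketch of the timing idea as a standalone function over a list of per-frame gestures (`decode_digits` and `min_frames` are hypothetical names; `min_frames` plays the role of `TIMING`). Note that this is a variant: it requires a digit to be held for more than `min_frames` *consecutive* frames, whereas the notebook's `get_output()` counts the total number of occurrences of a digit anywhere in the buffer.

```python
from itertools import groupby


def decode_digits(frames, min_frames=10):
    """Collapse a per-frame gesture stream into a digit string.

    A digit is kept only if it is held for more than `min_frames`
    consecutive frames; runs of "?" (unknown gesture) are ignored.
    """
    digits = []
    # groupby yields one (gesture, run) pair per run of identical gestures
    for gesture, run in groupby(frames):
        length = sum(1 for _ in run)
        if gesture != "?" and length > min_frames:
            digits.append(str(gesture))
    return "".join(digits)
```

For instance, 12 frames of `2`, 3 frames of `"?"`, 11 frames of `5` and 4 frames of `1` decode to `"25"`: the short run of `1` stays below the threshold. Counting consecutive frames keeps a brief misdetection from being accepted just because the same digit shows up again later; the trade-off is that a glitch in the middle of a held gesture splits it into two runs, and if both runs exceed the threshold the digit is emitted twice.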

## Import Dependencies

```python
import cv2
import numpy as np
import mediapipe as mp

from IPython.display import clear_output
from utils import calc_landmark_list, draw_landmarks, draw_info_text
```

```python
mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands
```

```python
# Colors in BGR format (OpenCV draws on the BGR image)
BLACK  = (0, 0, 0)
RED    = (0, 0, 255)
GREEN  = (0, 255, 0)
BLUE   = (255, 0, 0)
YELLOW = (0, 255, 255)
WHITE  = (255, 255, 255)

# Constants
FONT = cv2.FONT_HERSHEY_SIMPLEX
TIMING = 10                     #@param {type: "integer"}  frames a digit must be seen to be accepted
MAX_HANDS = 1                   #@param {type: "integer"}  the tracking logic handles up to 2 hands
min_detection_confidence = 0.6  #@param {type:"slider", min:0, max:1, step:0.01}
min_tracking_confidence  = 0.5  #@param {type:"slider", min:0, max:1, step:0.01}

current_hand = 0   # number of hands detected in the previous frame
gif_array = []     # frames collected for the result GIF
```

## Helper Functions

```python
# MediaPipe landmark layout: 0 = wrist, then 4 points per finger (base to tip)
def classify_landmark(landmark):
    wrist = landmark[0]
    thumb = landmark[1:5]
    index_finger = landmark[5:9]
    middle_finger = landmark[9:13]
    ring_finger = landmark[13:17]
    pinky = landmark[17:21]
    return [wrist, thumb, index_finger, middle_finger, ring_finger, pinky]


def is_on(idx, finger, landmark_label):
    # Thumb (idx 0): compare the tip with the joint below it along x;
    # the direction depends on whether it is a right or a left hand
    if idx == 0:
        if landmark_label == "Right":
            return finger[-1].x < finger[-2].x
        else:
            return finger[-1].x > finger[-2].x
    # Other fingers: extended if the tip is above the base (image y grows downward)
    return finger[-1].y < finger[0].y


def gesture_recognition(finger_is_on):
    thumb = finger_is_on[0]
    index_finger = finger_is_on[1]
    middle_finger = finger_is_on[2]
    ring_finger = finger_is_on[3]
    pinky = finger_is_on[4]

    # Thai Digits Hand Gesture
    if thumb:
        if index_finger and middle_finger and ring_finger and pinky:
            return 5
        elif index_finger and middle_finger and ring_finger and not pinky:
            return 9
        elif index_finger and middle_finger and not ring_finger and not pinky:
            return 8
        elif index_finger and not middle_finger and not ring_finger and not pinky:
            return 7
        elif not index_finger and not middle_finger and not ring_finger and not pinky:
            return 6
    else:
        if index_finger and middle_finger and ring_finger and pinky:
            return 4
        elif index_finger and middle_finger and ring_finger and not pinky:
            return 3
        elif index_finger and middle_finger and not ring_finger and not pinky:
            return 2
        elif index_finger and not middle_finger and not ring_finger and not pinky:
            return 1
        elif not index_finger and not middle_finger and not ring_finger and not pinky:
            return 0

    # Unknown Gesture
    return "?"


def recognition(landmark, handedness):
    landmark = landmark.landmark
    handedness = handedness.classification[0].label  # "Right" or "Left"

    hand_landmark = classify_landmark(landmark)
    finger_landmark = hand_landmark[1:]  # drop the wrist

    finger_is_on = []
    for idx, finger in enumerate(finger_landmark):
        finger_is_on.append(is_on(idx, finger, handedness))

    return gesture_recognition(finger_is_on)


def get_output(idx):
    """Turn the gesture buffer of hand `idx` into a number and append it to `output`."""
    global _output, output
    key = []
    for number in _output[idx]:
        counts = _output[idx].count(number)

        # Accept a number only if it appears in more than TIMING frames
        if number not in key:
            if counts > TIMING:
                key.append(number)

        # Accept a repeated number again only after a different number
        elif number != key[-1]:
            if counts > TIMING:
                key.append(number)

    # Join the accepted numbers, skipping unknown gestures
    text = "".join(str(number) for number in key if number != "?")

    # Add the word to the output list and reset the buffer
    if text != "":
        _output[idx] = []
        output.append(text)


def get_euclidean_distance(a, b):
    return np.linalg.norm(a - b)


def find_vanishing(_mean_xy):
    """Find which tracked hand left the frame when 2 hands become 1.

    The remaining hand is matched to the nearest previous center; the other
    one is the vanished hand, whose index is returned. `mean_xy` is updated
    so that the remaining hand is stored at index 0.
    """
    global mean_xy

    _a = get_euclidean_distance(mean_xy[0], _mean_xy[0])
    _b = get_euclidean_distance(mean_xy[1], _mean_xy[0])

    if _a > _b:
        # The remaining hand was hand 1, so hand 0 vanished
        mean_xy[0] = mean_xy[1]
        mean_xy[1] = []
        return 0
    else:
        # The remaining hand was hand 0, so hand 1 vanished
        mean_xy[1] = []
        return 1
```

## Hand Detection

```python
def main(image, results):
    global current_hand
    global output, _output
    global mean_xy

    multi_hand_landmarks = results.multi_hand_landmarks
    multi_handedness = results.multi_handedness

    _mean_xy = []   # bounding-box centers in the present frame
    _gesture = []   # recognized gesture of each hand in the present frame

    isIncreased = False
    isDecreased = False

    # Compare with the number of hands in the previous frame
    if current_hand != 0:
        if multi_hand_landmarks is None:
            isDecreased = True
        elif len(multi_hand_landmarks) > current_hand:
            isIncreased = True
        elif len(multi_hand_landmarks) < current_hand:
            isDecreased = True

    if multi_hand_landmarks:
        h, w, _ = image.shape
        for idx in reversed(range(len(multi_hand_landmarks))):
            current_select_hand = multi_hand_landmarks[idx]

            # mp_drawing.draw_landmarks(image, current_select_hand, mp_hands.HAND_CONNECTIONS)
            landmark_list = calc_landmark_list(image, current_select_hand)
            image = draw_landmarks(image, landmark_list)

            # Landmarks are normalized to [0, 1]; convert the bounding box to pixels
            xs = [lm.x for lm in current_select_hand.landmark]
            ys = [lm.y for lm in current_select_hand.landmark]
            min_x, max_x = int(min(xs) * w), int(max(xs) * w)
            min_y, max_y = int(min(ys) * h), int(max(ys) * h)

            gesture = recognition(current_select_hand, multi_handedness[idx])

            # Draw the bounding box and the recognized gesture
            cv2.rectangle(image, (min_x - 10, min_y - 10), (max_x + 10, max_y + 10), BLACK, 4)
            image = draw_info_text(image, [min_x - 10, min_y - 10, max_x + 10, max_y + 10], str(gesture))

            order_text = "Hand No. {}".format(idx)
            cv2.putText(image, order_text, (min_x - 10, max_y + 30), FONT, 0.5, GREEN, 2)

            handedness_text = "{} hand".format(multi_handedness[idx].classification[0].label)
            cv2.putText(image, handedness_text, (min_x - 10, max_y + 60), FONT, 0.5, GREEN, 2)

            _mean_xy.append(np.array([(min_x + max_x) / 2, (min_y + max_y) / 2]))
            _gesture.append(gesture)

    # Number of hands is increasing
    if isIncreased:
        mean_xy[0] = _mean_xy[0]
        if current_hand == 1:
            mean_xy[1] = _mean_xy[1]

    # Number of hands is decreasing: flush the buffer of the hand that left
    elif isDecreased:
        if current_hand == 1:
            get_output(0)
        elif current_hand == 2:
            if not _mean_xy:
                # Both hands left at the same time
                get_output(0)
                get_output(1)
            else:
                vanishing_index = find_vanishing(_mean_xy)
                get_output(vanishing_index)
                if vanishing_index == 0:
                    # The remaining hand is now tracked as hand 0, so move its buffer too
                    _output[0], _output[1] = _output[1], []

    # Number of hands is the same
    else:
        if multi_hand_landmarks is not None:
            mean_xy[0] = _mean_xy[0]
            _output[0].append(_gesture[0])

            if current_hand == 2:
                mean_xy[1] = _mean_xy[1]
                _output[1].append(_gesture[1])

    # Remember the number of hands for the next frame
    current_hand = len(multi_hand_landmarks) if multi_hand_landmarks else 0

    return image
```

```python
output  = []          # recognized numbers
_output = [[], []]    # per-hand buffers of per-frame gestures
mean_xy = [[], []]    # bounding-box centers of the tracked hands

# Webcam Input
capture = cv2.VideoCapture(0)

with mp_hands.Hands(
    min_detection_confidence=min_detection_confidence,
    min_tracking_confidence=min_tracking_confidence,
    max_num_hands=MAX_HANDS
) as hands:
    while capture.isOpened():
        success, image = capture.read()
        if not success:
            print("Ignoring empty camera frame.")
            continue

        # Flip the image horizontally for a selfie-view display, and convert the BGR image to RGB
        image = cv2.cvtColor(cv2.flip(image, 1), cv2.COLOR_BGR2RGB)

        # To improve performance, optionally mark the image as not writeable to pass by reference
        image.flags.writeable = False
        results = hands.process(image)

        # Draw the hand annotations on the image
        image.flags.writeable = True
        image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

        try:
            image = main(image, results)
        except Exception as error:
            print(error)

        # Show the recognized numbers in the top-left corner
        number_text = str(output)
        textsize = cv2.getTextSize(number_text, FONT, 0.5, 2)[0]
        cv2.rectangle(image, (5, 0), (10 + textsize[0], 10 + textsize[1]), YELLOW, -1)
        cv2.putText(image, number_text, (10, 15), FONT, 0.5, BLACK, 2)
        cv2.imshow('Number Recognition', image)

        # Save each frame for the GIF (converted back to RGB)
        gif_array.append(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

        # Press 'Esc' to quit
        if cv2.waitKey(5) & 0xFF == 27:
            clear_output()
            break

capture.release()
cv2.destroyAllWindows()
```

```python
print(f"Gesture Recognition:\n{output}")
```

```
Gesture Recognition:
['0', '4', '2541', '2451', '1', '7', '60', '3605']
```


## Generate GIF Result

```python
from utils import save_gif

fps = 30    #@param {type: "integer"}
save_gif(
    gif_array, fps=fps,
    output_dir="./assets/result_number.gif"
)
```

```
Save to ./assets/result_number.gif!
```

