# Gesture Recognition using Google Mediapipe

## API Call to server

Every time our model recognizes a gesture, it makes an API call to our custom backend. The backend receives the gesture name and triggers a Python script that provides the AI assistant functionality.

In [1]:
import requests

# The URL to which the request will be sent
URL = "http://localhost:8000/api/gesture"

def trigger_ai_assistant(gesture_name):
    # The data payload of the request, with the key 'gesture' and its value
    data = { "gesture": gesture_name }
    
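    # Making the POST request. The timeout and the except clause keep an
    # unreachable backend from hanging or crashing the caller.
    try:
        response = requests.post(URL, json=data, timeout=5)
    except requests.RequestException as exc:
        print("Error: request failed:", exc)
        return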
    
    # Checking the response
    if response.status_code == 200:
        print("Success:", response.json())
    else:
        print("Error:", response.status_code, response.text)
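
The notebook does not show the backend itself, so the following is only a minimal stand-in for local testing, not the actual server. It assumes the endpoint path `/api/gesture` from the URL above and mirrors the JSON reply seen in the output at the end of this notebook. The names `GestureHandler`, `start_stub_backend`, and `post_gesture` are hypothetical, and only the standard library is used.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class GestureHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real backend's /api/gesture endpoint."""

    def do_POST(self):
        if self.path != "/api/gesture":
            self.send_error(404)
            return
        # Read and decode the JSON body sent by the client.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({
            "message": "Gesture processed successfully",
            "received_gesture": payload.get("gesture"),
        }).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # suppress per-request logging to keep the output quiet


def start_stub_backend(port=0):
    """Serve the stub on a background thread; port=0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), GestureHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_gesture(host, port, gesture_name):
    """POST a gesture name and return (status code, decoded JSON reply)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(
            "POST",
            "/api/gesture",
            body=json.dumps({"gesture": gesture_name}),
            headers={"Content-Type": "application/json"},
        )
        reply = conn.getresponse()
        return reply.status, json.loads(reply.read())
    finally:
        conn.close()
```

Calling `start_stub_backend(8000)` before running the rest of the notebook should let `trigger_ai_assistant` be exercised without the real backend. Call `shutdown()` and `server_close()` on the returned server when finished.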

## Importing modules

In [2]:
import cv2
import time
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision  # only the vision tasks are used in this notebook

## Initializing model path

We are using Google's pre-trained gesture recognition model, `gesture_recognizer.task`, which must be present in the working directory.

In [3]:
model_path = './gesture_recognizer.task'

## Create the task

The MediaPipe Gesture Recognizer task is set up with the `create_from_options` function, which accepts an options object holding the configuration values. The aliases below shorten the class names used in the rest of the notebook.

In [4]:
BaseOptions = mp.tasks.BaseOptions
GestureRecognizer = mp.tasks.vision.GestureRecognizer
GestureRecognizerOptions = mp.tasks.vision.GestureRecognizerOptions
GestureRecognizerResult = mp.tasks.vision.GestureRecognizerResult
VisionRunningMode = mp.tasks.vision.RunningMode

We create the gesture recognizer in live stream mode, which delivers results asynchronously through a callback. The callback below runs every time the model finishes processing a frame from the live video.

In [5]:
def handle_process(result: GestureRecognizerResult, output_image: mp.Image, timestamp_ms: int):
    # Only run if the model picks up any gestures at all
    if len(result.gestures) > 0 and result.gestures[0][0].category_name != "None":
        print("Received input")
        # result.gestures holds one list of candidate gestures per detected hand; we use the first candidate of the first hand.
        model_gesture_prediction = result.gestures[0][0].category_name
        # Trigger the AI assistant using the gesture
        trigger_ai_assistant(model_gesture_prediction)
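
The selection logic in the callback can be tried out without a camera or model. The sketch below is one possible variant, not the notebook's method: it picks the highest-scoring candidate of the first hand and adds a confidence threshold. The helper names `top_gesture` and `fake_result` and the `min_score` default of 0.5 are assumptions made for illustration. The stand-in objects only imitate the `gestures`, `category_name`, and `score` attributes of a real `GestureRecognizerResult`.

```python
from types import SimpleNamespace


def top_gesture(result, min_score=0.5):
    """Return the first hand's best gesture name, or None if nothing usable was found.

    min_score is an assumed threshold, not a value taken from the notebook.
    """
    if not result.gestures:  # no hand detected in this frame
        return None
    candidates = result.gestures[0]
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.score)
    # "None" is the model's label for "no known gesture".
    if best.category_name == "None" or best.score < min_score:
        return None
    return best.category_name


def fake_result(*hands):
    """Build a stand-in result; each hand is a list of (name, score) pairs."""
    return SimpleNamespace(gestures=[
        [SimpleNamespace(category_name=name, score=score) for name, score in hand]
        for hand in hands
    ])


print(top_gesture(fake_result([("Thumb_Up", 0.91)])))  # Thumb_Up
print(top_gesture(fake_result([("None", 0.99)])))      # None
```

Inside `handle_process`, a helper like this would replace the `if` condition, and `trigger_ai_assistant` would be called only when the return value is not `None`.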

Configuring options

In [6]:
options = GestureRecognizerOptions(
    base_options=BaseOptions(model_asset_path=model_path),
    running_mode=VisionRunningMode.LIVE_STREAM, # Live stream mode lets us feed webcam frames from OpenCV as they arrive
    result_callback=handle_process # Function invoked with each recognition result
)

## OpenCV live video stream

This is the most important part of the code: we set up a live stream using OpenCV, pass one frame to the MediaPipe model every `frame_interval` seconds, and let the model recognize the gesture in that frame.

In [9]:
with GestureRecognizer.create_from_options(options) as recognizer:
    # Use OpenCV’s VideoCapture to start capturing from the webcam.
    cap = cv2.VideoCapture(0)

    start_time = time.time() # start time is when we start the live stream
    frame_interval = 5  # Time interval before we process the next frame for gesture recognition
    last_frame_time = 0 # Variable used to capture a frame every 5 seconds

    # Create a loop to read the latest frame from the camera using VideoCapture#read()
    while cap.isOpened():
        # Read a frame
        success, frame = cap.read()

        # Ignore the empty camera frame
        if not success:
            print("Ignoring empty camera frame.")
            continue

        # Calculate the elapsed time since the start.
        elapsed_time = time.time() - start_time  
        
        current_time = time.time()
        # Display the frame.
        cv2.imshow('MediaPipe Hands', frame)

        # Only process the frame if frame_interval has passed
        if (current_time - last_frame_time) > frame_interval:   
            # Convert elapsed time to milliseconds.
            frame_timestamp_ms = int(elapsed_time * 1000)  
    
            # Convert the frame from OpenCV's BGR channel order to RGB, then wrap it in a MediaPipe Image object.
            rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb_frame)

            # Use the model to recognize the gesture in the frame and call the callback function
            recognizer.recognize_async(mp_image, frame_timestamp_ms)

            # Update the last frame time so that this snippet runs again after 'frame_interval' seconds
            last_frame_time = current_time
        
        # Break the loop when 'q' is pressed.
        if cv2.waitKey(5) & 0xFF == ord('q'):
            break
    
    # Release the webcam and close the window.
    cap.release()
    cv2.destroyAllWindows()

I0000 00:00:1709739208.951621       1 gl_context.cc:344] GL version: 2.1 (2.1 Metal - 88), renderer: Apple M1
W0000 00:00:1709739208.966032       1 gesture_recognizer_graph.cc:129] Hand Gesture Recognizer contains CPU only ops. Sets HandGestureRecognizerGraph acceleration to Xnnpack.
I0000 00:00:1709739208.972668       1 hand_gesture_recognizer_graph.cc:250] Custom gesture classifier is not defined.


Received input
Success: {'message': 'Gesture processed successfully', 'received_gesture': 'Thumb_Up'}
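
The frame-interval check in the capture loop above can also be factored into a small helper, which makes the timing logic testable on its own. This is a sketch under assumptions: the class name `FrameThrottle` and its `ready` method are hypothetical. It uses `time.monotonic` instead of `time.time` because live stream mode expects increasing timestamps and a monotonic clock cannot jump backwards.

```python
import time


class FrameThrottle:
    """Decide when the next frame should be sent for recognition (hypothetical helper)."""

    def __init__(self, interval_s, clock=time.monotonic):
        self.interval_s = interval_s
        self._clock = clock      # injectable so the logic can be tested without waiting
        self._start = clock()    # reference point for the millisecond timestamps
        self._last = None        # time of the last processed frame

    def ready(self):
        """Return a millisecond timestamp if interval_s has passed, else None."""
        now = self._clock()
        if self._last is not None and (now - self._last) < self.interval_s:
            return None
        self._last = now
        return int((now - self._start) * 1000)
```

In the loop, this would look like `ts = throttle.ready()` followed by `if ts is not None:` around the `recognize_async` call. The explicit `is not None` check matters because the first timestamp can be `0`.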
