## Step 1 – Import necessary packages:
To build this Hand Gesture Recognition project, we need four packages: OpenCV (cv2), NumPy, MediaPipe, and TensorFlow. Import them first.
```python
# Import the packages needed for the hand gesture recognition project
import cv2
import numpy as np
import mediapipe as mp
import tensorflow as tf
from tensorflow.keras.models import load_model
```

## Step 2 – Initialize models:
Initialize MediaPipe:

```python
# Initialize MediaPipe
mpHands = mp.solutions.hands
hands = mpHands.Hands(max_num_hands=1, min_detection_confidence=0.7)
mpDraw = mp.solutions.drawing_utils
```

The mp.solutions.hands module performs hand detection and landmark tracking. We store this module in the variable mpHands and create an instance of its Hands class, which we store in hands.

The mpHands.Hands constructor configures the hand detection model. The parameter max_num_hands sets the maximum number of hands to detect in a single frame; although MediaPipe can detect several hands at once, this project detects only one. The parameter min_detection_confidence (0.7 here) is the minimum confidence score, between 0 and 1, that a hand detection must reach to count as successful.

The mp.solutions.drawing_utils module, stored in mpDraw, provides helpers that draw the detected key points and the connections between them on the image, so we don't have to draw them manually.

```python
# Load the pre-trained gesture recognition model
model_path = 'mp_hand_gesture'
model = load_model(model_path)

# Load gesture class names from a file
def load_class_names(file_path):
    """Read and return a list of class names from a file."""
    with open(file_path, 'r') as file:
        return file.read().splitlines()

class_names = load_class_names('gesture.names')
print(class_names)
```

We use Keras's load_model function to load the pre-trained gesture recognition model from the mp_hand_gesture path. The gesture.names file lists the names of the gesture classes, one per line, in the same order as the model's output classes. To read them, we open the file with Python's built-in open function, read its contents with read(), and split the text into a list of names with splitlines().
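To see why splitlines() is used rather than readlines(), here is a small sketch that writes a hypothetical gesture.names file to a temporary directory and reads it back both ways. The three class names in it are placeholders chosen for illustration, not necessarily the contents of the real file.

```python
import tempfile
from pathlib import Path


def load_class_names(file_path):
    """Read and return a list of class names, one per line."""
    with open(file_path, 'r') as file:
        return file.read().splitlines()


# Hypothetical class names; the real gesture.names may differ.
sample = "okay\npeace\nthumbs up\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "gesture.names"
    path.write_text(sample)

    with open(path, 'r') as file:
        raw_lines = file.readlines()       # keeps the trailing newline on each line
    class_names = load_class_names(path)   # newline characters are stripped

print(raw_lines)     # ['okay\n', 'peace\n', 'thumbs up\n']
print(class_names)   # ['okay', 'peace', 'thumbs up']
```

Because the predicted index is later used directly as class_names[class_id], the order of the lines in the file has to match the order of the model's output classes.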


## Step 3 – Read frames from a webcam:

```python
# Initialize the webcam (0 = default camera)
cap = cv2.VideoCapture(0)
```

We create a VideoCapture object with the argument 0, the ID of the camera to open; 0 is the system's default webcam. If several webcams are connected, change this argument to the ID of the one you want to use; otherwise, leave it as is.
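If you are not sure which ID your webcam uses, one option is to try a few IDs in turn and keep the first one that opens. The helper below is a hypothetical sketch: find_camera_id and its max_id and open_camera parameters are not part of OpenCV. It relies only on cv2.VideoCapture(index), isOpened() and release(), and the open_camera parameter exists so the function can be exercised without real hardware.

```python
def find_camera_id(max_id=5, open_camera=None):
    """Return the first camera ID in range(max_id) that opens, or None.

    Hypothetical helper: open_camera defaults to cv2.VideoCapture but can be
    any callable returning an object with isOpened() and release().
    """
    if open_camera is None:
        import cv2  # imported lazily so the helper can be tested without OpenCV
        open_camera = cv2.VideoCapture

    for camera_id in range(max_id):
        cap = open_camera(camera_id)
        opened = cap.isOpened()
        cap.release()          # free the device; the caller reopens the chosen ID
        if opened:
            return camera_id
    return None


# Usage with a real webcam (requires OpenCV and a connected camera):
# import cv2
# camera_id = find_camera_id()
# cap = cv2.VideoCapture(camera_id if camera_id is not None else 0)
```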

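Inside the loop, cap.read() captures one frame from the webcam and returns a success flag (ret) together with the frame; if no frame can be read, the loop stops. cv2.flip(frame, 1) flips the frame horizontally so the view behaves like a mirror, and cv2.cvtColor() converts it from OpenCV's BGR channel order to the RGB order that MediaPipe expects. hands.process() then detects the hand landmarks, and cv2.imshow() displays the annotated frame in an OpenCV window.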
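For a NumPy image, cv2.flip(frame, 1) reverses the column order in every row, and cv2.COLOR_BGR2RGB reverses the channel order of every pixel. The sketch below shows both effects with plain NumPy slicing on a tiny synthetic 2×3 image so it runs without a webcam; it illustrates what the calls do, not how OpenCV implements them internally.

```python
import numpy as np

# A tiny synthetic BGR "frame": 2 rows (height) x 3 columns (width) x 3 channels.
frame = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)
height, width, channels = frame.shape   # NumPy/OpenCV order is (rows, cols, channels)

# Horizontal (mirror) flip, same effect as cv2.flip(frame, 1):
# the column order is reversed in every row.
mirrored = frame[:, ::-1, :]

# BGR -> RGB, same effect as cv2.cvtColor(frame, cv2.COLOR_BGR2RGB):
# the channel order is reversed for every pixel.
rgb = frame[:, :, ::-1]

print(height, width, channels)        # 2 3 3
print(frame[0, 0], mirrored[0, 0])    # [0 1 2] [6 7 8]
print(frame[0, 0], rgb[0, 0])         # [0 1 2] [2 1 0]
```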

On each iteration, cv2.waitKey(1) waits up to 1 millisecond for a key press, which also gives OpenCV time to refresh the window. The loop keeps running until you press 'q'; the webcam is then released and all OpenCV windows are closed.

```python
while True:
    # Read each frame from the webcam
    ret, frame = cap.read()
    if not ret:
        break

    # frame.shape is (height, width, channels)
    h, w, c = frame.shape

    # Flip the frame horizontally (mirror view)
    frame = cv2.flip(frame, 1)

    # MediaPipe expects RGB input, while OpenCV captures BGR
    framergb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Get hand landmark prediction
    result = hands.process(framergb)

    class_name = ''

    # Post-process the result
    if result.multi_hand_landmarks:
        for handslms in result.multi_hand_landmarks:
            # Convert normalized landmarks (0 to 1) to pixel coordinates
            landmarks = []
            for lm in handslms.landmark:
                lmx = int(lm.x * w)
                lmy = int(lm.y * h)
                landmarks.append([lmx, lmy])

            # Draw landmarks on frame
            mpDraw.draw_landmarks(frame, handslms, mpHands.HAND_CONNECTIONS)

            # Predict gesture (verbose=0 suppresses per-call progress output)
            prediction = model.predict(np.array([landmarks]), verbose=0)
            class_id = np.argmax(prediction)
            class_name = class_names[class_id]

    # Show the prediction on the frame
    cv2.putText(frame, class_name, (10, 50), cv2.FONT_HERSHEY_SIMPLEX,
                1, (0, 0, 255), 2, cv2.LINE_AA)

    # Show the frame
    cv2.imshow("Output", frame)

    # Exit when 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the webcam and destroy all active windows
cap.release()
cv2.destroyAllWindows()
```
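MediaPipe reports each landmark's x and y normalized to the range 0 to 1 by the image width and height respectively, so converting to pixels means multiplying x by the width and y by the height, keeping in mind that frame.shape is ordered (height, width, channels). The sketch below uses a stand-in Landmark dataclass (hypothetical, in place of MediaPipe's landmark objects, which also expose .x and .y) and made-up coordinates to show the conversion and the shape of the batch that is passed to model.predict().

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Landmark:
    """Hypothetical stand-in for a MediaPipe landmark: x, y normalized to [0, 1]."""
    x: float
    y: float


def to_pixels(landmarks, frame_shape):
    """Convert normalized landmarks to integer pixel coordinates [x, y]."""
    height, width = frame_shape[:2]      # shape is (rows, cols, channels)
    return [[int(lm.x * width), int(lm.y * height)] for lm in landmarks]


# Made-up values: 21 identical landmarks on a 640x480 frame.
frame_shape = (480, 640, 3)
hand = [Landmark(0.25, 0.75)] * 21

points = to_pixels(hand, frame_shape)
batch = np.array([points])               # one hand -> batch of size 1

print(points[0])     # [160, 360]
print(batch.shape)   # (1, 21, 2)
```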

## Step 5 – Recognize hand gestures:

```python
# Predict gesture in Hand Gesture Recognition project
prediction = model.predict([landmarks])
print(prediction)
class_id = np.argmax(prediction)
class_name = class_names[class_id]

# Show the prediction on the frame
cv2.putText(frame, class_name, (10, 50), cv2.FONT_HERSHEY_SIMPLEX,
            1, (0, 0, 255), 2, cv2.LINE_AA)
```

The model.predict() function takes a batch of landmark lists (here a batch of one hand with 21 [x, y] points) and returns an array of scores, one for each of the 10 gesture classes. The output array looks like this:

```
[[2.0691623e-18 1.9585415e-27 9.9990010e-01 9.7559416e-05
  1.6617223e-06 1.0814080e-18 1.1070732e-27 4.4744065e-16
  6.6466129e-07 4.9615162e-21]]
```
The np.argmax() function then returns the index of the highest value in this array, which corresponds to the predicted gesture class. We use this index to retrieve the class name from the class_names list.
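Running np.argmax() on the sample output above makes the lookup concrete. The sketch copies those ten values into an array of shape (1, 10); argmax without an axis works on the flattened array and returns index 2, whose score is about 0.9999. The final lookup is left as a comment because the actual name depends on your gesture.names file.

```python
import numpy as np

# The sample output from model.predict() shown above: shape (1, 10).
prediction = np.array([[2.0691623e-18, 1.9585415e-27, 9.9990010e-01, 9.7559416e-05,
                        1.6617223e-06, 1.0814080e-18, 1.1070732e-27, 4.4744065e-16,
                        6.6466129e-07, 4.9615162e-21]])

class_id = np.argmax(prediction)        # index into the flattened array
confidence = prediction[0, class_id]    # score of the winning class

print(prediction.shape)                 # (1, 10)
print(class_id)                         # 2
print(round(float(confidence), 4))      # 0.9999
# class_name = class_names[class_id]    # the name depends on gesture.names
```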

Finally, cv2.putText() draws the predicted class name onto the frame at position (10, 50) in red (BGR (0, 0, 255)), so the prediction is visible in the displayed video feed.