# **<center><font style="color:rgb(100,109,254)">Module 9: Full-Body Sign Language Recognition</font> </center>**

<center>
    <img src='https://drive.google.com/uc?export=download&id=1kqMdoDJrt-YxUDPB0YLHcp9f3XAVTRsi' width=800> 
    <br/>
    <a href='https://www.signall.us'>Image Credits</a>
</center>
    

## **<font style="color:rgb(134,19,348)"> Module Outline </font>**

The module can be split into the following parts:

- *Lesson 1: Introduction to Long Short-Term Memory (LSTM) Networks Theory.*

- ***Lesson 2:* Collect Sign Language Recognition Dataset.** *(This Tutorial)*

- *Lesson 3:  Train a Sign Language Recognition LSTM Network.*


**Please note:** these Jupyter Notebooks are not for sharing. Please read the copyright message in the Code License Agreement section, which is in the last cell of this notebook.
-Taha Anwar

Alright, let's get started.

## **<font style="color:rgb(134,19,348)"> Import the Libraries</font>**

First, we will import the required libraries.

In [1]:
import os
import cv2
import mediapipe as mp
import numpy as np
from previous_lesson import detectPoseLandmarks

## **<font style="color:rgb(134,19,348)">Initialize the Pose Detection Model</font>**

Next, we will initialize the **`mp.solutions.pose`** class and then set up the **`mp.solutions.pose.Pose()`** function with arguments suited to video input (tracking enabled through `static_image_mode=False`).

In [2]:
# Initialize the mediapipe pose class.
mp_pose = mp.solutions.pose

# Set up the pose landmarks function for videos.
pose_videos = mp_pose.Pose(static_image_mode=False, model_complexity=1, smooth_landmarks=True, 
                           enable_segmentation=True, smooth_segmentation=True, 
                           min_detection_confidence=0.5, min_tracking_confidence=0.8)

## **<font style="color:rgb(134,19,348)">Create a Function to Extract Pose Landmarks</font>**


Now we will create a function **`extractPoseKeypoints()`** that uses the function **`detectPoseLandmarks()`** (created in a previous module) to extract the pose landmarks. Remember that **`detectPoseLandmarks()`** converts the x and y coordinates of the pose landmarks to the original image scale (pixels), so we now have to normalize them back to the range [0, 1] by dividing x by the image width and y by the image height, similar to what we did in the previous module for the face landmark coordinates.

In [3]:
def extractPoseKeypoints(image, pose):
    '''
    This function will extract the Pose Landmarks (after normalization) of a person in an image.
    Args:
        image: The input image of the person whose pose landmarks need to be extracted.
        pose:  The Mediapipe's Pose landmarks detection function required to perform the landmarks detection.
    Returns:
        image:               The input image with the detected pose landmarks drawn on it.
        extracted_landmarks: A flattened array containing the extracted normalized pose landmarks (x and y coordinates).
                             The array is empty if no pose landmarks are found.
    '''
    
    # Retrieve the height and width of the image.
    image_height, image_width, _ = image.shape
    
    # Perform the Pose landmarks detection on the image.
    image, pose_landmarks = detectPoseLandmarks(image, pose, draw=True, display=False)
    
    # Initialize a list to store the extracted landmarks.
    extracted_landmarks = []
    
    # Check if pose landmarks are found. 
    if len(pose_landmarks) > 0:
            
        # Iterate over the found pose landmarks. 
        for landmark in pose_landmarks:
            
            # Normalize the landmarks and append them into the list.
            extracted_landmarks.append((landmark[0]/image_width, landmark[1]/image_height))
        
    # Convert the list into an array and flatten the array.
    extracted_landmarks = np.array(extracted_landmarks).flatten()
    
    # Return the image and the extracted normalized pose landmarks.
    return image, extracted_landmarks
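
The normalization and flattening step can be checked on its own, without a webcam or MediaPipe. The sketch below is only an illustration: `normalize_and_flatten()` is a hypothetical helper (not part of this module), and it assumes each landmark is a tuple whose first two values are the x and y pixel coordinates, as used in **`extractPoseKeypoints()`** above. MediaPipe Pose returns 33 landmarks, so a real frame would give 33 x 2 = 66 values.

```python
import numpy as np

def normalize_and_flatten(pixel_landmarks, image_width, image_height):
    """Scale (x, y) pixel coordinates to [0, 1] and flatten them into a 1-D array."""
    # Keep only x and y, even if a landmark carries extra values such as z.
    points = np.array([(lm[0], lm[1]) for lm in pixel_landmarks], dtype=np.float32)

    # Nothing to normalize if no landmarks were found.
    if points.size == 0:
        return points.flatten()

    # Divide the x column by the width and the y column by the height.
    points /= np.array([image_width, image_height], dtype=np.float32)
    return points.flatten()

# Hypothetical landmarks in pixel coordinates for a 1280x960 frame.
sample = [(640, 480, 0.1), (1280, 960, -0.2), (0, 0, 0.0)]
features = normalize_and_flatten(sample, image_width=1280, image_height=960)

print(features)        # Ordered as x0, y0, x1, y1, x2, y2.
print(features.shape)  # (6,)
```

Because the values are interleaved as x0, y0, x1, y1, and so on, calling `features.reshape(-1, 2)` recovers one (x, y) row per landmark.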

Now we will initialize the parameters: the signs we want our model to recognize, the total number of sequences (videos) to record, and the length (number of frames) of each sequence from which we want to extract the landmarks.

In [4]:
# Specify the classes of which you want to collect data.
# Feel free to choose any set of classes.
classes_list = ["Hello", "bye", "Thankyou"]

# Specify the number of frames of the videos.
sequence_length = 30 

# Specify the path where you want to store the dataset.
DATASET_DIR = 'dataset'

# Create the directory if it does not already exist.
os.makedirs(DATASET_DIR, exist_ok=True)

# Specify the total number of videos for which you want to extract the landmarks.
# This must be a multiple of the total number of classes.
total_videos = 90

# Raise an AssertionError exception if the total number of videos is not a multiple of the number of classes.
# This makes sure that landmarks are extracted for an equal number of videos per class.
assert total_videos%len(classes_list) == 0, f'{total_videos} must be a multiple of {len(classes_list)}'

# Display the success message.
print('Initialization Completed.')

Initialization Completed.
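
As a quick sanity check, the arithmetic implied by these parameters can be worked out before recording anything. This is a small sketch that repeats the values from the cell above; `features_per_frame` is a name introduced here for illustration, and it assumes all 33 MediaPipe Pose landmarks are kept with x and y only.

```python
classes_list = ["Hello", "bye", "Thankyou"]
sequence_length = 30
total_videos = 90

# MediaPipe Pose returns 33 landmarks; keeping x and y gives 66 values per frame.
features_per_frame = 33 * 2

# Each class gets an equal share of the videos.
videos_per_class = total_videos // len(classes_list)

# Total number of frames that have to be recorded across all classes.
total_frames = total_videos * sequence_length

print(f'Videos per class: {videos_per_class}')
print(f'Shape of one saved video: ({sequence_length}, {features_per_frame})')
print(f'Frames to record in total: {total_frames}')
```

With these settings each sign is recorded 30 times, and every saved video is a 30 x 66 array.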


## **<font style="color:rgb(134,19,348)">Data Collection</font>**

Now we will start collecting the dataset. We will use the function **`extractPoseKeypoints()`** to get the landmarks for the specified number of sequences of each sign (the signs we want our recognizer to predict) and store them on disk as one `.npy` file per video.

In [5]:
# Initialize the VideoCapture object to read from the webcam.
camera_video = cv2.VideoCapture(0)

# Request a 1280x960 frame size from the webcam.
camera_video.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
camera_video.set(cv2.CAP_PROP_FRAME_HEIGHT, 960)

# Create named window for resizing purposes.
cv2.namedWindow('Data Collection', cv2.WINDOW_NORMAL)

# Iterate over the specified classes.
for sign in classes_list:
    
    # Iterate over the video indexes of the current class (sign).
    for video_index in range(total_videos//len(classes_list)):
        
        # Initialize a list to store the video landmarks.
        video_landmarks = []
        
        # Initialize a variable to store the frame counter.
        frame_counter = 0
                
        # Iterate through the video frames.
        while frame_counter < sequence_length:
            
            # Read a frame.
            ok, frame = camera_video.read()

            # Check if frame is not read properly.
            if not ok:
                
                # Continue to the next iteration to read the next frame.
                continue

            # Flip the frame horizontally for natural (selfie-view) visualization.
            frame = cv2.flip(frame, 1)

            # Get the height and width of the frame of the webcam video.
            frame_height, frame_width, _ = frame.shape

            # Extract the required pose keypoints of the person in the frame.
            frame, extracted_keypoints = extractPoseKeypoints(frame, pose_videos)
            
            # Check if the keypoints were not extracted successfully.
            if len(extracted_keypoints) == 0:
            
                # Continue to the next iteration to read the next frame.
                continue
            
            # Write the sign that the user has to make, the video index, and the number of frames left.
            cv2.putText(frame, f'{sign.upper()}, Sign Video # {video_index}, Frames Left: {sequence_length-frame_counter-1}',
                        (10, frame_height-30), cv2.FONT_HERSHEY_SIMPLEX, 
                        1, (0, 255, 0), 2, cv2.LINE_AA)
            
            # Check if it is the first frame of the first video of the current sign.
            if video_index == 0 and frame_counter == 0: 
                
                # Write the instructions to start collecting data on the frame.
                cv2.putText(frame, f'Press any key to Start Collecting {sign.upper()} Sign Data.', (10, 30), 
                           cv2.FONT_HERSHEY_SIMPLEX, 1, (0,0, 255), 4, cv2.LINE_AA)
                
                # Display the frame.
                cv2.imshow('Data Collection', frame)
                
                # Wait until a key is pressed.
                cv2.waitKey(0)
            
            # Display the frame.
            cv2.imshow('Data Collection', frame)
            
            # Wait for 1ms so that the window can refresh.
            cv2.waitKey(1)
            
            # Append the extracted landmarks into the list.
            video_landmarks.append(extracted_keypoints)
            
            # Increment the frame counter.
            frame_counter+=1
        
        # Get the path to store the video landmarks.
        video_landmarks_dir = os.path.join(DATASET_DIR, sign)
        
        # Create the directory if it does not already exist.
        os.makedirs(video_landmarks_dir, exist_ok=True)
        
        # Save the extracted landmarks inside a .npy file.
        np.save(os.path.join(video_landmarks_dir, str(video_index)), video_landmarks)
        
        # Write the instructions to start collecting data for the next video. 
        cv2.putText(frame, 'Press any key to Start Next Video.', (10, 30), 
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0,0, 255), 4, cv2.LINE_AA)
        
        # Display the frame.
        cv2.imshow('Data Collection', frame)
        
        # Optional pause between videos (disabled here, so recording continues immediately).
        # Uncomment the next line to wait until a key is pressed.
        # cv2.waitKey(0) 
                
# Release the VideoCapture Object and close the windows.
camera_video.release()
cv2.destroyAllWindows()



# Additional comments:
#           - This program creates a sign language dataset.
#           - It uses the MediaPipe Pose solution to record the sequence of
#             landmark movements for each sign, which is later used to recognize the sign.

Perfect! The dataset has been collected and stored successfully on disk.
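
Before moving on, it can be useful to confirm that the saved files load back with the expected shapes. The sketch below is one possible way to do that, not the loader used in the next lesson: `load_dataset()` is a hypothetical helper, and it assumes the layout produced above (one folder per sign, one `.npy` file per video). To keep it runnable without a webcam, it first writes a tiny synthetic dataset with random values into a temporary directory.

```python
import os
import tempfile
import numpy as np

def load_dataset(dataset_dir, classes_list):
    """Load every saved .npy video into a feature array and a label array."""
    features, labels = [], []
    for class_index, sign in enumerate(classes_list):
        sign_dir = os.path.join(dataset_dir, sign)
        for file_name in sorted(os.listdir(sign_dir)):
            if file_name.endswith('.npy'):
                features.append(np.load(os.path.join(sign_dir, file_name)))
                # The label is the index of the sign in classes_list.
                labels.append(class_index)
    return np.array(features), np.array(labels)

# Build a small synthetic dataset with the same layout (random values, not real landmarks).
demo_classes = ["Hello", "bye"]
with tempfile.TemporaryDirectory() as demo_dir:
    for sign in demo_classes:
        os.mkdir(os.path.join(demo_dir, sign))
        for video_index in range(3):
            fake_video = np.random.rand(30, 66)
            # np.save appends the .npy extension to the file name.
            np.save(os.path.join(demo_dir, sign, str(video_index)), fake_video)

    X, y = load_dataset(demo_dir, demo_classes)

print(X.shape)  # (number of videos, frames per video, values per frame)
print(y)        # One class index per video.
```

On the real dataset, calling `load_dataset(DATASET_DIR, classes_list)` should give an array of shape `(total_videos, sequence_length, 66)` if every frame contains all 33 landmarks.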

### **<font style="color:rgb(255,140,0)"> Code License Agreement </font>**
```
Copyright (c) 2022 Bleedai.com

Feel free to use this code for your own projects commercial or noncommercial, these projects can be Research-based, just for fun, for-profit, or even Education with the exception that you’re not going to use it for developing a course, book, guide, or any other educational products.

Under *NO CONDITION OR CIRCUMSTANCE* you may use this code for your own paid educational or self-promotional ventures without written consent from Taha Anwar (BleedAI.com).

```



