<a href="https://colab.research.google.com/github/bigboyfreezy/sign-language-detection/blob/main/SignLanguageDetector.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SIGN LANGUAGE DETECTION WITH KERAS

---


**Detecting Sign Language Letters in Real Time Using MediaPipe and Keras**


**DATA PREPARATION FOR TRAINING**
1. **Matplotlib** - a comprehensive library for creating static, animated, and interactive visualizations in Python.
2. **Keras** - a high-level neural networks API running on top of TensorFlow.
3. **Sequential** - the Keras class for building a neural network as a linear stack of layers, one layer at a time.


In [1]:
import matplotlib.pyplot as plt
import seaborn as sns
import keras
from keras.models import Sequential
# Layers commonly used in convolutional neural networks (CNNs) for image classification.
from keras.layers import Dense, Conv2D, MaxPool2D, Flatten, Dropout, BatchNormalization
# Generates augmented images by applying random transformations to the input data during training.
from keras.preprocessing.image import ImageDataGenerator
# Splits a dataset into training and testing sets.
from sklearn.model_selection import train_test_split
# Classification report and confusion matrix for evaluating a classifier.
from sklearn.metrics import classification_report, confusion_matrix
# Callback that lowers the learning rate when a monitored metric stops improving.
from keras.callbacks import ReduceLROnPlateau
import pandas as pd

**Pre Processing Key Points**
1. Normalization - In digital images, pixel values are typically represented as integers in the range [0, 255], where 0 corresponds to black and 255 corresponds to white. Normalizing by dividing by 255 scales these values to be in the range [0, 1]. This is a common preprocessing step for image data when working with neural networks. Normalization helps in stabilizing and accelerating the training process, making it easier for the optimization algorithm to converge.

2. Reshaping - Convolutional neural networks (CNNs) expect input of a specific shape. For image data this is usually a 4D tensor with dimensions (number of samples, height, width, channels).

- The -1 in the reshape call tells NumPy to infer the size of the first dimension (the number of samples) from the total number of values and the remaining dimensions.
- 28, 28 is the height and width of each image in pixels.
- 1 is the number of channels. The images here are grayscale, so there is only one channel. For color images with RGB channels, this value would be 3.
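
As a small illustration of the two steps above, the following sketch applies the same scaling and reshape to a made-up array of three flattened images. The array `flat_pixels` is an assumption for demonstration only and is not part of the dataset.

```python
import numpy as np

# Hypothetical stand-in for the CSV pixel columns: 3 images, 784 values each
flat_pixels = np.random.default_rng(0).integers(0, 256, size=(3, 784))

# Scale integer pixel values from [0, 255] to floats in [0, 1]
scaled = flat_pixels / 255

# -1 lets NumPy infer the number of samples from the remaining dimensions
images = scaled.reshape(-1, 28, 28, 1)

print(images.shape)                               # (3, 28, 28, 1)
print(images.min() >= 0.0, images.max() <= 1.0)   # True True
```

The reshape only succeeds when the total number of values is a multiple of 28 * 28 * 1 = 784, which is why each CSV row must hold exactly 784 pixel columns once the label is removed.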

In [5]:
# Read the training and testing datasets from CSV files
train = pd.read_csv("/content/drive/MyDrive/sign_mnist_train/sign_mnist_train.csv")
test = pd.read_csv("/content/drive/MyDrive/sign_mnist_test/sign_mnist_test.csv")

# Extract the labels (target variable) into y_train and y_test
y_train = train['label']
y_test = test['label']
# Remove the 'label' column so only the pixel columns remain
del train['label']
del test['label']

# One-hot encode the labels. Fit on the training labels only and reuse the
# same mapping for the test labels so both share the same column order.
from sklearn.preprocessing import LabelBinarizer
label_binarizer = LabelBinarizer()
y_train = label_binarizer.fit_transform(y_train)
y_test = label_binarizer.transform(y_test)

x_train = train.values
x_test = test.values

x_train = x_train / 255
x_test = x_test / 255

x_train = x_train.reshape(-1,28,28,1)
x_test = x_test.reshape(-1,28,28,1)
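
To see what the one-hot encoding of the labels produces, here is a minimal NumPy sketch of the same idea. It is not scikit-learn's implementation; the helper `one_hot` and the toy label arrays are assumptions for illustration. The class list is taken from the training labels once and reused for the test labels, so column k means the same class in both matrices.

```python
import numpy as np

def one_hot(labels, classes):
    """Return a (len(labels), len(classes)) 0/1 matrix; `classes` must be sorted."""
    labels = np.asarray(labels)
    # Compare every label against every class; True where they match
    return (labels[:, None] == classes[None, :]).astype(int)

# Toy labels with gaps, so label values and column indices differ
toy_train = [0, 3, 10, 3, 24]
toy_test = [10, 0]

classes = np.unique(toy_train)           # sorted unique labels: [0, 3, 10, 24]
y_train_toy = one_hot(toy_train, classes)
y_test_toy = one_hot(toy_test, classes)  # same column order as the training matrix

print(classes)
print(y_test_toy)
```

Because the label values have gaps, the column index of a class is its position in the sorted class list, not the raw label value. The same holds for the 24-column output of the model.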







**Breakdown of ImageDataGenerator**

---


The next cell configures an ImageDataGenerator for data augmentation and defines a learning rate reduction callback. Let's break it down:

- datagen = ImageDataGenerator(...): Creates an instance of ImageDataGenerator with various configuration options for data augmentation. This generator will be used to generate augmented images during training.

- featurewise_center, samplewise_center, featurewise_std_normalization, and samplewise_std_normalization: These options control whether to center the mean and normalize the standard deviation of the input data. In this case, they are set to False.

- zca_whitening: ZCA whitening is a form of data preprocessing that can enhance the effectiveness of the model by decorrelating the features. Here, it is set to False.

- rotation_range: Randomly rotates images by up to the given number of degrees in either direction. It is set to 10 degrees.

- zoom_range: Randomly zooms the image in or out. It is set to 0.1, so the zoom factor is drawn from the range 0.9 to 1.1.

- width_shift_range and height_shift_range: Randomly shifts images horizontally and vertically by a fraction of the total width and height, respectively. Both are set to 0.1, allowing for a maximum shift of 10%.

- horizontal_flip and vertical_flip: Randomly flip images horizontally and vertically. Both are set to False, meaning no flipping is applied.

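- datagen.fit(x_train): Computes dataset statistics (such as the mean and standard deviation) from the training data (x_train). These are only needed when featurewise_center, featurewise_std_normalization, or zca_whitening is enabled. Since all three are False here, the call is not required for the rotations, zooms, and shifts configured above.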
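
To make the shift options concrete, the sketch below moves a 28x28 image by a random offset of at most 10% of its size. This is a simplified NumPy illustration, not Keras's implementation: the helper `random_shift` is hypothetical, and it fills the vacated pixels with zeros, whereas ImageDataGenerator fills them according to its fill_mode setting.

```python
import numpy as np

def random_shift(image, max_fraction=0.1, rng=None):
    """Shift a 2D image by a random offset of up to max_fraction of its size."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    max_dy, max_dx = int(h * max_fraction), int(w * max_fraction)
    dy = int(rng.integers(-max_dy, max_dy + 1))  # vertical offset in pixels
    dx = int(rng.integers(-max_dx, max_dx + 1))  # horizontal offset in pixels

    shifted = np.zeros_like(image)
    # Copy the part of the image that stays inside the frame after the shift
    src_rows = slice(max(0, -dy), min(h, h - dy))
    dst_rows = slice(max(0, dy), min(h, h + dy))
    src_cols = slice(max(0, -dx), min(w, w - dx))
    dst_cols = slice(max(0, dx), min(w, w + dx))
    shifted[dst_rows, dst_cols] = image[src_rows, src_cols]
    return shifted

image = np.ones((28, 28))
augmented = random_shift(image, max_fraction=0.1, rng=np.random.default_rng(0))
print(augmented.shape)   # (28, 28)
```

For a 28-pixel image, a fraction of 0.1 allows offsets of at most 2 whole pixels in this sketch, so the hand stays almost entirely inside the frame.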

**learning_rate_reduction**


---
Creates an instance of ReduceLROnPlateau callback.


- monitor='val_accuracy': Monitors the validation accuracy during training.

- patience=2: Number of epochs with no improvement after which learning rate will be reduced.

- verbose=1: Prints a message when the learning rate is reduced.

- factor=0.5: Factor by which the learning rate will be reduced. In this case, it is reduced by half.

- min_lr=0.00001: Lower bound on the learning rate.
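
The interaction of patience, factor, and min_lr can be sketched with a small pure-Python simulation. This is a simplified model of the callback's behavior, assuming a starting learning rate of 0.001 and ignoring the callback's cooldown and min_delta options. The function name `simulate_reduce_lr` and the accuracy values are made up for illustration.

```python
def simulate_reduce_lr(val_accuracies, initial_lr=0.001, patience=2,
                       factor=0.5, min_lr=0.00001):
    """Return the learning rate in effect after each epoch's check."""
    lr = initial_lr
    best = float("-inf")
    wait = 0          # epochs since the last improvement
    history = []
    for acc in val_accuracies:
        if acc > best:
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Halve the rate, but never go below the lower bound
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

# Hypothetical validation accuracies for seven epochs
accs = [0.80, 0.85, 0.84, 0.85, 0.86, 0.86, 0.86]
print(simulate_reduce_lr(accs))
# [0.001, 0.001, 0.001, 0.0005, 0.0005, 0.0005, 0.00025]
```

The rate drops after the fourth and seventh epochs, each time following two consecutive epochs without a new best validation accuracy.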

In [6]:
datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=10,  # randomly rotate images in the range (degrees, 0 to 180)
        zoom_range = 0.1, # Randomly zoom image
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=False,  # randomly flip images
        vertical_flip=False)  # randomly flip images

datagen.fit(x_train)

learning_rate_reduction = ReduceLROnPlateau(monitor='val_accuracy', patience = 2, verbose=1,factor=0.5, min_lr=0.00001)

**Layer Breakdown**

---



***Sequential***
- model = Sequential(): Initializes a sequential model.

***Convolutional Layer 1***:
- 75 filters with a kernel size of (3,3).
- strides=1: The step size of the convolution operation.
- padding='same': Pads the input such that the output has the same height and width.
- activation='relu': Applies the Rectified Linear Unit (ReLU) activation function.
- input_shape=(28,28,1): Input shape of each image (height, width, channels), assuming grayscale images.

***Batch Normalization:***
- Normalizes the activations of the previous layer.

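***MaxPooling Layer 1:***

- Performs max pooling with a pool size of (2,2).
- strides=2: The step size of the pooling operation, which halves the height and width.
- padding='same': Pads the input so that edge pixels are not dropped. With a stride of 2 the output size is the input size divided by 2, rounded up (28 to 14, 14 to 7, 7 to 4).

***Convolutional Blocks 2 and 3:***
- The same Conv2D, BatchNormalization, MaxPooling pattern is repeated with 50 and then 25 filters. The second block also applies Dropout(0.2) right after the convolution.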

***Flatten Layer:***
- Flattens the input to a one-dimensional array.

***Fully Connected (Dense) Layer:***
- Fully connected layer with 512 units and ReLU activation.

***Dropout Layer:***
- Dropout(0.3): Applies dropout with a dropout rate of 0.3 to prevent overfitting.

***Output Layer:***
- Fully connected layer with 24 units, one per letter class (the letter list used for prediction has no J or Z), and softmax activation for multi-class classification.

In [7]:
model = Sequential()
model.add(Conv2D(75 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu' , input_shape = (28,28,1)))
model.add(BatchNormalization())
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))
model.add(Conv2D(50 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))
model.add(Conv2D(25 , (3,3) , strides = 1 , padding = 'same' , activation = 'relu'))
model.add(BatchNormalization())
model.add(MaxPool2D((2,2) , strides = 2 , padding = 'same'))
model.add(Flatten())
model.add(Dense(units = 512 , activation = 'relu'))
model.add(Dropout(0.3))
model.add(Dense(units = 24 , activation = 'softmax'))
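
The layer stack above can be traced by hand. The sketch below computes the spatial size after each convolution and pooling step and the parameter count of each Conv2D layer, using the standard formulas for padding='same' and for convolution weights. The helper names are hypothetical.

```python
import math

def same_padding_output(size, stride):
    """Spatial output size for padding='same': ceil(size / stride)."""
    return math.ceil(size / stride)

def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters

size, channels = 28, 1
for filters in (75, 50, 25):
    params = conv2d_params(3, channels, filters)
    size = same_padding_output(size, stride=1)   # convolution keeps the size
    size = same_padding_output(size, stride=2)   # pooling halves it, rounding up
    channels = filters
    print(f"{filters} filters: {params} conv params, output {size}x{size}x{channels}")

flattened = size * size * channels
print("Flatten output:", flattened)
# 75 filters: 750 conv params, output 14x14x75
# 50 filters: 33800 conv params, output 7x7x50
# 25 filters: 11275 conv params, output 4x4x25
# Flatten output: 400
```

The first two parameter counts (750 and 33800) agree with the values reported by model.summary() for conv2d and conv2d_1.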

**Breakdown**

---

***Model.compile:***
- optimizer='adam': Adam optimization algorithm is used as the optimizer. Adam is an adaptive learning rate optimization algorithm that is widely used in deep learning.

- loss='categorical_crossentropy': Categorical crossentropy is the loss function used for multi-class classification problems. Since you used one-hot encoded labels, categorical crossentropy is an appropriate choice.

- metrics=['accuracy']: The model's performance will be evaluated based on accuracy during training.

***Training The Model***
- datagen.flow(x_train, y_train, batch_size=128): Generates augmented batches of training data using the previously defined ImageDataGenerator.

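- epochs=15: Number of passes over the training data.

- validation_data=(x_test, y_test): Data evaluated at the end of each epoch to compute the validation loss and accuracy. Here the test set doubles as the validation set.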

- callbacks=[learning_rate_reduction]: Uses the learning rate reduction callback during training.
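
As a worked example of the categorical crossentropy loss named above, the sketch below evaluates the formula -sum(y_true * log(y_pred)) for a single sample with three classes. The function and the probability values are made up for illustration; Keras computes the same quantity on batches of tensors.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy for one sample: -sum(y_true * log(y_pred))."""
    # eps guards against log(0) when a predicted probability is exactly zero
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 1, 0]            # one-hot label: the second class is correct
confident = [0.1, 0.8, 0.1]   # most probability on the correct class
unsure = [0.4, 0.3, 0.3]      # little probability on the correct class

print(categorical_crossentropy(y_true, confident))  # about 0.223
print(categorical_crossentropy(y_true, unsure))     # about 1.204
```

With a one-hot label only the term for the correct class survives, so the loss is simply the negative log of the probability the model assigned to the correct letter.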

In [9]:

model.compile(optimizer = 'adam' , loss = 'categorical_crossentropy' , metrics = ['accuracy'])
model.summary()

history = model.fit(datagen.flow(x_train,y_train, batch_size = 128) ,epochs = 15 , validation_data = (x_test, y_test) , callbacks = [learning_rate_reduction])

model.save('signlangmod.h5')

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 28, 28, 75)        750       
                                                                 
 batch_normalization (Batch  (None, 28, 28, 75)        300       
 Normalization)                                                  
                                                                 
 max_pooling2d (MaxPooling2  (None, 14, 14, 75)        0         
 D)                                                              
                                                                 
 conv2d_1 (Conv2D)           (None, 14, 14, 50)        33800     
                                                                 
 dropout (Dropout)           (None, 14, 14, 50)        0         
                                                                 
 batch_normalization_1 (Bat  (None, 14, 14, 50)        2



# Part Two: Implementing the Model

---



1. **TensorFlow** - installed first; a library commonly used for machine learning and deep learning tasks.
2. **OpenCV** - an open-source library for image processing and computer vision tasks such as face detection, object tracking, and landmark detection.
3. **MediaPipe** - a library by Google that simplifies building applications that detect and track keypoints on the human body.
4. **os** - the Python module that exposes operating-system-dependent functionality, such as reading or writing to the file system.
5. **NumPy** - provides support for large, multi-dimensional arrays and matrices, along with mathematical functions.
6. **Pandas** - a powerful data manipulation and analysis library.

In [18]:
!python -m pip install mediapipe



In [None]:
!pip install tensorflow opencv-python mediapipe

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
import cv2
import mediapipe as mp

from keras.models import load_model # loads a saved Keras model from disk
import numpy as np
import time # used to pause after each prediction
import pandas as pd

After the next cell runs, we have:
- The trained model (model), loaded from the saved .h5 file.
- An instance of the Hands class (hands) for hand tracking.
- The drawing_utils module (mp_drawing) for drawing landmarks and connections on images.
- A video capture object (cap) that can be used to capture frames from the camera.

In [None]:
model = load_model('/content/drive/MyDrive/signlangmod.h5')

mphands = mp.solutions.hands
hands = mphands.Hands()
mp_drawing = mp.solutions.drawing_utils
cap = cv2.VideoCapture(0)

_, frame = cap.read() #Reads the first frame from the camera.

h, w, c = frame.shape # Retrieves the height, width, and number of channels of the captured frame.

img_counter = 0 #A counter to keep track of the captured frames.
analysisframe = '' # Placeholder; replaced by the captured frame when SPACE is pressed.
letterpred = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y'] #A list of letters representing the classes for sign language recognition.
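
The letter list above maps each output index of the model to a letter. A minimal sketch of reading the three most likely letters from a probability vector follows; the helper `top_k_letters` and the probability values are made up for illustration and do not come from the trained model.

```python
import numpy as np

letters = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M',
           'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y']

def top_k_letters(probabilities, k=3):
    """Return [(letter, probability), ...] for the k largest probabilities."""
    order = np.argsort(probabilities)[::-1][:k]   # indices, highest first
    return [(letters[i], float(probabilities[i])) for i in order]

# Made-up softmax output with three clear peaks
fake_probs = np.full(24, 0.01)
fake_probs[9] = 0.50    # index 9 is 'K', because 'J' is not in the list
fake_probs[0] = 0.20    # 'A'
fake_probs[23] = 0.09   # 'Y'

print(top_k_letters(fake_probs))   # [('K', 0.5), ('A', 0.2), ('Y', 0.09)]
```

Sorting indices with argsort, rather than comparing probability values for equality, keeps the results in rank order and avoids ambiguity when two letters receive the same probability.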

In [None]:
while True:
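    ok, frame = cap.read()
    if not ok:
        # The camera returned no frame; stop instead of crashing below
        print("Failed to grab frame, closing...")
        break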

    k = cv2.waitKey(1)
    if k%256 == 27:
        # ESC pressed
        print("Escape hit, closing...")
        break
    elif k%256 == 32:
        # SPACE pressed: classify the hand in the current frame
        analysisframe = frame.copy()
        cv2.imshow("Frame", analysisframe)
        framergbanalysis = cv2.cvtColor(analysisframe, cv2.COLOR_BGR2RGB)
        resultanalysis = hands.process(framergbanalysis)
        hand_landmarksanalysis = resultanalysis.multi_hand_landmarks
        if not hand_landmarksanalysis:
            # Without a detected hand there is no region to crop
            print("No hand detected, skipping prediction.")
            continue

        # Bounding box of the last detected hand (as in the original loop),
        # padded by 20 px and clamped to the frame so the crop is never empty.
        handLMsanalysis = hand_landmarksanalysis[-1]
        xs = [int(lm.x * w) for lm in handLMsanalysis.landmark]
        ys = [int(lm.y * h) for lm in handLMsanalysis.landmark]
        x_min, x_max = max(min(xs) - 20, 0), min(max(xs) + 20, w)
        y_min, y_max = max(min(ys) - 20, 0), min(max(ys) + 20, h)
        if x_max <= x_min or y_max <= y_min:
            print("Hand is outside the frame, skipping prediction.")
            continue

        # Crop the hand, convert to grayscale, and resize to the model's input size
        analysisframe = cv2.cvtColor(analysisframe, cv2.COLOR_BGR2GRAY)
        analysisframe = analysisframe[y_min:y_max, x_min:x_max]
        analysisframe = cv2.resize(analysisframe,(28,28))

        # Same preprocessing as in training: scale to [0, 1], shape (1, 28, 28, 1)
        pixeldata = analysisframe.astype("float32") / 255
        pixeldata = pixeldata.reshape(-1,28,28,1)
        prediction = model.predict(pixeldata)
        predarray = np.array(prediction[0])

        # Indices of the three highest probabilities, best first
        top3 = np.argsort(predarray)[::-1][:3]
        for rank, idx in enumerate(top3, start=1):
            print("Predicted Character", rank, ":", letterpred[idx])
            print("Confidence", rank, ":", 100*predarray[idx])
        time.sleep(5)

    framergb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    result = hands.process(framergb)
    hand_landmarks = result.multi_hand_landmarks
    if hand_landmarks:
        for handLMs in hand_landmarks:
            x_max = 0
            y_max = 0
            x_min = w
            y_min = h
            for lm in handLMs.landmark:
                x, y = int(lm.x * w), int(lm.y * h)
                if x > x_max:
                    x_max = x
                if x < x_min:
                    x_min = x
                if y > y_max:
                    y_max = y
                if y < y_min:
                    y_min = y
            y_min -= 20
            y_max += 20
            x_min -= 20
            x_max += 20
            cv2.rectangle(frame, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)
    cv2.imshow("Frame", frame)
cap.release()
cv2.destroyAllWindows()
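
Both the prediction branch and the drawing code in the loop above derive a padded bounding box from the hand landmarks. A minimal sketch of that computation as a standalone helper is shown below. The function `landmark_bbox` is hypothetical and takes plain (x, y) tuples of normalized coordinates instead of MediaPipe landmark objects, so it can be run without a camera. Unlike the drawing code, it clamps the box to the frame, which matters when the box is used to slice an image.

```python
def landmark_bbox(points, frame_w, frame_h, pad=20):
    """Padded pixel bounding box (x_min, y_min, x_max, y_max), clamped to the frame."""
    # Convert normalized coordinates in [0, 1] to pixel positions
    xs = [int(x * frame_w) for x, _ in points]
    ys = [int(y * frame_h) for _, y in points]
    x_min = max(min(xs) - pad, 0)
    y_min = max(min(ys) - pad, 0)
    x_max = min(max(xs) + pad, frame_w)
    y_max = min(max(ys) + pad, frame_h)
    return x_min, y_min, x_max, y_max

# Made-up normalized landmarks close to the left edge of a 640x480 frame
points = [(0.015625, 0.0625), (0.125, 0.25), (0.0625, 0.5)]
print(landmark_bbox(points, 640, 480))   # (0, 10, 100, 260)
```

Without the clamp, x_min would be -10 in this example. A negative start index in a NumPy slice counts from the end of the array, so the crop would come out empty and the subsequent resize would fail.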