# Hand Gesture Prediction Using CNN

This project aims to build a Convolutional Neural Network (CNN) to predict hand gestures. The following code snippets demonstrate the process of importing necessary libraries, data preprocessing, model creation, training, and evaluation.

## Importing Libraries

First, we need to import the necessary libraries for data manipulation, visualization, and building the CNN model.

- `numpy` and `pandas` for data manipulation
- `matplotlib` and `seaborn` for data visualization
- `keras` for building the CNN model
- `tensorflow.keras.preprocessing.image.ImageDataGenerator` for augmenting the image data


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os

import keras
from keras.models import Sequential
from keras.layers import Dense,Flatten,Conv2D,MaxPool2D,Dropout
import seaborn as sns
from tensorflow.keras.preprocessing.image import ImageDataGenerator

### Loading the Dataset
Load the datasets for training and testing.


In [None]:
# Train datasets

train_df = pd.read_csv("Dataset/sign_mnist_train.csv")

# Test datasets

test_df = pd.read_csv("Dataset/sign_mnist_test.csv")

#### Exploring the Training Dataset

`train_df.info()` summarizes the structure of the training dataset: the columns, their data types, and the non-null counts.

In [None]:
train_df.info()

#### Exploring the Test Dataset

The same call summarizes the structure of the test dataset.

In [None]:
test_df.info()

#### Previewing the First Rows of the Training Dataset

In [None]:
train_df.head(6)

#### Separating Labels and Features

In this step, we separate the labels and features in the training dataset.

In [None]:
train_label=train_df['label']
train_label.head()
trainset=train_df.drop(['label'],axis=1)
trainset.head()

#### Reshaping the Training Data
The feature set `trainset` is converted to a NumPy array and reshaped to `(-1, 28, 28, 1)`, the input shape the CNN expects. We then print the resulting shape.


In [None]:
X_train = trainset.values.reshape(-1,28,28,1)
print(X_train.shape)

#### Preparing the Test Data
Similarly to the training data, we prepare the test data by separating the labels and features.


In [None]:
test_label=test_df['label']
X_test=test_df.drop(['label'],axis=1)
print(X_test.shape)
X_test.head()

#### Encoding the Labels
Before training the model, we need to encode the labels using one-hot encoding. This step is necessary because our CNN model will output probabilities for each class, and one-hot encoding transforms the labels into a format suitable for this.
- We use `LabelBinarizer` from `sklearn.preprocessing` to perform one-hot encoding.



In [None]:
from sklearn.preprocessing import LabelBinarizer
lb=LabelBinarizer()
y_train=lb.fit_transform(train_label)
# Reuse the encoder fitted on the training labels so the class order matches
y_test=lb.transform(test_label)
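
To illustrate what one-hot encoding produces, here is a minimal NumPy sketch on hypothetical labels. It is not how `LabelBinarizer` is implemented; the helper `one_hot` and the sample arrays are assumptions made for illustration. Note that the label set has a gap (no 9), so the column index is the position in the sorted class list, not the raw label.

```python
import numpy as np

# Hypothetical labels; 9 never occurs, mirroring a label set with a gap.
train_labels = np.array([0, 3, 8, 10, 24, 3])
test_labels = np.array([10, 0, 24])

# Sorted distinct labels seen in training define the column order.
classes = np.unique(train_labels)

def one_hot(labels, classes):
    """Return a (len(labels), len(classes)) matrix with a single 1 per row."""
    return (labels[:, None] == classes[None, :]).astype(int)

y_train_demo = one_hot(train_labels, classes)
y_test_demo = one_hot(test_labels, classes)  # reuse the training class list

print(classes.tolist())      # [0, 3, 8, 10, 24]
print(y_test_demo.tolist())  # [[0, 0, 0, 1, 0], [1, 0, 0, 0, 0], [0, 0, 0, 0, 1]]
```

Reusing the class list learned from the training labels keeps the columns of the training and test matrices in the same order.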

#### Inspecting the Encoded Training Labels

In [None]:
y_train

#### Reshaping the Test Data
Similar to the training data, we reshape the test data to fit the input shape required by the CNN model.

- The `values` attribute is used to convert `X_test` into a numpy array.
- The `reshape` method reshapes the data into a 4D tensor with dimensions (-1, 28, 28, 1), where:
  - `-1` indicates that the number of samples will be inferred automatically.
  - `28, 28` are the dimensions of the images.
  - `1` indicates that the images are grayscale.


In [None]:
X_test=X_test.values.reshape(-1,28,28,1)
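
The reshape described above can be tried on synthetic data. In this sketch the array `flat` is a hypothetical stand-in for `X_test.values`; only the shapes and the pixel ordering matter.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for X_test.values: 5 images, 784 pixel columns each.
flat = rng.integers(0, 256, size=(5, 784))

# -1 lets NumPy infer the number of samples from the total size.
images = flat.reshape(-1, 28, 28, 1)
print(images.shape)  # (5, 28, 28, 1)

# Row-major order: pixel k of a flat row lands at row k // 28, column k % 28.
k = 100
same_pixel = images[0, k // 28, k % 28, 0] == flat[0, k]
print(bool(same_pixel))  # True
```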

### Shape of Data Arrays


In [None]:
print(X_train.shape,y_train.shape,X_test.shape,y_test.shape)

### Data Augmentation for Training Data

To improve the robustness and generalization of our model, we apply data augmentation techniques to the training data using `ImageDataGenerator` from `tensorflow.keras.preprocessing.image`.

- `rescale`: Multiplies every pixel value by 1/255, mapping the range [0, 255] to [0, 1].
- `height_shift_range`, `width_shift_range`, `zoom_range`, `horizontal_flip`: Randomly shift the image by up to 20% of its height or width, zoom in or out by up to 20%, and flip it horizontally.
- `rotation_range`, `shear_range`: Both are set to 0 here, so no rotation or shearing is applied.
- `fill_mode='nearest'`: Fills pixels exposed by a shift or zoom with the nearest existing pixel value.

These techniques help the model generalize better by exposing it to a wider variety of augmented images during training.

Additionally, we normalize the pixel values of `X_test` by dividing by 255 to ensure consistency in data preprocessing.


In [None]:
train_datagen = ImageDataGenerator(rescale = 1./255,
                                  rotation_range = 0,
                                  height_shift_range=0.2,
                                  width_shift_range=0.2,
                                  shear_range=0,
                                  zoom_range=0.2,
                                  horizontal_flip=True,
                                  fill_mode='nearest')

X_test=X_test/255
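
To give a feel for what these transformations do to pixel data, here is a small NumPy sketch on a hypothetical 4x4 image. It only mimics the effect of rescaling, a horizontal flip, and a one-pixel shift with edge filling; it is not the code `ImageDataGenerator` runs internally, which applies random amounts of each transformation.

```python
import numpy as np

# Hypothetical 4x4 grayscale image with pixel values in [0, 255].
img = np.array([[0, 64, 128, 255]] * 4, dtype=np.float32)

# Effect of rescale=1./255: every pixel is multiplied by 1/255.
rescaled = img * (1.0 / 255)

# Effect of a horizontal flip: the columns are mirrored.
flipped = rescaled[:, ::-1]

# Shift one pixel to the right and fill the exposed column by repeating
# the nearest edge pixel, similar in spirit to fill_mode='nearest'.
shifted = np.concatenate([rescaled[:, :1], rescaled[:, :-1]], axis=1)

print(rescaled[0])
print(flipped[0])
print(shifted[0])
```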

### Preview of Dataset
The following code snippet generates a visual preview of sample images from the training dataset. 


In [None]:
fig,axe=plt.subplots(2,2)
fig.suptitle('Preview of dataset')
# Take each title from the actual label; Sign MNIST labels map 0->A ... 24->Y (no 9=J)
for ax,i in zip(axe.flat,[0,1,2,4]):
    label=int(train_label.iloc[i])
    ax.imshow(X_train[i].reshape(28,28),cmap='gray')
    ax.set_title('label: {}  letter: {}'.format(label,chr(ord('A')+label)))

## Building the Convolutional Neural Network (CNN) Model

We define a Sequential model for the CNN architecture to classify hand gestures.

- The model begins with a `Conv2D` layer with 128 filters of size 5x5, ReLU activation, 'same' padding, and input shape (28, 28, 1).
- A `MaxPool2D` layer follows with a pool size of 3x3, strides of 2, and 'same' padding.
- The next `Conv2D` layer has 64 filters of size 2x2, also with ReLU activation and 'same' padding.
- Another `MaxPool2D` layer follows with a pool size of 2x2, strides of 2, and 'same' padding.
- A third `Conv2D` layer has 32 filters of size 2x2, ReLU activation, and 'same' padding.
- A final `MaxPool2D` layer follows with a pool size of 2x2, strides of 2, and 'same' padding.

The `Flatten()` layer is added to convert the 2D feature maps into a 1D vector, which will be fed into the fully connected layers for classification.


In [None]:
model=Sequential()
model.add(Conv2D(128,kernel_size=(5,5),
                 strides=1,padding='same',activation='relu',input_shape=(28,28,1)))
model.add(MaxPool2D(pool_size=(3,3),strides=2,padding='same'))
model.add(Conv2D(64,kernel_size=(2,2),
                strides=1,activation='relu',padding='same'))
model.add(MaxPool2D((2,2),2,padding='same'))
model.add(Conv2D(32,kernel_size=(2,2),
                strides=1,activation='relu',padding='same'))
model.add(MaxPool2D((2,2),2,padding='same'))
          
model.add(Flatten())
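
The spatial size of the feature maps can be traced by hand. With `padding='same'`, the output size of a convolution or pooling layer is `ceil(input_size / stride)`. The sketch below applies that rule to the six layers above; the helper name `same_out` is hypothetical.

```python
import math

def same_out(size, stride):
    """Spatial output size for padding='same': ceil(size / stride)."""
    return math.ceil(size / stride)

# (description, stride) for each layer, in the order they are added.
layers = [("conv 5x5, 128", 1), ("pool 3x3", 2),
          ("conv 2x2, 64", 1), ("pool 2x2", 2),
          ("conv 2x2, 32", 1), ("pool 2x2", 2)]

size = 28
sizes = []
for name, stride in layers:
    size = same_out(size, stride)
    sizes.append(size)
    print(f"{name:15s} -> {size}x{size}")

# Flatten() concatenates all 32 channels of the final 4x4 maps.
flat_features = size * size * 32
print(flat_features)  # 512
```

Under these assumptions the `Flatten()` layer hands a 512-element vector to the dense layers.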

### Adding Fully Connected Layers

We add fully connected (Dense) layers to the CNN model for classification.

- `Dense(units=512, activation='relu')`: Adds a dense layer with 512 units and ReLU activation function.
- `Dropout(rate=0.25)`: Applies dropout regularization with a rate of 25% to prevent overfitting.
- `Dense(units=24, activation='softmax')`: Adds the output layer with 24 units (corresponding to the number of classes) and softmax activation function for multi-class classification.

The `summary()` method prints a summary of the model architecture, displaying the number of parameters and the output shape at each layer.


In [None]:
model.add(Dense(units=512,activation='relu'))
model.add(Dropout(rate=0.25))
model.add(Dense(units=24,activation='softmax'))
model.summary()
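
The parameter counts reported by `summary()` can be cross-checked by hand. The sketch below assumes the flattened feature vector has 4 x 4 x 32 = 512 elements, which follows from the 'same' padding and the strides used above; the helper names are hypothetical.

```python
def conv_params(kh, kw, in_ch, out_ch):
    """Weights (kh * kw * in_ch per filter) plus one bias per filter."""
    return kh * kw * in_ch * out_ch + out_ch

def dense_params(n_in, n_out):
    """One weight per input-output pair plus one bias per output unit."""
    return n_in * n_out + n_out

counts = {
    "conv1": conv_params(5, 5, 1, 128),
    "conv2": conv_params(2, 2, 128, 64),
    "conv3": conv_params(2, 2, 64, 32),
    "dense1": dense_params(4 * 4 * 32, 512),
    "dense2": dense_params(512, 24),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:7s} {n:>8,d}")
print(f"total   {total:>8,d}")  # total    319,352
```

Pooling, flatten, and dropout layers have no trainable parameters, so they do not appear in the total.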

### Compiling the Model

Before training, we compile the CNN model with the specified optimizer, loss function, and metrics.


In [None]:
model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])
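
For intuition about the `categorical_crossentropy` loss, here is a minimal NumPy sketch of the formula applied to softmax outputs and one-hot targets. It is an illustration under simplifying assumptions (a fixed clipping constant `eps`, mean over the batch), not the Keras implementation.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_prob))."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

# Two hypothetical samples whose true classes are 2 and 5, out of 24.
y_true = np.eye(24)[[2, 5]]

# An untrained model that assigns equal probability to every class.
uniform = softmax(np.zeros((2, 24)))
loss_uniform = categorical_crossentropy(y_true, uniform)
print(round(loss_uniform, 4))  # 3.1781, i.e. ln(24)

# A model that puts a large logit on the correct class has a much lower loss.
confident = softmax(y_true * 10.0)
loss_confident = categorical_crossentropy(y_true, confident)
print(loss_confident < loss_uniform)  # True
```

A loss near ln(24) at the start of training is therefore what a model guessing uniformly among 24 classes would produce.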

### Training the Model

We train the compiled CNN model using the `fit` method.

- `train_datagen.flow(X_train, y_train, batch_size=200)`: Generates batches of augmented data from `X_train` and `y_train` using the previously defined `train_datagen`.
  - `batch_size=200`: Specifies the batch size for training.

- `epochs=35`: Number of epochs (iterations over the entire dataset) to train the model.

- `validation_data=(X_test, y_test)`: Optional validation data to evaluate the model performance after each epoch on data not used for training.

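- `shuffle=1`: Requests shuffling of the training data before each epoch. The iterator returned by `train_datagen.flow` also shuffles samples by default (`shuffle=True`).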

The `fit` method trains the model on the training data and validates it on the validation data if provided, while also tracking metrics such as loss and accuracy over epochs.



In [None]:
history= model.fit(train_datagen.flow(X_train,y_train,batch_size=200),
         epochs = 35,
          validation_data=(X_test,y_test),
          shuffle=1
         )
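
The batch size and epoch count determine how many weight updates the training run performs. The arithmetic below assumes the training CSV holds 27,455 rows, which is the size of the public Sign MNIST training file; check `X_train.shape[0]` for the actual value.

```python
import math

n_train = 27455   # assumption: row count of the Sign MNIST training CSV
batch_size = 200
epochs = 35

# The iterator yields ceil(n / batch_size) batches per epoch; the last one is smaller.
steps_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch, total_updates)  # 138 55 4830
```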

## Evaluating the Model

We evaluate the trained CNN model using the `evaluate` method.

- `x=X_test, y=y_test`: Specifies the test data (`X_test` and `y_test`) to evaluate the model's performance.

Because the model was compiled with `metrics=['accuracy']`, `evaluate` returns the loss value followed by the accuracy achieved on the test data.



In [None]:
(ls,acc)=model.evaluate(x=X_test,y=y_test)

### Displaying Model Accuracy

To present the model's accuracy in a human-readable format, we print it as a percentage.


In [None]:
print('MODEL ACCURACY = {}%'.format(acc*100))
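
The accuracy metric is the fraction of samples whose highest-probability class matches the true class. A minimal sketch with hypothetical probabilities for four samples and three classes:

```python
import numpy as np

# Hypothetical one-hot targets and predicted probabilities.
y_true_onehot = np.eye(3)[[0, 1, 2, 1]]
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.6, 0.3],
                   [0.3, 0.4, 0.3],   # true class is 2, but class 1 wins
                   [0.2, 0.5, 0.3]])

pred = y_prob.argmax(axis=1)          # predicted class index per sample
true = y_true_onehot.argmax(axis=1)   # true class index per sample
accuracy = float((pred == true).mean())

print('MODEL ACCURACY = {:.2f}%'.format(accuracy * 100))  # MODEL ACCURACY = 75.00%
```

The `{:.2f}` format specifier limits the output to two decimal places, which can be easier to read than the full floating-point value.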

### Plotting Training and Validation Metrics

We visualize the training and validation metrics (accuracy and loss) over epochs using matplotlib.

- The first subplot (`plt.subplot(1, 2, 1)`) plots the model accuracy (`accuracy` and `val_accuracy`) over epochs.
  - `history.history['accuracy']`: Training accuracy values stored during model training.
  - `history.history['val_accuracy']`: Validation accuracy values stored during model training.

- The second subplot (`plt.subplot(1, 2, 2)`) plots the model loss (`loss` and `val_loss`) over epochs.
  - `history.history['loss']`: Training loss values stored during model training.
  - `history.history['val_loss']`: Validation loss values stored during model training.

Each plot includes labels, titles, and legends to enhance readability. The `plt.tight_layout()` ensures that subplots are neatly arranged, and `plt.show()` displays the plots.



In [None]:
import matplotlib.pyplot as plt

# Plot training & validation accuracy values
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')

# Plot training & validation loss values
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')

plt.tight_layout()
plt.show()


### Visualizing the Confusion Matrix

We generate and display the confusion matrix to evaluate the model's performance in predicting hand gestures.

In [None]:
# Confusion Matrix
from sklearn.metrics import confusion_matrix, classification_report
y_pred = model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)
y_true = np.argmax(y_test, axis=1)

def plot_confusion_matrix(y_true, y_pred_classes):
    cm = confusion_matrix(y_true, y_pred_classes)
    plt.figure(figsize=(10, 8))
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=lb.classes_, yticklabels=lb.classes_)
    plt.xlabel('Predicted Label')
    plt.ylabel('True Label')
    plt.title('Confusion Matrix')
    plt.show()

plot_confusion_matrix(y_true, y_pred_classes)
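
To show how the matrix is built, here is a small NumPy sketch that counts (true, predicted) pairs for hypothetical labels. The function name `confusion_counts` is an assumption for illustration; in the notebook the counting is done by `sklearn.metrics.confusion_matrix`.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # accumulates repeated index pairs correctly
    return cm

# Hypothetical class indices for six samples.
y_true_demo = np.array([0, 0, 1, 2, 2, 2])
y_pred_demo = np.array([0, 1, 1, 2, 2, 0])

cm_demo = confusion_counts(y_true_demo, y_pred_demo, 3)
print(cm_demo.tolist())  # [[1, 1, 0], [0, 1, 0], [1, 0, 2]]
```

Diagonal entries are correct predictions; an off-diagonal entry in row i, column j counts samples of class i that were mistaken for class j.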

### Mapping Labels to Letters

The `getLetter` function maps a predicted class index to its letter using a predefined dictionary (`classLabels`). The dictionary skips J, so index 9 maps to K.

In [None]:
# Create function to match label to letter

def getLetter(result):
  classLabels ={
      0:'A',
      1:'B',
      2:'C',
      3:'D',
      4:'E',
      5:'F',
      6:'G',
      7:'H',
      8:'I',
      9:'K',
      10:'L',
      11:'M',
      12:'N',
      13:'O',
      14:'P',
      15:'Q',
      16:'R',
      17:'S',
      18:'T',
      19:'U',
      20:'V',
      21:'W',
      22:'X',
      23:'Y',
      24:'Z'}

  try:
    res = int(result)
    return classLabels[res]
  except (ValueError, TypeError, KeyError):
    # Non-numeric input or an index outside the dictionary
    return "Error"
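
As an alternative sketch, the same index-to-letter mapping can be derived from the alphabet instead of being typed out. The name `index_to_letter` is hypothetical, and the sketch assumes the 24 output classes are the alphabet without J and Z, which matches the 24-unit output layer and the dictionary skipping J.

```python
import string

# Assumption: the 24 classes are the alphabet minus J and Z.
LETTERS = [c for c in string.ascii_uppercase if c not in "JZ"]

def index_to_letter(index):
    """Map a class index (0-23) to its letter; return "Error" otherwise."""
    try:
        i = int(index)
    except (TypeError, ValueError):
        return "Error"
    return LETTERS[i] if 0 <= i < len(LETTERS) else "Error"

print(len(LETTERS), index_to_letter(8), index_to_letter(9), index_to_letter("23"))
# 24 I K Y
print(index_to_letter(24), index_to_letter("x"))
# Error Error
```

Unlike the dictionary above, this variant has no entry for index 24, since a 24-unit softmax layer can only produce indices 0 through 23.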

## Classification Report

We generate and print the classification report to evaluate the precision, recall, F1-score, and support for each class.

- `classification_report(y_true, y_pred_classes, target_names=[getLetter(i) for i in range(len(lb.classes_))])`: Computes and prints a detailed classification report using `classification_report` from `sklearn.metrics`.
  - `y_true`: True class labels for each sample in `y_test`.
  - `y_pred_classes`: Predicted class labels (indices of maximum probabilities) for each sample in `X_test`.
  - `target_names`: Optional parameter specifying the display names for each class, obtained using `getLetter(i)` function for each class index.
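
The per-class metrics in the report can all be derived from a confusion matrix. The sketch below uses a hypothetical 3-class matrix and does not handle classes with zero predictions, which would cause a division by zero.

```python
import numpy as np

# Hypothetical confusion matrix: rows are true classes, columns are predictions.
cm = np.array([[1, 1, 0],
               [0, 1, 0],
               [1, 0, 2]])

tp = np.diag(cm)               # correctly classified samples per class
support = cm.sum(axis=1)       # number of true samples per class
predicted = cm.sum(axis=0)     # number of predictions per class

precision = tp / predicted     # of everything predicted as class k, how much was right
recall = tp / support          # of everything that is class k, how much was found
f1 = 2 * precision * recall / (precision + recall)

for k in range(3):
    print(k, round(precision[k], 2), round(recall[k], 2), round(f1[k], 2), support[k])
```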

## Distribution of Labels in Training Set

We visualize the distribution of labels in the training set using a bar plot.

- `sns.countplot(x=train_label)`: Generates a count plot of `train_label` using `countplot` from `seaborn`.
- `ax.set_xticklabels(class_labels)`: Sets the x-axis tick labels to actual letter representations obtained from `getLetter(i)` function for each class index.
- Annotations (`ax.annotate`) are added on top of each bar to display the count of samples for each class.

The plot provides insights into the distribution of different hand gesture labels in the training dataset.



In [None]:
# Classification Report
print(classification_report(y_true, y_pred_classes, target_names=[getLetter(i) for i in range(len(lb.classes_))]))

# Distribution of Labels in Training Set
import seaborn as sns

# Class labels mapping
class_labels = [getLetter(i) for i in range(len(lb.classes_))]

plt.figure(figsize=(12, 6))
ax = sns.countplot(x=train_label)

# Set labels for x-axis with actual letter representation
ax.set_xticklabels(class_labels)

# Add count labels on top of each bar
for p in ax.patches:
    ax.annotate(f'{p.get_height()}', (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center', va='center', fontsize=9, color='black', xytext=(0, 5),
                textcoords='offset points')

plt.title('Distribution of Labels in Training Set')
plt.xlabel('Label')
plt.ylabel('Count')
plt.show()

# https://www.kaggle.com/code/sayakdasgupta/sign-language-classification-cnn-99-40-accuracy/notebook

### Real-time Hand Gesture Recognition Using OpenCV and Trained Model

This code snippet demonstrates real-time hand gesture recognition using a webcam. It assumes the trained `model` and the `getLetter` function are defined beforehand.

- `cap = cv2.VideoCapture(0)`: Initializes the webcam capture using OpenCV.

In [None]:
import numpy as np
import cv2

# Assuming model and getLetter are defined before this code

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Flip the frame horizontally
    frame = cv2.flip(frame, 1)

    # Define region of interest (ROI)
    roi = frame[100:400, 320:620]
    cv2.imshow('roi', roi)
    roi = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    roi = cv2.resize(roi, (28, 28), interpolation=cv2.INTER_AREA)

    cv2.imshow('roi scaled and gray', roi)
    copy = frame.copy()
    cv2.rectangle(copy, (320, 100), (620, 400), (0, 255, 0), 5)

    # Scale pixels to [0, 1] to match the rescaling applied during training
    roi = roi.reshape(-1, 28, 28, 1).astype('float32') / 255

    # Use predict method instead of predict_classes
    predictions = model.predict(roi)
    predicted_class = np.argmax(predictions[0])

    result = str(predicted_class)
    cv2.putText(copy, getLetter(result), (320, 90), cv2.FONT_HERSHEY_SIMPLEX, 3, (0, 255, 0), 2, cv2.LINE_AA)
    cv2.imshow('frame', copy)

    if cv2.waitKey(1) & 0xFF == 13:  # Press 'Enter' key to break the loop
        break

cap.release()
cv2.destroyAllWindows()
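
The training pipeline rescales pixels by 1/255, so a webcam patch needs the same scaling before it reaches `model.predict`. The sketch below isolates that preprocessing step so it can be checked without a camera; the helper name `prepare_roi` is hypothetical and the input is a synthetic patch.

```python
import numpy as np

def prepare_roi(gray_28x28):
    """Turn a 28x28 uint8 grayscale patch into a (1, 28, 28, 1) float batch in [0, 1]."""
    arr = np.asarray(gray_28x28, dtype=np.float32) / 255.0
    return arr.reshape(1, 28, 28, 1)

# Synthetic all-white patch standing in for the resized grayscale ROI.
patch = np.full((28, 28), 255, dtype=np.uint8)
batch = prepare_roi(patch)

print(batch.shape, batch.dtype, float(batch.max()))  # (1, 28, 28, 1) float32 1.0
```

Keeping this step in one function makes it harder for the live preprocessing to drift from what the model saw during training.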
