# **Notebook overview**


This notebook implements a real-time sign language detection system on the ArSL dataset. It covers data preprocessing, model training, and evaluation for recognizing signs from input images.


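The following preprocessing techniques are relevant to the chosen subset of the dataset. Of these, the transform pipeline later in this notebook applies resizing, normalization, and augmentation (color jitter); grayscale conversion and background subtraction are described here but are not part of that pipeline.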

1. **Image resizing**: The images are resized to (3, 224, 224) where 3 represents the number of channels (RGB) in the image and (224, 224) represents the 2D dimensions of each image.

2. **Grayscale conversion**: Converting images to grayscale can reduce the computational complexity as it reduces the number of channels in each image from three (RGB) to one.


3. **Background subtraction**: To focus the model on the hand gestures, it helps to remove or standardize the background. Techniques like thresholding or using a consistent backdrop during image capture can be effective.

4. **Normalization**: Scaling pixel values to a range, typically between 0 and 1, helps in speeding up convergence during training.

5. **Data Augmentation**: To make the model robust to various orientations and scales, augmenting the dataset with transformed images (e.g., rotations, scaling, translations, flipping) is beneficial.
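Steps 2 to 4 above can be sketched with plain NumPy. This is only an illustration under stated assumptions, not the pipeline used later in the notebook: the function name `preprocess`, the fixed threshold of 128, and the assumption that the hand is brighter than the background are all hypothetical choices.

```python
import numpy as np

def preprocess(image_rgb: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Grayscale conversion, crude background suppression, and scaling to [0, 1]."""
    # Luminance-weighted grayscale: three channels collapse into one.
    gray = (image_rgb[..., 0] * 0.299
            + image_rgb[..., 1] * 0.587
            + image_rgb[..., 2] * 0.114)
    # Thresholding (assumption: the hand is brighter than the background).
    # Pixels darker than the threshold are set to zero.
    foreground = np.where(gray >= threshold, gray, 0.0)
    # Normalization: map the 0..255 range onto 0..1.
    return (foreground / 255.0).astype(np.float32)

# Hypothetical 2x2 RGB image: white, black, light gray, dark gray.
demo = np.array([[[255, 255, 255], [0, 0, 0]],
                 [[200, 200, 200], [50, 50, 50]]], dtype=np.uint8)
result = preprocess(demo)
print(result.shape)  # one value per pixel instead of three
```

A fixed threshold is the simplest option and will fail on dark hands or bright backdrops, which is one reason a consistent backdrop at capture time is often the more reliable route.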

## Needed libraries

In [None]:
import numpy as np
import cv2
from PIL import Image
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import os
import torch
import pandas as pd
import torch.nn as nn
import torchvision.transforms.v2 as transforms
import torchvision.models
from torch.utils.data import DataLoader, Dataset
from albumentations.pytorch import ToTensorV2

# **Data Preprocessing**

## Creating The DataFrame For All Characters

The code first sets `destination_folder`, the directory that holds the dataset, in which each subfolder is named after one character. It then collects every image path together with its label (the subfolder name) into a Pandas DataFrame. The later splitting, preprocessing, training, and evaluation steps all build on this DataFrame.

In [None]:
destination_folder = "/kaggle/input/rgb-arabic-alphabets-sign-language-dataset/RGB ArSL dataset"


# Get the list of character folders in the dataset directory
char_folders = os.listdir(destination_folder)

# Initialize lists to store image paths and labels
images_paths = []
labels = []

# Iterate through each character folder
for char_folder in char_folders:
    # Extract label from the folder name
    label = char_folder

    # Get the full path to the character folder
    full_path = os.path.join(destination_folder, char_folder)

    # Get all file paths within the character folder
    files_in_folder = [os.path.join(full_path, file) for file in os.listdir(full_path)]

    # Append the list of image paths to images_paths
    images_paths.extend(files_in_folder)

    # Append the label to the labels list
    labels.extend([label] * len(files_in_folder))


data = pd.DataFrame({"image_path": images_paths, "label": labels})
data.head()



| | image_path | label |
|---|---|---|
| 0 | /kaggle/input/rgb-arabic-alphabets-sign-langua... | Zain |
| 1 | /kaggle/input/rgb-arabic-alphabets-sign-langua... | Zain |
| 2 | /kaggle/input/rgb-arabic-alphabets-sign-langua... | Zain |
| 3 | /kaggle/input/rgb-arabic-alphabets-sign-langua... | Zain |
| 4 | /kaggle/input/rgb-arabic-alphabets-sign-langua... | Zain |
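The listing loop above adds every file in each character folder, whatever its type. A slightly more defensive variant might skip anything that is not an image. The sketch below uses only the standard library; the helper name `collect_image_paths`, the extension set, and the folder name `Alef` are assumptions made for illustration (`Zain` is taken from the output above).

```python
import os
import tempfile

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions

def collect_image_paths(root):
    """Return parallel lists (paths, labels) for files laid out as root/<label>/<file>."""
    paths, labels = [], []
    for label in sorted(os.listdir(root)):
        folder = os.path.join(root, label)
        if not os.path.isdir(folder):
            continue  # skip stray files at the top level
        for name in sorted(os.listdir(folder)):
            # Compare extensions case-insensitively so "x.JPG" is kept.
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                paths.append(os.path.join(folder, name))
                labels.append(label)
    return paths, labels

# Throwaway directory tree that mimics the dataset layout.
with tempfile.TemporaryDirectory() as root:
    layout = {"Alef": ["a1.jpg", "notes.txt"], "Zain": ["z1.JPG", "z2.png"]}
    for label, files in layout.items():
        os.makedirs(os.path.join(root, label))
        for name in files:
            open(os.path.join(root, label, name), "w").close()
    demo_paths, demo_labels = collect_image_paths(root)

print(demo_labels)
```

The two returned lists can be passed to `pd.DataFrame` exactly as `images_paths` and `labels` are in the cell above.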


## Data Preparation and Encoding

This section prepares the dataset for training. The data is split into training (80%) and testing (20%) sets, stratified by label so that both sets keep the same class proportions. The string labels are then encoded as integers, and the indices of the resulting Series are reset so that they can be accessed by position.


In [None]:
train_data, test_data, train_labels, test_labels = train_test_split(data['image_path'], data['label'], test_size=0.2, stratify=labels)

In [None]:
print("train data shape ",train_data.shape,"test data shape" ,test_data.shape)

train data shape  (6284,) test data shape (1572,)
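To make the effect of `stratify` concrete, here is a minimal pure-Python sketch of a stratified split. It is not scikit-learn's implementation, which allocates samples somewhat differently; the function name `stratified_split` and the toy label counts are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) with about test_size of each class in the test set."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)                      # random order within the class
        n_test = round(len(indices) * test_size)  # per-class share of the test set
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Toy data: an imbalanced two-class label list.
demo_labels = ["Alef"] * 10 + ["Zain"] * 20
train_idx, test_idx = stratified_split(demo_labels)
test_counts = Counter(demo_labels[i] for i in test_idx)
print(test_counts)
```

Each class contributes the same fraction of its samples to the test set, so a rare class cannot end up missing from either split by chance.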


In [None]:
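# Integer label encoding (LabelEncoder maps each class name to an integer, not a one-hot vector)

label_encoder = LabelEncoder()
# Fit the encoder on the training labels only, then reuse the same mapping for the test labels
train_labels = label_encoder.fit_transform(train_labels)
test_labels = label_encoder.transform(test_labels)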
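A label encoder should be fitted once, on the training labels, and the same mapping reused for the test labels. The pure-Python sketch below mimics the sorted class-to-integer mapping that `LabelEncoder` builds and shows what can go wrong when a second mapping is fitted on a split that lacks a class. The helper names and the labels `Alef` and `Beh` are hypothetical.

```python
def fit_label_mapping(labels):
    """Map each distinct label to an integer in sorted order, as LabelEncoder does."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}

def encode(labels, mapping):
    """Replace each label by its integer code."""
    return [mapping[label] for label in labels]

train = ["Alef", "Beh", "Zain", "Beh"]
test = ["Zain", "Beh"]  # hypothetical split that happens to contain no "Alef"

shared_mapping = fit_label_mapping(train)
consistent = encode(test, shared_mapping)         # codes agree with training
refit = encode(test, fit_label_mapping(test))     # codes shift: "Beh" becomes 0

print(consistent, refit)
```

With a stratified split every class normally appears in both sets, so the two approaches tend to coincide, but fitting once removes the dependence on that coincidence.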


In [None]:
# Resetting indices
train_data = train_data.reset_index(drop=True)
test_data = test_data.reset_index(drop=True)

### Custom Dataset Handling and Image Preprocessing

This section focuses on creating a custom dataset class and applying a series of image transformations for preprocessing. The primary goal here is to ensure that the images are properly formatted and augmented to enhance the model's ability to generalize from the training data.

In [None]:
class MyDataset(Dataset):
    def __init__(self, image, labels, transforms=None):
        self.image = image
        self.labels = labels
        self.transforms = transforms

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        label = self.labels[idx]
        # Read the image using OpenCV (imread returns None if the file cannot be read)
        image = cv2.imread(self.image[idx])
        if image is None:
            raise FileNotFoundError(f"Could not read image: {self.image[idx]}")
        # Convert the image from BGR to RGB (OpenCV uses BGR by default)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # Convert the image to a PIL Image
        image = Image.fromarray(image)

        if self.transforms:
            image = self.transforms(image)

        return image, label

In [None]:
# Can add multiple transformations to enhance the training of the model

train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

test_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
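The `Normalize` step applies `(x - mean) / std` per channel after `ToTensor` has scaled pixel values to the range 0 to 1. The worked example below reproduces that arithmetic for single pixels in plain Python; the helper name `normalize_pixel` is hypothetical, while the mean and standard deviation values are the ones used in the transforms above.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb_uint8):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    scaled = [value / 255.0 for value in rgb_uint8]
    return [(x - mean) / std
            for x, mean, std in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]

white = normalize_pixel([255, 255, 255])
black = normalize_pixel([0, 0, 0])

# Red channel: (1 - 0.485) / 0.229 for white, (0 - 0.485) / 0.229 for black.
print(round(white[0], 3), round(black[0], 3))
```

The normalized values are therefore not confined to 0 to 1; they are centered near zero with roughly unit spread, matching the input statistics the pre-trained weights were trained on.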



In [None]:
trainset = MyDataset(train_data,train_labels,train_transforms)
trainloader = DataLoader(trainset, batch_size=64, shuffle=True)

testset = MyDataset(test_data,test_labels,test_transforms)
testloader = DataLoader(testset, batch_size=64, shuffle=False)

## **Comprehensive Model Setup, Training, and Evaluation**

This section covers the setup, training, and evaluation of EfficientNet-B0, a model from a family designed for efficiency and scalability in image classification. The model is loaded with pre-trained weights, its existing layers are frozen, and its final classification layer is replaced with a 31-output layer to match the dataset. Training uses cross-entropy loss and the Adam optimizer. The training and validation functions each run one pass over their data per epoch and return the mean loss and accuracy.
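Because the pre-trained layers are frozen, only the replacement classification layer is trained. The arithmetic below estimates how many parameters that is, assuming the 1280-feature classifier input of torchvision's EfficientNet-B0 and the 31 output classes used in this notebook; the helper name `linear_param_count` is hypothetical.

```python
def linear_param_count(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

IN_FEATURES = 1280  # classifier input width of torchvision's EfficientNet-B0
NUM_CLASSES = 31    # number of output classes used in this notebook

new_head = linear_param_count(IN_FEATURES, NUM_CLASSES)
original_head = linear_param_count(IN_FEATURES, 1000)  # the ImageNet head it replaces

print(new_head, original_head)
```

Training fewer than forty thousand parameters keeps each epoch cheap, at the cost of not adapting the feature extractor to hand images.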

In [None]:
import torchvision.models as models
device = 'cuda' if torch.cuda.is_available() else 'cpu'


# Load the pre-trained EfficientNet-B0 model
model_efficientnet = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)
for param in model_efficientnet.parameters():
    param.requires_grad = False

# Replace the last fully connected layer for our specific case
num_features = model_efficientnet.classifier[1].in_features
model_efficientnet.classifier[1] = nn.Linear(num_features, 31)
model_efficientnet = model_efficientnet.to(device)

# Loss function and optimizer
loss = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model_efficientnet.parameters(), lr=0.001)

# Print model architecture (optional)
print(model_efficientnet)

Downloading: "https://download.pytorch.org/models/efficientnet_b0_rwightman-7f5810bc.pth" to /root/.cache/torch/hub/checkpoints/efficientnet_b0_rwightman-7f5810bc.pth
100%|██████████| 20.5M/20.5M [00:00<00:00, 112MB/s] 


EfficientNet(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): SiLU(inplace=True)
    )
    (1): Sequential(
      (0): MBConv(
        (block): Sequential(
          (0): Conv2dNormActivation(
            (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
            (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (2): SiLU(inplace=True)
          )
          (1): SqueezeExcitation(
            (avgpool): AdaptiveAvgPool2d(output_size=1)
            (fc1): Conv2d(32, 8, kernel_size=(1, 1), stride=(1, 1))
            (fc2): Conv2d(8, 32, kernel_size=(1, 1), stride=(1, 1))
            (activation): SiLU(inplace=True)
            (scale_activation): Sigmoid()
          )
          (2): Conv2dNormActivat

In [None]:
def train(model, dataloader, loss, optimizer,device='cuda'):
    model.train()
    acc = []
    lss_history = []
    for data,labels in dataloader:
        data=data.to(device)
        labels=labels.to(device)
        optimizer.zero_grad()

        pred = model(data)
        lss = loss(pred, labels)
        lss.backward()
        optimizer.step()
        # acc calculations
        lss_history.append(lss.item())
        acc.append(((pred.argmax(axis = 1) == labels).type(torch.float)).mean().item())
    return np.mean(lss_history) ,np.mean(acc)



In [None]:
# function to validate the model
def validate(model, dataloader, loss_func,device='cuda'):
    model.eval()
    loss_values = []
    acc_values = []
    with torch.no_grad():
        for data,labels in dataloader:
            data=data.to(device)
            labels=labels.to(device)
            pred = model(data)
            loss = loss_func(pred, labels)
            loss_values.append(loss.item())
            acc_value = (pred.argmax(axis = 1) == labels).type(torch.float32)
            acc_values.append(acc_value.mean().item())
    return np.mean(loss_values), np.mean(acc_values)
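Both `train` and `validate` report the unweighted mean of per-batch accuracies. When the last batch is smaller than the others, this differs slightly from accuracy over all samples. The sketch below shows the gap on made-up numbers; the function names and the `(correct, batch_size)` pairs are hypothetical.

```python
def batch_mean_accuracy(batches):
    """Unweighted mean of per-batch accuracies, as train() and validate() compute it."""
    return sum(correct / size for correct, size in batches) / len(batches)

def sample_accuracy(batches):
    """Accuracy over all samples: total correct divided by total seen."""
    total_correct = sum(correct for correct, _ in batches)
    total_seen = sum(size for _, size in batches)
    return total_correct / total_seen

# Two full batches of 64 and a final, smaller batch of 12.
batches = [(48, 64), (48, 64), (12, 12)]
per_batch = batch_mean_accuracy(batches)
per_sample = sample_accuracy(batches)

print(round(per_batch, 4), round(per_sample, 4))
```

With around a hundred batches per epoch the difference is small, but accumulating correct and total counts is the exact figure if it matters.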

# **Results Overview from Initial Training Cycles**

The first three epochs with the EfficientNet model show steady progress on both the training and the test set. Training loss fell from 2.722 to 1.530 and training accuracy rose from 33.6% to 63.6%. Over the same epochs, test loss fell from 2.047 to 1.397 and test accuracy rose from 54.2% to 65.1%. Test accuracy is higher than training accuracy in each of these epochs; this is plausibly because training metrics are averaged over an epoch during which the weights are still improving, and because dropout and color jitter are active only during training. Since only the final classification layer is trained, these gains come from the pre-trained features alone. The training call below requests ten epochs, but only the output of the first three is shown.

In [None]:
def tune_model(epochs, model, train_dataloader, test_dataloader, loss_func, optimizer):
    for epoch in range(epochs):
        train_loss, train_acc = train(model, train_dataloader, loss_func, optimizer,device=device)
        test_loss, test_acc = validate(model, test_dataloader, loss_func, device=device)
        print(f"Epoch : {epoch + 1} || Train loss : {train_loss:5.3f} || Train accuracy : {train_acc:5.3f}", end="")
        print(f" Test loss : {test_loss:5.3f} || Test accuracy : {test_acc:5.3f}")


In [None]:
tune_model(10, model_efficientnet, trainloader, testloader, loss, optimizer)


Premature end of JPEG file


Epoch : 1 || Train loss : 2.722 || Train accuracy : 0.336 Test loss : 2.047 || Test accuracy : 0.542


Premature end of JPEG file


Epoch : 2 || Train loss : 1.875 || Train accuracy : 0.574 Test loss : 1.602 || Test accuracy : 0.619


Premature end of JPEG file


Epoch : 3 || Train loss : 1.530 || Train accuracy : 0.636 Test loss : 1.397 || Test accuracy : 0.651


Premature end of JPEG file
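The `Premature end of JPEG file` warnings in the output above come from the JPEG decoder and suggest that some image files are truncated. One rough way to locate such files is to check whether each one ends with the JPEG end-of-image marker (`FF D9`). This is a heuristic sketch, not a full validity check: the helper name `looks_truncated` is hypothetical, the demo files are byte stand-ins rather than real images, and a valid file with trailing bytes after the marker would be flagged incorrectly.

```python
import os
import tempfile

JPEG_EOI = b"\xff\xd9"  # JPEG end-of-image marker

def looks_truncated(path):
    """Heuristic: True if the file does not end with the end-of-image marker."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() < 2:
            return True  # too short to contain the marker at all
        handle.seek(-2, os.SEEK_END)
        return handle.read(2) != JPEG_EOI

# Stand-in files: one ends with the marker, one is cut off before it.
with tempfile.TemporaryDirectory() as folder:
    complete = os.path.join(folder, "complete.jpg")
    cut_off = os.path.join(folder, "cut_off.jpg")
    with open(complete, "wb") as handle:
        handle.write(b"\xff\xd8fake-data\xff\xd9")
    with open(cut_off, "wb") as handle:
        handle.write(b"\xff\xd8fake-da")
    complete_flag = looks_truncated(complete)
    cut_off_flag = looks_truncated(cut_off)

print(complete_flag, cut_off_flag)
```

Running such a check over the `image_path` column before training would give a list of files to inspect or exclude.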


In [None]:
# Move the trained model to the CPU and save it
model_efficientnet.cpu()
torch.save(model_efficientnet, "/kaggle/working/model_efficientnet_cpu.pth")
