# **Notebook overview**


This notebook implements a real-time sign language detection system on the ArSL dataset. It covers data preprocessing, model training, and evaluation steps designed to process and recognize sign language from input images.


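The following preprocessing techniques are relevant to the chosen subset of the dataset. Of these, the transforms defined later in this notebook implement resizing, normalization, and augmentation; grayscale conversion and background subtraction do not appear in the code shown.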

1. **Image resizing**: The images are resized to (3, 224, 224) where 3 represents the number of channels (RGB) in the image and (224, 224) represents the 2D dimensions of each image.

2. **Grayscale conversion**: Converting images to grayscale can reduce the computational complexity as it reduces the number of channels in each image from three (RGB) to one.


3. **Background Subtraction**: To focus the model on the hand gestures, it's helpful to remove or standardize the background. Techniques like thresholding or using a consistent backdrop during image capture can be effective.

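4. **Normalization**: Scaling pixel values to a fixed range, typically [0, 1], helps speed up convergence during training. In this notebook, `ToTensor` scales pixels to [0, 1] and `Normalize` with a mean and standard deviation of 0.5 then maps them to [-1, 1].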

5. **Data Augmentation**: To make the model robust to various orientations and scales, augmenting the dataset with transformed images (e.g., rotations, scaling, translations, flipping) is beneficial.
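
Three of the steps listed above (grayscale conversion, scaling to [0, 1], and thresholding as a crude background mask) can be sketched with NumPy alone. This is a minimal illustration, not the notebook's pipeline: the function names, the BT.601 luma weights, the fixed threshold of 0.5, and the tiny 2x2 image are all assumptions made for the example.

```python
import numpy as np

def to_grayscale(rgb):
    """Collapse an (H, W, 3) RGB array to (H, W) using BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb.astype(np.float64) @ weights

def scale_to_unit(image):
    """Scale 8-bit pixel values from [0, 255] to [0, 1]."""
    return image.astype(np.float64) / 255.0

def threshold_mask(gray_unit, threshold=0.5):
    """Binary mask: True where the pixel is brighter than the threshold."""
    return gray_unit > threshold

# Hypothetical 2x2 RGB image: white, black, pure red, mid-gray
rgb = np.array([[[255, 255, 255], [0, 0, 0]],
                [[255, 0, 0], [128, 128, 128]]], dtype=np.uint8)
gray = scale_to_unit(to_grayscale(rgb))   # one channel, values in [0, 1]
mask = threshold_mask(gray)               # crude foreground/background split
```

A fixed threshold only separates hand from background when the backdrop is consistently darker or lighter than the hand, which is why a uniform backdrop during capture matters.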

## Needed libraries

In [1]:
import numpy as np
import cv2
from PIL import Image
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import os
import torch
import pandas as pd
import torch.nn as nn
import torchvision.transforms.v2 as transforms
import torchvision.models
from torch.utils.data import DataLoader, Dataset
# from albumentations.pytorch import ToTensorV2

# **Data Preprocessing**

## Creating The DataFrame For All Characters

The code begins by defining `destination_folder`, the directory containing the dataset. It then gathers every image file path and its associated label into a Pandas DataFrame, using each subfolder name as the label. This DataFrame is the basis for the later preprocessing, training, and evaluation steps.

In [2]:
destination_folder = "/kaggle/input/rgb-arabic-alphabets-sign-language-dataset/RGB ArSL dataset"


# Get the list of character folders in the dataset directory
char_folders = os.listdir(destination_folder)

# Initialize lists to store image paths and labels
images_paths = []
labels = []

# Iterate through each character folder
for char_folder in char_folders:
    # Extract label from the folder name
    label = char_folder
    
    # Get the full path to the character folder
    full_path = os.path.join(destination_folder, char_folder)
    
    # Get all file paths within the character folder
    files_in_folder = [os.path.join(full_path, file) for file in os.listdir(full_path)]
    
    # Append the list of image paths to images_paths
    images_paths.extend(files_in_folder)
    
    # Append the label to the labels list
    labels.extend([label] * len(files_in_folder))

    
data=pd.DataFrame({"image_path":images_paths,"chars":labels})
data.head()



| | image_path | chars |
|---|---|---|
| 0 | /kaggle/input/rgb-arabic-alphabets-sign-langua... | Zain |
| 1 | /kaggle/input/rgb-arabic-alphabets-sign-langua... | Zain |
| 2 | /kaggle/input/rgb-arabic-alphabets-sign-langua... | Zain |
| 3 | /kaggle/input/rgb-arabic-alphabets-sign-langua... | Zain |
| 4 | /kaggle/input/rgb-arabic-alphabets-sign-langua... | Zain |
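
The listing loop above collects every entry in each character folder. A slightly more defensive variant is sketched here, assuming that only files with common image extensions should be kept and that a sorted folder order is wanted. The function name `list_labeled_images`, the extension tuple, and the throwaway directory tree are hypothetical and exist only for illustration.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")  # assumed set of extensions

def list_labeled_images(root):
    """Return (path, label) pairs, one per image file under root/<label>/."""
    pairs = []
    for label in sorted(os.listdir(root)):      # sorted gives a reproducible order
        folder = os.path.join(root, label)
        if not os.path.isdir(folder):           # skip stray files at the top level
            continue
        for name in sorted(os.listdir(folder)):
            if name.lower().endswith(IMAGE_EXTENSIONS):
                pairs.append((os.path.join(folder, name), label))
    return pairs

# Tiny throwaway directory tree standing in for the dataset
with tempfile.TemporaryDirectory() as root:
    for label, files in {"Alef": ["a1.jpg", "notes.txt"], "Beh": ["b1.png"]}.items():
        os.makedirs(os.path.join(root, label))
        for name in files:
            open(os.path.join(root, label, name), "w").close()
    pairs = list_labeled_images(root)
    found = [(os.path.basename(path), label) for path, label in pairs]
```

The resulting pairs can be passed to `pd.DataFrame` in the same way as the two lists built above.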


## Mapping Char to Numbers

In [3]:
data['label'] = data['chars'].apply(lambda x: char_folders.index(x))

data.head()

| | image_path | chars | label |
|---|---|---|---|
| 0 | /kaggle/input/rgb-arabic-alphabets-sign-langua... | Zain | 0 |
| 1 | /kaggle/input/rgb-arabic-alphabets-sign-langua... | Zain | 0 |
| 2 | /kaggle/input/rgb-arabic-alphabets-sign-langua... | Zain | 0 |
| 3 | /kaggle/input/rgb-arabic-alphabets-sign-langua... | Zain | 0 |
| 4 | /kaggle/input/rgb-arabic-alphabets-sign-langua... | Zain | 0 |


In [22]:
mapping={}
for i in range(len(char_folders)):
    mapping[i]=char_folders[i]
mapping

{0: 'Zain',
 1: 'Zah',
 2: 'Meem',
 3: 'Seen',
 4: 'Teh',
 5: 'Lam',
 6: 'Dad',
 7: 'Teh_Marbuta',
 8: 'Reh',
 9: 'Sad',
 10: 'Dal',
 11: 'Sheen',
 12: 'Hah',
 13: 'Beh',
 14: 'Tah',
 15: 'Alef',
 16: 'Waw',
 17: 'Qaf',
 18: 'Al',
 19: 'Ghain',
 20: 'Heh',
 21: 'Ain',
 22: 'Kaf',
 23: 'Thal',
 24: 'Feh',
 25: 'Khah',
 26: 'Yeh',
 27: 'Jeem',
 28: 'Theh',
 29: 'Noon',
 30: 'Laa'}
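
The mapping above follows the order returned by `os.listdir`, which Python documents as arbitrary, so the same class could receive a different index in another environment. One possible way to make the mapping stable is sketched below; it assumes the class names are sorted first, which would change the indices shown above. The four-name list is a hypothetical subset used only for illustration.

```python
# Hypothetical subset of class names, in the arbitrary order a directory listing might give
class_names = ["Zain", "Alef", "Meem", "Beh"]

# Sorting first makes the index assigned to each class independent of listing order
index_to_class = dict(enumerate(sorted(class_names)))
class_to_index = {name: index for index, name in index_to_class.items()}

def decode(prediction_index):
    """Translate a predicted class index back to its class name."""
    return index_to_class[prediction_index]
```

Saving `index_to_class` alongside the model weights keeps predictions decodable later, because a saved model only outputs indices.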

## Data Preparation and Encoding

This section of the notebook focuses on preparing the dataset for training. It includes splitting the data into training and testing sets.

In [4]:
train_data, test_data= train_test_split(data, test_size=0.2, stratify=labels)

In [5]:
print("train data shape ",train_data.shape,"test data shape" ,test_data.shape)

train data shape  (6284, 3) test data shape (1572, 3)
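
To illustrate what the `stratify` argument achieves, the sketch below implements a simplified stratified split in plain Python. It is not scikit-learn's algorithm (which allocates samples differently in edge cases); the function name, the rounding rule, and the toy label list are assumptions made for this example.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size=0.2, seed=0):
    """Return (train_indices, test_indices) with per-class proportions preserved."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)                 # fixed seed makes the split reproducible
    train_indices, test_indices = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_indices.extend(indices[:n_test])
        train_indices.extend(indices[n_test:])
    return train_indices, test_indices

# Hypothetical imbalanced label list: 50 "Alef" samples and 10 "Beh" samples
labels = ["Alef"] * 50 + ["Beh"] * 10
train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)
```

Each class contributes 20% of its samples to the test set, so rare classes are not left out by chance. In the notebook's own call, passing `random_state` to `train_test_split` would make the split reproducible in the same way the seed does here.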


### Custom Dataset Handling and Image Preprocessing

This section focuses on creating a custom dataset class and applying a series of image transformations for preprocessing. The primary goal here is to ensure that the images are properly formatted and augmented to enhance the model's ability to generalize from the training data.

In [6]:
class MyDataset(Dataset):
    def __init__(self, data, transforms=None):
        self.data = data
        self.transforms = transforms

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        row=self.data.iloc[idx]
        image_path = row['image_path']
        label = row['label']
        char=row['chars']
        # Read the image using OpenCV
        image = cv2.imread(image_path)
        # Convert the image from BGR to RGB (OpenCV uses BGR by default)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # Convert the image to a PIL Image
        image = Image.fromarray(image)
        
        if self.transforms:
            image = self.transforms(image)

        return image, label,char

In [7]:
# Can add multiple transformations to enhance the training of the model

train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(30),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
])

test_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
])
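
The effect of the `Normalize` step with a mean and standard deviation of 0.5 can be checked by hand. The sketch below reproduces the arithmetic with NumPy on a few sample pixel values; the helper names `normalize` and `denormalize` are hypothetical and are not part of torchvision.

```python
import numpy as np

MEAN, STD = 0.5, 0.5  # the per-channel values used in the transforms above

def normalize(x):
    """Apply (x - mean) / std, as a per-channel normalization step does."""
    return (x - MEAN) / STD

def denormalize(x):
    """Invert the normalization, for example before displaying an image."""
    return x * STD + MEAN

pixels = np.array([0.0, 0.25, 0.5, 1.0])   # values in [0, 1] after tensor conversion
normalized = normalize(pixels)              # maps [0, 1] onto [-1, 1]
restored = denormalize(normalized)
```

Because the same mean and standard deviation are used for training and testing, both sets of images reach the model in the same value range.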



In [8]:
trainset = MyDataset(train_data,train_transforms)
trainloader = DataLoader(trainset, batch_size=64, shuffle=True)

testset = MyDataset(test_data,test_transforms)
testloader = DataLoader(testset, batch_size=64, shuffle=False)

## **Comprehensive Model Setup, Training, and Evaluation**

This section outlines the setup, training, and evaluation of ResNet-18, a deep learning model from the Residual Network family with an 18-layer architecture. Its residual connections mitigate the vanishing gradient problem, which makes deep networks easier to train, and the architecture is commonly used for image classification. The model is first loaded with pre-trained weights; all parameters except those in `layer4` are then frozen, and the final fully connected layer is replaced with a new 31-output layer to match the number of classes in the dataset. Training uses cross-entropy as the loss function and Adam as the optimizer. The training and validation functions below make one pass over the data per call and report the mean loss and accuracy.
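
The idea behind a residual connection can be sketched in a few lines of NumPy. This is a simplified stand-in that uses two plain matrix multiplications where the real ResNet-18 blocks use convolutions and batch normalization; the function name and the weights are assumptions made for illustration.

```python
import numpy as np

def residual_block(x, weight_1, weight_2):
    """Compute relu(F(x) + x), where F is two linear maps with a ReLU between them."""
    hidden = np.maximum(x @ weight_1, 0.0)   # first layer followed by ReLU
    residual = hidden @ weight_2             # second layer, giving F(x)
    return np.maximum(residual + x, 0.0)     # skip connection, then ReLU

x = np.array([1.0, 2.0, 3.0])
zero = np.zeros((3, 3))
# With all-zero weights F(x) = 0, so the block passes its input through unchanged
passthrough = residual_block(x, zero, zero)
```

Because the input is added back to the block's output, the gradient always has a direct path through the addition, which is what helps deeper stacks of such blocks train.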

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'

model_resnet18 = torchvision.models.resnet18(weights="DEFAULT")
for name, param in model_resnet18.named_parameters():
    if "layer4" in name or "fc" in name:
        param.requires_grad = True
    else:
        param.requires_grad = False

model_resnet18.fc = nn.Linear(512, 31)
model_resnet18.to(device)
print(model_resnet18)

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 123MB/s] 


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
  

In [9]:
def train(model, dataloader, loss, optimizer,device='cuda'):
    model.train()
    acc = []
    lss_history = []
    for data,labels,chars in dataloader:
        optimizer.zero_grad()
        data=data.to(device)
        labels=labels.to(device)
        pred = model(data)
        lss = loss(pred, labels)
        lss.backward()
        optimizer.step()
        # acc calculations
        lss_history.append(lss.item())
        acc.append(((pred.argmax(axis = 1) == labels).type(torch.float)).mean().item())
    return np.mean(lss_history) ,np.mean(acc)



In [10]:
# function to validate the model
def validate(model, dataloader, loss_func,device='cuda'):
    model.eval()
    loss_values = []
    acc_values = []
    with torch.no_grad():
        for data,labels,chars in dataloader:
            data=data.to(device)
            labels=labels.to(device)
            pred = model(data)
            loss = loss_func(pred, labels)
            loss_values.append(loss.item())
            acc_value = (pred.argmax(axis = 1) == labels).type(torch.float32)
            acc_values.append(acc_value.mean().item())
    return np.mean(loss_values), np.mean(acc_values)
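
Both `train` and `validate` average the per-batch accuracies, which gives a small final batch the same weight as a full one. With 1572 test samples and a batch size of 64, the last batch holds only 36 samples, so the reported figure can differ slightly from the true per-sample accuracy. The sketch below shows the difference on hypothetical counts; the function names and the numbers are illustrative only.

```python
def mean_of_batch_means(batches):
    """Average the per-batch accuracies, giving every batch equal weight."""
    return sum(correct / size for correct, size in batches) / len(batches)

def sample_weighted_accuracy(batches):
    """Total correct predictions divided by total samples."""
    total_correct = sum(correct for correct, _ in batches)
    total_samples = sum(size for _, size in batches)
    return total_correct / total_samples

# Hypothetical (correct, batch_size) counts: two full batches and a small last batch
batches = [(64, 64), (64, 64), (2, 4)]
unweighted = mean_of_batch_means(batches)      # (1 + 1 + 0.5) / 3
weighted = sample_weighted_accuracy(batches)   # 130 / 132
```

Accumulating the number of correct predictions and the number of samples inside the loop, then dividing once at the end, would give the sample-weighted value.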

In [12]:
loss = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model_resnet18.parameters(), lr=0.0001)
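
For reference, the quantity that a cross-entropy loss computes for a single sample can be sketched in plain Python. The function below is an illustration of the formula, not PyTorch's implementation, and the logits are made up. It also gives a useful baseline: with 31 classes, a model that assigns equal probability to every class has a loss of ln(31), about 3.43.

```python
import math

def cross_entropy(logits, target):
    """Negative log of the softmax probability assigned to the target class."""
    peak = max(logits)   # subtracting the max keeps exp() numerically stable
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[target]

num_classes = 31
# Equal logits mean equal probabilities, so the loss equals ln(31)
uniform_loss = cross_entropy([0.0] * num_classes, target=0)
# A large logit on the correct class drives the loss toward zero
confident_loss = cross_entropy([10.0] + [0.0] * (num_classes - 1), target=0)
```

A first-epoch training loss well below that baseline, as in the log further down, indicates that the pre-trained features are already useful early in training.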

# **Results Overview from Initial Training Cycles**

The training log for ResNet-18 over 10 epochs shows a consistent improvement, with training loss decreasing from 2.102 to 0.096 and training accuracy rising from 47.1% to 98.0%. On the held-out test set, the model reaches a loss of 0.216 and an accuracy of 93.4%, which indicates good generalization to unseen data. The gap of about 4.6 percentage points between training and test accuracy suggests a modest degree of overfitting.

In [13]:
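def tune_model(epochs, model, train_dataloader, test_dataloader, loss_func, optimizer):
    # test_dataloader is accepted but not used here; evaluation happens in the testing section
    for epoch in range(epochs):
        # Pass the detected device so training also runs when no GPU is available
        train_loss, train_acc = train(model, train_dataloader, loss_func, optimizer, device=device)
        print(f"Epoch : {epoch + 1} || Train loss : {train_loss:5.3f} || Train accuracy : {train_acc:5.3f}")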


In [14]:
epochs = 10
tune_model(epochs, model_resnet18, trainloader, testloader, loss, optimizer)
# `epoch` only exists inside tune_model, so the file name uses `epochs` instead
torch.save(model_resnet18, f"/kaggle/working/resnet18 with epoch{epochs}.pth")


Premature end of JPEG file


Epoch : 1 || Train loss : 2.102 || Train accuracy : 0.471

Epoch : 2 || Train loss : 0.812 || Train accuracy : 0.821

Epoch : 3 || Train loss : 0.485 || Train accuracy : 0.893

Epoch : 4 || Train loss : 0.348 || Train accuracy : 0.925

Epoch : 5 || Train loss : 0.251 || Train accuracy : 0.945

Epoch : 6 || Train loss : 0.205 || Train accuracy : 0.955

Epoch : 7 || Train loss : 0.159 || Train accuracy : 0.964

Epoch : 8 || Train loss : 0.129 || Train accuracy : 0.973

Epoch : 9 || Train loss : 0.113 || Train accuracy : 0.979

Epoch : 10 || Train loss : 0.096 || Train accuracy : 0.980

## **Testing The Model**

In [19]:
test_loss, test_acc = validate(model_resnet18, testloader, loss, device=device)
print(f" Test loss : {test_loss:5.3f} || Test accuracy : {test_acc:5.3f}")


 Test loss : 0.216 || Test accuracy : 0.934
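
A single accuracy figure hides which signs are confused with each other. One possible follow-up, sketched here in plain Python, is a per-class accuracy breakdown. The function name and the toy label lists are hypothetical; in practice the lists would be collected from the model's predictions over `testloader`.

```python
from collections import defaultdict

def per_class_accuracy(true_labels, predicted_labels):
    """Return {class_index: fraction of that class predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, prediction in zip(true_labels, predicted_labels):
        total[truth] += 1
        correct[truth] += int(truth == prediction)
    return {label: correct[label] / total[label] for label in sorted(total)}

# Hypothetical labels and predictions for three classes
true_labels = [0, 0, 1, 1, 1, 2]
predicted_labels = [0, 1, 1, 1, 0, 2]
accuracy = per_class_accuracy(true_labels, predicted_labels)
```

The class indices in the result can be translated back to letter names with the `mapping` dictionary built earlier.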
