# **Environment Setup 🔩**

In this notebook, we build a model that recognizes American Sign Language (ASL) alphabet signs from images. Our aim is to go beyond a conventional detection baseline, and to make the model more robust we introduce an augmented dataset that adds diversity to the training data. Let's dive into sign language recognition, where computer vision meets non-verbal communication.

In [None]:
# Common Imports
import os
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from IPython.display import clear_output as cls

# Data loading
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Data Visualization
import plotly.express as px
import matplotlib.pyplot as plt

# Model Loading
from tensorflow.keras import layers
from tensorflow.keras import Sequential

# Transfer Learning
from tensorflow.keras.applications import InceptionV3, Xception
from tensorflow.keras.applications import ResNet50V2, ResNet152V2
from tensorflow.keras.applications import MobileNetV2, MobileNetV3Small

# Model Tuning
import keras_tuner as kt


In [None]:
# Constants and Random Seed
IMAGE_SIZE = (224, 224)
BATCH_SIZE = 64
train_dir = "/kaggle/input/asl-alphabet/asl_alphabet_train/asl_alphabet_train/"
test_dir  = "/kaggle/input/asl-alphabet/asl_alphabet_test/asl_alphabet_test/"

np.random.seed(42)
tf.random.set_seed(42)

# **Data Loading 🗃️**

Because the dataset is large (87,000 training images), loading it all into memory at once is impractical. Instead, we use Keras's `ImageDataGenerator`, which streams images from disk in batches and preprocesses them on the fly. Here it rescales pixel values to [0, 1] and holds out 20% of the training directory as a validation subset.

In [None]:
# Initialize Image Data Generator
train_gen = ImageDataGenerator(
    rescale = 1./255,
    validation_split=0.2
)

# Loading Training Data
train_data = train_gen.flow_from_directory(
    directory = train_dir,
    target_size = IMAGE_SIZE,
    batch_size = BATCH_SIZE,
    color_mode = "rgb",
    class_mode = "sparse",  # integer class indices, as sparse_categorical_crossentropy expects
    shuffle = True,
    subset = 'training',
    seed = 42
)

Found 69600 images belonging to 29 classes.


In [None]:
# Mapping from classes to numeric values
classes_to_num = train_data.class_indices
num_to_classes = {value:key for key, value in classes_to_num.items()}

print("Class to Number Mapping:")
for alphabet, num in classes_to_num.items():
    print(f"{alphabet:10} -> {num}")

Class to Number Mapping:
A          -> 0
B          -> 1
C          -> 2
D          -> 3
E          -> 4
F          -> 5
G          -> 6
H          -> 7
I          -> 8
J          -> 9
K          -> 10
L          -> 11
M          -> 12
N          -> 13
O          -> 14
P          -> 15
Q          -> 16
R          -> 17
S          -> 18
T          -> 19
U          -> 20
V          -> 21
W          -> 22
X          -> 23
Y          -> 24
Z          -> 25
del        -> 26
nothing    -> 27
space      -> 28
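The mapping above is not arbitrary. As far as we understand Keras' `flow_from_directory`, class indices follow the sorted order of the subdirectory names, and Python sorts strings by code point, so the uppercase letters A to Z come before the lowercase `del`, `nothing`, and `space` folders. A minimal sketch that reproduces the printed mapping from the folder names alone (the shuffled `folders` list is our own stand-in for `os.listdir(train_dir)`):

```python
import string

# Stand-in for os.listdir(train_dir): the 29 folder names, deliberately unsorted.
folders = ["space", "nothing", "del"] + list(reversed(string.ascii_uppercase))

# Number the folders in sorted order (our understanding of what flow_from_directory does).
class_indices = {name: idx for idx, name in enumerate(sorted(folders))}

print(class_indices["A"], class_indices["Z"], class_indices["del"], class_indices["space"])
# 0 25 26 28
```

The same ordering is why `sorted(os.listdir(train_dir))` in the visualization section lines up with `train_data.class_indices`.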


In [None]:
# Loading Validation Data
valid_data = train_gen.flow_from_directory(
    directory = train_dir,
    target_size = IMAGE_SIZE,
    batch_size = BATCH_SIZE,
    color_mode = "rgb",
    class_mode = "sparse",  # integer class indices, as sparse_categorical_crossentropy expects
    subset = 'validation',
    seed = 42
)

Found 17400 images belonging to 29 classes.
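Both subsets come from the same folder, split by `validation_split=0.2`. As we understand the Keras implementation (an assumption worth checking against your installed version), each class folder's filenames are sorted, the first 20% go to the validation subset and the remaining 80% to training. The sketch below mimics that rule with a hypothetical helper, `split_class_files`, and hypothetical filenames; with 3,000 images per class it reproduces the 69,600 / 17,400 counts reported above.

```python
def split_class_files(filenames, validation_split):
    """Hypothetical helper mimicking our reading of Keras' directory split.

    Sorts one class folder's filenames, sends the first `validation_split`
    fraction to validation and the rest to training.
    Returns (training_files, validation_files).
    """
    files = sorted(filenames)
    cut = int(validation_split * len(files))
    return files[cut:], files[:cut]

# Hypothetical folder of 3,000 images for one class.
files = [f"A{i}.jpg" for i in range(1, 3001)]
train_files, valid_files = split_class_files(files, validation_split=0.2)

n_class_folders = 29
print(len(train_files), len(valid_files))                                      # 2400 600
print(len(train_files) * n_class_folders, len(valid_files) * n_class_folders)  # 69600 17400
```

If this reading is right, the validation subset is a deterministic slice of sorted filenames rather than a random sample, so the `seed` argument shuffles batch order but does not change which images are held out.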


In [None]:
# Testing data
test_files = os.listdir(test_dir)

test_images = np.empty(shape=(len(test_files), *IMAGE_SIZE, 3), dtype=np.float32)
test_labels = np.empty(shape=(len(test_files), 1), dtype=np.int32)

for index, file in enumerate(test_files):

    # Loading and Processing image file
    img = tf.io.read_file(os.path.join(test_dir, file))
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, IMAGE_SIZE)
    img = tf.cast(img, tf.float32)/255.

    # Extract Label
    label = classes_to_num[file.split("_")[0]]

    # Add the loaded file and label to data
    test_images[index], test_labels[index] = img, label

In [None]:
# Confirmation check
print(np.squeeze(test_labels))

[ 0  4 11 13 18  3  6  8 22 12 27 23  7 16  2 19 15 21 24 20 28 14  1 17
  5 25  9 10]


# **Data Visualization 📊**

With the data pipeline in place, we explore the dataset visually. Plots let us confirm our assumptions, most importantly how the images are distributed across classes, and reveal patterns that raw numbers can hide.

This step also ties the raw pixels back to the gestures they represent, which makes it easier to reason about what the model will need to learn.

In [None]:
# Collecting class data
n_classes = train_data.num_classes
class_names = sorted(os.listdir(train_dir))

print(f"Number of Classes: {n_classes}")

# Computing class distribution
class_dis = [len(os.listdir(os.path.join(train_dir, class_name))) for class_name in class_names]

Number of Classes: 29


In [None]:
# Visualizing Class Distribution
pie_plot = px.pie(
    names = class_names,
    values = class_dis,
    color = class_names,
    hole = .4,
    title = "ASL Class Distribution"
)
pie_plot.show()

bar_plot = px.bar(
    x = class_names,
    y = class_dis,
    color = class_names,
    title = "ASL Class Distribution"
)
bar_plot.update_layout(showlegend=False, yaxis_title="Frequency Count", xaxis_title=None)
bar_plot.show()





The dataset has 29 classes with exactly 3,000 images each (87,000 in total). Because the classes are perfectly balanced, the model gets no incentive to favor majority classes, and plain accuracy is a meaningful metric. Note that balance removes class-imbalance bias specifically, not every possible source of bias in the data.
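A quick numeric check of the balance claim, as a minimal sketch: `balance_report` is a hypothetical helper, and the counts dictionary is built from the figures stated above (29 classes, 3,000 images each). In the notebook you would pass `dict(zip(class_names, class_dis))` instead.

```python
import string

def balance_report(counts):
    """Return (smallest class count, largest class count, largest/smallest ratio)."""
    smallest, largest = min(counts.values()), max(counts.values())
    return smallest, largest, largest / smallest

# Figures stated above: 29 classes (A-Z plus del, nothing, space), 3,000 images each.
asl_classes = list(string.ascii_uppercase) + ["del", "nothing", "space"]
counts = {name: 3000 for name in asl_classes}

smallest, largest, ratio = balance_report(counts)
print(len(counts), smallest, largest, ratio, sum(counts.values()))  # 29 3000 3000 1.0 87000
```

A ratio of 1.0 means perfect balance; values well above 1 would suggest class weights or a metric such as balanced accuracy.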

# **Backbone Comparison 📑**

With preprocessing, visualization, and analysis done, the next decision is which pretrained backbone to use for transfer learning. To compare candidates quickly, we freeze each ImageNet-pretrained backbone, add a small classification head (dropout, global average pooling, and a softmax layer), and train only the head for five epochs.

In [None]:
# Initializing all the Backbones
backbones = [
    (
        "ResNet50V2",
        ResNet50V2(
            input_shape = (*IMAGE_SIZE, 3),
            weights = "imagenet",
            include_top = False
        )
    ),

    (
        "ResNet152V2",
        ResNet152V2(
            input_shape = (*IMAGE_SIZE, 3),
            weights = "imagenet",
            include_top = False
        )
    ),

    (
        "Xception",
        Xception(
            input_shape = (*IMAGE_SIZE, 3),
            weights = "imagenet",
            include_top = False
        )
    ),

    (
        "InceptionV3",
        InceptionV3(
            input_shape = (*IMAGE_SIZE, 3),
            weights = "imagenet",
            include_top = False
        )
    ),

    (
        "MobileNetV2",
        MobileNetV2(
            input_shape = (*IMAGE_SIZE, 3),
            weights = "imagenet",
            include_top = False
        )
    ),

    (
        "MobileNetV3Small",
        MobileNetV3Small(
            input_shape = (*IMAGE_SIZE, 3),
            weights = "imagenet",
            include_top = False
        )
    ),
]

In [None]:
# Recording backbone history
BACKBONE_HISTORIES = {}

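# Smaller batch size for backbone testing.
# Note: this mutates train_data in place, so every later cell that uses
# train_data (the tuner and the final model) also runs with batch size 16.
train_data.batch_size = 16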

# Loop over backbones
for (name, backbone) in backbones:

    print(f"Testing : {name}")

    # Freeze the Model weights
    backbone.trainable = False

    # Creating a base model
    model = keras.Sequential([
        layers.InputLayer((*IMAGE_SIZE, 3), name = "InputLayer"),
        backbone,
        layers.Dropout(0.2, name = "SlightDropout"),
        layers.GlobalAveragePooling2D(name = "GAP2D"),
        layers.Dense(n_classes, activation="softmax")
    ])

    # Compile the model, then train only the new head for a few epochs
    model.compile(
        loss = "sparse_categorical_crossentropy",
        optimizer = "adam",
        metrics = ['accuracy']
    )

    history = model.fit(
        train_data,
        validation_data = valid_data,
        epochs = 5,
    )

    BACKBONE_HISTORIES[name] = pd.DataFrame(history.history)
    cls()
    print("\n")


Let's plot the training and validation learning curves of each backbone to decide which one to use for the final model.

In [None]:
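# Displaying the learning curves
for metric in ['loss', 'accuracy']:

    # Visualizing training and validation scores side by side
    plt.figure(figsize=(15, 5))
    for i, sub in enumerate(['Train', 'Val']):

        plt.subplot(1, 2, i+1)
        plt.title(f"{sub} {metric} Plot")

        # Looping over all the backbones
        for name, history in BACKBONE_HISTORIES.items():

            # Skip MobileNetV3Small: it had the worst performance.
            # (It rescales inputs internally and expects pixels in [0, 255],
            # whereas our generator feeds values in [0, 1], which likely hurt it.)
            if name == "MobileNetV3Small":
                continue

            plt.plot(history[metric] if sub == "Train" else history[f"val_{metric}"], label=name)

        # Labels, grid, and legend once per subplot
        # (a bare plt.grid() toggles the grid, so calling it inside the loop flips it on and off)
        plt.xlabel("Epochs")
        plt.ylabel(metric.title())
        plt.grid(True)
        plt.legend()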

    plt.tight_layout()
    plt.savefig(f"{metric.title()}_LC.png")
    plt.show()

While comparing backbones, we notice that every model shows signs of overfitting (a gap between training and validation performance). At this stage that is not a bad sign: it shows each backbone has enough capacity to fit our data. The standout performer is ResNet-152V2. It peaked at the third epoch, dipped slightly in the next, and still held a strong position on the validation set. Xception also showed a temporary dip but remained a contender.

Inception, on the other hand, was slow and overfit severely, which makes it a poor choice for robustness. ResNet-50V2 also makes a strong case and is much lighter, but ResNet-152V2's performance edge makes it our backbone of choice, even though it is the largest model in this comparison. With that settled, we move on to building the full sign language recognition model on top of ResNet-152V2.
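The visual comparison can be backed by numbers. The sketch below summarizes each backbone's history with its best validation accuracy, the epoch where it occurred, and the final train/validation accuracy gap as a rough overfitting indicator. `summarize_backbones` is a hypothetical helper that expects the same structure as `BACKBONE_HISTORIES` (backbone name to a DataFrame of `history.history`); the toy histories are made-up numbers for illustration only.

```python
import pandas as pd

def summarize_backbones(histories):
    """Summarize Keras training histories, one row per backbone.

    `histories` maps a backbone name to a DataFrame built from
    `history.history` (columns: accuracy, val_accuracy, ...).
    """
    rows = {}
    for name, df in histories.items():
        rows[name] = {
            "best_val_accuracy": df["val_accuracy"].max(),
            # idxmax returns the 0-based row label; +1 gives the epoch number
            "best_epoch": int(df["val_accuracy"].idxmax()) + 1,
            # Train minus validation accuracy in the last epoch (rough overfitting signal)
            "final_gap": df["accuracy"].iloc[-1] - df["val_accuracy"].iloc[-1],
        }
    summary = pd.DataFrame.from_dict(rows, orient="index")
    return summary.sort_values("best_val_accuracy", ascending=False)

# Toy histories with made-up numbers, for illustration only.
toy_histories = {
    "BackboneA": pd.DataFrame({"accuracy": [0.85, 0.95, 0.99],
                               "val_accuracy": [0.80, 0.90, 0.88]}),
    "BackboneB": pd.DataFrame({"accuracy": [0.86, 0.90, 0.92],
                               "val_accuracy": [0.85, 0.87, 0.86]}),
}
summary = summarize_backbones(toy_histories)
print(summary)
```

In the notebook, `summarize_backbones(BACKBONE_HISTORIES)` would produce the same table for the real runs.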

# **Model Building 👨‍🏭**

With ResNet-152V2 chosen as the backbone, we now search for the best classification head to put on top of it. To speed this up, we use Keras Tuner's Random Search, which trains a fixed number of randomly sampled hyperparameter configurations (dropout rate, number and width of dense layers, optimizer, and whether to use batch normalization) and keeps the one with the highest validation accuracy.

Random search suits this problem: each trial is expensive, and sampling a handful of configurations often gets close to the best one without trying every combination. Let the search begin.
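To get a feel for the search budget, here is a toy random search over the same space that `build_model` defines: 3 dropout rates × 3 depths × 3 widths × 2 optimizers × 2 batch-norm settings = 108 configurations, of which `max_trials=10` explores about 9%. This is a sketch of the idea, not Keras Tuner's implementation; `sample_configs` is hypothetical, draws distinct configurations uniformly, and its picks will not match the tuner's actual trials.

```python
import itertools
import random

# Same search space as build_model: dropout rate, depth, width, optimizer, batch norm.
SPACE = {
    "rate": [0.2, 0.4, 0.6],
    "n_layers": [2, 4, 6],
    "n_units": [64, 128, 256],
    "optimizer": ["adam", "rmsprop"],
    "use_bn": [True, False],
}

def sample_configs(space, n_trials, seed=42):
    """Toy random search: draw up to n_trials distinct configurations uniformly."""
    grid = list(itertools.product(*space.values()))
    rng = random.Random(seed)
    picks = rng.sample(grid, k=min(n_trials, len(grid)))
    return [dict(zip(space.keys(), combo)) for combo in picks]

grid_size = len(list(itertools.product(*SPACE.values())))
configs = sample_configs(SPACE, n_trials=10)

print(grid_size)                                   # 108
print(f"{len(configs) / grid_size:.1%} explored")  # 9.3% explored
for cfg in configs[:3]:
    print(cfg)
```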

In [None]:
def build_model(hps):

    # Loading Backbone
    backbone = ResNet152V2(
        input_shape = (*IMAGE_SIZE, 3),
        weights = "imagenet",
        include_top = False
    )

    # Parameter Search
    rate = hps.Choice('rate', [0.2, 0.4, 0.6])
    n_layers = hps.Choice('n_layers', [2,4,6])
    units = hps.Choice('n_units', [64, 128, 256])
    optim = hps.Choice('optimizer', ['adam', 'rmsprop'])
    use_bn = hps.Choice('use_bn', [True, False])

    # Building Model
    model = Sequential([
        layers.InputLayer((*IMAGE_SIZE, 3)),
        backbone,
        layers.GlobalAveragePooling2D(),
    ])

    # Top model layers
    for _ in range(n_layers):
        if use_bn:
            model.add(layers.BatchNormalization())
        model.add(layers.Dense(units, activation='relu', kernel_initializer="he_normal"))

    # Add output layer
    model.add(layers.Dropout(rate))
    model.add(layers.Dense(n_classes, activation="softmax"))

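    # Compile Model
    # (callbacks are not a compile() argument; early stopping is passed to search() instead)
    model.compile(
        loss = "sparse_categorical_crossentropy",
        optimizer = optim,
        metrics = ['accuracy'],
    )

    return model

In [None]:
# Initialize Random Searcher
random_searcher = kt.RandomSearch(
    hypermodel=build_model,
    objective='val_accuracy',
    max_trials=10,
    project_name="ASLModelSearch",
    seed=42,
)

# Start searching; search() forwards extra arguments such as callbacks to model.fit()
random_searcher.search(
    train_data,
    validation_data=valid_data,
    epochs = 10,
    callbacks = [keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)],
)

# Best configuration found (hard-coded in the next cell)
print(random_searcher.get_best_hyperparameters(num_trials=1)[0].values)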

In [None]:
# Loading Backbone
backbone = ResNet152V2(
    input_shape = (*IMAGE_SIZE, 3),
    weights = "imagenet",
    include_top = False
)

# Best hyperparameters found by the random search above
rate = 0.2
n_layers = 2
units = 128
optim = 'adam'
use_bn = False

# Building Model
model = Sequential([
    layers.InputLayer((*IMAGE_SIZE, 3)),
    backbone,
    layers.GlobalAveragePooling2D(),
], name="ASL-ResNet152V2")

# Top model layers
for _ in range(n_layers):
    if use_bn:
        model.add(layers.BatchNormalization())
    model.add(layers.Dense(units, activation='relu', kernel_initializer="he_normal"))

# Add output layer
model.add(layers.Dropout(rate))
model.add(layers.Dense(n_classes, activation="softmax"))

# Compile Model
model.compile(
    loss = "sparse_categorical_crossentropy",
    optimizer = optim,
    metrics = ['accuracy'],
)

# Model Architecture
model.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet152v2_weights_tf_dim_ordering_tf_kernels_notop.h5
Model: "ASL-ResNet152V2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 resnet152v2 (Functional)    (None, 7, 7, 2048)        58331648  
                                                                 
 global_average_pooling2d (  (None, 2048)              0         
 GlobalAveragePooling2D)                                         
                                                                 
 dense (Dense)               (None, 128)               262272    
                                                                 
 dense_1 (Dense)             (None, 128)               16512     
                                                                 
 dropout (Dropout)           (None, 128)               0         
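The head's parameter counts in the summary follow from a simple formula: a Dense layer with n_in inputs and n_out units has n_in × n_out weights plus n_out biases. A minimal sketch (`dense_params` is a hypothetical helper) reproduces the 262,272 and 16,512 shown for `dense` and `dense_1`; the output layer's row is cut off in the printed summary, so its 3,741 below is computed, not copied.

```python
def dense_params(n_in, n_out, use_bias=True):
    """Trainable parameters of a Dense layer: n_in * n_out weights plus n_out biases."""
    return n_in * n_out + (n_out if use_bias else 0)

n_classes = 29
head = [
    ("dense", dense_params(2048, 128)),        # GAP output (2048 features) -> 128 units
    ("dense_1", dense_params(128, 128)),       # 128 -> 128 units
    ("output", dense_params(128, n_classes)),  # 128 -> 29 softmax outputs
]
for name, count in head:
    print(f"{name:8} {count:>8,}")
print(f"head total: {sum(count for _, count in head):,}")  # head total: 282,525
```

GlobalAveragePooling2D and Dropout add no parameters, so the head is tiny next to the 58,331,648 parameters of the ResNet-152V2 base.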
                                     

In [None]:
# Model Training
history = model.fit(
    train_data,
    validation_data = valid_data,
    epochs = 20,
    callbacks = [
        keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True),
        keras.callbacks.ModelCheckpoint("ASL-ResNet152V2.keras", save_best_only=True),
    ]
)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
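The log above ends at epoch 8 of 20. If that reflects early stopping rather than truncated output (the log alone does not tell us), then with `patience=5` the best validation loss was reached at epoch 3, and `restore_best_weights=True` rolled the model back to those weights. A simplified sketch of the patience rule, assuming the callback's defaults (`monitor="val_loss"`, `min_delta=0`); `early_stopping_epoch` is hypothetical and the losses are made up:

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based, for patience-based early stopping.

    Simplified version of the idea behind keras.callbacks.EarlyStopping with
    monitor="val_loss" and min_delta=0: stop once `patience` consecutive epochs
    pass without a new best; restore_best_weights would roll back to best_epoch.
    """
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Made-up validation losses: best at epoch 3, no improvement afterwards.
losses = [0.30, 0.10, 0.05, 0.06, 0.07, 0.06, 0.08, 0.09, 0.07]
print(early_stopping_epoch(losses, patience=5))  # (8, 3)
```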


In [None]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test Loss: {test_loss}")
print(f"Test Acc : {test_acc}")

Test Loss: 1.5284260825865204e-06
Test Acc : 1.0


In [None]:
model.evaluate(train_data)



[0.0018078760476782918, 0.99949711561203]