CNN ASL Model
<br>
<br>
Ankit Dheendsa - Brainstation
<br>
September 2023

The goal of this notebook is to build a deep learning neural network that can perform ASL translation quickly and reliably. We use a Convolutional Neural Network (CNN) because it can process more complex patterns and produce more accurate results on harder validation sets. Our CNN is trained on the image data we collected for different hand signs rather than on numerical representations of the landmark positions. Choosing which deep learning method to pursue was an iterative process. After running the logistic regression demo, we found that numerical representations of the hand data would require far more processing, and that multiple averaging techniques would be needed to cope with the constantly changing values produced when a hand sign is performed on a live webcam feed as opposed to a still picture. A CNN therefore made the most sense: it can learn the complex hand patterns in the data it has been trained on, and so should be more reliable and quicker.

To get started with our CNN, we import all required modules, libraries and frameworks.

In [1]:
# Required imports
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import load_model
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
import os
from PIL import Image, ImageOps, UnidentifiedImageError
from tqdm import tqdm  # Progress bar used in the evaluation loop


We start by building the path to the data folder. Because systems and directory layouts differ, we use os.path.expanduser() to resolve the desktop path and then append the relative path, which avoids path-related bugs when replicating this project on your own machine.

In [2]:
# Get the absolute path to the "CNN" folder on your desktop (the folder was sometimes not found via a relative path, so this removes that uncertainty)
# We use the os module to accomplish this
desktop_path = os.path.expanduser("~/Desktop")
dataset_dir = os.path.join(desktop_path, "Jupyter Notebooks/CNN Model/Data")
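
As a quick sanity check, the path construction above can be verified before any data is loaded. This is a minimal sketch that assumes the same folder layout as this notebook; the helper name `resolve_dataset_dir` is hypothetical and not part of the original project.

```python
import os

def resolve_dataset_dir(relative_path="Jupyter Notebooks/CNN Model/Data"):
    """Return the path to the data folder under the user's Desktop (hypothetical helper)."""
    # expanduser replaces "~" with the current user's home directory
    desktop_path = os.path.expanduser("~/Desktop")
    return os.path.join(desktop_path, relative_path)

dataset_dir = resolve_dataset_dir()
# Report whether the folder exists instead of failing later inside the generator
print(dataset_dir, "exists:", os.path.isdir(dataset_dir))
```

If the folder is reported as missing, the relative part of the path is the first thing to adjust for your own machine.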

Next, we set the image width and height to standardize the size of the images used to train the model, and we set the batch size to 32 (a value chosen after iteratively testing various hyperparameters for the best results).

In [3]:
# Define image dimensions and batch size (we keep the sizing the same as when the data was created to normalize the images as much as possible)
img_width, img_height = 224, 224

# Here we define our batch size (this is an iterative process that needs to be tested with other hyperparameters for optimal model performance)
batch_size = 32  # We found that a batch size of 32 was suitable for our model's purpose

Moving on, we create an image data generator, which is used for data augmentation and preprocessing. An image data generator lets us apply transformations to the images so that the model learns to recognize a class despite image-altering transformations, producing a more robust model that responds better to unseen (validation) data.

In [5]:
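# Next we will create an ImageDataGenerator for data augmentation and preprocessing
# The purpose of this step is to apply various transformations to the images so that the model is more robust and responds better to unseen data encountered later during actual usage
# Note: the subset='training' / subset='validation' arguments used by the generators below only split the data if validation_split is also passed here (it is not set in this cell)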
datagen = ImageDataGenerator(
    rescale=1.0/255.0,  # This normalizes pixel values to [0, 1]
    rotation_range=20,  # Randomly rotate images up to 20 degrees
    width_shift_range=0.1,  # Randomly shift the width of the images
    height_shift_range=0.1,  # Randomly shift the height of the images
    shear_range=0.2,  # Randomly apply shearing transformations
    zoom_range=0.2,  # Randomly zoom in on images
    horizontal_flip=True,  # Randomly flip images horizontally
    fill_mode='nearest'  # Fill in new pixels using the nearest available pixel (good for generalization and pattern recognition)
)
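
To make two of these settings concrete, the sketch below reproduces the effect of `rescale` and `horizontal_flip` on a tiny made-up array using plain NumPy. It only illustrates the arithmetic; it is not how the generator is implemented internally, and the generator applies the flip at random rather than always.

```python
import numpy as np

# A tiny 2x3 single-channel "image" with 8-bit pixel values (made-up data)
image = np.array([[0, 128, 255],
                  [64, 32, 16]], dtype=np.uint8)

# rescale=1.0/255.0 maps pixel values from [0, 255] to [0, 1]
rescaled = image.astype(np.float32) * (1.0 / 255.0)

# horizontal_flip=True mirrors the image left-to-right
flipped = rescaled[:, ::-1]

print(rescaled.min(), rescaled.max())
print(flipped[0])
```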

Next, we create a train generator, which produces batches of training data from the data folder so that they can be passed easily to the CNN during training. The train and validation generators are where we use the batch size variable defined earlier.

In [6]:
train_generator = datagen.flow_from_directory(
    dataset_dir,  # Specify the path to the data directory (Jupyter Notebooks/CNN Model/Data)
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical',  # We can use 'categorical' for multi-class classification
    subset='training'  # Here we specify that this is the training subset
)

Found 15146 images belonging to 8 classes.


We repeat the step above to create a validation generator, which produces batches of validation data from the data folder. These are passed to the CNN later, when we test the model on validation (unseen) data.

In [None]:
validation_generator = datagen.flow_from_directory(
    dataset_dir,  # Specify path to the data directory (Jupyter Notebooks/CNN Model/Data)
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical',  # Use 'categorical' for multi-class classification
    subset='validation'  # Specify that this is the validation subset
)

Next, we get the class labels and their indices and print them. This is an important step because the label order must be correct in our labels.txt file for the CNN's predicted class to be labeled accurately.

In [8]:
# Get the class labels and their indices for ordering purposes
class_indices = train_generator.class_indices
# Print the class indices to see the order (important for ensuring the correct mapping in conjunction with the "labels.txt" file)
print(class_indices)

{'A': 0, 'B': 1, 'C': 2, 'HELLO': 3, 'I': 4, 'K': 5, 'N': 6, 'T': 7}
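
The ordering requirement can be sketched as follows: sorting the class names by their index gives the order in which labels should appear. The file format assumed here (one label per line, in index order) is an assumption for illustration; the actual layout of labels.txt is not shown in this notebook.

```python
# class_indices as printed above
class_indices = {'A': 0, 'B': 1, 'C': 2, 'HELLO': 3, 'I': 4, 'K': 5, 'N': 6, 'T': 7}

# Sort the class names by index so that position i holds the label for output neuron i
ordered_labels = [name for name, idx in sorted(class_indices.items(), key=lambda item: item[1])]

# Assumed file format: one label per line, in index order
labels_text = "\n".join(ordered_labels) + "\n"
print(labels_text)
```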


Now we can define the CNN model and its convolutional layers. We instantiate a Sequential model with three 2-D convolutional layers, each followed by a max-pooling layer. The pooling layers reduce the spatial dimensions, which makes the model less prone to overfitting and more efficient in both time and computation.

In [9]:
# Here we define the CNN model
model = Sequential()

# Now we can setup the convolutional and max-pooling layers
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(img_width, img_height, 3)))  # 32 filters of size (3, 3); the input shape is the image width and height with 3 color channels
model.add(MaxPooling2D((2, 2)))  # After each convolutional layer, a max-pooling layer reduces the spatial dimensions of the feature maps (more efficient and less prone to overfitting)
model.add(Conv2D(64, (3, 3), activation='relu'))  # We repeat the same pattern for 2 more convolutional and max-pooling layers; each new convolutional layer has 2x the filters of the previous one
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
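
To see how much the pooling layers shrink the feature maps, the spatial sizes can be worked out by hand. The sketch below assumes the Keras defaults used in this notebook (no padding and a stride of 1 for `Conv2D`, a stride equal to the pool size for `MaxPooling2D`); the helper names are hypothetical.

```python
def conv_output_size(size, kernel=3):
    """Spatial size after an unpadded convolution with stride 1."""
    return size - kernel + 1

def pool_output_size(size, pool=2):
    """Spatial size after max pooling with a pool x pool window and matching stride."""
    return size // pool

size = 224
shapes = []
for filters in (32, 64, 128):
    size = conv_output_size(size)   # 3x3 convolution
    size = pool_output_size(size)   # 2x2 max pooling
    shapes.append((size, size, filters))

print(shapes)      # [(111, 111, 32), (54, 54, 64), (26, 26, 128)]

# Number of values handed to the Flatten layer
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(flattened)   # 86528
```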

Next, we flatten the 3-D output of the convolutional layers into a 1-D vector. This is also where we incorporate the fully connected layers, whose "neurons" learn complex patterns from the flattened feature data created above.

In [10]:
# We now flatten the 3-D output of the step above (3 convolutional and max-pooling layers) into a 1-D vector
model.add(Flatten())

# We can now incorporate the fully connected layers
model.add(Dense(128, activation='relu'))  # A fully connected/dense layer with 128 "neurons" (units) that learn complex patterns from the newly flattened feature data
model.add(Dropout(0.5))  # Dropout for regularization to reduce the risk of overfitting (reduces reliance on specific neurons to improve generalization)
model.add(Dense(len(train_generator.class_indices), activation='softmax'))  # One output neuron per class (based on the folder names), using softmax as the activation (suitable for multi-class classification)
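
For intuition about the final layer, softmax turns the raw scores of the eight output neurons into probabilities that sum to 1, and the largest one is taken as the prediction. The scores below are made up purely for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum improves numerical stability without changing the result
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up raw scores for the 8 classes (A, B, C, HELLO, I, K, N, T)
logits = [2.0, 0.5, 0.1, -1.0, 0.0, 0.3, 1.5, 1.2]
probabilities = softmax(logits)
predicted_index = probabilities.index(max(probabilities))
print(predicted_index, round(sum(probabilities), 6))
```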


Now we can compile the model, using categorical cross-entropy as the loss for our multi-class classification and defining the learning rate.

In [11]:
# Next we can compile the model
model.compile(
    loss='categorical_crossentropy',  # We use categorical cross-entropy for multi-class classification
    optimizer=Adam(learning_rate=0.001),  # We can specify the learning rate (the older `lr` argument is deprecated)
    metrics=['accuracy']
)
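
As a rough illustration of what this loss measures, categorical cross-entropy for a one-hot target is the negative log of the probability assigned to the true class, so confident wrong predictions are penalized heavily. This is a plain-Python sketch with made-up probabilities, not the Keras implementation.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and a predicted probability vector."""
    # Clip predictions away from 0 so that log() stays finite
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 0, 1, 0]                       # the true class is index 2
confident_right = [0.01, 0.01, 0.97, 0.01]  # high probability on the true class
confident_wrong = [0.97, 0.01, 0.01, 0.01]  # high probability on the wrong class

loss_right = categorical_crossentropy(y_true, confident_right)
loss_wrong = categorical_crossentropy(y_true, confident_wrong)
print(loss_right, loss_wrong)
```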



Next, we train the model using the train generator created earlier. We set the number of epochs (the number of times the model passes through the entire dataset) to 15; like the batch size, this value was found iteratively while trying to get as close to 100% accuracy as possible. The expected output is one line of results per epoch, with the accuracy rising and the loss falling over time.

In [13]:
# Now we are able to train the model using the train_generator we instantiated earlier
history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // batch_size,
    epochs=15,  # Like the batch size, the number of epochs is tuned iteratively to find the best value for model training
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // batch_size
)

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
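
For reference, the `steps_per_epoch` value passed to `model.fit` follows directly from the image count reported by the train generator and the batch size. The arithmetic is shown below using the numbers from this notebook.

```python
# Values reported earlier in this notebook
train_samples = 15146   # "Found 15146 images belonging to 8 classes."
batch_size = 32

steps_per_epoch = train_samples // batch_size   # number of full batches per epoch
leftover_images = train_samples % batch_size    # images that do not fill a complete batch

print(steps_per_epoch)   # 473
print(leftover_images)   # 10
```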


Our results show that 15 epochs was a suitable setting: we ended with a 97.41% accuracy rate by the end of the training phase. The model reached 99% at one point during the 15th epoch, so it would not be wise to keep training, as we would risk overfitting and a drop in accuracy.

Finally, we can save the model as a .h5 file so that we may import it into our demo script later on to use in a practical live setting.

In [14]:
# Finally we can save the trained model
# We won't run this cell again, since it has already been run and the file was saved in the directory
model.save("asl_classifier.h5")



As a statistical metric, we create a custom classification table that measures the model's confidence scores as well as its accuracy in predicting each class. We include confidence scores to help determine where issues may lie: if the model is confident in its prediction (99%, for example) but predicts wrongly, the dataset may be faulty and need to be looked at again, rather than the model needing to be "fixed". The expected output of this code chunk is a table listing each class name, the model's average confidence score when predicting within that class, and the model's accuracy.
<br>
<br>
We accomplish this by iterating through our Validation Data set, which consists of 500 images for each class. None of these images have been seen by the model before, which gives us a good idea of how the model would perform in a real-life scenario. We iterate through the validation folder and into the subfolder of each class, pass each image through the model, and have it make a prediction. We then record the confidence score of that prediction and compare the predicted class with the actual class of the image to determine whether it was accurate. The final scores are tallied up and divided to get our averaged values.
<br>
<br>
As this can be a somewhat lengthy process (2 min 3 sec on average), we will incorporate a progress bar using the tqdm library.

In [2]:
# We will disable scientific notation for clarity (making it easier to read for those not familiar with scientific notation)
np.set_printoptions(suppress=True)

# Here we will load the model that we created and saved
model = load_model("asl_classifier.h5", compile=False)

# We will continue to load the labels of the validation data set (the file is closed automatically after reading)
with open("validation_labels.txt", "r") as labels_file:
    class_names = labels_file.readlines()

# We will instantiate an empty list to store results
results = []

# Here we specify the path to the main directory containing subfolders (each subfolder is a class)
main_directory = "Validation Data"

# Now we will get a list of the subfolders 
class_folders = [class_name for class_name in os.listdir(main_directory) if os.path.isdir(os.path.join(main_directory, class_name))]

# We can now use tqdm to create a progress bar for convenience and better understanding of the iterative process and its timing
with tqdm(total=len(class_folders)) as pbar:
    # We loop through subfolders 
    for class_name in class_folders:
        class_directory = os.path.join(main_directory, class_name)

        # Here we instantiate the variables that will hold our results as they are calculated
        correct_predictions = 0
        total_confidence = 0.0
        total_images = 0

        # Now we loop through images in the class directory so that they can be passed to the model
        for image_filename in os.listdir(class_directory):
            image_path = os.path.join(class_directory, image_filename)

            try:
                # We preprocess the image in a similar way to our generators and the data collection Python script (so that the images resemble what the model is used to)
                image = Image.open(image_path).convert("RGB")
                size = (224, 224)
                image = ImageOps.fit(image, size, Image.Resampling.LANCZOS)
                image_array = np.asarray(image)  # store the image as a numpy array so that we can normalize it
                normalized_image_array = (image_array.astype(np.float32) / 127.5) - 1  # normalize the image array to [-1, 1] (note: the training generator above rescaled pixels to [0, 1] instead)
                data = np.ndarray(shape=(1, 224, 224, 3), dtype=np.float32)  # define the batch shape and data type (float) expected by the model
                data[0] = normalized_image_array

                # Now we can predict the class label for the image using our CNN
                prediction = model.predict(data, verbose=0)  # verbose=0 so that each iteration isn't printed in the terminal (keeps the output clean and readable)
                index = np.argmax(prediction)
                confidence_score = prediction[0][index]  # keep track of the confidence score by indexing the prediction

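                # Check if the prediction matches the actual class name
                # Assumption: each line of validation_labels.txt holds one bare label, in class-index order
                predicted_label = class_names[index].strip()
                if confidence_score > 0.99 and predicted_label.casefold() == class_name.casefold():
                    correct_predictions += 1  # if confidence is above 99% and the class names match we consider it accurate (this decreases likelihood of false positives)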

                # We can now add to our tallies
                total_confidence += confidence_score
                total_images += 1

            # Here we catch the exception raised if an image is unreadable or cannot be opened due to some unforeseen error
            except (UnidentifiedImageError, OSError):
                # Skip images that cannot be identified or opened
                continue

        # Calculate accuracy and average confidence for the class
        if total_images > 0:
            accuracy = correct_predictions / total_images
            average_confidence = total_confidence / total_images
        else:
            accuracy = 0.0
            average_confidence = 0.0

        # Now we can append results to the list
        results.append((class_name, average_confidence, accuracy))
        
        # Here we update the progress bar after a class has been finished testing
        pbar.update(1)

# Print the final summary table
print("Class | Average Confidence | Accuracy")
for result in results:
    class_name, avg_confidence, accuracy = result
    print(f"{class_name} | {avg_confidence:.4f} | {accuracy:.4f}")


100%|██████████| 8/8 [02:05<00:00, 15.71s/it]

Class | Average Confidence | Accuracy
I | 0.9999 | 1.0000
N | 0.9977 | 0.9680
T | 0.9984 | 0.9782
A | 0.9994 | 0.9940
Hello | 1.0000 | 1.0000
C | 1.0000 | 1.0000
B | 1.0000 | 1.0000
K | 1.0000 | 1.0000
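
The tallying logic behind this table can also be separated from model inference, which makes it easy to check on its own. The sketch below uses a hypothetical helper, `summarise_class`, and made-up predictions; it assumes the same rule as above, where a prediction counts as correct only if the label matches and the confidence exceeds 0.99.

```python
def summarise_class(predictions, true_label, threshold=0.99):
    """predictions: list of (predicted_label, confidence) pairs for one class folder."""
    if not predictions:
        return 0.0, 0.0
    # A prediction is counted as correct only if it is both right and confident
    correct = sum(
        1 for label, confidence in predictions
        if label == true_label and confidence > threshold
    )
    average_confidence = sum(confidence for _, confidence in predictions) / len(predictions)
    return average_confidence, correct / len(predictions)

# Made-up predictions for four images from the "N" folder
sample = [("N", 0.999), ("N", 0.995), ("T", 0.998), ("N", 0.90)]
avg_conf, accuracy = summarise_class(sample, "N")
print(f"N | {avg_conf:.4f} | {accuracy:.4f}")
```

In this made-up sample, the third image is confidently misclassified as T and the fourth is correct but below the threshold, so neither counts towards accuracy.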





We can see from our final results that the model performs exceptionally well, with the lowest accuracy score being 96.8% for the letter 'N'. There is a clear trend among the classes with the lowest accuracy scores, however: the letters N, T and A all scored below 100%, whilst every other class achieved 100% accuracy.
<br>
<br>
We can attribute these results to the nature of the hand signs themselves. The letters N, T and A are all extremely similar, with only minor adjustments differentiating them, whilst the classes that scored 100% accuracy are very distinct signs that typically require the fingers to be spread further from the palm (refer to an ASL chart for a better understanding).
<br>
<br>
As a result, we can determine the reasons for these results, and a possible solution, without needing a confusion matrix. To train the model to better differentiate between similar signs, we can simply increase the dataset size for those classes. If that does not fully work, our next step would be to optimize the model specifically for those classes: since the model needs fewer resources to translate the classes it already predicts with 100% accuracy, we can dedicate more computational power to differentiating the minor pattern changes between highly similar signs.
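
If the confusion between N, T and A ever needed to be confirmed directly, the misclassified pairs could be counted as in the sketch below. The (actual, predicted) pairs here are made up for illustration; in practice they would be collected inside the validation loop.

```python
from collections import Counter

# Made-up (actual, predicted) pairs
pairs = [("N", "N"), ("N", "T"), ("T", "T"), ("T", "N"), ("A", "A"), ("A", "A"), ("N", "N")]

# Count how often each (actual, predicted) combination occurs
confusion = Counter(pairs)

# Keep only the misclassifications, i.e. entries where actual and predicted differ
mistakes = {pair: count for pair, count in confusion.items() if pair[0] != pair[1]}
print(mistakes)
```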