# ASL Recognition with Deep Learning

### Table of Contents

* [Project Overview](#chapter1)
* [Loading the ASL Image Dataset](#chapter2)
* [Examining the Dataset](#chapter3)
* [One-hot encoding the data](#chapter4)
* [Defining the model](#chapter5)
* [Compiling the model](#chapter6)
* [Training the model](#chapter7)
* [Testing the model](#chapter8)
* [Visualizing mistakes](#chapter9)

## 1. Project Overview <a class="anchor" id="chapter1"></a> 

American Sign Language (ASL) is the primary language used by many deaf individuals in North America, and it is also used by hard-of-hearing and hearing individuals. The language is as rich as spoken languages and employs signs made with the hand, along with facial gestures and bodily postures.

![](pictures/asl.png)

Considerable recent progress has been made toward developing computer vision systems that translate sign language to spoken language. This technology often relies on complex neural network architectures that can detect subtle patterns in streaming video. However, as a first step toward understanding how to build a translation system, we can reduce the size of the problem by translating individual letters instead of sentences.

**In this notebook**, we will train a convolutional neural network to classify images of American Sign Language (ASL) letters. After loading, examining, and preprocessing the data, we will train the network and test its performance.

In the next section, we load the training and test data.

* `x_train` and `x_test` are arrays of image data.
* `y_train` and `y_test` are arrays of category labels.

## 2. Loading the ASL Image Dataset <a class="anchor" id="chapter2"></a> 

To load the training data, we will:
* Define the file paths for the training and testing datasets
* Create a list of the folder names for each ASL symbol, so that the class order is fixed explicitly rather than taken from `os.listdir()`
* Load the needed libraries and set NumPy random seed
* Define functions to load, shuffle, and split the data
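
The shuffle-and-split step from the last bullet can be sketched on toy data. This is a minimal illustration, not the notebook's `load_data` function: the helper name `shuffle_and_split` and the toy file names are hypothetical, and it uses a local `random.Random` instance instead of seeding the global generator.

```python
import random

def shuffle_and_split(items, labels, test_split=0.2, seed=0):
    """Shuffle (item, label) pairs together, then hold out the last fraction for testing."""
    pairs = list(zip(items, labels))
    # A local generator keeps the shuffle reproducible without touching global state
    random.Random(seed).shuffle(pairs)
    split_at = int(len(pairs) * (1 - test_split))
    return pairs[:split_at], pairs[split_at:]

# Hypothetical toy data: 10 file names spread over two classes
toy_files = [f"img_{i}.jpg" for i in range(10)]
toy_labels = [i % 2 for i in range(10)]

train_pairs, test_pairs = shuffle_and_split(toy_files, toy_labels)
print(len(train_pairs), len(test_pairs))
```

Because file names and labels are zipped before shuffling, each image keeps its own label no matter where it lands after the shuffle.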

In [1]:
# Define the file paths for the training and testing datasets
train_dir = '/Users/inmilk306/Documents/projects/ASL_Recognition w_DL/archive/asl_alphabet_train/asl_alphabet_train'
test_dir = '/Users/inmilk306/Documents/projects/ASL_Recognition w_DL/archive/asl_alphabet_test/asl_alphabet_test'

In [2]:
# Create a list of folders' names for each ASL symbol
folder = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','del','nothing','space']

In [3]:
# Import packages and set numpy random seed
import random
import numpy as np
from os.path import join
from os import listdir
import matplotlib.pyplot as plt
import cv2
import skimage
from skimage.transform import resize
import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing import image

np.random.seed(5) 
tf.random.set_seed(2)
%matplotlib inline

# Define functions to load, shuffle, and split the data
def load_data(container_path=train_dir, folders=folder,
              size=87000, test_split=0.2, seed=0): # 3000 images in 29 classes
    """
    Loads sign language dataset.
    """
    
    filenames, labels = [], []

    for label, folder_name in enumerate(folders):
        folder_path = join(container_path, folder_name)
        images = [join(folder_path, d)
                     for d in sorted(listdir(folder_path))]
        labels.extend(len(images) * [label])
        filenames.extend(images)
    
    random.seed(seed)
    data = list(zip(filenames, labels))
    random.shuffle(data)
    data = data[:size]
    filenames, labels = zip(*data)

    
    # Get the images
    x = paths_to_tensor(filenames).astype('float32')/255
    # Store the integer class labels (one-hot encoding happens in a later step)
    y = np.array(labels)

    # Index where the training portion ends and the test portion begins
    split_at = int(len(x) * (1 - test_split))
    x_train = np.array(x[:split_at])
    y_train = np.array(y[:split_at])
    x_test = np.array(x[split_at:])
    y_test = np.array(y[split_at:])

    print('Loaded', len(x_train), 'images for training,', 'Train data shape =', x_train.shape)
    print('Loaded', len(x_test), 'images for testing,', 'Test data shape =', x_test.shape)
    print('Loaded', len(y_train), 'labels for training,', 'Train label shape =', y_train.shape)
    print('Loaded', len(y_test), 'labels for testing,', 'Test label shape =', y_test.shape)
    return (x_train, y_train), (x_test, y_test)


def path_to_tensor(img_path, size):
    # loads RGB image as PIL.Image.Image type
    img = image.load_img(img_path, target_size=(size, size))
    # convert PIL.Image.Image type to 3D tensor
    x = image.img_to_array(img)
    # convert 3D tensor to 4D tensor 
    return np.expand_dims(x, axis=0)

def paths_to_tensor(img_paths, size=50):
    list_of_tensors = [path_to_tensor(img_path, size) for img_path in img_paths]
    return np.vstack(list_of_tensors)
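
How the two helper functions turn single images into one batch array can be sketched with NumPy alone. The arrays below are hypothetical stand-ins for decoded images (no files are read), and `fake_images` is an illustrative name that does not appear in the notebook.

```python
import numpy as np

size = 50  # same default edge length as paths_to_tensor

# Hypothetical stand-ins for three decoded RGB images with 0-255 pixel values
fake_images = [np.full((size, size, 3), fill, dtype=np.uint8) for fill in (0, 128, 255)]

# Each image gains a leading batch axis: (50, 50, 3) -> (1, 50, 50, 3)
batched = [np.expand_dims(img, axis=0) for img in fake_images]

# vstack concatenates along the first axis: three (1, 50, 50, 3) arrays -> (3, 50, 50, 3)
x = np.vstack(batched).astype('float32') / 255

print(x.shape, x.min(), x.max())
```

Dividing by 255 rescales the pixel values to the range 0 to 1, which is the same normalization `load_data` applies.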

In [4]:
# Load, shuffle, and split the data into training and test sets
(x_train, y_train), (x_test, y_test) = load_data()


## 3. Examining the Dataset <a class="anchor" id="chapter3"></a> 

The `folder` list defined earlier already gives us the string-valued labels for the symbols that appear in the dataset. We use it to visualize the first several images in the training data, along with their corresponding labels.

In [None]:
# Print the first several training images, along with the labels
fig = plt.figure(figsize=(20,5))
for i in range(36):
    ax = fig.add_subplot(3, 12, i + 1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(x_train[i]))
    ax.set_title("{}".format(folder[y_train[i]]))
plt.show()

Let's examine how many images of each letter can be found in the dataset.

Remember that the dataset has already been split into training and test sets, where `x_train` and `x_test` contain the images, and `y_train` and `y_test` contain their corresponding labels.

Each entry in `y_train` and `y_test` is an integer from **0** to **28**. For example, **0**, **1**, and **2** correspond to the letters 'A', 'B', and 'C', respectively.

We will use the arrays `y_train` and `y_test` to verify that the training and test sets each contain roughly equal proportions of every symbol.

In [None]:
# Map each symbol to its integer label. The indices must follow the order of
# `folder` ('del' = 26, 'nothing' = 27, 'space' = 28), because load_data()
# assigned the labels by enumerating that list.
labels_dict = {name: idx for idx, name in enumerate(folder)}

# Number of each symbol in the training and testing datasets
train_dict = {}
test_dict = {}
for key in labels_dict:
    train_dict[key] = int(np.sum(y_train == labels_dict[key]))
    test_dict[key] = int(np.sum(y_test == labels_dict[key]))

In [None]:
train_dict

In [None]:
test_dict

We have roughly equal proportions of each symbol.
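
As an alternative sketch of the same check, `np.bincount` counts every integer label in a single pass. The helper name `class_counts` and the toy arrays are hypothetical; in the notebook the inputs would be `y_train` (or `y_test`) and `folder`.

```python
import numpy as np

def class_counts(labels, class_names):
    """Return {class name: number of occurrences} for an array of integer labels."""
    # minlength keeps a zero entry for classes that never occur
    counts = np.bincount(labels, minlength=len(class_names))
    return {name: int(n) for name, n in zip(class_names, counts)}

# Hypothetical toy example with three classes
toy_names = ['A', 'B', 'C']
toy_labels = np.array([0, 2, 1, 0, 2, 2])
print(class_counts(toy_labels, toy_names))
```

Because the names are matched to counts by position, the result is only correct if `class_names` is in the same order that was used to assign the integer labels.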

## 4. One-hot encoding the data <a class="anchor" id="chapter4"></a> 

Currently, our labels for each of the letters are encoded as categorical integers, where, for example, 'A', 'B' and 'C' are encoded as 0, 1, and 2, respectively. However, the `categorical_crossentropy` loss that we use to compile the model expects one-hot encoded labels, so we must first one-hot encode the labels before supplying them to the Keras model.

This conversion will turn the one-dimensional array of labels into a two-dimensional array.

![](pictures/onehot.png)

Each row in the two-dimensional array of one-hot encoded labels corresponds to a different image. The row has a 1 in the column that corresponds to the correct label, and 0 elsewhere.

For instance,

* 0 is encoded as [1, 0, 0],
* 1 is encoded as [0, 1, 0], and
* 2 is encoded as [0, 0, 1].
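
The three-class example above can be reproduced with plain NumPy. This is a sketch of the idea rather than the Keras implementation; the function name `one_hot` is hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn a 1-D array of integer labels into a 2-D one-hot array."""
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype='float32')[labels]

toy_labels = np.array([0, 1, 2, 1])
encoded = one_hot(toy_labels, num_classes=3)
print(encoded.shape)
print(encoded)
```

With the 29 classes in this dataset, each row would have 29 columns instead of 3.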

In [None]:
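# One-hot encode the training labels
# (num_classes fixes the width at 29 columns even if a class were missing)
y_train_OH = to_categorical(y_train, num_classes=len(folder))

# One-hot encode the test labels
y_test_OH = to_categorical(y_test, num_classes=len(folder))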

## 5. Defining the model <a class="anchor" id="chapter5"></a> 

Now it's time to define a convolutional neural network to classify the data.

This network accepts an image of an American Sign Language letter as input. The output layer returns the network's predicted probabilities that the image belongs in each category.
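
The predicted probabilities come from a softmax activation in the output layer. A minimal NumPy sketch of softmax for a single sample follows; the three raw scores are made up for illustration.

```python
import numpy as np

def softmax(logits):
    """Convert a 1-D array of raw scores into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow in exp without changing the result
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical raw scores for three classes
toy_logits = np.array([2.0, 1.0, 0.1])
probs = softmax(toy_logits)
print(probs, probs.sum())
```

The class with the largest raw score receives the largest probability, so taking `np.argmax(probs)` gives the predicted class.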

In [None]:
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.models import Sequential

model = Sequential()
# First convolutional layer accepts image input
model.add(Conv2D(filters=5, kernel_size=5, padding='same', activation='relu', 
                        input_shape=(50, 50, 3)))
# Add a max pooling layer
model.add(MaxPooling2D(pool_size=(4,4)))
# Add a convolutional layer
model.add(Conv2D(filters=15, kernel_size=5, padding='same', activation='relu'))
# Add another max pooling layer
model.add(MaxPooling2D(pool_size=(4,4)))
# Flatten and feed to output layer
model.add(Flatten())
model.add(Dense(29, activation='softmax'))

# Summarize the model
model.summary()
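
The shapes and parameter counts reported by the summary can be checked by hand. The sketch below assumes the Keras defaults for these layers (stride 1 for the convolutions, pooling stride equal to the pool size, no padding in the pooling layers); the helper names `conv_same` and `max_pool` are hypothetical.

```python
def conv_same(h, w, c_in, filters, kernel):
    """Output shape and parameter count of a 'same'-padded, stride-1 convolution."""
    # One kernel x kernel x c_in weight block plus one bias per filter
    params = kernel * kernel * c_in * filters + filters
    return (h, w, filters), params

def max_pool(h, w, c, pool):
    """Output shape of max pooling without padding and with stride equal to the pool size."""
    return (h // pool, w // pool, c)

shape = (50, 50, 3)
shape, p1 = conv_same(*shape, filters=5, kernel=5)    # (50, 50, 5)
shape = max_pool(*shape, pool=4)                      # (12, 12, 5)
shape, p2 = conv_same(*shape, filters=15, kernel=5)   # (12, 12, 15)
shape = max_pool(*shape, pool=4)                      # (3, 3, 15)

flat = shape[0] * shape[1] * shape[2]                 # 135 values after Flatten
p3 = flat * 29 + 29                                   # Dense weights plus biases

print(shape, flat, p1, p2, p3, p1 + p2 + p3)
```

Under these assumptions, most of the parameters sit in the final dense layer, even though it only sees 135 inputs.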

In the warning above, TensorFlow simply tells us that the installed build can use the AVX and AVX2 CPU instructions and does so by default in certain situations (for example, inside the matrix multiplications of the forward and backward passes), which can speed things up. This is not an error; it only tells us that TensorFlow can and will take advantage of our CPU to gain that extra speed.

## 6. Compiling the model <a class="anchor" id="chapter6"></a> 

After we have defined a neural network in Keras, the next step is to compile it!

In [None]:
# Compile the model
model.compile(optimizer='rmsprop', 
              loss='categorical_crossentropy', 
              metrics=['accuracy'])
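
To make the `categorical_crossentropy` loss concrete, here is a simplified NumPy sketch for one-hot targets. It is not the Keras implementation, and the prediction values are made up.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample cross-entropy between one-hot targets and predicted probabilities."""
    # Clipping avoids log(0) for predictions that are exactly 0 or 1
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# Hypothetical sample whose true class is index 1
y_true = np.array([[0.0, 1.0, 0.0]])
good_pred = np.array([[0.1, 0.7, 0.2]])
bad_pred = np.array([[0.6, 0.1, 0.3]])

print(categorical_crossentropy(y_true, good_pred))
print(categorical_crossentropy(y_true, bad_pred))
```

Only the probability assigned to the true class contributes to the loss, so the loss shrinks toward 0 as that probability approaches 1.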

## 7. Training the model <a class="anchor" id="chapter7"></a> 

Once we have compiled the model, we're ready to fit it to the training data.

In [None]:
# Train the model
hist = model.fit(x_train,y_train_OH, epochs=50, batch_size=64, validation_split=0.2)
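
The `validation_split=0.2` argument holds out the last 20% of the arrays passed to `fit` for validation, and Keras takes that slice before it shuffles the training data. The sketch below estimates the resulting set sizes, assuming all 87,000 images were loaded with the default `test_split=0.2`. The helper name `holdout_sizes` is hypothetical, and the exact rounding Keras applies is an assumption.

```python
def holdout_sizes(n_samples, holdout_fraction):
    """Split a sample count into (kept, held out), holding out the trailing fraction."""
    kept = int(n_samples * (1 - holdout_fraction))
    return kept, n_samples - kept

n_total = 87000                                   # 29 classes x 3000 images
n_train, n_test = holdout_sizes(n_total, 0.2)     # split made in load_data
n_fit, n_val = holdout_sizes(n_train, 0.2)        # split made by validation_split

print(n_train, n_test, n_fit, n_val)
```

Since `load_data` shuffles the images before splitting, the trailing slice used for validation should still contain a mix of all classes.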

In [None]:
# Plot the loss and accuracy curves for training and validation 
fig, ax = plt.subplots(2, 1)
ax[0].plot(hist.history['loss'], color='b', label="Train")
ax[0].plot(hist.history['val_loss'], color='r', label="Validation")
ax[0].set_ylabel('Loss')
legend = ax[0].legend(loc='best', shadow=True)

ax[1].plot(hist.history['accuracy'], color='b', label="Train")
ax[1].plot(hist.history['val_accuracy'], color='r', label="Validation")
ax[1].set_ylabel('Accuracy')
ax[1].set_xlabel('Epoch')
legend = ax[1].legend(loc='best', shadow=True)

## 8. Testing the model <a class="anchor" id="chapter8"></a> 

To evaluate the model, we'll use the test dataset. This will tell us how the network performs when classifying images it has never seen before!

If the classification accuracy on the test dataset is similar to the accuracy on the training dataset, this is a good sign that the model did not overfit to the training data.

In [None]:
# Obtain accuracy on test set
score = model.evaluate(x=x_test, 
                       y=y_test_OH,
                       verbose=0)
print('Test accuracy:', score[1])

The classification accuracy on the test dataset is similar to the accuracy on the training dataset, which suggests that the model did not overfit to the training data.

## 9. Visualizing mistakes <a class="anchor" id="chapter9"></a> 