# American Sign Language (ASL) Image Recognition

## Introduction

## Loading the dataset

### Dividing data into training, validation, and testing sets

With image preprocessing complete (see the `data_preprocessing.ipynb` notebook), the full dataset is split into training, validation, and testing sets. The testing set consists of all images from a single subject, mirroring the method of the "Spelling It Out" paper so that results can be compared against the benchmark model. The remaining images are split randomly: 80% for training and 20% for validation.

In [None]:
import re
import os
import random
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from sklearn.datasets import load_files

data_dir = 'data'

In [None]:
def get_testing_data(data_dir, subject_num='4'):
    '''Get all data/images pertaining to one subject'''
    # Only search in directory for images with that subject
    # Raw f-string so \d is not treated as a string escape; the dot before png is escaped
    file_list = [x for x in os.listdir(data_dir) if re.fullmatch(rf'\d+_{subject_num}_\d*\.png', x)]
    
    # Make a new testing data directory if it doesn't exist
    testing_dir = os.path.join(data_dir, 'testing')
    os.makedirs(testing_dir, exist_ok=True)
        
    # Move images of particular subject into testing directory
    for image_filename in file_list:
        # file is **_n_****.png where n is an integer representing a subject
        _, subject, _ = image_filename.split('_')
        # Move file into testing directory
        path = os.path.join(data_dir, image_filename)
        new_path = os.path.join(testing_dir, image_filename)
        os.rename(path, new_path)

get_testing_data(data_dir)
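As a quick check of the filename pattern before any files are moved, the sketch below applies the same kind of match to a few made-up filenames. The helper name `is_subject_file` and the sample filenames are hypothetical. It assumes the `<letter>_<subject>_<index>.png` layout described in the comment inside `get_testing_data`.

```python
import re

def is_subject_file(filename, subject_num):
    """Return True if filename looks like <letter>_<subject>_<index>.png for this subject."""
    # re.escape guards against special characters in the subject id (an added precaution)
    pattern = rf'\d+_{re.escape(str(subject_num))}_\d*\.png'
    return re.fullmatch(pattern, filename) is not None

# Hypothetical filenames, for illustration only
names = ['00_4_0001.png', '00_3_0001.png', '12_4_0420.png',
         '12_14_0001.png', '00_4_0001.png.bak']
selected = [name for name in names if is_subject_file(name, 4)]
print(selected)
```

Only the two files whose middle field is exactly `4` and whose name ends in `.png` are selected; subject `14` and the `.bak` file are rejected.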

In [None]:
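def load_dataset(path):
    '''Return the image file paths found under path'''
    # load_files expects one subfolder per category under `path`;
    # load_content=False skips reading file contents since only the names are used
    data = load_files(path, load_content=False)
    image_files = np.array(data['filenames'])
    return image_files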

data = load_dataset(data_dir)
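The random 80%/20% split of the non-testing images described earlier is not shown in this section. A minimal sketch of one way to do it with the standard library follows. The function name `split_train_validation`, its parameters, and the example filenames are assumptions for illustration, not the notebook's actual implementation.

```python
import random

def split_train_validation(filenames, validation_fraction=0.2, seed=8675309):
    """Shuffle the filenames and return (training, validation) lists."""
    # Sort first so the result depends only on the seed, not on the input order
    shuffled = sorted(filenames)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

# Hypothetical filenames: 3 letters x 2 subjects x 5 images = 30 files
example_files = [f'{letter:02d}_{subject}_{i:04d}.png'
                 for letter in range(3) for subject in (1, 2) for i in range(5)]
train_files, validation_files = split_train_validation(example_files)
print(len(train_files), len(validation_files))
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without changing the global random state used elsewhere in the notebook.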

### Display some of the images

In [None]:
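# Optional way of getting image file paths
# data = [os.path.join(data_dir, f) for f in os.listdir(data_dir) if f.endswith('.png')]

random.seed(8675309)
%matplotlib inline

# Display image previews below
plt.figure(figsize=(20,40))
columns = 16
n = 1

# Randomly choose images to display (with label)
# data holds full paths in a NumPy array, so convert it to a list for random.sample.
# The sample size of 16 (one row of the grid) is an illustrative choice.
for image_filename in random.sample(list(data), min(16, len(data))):
    img = Image.open(image_filename)
    plt.subplot(20, columns, n)
    n += 1
    plt.imshow(img)
    # The first two characters of the base filename encode the letter (00 -> 'A')
    letter = os.path.basename(image_filename)[:2]
    letter = chr(int(letter) + 65)
    plt.title(letter)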
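The label decoding used for the plot titles can be pulled out into a small helper so it can be reused and checked on its own. This is a sketch: the name `letter_from_filename` is hypothetical, and it assumes, as the cell above does, that the first field of the filename is a zero-based index into the alphabet.

```python
import os

def letter_from_filename(image_filename):
    """Map a filename such as '02_1_0007.png' to its letter label ('C')."""
    # Strip any directory part, then read the numeric field before the first underscore
    index = int(os.path.basename(image_filename).split('_')[0])
    return chr(index + ord('A'))

# Hypothetical filenames, for illustration only
print(letter_from_filename('02_1_0007.png'))
print(letter_from_filename(os.path.join('data', 'testing', '00_4_0001.png')))
```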

## Building the model

## Training the model

## Evaluating the model