##Recognizing ASL Fingerspelling Using Machine Learning
####Dataset Name: ASL Alphabet
####[Kaggle Data Source](https://www.kaggle.com/datasets/grassknoted/asl-alphabet)
####Goal: Develop a machine learning model to recognize ASL fingerspelling from images.

####Dataset Overview:
- American Sign Language alphabet dataset from Kaggle
- Contains 87,000 training images across 29 classes
- 26 classes represent the letters A-Z; the 3 additional classes are SPACE, DELETE, and NOTHING

###Dataset Characteristics:
- Images show hand gestures against backgrounds
- Images are 200x200 pixels
- Each image shows a static hand position corresponding to a letter

###Class Selection:
- The full dataset has 29 classes (the local copy listed below contains 27 folders); our analysis focuses on 18 letters

###The Objective
- To accurately classify ASL alphabet images into their respective letter classes

In [282]:
import glob
import os
import pandas as pd

In [284]:
dataset_path = "./asl_alphabet_data"

In [286]:
#grabs all the folder names in the dataset (skipping stray files) and sorts them
all_classes = sorted(os.path.basename(folder) for folder in glob.glob(os.path.join(dataset_path, "*")) if os.path.isdir(folder))

def print_classes(title, class_list, rows, cols):
    #prints the title along with the number of items in the class list
    print(f"{title}: {len(class_list)}")
    #prints a header for the dataset classes
    print("\nDataset Classes:")
    for row in range(rows):
        #slices out the items for the current row (slicing is safe even if the list is shorter than rows * cols)
        row_classes = class_list[row * cols:(row + 1) * cols]
        if not row_classes: #stops once the list is exhausted
            break
        print(" | ".join(row_classes)) #prints the row

#calls the print_classes function
print_classes("Total Dataset Classes", all_classes, 3, 9)

Total Dataset Classes: 27

Dataset Classes:
A | B | C | D | E | F | G | H | I
J | K | L | M | N | O | P | Q | R
S | T | U | V | W | X | Y | Z | nothing
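
The overview mentions 29 classes, while the listing above finds 27 folders. One way to see which class folders are absent from a local copy is a set difference against the expected names. This is a minimal sketch: `missing_classes` is a hypothetical helper, not part of the notebook, and the folder names `del`, `nothing`, and `space` are an assumption about how the Kaggle download is laid out.

```python
import string

# Expected class folders: 26 letters plus three extras.
# The names "del", "nothing", and "space" are an assumption about the Kaggle layout.
EXPECTED_CLASSES = list(string.ascii_uppercase) + ["del", "nothing", "space"]

def missing_classes(found, expected=EXPECTED_CLASSES):
    """Return the expected class names that are absent from `found`, sorted."""
    return sorted(set(expected) - set(found))

# Example using the 27 folders listed above (A-Z plus "nothing")
found = list(string.ascii_uppercase) + ["nothing"]
print(missing_classes(found))  # → ['del', 'space']
```

In the notebook itself, the call would be `missing_classes(all_classes)`.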


---
###Selected Letters 

I'm picking a smaller set of ASL letters to keep the dataset manageable and the notebook running smoothly on my computer. Some letters, like "J" and "Z," are signed with movement, so a single static image can’t capture them well.

In [289]:
#list of ASL letters we selected for this project
asl_keep = ["A", "B", "C", "D", "E", "F", "H", "I", "L", "O", "Q", "R", "S", "U", "V", "W", "X", "Y"]

#calls the print_classes function
print_classes("Selected ASL Letters", asl_keep, 2, 9)

Selected ASL Letters: 18

Dataset Classes:
A | B | C | D | E | F | H | I | L
O | Q | R | S | U | V | W | X | Y
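
To make explicit which letters the selection leaves out, a set difference against the full alphabet is enough. The sketch below repeats `asl_keep` so that it runs on its own; `excluded` is an illustrative name, not one used elsewhere in the notebook.

```python
import string

# Same selection as the notebook cell above
asl_keep = ["A", "B", "C", "D", "E", "F", "H", "I", "L",
            "O", "Q", "R", "S", "U", "V", "W", "X", "Y"]

# Letters of the alphabet that are not in the selection
excluded = sorted(set(string.ascii_uppercase) - set(asl_keep))
print(f"Excluded letters ({len(excluded)}): {' | '.join(excluded)}")
# → Excluded letters (8): G | J | K | M | N | P | T | Z
```

Of these, only "J" and "Z" are covered by the movement reason given above; the other six appear to be left out simply to keep the set small.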


In [291]:
#list to store image file paths for selected asl letters
asl_paths_train = []

for letter in asl_keep:
    #creates the full path to the letter's folder
    letter_path = os.path.join(dataset_path, letter)
    #grabs all jpg images inside the letter's folder
    letter_images = glob.glob(os.path.join(letter_path, "*.jpg"))
    #adds up to 500 images from this letter to the training list
    asl_paths_train.extend(letter_images[:500])

#extracts the letter label from each image's parent folder name
asl_labels_train = [os.path.basename(os.path.dirname(img)) for img in asl_paths_train]

#creates a df with the image paths and their labels
train_asl_df = pd.DataFrame({"image_path": asl_paths_train, "letter_label": asl_labels_train})

#prints total number of images added to the dataset
print(f"\nTotal images loaded: {len(train_asl_df)}")
#counts how many images belong to each letter
class_distribution = train_asl_df['letter_label'].value_counts()
#sort alphabetically instead of by count
class_distribution = class_distribution.sort_index()

#prints a breakdown of how many images there are per class
print("\nImages per class:")
print("-" * 20)
for letter, count in class_distribution.items():
    print(f"Letter {letter}: {count:4d} images")
print("-" * 20)


Total images loaded: 9000

Images per class:
--------------------
Letter A:  500 images
Letter B:  500 images
Letter C:  500 images
Letter D:  500 images
Letter E:  500 images
Letter F:  500 images
Letter H:  500 images
Letter I:  500 images
Letter L:  500 images
Letter O:  500 images
Letter Q:  500 images
Letter R:  500 images
Letter S:  500 images
Letter U:  500 images
Letter V:  500 images
Letter W:  500 images
Letter X:  500 images
Letter Y:  500 images
--------------------
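
The loading cell keeps the first 500 paths that `glob` returns for each letter. `glob` does not guarantee any particular order, so that subset can differ between machines. If a reproducible random subset is preferred, one option is to sort the paths and then sample with a seeded generator. This is a minimal sketch of that alternative, not the notebook's method; `sample_paths` is a hypothetical helper, the seed of 42 is arbitrary, and the file names in the example are made up.

```python
import random

def sample_paths(paths, cap=500, seed=42):
    """Return up to `cap` paths, chosen reproducibly.

    Sorting first removes any dependence on filesystem order;
    the seeded generator makes the draw repeatable.
    """
    ordered = sorted(paths)
    if len(ordered) <= cap:
        return ordered  # fewer images than the cap: keep them all
    rng = random.Random(seed)
    return sorted(rng.sample(ordered, cap))

# Example with made-up file names standing in for one letter's folder
fake_paths = [f"A/A{i}.jpg" for i in range(1, 3001)]
subset = sample_paths(fake_paths)
print(len(subset))  # → 500
```

In the loading loop, this would mean replacing `letter_images[:500]` with `sample_paths(letter_images)`.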
