# COGS 118A - Final Project

# Insert title here

## Group members

- Rahul Ravi
- Rachel Doron
- Yohan Kim
- Brian Ripley
- Tianze Zhang

# Abstract 

Our goal is to classify hand signs from the American Sign Language (ASL) alphabet and to compare the accuracy of several machine learning models on this task. This has potential applications for people with hearing disabilities. We will use the Sign Language MNIST dataset, which consists of about 34,000 grayscale images of 28x28 pixels. We will compare K-NN, Decision Tree, SVM, and CNN classifiers built from preexisting library implementations. We plan to divide our dataset into 27,000+ training cases and about 7,200 test cases.

# Background

American Sign Language is expressed with hand signs. It is the primary language in the United States for many people who are deaf or hard of hearing. In the United States, there are around 10 million people who are hard of hearing and around 1 million people who are functionally deaf<sup>[1](#deafStats)</sup>. Prior work on ASL recognition includes an ASL translator web application built on a convolutional neural network classifier<sup>[2](#ASLpaper)</sup>. Because their dataset lacked variation, the authors could not reproduce the validation accuracies they observed during training when they tested the system; they hypothesize that with a more robust dataset, their models would generalize better. A machine learning model that accurately recognizes ASL signs could allow people who use ASL to communicate with people who do not know sign language, and therefore with a wider range of people than they could before.

# Problem Statement

We want to solve the problem of classifying ASL hand signs with an accuracy of 85% or higher. To achieve this, we will train our models on the 24 static letters of the ASL alphabet (J and Z are excluded because signing them requires motion), which look like the following:

```python
# import image module
from IPython.display import Image

# get the image
Image(url="images/american_sign_language.png", width=600, height=600)
```
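According to the dataset description, each label is an integer from 0 to 25 that indexes the letters A to Z, and there are no samples for 9 (J) or 25 (Z). A minimal sketch of how the 24 class labels could be mapped back to letters under that assumption; `label_to_letter` and the constant names are hypothetical:

```python
import string

# Assumed encoding (from the dataset description): label k is the k-th letter of A-Z,
# and labels 9 (J) and 25 (Z) never occur because those signs involve motion.
EXCLUDED_LABELS = {9, 25}

# The 24 labels that actually appear in the data, in ascending order.
CLASS_LABELS = [k for k in range(26) if k not in EXCLUDED_LABELS]


def label_to_letter(label: int) -> str:
    """Map an integer dataset label to its ASL alphabet letter."""
    if label in EXCLUDED_LABELS or not 0 <= label < 26:
        raise ValueError(f"label {label} is not one of the 24 static-sign classes")
    return string.ascii_uppercase[label]


if __name__ == "__main__":
    print(len(CLASS_LABELS))  # 24
    print("".join(label_to_letter(k) for k in CLASS_LABELS))  # ABCDEFGHIKLMNOPQRSTUVWXY
```

Keeping the raw labels (rather than renumbering them) makes it easy to cross-check predictions against the letter chart above.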

# Data

We will use the Sign Language MNIST dataset, available at https://www.kaggle.com/datasets/datamunge/sign-language-mnist. It comprises about 34,000 observations of ASL hand signs, which look like this:


```python
Image(url="images/asl_color.png", width=600, height=600)
```

The dataset's distributor has already preprocessed the images into 28x28 grayscale images, so each observation consists of 784 grayscale pixel values. The grayscale images look like the following:

```python
Image(url="images/asl_gray.png", width=600, height=600)
```
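The Kaggle download provides the data as CSV files. As far as we understand the format, each file has one header row, the label in the first column, and the 784 pixel intensities (0 to 255) in the remaining columns. A minimal loading sketch under that assumption; `load_sign_mnist_csv` is a hypothetical helper, and the file name in the comment is an assumption about the download:

```python
import io

import numpy as np

IMAGE_SIDE = 28
N_PIXELS = IMAGE_SIDE * IMAGE_SIDE  # 784


def load_sign_mnist_csv(source):
    """Read a Sign Language MNIST-style CSV into (labels, pixels).

    Assumes one header row, the label in the first column, and 784 pixel
    columns after it. `source` can be a path or any file-like object.
    """
    data = np.loadtxt(source, delimiter=",", skiprows=1, dtype=np.int64, ndmin=2)
    labels = data[:, 0]
    pixels = data[:, 1:]
    if pixels.shape[1] != N_PIXELS:
        raise ValueError(f"expected {N_PIXELS} pixel columns, got {pixels.shape[1]}")
    return labels, pixels


if __name__ == "__main__":
    # With the real download (assumed file name), this would be:
    # y_train, X_train = load_sign_mnist_csv("sign_mnist_train.csv")
    # Tiny in-memory stand-in with two fake rows:
    header = "label," + ",".join(f"pixel{i}" for i in range(1, N_PIXELS + 1))
    rows = [f"{k}," + ",".join(["128"] * N_PIXELS) for k in (0, 24)]
    labels, pixels = load_sign_mnist_csv(io.StringIO("\n".join([header, *rows])))
    print(labels.tolist(), pixels.shape)  # [0, 24] (2, 784)
```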

# Proposed Solution

Although we have not yet experimented with the data, the images are already provided as 28x28 grayscale pixels, so we expect little preprocessing beyond reshaping them for particular models (for example, back into 28x28 arrays for the CNN) and possibly resizing them. We will also normalize the input data by dividing the pixel values by 255. One-hot encoding will be used for the class labels.
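A minimal sketch of this preprocessing, assuming the pixels arrive as an `(n, 784)` array of integers from 0 to 255; the function names are hypothetical. Because labels 9 and 25 never occur, the sketch first remaps the 24 observed labels to contiguous indices 0 to 23 so that the one-hot matrix has exactly 24 columns. That remapping is one possible design choice, not something the dataset requires.

```python
import numpy as np

# Assumed label encoding: 0-25 for A-Z, with 9 (J) and 25 (Z) absent.
PRESENT_LABELS = np.array([k for k in range(26) if k not in (9, 25)])


def normalize_pixels(pixels, as_images=False):
    """Scale 0-255 pixel values to [0, 1]; optionally reshape to (n, 28, 28, 1) for a CNN."""
    x = np.asarray(pixels, dtype=np.float32) / 255.0
    if as_images:
        # Channels-last layout (TensorFlow/Keras convention); PyTorch expects (n, 1, 28, 28).
        x = x.reshape(-1, 28, 28, 1)
    return x


def one_hot(labels):
    """Map raw labels to contiguous indices 0..23, then to one-hot rows of length 24."""
    labels = np.asarray(labels)
    idx = np.searchsorted(PRESENT_LABELS, labels)
    # searchsorted only gives an insertion point, so check the label is really one of the 24.
    clipped = np.minimum(idx, len(PRESENT_LABELS) - 1)
    if not np.all((idx < len(PRESENT_LABELS)) & (PRESENT_LABELS[clipped] == labels)):
        raise ValueError("found labels outside the 24 static-sign classes")
    encoded = np.zeros((labels.size, len(PRESENT_LABELS)), dtype=np.float32)
    encoded[np.arange(labels.size), idx] = 1.0
    return encoded


if __name__ == "__main__":
    fake_pixels = np.full((2, 784), 255)  # stand-in for real pixel rows
    print(normalize_pixels(fake_pixels, as_images=True).shape)  # (2, 28, 28, 1)
    print(one_hot([0, 24]).shape)  # (2, 24)
```

The scikit-learn models can take the flat normalized array with integer labels directly; one-hot labels are only needed for a CNN loss that expects them, such as Keras's `categorical_crossentropy` (PyTorch's `CrossEntropyLoss` accepts integer class indices).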

We will implement several supervised machine learning models to weigh the pros and cons of each for image classification. To begin, we will use the scikit-learn implementations of K-Nearest Neighbors (KNN), Decision Tree, and Support Vector Machine (SVM), and tune their hyperparameters with scikit-learn's `GridSearchCV`. We will also build a Convolutional Neural Network (CNN) using PyTorch or TensorFlow. Because we will compare accuracies as we go, we may drop some models or add new ones during the project; other candidates include Random Forest and Naive Bayes from scikit-learn.
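A minimal sketch of the tuning step with scikit-learn's `GridSearchCV`. The parameter grids are hypothetical placeholders, `tune_models` is a hypothetical helper, and scikit-learn's bundled 8x8 digits dataset stands in for Sign Language MNIST so the sketch runs without the Kaggle download:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical starting grids, kept small on purpose; the real search space is still to be chosen.
MODELS_AND_GRIDS = {
    "knn": (KNeighborsClassifier(), {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}),
    "tree": (DecisionTreeClassifier(random_state=0),
             {"max_depth": [10, 20, None], "criterion": ["gini", "entropy"]}),
    "svm": (SVC(), {"C": [1, 10], "kernel": ["linear", "rbf"]}),
}


def tune_models(X_train, y_train, cv=3):
    """Grid-search each model with k-fold cross-validation on the training set only.

    Returns {name: (best_params, mean_cv_accuracy, fitted GridSearchCV)}.
    """
    results = {}
    for name, (estimator, grid) in MODELS_AND_GRIDS.items():
        search = GridSearchCV(estimator, grid, cv=cv, scoring="accuracy")
        # With refit=True (the default), the best setting is retrained on all of X_train.
        search.fit(X_train, y_train)
        results[name] = (search.best_params_, search.best_score_, search)
    return results


if __name__ == "__main__":
    # Stand-in data, scaled to [0, 1] just as the ASL pixels would be divided by 255.
    X, y = load_digits(return_X_y=True)
    X = X / 16.0
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)
    for name, (params, cv_acc, search) in tune_models(X_train, y_train).items():
        test_acc = search.score(X_test, y_test)
        print(f"{name}: cv accuracy={cv_acc:.3f}, test accuracy={test_acc:.3f}, {params}")
```

The held-out test set is only scored after the search, so it does not influence hyperparameter choice. Kernel SVMs can be slow to train on tens of thousands of samples, so on the full training set it may help to narrow the grid on a subsample first.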

Each of these models has been used for image classification tasks before. KNN is fairly rudimentary, but it may still achieve decent accuracy: it labels an image by a majority vote of its k nearest neighbors, where distance is measured between pixel vectors in a high-dimensional (784-dimensional) space. A decision tree splits on the intensity of a single pixel at each node, so it may be useful if a small set of pixel locations, for example along the edges of the hand, is enough to tell the signs apart. SVM constructs a set of hyperplanes in a high-dimensional space to separate the classes. Because CNNs learn spatial features such as edges and shapes directly from the images, we expect deep learning to achieve the highest accuracy. We will experiment with the number of Conv2D and Dense layers and use Dropout for regularization.
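To make the KNN description concrete, here is a minimal from-scratch sketch of the majority vote in NumPy. It is for illustration only (the project plans to use scikit-learn's `KNeighborsClassifier`), and `knn_predict` is a hypothetical name:

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, X_query, k=5):
    """Classify each query row by majority vote among its k nearest training rows.

    Distances are Euclidean distances between flattened pixel vectors
    (784 dimensions for a 28x28 image). On a tied vote, Counter.most_common
    keeps first-encountered order, so the tied label with the closest neighbor wins.
    """
    X_train = np.asarray(X_train, dtype=np.float64)
    X_query = np.asarray(X_query, dtype=np.float64)
    y_train = np.asarray(y_train)
    # Squared distances via ||a||^2 - 2ab + ||b||^2, shape (n_query, n_train).
    d2 = (
        np.sum(X_query**2, axis=1)[:, None]
        - 2.0 * X_query @ X_train.T
        + np.sum(X_train**2, axis=1)[None, :]
    )
    nearest = np.argsort(d2, axis=1)[:, :k]  # indices of the k closest rows, nearest first
    predictions = []
    for row in nearest:
        votes = Counter(y_train[i] for i in row)
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


if __name__ == "__main__":
    # Two toy clusters in 2-D standing in for 784-D pixel vectors.
    X_tr = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
    y_tr = np.array([0, 0, 0, 1, 1, 1])
    print(knn_predict(X_tr, y_tr, [[0.2, 0.2], [10.5, 10.5]], k=3).tolist())  # [0, 1]
```

The full distance matrix for about 7,200 test images against 27,000 training images would take roughly 1.5 GB in float64, so in practice the queries would be processed in batches.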

# Evaluation Metrics

For evaluation, we plan to use multi-class error measures. Instead of a single positive/negative label, each of the 24 sign letters is its own class, and we will compute precision, recall, and specificity for each letter in a one-vs-rest fashion. This will let us see, for example, how visually similar letter signs affect the model's accuracy and precision, and how model complexity affects them.
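A minimal sketch of the one-vs-rest computation from a confusion matrix, using scikit-learn's `confusion_matrix`; `per_class_metrics` is a hypothetical helper. scikit-learn's `classification_report` already gives per-class precision and recall, but not specificity, which is why the counts are derived by hand here:

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def per_class_metrics(y_true, y_pred, labels):
    """One-vs-rest precision, recall, and specificity for each class.

    For class c: TP = predicted c and truly c, FP = predicted c but not c,
    FN = truly c but predicted something else, TN = everything else.
    """
    cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows = true, columns = predicted
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = cm.sum() - tp - fp - fn
    with np.errstate(divide="ignore", invalid="ignore"):  # 0/0 becomes nan for empty classes
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        specificity = tn / (tn + fp)
    return {
        label: {"precision": p, "recall": r, "specificity": s}
        for label, p, r, s in zip(labels, precision, recall, specificity)
    }


if __name__ == "__main__":
    # Toy predictions over three letters; real use would pass all 24.
    y_true = ["A", "A", "B", "B", "C", "C"]
    y_pred = ["A", "B", "B", "B", "C", "A"]
    for letter, m in per_class_metrics(y_true, y_pred, labels=["A", "B", "C"]).items():
        print(letter, {name: round(float(v), 2) for name, v in m.items()})
```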

# Results

You may have done tons of work on this. Not all of it belongs here. 

Reports should have a __narrative__. Once you've looked through all your results over the quarter, decide on one main point and 2-4 secondary points you want us to understand. Include the detailed code and analysis results of those points only; you should spend more time/code/plots on your main point than the others.

If you went down any blind alleys that you later decided to not pursue, please don't abuse the TAs' time by throwing in 81 lines of code and 4 plots related to something you actually abandoned.  Consider deleting things that are not important to your narrative.  If it's slightly relevant to the narrative or you just want us to know you tried something, you could keep it in by summarizing the result in this report in a sentence or two, moving the actual analysis to another file in your repo, and providing us a link to that file.

### Subsection 1

You will likely have different subsections as you go through your report. For instance you might start with an analysis of the dataset/problem and from there you might be able to draw out the kinds of algorithms that are / aren't appropriate to tackle the solution.  Or something else completely if this isn't the way your project works.

### Subsection 2

Another likely section is if you are doing any feature selection through cross-validation or hand-design/validation of features/transformations of the data.

### Subsection 3

Probably you need to describe the base model and demonstrate its performance.  Maybe you include a learning curve to show whether you have enough data to do train/validate/test split or have to go to k-folds or LOOCV or ???

### Subsection 4

Perhaps some exploration of the model selection (hyper-parameters) or algorithm selection task. Validation curves, plots showing the variability of performance across folds of the cross-validation, etc. If you're doing one, the outcome of the null hypothesis test or parsimony principle check to show how you are selecting the best model.

### Subsection 5 

Maybe you do model selection again, but using a different kind of metric than before?



# Discussion

### Interpreting the result

OK, you've given us quite a bit of tech information above, now it's time to tell us what to pay attention to in all that.  Think clearly about your results, decide on one main point and 2-4 secondary points you want us to understand. Highlight HOW your results support those points.  You probably want 2-5 sentences per point.

### Limitations

Are there any problems with the work?  For instance would more data change the nature of the problem? Would it be good to explore more hyperparams than you had time for?   

### Ethics & Privacy

As with most scientific projects, our data collection and analysis will likely raise some ethics and privacy concerns. Going through the Deon checklist (deon.drivendata.org), our main concerns fall under its Analysis section. For data collection and storage, because the dataset comes from another researcher's project, our continued access to the data depends on the decisions of the original dataset's authors.

When we analyze the dataset, however, we may fall short on some items in the Analysis section of the checklist. For example, because the dataset contains more than 30,000 images, we will most likely not inspect every image for biases such as stereotype perpetuation or for imbalanced classes. To mitigate this, we plan to randomly sample a subset of images, manually inspect them and their labels, and address any bias we find.
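Part of this plan can be made concrete in code. Class imbalance, unlike stereotype-related bias, can be counted over every label automatically, and the random sample for manual review can be drawn reproducibly with a fixed seed. A minimal NumPy sketch, assuming integer labels 0 to 25 as in the dataset description; the function names and the default sample size of 50 are hypothetical choices:

```python
import numpy as np


def class_counts(labels, n_classes=26):
    """Count examples per raw label 0..n_classes-1 (absent labels such as J and Z count as 0)."""
    return np.bincount(np.asarray(labels), minlength=n_classes)


def imbalance_ratio(counts):
    """Ratio of the largest to the smallest non-empty class (1.0 means perfectly balanced)."""
    present = counts[counts > 0]
    return present.max() / present.min()


def sample_for_review(n_examples, n_samples=50, seed=0):
    """Pick distinct row indices uniformly at random for manual inspection."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_examples, size=min(n_samples, n_examples), replace=False))


if __name__ == "__main__":
    labels = np.array([0, 0, 1, 24, 24, 24])  # stand-in for the real training labels
    print(imbalance_ratio(class_counts(labels)))  # 3.0
    print(sample_for_review(len(labels), n_samples=3))
```

Fixing the seed lets every group member review the same sampled images.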

### Conclusion

Reiterate your main point and in just a few sentences tell us how your results support it. Mention how this work would fit in the background/context of other work in this field if you can. Suggest directions for future work if you want to.

# Footnotes
<a name="deafStats"></a>1. Mitchell, R. E. (2006). How many deaf people are there in the United States? Estimates from the Survey of Income and Program Participation. Journal of Deaf Studies and Deaf Education, 11(1), 112–119. https://pubmed.ncbi.nlm.nih.gov/16177267/<br>
<a name="ASLpaper"></a>2. Garcia, B., & Viesca, S. A. (2016). Real-time American sign language recognition with convolutional neural networks. Convolutional Neural Networks for Visual Recognition, 2, 225–232. http://cs231n.stanford.edu/reports/2016/pdfs/214_Report.pdf<br>