# COGS 118B - Project Proposal

# Names

- Allen Phu
- Kevin Y
- Saksham Rai
- Rodrigo Lizaran-Molina

# Abstract 

Given a dataset of American Sign Language (ASL) characters, we plan to apply clustering techniques taught in this course to group images of ASL signs by character. Once the clusters are accurate enough, we want to capture our own signs with a webcam and have the model assign each image to a cluster and return the corresponding character. Each image in our data is represented as a vector of pixel intensities ranging from 0 to 255 (a single channel, as the images are greyscale). We would first run a clustering algorithm on these greyscale images. Next, assuming the clusters are accurate, we would convert a webcam image of a student forming a sign to greyscale and pass it to the clustering model, which would identify the matching character. To measure performance, we would set aside a portion of the dataset for testing and report accuracy as the proportion of test images that are classified correctly.
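
As a rough illustration of the webcam step, the sketch below converts an RGB frame into a flattened greyscale vector and assigns it to the nearest cluster centroid. The helper names `frame_to_vector` and `nearest_cluster` are hypothetical. The 28 × 28 target size, the luminance weights, the centre crop with block averaging, and the use of Euclidean distance to centroids are all assumptions made for illustration, not decisions fixed by this proposal.

```python
import numpy as np

def frame_to_vector(frame_rgb, size=28):
    """Convert an H x W x 3 RGB frame (uint8) into a flat size*size greyscale vector.

    Assumes the frame is at least `size` pixels along each side.
    """
    # Weighted sum of the colour channels (BT.601 luminance weights) -> one channel.
    grey = frame_rgb[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    # Centre-crop to a square whose side is a multiple of `size`.
    side = (min(grey.shape) // size) * size
    top = (grey.shape[0] - side) // 2
    left = (grey.shape[1] - side) // 2
    square = grey[top:top + side, left:left + side]
    # Downsample by averaging non-overlapping blocks of pixels.
    block = side // size
    small = square.reshape(size, block, size, block).mean(axis=(1, 3))
    return np.clip(np.rint(small), 0, 255).astype(np.uint8).ravel()

def nearest_cluster(vector, centroids):
    """Return the index of the centroid closest to `vector` (Euclidean distance)."""
    distances = np.linalg.norm(centroids - vector.astype(np.float64), axis=1)
    return int(np.argmin(distances))

# Synthetic stand-ins: a uniform grey "frame" and two made-up centroids.
frame = np.full((480, 640, 3), 200, dtype=np.uint8)
centroids = np.stack([np.zeros(784), np.full(784, 255.0)])
vector = frame_to_vector(frame)
cluster = nearest_cluster(vector, centroids)
```

In practice the centroids would come from whichever clustering algorithm we fit on the training images, and the frame would come from a webcam capture library. Both are left out here to keep the sketch self-contained.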

# Background

Our group originally came across Google’s “American Sign Language Fingerspelling Recognition” Kaggle competition <a name="Ashleynote"></a>[<sup>[1]</sup>](#Ashley) while brainstorming ideas for our project. We were all already interested in ML image processing, and the chance to combine image processing with advances in accessibility solidified this topic as the one we wanted to pursue for our final project.

After some further research, we realized that although we were intrigued by the idea of combining ML and ASL, only one of us had prior experience with the MediaPipe library. Because many prior writeups on ML for ASL recognition relied on MediaPipe <a name="ElMoujahid"></a>[<sup>[2]</sup>](#ElMoujahid) <a name="Garimella"></a>[<sup>[3]</sup>](#GarimellaNote), we decided to pivot to the Sign Language MNIST dataset <a name="tecperson"></a>[<sup>[4]</sup>](#tecpersonNote) to make the project more manageable for ourselves. Previous studies have shown that image recognition can already recognize ASL successfully: an October 2023 study achieved 98.98% test accuracy <a name="Pathan"></a>[<sup>[5]</sup>](#PathanNote), and an August 2022 study using MediaPipe, Keras, and the Sign Language MNIST dataset achieved 95% training accuracy <a name="Garimella"></a>[<sup>[3]</sup>](#GarimellaNote).

In terms of prior work on machine learning and ASL detection, Google launched Project Shuwa to raise awareness of ASL and teach it to more people <a name="ElMoujahid"></a>[<sup>[2]</sup>](#ElMoujahid). One of the many components of Project Shuwa is SignTown, an “interactive game that utilizes webcams and a web browser to help people learn about sign language and Deaf culture” <a name="ElMoujahid"></a>[<sup>[2]</sup>](#ElMoujahid). Google has also made it easier to learn about both ASL and machine learning through its “Teachable Machine” tool, which offers a no-code way to train a model and test its ability to recognize ASL samples <a name="Chen"></a>[<sup>[6]</sup>](#ChenNote).

# Problem Statement

The problem we aim to solve is the classification of ASL characters, so that deaf people can communicate with those who do not understand ASL. With 27,455 training samples and 7,172 samples allocated for testing, we can quantify our success rate as the proportion of test samples that are classified correctly. Additionally, we plan to test our clusters on images of signs we make in front of our webcams and check whether the accuracy is similar to that on the test data. The work is replicable: because the data is publicly available, anyone can follow our methods for clustering the data and our process for capturing input from a webcam.
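
Because clusters do not come with letter labels, scoring them as classifications needs some rule for turning a cluster into a letter. The sketch below shows one possible rule, stated here as an assumption rather than a settled design: each cluster takes the most common training label among its members, and accuracy is the number of correct predictions divided by the total number of samples. The function names `majority_label_map` and `accuracy` are hypothetical.

```python
from collections import Counter, defaultdict

def majority_label_map(cluster_ids, true_labels):
    """Map each cluster id to the most common true label among its members."""
    members = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        members[cluster][label] += 1
    return {cluster: counts.most_common(1)[0][0] for cluster, counts in members.items()}

def accuracy(cluster_ids, true_labels, cluster_to_label):
    """Fraction of samples whose cluster's assigned label equals the true label."""
    correct = sum(
        cluster_to_label.get(cluster) == label
        for cluster, label in zip(cluster_ids, true_labels)
    )
    return correct / len(true_labels)

# Toy example with made-up cluster assignments and labels.
train_clusters = [0, 0, 0, 1, 1, 1]
train_labels = [0, 0, 1, 2, 2, 2]
mapping = majority_label_map(train_clusters, train_labels)

test_clusters = [0, 1, 1, 0]
test_labels = [0, 2, 1, 0]
test_accuracy = accuracy(test_clusters, test_labels, mapping)  # 3 of 4 correct -> 0.75
```

The mapping is built from training data only, so the test accuracy reflects how well the clusters generalize rather than how well labels can be fit after the fact.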

# Data

- Our data will be composed of hand images. Each image will represent a letter of the American Sign Language alphabet.
- Sign_mnist_test and Sign_mnist_train
    - [Link](https://www.kaggle.com/datasets/datamunge/sign-language-mnist/data)
    - 785 columns in each file (1 label column and 784 pixel columns), with 27,455 observations in the training data and 7,172 observations in the test data.
    - Each of the 27,455 training observations represents an image and is paired with a label indicating which hand sign it shows.
    - Each image is stored as 784 greyscale pixel values (a 28 × 28 grid), each ranging from 0 to 255.
    - We do not plan any special handling or transformation of this data, as we can begin clustering immediately with the numeric values stored in each row.
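
To make the row layout above concrete, here is a minimal sketch of reading such a file into a label vector and a pixel matrix. It assumes a header row and that the label is the first column of each row; `load_sign_mnist` is a hypothetical helper name, and the tiny in-memory CSV stands in for the real files.

```python
import csv
import io
import numpy as np

def load_sign_mnist(csv_file):
    """Read a Sign Language MNIST-style CSV into (labels, pixels).

    Assumes a header row followed by rows of `label, pixel1, ..., pixelN`.
    """
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    rows = np.array([[int(value) for value in row] for row in reader if row],
                    dtype=np.int64)
    labels = rows[:, 0]                    # first column: the sign's label
    pixels = rows[:, 1:].astype(np.uint8)  # remaining columns: intensities 0-255
    return labels, pixels

# Tiny stand-in with 4 pixels per image instead of 784.
sample = io.StringIO(
    "label,pixel1,pixel2,pixel3,pixel4\n"
    "3,0,255,12,7\n"
    "0,1,2,3,4\n"
)
labels, pixels = load_sign_mnist(sample)
```

With the real data, the same function could be called on an opened file, for example `open("sign_mnist_train.csv", newline="")`, where the file name is an assumption about how the download is saved locally.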


# Proposed Solution

In this section, clearly describe a solution to the problem. The solution should be applicable to the project domain and appropriate for the dataset(s) or input(s) given. Provide enough detail (e.g., algorithmic description and/or theoretical properties) to convince us that your solution is applicable. Why might your solution work? Make sure to describe how the solution will be tested.  

If you know details already, describe how (e.g., library used, function calls) you plan to implement the solution in a way that is reproducible.

If it is appropriate to the problem statement, describe a benchmark model<a name="sota"></a>[<sup>[3]</sup>](#sotanote) against which your solution will be compared. 

# Evaluation Metrics

Propose at least one evaluation metric that can be used to quantify the performance of both the benchmark model and the solution model. The evaluation metric(s) you propose should be appropriate given the context of the data, the problem statement, and the intended solution. Describe how the evaluation metric(s) are derived and provide an example of their mathematical representations (if applicable). Complex evaluation metrics should be clearly defined and quantifiable (can be expressed in mathematical or logical terms).

# Ethics & Privacy

If your project has obvious potential concerns with ethics or data privacy discuss that here.  Almost every ML project put into production can have ethical implications if you use your imagination. Use your imagination. Get creative!

Even if you can't come up with an obvious ethical concern that should be addressed, you should know that a large number of ML projects that go into production have unintended consequences and ethical problems once in production. How will your team address these issues?

Consider a tool to help you address the potential issues such as https://deon.drivendata.org

# Team Expectations 

As a team, we are committed to maintaining a high level of collaboration, professionalism, and respect among all members. Our shared expectations for one another are as follows:

* **Active and Respectful Communication**: We value open and active communication. Team members should feel comfortable expressing their thoughts, ideas, and concerns. We will listen actively and respectfully to each other.

* **Idea Exchange**: Everyone is welcome to contribute their thoughts and suggestions. We will consider each other's ideas with respect and appreciation.

* **Continuous Collaboration**: If any team member has ideas or encounters challenges, please share them directly through our Discord communication channel as soon as possible. We believe that open and timely communication is essential to address issues promptly and efficiently.

* **Timely Task Completion**: We understand the importance of meeting deadlines. Team members are expected to complete their assigned sections on time. If you foresee challenges in meeting a deadline, please communicate this at least two days in advance so that we can collectively find solutions.

* **Equal Work Distribution**: The workload will be distributed equitably. All team members are expected to contribute to the final project, ensuring that no one bears an undue burden.



**Rodrigo Lizaran-Molina**: Data, Proposed Solution

**Allen Phu**: Background

**Kevin Y**: Abstract, Data and Problem Statement

**Saksham Rai**: Evaluation Metrics, Ethics & Privacy

# Project Timeline Proposal

| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 2/18  |  1 PM |  Brainstorm topics/questions (all)  | Determine best form of communication; Discuss and decide on final project topic; discuss hypothesis; begin background research | 
| 2/19  |  4 PM |  Do background research on topic | Discuss ideal dataset(s) and ethics; draft project proposal | 
| 2/20  |  4 PM | Edit, finalize, and submit proposal; Search for datasets  | Discuss Wrangling and possible analytical approaches; Assign group members to lead each specific part   |
| 2/25  |  6 PM | Import & wrangle data; do some EDA  | Review/Edit wrangling/EDA; Discuss Analysis Plan   |
| 3/1   | 12 PM | Finalize wrangling/EDA; Begin programming for project  | Discuss/edit project code; Complete project |
| 3/19  | 12 PM | Complete analysis; Draft results/conclusion/discussion | Discuss/edit full project |
| 3/20  | Before 11:59 PM  | NA | Turn in Final Project  |

# Footnotes

<a name="Ashleynote"></a>1.[^](#Ashley) Ashley Chow, Glenn Cameron, Manfred Georg, Mark Sherwood, Phil Culliton, Sam Sepah, Sohier Dane, Thad Starner. (2023). Google - American Sign Language Fingerspelling Recognition. Kaggle. https://kaggle.com/competitions/asl-fingerspelling<br> 
<a name="ElMoujahid"></a>2.[^](#ElMoujahid) El Moujahid, K. (2021, December 1). Machine learning to make sign language more accessible. Google. https://blog.google/outreach-initiatives/accessibility/ml-making-sign-language-more-accessible/<br> 
<a name="Garimella"></a>3.[^](#GarimellaNote) Garimella, M. (2022, August 23). Sign Language Recognition with Advanced Computer Vision. Medium. https://towardsdatascience.com/sign-language-recognition-with-advanced-computer-vision-7b74f20f3442<br> 
<a name="tecperson"></a>4.[^](#tecpersonNote) tecperson. (2017, October). Sign Language MNIST. Kaggle. https://www.kaggle.com/datasets/datamunge/sign-language-mnist<br> 
<a name="Pathan"></a>5.[^](#PathanNote) Pathan, R. K., Biswas, M., Yasmin, S., Khandaker, M. U., Salman, M., & Youssef, A. A. F. (2023). Sign language recognition using the fusion of image and hand landmarks through multi-headed convolutional neural network. Scientific Reports, 13(1), 16975. https://doi.org/10.1038/s41598-023-43852-x<br> 
<a name="Chen"></a>6.[^](#ChenNote) Chen, Y. (2023, December 29). Learning American Sign Language (ASL) with Google’s Teachable Machine: A No-Code Experiment. Medium. https://medium.com/@dynotes/breaking-barriers-using-googles-no-code-approach-for-sign-language-recognition-and-learning-fc92ae16522c#bypass