
Iteration 6

asl-a

Table of Contents

  1. Team Info
  2. Vision Statement
  3. Features
  4. UI Sketches
  5. Domain Analysis
  6. Use-cases
  7. Architecture
  8. Machine Learning Analysis
  9. Current Iteration
  10. Next Iteration
  11. Long Term
  12. Resources and References

1. Team Info

  • Trevor Bonjour
  • Srinivas Suresh Kumar
  • Ankur Gupta
  • Iskandar Atakhodjaev

2. Vision Statement

Create a software package that provides services for people with hearing and speech impairments. The core service is the conversion of American Sign Language (ASL), read using a Leap Motion sensor, to text and audio. By applying machine learning classification algorithms to data extracted from the 3D Leap Motion sensor, we plan to create an ASL recognition system with instant text and audio output. For the scope of this project we aim to cover the 26 letters of the English alphabet. On top of the core service, the package may include:

  • Assistance in learning ASL using games/quizzes/cue-cards.
  • Support for simple words and phrases
  • Capability to add and train on user defined gestures
  • Additional training service with feedback stating how far/close the user is from the expected hand gesture

3. Features

Website

  • User can login/sign-up
  • User can download the software as a binary package

Binary package

  • Upon running the software, the user shall see a welcome page with username and password text fields
  • User can login
  • User can edit his/her account settings
  • User can delete his/her profile
  • User can sign out of the software
  • User can see a schematic picture of his/her gesture on the screen
  • User can switch computers and his/her settings will be preserved
  • Computations are done on the user's machine (locally) for fast prediction turnaround time
  • User calibrates the sensor once to adjust it to his/her hand size
  • Prediction models can be updated over the air (OTA) for a better experience
  • The software can be used offline, but needs internet access to train new gestures and receive updated models
  • Live textual and audio responses are provided for every gesture the user performs
  • Software allows ASL experts to make additions to the global ASL model
  • (Extended) User-defined gestures
  • (Extended) Tutorial/Interactive Quiz for people to quickly learn ASL
  • (Extended) Prediction model can be trained to give more complex output, e.g. basic phrases, words
  • (Extended) Physical ASL training provided using Leap Motion

4. UI Sketches

Website: Login, Signup, Download

User

  • Home (UI sketch)
  • Signup (UI sketch)
  • User/Expert Login (UI sketch)

Expert

  • Expert Signup (UI sketch)

Shared User-Expert UI

  • Expert-User Logged-in Screen (UI sketch)
  • Thank You page (UI sketch)

Native Binary UI

User

  • User Login Window (UI sketch)
  • Leap NOT Connected (UI sketch)
  • Spacebar to Start Recording (UI sketch)
  • Gesture Mode (UI sketch)
  • Profile (UI sketch)

Expert

  • Expert Login (UI sketch)
  • Expert Training (UI sketch)

5. Domain Analysis


  • UML for UI (diagram)
  • UML for Native Backend (diagram)
  • UML for Server Backend (diagram)

6. Use-cases

User

SignUp(Website)
  1. User accesses the website
  2. User is presented with an introduction, a signup button and a login button
  3. User clicks on the Signup button on the website
  4. Website Signup Form is displayed; the user fills out:
    1. Username
    2. Password
    3. EmailID
    4. Display Name
  5. User hits Submit
  6. Verification Email is sent to the user's email
  7. User clicks on the verify link in his/her inbox
  8. User is served an 'Email Verified' webpage
  9. User is now served a link to 'asla-user-application-binary' for downloading
  10. User is redirected to a Thank You for Downloading page
Logging in(Native Binary)
  1. User runs the downloaded asla-user-application-binary
  2. User fills in the username and password he/she chose while signing up
  3. Clicks on LogIn Button
  4. The application checks if user's email was verified
    1. If the email is verified, user can now use the application
    2. The Global Model is updated from the server
    3. User Custom Model is synced
    4. User's info is downloaded and synced
  5. User is taken to Gesture Mode by default
Update User Profile (Native binary)
  1. User Logs In
  2. User clicks on the 'Profile'
  3. Application displays Profile Form
  4. User's profile is fetched from the server and pre-populated in the form
  5. User can Edit profile data, except username
  6. User clicks on 'Apply'
  7. Profile data is sent to the Server
Mode Selection(Native Binary)
  1. User logs in
  2. User selects one of the following modes:
    1. Gesture Mode (Default)
    2. Quiz Mode (Extended)
    3. Custom Gesture Mode (Extended)
    4. Learn-ASL Mode (Extended)
Gesture Mode(Native Binary)
  1. User logs in
  2. User is asked to connect the Leap Motion sensor if not connected
  3. If logged in for the first time, calibration of sensor is initiated
  4. A skeletal graphic of the user's hand is displayed on the screen
  5. User hits spacebar to start the recognition process
  6. User makes the gesture in the sensor's view
    1. If recognized:
      1. Label is displayed on the screen
      2. Audio output
    2. If not recognized:
      1. User is asked to make the gesture again
  7. User removes the hand from the sensor's view
  8. User proceeds to make remaining gestures
  9. User hits spacebar to stop the recognition process
Calibration (Native Binary)
  1. User places the hand in the sensor's view and follows the directions
  2. If calibration is not successful, user needs to repeat the process
  3. If calibration is successful, user proceeds to recognition process
Sync with Server(Native Binary)
  1. User Logs in
  2. Application checks the latest Global Model from Server
  3. If the Global Model has changed, update the local copy (a minimal sketch of this check follows this use case)
  4. User selects mode
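
The check in step 3 above could look roughly like the following Python sketch; the endpoint paths, response fields, and local file name are assumptions for illustration, not the actual API:

```python
# Hypothetical sketch of the Global Model sync check. The endpoint paths,
# response fields, and local file name are assumptions for illustration.
import os
import requests

MODEL_PATH = "global_model.pkl"

def sync_global_model(server_url):
    """Download the Global Model only if the server's copy is newer."""
    info = requests.get(server_url + "/model/latest", timeout=10).json()
    local_ts = os.path.getmtime(MODEL_PATH) if os.path.exists(MODEL_PATH) else 0
    if info["timestamp"] > local_ts:            # server copy is newer
        data = requests.get(server_url + "/model/download", timeout=30).content
        with open(MODEL_PATH, "wb") as f:
            f.write(data)
```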
Re-installing the Binary
  1. User logs in to the website to download the binary
  2. User installs the binary
  3. User logs in to his account from the native application.
  4. All data is synced from the server

Expert

SignUp (Website)
  1. Expert keys in his/her authentication token (sent via email), which prompts a download of the expert binary
Logging in (Native Binary)
  1. Expert keys in his/her authentication token to log in
Model Training
  1. Expert logs in
  2. Expert is asked to connect the Leap Motion sensor if not connected
  3. If logged in for the first time, calibration of sensor is initiated
  4. Expert hits spacebar to start training
  5. Expert chooses label to train on, training process consists of:
    1. Expert makes a gesture in the sensor's view
    2. Expert holds the gesture for a pre-defined time
    3. Expert removes the hand from the sensor's view
    4. Expert repeats steps 1-3 a pre-defined number of times
  6. Expert can proceed to add another gesture
  7. Expert hits spacebar to stop training
  8. This data containing label, gesture data and authentication token is sent to the server
  9. Server authenticates expert token
    1. If the authentication token is invalid:
      1. Data packet is rejected
      2. Notification is sent to expert
    2. If the authentication token is valid:
      1. Server adds the data from expert to global data
      2. Success notification is sent to expert
    3. Server updates the Global Model
Calibration (Native Binary)
  1. Expert places the hand in the sensor's view and follows the directions
  2. If calibration is not successful, expert needs to repeat the process
  3. If calibration is successful, expert proceeds to training process

7. Architecture

Overview

Client-Server Communication

Our project predominantly uses HTTP to facilitate communication between the client binary and the server. The main use cases include updating the global models and training custom gesture models. These requests are HTTP POSTs that include a JSON payload of authentication tokens and either raw Leap Motion data or a trained model, e.g. (a sketch of issuing such a request follows the example payload):

{
  "user": "Batman",
  "userid": "80085",
  "model": [a1, a2, a3]
}
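
The following is a minimal sketch of how the binary might issue such a request from Python using the requests library; the endpoint path, field names, and helper name are illustrative assumptions, not the project's actual API:

```python
# Hypothetical sketch of a client-to-server upload. The URL, endpoint path,
# and field names are assumptions for illustration.
import requests

def upload_gesture_data(server_url, auth_token, user, userid, samples):
    """POST a JSON payload of raw Leap Motion samples to the server."""
    payload = {
        "user": user,
        "userid": userid,
        "token": auth_token,        # authentication token included in the payload
        "data": samples,            # raw Leap Motion feature rows (lists of floats)
    }
    response = requests.post(server_url + "/upload", json=payload, timeout=10)
    response.raise_for_status()     # fail loudly on HTTP errors
    return response.json()          # e.g. an acknowledgement from the server
```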

User - Binary Interaction

The user interacts with a UI generated by Qt and powered by Python. The binary includes Leap Motion libraries for Python and JavaScript that facilitate communication with the Leap Motion device. The JavaScript Leap Motion libraries, in conjunction with Three.js, drive a visualizer that renders a live skeletal view of the user's ongoing interaction with the Leap Motion. The skeletal structure of the hand is rendered by computing the orientation of the hand and applying the Phong shading model. The Python portion of the binary is responsible for (i) handling the HTTP/JSON requests outlined in the paragraph above, (ii) interfacing with the Leap Motion to capture data for gesture prediction, and (iii) predicting labels for performed gestures by processing live Leap Motion data with the model, using numpy and scipy.
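
A rough sketch of the local prediction path described above, assuming the model is a scikit-learn classifier that has been synced to disk and that a Leap frame has already been reduced to a flat feature vector; the file name, feature layout, and the speak() hook are assumptions:

```python
# Sketch of the local prediction path: Leap frame features -> label -> text/audio.
# The model file name, feature layout, and speak() hook are illustrative assumptions.
import pickle
import numpy as np

with open("global_model.pkl", "rb") as f:       # model file synced from the server
    model = pickle.load(f)

def predict_label(frame_features):
    """Predict a letter from a flat vector of Leap Motion hand features."""
    x = np.asarray(frame_features, dtype=float).reshape(1, -1)
    return model.predict(x)[0]                  # e.g. 'A' ... 'Z'

# label = predict_label(features_from_leap_frame)
# print(label)       # live textual response
# speak(label)       # hypothetical audio output hook
```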

Server Model Training and Data processing

The server persists the raw data it receives into a MongoDB database. This data is then parsed by the machine learning toolkit (scikit-learn), with the help of data-processing libraries such as numpy and scipy, and a model is created. The model is persisted in the database and sent out to users so that they have access to the latest gestures. This is all facilitated via the communication protocols outlined in the first paragraph of this section.
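
A minimal sketch of this server-side flow, assuming pymongo for database access and scikit-learn for training; the database, collection, and field names are assumptions for illustration:

```python
# Sketch of the server-side training flow: MongoDB -> scikit-learn -> serialized model.
# The database, collection, and field names are assumptions for illustration.
import pickle
import numpy as np
from pymongo import MongoClient
from sklearn.svm import LinearSVC

client = MongoClient("mongodb://localhost:27017")
collection = client["asla"]["gesture_data"]

# Assemble the training set from the persisted raw Leap Motion rows.
docs = list(collection.find({}, {"features": 1, "label": 1}))
X = np.array([d["features"] for d in docs])
y = np.array([d["label"] for d in docs])

model = LinearSVC()      # the classifier the project settled on (see section 8)
model.fit(X, y)

# Persist the trained model so it can be shipped out to clients.
with open("global_model.pkl", "wb") as f:
    pickle.dump(model, f)
```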

Website

All website communication happens via the methods outlined in the Client-Server Communication paragraph above.

8. Machine Learning Analysis

8.1 Data Collection

The data for training was collected from all of the team members. Each member used the expert module to collect the data.

Data collection steps:

  1. Choose a letter
  2. Make the corresponding sign in front of the Leap Motion
    2.1 If satisfied with the visualizer's feedback, continue holding the sign for 5 seconds
    2.2 If not, remove the hand from the Leap's view; this flushes the data collected for this iteration
  3. Repeat the process for a fixed number of iterations

In total, 2080 data points were collected, with 80 rows per letter.

8.2 Machine Learning Algorithms

The following Machine Learning algorithms were used:

  • k-Nearest Neighbor
  • Linear SVM
  • Decision Trees
  • Random Forest

We tried several other models, but their performance was not on par with these.

8.3 Model Selection

8.3.1 Leave One Group Out Cross-Validation

For the purpose of selecting the model, we used Leave One Group Out cross-validation accuracy as the driving factor. The groups here correspond to each team member's data. The model was trained on data from three members and tested on the fourth member's data. This was repeated with each member's data serving as test data, and the average cross-validation accuracy was calculated.
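
A minimal sketch of this scheme using scikit-learn's LeaveOneGroupOut; the placeholder data below only mimics the shape of the real set (2080 rows, 80 per letter), and the feature dimension and the equal split between members are illustrative assumptions:

```python
# Sketch of Leave One Group Out cross-validation, one fold per team member.
# The placeholder data only mimics the real set's shape; the feature dimension
# and the equal split between members are assumptions.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(2080, 30))                        # 2080 rows, assumed 30 features
y = np.repeat(list("ABCDEFGHIJKLMNOPQRSTUVWXYZ"), 80)  # 80 rows per letter
groups = np.tile(np.repeat([1, 2, 3, 4], 20), 26)      # which member produced each row

logo = LeaveOneGroupOut()                              # 4 folds, one per member
scores = cross_val_score(LinearSVC(), X, y, groups=groups, cv=logo)
print("per-member accuracy:", scores)                  # accuracy with that member held out
print("mean LOGO accuracy:", scores.mean())
```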

8.3.2 Parameter Tuning

Parameter tuning was performed using grid-search cross-validation over a range of parameter values. The parameter values that gave the lowest cross-validation error were then chosen for model selection (a minimal sketch of this search appears after the list below). Tuning was done for:

  • k-Nearest Neighbor
  • Random Forest
  • Linear SVM
  • Decision Trees
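
A minimal sketch of the grid search, shown here for only two of the classifiers and reusing X, y, groups from the previous sketch; the parameter grids are illustrative, not the ranges actually searched:

```python
# Sketch of grid-search parameter tuning with LOGO cross-validation,
# shown for two classifiers. The grids are illustrative, not the actual ranges.
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier

candidates = {
    "Linear SVM": (LinearSVC(), {"C": [0.01, 0.1, 1, 10, 100]}),
    "k-NN": (KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7, 9]}),
}

for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=LeaveOneGroupOut())
    search.fit(X, y, groups=groups)        # X, y, groups as in the previous sketch
    print(name, search.best_params_, round(search.best_score_, 3))
```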

8.3.3 Model Comparison

The best accuracy (~84%) was obtained with a Random Forest model; however, we went with the Linear SVM classifier, mainly due to its stability and its shorter training time compared to Random Forest. A bar plot comparing the classifiers by cross-validation accuracy is available at https://github.com/ankurgupta7/asla-pub/blob/master/ml_analysis/plot/class_comp.png.

The final model used for prediction was trained on all 2080 data points.

8.4 Alphabet Prediction

The following table gives the F1 score for each letter:

| Letter | F1 Score | Letter | F1 Score |
|--------|----------|--------|----------|
| A      | 0.91     | N      | 0.38     |
| B      | 0.96     | O      | 0.34     |
| C      | 0.77     | P      | 0.94     |
| D      | 0.93     | Q      | 1.00     |
| E      | 0.48     | R      | 0.76     |
| F      | 1.00     | S      | 0.64     |
| G      | 0.99     | T      | 0.36     |
| H      | 0.99     | U      | 0.60     |
| I      | 0.99     | V      | 0.86     |
| J      | 0.94     | W      | 0.99     |
| K      | 0.89     | X      | 0.92     |
| L      | 0.99     | Y      | 1.00     |
| M      | 0.20     | Z      | 0.95     |

It can be seen that while the scores are high for most of the signs that are easily distinguished, signs that are similar to each other have lower scores. This is a limitation of the Leap Motion controller: it is very hard for the Leap to distinguish between similar signs such as 'M', 'N' and 'T'.
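
A sketch of how the per-letter F1 scores could be produced with scikit-learn, assuming y_test and y_pred are the held-out labels and the Linear SVM's predictions:

```python
# Sketch of computing the per-letter F1 scores shown in the table above.
# y_test and y_pred are assumed to come from a held-out evaluation of the Linear SVM.
from sklearn.metrics import f1_score

def per_letter_f1(y_test, y_pred):
    """Print the F1 score for each letter present in the test labels."""
    letters = sorted(set(y_test))
    scores = f1_score(y_test, y_pred, labels=letters, average=None)
    for letter, score in zip(letters, scores):
        print(letter, round(float(score), 2))
```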

Confusion matrix for specific letters with low accuracy using Linear SVM:

| Letter | Classified as |
|--------|---------------|
| E      | E (0.60), O (0.30) |
| M      | M (0.15), N (0.50), S (0.25), O (0.10) |
| N      | M (0.20), N (0.65), T (0.10), O (0.05) |
| O      | E (0.40), N (0.10), O (0.50) |
| T      | M (0.10), N (0.50), T (0.40) |
| U      | R (0.30), U (0.70) |
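
A sketch of how a row-normalized confusion row like those above could be computed, again assuming y_test and y_pred from a held-out evaluation:

```python
# Sketch of a row-normalized confusion row, as in the table above.
# y_test and y_pred are assumed as in the previous sketch.
from sklearn.metrics import confusion_matrix

def classified_as(y_test, y_pred, letter):
    """Return how often `letter` was classified as each label (fractions of its row)."""
    letters = sorted(set(y_test))
    cm = confusion_matrix(y_test, y_pred, labels=letters).astype(float)
    cm /= cm.sum(axis=1, keepdims=True)            # each row now sums to 1.0
    row = cm[letters.index(letter)]
    return {l: round(float(v), 2) for l, v in zip(letters, row) if v > 0}

# e.g. classified_as(y_test, y_pred, "M") might give {'M': 0.15, 'N': 0.50, 'S': 0.25, 'O': 0.10}
```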

Signs with low scores (sign images):

  • Group 1: https://github.com/ankurgupta7/asla-pub/blob/master/wikiresources/signs/group1.png
  • Group 2: https://github.com/ankurgupta7/asla-pub/blob/master/wikiresources/signs/group2.png
  • Group 3: https://github.com/ankurgupta7/asla-pub/blob/master/wikiresources/signs/group3.png

All the code related to the analysis is placed here.

9. Current Iteration

  • Fixed the thread-related bugs so that correct messages are shown in Qt windows
  • Model files with timestamps are now fetched and updated correctly
  • Full ML analysis was completed and presented. The accuracy of various classification algorithms was calculated with the respective cross-validation results (see section 8)
  • Use cases of the app were tested and confirmed from both standpoints: users as well as experts

10. Next Iteration

  • Implement the 'User defined signs' feature
  • Collect more data from 'ASL experts'
  • Secure storage of the model file; it is currently a pickle, which is prone to security concerns
  • More stringent testing and fixing of minor bugs

11. Long Term

  • Make the application cross-platform; it currently runs only on Linux
  • Implement recognition of words/phrases
  • Incorporate the use of a web camera along with the Leap Motion controller

12. Resources and References