Kojk-AI/spobet


SPOBET

Pose-based word-level sign language recognition with BERT-style transformer in Keras

About The Project

This repository implements a pose-based, word-level sign language recognition model in Keras, built around a BERT-style transformer.

  • The model is trained on WLASL 2D pose data, using the ASL100 split. https://github.com/dxli94/WLASL
  • The model is built from Keras layers, so it is easy to transfer and configure.
  • Accuracy on the ASL100 split is comparable* to that of other pose-based, word-level sign language recognition models.

Further details on the implementation and a discussion of the results can be found at https://medium.com/@kennethong.ai/spobet-d9d952836c48


Using the Repo

Getting Started

  1. Clone this repository.
  2. Install the required packages listed in requirements.txt.
  3. Download the dataset from the WLASL website. Only the keypoint files and the split files are needed.
  4. Place the keypoint folders in dataset/annotations and the split files in dataset.
  5. The model, dataset and training parameters are controlled by the config files in the config folder.
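To make the keypoint files in step 3 more concrete, here is a minimal sketch of reading a single frame. It assumes the files follow OpenPose's JSON output layout (a "people" list whose entries hold flat x, y, confidence triples); the function names and the choice of body plus hand keys are illustrative assumptions, not the repository's own loader.

```python
import json

# Keys used by OpenPose's JSON output for body and hand keypoints.
# Which of these the model actually consumes is an assumption here.
KEYPOINT_KEYS = (
    "pose_keypoints_2d",
    "hand_left_keypoints_2d",
    "hand_right_keypoints_2d",
)


def parse_openpose_frame(frame):
    """Return a list of (x, y, confidence) tuples for the first detected person."""
    people = frame.get("people", [])
    if not people:
        return []  # no person detected in this frame
    person = people[0]
    keypoints = []
    for key in KEYPOINT_KEYS:
        flat = person.get(key, [])
        # Each keypoint is stored as three consecutive numbers: x, y, confidence.
        keypoints.extend(zip(flat[0::3], flat[1::3], flat[2::3]))
    return keypoints


def load_openpose_frame(path):
    """Read one keypoint JSON file from disk and parse it."""
    with open(path, "r", encoding="utf-8") as f:
        return parse_openpose_frame(json.load(f))
```

Calling load_openpose_frame on one file under dataset/annotations should return one tuple per keypoint, which is a quick way to confirm the download is in the expected format.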

Training

In the root folder, run:

python main.py --run train

  • TensorBoard logs will be saved in the logs directory.
  • Masked encoder weights will be saved as weights/pretrain.
  • Model weights will be saved as weights/spobet.
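The masked encoder weights suggest a BERT-style pre-training stage in which part of the input is hidden and the encoder learns to reconstruct it. The README does not describe the masking scheme, so the following is only an illustration of the general idea: the 15% ratio, the zero mask value and the function name mask_frames are all assumptions, not taken from this repository.

```python
import random


def mask_frames(sequence, mask_ratio=0.15, mask_value=0.0, seed=None):
    """Replace a random subset of frames with mask_value.

    sequence is a list of frames, each frame a list of floats.
    Returns (masked_sequence, masked_indices); the input is not modified.
    """
    rng = random.Random(seed)
    num_frames = len(sequence)
    # Mask at least one frame whenever the sequence is non-empty.
    num_masked = max(1, round(num_frames * mask_ratio)) if num_frames else 0
    masked_indices = sorted(rng.sample(range(num_frames), num_masked))
    masked = [list(frame) for frame in sequence]  # copy so the input stays intact
    for i in masked_indices:
        masked[i] = [mask_value] * len(sequence[i])
    return masked, masked_indices
```

In a setup like this, the encoder would be trained to predict the original frames at masked_indices, and the resulting weights reused for the classification stage.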

Evaluation

In the root folder, run:

python main.py --run evaluation

  • The Top 1, Top 5 and Top 10 accuracy scores will be printed at the end.
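For readers unfamiliar with the metric, top-k accuracy counts a sample as correct when its true label is among the k highest-scoring classes. The sketch below shows the computation in plain Python; the function name and the list-of-lists input format are assumptions made for illustration, and the repository may compute the scores differently.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: one list of per-class scores per sample.
    labels: the true class index for each sample.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        return 0.0
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)
```

Top-k accuracy can only stay the same or increase as k grows, which matches the ordering of the Top 1, Top 5 and Top 10 scores.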

Inferencing

This repo does not include an implementation of OpenPose to extract the keypoints needed for inference. To run inference, you will need to:

  1. Extract keypoints using OpenPose.
  2. Format the results in the same way as the WLASL keypoint files.
  3. Create a "split" file with the necessary information, i.e. video_id (the annotation folder must have the same name), start_frame and end_frame (each annotation file is named according to its frame number), with the train/test split set to "test".
  4. In dataconfig.cfg, set SHOW_RES = 1. This prints the inference results at the end of evaluation. The result, res, is a list of predicted labels ordered from the lowest probability to the highest, i.e. the label with the highest probability is res[-1].
  5. Run evaluation as usual.
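Because res is ordered from least to most likely, the top predictions sit at the end of the list. The sketch below reproduces that ordering from a list of class probabilities and reads off the k best labels; rank_labels and top_k are hypothetical helper names, not functions provided by this repository.

```python
def rank_labels(probabilities):
    """Return class indices sorted from lowest to highest probability."""
    return sorted(range(len(probabilities)), key=lambda c: probabilities[c])


def top_k(res, k):
    """Return the k most likely labels from an ascending list, most likely first."""
    return res[::-1][:k]


probabilities = [0.10, 0.60, 0.05, 0.25]  # made-up scores for four classes
res = rank_labels(probabilities)
best = res[-1]            # label with the highest probability
best_two = top_k(res, 2)  # two most likely labels, best first
```

The labels are class indices, so they still need to be mapped back to glosses using the class list of the split that the model was trained on.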


Trained Weights

| Model | Top 1 | Top 5 | Top 10 |
|---|---|---|---|
| SPOBET (ASL100), BERT encoder | 63.95% | 87.98% | 91.86% |


License

The code is published under the Apache License 2.0.

The WLASL dataset used for training and experiments, however, permits only non-commercial use. That restriction therefore also applies to the uploaded model weights and their derivatives.

