Real-Time Monocular Skeleton-Based Hand Gesture Recognition Using 3D-Jointsformer

This repository hosts our PyTorch implementation of 3D-Jointsformer, a novel approach for real-time hand gesture recognition in video sequences. Traditional methods struggle to manage temporal dependencies while maintaining real-time performance. To address this, we propose a hybrid approach combining 3D-CNNs and Transformers. Our method uses a 3D-CNN to compute high-level semantic skeleton embeddings that capture local spatial and temporal characteristics; a Transformer network with self-attention then efficiently captures long-range temporal dependencies. Evaluation on the Briareo and Multimodal Hand Gesture datasets yielded accuracy scores of 95.49% and 97.25%, respectively. Importantly, our approach achieves real-time performance on standard CPUs, distinguishing it from GPU-dependent methods. The hybrid 3D-CNN and Transformer approach outperforms existing methods in both accuracy and speed, effectively addressing real-time recognition challenges.
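
As a rough illustration of this pipeline, the sketch below embeds an (N, C, T, V) joint sequence with a small 3D-CNN and feeds the resulting temporal tokens to a Transformer encoder. It is a minimal plain-PyTorch sketch of the idea: all module names, kernel sizes, and dimensions (embed_dim, depth, num_classes) are illustrative assumptions, not the exact layers used in this repository.

# Minimal PyTorch sketch of the hybrid 3D-CNN + Transformer idea described
# above. Hyperparameters and module names are illustrative assumptions.
import torch
import torch.nn as nn

class SkeletonEmbed3D(nn.Module):
    """3D-CNN that turns an (N, C, T, V) joint sequence into token embeddings."""

    def __init__(self, in_channels=3, embed_dim=128):
        super().__init__()
        # Treat the joints as a 1-pixel-wide spatial grid: (N, C, T, V, 1).
        self.conv = nn.Sequential(
            nn.Conv3d(in_channels, embed_dim, kernel_size=(3, 3, 1), padding=(1, 1, 0)),
            nn.BatchNorm3d(embed_dim),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):                        # x: (N, C, T, V)
        x = self.conv(x.unsqueeze(-1))           # (N, D, T, V, 1)
        x = x.squeeze(-1).mean(dim=3)            # pool over joints -> (N, D, T)
        return x.transpose(1, 2)                 # (N, T, D) tokens for attention

class JointsformerSketch(nn.Module):
    def __init__(self, num_classes=12, embed_dim=128, depth=4, heads=8):
        super().__init__()
        self.embed = SkeletonEmbed3D(embed_dim=embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                        # x: (N, 3, T, 21) xyz of 21 joints
        tokens = self.embed(x)                   # local spatio-temporal features
        tokens = self.encoder(tokens)            # long-range temporal attention
        return self.head(tokens.mean(dim=1))     # classify the averaged clip token

logits = JointsformerSketch()(torch.randn(2, 3, 32, 21))  # -> (2, 12)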

Installation

conda create -n 3DJointsformer python=3.9 -y
conda activate 3DJointsformer
conda install pytorch=1.11.0 torchvision=0.12.0 cudatoolkit=11.3 -c pytorch -y
pip install 'mmcv-full==1.5.0' -f https://download.openmmlab.com/mmcv/dist/cu113/torch1.11.0/index.html
pip install mmaction2  # tested mmaction2 v0.24.0

Data Preparation

In this work we have tested the proposed model on two datasets: Briareo and the Multi-Modal Hand Gesture dataset. The hand keypoints are obtained with MediaPipe; we have also included code to generate these hand keypoints (see data_preprocessing). A minimal extraction sketch follows.
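
The sketch below extracts 21 hand landmarks per frame with MediaPipe Hands. The (T, 21, 3) output layout and the zero-padding of missed detections are assumptions for illustration; see data_preprocessing for the exact pipeline used in this repository.

# Sketch of per-frame hand-keypoint extraction with MediaPipe Hands.
# The (T, 21, 3) output layout is an assumption for illustration.
import cv2
import mediapipe as mp
import numpy as np

hands = mp.solutions.hands.Hands(
    static_image_mode=False,       # video mode: track hands across frames
    max_num_hands=1,
    min_detection_confidence=0.5,
)

def extract_keypoints(video_path):
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, bgr = cap.read()
        if not ok:
            break
        result = hands.process(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))
        if result.multi_hand_landmarks:
            lms = result.multi_hand_landmarks[0].landmark
            frames.append([(p.x, p.y, p.z) for p in lms])  # 21 normalized xyz
        else:
            frames.append([(0.0, 0.0, 0.0)] * 21)          # pad missing detections
    cap.release()
    return np.asarray(frames, dtype=np.float32)            # (T, 21, 3)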

Train

You can use the following command to train a model.

./tools/run.sh ${CONFIG_FILE} ${GPU_IDS} ${SEED}

Example: train the model on the joint data of the Briareo dataset using 2 GPUs with seed 0.

./tools/run.sh configs/transformer/jointsformer3d_briareo.py 0,1 0

Test

You can use the following command to test a model.

python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]

Example: inference on the joint data of the Briareo dataset.

python tools/test.py configs/transformer/jointsformer3d_briareo.py \
    work_dirs/jointsformer3d/best_top1_acc_epoch_475.pth \
    --eval top_k_accuracy --cfg-options "gpu_ids=[0]"
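
Besides tools/test.py, MMAction2 v0.24 also exposes a small Python API for loading a trained recognizer. The snippet below is only a hedged starting point: how an input sample must be packaged for this model depends on the config's test pipeline, so it is not the repository's official inference entry point.

# Hedged sketch: loading the trained model through the MMAction2 (v0.24) API.
# The checkpoint path matches the test example above.
from mmaction.apis import init_recognizer

config = 'configs/transformer/jointsformer3d_briareo.py'
checkpoint = 'work_dirs/jointsformer3d/best_top1_acc_epoch_475.pth'

# Build the model and load weights; use device='cuda:0' if a GPU is available.
model = init_recognizer(config, checkpoint, device='cpu')
model.eval()
print(model.__class__.__name__)  # sanity check that the recognizer was built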

Bibtex

If this project is useful for you, please consider citing our paper.

@Article{s23167066,
  AUTHOR = {Zhong, Enmin and del-Blanco, Carlos R. and Berjón, Daniel and Jaureguizar, Fernando and García, Narciso},
  TITLE = {Real-Time Monocular Skeleton-Based Hand Gesture Recognition Using 3D-Jointsformer},
  JOURNAL = {Sensors},
  VOLUME = {23},
  YEAR = {2023},
  NUMBER = {16},
  ARTICLE-NUMBER = {7066},
  URL = {https://www.mdpi.com/1424-8220/23/16/7066},
  PubMedID = {37631602},
  ISSN = {1424-8220},
  DOI = {10.3390/s23167066}
}

Acknowledgements

Our code is based on SkelAct, MMAction2, and SlowFast. Sincere thanks to the authors for their wonderful work.

License

This project is released under the Apache 2.0 license.
