- This repository implements a deep learning framework for continuous sign language recognition (CSLR). It integrates spatio-temporal feature extraction, cross-layer dynamic fusion, temporal convolution, bidirectional recurrent modeling, and sequence-level knowledge distillation.
- **Notice:** This is experimental code that demonstrates the overall structure and realism of the framework. Because the training data is protected, the trained model will be published after the paper is accepted.
- This project is implemented in PyTorch (preferably >=1.13 for compatibility with ctcdecode; older versions may produce errors). Please install PyTorch first.
- ctcdecode==0.4 [parlance/ctcdecode], for beam search decoding. (ctcdecode is only supported on Linux.)
- [Optional] sclite [kaldi-asr/kaldi]: install the Kaldi toolkit to get sclite for evaluation. After installation, create a soft link to sclite:

      mkdir ./software
      ln -s PATH_TO_KALDI/tools/sctk-2.4.10/bin/sclite ./software/sclite

  You may use the Python version of the evaluation tool for convenience (by setting 'evaluate_tool' to 'python' on line 16 of ./configs/baseline.yaml), but sclite provides more detailed statistics.
- You can install the other required modules by running:

      pip install -r requirements.txt
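Before moving on to the datasets, it can help to confirm that the prerequisites listed above are in place. The following is a minimal sketch, not part of this repository: `version_tuple` and `check_environment` are hypothetical helper names, and the default `./software/sclite` path is taken from the soft-link command above. It only checks that the packages can be found and that the sclite link is executable.

```python
import importlib.util
import os
import re


def version_tuple(version: str) -> tuple:
    """Turn a version string such as '1.13.1+cu117' into (1, 13, 1)."""
    numbers = re.findall(r"\d+", version.split("+")[0])
    return tuple(int(n) for n in numbers[:3])


def check_environment(sclite_path: str = "./software/sclite") -> dict:
    """Report which prerequisites from this README appear to be in place.

    Hypothetical helper for illustration; it is not shipped with the repo.
    """
    report = {}

    # PyTorch: the README suggests >= 1.13 for compatibility with ctcdecode.
    if importlib.util.find_spec("torch") is None:
        report["torch"] = "missing"
    else:
        import torch
        ok = version_tuple(torch.__version__) >= (1, 13)
        report["torch"] = "ok" if ok else "older than 1.13"

    # ctcdecode: used for beam search decoding (Linux only).
    found = importlib.util.find_spec("ctcdecode") is not None
    report["ctcdecode"] = "ok" if found else "missing"

    # sclite is optional; without it, set 'evaluate_tool' to 'python'.
    has_sclite = os.path.isfile(sclite_path) and os.access(sclite_path, os.X_OK)
    report["sclite"] = "ok" if has_sclite else "missing (use evaluate_tool: python)"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

If sclite is reported as missing, evaluation can still run with the Python tool by setting 'evaluate_tool' to 'python' in ./configs/baseline.yaml, as noted above.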
You can choose any one of the following datasets to verify the effectiveness of CDF.
- Download the RWTH-PHOENIX-Weather 2014 Dataset [download link]. Our experiments are based on phoenix-2014.v3.tar.gz.
- After downloading the dataset, extract it. We suggest creating a soft link to the downloaded dataset:

      ln -s PATH_TO_DATASET/phoenix2014-release ./dataset/phoenix2014

- The original image sequences are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences:

      cd ./preprocess
      python dataset_preprocess.py --process-image --multiprocessing
- Download the RWTH-PHOENIX-Weather 2014 T Dataset [download link].
- After downloading the dataset, extract it. We suggest creating a soft link to the downloaded dataset:

      ln -s PATH_TO_DATASET/PHOENIX-2014-T-release-v3/PHOENIX-2014-T ./dataset/phoenix2014-T

- The original image sequences are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences:

      cd ./preprocess
      python dataset_preprocess-T.py --process-image --multiprocessing
If you get an error like `IndexError: list index out of range` on the PHOENIX2014-T dataset, you may refer to this issue to tackle the problem.
- Request the CSL-Daily Dataset from this website [download link].
- After downloading the dataset, extract it. We suggest creating a soft link to the downloaded dataset:

      ln -s PATH_TO_DATASET ./dataset/CSL-Daily

- The original image sequences are 1280x720; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences:

      cd ./preprocess
      python dataset_preprocess-CSL-Daily.py --process-image --multiprocessing
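A broken or missing soft link is an easy mistake to make in the dataset steps above, so a quick check before preprocessing may save time. The sketch below is an illustration only: `check_dataset_links` and `DATASET_LINKS` are hypothetical names, and the paths are the link targets from the `ln -s` commands above, relative to the repository root. Only the dataset you actually downloaded needs to report "ok".

```python
import os

# Link locations taken from the ln -s commands above (relative to repo root).
DATASET_LINKS = {
    "phoenix2014": "./dataset/phoenix2014",
    "phoenix2014-T": "./dataset/phoenix2014-T",
    "CSL-Daily": "./dataset/CSL-Daily",
}


def check_dataset_links(links=DATASET_LINKS):
    """Return {name: status} for each expected dataset directory.

    Hypothetical helper for illustration; it is not shipped with the repo.
    """
    status = {}
    for name, path in links.items():
        if os.path.islink(path) and not os.path.exists(path):
            # The link itself exists, but its target does not.
            status[name] = "broken link"
        elif os.path.isdir(path):
            # A real directory, or a link that resolves to one.
            status[name] = "ok"
        else:
            status[name] = "not found"
    return status


if __name__ == "__main__":
    for name, state in check_dataset_links().items():
        print(f"{name}: {state}")
```

Run it from the repository root rather than from ./preprocess, since the paths are relative.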