A Simple Multi-modality Transfer Learning Baseline for Sign Language Translation (SingleStream-SLT Baseline)
| Dataset | ROUGE | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | Model | Training |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Phoenix-2014T | 53.08 | 54.48 | 41.93 | 33.97 | 28.57 | ckpt | config |
| CSL-Daily | 53.35 | 53.53 | 40.68 | 31.04 | 24.09 | ckpt | config |
The general-domain pretraining is already done by loading the pretrained checkpoints, i.e., S3D and mBART. We then apply Sign2Gloss within-domain pretraining to the visual module and Gloss2Text within-domain pretraining to the language module.
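As a rough illustration of this initialization, loading the two checkpoints might look as follows (the checkpoint names and paths, and the `visual_backbone` handle, are assumptions; the repo drives loading through its configs):

```python
import torch
from transformers import MBartForConditionalGeneration, MBartTokenizer

# Language module: mBART pretrained on multilingual text
# ("facebook/mbart-large-cc25" is an assumed checkpoint name).
mbart = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")
tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")

# Visual module: S3D pretrained on Kinetics (path is hypothetical).
s3d_state = torch.load("pretrained_models/s3d_kinetics.pth", map_location="cpu")
# visual_backbone.load_state_dict(s3d_state, strict=False)  # strict=False: SLT-specific heads differ
```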
For Sign2Gloss pretraining, run
```
dataset=phoenix-2014t  # phoenix-2014t / csl-daily
python -m torch.distributed.launch --nproc_per_node 8 --use_env training.py --config experiments/configs/SingleStream/${dataset}_s2g.yaml
```
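Under the hood, Sign2Gloss pretraining is continuous gloss recognition: the visual features are classified over the gloss vocabulary and trained with a CTC loss. A minimal sketch with toy shapes (vocabulary size, lengths, and batch size are made up):

```python
import torch
import torch.nn as nn

ctc = nn.CTCLoss(blank=0, zero_infinity=True)

# (T', batch, gloss_vocab) log-probabilities from the visual module's gloss classifier.
log_probs = torch.randn(50, 2, 1200).log_softmax(-1)
targets = torch.randint(1, 1200, (2, 12))                # gloss label sequences
input_lengths = torch.full((2,), 50, dtype=torch.long)   # feature-sequence lengths
target_lengths = torch.full((2,), 12, dtype=torch.long)  # gloss-sequence lengths

loss = ctc(log_probs, targets, input_lengths, target_lengths)
```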
For Gloss2Text pretraining, run
```
python -m torch.distributed.launch --nproc_per_node 8 --use_env training.py --config experiments/configs/SingleStream/${dataset}_g2t.yaml
```
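Conceptually, Gloss2Text fine-tunes mBART to translate gloss sequences into spoken-language text. A toy sketch of one training step with Hugging Face mBART (checkpoint name and language codes are assumptions; `text_target` needs a recent transformers release):

```python
from transformers import MBartForConditionalGeneration, MBartTokenizer

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")
tokenizer = MBartTokenizer.from_pretrained(
    "facebook/mbart-large-cc25", src_lang="de_DE", tgt_lang="de_DE"
)

gloss = "JETZT WETTER MORGEN"                     # toy gloss sequence (source side)
text = "und nun die wettervorhersage für morgen"  # toy spoken-language target

batch = tokenizer(gloss, text_target=text, return_tensors="pt")
loss = model(**batch).loss  # standard seq2seq cross-entropy
loss.backward()
```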
For multi-modal joint training, first extract the features output by the pretrained S3D:
```
python -m torch.distributed.launch --nproc_per_node 8 --use_env extract_feature.py --config experiments/configs/SingleStream/${dataset}_s2g.yaml
```
We provide our pre-extracted features for Phoenix-2014T and CSL-Daily.
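If you want to see what extraction boils down to, the gist is to run each video through the pretrained backbone and cache the temporal feature sequence. A self-contained sketch using torchvision's Kinetics-pretrained S3D as a stand-in for the repo's backbone (the toy clip and output file name are made up):

```python
import gzip, pickle
import torch
from torchvision.models.video import s3d, S3D_Weights

model = s3d(weights=S3D_Weights.KINETICS400_V1).eval()

videos = [("toy_video", torch.rand(32, 3, 224, 224))]  # (name, frames of shape (T, 3, H, W))

features = {}
with torch.no_grad():
    for name, frames in videos:
        clip = frames.permute(1, 0, 2, 3).unsqueeze(0)  # (1, 3, T, H, W) as the model expects
        fmap = model.features(clip)                     # (1, 1024, T', H', W')
        features[name] = fmap.mean(dim=(3, 4)).squeeze(0).T  # (T', 1024) temporal sequence

with gzip.open("extracted_features.pkl.gz", "wb") as f:
    pickle.dump(features, f)
```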
For multi-modal joint training, run
```
python -m torch.distributed.launch --nproc_per_node 1 --use_env training.py --config experiments/configs/SingleStream/${dataset}_s2t.yaml
```
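What joint training adds on top of the two pretrained modules is a visual-to-language mapper that projects the extracted features into mBART's embedding space, so the full Sign2Text pipeline can be tuned end to end. A hedged sketch of that wiring (the two-layer MLP, its sizes, and all names are assumptions, not the repo's exact module):

```python
import torch
import torch.nn as nn
from transformers import MBartForConditionalGeneration

class VLMapper(nn.Module):
    """Projects visual features into the language model's embedding space."""
    def __init__(self, in_dim=1024, hidden_dim=2048, out_dim=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, out_dim)
        )

    def forward(self, x):
        return self.net(x)

mbart = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")
mapper = VLMapper(out_dim=mbart.config.d_model)

visual_feats = torch.rand(1, 40, 1024)   # (batch, T', C) extracted S3D features
inputs_embeds = mapper(visual_feats)     # mapped into the text embedding space
labels = torch.tensor([[2, 9, 25003]])   # toy target token ids
loss = mbart(inputs_embeds=inputs_embeds, labels=labels).loss  # end-to-end trainable
```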
To evaluate Sign2Text performance, run
```
python -m torch.distributed.launch --nproc_per_node 1 --use_env prediction.py --config experiments/configs/SingleStream/${dataset}_s2t.yaml
```
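The table above reports ROUGE and BLEU-1 through BLEU-4 on the generated translations. As a sanity check, BLEU-n can be recomputed from predictions with sacrebleu (a generic sketch with toy strings, not the repo's exact metric code):

```python
from sacrebleu.metrics import BLEU

hyps = ["und nun die wettervorhersage für morgen"]    # model outputs (toy)
refs = [["und nun die wettervorhersage für morgen"]]  # one reference stream, parallel to hyps

for n in range(1, 5):
    score = BLEU(max_ngram_order=n, effective_order=True).corpus_score(hyps, refs)
    print(f"BLEU-{n}: {score.score:.2f}")
```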
We provide the checkpoints trained at each stage here.