Cross-modality Data Augmentation for Sign Language Translation

Implementation of our paper XmDA: Cross-modality Data Augmentation for End-to-End Sign Language Translation, accepted at Findings of EMNLP 2023.

Brief Introduction

We propose a novel Cross-modality Data Augmentation (XmDA) approach to improve end-to-end SLT performance. The main idea of XmDA is to transfer the powerful gloss-to-text translation capabilities (uni-modal, i.e., text-to-text) to end-to-end sign language translation (cross-modal, i.e., video-to-text). Specifically, XmDA integrates two techniques: Cross-modality Mix-up and Cross-modality Knowledge Distillation (KD). Cross-modality Mix-up combines sign language video features with gloss embeddings extracted from the gloss-to-text teacher model to generate mixed-modal augmented samples. Cross-modality KD uses diversified spoken language texts generated by the gloss-to-text teacher models to soften the target labels, further diversifying and enhancing the augmented samples.

Figure 1: The overall framework of cross-modality data augmentation methods for SLT in this work. Components in gray indicate frozen parameters.
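As a rough illustration of the Cross-modality Mix-up step, the sketch below interpolates frame-level sign video features with gloss embeddings from a frozen gloss-to-text teacher. This is a minimal sketch, not the repository's exact code: the tensor shapes, the Beta-distributed mixing ratio, and the length alignment are assumptions made for illustration. Cross-modality KD works on the target side instead, adding teacher-generated spoken language texts as additional references for the augmented samples.

import torch

def cross_modality_mixup(sign_feats, gloss_embeds, alpha=0.5):
    """Illustrative sketch of Cross-modality Mix-up (not this repository's exact code).

    sign_feats:   (T_sign, D) frame-level sign video features
    gloss_embeds: (T_gloss, D) gloss embeddings from a frozen gloss-to-text teacher
    Returns a mixed-modal sequence used as an augmented encoder input.
    """
    # Mixing ratio from a Beta distribution, a common mix-up choice (an assumption here).
    lam = torch.distributions.Beta(alpha, alpha).sample().item()

    # Sign and gloss sequences generally differ in length; truncating to the shorter
    # one is only a placeholder for whatever alignment the paper actually uses.
    T = min(sign_feats.size(0), gloss_embeds.size(0))
    return lam * sign_feats[:T] + (1.0 - lam) * gloss_embeds[:T]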

Reference Performance

We evaluate the proposed XmDA approach on end-to-end SLT performance and analyse the sign representation distributions on the PHOENIX-2014T dataset.

End-to-end SLT performance on the PHOENIX-2014T dataset

The topological structure of input embeddings

Implementation Guidelines

  • This code is based on Sign Language Transformers, modified to implement Cross-modality KD and Cross-modality Mix-up.
  • For the baseline end-to-end SLT, you can use Sign Language Transformers.
  • For the gloss-to-text teacher models, you can follow PGen or use the original text-to-text Joey NMT framework.
  • To combine them, we extend the Sign Language Transformers framework with Joey NMT so that the model can also forward gloss-to-text and mixup-to-text (i.e., forward_type in [sign, gloss, mixup]); see the sketch below.
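A rough sketch of how such a forward_type switch could dispatch between the three paths is shown below. The method and attribute names (encoder, decoder, gloss_embedding, the batch fields) are assumptions for illustration only, not the repository's actual API.

def forward(self, batch, forward_type="sign"):
    # Illustrative dispatch over forward_type in [sign, gloss, mixup] (names assumed).
    if forward_type == "sign":
        # Standard end-to-end SLT: encode sign video features, decode spoken text.
        encoder_input = batch.sign_features
    elif forward_type == "gloss":
        # Gloss-to-text path (text-to-text), as in the teacher model.
        encoder_input = self.gloss_embedding(batch.gloss_ids)
    elif forward_type == "mixup":
        # Cross-modality Mix-up: mixed sign/gloss representation as encoder input.
        encoder_input = self.cross_modality_mixup(
            batch.sign_features, self.gloss_embedding(batch.gloss_ids)
        )
    else:
        raise ValueError(f"Unknown forward_type: {forward_type}")

    encoded = self.encoder(encoder_input)
    return self.decoder(encoded, batch.target_text_ids)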

Requirements

  • Create the environment following Sign Language Transformers.
  • Reproduce PGen to obtain multi-references as sentence-level guidance from the gloss-to-text teachers (or use forward_type = gloss).
  • Reproduce SMKD to pre-process the sign videos.
  • Pre-process the dataset and put it into ./data/DATA-NAME/ (refer to https://github.com/neccam/slt for the format); a format sketch follows this list.
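For reference, datasets in the neccam/slt style are stored as gzip-compressed pickle files in which each sample is a dictionary with the sample name, gloss annotation, spoken language text, and sign video features. The snippet below sketches how such a file can be inspected; the file path and field names are assumptions based on that repository and should be checked against your own pre-processed data.

import gzip
import pickle

# Inspect a pre-processed split, e.g. under ./data/PHOENIX2014T/ (path and field
# names are assumptions based on the neccam/slt data format).
with gzip.open("./data/PHOENIX2014T/phoenix14t.train", "rb") as f:
    samples = pickle.load(f)

example = samples[0]
print(example["name"])        # sample identifier
print(example["gloss"])       # gloss annotation (string)
print(example["text"])        # spoken language translation (string)
print(example["sign"].shape)  # (num_frames, feature_dim) sign features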

Usage

python -m signjoey train_XmDA configs/Sign_XmDA.yaml

Note that the default data directory is ./data. If you download the data somewhere else, you need to update the data_path parameters in your config file.

ToDo:

  • Initial code release.
  • Release the pre-processed dataset.
  • Share extensive qualitative and quantitative results & config files to generate them.

Reference

Please cite the paper below if you use this code in your research:

@article{ye2023cross,
  title={Cross-modality Data Augmentation for End-to-End Sign Language Translation},
  author={Ye, Jinhui and Jiao, Wenxiang and Wang, Xing and Tu, Zhaopeng and Xiong, Hui},
  journal={arXiv preprint arXiv:2305.11096},
  year={2023}
}
