This is the PyTorch implementation of the paper "Transcribing Natural Languages for the Deaf via Neural Editing Programs", built on the Hugging Face Transformers library.
Create a conda environment and install the dependency packages:
conda create -n editatt python=3.8
conda activate editatt
pip3 install torch torchvision torchaudio
pip install transformers
pip install tensorboardX
pip install tokenizers
pip install nltk
pip install rouge
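As a quick sanity check after installation, a sketch like the following can confirm that the packages above are importable (missing_packages is a hypothetical helper, not part of this repo):

```python
# Sketch: check that the packages installed above are importable.
# missing_packages is a hypothetical helper, not part of this repo.
import importlib.util

REQUIRED = ["torch", "torchvision", "torchaudio", "transformers",
            "tensorboardX", "tokenizers", "nltk", "rouge"]

def missing_packages(names=REQUIRED):
    """Return the names from the install list that Python cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]

missing = missing_packages()
if missing:
    print("missing packages:", ", ".join(missing))
```

An empty result means all of the listed packages resolved.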
Download the CSL dataset into ./ (the current project directory) from Hugging Face.
The dataset directory structure is as follows:
CSL_data
|-- CSL-Daily.txt
|-- CSL-Daily_editing_chinese.txt
|-- CSL-Daily_editing_chinese_past.txt
|-- CSL-Daily_editing_chinese_test.txt
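Before training, it can help to confirm the layout above is in place. The helper below is a hypothetical sketch; the file names are taken from the listing above:

```python
# Sketch: verify the CSL_data layout described above.
# missing_dataset_files is a hypothetical helper, not part of this repo.
from pathlib import Path

EXPECTED_FILES = [
    "CSL-Daily.txt",
    "CSL-Daily_editing_chinese.txt",
    "CSL-Daily_editing_chinese_past.txt",
    "CSL-Daily_editing_chinese_test.txt",
]

def missing_dataset_files(root="CSL_data"):
    """Return the expected dataset files that are not present under root."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty return value means the dataset is complete.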
Train with the editing causal mask and the Executor:
python train.py --share_target_embeddings --use_pre_trained_embedding
Train without the editing causal mask and the Executor:
python train_wo_edit_casual_mask.py --share_target_embeddings --use_pre_trained_embedding
Download the model checkpoints trained on the CSL dataset into ./ (the current project directory) from Hugging Face.
The checkpoint directory structure is as follows:
|--output_wo_mask
|-- models
|-- best_model.pt
|-- global_step.pt
|-- last_model.pt
|--output
|-- models
|-- best_model.pt
|-- global_step.pt
|-- last_model.pt
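A small sketch for picking a checkpoint from either layout above (select_checkpoint is a hypothetical helper; the file names come from the listing):

```python
# Sketch: select a checkpoint from the directory layout above.
# select_checkpoint is a hypothetical helper, not part of this repo.
from pathlib import Path

def select_checkpoint(output_dir="output"):
    """Prefer best_model.pt, falling back to last_model.pt."""
    models = Path(output_dir) / "models"
    for name in ("best_model.pt", "last_model.pt"):
        path = models / name
        if path.is_file():
            return path
    raise FileNotFoundError(f"no checkpoint found under {models}")
```

The returned path can then be loaded with torch.load (pass map_location="cpu" on CPU-only machines).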
You can then use the model checkpoints to run inference on any input sentence:
python inference.py --input=<the input sentence you want to inference> --max_output_len=<the max output length of predicted editing program>
python inference_wo_edit_casual_mask.py --input=<the input sentence you want to inference> --max_output_len=<the max output length of predicted editing program>
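For scripted use, the inference scripts can also be driven from Python; run_inference below is a hypothetical wrapper around the command shown above, not part of this repo:

```python
# Sketch: call the repo's inference script programmatically.
# run_inference is a hypothetical wrapper, not part of this repo.
import subprocess
import sys

def run_inference(sentence, script="inference.py", max_output_len=50):
    """Run one sentence through the inference script and return its stdout."""
    result = subprocess.run(
        [sys.executable, script,
         f"--input={sentence}",
         f"--max_output_len={max_output_len}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Looping this helper over a list of sentences gives simple batch inference; swap in inference_wo_edit_casual_mask.py via the script argument for the no-mask variant.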