Pipelines for Training and Evaluation

Data-to-text (DART)

Folder: /dart/
Dataset: dart (Google drive link)
Teacher model
- Init: bart-large
- Fine-tune: scripts/run_finetune_teacher.sh (Google drive link)
Student init model:
- Pre-train init student model: scripts/run_pretrain_student_distill.sh (Google drive link)
Generate pseudo-targets with the teacher model (useful for Seqkd, JS, TVD)
- scripts/run_teacher_label_all.sh (Google drive link), then replace train.json with the teacher outputs
Run KD methods (/scripts/)
- Seqkd: run_seqkd.sh
- ENGINE: run_engine.sh
- RKL: run_rkl.sh
- KL: run_kl_sample.sh
- JS: run_js.sh
- TVD: run_tvd_symm.sh
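The KD scripts above differ mainly in which divergence the student minimizes against the teacher. As a rough illustration only, the sketch below computes the four divergences for a single pair of next-token distributions. It is not the repository's implementation, which works on whole output sequences. It assumes the common convention that KL means KL(teacher || student) and RKL means KL(student || teacher). The function names and example distributions are hypothetical.

```python
import math
from typing import Sequence

def _check(p: Sequence[float], q: Sequence[float]) -> None:
    # Both arguments must be distributions over the same vocabulary.
    if len(p) != len(q):
        raise ValueError("distributions must have the same support size")

def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i); assumes q_i > 0 wherever p_i > 0."""
    _check(p, q)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def rkl(p: Sequence[float], q: Sequence[float]) -> float:
    """Reverse KL, i.e. KL(q || p)."""
    return kl(q, p)

def js(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence against the mixture m = (p + q) / 2."""
    _check(p, q)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def tvd(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance: half the L1 distance between p and q."""
    _check(p, q)
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

# Hypothetical next-token distributions: p = teacher, q = student.
teacher = [0.7, 0.2, 0.1]
student = [0.5, 0.3, 0.2]
for name, fn in [("KL", kl), ("RKL", rkl), ("JS", js), ("TVD", tvd)]:
    print(f"{name}: {fn(teacher, student):.4f}")
```

KL and RKL are asymmetric, so swapping the arguments changes the value. JS and TVD are symmetric and bounded: JS by ln 2 and TVD by 1. For example, `tvd(teacher, student)` is 0.2 up to floating-point rounding.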
Decode (/scripts/) (Google drive link)
- run_eval.sh
Eval
- cd /evaluation/ (Google drive link)
- sh ./run_eval_on_dart.sh (first set $OUTPUT_FILE and download the bert-base-uncased model)
Calculate coverage loss (perplexity of the teacher outputs under the student model)
- python3 dart/run_calc_ppl.py --reference_path [teacher_output_path] --input_path [input_path] --model_name [student_model_path] --save_path /tmp/
Calculate likelihood loss (perplexity of the student outputs under the teacher model)
- python3 dart/run_calc_ppl.py --model_name [teacher_model_path] --input_path [input_path] --reference_path [student_output_path] --save_path /tmp/
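Both commands report a perplexity with the roles of the two models swapped. Coverage scores the teacher's outputs with the student, and likelihood scores the student's outputs with the teacher. The sketch below assumes perplexity is the exponentiated mean per-token negative log-likelihood over the whole output file. This README does not say whether `run_calc_ppl.py` averages per token or per sequence. Both helper names are hypothetical.

```python
import math
from typing import List, Sequence

def token_logprobs(logits: Sequence[Sequence[float]], target_ids: Sequence[int]) -> List[float]:
    """Log-softmax each row of logits and pick the log-probability of the target token."""
    out = []
    for row, y in zip(logits, target_ids):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        out.append(row[y] - log_z)
    return out

def perplexity(per_sequence_logprobs: Sequence[Sequence[float]]) -> float:
    """exp(total NLL / total tokens) over all scored output sequences."""
    total_nll = 0.0
    n_tokens = 0
    for seq in per_sequence_logprobs:
        total_nll -= sum(seq)
        n_tokens += len(seq)
    if n_tokens == 0:
        raise ValueError("nothing to score")
    return math.exp(total_nll / n_tokens)

# Two outputs scored by some model with uniform logits over 4 tokens,
# so every token has probability 1/4 and the perplexity is 4.
scores = [token_logprobs([[0.0] * 4] * 3, [0, 1, 2]),
          token_logprobs([[0.0] * 4] * 2, [3, 3])]
print(perplexity(scores))  # approximately 4.0
```

With a real model, the logits would come from the scoring model run on the other model's outputs. Use the student for coverage and the teacher for likelihood.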

Summarization

Folder: /summa/
Dataset: xsum (wget https://cdn-datasets.huggingface.co/summarization/xsum.tar.gz)
Teacher model: https://huggingface.co/facebook/bart-large-xsum
Student init model:
- Pre-train init student model: run_pretrain_student_distill.sh (Google drive link)
Generate pseudo-targets with the teacher model (useful for Seqkd, JS, TVD)
- run_teacher_label.sh (Google drive link), then replace train.target with the teacher outputs
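Replacing the training targets by hand is easy to get wrong if the teacher output is truncated or out of order. The sketch below assumes the xsum data uses the line-aligned `train.source` / `train.target` layout, with one example per line. It also assumes the teacher's decoded outputs are in a single text file with the same line order. The function name, the teacher-output path, and the `train.target.gold` backup name are all hypothetical.

```python
import shutil
from pathlib import Path

def count_lines(path: Path) -> int:
    with path.open(encoding="utf-8") as f:
        return sum(1 for _ in f)

def replace_targets(data_dir: str, teacher_output: str,
                    backup_name: str = "train.target.gold") -> None:
    """Overwrite train.target with teacher pseudo-targets after checking alignment.

    The original gold targets are copied to `backup_name` once, so they can be restored.
    """
    data = Path(data_dir)
    source, target, pseudo = data / "train.source", data / "train.target", Path(teacher_output)
    n_src, n_pseudo = count_lines(source), count_lines(pseudo)
    if n_src != n_pseudo:
        raise ValueError(f"{pseudo} has {n_pseudo} lines, but {source} has {n_src}")
    backup = data / backup_name
    if not backup.exists():  # never overwrite an earlier backup of the gold targets
        shutil.copyfile(target, backup)
    shutil.copyfile(pseudo, target)
```

Keeping the gold targets in a backup lets the same data directory be reused for methods that train on the original references.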
Run KD methods (/scripts/)
- Seqkd: run_seqkd.sh
- ENGINE: run_engine.sh
- RKL: run_rkl.sh
- KL: run_kl.sh
- JS: run_js.sh
- TVD: run_tvd_symm.sh
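At the sequence level, KL and RKL can be estimated by Monte Carlo sampling. KL(teacher || student) averages log p_T(y) - log q_S(y) over outputs sampled from the teacher. RKL averages the reverse log-ratio over outputs sampled from the student. The toy sketch below replaces both models with hand-written distributions over three complete outputs, using hypothetical values, so the estimates can be checked against the exact sums. It only illustrates the estimator and does not describe how the scripts above sample.

```python
import math
import random
from typing import Dict

def exact_kl(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Exact KL(p || q) over a small, fully enumerated output space."""
    return sum(pv * math.log(pv / q[y]) for y, pv in p.items() if pv > 0)

def mc_kl(p: Dict[str, float], q: Dict[str, float], n: int, rng: random.Random) -> float:
    """Estimate KL(p || q) = E_{y~p}[log p(y) - log q(y)] from n samples drawn from p."""
    outputs = list(p)
    samples = rng.choices(outputs, weights=[p[y] for y in outputs], k=n)
    return sum(math.log(p[y]) - math.log(q[y]) for y in samples) / n

# Hypothetical sequence-level distributions standing in for teacher and student.
teacher = {"a cat sat": 0.6, "the cat sat": 0.3, "a dog ran": 0.1}
student = {"a cat sat": 0.4, "the cat sat": 0.4, "a dog ran": 0.2}

rng = random.Random(0)
kl_est = mc_kl(teacher, student, 100_000, rng)   # KL: sample from the teacher
rkl_est = mc_kl(student, teacher, 100_000, rng)  # RKL: sample from the student
print(f"KL  estimate {kl_est:.4f} vs exact {exact_kl(teacher, student):.4f}")
print(f"RKL estimate {rkl_est:.4f} vs exact {exact_kl(student, teacher):.4f}")
```

The two directions need samples from different models. This is one reason the pipelines generate teacher outputs ahead of time for some methods.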
Decode and Evaluate
- run eval.sh
Calculate coverage loss (perplexity of the teacher outputs under the student model)
- python3 summa/run_calc_ppl.py --reference_path [teacher_output_path] --input_path [input_path] --model_name [student_model_path] --save_path /tmp/
Calculate likelihood loss (perplexity of the student outputs under the teacher model)
- python3 summa/run_calc_ppl.py --model_name [teacher_model_path] --input_path [input_path] --reference_path [student_output_path] --save_path /tmp/

Machine Translation (WMT16 EN-RO, 100k training data)

Folder: /t5mt/
Dataset: wmt_en_ro_100k (Google drive link)
Teacher model
- Init: t5-base
- Fine-tune: scripts_sm/run_finetune_teacher.sh
Student init model:
- Pre-train init student model: scripts_sm/run_pretrain_student_distill.sh (Google drive link)
Generate pseudo-targets with the teacher model (useful for Seqkd, JS, TVD)
- scripts_sm/run_teacher_label.sh (Google drive link), then replace train.target with the teacher outputs
Run KD methods (/scripts/)
- Seqkd: run_seqkd.sh
- ENGINE: run_engine.sh
- RKL: run_rkl.sh
- KL: run_kl.sh
- JS: run_js.sh
- TVD: run_tvd_symm.sh
Decode and eval
- sh scripts_sm/run_eval.sh
Calculate coverage loss (perplexity of the teacher outputs under the student model)
- python3 t5mt/run_calc_ppl.py --reference_path [teacher_output_path] --input_path [input_path] --model_name [student_model_path] --save_path /tmp/
Calculate likelihood loss (perplexity of the student outputs under the teacher model)
- python3 t5mt/run_calc_ppl.py --model_name [teacher_model_path] --input_path [input_path] --reference_path [student_output_path] --save_path /tmp/

Chat

Folder: /chat/
Dataset: Commonsense-Dialogues (Google drive link)
Teacher model
- Init: microsoft/DialoGPT-medium
- Fine-tune: scripts/run_finetune_teacher.sh (Google drive link)
Student init model:
- Pre-train init student model: scripts/run_pretrain_student_distill.sh (Google drive link)
Generate pseudo-targets with the teacher model (useful for Seqkd, JS, TVD)
- scripts/run_teacher_label.sh (Google drive link), then replace train.target with the teacher outputs
Run KD methods (/scripts/)
- Seqkd: run_seqkd.sh
- ENGINE: run_engine.sh
- RKL: run_rkl.sh
- KL: run_kl.sh
- JS: run_js.sh
- TVD: run_tvd_symm.sh
Decode and eval
- sh scripts/run_eval.sh (first download the bert-base-uncased model)
Calculate coverage loss (perplexity of the teacher outputs under the student model)
- python3 chat/run_calc_ppl.py --reference_path [teacher_output_path] --input_path [input_path] --model_name [student_model_path] --save_path /tmp/
Calculate likelihood loss (perplexity of the student outputs under the teacher model)
- python3 chat/run_calc_ppl.py --model_name [teacher_model_path] --input_path [input_path] --reference_path [student_output_path] --save_path /tmp/

Acknowledgements

The methods in our codebase are mainly implemented with the PyTorch and Huggingface's Transformers libraries.
The pre-distillation part is based on the method proposed in Shleifer & Rush (2020) and their implementation.
We use Maluuba's nlg-eval to measure the BLEU score for the dialogue task.

MANGA-UOFA/fdistill
