
Problems with the training and generation scripts of CMLM #16

Open
JasmineChen123 opened this issue Sep 28, 2023 · 3 comments

Comments

@JasmineChen123

Hi, thank you for releasing the code!
I have a question about the provided bash scripts for training and inference.

The training script for CMLM+DSLP:

```bash
python3 train.py data-bin/wmt14.en-de_kd --source-lang en --target-lang de --save-dir checkpoints --eval-tokenized-bleu \
    --keep-interval-updates 5 --save-interval-updates 500 --validate-interval-updates 500 --maximize-best-checkpoint-metric \
    --eval-bleu-remove-bpe --eval-bleu-print-samples --best-checkpoint-metric bleu --log-format simple --log-interval 100 \
    --eval-bleu --eval-bleu-detok space --keep-last-epochs 5 --keep-best-checkpoints 5 --fixed-validation-seed 7 --ddp-backend=no_c10d \
    --share-all-embeddings --decoder-learned-pos --encoder-learned-pos --optimizer adam --adam-betas "(0.9,0.98)" --lr 0.0005 \
    --lr-scheduler inverse_sqrt --stop-min-lr 1e-09 --warmup-updates 10000 --warmup-init-lr 1e-07 --apply-bert-init --weight-decay 0.01 \
    --fp16 --clip-norm 2.0 --max-update 300000 --task translation_lev --criterion nat_loss --arch glat_sd --noise full_mask \
    --concat-yhat --concat-dropout 0.0 --label-smoothing 0.1 \
    --activation-fn gelu --dropout 0.1 --max-tokens 8192 \
    --length-loss-factor 0.1 --pred-length-offset
```

The `--arch glat_sd` flag looks wrong here. Should it be `cmlm_sd` or `cmlm_transformer`?

Another question: could you please share the generation script for CMLM with more than one decoding iteration, i.e. with `--iter-decode-max-iter 5` or `10`? I find that the BLEU at iter=5/10 is much worse than at iter=1.
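For context, a typical fairseq generation command for iterative-refinement NAT models looks roughly like the sketch below. The data path and checkpoint name are placeholders taken from the training command above, and the exact flag values this repo expects (e.g. `--iter-decode-eos-penalty`) are assumptions based on common fairseq CMLM usage, not confirmed by the maintainer:

```shell
# Hedged sketch: iterative decoding with a CMLM-style model in fairseq.
# Paths and flag values are illustrative; adjust to this repo's setup.
python3 generate.py data-bin/wmt14.en-de_kd \
    --path checkpoints/checkpoint_best.pt \
    --task translation_lev \
    --gen-subset test \
    --iter-decode-max-iter 10 \
    --iter-decode-eos-penalty 0 \
    --beam 1 \
    --remove-bpe \
    --print-step \
    --batch-size 64
```

Note that with `--iter-decode-max-iter 1` only a single mask-predict pass is run, while larger values trigger iterative refinement, so a large BLEU gap between the two settings may indicate the refinement steps are not configured as the model was trained.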

@cecilialeo77

Hello, I have the same question: which of the three should be used in the CMLM+DSLP training script, `cmlm_sd`, `cmlm_sd_ss`, or `cmlm_transformer`? Is it clear to you now? Thanks!

@chenyangh
Owner

@cecilialeo77 Hi, CMLM+DSLP should use `cmlm_sd`; `cmlm_sd_ss` is DSLP + Mixed Training.

@chenyangh
Owner

@JasmineChen123 Hi, yes, you are right: it should be `cmlm_sd`. My mistake in the script; I will fix it.
