Params for Training en-indic model #23

Closed
TarunTater opened this issue Aug 24, 2021 · 4 comments
TarunTater commented Aug 24, 2021

We are trying to replicate the results from the Samanantar IndicTrans paper. We are training the model for en-hi translations only, and are currently using these params, following the paper:

fairseq-train ../en_hi_4x/final_bin --max-source-positions=210 --max-target-positions=210 --save-interval-updates=10000 --arch=transformer_4x --criterion=label_smoothed_cross_entropy --source-lang=SRC --lr-scheduler=inverse_sqrt --target-lang=TGT --label-smoothing=0.1 --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 1.0 --warmup-init-lr 1e-07 --lr 0.0005 --warmup-updates 4000 --dropout 0.2 --save-dir ../en_hi_4x/model --keep-last-epochs 5 --patience 5 --skip-invalid-size-inputs-valid-test --fp16 --user-dir model_configs --wandb-project 'train_1' --max-tokens 300

Can you please share the params you have used for training the en-indic model or specifically if you have tried en-hi separately?

gowtham1997 (Member) commented Aug 26, 2021

hello,

We use the following command for the en-indic training.

fairseq-train <exp_dir folder>/final_bin \
--max-source-positions=210 \
--max-target-positions=210 \
--max-update=1000000 \
--save-interval=1 \
--arch=transformer_4x \
--criterion=label_smoothed_cross_entropy \
--source-lang=SRC \
--lr-scheduler=inverse_sqrt \
--target-lang=TGT \
--label-smoothing=0.1 \
--optimizer adam \
--adam-betas "(0.9, 0.98)" \
--clip-norm 1.0 \
--warmup-init-lr 1e-07 \
--lr 0.0005 \
--warmup-updates 4000 \
--dropout 0.2 \
--tensorboard-logdir <exp_dir folder>/tensorboard-wandb \
--save-dir <exp_dir folder>/model \
--keep-last-epochs 5 \
--patience 5 \
--skip-invalid-size-inputs-valid-test \
--fp16 \
--user-dir model_configs \
--wandb-project <project name> \
--update-freq=1 \
--distributed-world-size 4 \
--max-tokens 16384

^ For the results in our paper, we ensured that the effective batch size (max_tokens * distributed_world_size * update_freq) was ~64K tokens. We haven't tried training the 4x model only for en-hi.
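
For reference, plugging the flags from the command above into that formula gives:

16384 (max_tokens) * 4 (distributed_world_size) * 1 (update_freq) = 65536 ≈ 64K tokens per optimizer step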

TarunTater (Author) commented Aug 26, 2021

@gowtham1997 - thanks for sharing the params. Any specific reason for this: "we ensured max_tokens * distributed_world_size * update_freq = ~64K"? Is it for memory constraints?

gowtham1997 (Member) commented Aug 27, 2021

Sorry, I missed replying to this yesterday.

We observed that larger effective batch sizes utilized the GPUs fully and also gave better results in our initial experiments, hence we chose ~64K. Effective batch sizes > 64K would likely also help, but with time constraints in mind we chose to use ~64K for our paper.
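
If you are training on fewer GPUs, you can keep roughly the same effective batch size by raising --update-freq (fairseq's gradient accumulation factor). A small sketch of the arithmetic (a hypothetical helper, not part of the repository):

# Pick --update-freq so that max_tokens * num_gpus * update_freq is close to the
# target effective batch size (~64K tokens), following the formula above.
def pick_update_freq(max_tokens=16384, num_gpus=1, target=65536):
    return max(1, round(target / (max_tokens * num_gpus)))

print(pick_update_freq(num_gpus=1))  # 4 -> use --update-freq=4 on a single GPU
print(pick_update_freq(num_gpus=4))  # 1 -> matches the command above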

TarunTater (Author)

Okay, got it. Thank you for the info.
