Params for Training en-indic model #23

Closed
TarunTater opened this issue Aug 24, 2021 · 4 comments
TarunTater commented Aug 24, 2021

We are trying to replicate the results from the Samanantar IndicTrans paper. We are training the model for en-hi translations only, and are currently using these params, following the paper:

fairseq-train ../en_hi_4x/final_bin --max-source-positions=210 --max-target-positions=210 --save-interval-updates=10000 --arch=transformer_4x --criterion=label_smoothed_cross_entropy --source-lang=SRC --lr-scheduler=inverse_sqrt --target-lang=TGT --label-smoothing=0.1 --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 1.0 --warmup-init-lr 1e-07 --lr 0.0005 --warmup-updates 4000 --dropout 0.2 --save-dir ../en_hi_4x/model --keep-last-epochs 5 --patience 5 --skip-invalid-size-inputs-valid-test --fp16 --user-dir model_configs --wandb-project 'train_1' --max-tokens 300

Can you please share the params you have used for training the en-indic model or specifically if you have tried en-hi separately?

gowtham1997 (Member) commented Aug 26, 2021

hello,

We use the following command for the en-indic training.

fairseq-train <exp_dir folder>/final_bin \
--max-source-positions=210 \
--max-target-positions=210 \
--max-update=1000000 \
--save-interval=1 \
--arch=transformer_4x \
--criterion=label_smoothed_cross_entropy \
--source-lang=SRC \
--lr-scheduler=inverse_sqrt \
--target-lang=TGT \
--label-smoothing=0.1 \
--optimizer adam \
--adam-betas "(0.9, 0.98)" \
--clip-norm 1.0 \
--warmup-init-lr 1e-07 \
--lr 0.0005 \
--warmup-updates 4000 \
--dropout 0.2 \
--tensorboard-logdir <exp_dir folder>/tensorboard-wandb \
--save-dir <exp_dir folder>/model \
--keep-last-epochs 5 \
--patience 5 \
--skip-invalid-size-inputs-valid-test \
--fp16 \
--user-dir model_configs \
--wandb-project <project name> \
--update-freq=1 \
--distributed-world-size 4 \
--max-tokens 16384

^ For the results in our paper, we ensured that the effective batch size (max_tokens * distributed_world_size * update_freq) was ~64K tokens. We haven't tried training the 4x model only for en-hi.
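
For reference, plugging the flags from the command above into that formula gives:

16384 (max_tokens) * 4 (distributed_world_size) * 1 (update_freq) = 65536 ≈ 64K tokens per optimizer step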

TarunTater (Author) commented Aug 26, 2021

@gowtham1997 - thanks for sharing the params. Any specific reason for this: "we ensured max_tokens * distributed_world_size * update_freq = ~64K"? Is it for memory constraints?

gowtham1997 (Member) commented Aug 27, 2021

Sorry, I missed replying to this yesterday.

We observed that larger effective batch sizes utilized the GPUs fully and also gave better results in our initial experiments, hence we chose ~64K. Effective batch sizes > 64K would likely also help, but with time constraints in mind we chose to use ~64K for our paper.
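
If you are training on fewer GPUs, you can keep roughly the same effective batch size by raising --update-freq (fairseq's gradient accumulation factor). A small sketch of the arithmetic (a hypothetical helper, not part of the repository):

# Pick --update-freq so that max_tokens * num_gpus * update_freq is close to the
# target effective batch size (~64K tokens), following the formula above.
def pick_update_freq(max_tokens=16384, num_gpus=1, target=65536):
    return max(1, round(target / (max_tokens * num_gpus)))

print(pick_update_freq(num_gpus=1))  # 4 -> use --update-freq=4 on a single GPU
print(pick_update_freq(num_gpus=4))  # 1 -> matches the command above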

TarunTater (Author)

Okay, got it. Thank you for the info.
