
about the training details and asking for model checkpoint #2

Closed
ChangLee0903 opened this issue Apr 7, 2021 · 6 comments
@ChangLee0903

Hi @archiki,
I appreciate this work very much, and thanks for providing the implementation. Could you please tell me how long the training takes? Also, did you use a model checkpoint trained on the clean corpus as the initial parameters when training the robust ASR? Could you share your checkpoints?

best,
Chi-Chang Lee

@archiki
Owner

archiki commented Apr 7, 2021

Hey @ChangLee0903 !

Thanks for the feedback! I am not sure what you mean by training cost. I did not use state-of-the-art GPUs (NVIDIA GeForce GTX 1080 Ti), and most experiments took about 12-15 hours on a single GPU, depending on the experiment. Yes, I started from a clean-speech checkpoint to train the robust ASR (to save compute and to start from a reasonable ASR system). This link points to the training command used for the base clean-speech model, along with the download link for the checkpoint. Hope this clarifies your questions.
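For readers following along: the repo builds on deepspeech.pytorch, where resuming from a pretrained checkpoint is typically done with `--continue-from`. A rough sketch of what starting robust-ASR training from a clean-speech checkpoint could look like (flag names follow deepspeech.pytorch conventions; all paths and values are placeholders, not the repo's exact command):

```shell
# Sketch only: fine-tune from a clean-speech checkpoint
# (deepspeech.pytorch-style flags; paths/epochs are hypothetical).
python train.py \
  --train-manifest data/train_noisy_manifest.csv \
  --val-manifest data/val_manifest.csv \
  --continue-from models/librispeech_pretrained.pth \
  --finetune \
  --cuda \
  --epochs 15
```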

@ChangLee0903
Author

Thanks a lot for your fast reply; all of my questions have been answered. I'll cite you when my paper is published.

@ChangLee0903
Author

Hi @archiki,
I tested your pretrained model with the LM you pointed to. Since I could not find the LM link, I downloaded one from LibriSpeech's official website: https://www.openslr.org/resources/11/4-gram.arpa.gz
The WER (7.334) and CER (3.028) I got differ from your results.
Also, trainEnhanced.py's arguments differ from those of the pretrained checkpoint; I guess all the arguments I need can follow the pretrained checkpoint's settings, right?

best,
Chi-Chang Lee
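For reference, the WER and CER numbers discussed here are normalized edit distances over words and characters respectively. A minimal sketch of how they are typically computed (illustrative helpers, not the repo's own scorer):

```python
# Minimal WER/CER via Levenshtein distance (illustrative only;
# the repo's evaluation script may normalize text differently).

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, O(len(hyp)) memory."""
    dp = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                            # deletion
                        dp[j - 1] + 1,                        # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))    # substitution
            prev = cur
    return dp[-1]

def wer(ref, hyp):
    """Word error rate: word-level edit distance / reference length."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())

def cer(ref, hyp):
    """Character error rate: char-level edit distance / reference length."""
    return edit_distance(ref, hyp) / len(ref)
```

Small differences in the decoder's output (e.g. from beam-search settings) change these ratios directly, which is why checkpoint-identical models can still report different WER/CER.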

@archiki
Owner

archiki commented Apr 8, 2021

The difference in WER and CER numbers must be due to differences in beam-decoding parameters such as --beam-width, --alpha, and --beta. Let me know which arguments you are talking about; if I remember correctly, only the noise-injection-related arguments should differ. In that case, please use the arguments in trainEnhanced.py. The remaining arguments, such as --hidden-size and --hidden-layers, are inherited from the pretrained checkpoint.
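As an illustration of where those decoding parameters enter, a deepspeech.pytorch-style evaluation command might look like the sketch below. The alpha/beta/beam-width values are placeholders, not the tuned settings from this repo; paths are hypothetical:

```shell
# Sketch only: beam decoding with a KenLM language model.
# --alpha (LM weight), --beta (word bonus), and --beam-width
# all shift WER/CER; values here are illustrative placeholders.
python test.py \
  --model-path models/robust_asr.pth \
  --test-manifest data/test_clean_manifest.csv \
  --decoder beam \
  --lm-path 4-gram.arpa \
  --beam-width 128 \
  --alpha 1.97 \
  --beta 4.36 \
  --cuda
```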

@ChangLee0903
Author

ChangLee0903 commented Apr 8, 2021

Hi @archiki,

Thanks for your reply. The arguments I am concerned about are "--learning-anneal 1.01 --batch-size 64 --no-sortaGrad --opt-level O1 --loss-scale 1". Are these the same as for the pretrained checkpoints? Also, did you keep the log files of your training runs? I noticed that the loss in trainTLNNoisy increases at the beginning. Is that normal?

best,
Chi-Chang Lee

@archiki
Owner

archiki commented Apr 8, 2021

Yes, @ChangLee0903, these arguments are taken from the checkpoint. Note: --batch-size depends on your dataset, GPU memory, and type of compute, and can be reduced to 32 or 16 if need be; --learning-anneal depends on the training profile and might differ for a different dataset.
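Putting the arguments from this exchange together, a trainEnhanced.py run might look like the following sketch. The manifest and checkpoint paths are placeholders, and as noted above, --batch-size may need lowering on smaller GPUs:

```shell
# Sketch only: the arguments discussed in this thread, with
# hypothetical paths. --opt-level O1 / --loss-scale 1 are
# Apex mixed-precision settings; --no-sortaGrad disables
# length-sorted first-epoch batching.
python trainEnhanced.py \
  --train-manifest data/train_noisy_manifest.csv \
  --val-manifest data/val_manifest.csv \
  --continue-from models/librispeech_pretrained.pth \
  --learning-anneal 1.01 \
  --batch-size 64 \
  --no-sortaGrad \
  --opt-level O1 \
  --loss-scale 1 \
  --cuda
```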

@archiki archiki closed this as completed Apr 10, 2021