
Could you share your pretrained ngram LM? #14

Closed
davyuan opened this issue May 15, 2022 · 2 comments

davyuan commented May 15, 2022

Hello,

I'm following the link below to train a 6-gram LM for decoding. I downloaded the LibriSpeech corpus and used NeMo's CTC Conformer medium model to train it. However, I'm not seeing any improvement in WER compared to greedy search; the results actually got worse.

https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/asr_language_modeling.html

If you could share your detailed steps for training the 6-gram LM, or your pretrained model, it would be most helpful!

David

burchim commented May 15, 2022

Hi David,

I added the missing 6-gram to the shared folders.
https://drive.google.com/drive/folders/1ZhevurjySBT_WMD6Q79g86XUJnTQ8VPa

It was also trained using NeMo's ngram script, by encoding the LibriSpeech corpus with special characters to support byte pair encoding.
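Roughly, that encoding step looks like the sketch below. This is only an illustration: the `text_to_ids` method assumes a NeMo-style BPE tokenizer, and the offset value and joining convention are placeholders rather than the exact constants from NeMo's ngram script.

```python
# Rough sketch of the corpus encoding for BPE-level KenLM training.
# Assumptions: a NeMo-style tokenizer exposing text_to_ids(), and a fixed
# offset into the printable unicode range (100 here is illustrative, not
# necessarily NeMo's exact constant). Each BPE token id becomes a single
# unicode character, so KenLM sees token-level "words".
TOKEN_OFFSET = 100  # illustrative offset

def encode_line(tokenizer, line: str) -> str:
    token_ids = tokenizer.text_to_ids(line.lower())
    # Space-separated so each encoded token is one KenLM token
    # (the exact joining convention follows NeMo's script).
    return " ".join(chr(tid + TOKEN_OFFSET) for tid in token_ids)

# Every LibriSpeech transcript is encoded this way into a text file,
# and KenLM then builds the 6-gram on that encoded corpus.
```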
You should be able to improve the model score by tuning the alpha and beta hyper-parameters!
Default params used in the paper are in the configs.
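If you want to automate that tuning, here is a minimal grid-search sketch. It uses pyctcdecode as a stand-in CTC + KenLM beam-search decoder purely for illustration (not the decoding setup from the configs), and `labels`, `lm_path`, `dev_logits`, `dev_refs`, and `wer_fn` are placeholders for your own vocabulary, LM file, dev-set CTC outputs, reference transcripts, and WER function.

```python
# Alpha/beta grid-search sketch. pyctcdecode is used as a generic
# CTC + KenLM beam-search decoder for illustration only.
import itertools
import numpy as np
from pyctcdecode import build_ctcdecoder

def tune_alpha_beta(labels, lm_path, dev_logits, dev_refs, wer_fn):
    """Return the (alpha, beta) pair with the lowest WER on a dev set."""
    best_alpha, best_beta, best_wer = None, None, float("inf")
    for alpha, beta in itertools.product([0.5, 1.0, 1.5, 2.0],
                                         [0.0, 0.5, 1.0, 1.5]):
        decoder = build_ctcdecoder(labels, kenlm_model_path=lm_path,
                                   alpha=alpha, beta=beta)
        # dev_logits: one (time, vocab) array of CTC log-probs per utterance
        hyps = [decoder.decode(np.asarray(logits)) for logits in dev_logits]
        wer = wer_fn(hyps, dev_refs)
        if wer < best_wer:
            best_alpha, best_beta, best_wer = alpha, beta, wer
    return best_alpha, best_beta, best_wer
```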

Best,
Maxime

davyuan commented May 16, 2022

Thanks Maxime!

burchim closed this as completed on Jun 3, 2022