
Could you share your pretrained ngram LM? #14

Closed
davyuan opened this issue May 15, 2022 · 2 comments

davyuan commented May 15, 2022

Hello,

I'm following the link below to train a 6-gram LM for decoding. I downloaded the LibriSpeech corpus and used NeMo's CTC Conformer medium model to train it. However, I'm not seeing any improvement in WER compared to greedy search; the results actually got worse.

https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/asr_language_modeling.html

If you could share your detailed steps for training the 6-gram LM, or your pretrained model, it would be most helpful!

David

burchim commented May 15, 2022

Hi David,

I added the missing 6-gram to the shared folders.
https://drive.google.com/drive/folders/1ZhevurjySBT_WMD6Q79g86XUJnTQ8VPa

It was also trained using NeMo's ngram script, by encoding the LibriSpeech corpus with special characters to support byte pair encoding.
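Roughly, that encoding step looks like the sketch below. This is only an illustration: the `text_to_ids` method assumes a NeMo-style BPE tokenizer, and the offset value and joining convention are placeholders rather than the exact constants from NeMo's ngram script.

```python
# Rough sketch of the corpus encoding for BPE-level KenLM training.
# Assumptions: a NeMo-style tokenizer exposing text_to_ids(), and a fixed
# offset into the printable unicode range (100 here is illustrative, not
# necessarily NeMo's exact constant). Each BPE token id becomes a single
# unicode character, so KenLM sees token-level "words".
TOKEN_OFFSET = 100  # illustrative offset

def encode_line(tokenizer, line: str) -> str:
    token_ids = tokenizer.text_to_ids(line.lower())
    # Space-separated so each encoded token is one KenLM token
    # (the exact joining convention follows NeMo's script).
    return " ".join(chr(tid + TOKEN_OFFSET) for tid in token_ids)

# Every LibriSpeech transcript is encoded this way into a text file,
# and KenLM then builds the 6-gram on that encoded corpus.
```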
You should be able to improve the model score by tuning the alpha and beta hyper-parameters!
Default params used in the paper are in the configs.
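If you want to automate that tuning, here is a minimal grid-search sketch. It uses pyctcdecode as a stand-in CTC + KenLM beam-search decoder purely for illustration (not the decoding setup from the configs), and `labels`, `lm_path`, `dev_logits`, `dev_refs`, and `wer_fn` are placeholders for your own vocabulary, LM file, dev-set CTC outputs, reference transcripts, and WER function.

```python
# Alpha/beta grid-search sketch. pyctcdecode is used as a generic
# CTC + KenLM beam-search decoder for illustration only.
import itertools
import numpy as np
from pyctcdecode import build_ctcdecoder

def tune_alpha_beta(labels, lm_path, dev_logits, dev_refs, wer_fn):
    """Return the (alpha, beta) pair with the lowest WER on a dev set."""
    best_alpha, best_beta, best_wer = None, None, float("inf")
    for alpha, beta in itertools.product([0.5, 1.0, 1.5, 2.0],
                                         [0.0, 0.5, 1.0, 1.5]):
        decoder = build_ctcdecoder(labels, kenlm_model_path=lm_path,
                                   alpha=alpha, beta=beta)
        # dev_logits: one (time, vocab) array of CTC log-probs per utterance
        hyps = [decoder.decode(np.asarray(logits)) for logits in dev_logits]
        wer = wer_fn(hyps, dev_refs)
        if wer < best_wer:
            best_alpha, best_beta, best_wer = alpha, beta, wer
    return best_alpha, best_beta, best_wer
```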

Best,
Maxime

davyuan commented May 16, 2022

Thanks Maxime!

burchim closed this as completed on Jun 3, 2022