This issue was moved to a discussion; the conversation continues there.
Citrinet model with LM to reduce the WER for microphone-recorded audio #2039
Comments
You could fine-tune Citrinet using the same tokenizer on the specific domain (if there is sufficient data). If you have some noise files, noise-robust training can be applied to Citrinet with the same method as for QuartzNet. For preprocessing, the inputs should be mono-channel 16 kHz WAV files. We find that attempting signal denoising before inference generally does not help much, and sometimes does worse because of the artifacts it introduces.

For language modelling with Citrinet (and BPE models in general), we plan to release code snippets to build a custom KenLM model and run beam search through steps similar to the offline ASR notebook. However, there are some significant differences and we have not compiled a clean script for this task yet. I will try to prioritize that in the coming weeks. There is also Transformer-based rescoring that can further reduce offline WER, though that pipeline is not ready yet.

@AlexGrinch is there any ETA (within some months?) for the Transformer-based rescoring pipeline?
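As a side note on the preprocessing requirement mentioned above: a quick way to verify that your recordings already match the expected input format is a small stdlib check like the sketch below. The function name is mine, not part of NeMo; conversion itself can be done with a tool such as ffmpeg (`ffmpeg -i in.wav -ac 1 -ar 16000 out.wav`).

```python
import wave

def is_asr_ready(path):
    """Check that a WAV file matches the format described above for
    Citrinet inputs: mono channel, 16 kHz sample rate, 16-bit PCM.

    Note: this only inspects the header; it does not convert the file.
    """
    with wave.open(path, "rb") as w:
        return (w.getnchannels() == 1        # mono
                and w.getframerate() == 16000  # 16 kHz
                and w.getsampwidth() == 2)     # 16-bit samples
```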
Thank you for the response.
@VahidooX If you have a rough draft, could you create a gist and share it here when it's ready? We can clean it up in the actual PR.
Can someone please share some details on this? Waiting for a response.
Created a PR adding support for training and evaluating an n-gram KenLM on top of BPE-based ASR models. It still needs documentation: #2066
The PR adding N-gram LM support for ASR models is merged: #2066. You need to install the beam search decoders and KenLM to use this feature.
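The merged PR provides NeMo's actual scripts; purely as an illustration of the underlying idea (not the NeMo code), n-gram LM fusion rescores beam-search hypotheses by combining the acoustic log-probability with a weighted LM log-probability and a word-insertion bonus. The `alpha`/`beta` names and the toy `lm_score` callable here are illustrative assumptions:

```python
def rescore(hypotheses, lm_score, alpha=0.5, beta=1.0):
    """Rank beam-search hypotheses by a shallow-fusion score:

        score = acoustic_logprob + alpha * lm_logprob + beta * num_words

    hypotheses: list of (text, acoustic_logprob) pairs.
    lm_score:   callable returning a log-probability for a text string
                (e.g. backed by a KenLM model in a real pipeline).
    """
    def combined(hyp):
        text, am_logprob = hyp
        return am_logprob + alpha * lm_score(text) + beta * len(text.split())
    # Higher combined score ranks first.
    return sorted(hypotheses, key=combined, reverse=True)
```

In a real setup the LM weight `alpha` and word-insertion weight `beta` are tuned on a dev set; an LM can flip the ranking when the acoustic model slightly prefers an ungrammatical hypothesis.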
Thank you very much. |
Hi,
I am using the stt_en_citrinet_1024 model and able to get good transcripts. With microphone-recorded audio the WER varies from 3.5% to 15%. The audio contains names of people and places; how can I include those words in the model?
Any suggestions on the following aspects would be helpful; looking for inputs.