Training FastConformer-CTC on another language #8123
-
Hi! What config is best suited for this case? Also, should I go for subword tokenization for a language such as this, or does the standard BPE tokenizer provided in the scripts suffice? Any guidance would be much appreciated. Thanks!
-
@nithinraok wrote a post about a Telugu speech model here: https://blogs.nvidia.com/blog/speech-ai-telugu-language-breakthrough/. He could provide you with specific advice.
-
Overall, yes, use subwords (SPE unigram or plain SPE BPE). For the config, use FastConformer large, and try to fine-tune a pretrained checkpoint rather than train from scratch; it will almost always give better results. The ASR CTC fine-tuning tutorial covers most of the fine-tuning tricks that @nithinraok applied.
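For concreteness, here is a minimal sketch of that workflow using NeMo's Python API. The checkpoint name (`stt_en_fastconformer_ctc_large`), tokenizer directory, manifest filenames, vocabulary size, and trainer settings below are illustrative assumptions, not values from this thread; verify the exact checkpoint name on NGC or Hugging Face before use:

```python
# Minimal fine-tuning sketch; all paths, manifests, and hyperparameters
# are illustrative placeholders to adapt to your data.
import pytorch_lightning as pl
from omegaconf import OmegaConf

import nemo.collections.asr as nemo_asr

# Step 1 (offline): build an SPE unigram tokenizer from your corpus, e.g.
#   python scripts/tokenizers/process_asr_text_tokenizer.py \
#       --manifest=train_manifest.json \
#       --data_root=tokenizers/ \
#       --vocab_size=1024 \
#       --tokenizer=spe \
#       --spe_type=unigram

# Step 2: start from a pretrained FastConformer-CTC large checkpoint
# rather than training from scratch.
asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
    model_name="stt_en_fastconformer_ctc_large"  # assumed name; check NGC/HF
)

# Step 3: swap in the new language's tokenizer. NeMo uses the type "bpe"
# for any SentencePiece tokenizer directory, including unigram models.
asr_model.change_vocabulary(
    new_tokenizer_dir="tokenizers/tokenizer_spe_unigram_v1024",  # assumed dir
    new_tokenizer_type="bpe",
)

# Step 4: point the model at your manifests and fine-tune.
asr_model.setup_training_data(OmegaConf.create({
    "manifest_filepath": "train_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": True,
}))
asr_model.setup_validation_data(OmegaConf.create({
    "manifest_filepath": "dev_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": False,
}))

trainer = pl.Trainer(devices=1, accelerator="gpu", max_epochs=50)
asr_model.set_trainer(trainer)
trainer.fit(asr_model)
```

See the ASR CTC fine-tuning tutorial mentioned above for the full recipe (learning-rate schedule, augmentation, and the other tricks it covers).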