Training FastConformer-CTC on another language #8123
-
Hi! What config is best suited for this case? Also, should I go for subword tokenization for a language such as this, or does the standard BPE tokenizer provided in the scripts suffice? Any guidance would be much appreciated. Thanks!
-
@nithinraok wrote a post about a Telugu speech model here: https://blogs.nvidia.com/blog/speech-ai-telugu-language-breakthrough/. He could provide you with specific advice.
-
Overall, yes, use subwords (SPE unigram or plain SPE BPE). For the config, use FastConformer large, and try to fine-tune a pretrained checkpoint rather than train from scratch; it will almost always give better results. The ASR CTC fine-tuning tutorial covers most of the fine-tuning tricks that @nithinraok applied.
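For concreteness, here is a minimal sketch of that workflow using NeMo's Python API. The checkpoint name (`stt_en_fastconformer_ctc_large`), tokenizer directory, manifest filenames, vocabulary size, and trainer settings below are illustrative assumptions, not values from this thread; verify the exact checkpoint name on NGC or Hugging Face before use:

```python
# Minimal fine-tuning sketch; all paths, manifests, and hyperparameters
# are illustrative placeholders to adapt to your data.
import pytorch_lightning as pl
from omegaconf import OmegaConf

import nemo.collections.asr as nemo_asr

# Step 1 (offline): build an SPE unigram tokenizer from your corpus, e.g.
#   python scripts/tokenizers/process_asr_text_tokenizer.py \
#       --manifest=train_manifest.json \
#       --data_root=tokenizers/ \
#       --vocab_size=1024 \
#       --tokenizer=spe \
#       --spe_type=unigram

# Step 2: start from a pretrained FastConformer-CTC large checkpoint
# rather than training from scratch.
asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
    model_name="stt_en_fastconformer_ctc_large"  # assumed name; check NGC/HF
)

# Step 3: swap in the new language's tokenizer. NeMo uses the type "bpe"
# for any SentencePiece tokenizer directory, including unigram models.
asr_model.change_vocabulary(
    new_tokenizer_dir="tokenizers/tokenizer_spe_unigram_v1024",  # assumed dir
    new_tokenizer_type="bpe",
)

# Step 4: point the model at your manifests and fine-tune.
asr_model.setup_training_data(OmegaConf.create({
    "manifest_filepath": "train_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": True,
}))
asr_model.setup_validation_data(OmegaConf.create({
    "manifest_filepath": "dev_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": False,
}))

trainer = pl.Trainer(devices=1, accelerator="gpu", max_epochs=50)
asr_model.set_trainer(trainer)
trainer.fit(asr_model)
```

See the ASR CTC fine-tuning tutorial mentioned above for the full recipe (learning-rate schedule, augmentation, and the other tricks it covers).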