GitHub - Ronnie-Leon76/Swahili-ASR: This repository contains the code for fine-tuning the XLS-R Wav2Vec2 model for Swahili Automatic Speech Recognition.

Fine-tuning XLS-R Wav2Vec2 model for Swahili Automatic Speech Recognition

This repository contains the code for fine-tuning the XLS-R Wav2Vec2 model for Swahili Automatic Speech Recognition. The model is fine-tuned on the Common Voice Dataset.

Dataset

The model is fine-tuned on the Common Voice Dataset, which contains 13,000 hours of speech data in 67 languages. The dataset is split into training, validation, and test sets of 12,000 hours, 500 hours, and 500 hours of speech data, respectively.

Model

The base model is XLS-R, a multilingual Wav2Vec2 model pre-trained with self-supervised learning on about 436,000 hours of unlabeled speech in 128 languages. This repository fine-tunes it for Swahili.

Training

The model is fine-tuned on the Common Voice Dataset using the Hugging Face Trainer API. The model is trained for 20 epochs with a batch size of 16. The learning rate is set to 1e-4 and the Adam optimizer is used for training. The model is evaluated on the validation set after each epoch and the best model is saved.
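The hyperparameters above could be collected as keyword arguments for the Hugging Face `TrainingArguments` class. This is a minimal sketch, not the notebook's actual configuration: the output directory and the `"wer"` metric key are hypothetical, the batch size of 16 is assumed to be per device, and `eval_strategy` is the argument name in recent `transformers` releases (older releases call it `evaluation_strategy`).

```python
def build_training_kwargs(output_dir="./swahili-xlsr-checkpoints"):
    """Return keyword arguments mirroring the setup described in the text.

    The output_dir default is a hypothetical path chosen for illustration.
    """
    return {
        "output_dir": output_dir,
        "num_train_epochs": 20,             # 20 epochs, as stated in the text
        "per_device_train_batch_size": 16,  # assumption: 16 is the per-device size
        "learning_rate": 1e-4,
        "eval_strategy": "epoch",           # evaluate on the validation set every epoch
        "save_strategy": "epoch",           # must match eval_strategy to keep the best model
        "load_best_model_at_end": True,     # reload the best checkpoint when training ends
        "metric_for_best_model": "wer",     # assumption: compute_metrics returns a "wer" key
        "greater_is_better": False,         # a lower WER is better
    }


# Intended use (requires the transformers package, not imported here):
#   from transformers import TrainingArguments
#   args = TrainingArguments(**build_training_kwargs())
kwargs = build_training_kwargs()
print(kwargs["num_train_epochs"], kwargs["learning_rate"])
```

The optimizer is left unset in this sketch, so the Trainer would fall back to its default, AdamW, which is an Adam variant with decoupled weight decay.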

Evaluation

The model is evaluated on the test set using the WER (Word Error Rate) metric. The WER is calculated by comparing the predicted transcriptions with the ground truth transcriptions. The model achieves a WER of 8.3% on the test set.
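WER is the word-level edit distance between prediction and reference (substitutions, deletions, and insertions) divided by the number of reference words. The sketch below computes it in plain Python for illustration. The notebook may use a library for this instead, and the function names and the two example sentences are made up for the demonstration.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(references, hypotheses):
    """Corpus-level WER: total word errors divided by total reference words."""
    total_errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        total_errors += edit_distance(ref_words, hyp.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_errors / total_words


# Made-up example sentences, not taken from the test set.
references = ["habari ya asubuhi", "ninapenda kusoma vitabu"]
hypotheses = ["habari za asubuhi", "ninapenda kusoma"]
print(f"WER: {wer(references, hypotheses):.3f}")
```

In this toy example there is one substitution and one deletion against six reference words, so the WER is 2/6.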

Results

The model achieves a WER of 8.3% on the test set, which corresponds to roughly 8 word errors (substitutions, deletions, or insertions) per 100 reference words.

Inference

The model can be used for transcribing Swahili speech. The model takes an audio file as input and outputs the transcribed text. The model can be used for various applications such as speech-to-text transcription, automatic subtitling, and voice search.
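Wav2Vec2 models fine-tuned for speech recognition typically emit one token prediction per audio frame through a CTC head, and the text is recovered by collapsing repeated tokens and dropping blanks. The sketch below illustrates that greedy decoding step in plain Python. It assumes the notebook uses a CTC head (the README does not say so), and the toy vocabulary, the blank id, and the `|` word delimiter are illustrative choices that follow the usual Hugging Face tokenizer convention.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse per-frame token ids into text with greedy CTC decoding."""
    tokens = []
    previous = None
    for idx in frame_ids:
        # Keep a token only when it differs from the previous frame
        # and is not the CTC blank.
        if idx != previous and idx != blank_id:
            tokens.append(id_to_token[idx])
        previous = idx
    # The word delimiter token marks spaces between words.
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary; a real one comes from the tokenizer.
id_to_token = {0: "<pad>", 1: "|", 2: "a", 3: "h", 4: "b", 5: "r", 6: "i", 7: "s"}

# Frame-level argmax ids for one utterance (made up for illustration).
frame_ids = [3, 3, 2, 0, 4, 2, 2, 5, 6, 1]
print(ctc_greedy_decode(frame_ids, id_to_token))
```

A blank between two identical tokens keeps both, which is how doubled letters survive the collapse: the frames `s a <pad> a` decode to "saa".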

Conclusion

In this project, we fine-tuned the XLS-R Wav2Vec2 model for Swahili Automatic Speech Recognition. The model achieves a WER of 8.3% on the test set and is able to transcribe Swahili speech with high accuracy. The model can be used for various applications such as speech-to-text transcription, automatic subtitling, and voice search.

Folders and files

Fine_tuned_ASR_Swahili_Wav2Vec2.ipynb
README.md
