
drnic/openai_whisper_finetuning

 
 


This is unofficial code for finetuning the Whisper model on your own dataset.

[Original Repo] [Example Colab]

In this setup, we use a small part of the LibriSpeech dataset to finetune the English model; the other option is to finetune the Vietnamese model on the Vivos dataset. To finetune on another dataset or in another language, see "dataset.py". You can also change the hyperparameters by creating another config file based on "config/vn_base_example.yaml". The path to the config file must be defined in .env.
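When adapting the data loading to a new corpus, a useful first step is turning its transcript files into (audio path, text) pairs. The sketch below assumes a LibriSpeech-style layout, where each transcript line is `UTTERANCE_ID TRANSCRIPT` and the audio file is named after the utterance ID. `parse_transcripts` and `build_manifest` are hypothetical helpers for illustration, not functions from this repo's "dataset.py".

```python
from pathlib import Path


def parse_transcripts(lines):
    """Parse 'UTTERANCE_ID TRANSCRIPT' lines into an {id: text} dict.

    Assumption: one utterance per line, with the ID separated from the
    transcript by the first run of whitespace (LibriSpeech *.trans.txt style).
    """
    transcripts = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)
        utt_id = parts[0]
        text = parts[1] if len(parts) > 1 else ""
        transcripts[utt_id] = text
    return transcripts


def build_manifest(transcripts, audio_dir, ext=".flac"):
    """Pair each transcript with a hypothetical audio path <audio_dir>/<id><ext>."""
    audio_dir = Path(audio_dir)
    return [
        (audio_dir / f"{utt_id}{ext}", text)
        for utt_id, text in sorted(transcripts.items())
    ]
```

For a corpus with a different layout (for example WAV files, or a CSV of paths and texts), only these two functions would need to change; the training code could keep consuming (path, text) pairs.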

In an experiment on Vietnamese with the Vivos dataset, the WER of the base Whisper model dropped from 45.56% to 24.27% after finetuning for 5 epochs.
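For scale, the reported drop can be expressed both as an absolute difference in WER points and as a relative reduction of the baseline error. The short computation below uses only the two figures quoted above; `relative_reduction` is an illustrative helper.

```python
def relative_reduction(before, after):
    """Reduction from `before` to `after`, as a fraction of `before`."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) / before


baseline_wer = 45.56   # base Whisper on Vivos (%), as reported
finetuned_wer = 24.27  # after 5 epochs of finetuning (%), as reported
absolute_drop = baseline_wer - finetuned_wer
relative_drop = relative_reduction(baseline_wer, finetuned_wer)
print(f"absolute: {absolute_drop:.2f} points, relative: {relative_drop:.1%}")
# prints "absolute: 21.29 points, relative: 46.7%"
```

In other words, finetuning removed a little under half of the base model's word errors on this test set.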

Python version: 3.8

Setup:

pip install -r requirements.txt
cp .env.copy .env
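The copied .env file is where the config path goes. As an illustration, here is a minimal stdlib parser for `KEY=VALUE` lines, standing in for a package such as python-dotenv. The key name `CONFIG_PATH` is a hypothetical placeholder; check .env.copy for the name the scripts actually read.

```python
import os
from pathlib import Path


def parse_dotenv(text):
    """Parse KEY=VALUE lines; skip blanks and '#' comments, strip matching quotes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one pair of matching surrounding quotes, e.g. "a.yaml" -> a.yaml
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_dotenv_file(path=".env"):
    """Read and parse a .env file from disk."""
    return parse_dotenv(Path(path).read_text(encoding="utf-8"))


def resolve_config_path(env, key="CONFIG_PATH"):
    """Return the config path from parsed .env values, else from os.environ.

    CONFIG_PATH is a hypothetical key name used for illustration only.
    """
    path = env.get(key) or os.environ.get(key)
    if not path:
        raise KeyError(f"{key} is not set in .env or the environment")
    return Path(path)
```

Usage, under the same assumption about the key name: `config_path = resolve_config_path(load_dotenv_file(".env"))`, after which the YAML file at that path would be loaded with a YAML parser.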

To finetune the model in Vietnamese, run these commands to download and extract the dataset:

python data/download_data_vivos.py
tar -xvf vivos.tar.gz vivos
mv vivos data

Run the demo page (the first run will take a while to download the model):

streamlit run interface.py


To finetune (speech-to-text task only):

python finetune.py

To finetune Whisper on both tasks, speech-to-text and translation (e.g. using the Google API to translate Vietnamese text into English), see the example dataset (link).
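Whisper selects its task through special tokens at the start of the decoder sequence: a language token for the spoken language, then `<|transcribe|>` or `<|translate|>`. The string-level sketch below shows how one Vietnamese clip could yield both a transcription target and an English translation target. `decoder_target` is a hypothetical helper; a real pipeline would encode these tokens with the Whisper tokenizer, and this is not necessarily how this repo builds its labels.

```python
# Special tokens from Whisper's multitask format (string form, for illustration).
SOT = "<|startoftranscript|>"
EOT = "<|endoftext|>"
NO_TIMESTAMPS = "<|notimestamps|>"


def decoder_target(text, language, task="transcribe"):
    """Build the special-token prefix plus target text for one training example.

    language: Whisper code of the *spoken* language, e.g. "vi" (kept for both tasks).
    task: "transcribe" (text in the spoken language) or "translate" (English text).
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task}")
    prefix = [SOT, f"<|{language}|>", f"<|{task}|>", NO_TIMESTAMPS]
    return "".join(prefix) + text + EOT


# The same Vietnamese audio clip paired with two different targets:
stt_target = decoder_target("xin chao", "vi", "transcribe")
translate_target = decoder_target("hello", "vi", "translate")
```

Because only the task token differs, a single model can be finetuned on a mix of both kinds of examples.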

To evaluate the model:

python evaluate_wer.py
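WER counts the word substitutions, deletions, and insertions needed to turn the model output into the reference, divided by the number of reference words. A from-scratch sketch using a word-level Levenshtein distance follows; "evaluate_wer.py" may compute it differently (for example with a library, or after text normalization), and `word_error_rate` is an illustrative name.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Uses a word-level Levenshtein (edit distance) dynamic program.
    Tokenizes on whitespace only; no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For a whole test set, the usual practice is to sum edit counts and reference word counts across utterances before dividing, rather than averaging per-utterance WERs.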

To run inference:

You can record your own audio file and transcribe it using "record.py" and "inference.py".
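Whisper operates on 16 kHz mono audio in fixed 30-second windows, and the openai-whisper package pads or trims inputs with `whisper.pad_or_trim`. The stdlib sketch below shows that preprocessing for a recorded clip, assuming it was saved as a 16-bit PCM mono WAV at 16 kHz (an assumption about "record.py", not something stated here). `read_wav_mono16` and `pad_or_trim` are hypothetical names for illustration.

```python
import array
import sys
import wave

SAMPLE_RATE = 16_000  # Whisper's expected input rate (Hz)
CHUNK_SECONDS = 30    # Whisper encodes fixed 30-second windows
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS


def read_wav_mono16(source):
    """Read a 16-bit PCM mono WAV (path or binary file object) into floats in [-1, 1)."""
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        if wf.getframerate() != SAMPLE_RATE:
            raise ValueError(f"expected {SAMPLE_RATE} Hz, got {wf.getframerate()} Hz; resample first")
        raw = wf.readframes(wf.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return [s / 32768.0 for s in samples]


def pad_or_trim(samples, length=N_SAMPLES):
    """Zero-pad or truncate to exactly `length` samples (mirrors whisper.pad_or_trim)."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))
```

Clips longer than 30 seconds would be cut off by this step, so longer recordings need to be split into chunks before transcription.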

Todo list

  • Add a Python argument parser and refactor the code
  • Add a Dockerfile for deployment
  • Add Vietnamese text normalization / postprocessing
  • Add a Streamlit interface to record audio and run inference

