# Train a 🐸 Kiswahili STT model with Mozilla Common Voice data 💫

👋 Hello, and welcome to this Mozilla Common Voice and Coqui (🐸) STT Coding Challenge!

This notebook shows a **typical workflow** for **training** and **testing** an 🐸 STT model on Kiswahili data from Mozilla Common Voice.

In this notebook, we will:

1. Download Mozilla Common Voice data (pre-formatted for 🐸 STT)
2. Configure the training and testing runs
3. Train a new model
4. Test the model and display its performance

So, let's jump right in!


In [None]:
from google.colab import drive
drive.mount('/content/drive/')

In [None]:
!sudo apt-get update -y

!sudo apt-get install -y python3.7

!sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 1

!sudo update-alternatives --config python3

!apt-get install -y python3-pip python3.7-distutils

In [None]:
!python --version

In [None]:
## Install Coqui STT
! pip install -U pip
! pip install coqui_stt_training
## Install opus tools
! apt-get install -y libopusfile0 libopus-dev libopusfile-dev

In [None]:
## Install Coqui STT
# dependencies
! sudo apt-get install sox libsox-fmt-mp3 libopusfile0 libopus-dev libopusfile-dev libsox-fmt-all libssl-doc -y
! pip install --upgrade pip
# the Coqui training package
! pip install coqui_stt_training==1.4.0
! pip uninstall -y tensorflow; pip install "tensorflow-gpu==1.15"
# code with importer scripts
# ! git clone --depth=1 https://github.com/coqui-ai/STT.git

In [None]:
from coqui_stt_training.util.config import initialize_globals_from_args

## ✅ Download & format sample data for Kiswahili

**First things first**: we need some data.

We're training a Speech-to-Text model, so we want _speech_ and we want _text_. Specifically, we want _transcribed speech_. Let's download some audio and transcripts.

To keep the focus on model training, we have already formatted the Mozilla Common Voice data for you. You will find the CSV files `{train,test,dev}.csv` in the data directory.

Let's download some data for Kiswahili 😊

You can access the pre-formatted data via this Google Drive link (https://drive.google.com/drive/folders/1sEBmonkwNu65w1zIp8PYutI71mBHmdra?usp=sharing).

This tutorial assumes that you add this data to your own Google Drive. The next cell mounts your Drive so that the notebook can read the data from there.

Outside of this tutorial, be sure to check out the full Kiswahili dataset, as well as all the other language datasets available for STT, on Mozilla Common Voice (https://commonvoice.mozilla.org/datasets).


In [None]:
from google.colab import drive

drive.mount('/content/gdrive/', force_remount=True)

Edit the cell below to reflect the directory where you have placed your data.

In [None]:
%cd /content/gdrive/MyDrive/Events/swahilipot_hackathon


### 👀 Take a look at the data

In [None]:
! ls
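
Beyond listing the files, it can help to peek inside the CSVs. The following is a minimal sketch, assuming the files follow the usual Coqui STT layout with the columns `wav_filename`, `wav_filesize` and `transcript`. The helper name `summarize_stt_csv` is hypothetical and not part of Coqui STT.

```python
import csv
import os

# Column names assumed for Coqui STT training CSVs.
EXPECTED_COLUMNS = ["wav_filename", "wav_filesize", "transcript"]


def summarize_stt_csv(csv_path, sample_size=3):
    """Return (row_count, missing_columns, sample_rows) for one CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [name for name in EXPECTED_COLUMNS if name not in header]
        rows = list(reader)
    return len(rows), missing, rows[:sample_size]


# Only inspect the splits that are actually present in the current directory.
for split in ("train.csv", "dev.csv", "test.csv"):
    if os.path.exists(split):
        count, missing, sample = summarize_stt_csv(split)
        print(f"{split}: {count} rows, missing columns: {missing}")
        for row in sample:
            print("   ", row.get("wav_filename"), "->", row.get("transcript"))
```

If a split reports missing columns, the training run is unlikely to read it correctly, so it is worth fixing the file before moving on.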

## ✅ Configure & set hyperparameters

Coqui STT comes with a long list of hyperparameters you can tweak. We've set default values, but you will often want to set your own. You can use `initialize_globals_from_args()` to do this.

You must **always** configure the paths to your data, and you must **always** configure your alphabet. Additionally, here we show how you can specify the size of the hidden layers (`n_hidden`) and the number of epochs to train for (`epochs`), and how to initialize a new model from scratch (`load_train="init"`).

There are many other hyperparameters not included in this tutorial. If you would like to configure them further to boost the performance of your model, be sure to check out the Coqui STT documentation (https://stt.readthedocs.io/en/latest/index.html).

If you're training on a GPU, you can uncomment the (larger) training batch sizes for faster training.

In [None]:
from coqui_stt_training.util.config import initialize_globals_from_args

initialize_globals_from_args(
    train_files=["train.csv"],
    dev_files=["dev.csv"],
    test_files=["test.csv"],
    checkpoint_dir="checkpoints/",
    load_train="init",
    n_hidden=200,
    epochs=1,
    beam_width=1,
    #train_batch_size=128,
    #dev_batch_size=128,
    #test_batch_size=128,
)
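
The text above says the alphabet must always be configured. As a rough sketch, one way to derive an alphabet file is to collect every character that appears in the `transcript` column of the CSVs. This assumes the Coqui STT alphabet layout of one character per line, where a line starting with `#` is a comment and a literal `#` is written as `\#`. The helper names `build_alphabet` and `write_alphabet` are hypothetical.

```python
import csv
import os


def build_alphabet(csv_paths):
    """Collect the sorted set of characters used in the transcript column."""
    characters = set()
    for csv_path in csv_paths:
        with open(csv_path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                characters.update(row["transcript"])
    # Line breaks cannot be alphabet entries in a one-character-per-line file.
    characters.discard("\n")
    characters.discard("\r")
    return sorted(characters)


def write_alphabet(characters, output_path):
    """Write one character per line, escaping '#' so it is not read as a comment."""
    with open(output_path, "w", encoding="utf-8", newline="\n") as handle:
        for character in characters:
            handle.write(("\\#" if character == "#" else character) + "\n")


# Only build the file when all three splits are present.
splits = ["train.csv", "dev.csv", "test.csv"]
if all(os.path.exists(path) for path in splits):
    write_alphabet(build_alphabet(splits), "alphabet.txt")
```

The resulting file could then be passed to `initialize_globals_from_args()` as `alphabet_config_path="alphabet.txt"`, assuming that is the field name your installed version of Coqui STT expects. Check `Config.to_json()` to confirm.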

### 👀 View all config settings

In [None]:
from coqui_stt_training.util.config import Config

print(Config.to_json())

## ✅ Train a new model

Let's kick off a training run 🚀🚀🚀 (using the configuration you set above).

In [None]:
from coqui_stt_training.train import train

train()

## ✅ Test the model

We made it! 🙌

Let's kick off the testing run, which displays performance metrics.

The settings we used here are for demonstration purposes only, so you should not deploy this model to production. In this notebook we're focusing on the workflow itself, so it's forgivable 😇

You can train a much better model by finding better hyperparameters, so go for it 💪

In [None]:
from coqui_stt_training.evaluate import test

test()
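
The test output reports WER (word error rate) and CER (character error rate). As an illustration of what these numbers mean, here is a minimal sketch that computes both from the Levenshtein edit distance between a reference transcript and a model hypothesis. It is not Coqui STT's own implementation, which may differ in detail, and the example sentences are made up. References are assumed to be non-empty.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (strings or lists)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_item != hyp_item)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    reference_words = reference.split()
    return edit_distance(reference_words, hypothesis.split()) / len(reference_words)


def character_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


reference = "habari ya asubuhi"
hypothesis = "habari za asubuhi"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")       # WER: 0.333
print(f"CER: {character_error_rate(reference, hypothesis):.3f}")  # CER: 0.059
```

One wrong word out of three gives a WER of about 0.333, while the same mistake is only one wrong character out of 17, so the CER is much lower. Lower is better for both metrics.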

## ✅ Submit to this Coding Challenge

Once you have a well-performing model, submit your results for a chance to win prizes in the coding challenge. Use this form (https://forms.gle/sfJVDrb6n6HrmYTn8), which asks for 2 screenshots:

1. The hyperparameters cell from your notebook. This is the first code cell under the 'Configure & set hyperparameters' section. (If the cell contents are long, zoom out and make sure you capture the *ENTIRE* contents of the cell.)
2. The results of testing your model. This is the output of the cell under the 'Test the model' section. Include only the very *TOP* of the output, where the best CER, WER and loss values are visible.

Finally, you will also need to enter these values (WER, CER, loss) manually in the fields provided in the form.

All the best!

## ✅ Take it further with Mozilla Common Voice

We hope you enjoyed this coding challenge!

If you want to take your model to the next level, and stand a chance to win USD 2,000, check out our latest Model and Methods Competition (https://foundation.mozilla.org/en/blog/announcing-our-voices-a-new-competition-by-mozilla-to-fight-bias-in-voice-technology/).