<a href="https://colab.research.google.com/github/Aquib-Nawaz/End-to-End-ASR-using-Transfer-Learning/blob/main/Task2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Easy transfer learning with 🐸 STT ⚡

You want to train a Coqui (🐸) STT model, but you don't have a lot of data. What do you do?

The answer 💡: Grab a pre-trained model and fine-tune it to your data. This is called `"Transfer Learning"` ⚡

🐸 STT comes with transfer learning support out of the box.

You can even take a pre-trained model and fine-tune it to _any new language_, even if the alphabets are completely different. Likewise, you can fine-tune a model to your own data and improve performance if the language is the same.

In this notebook, we will:

1. Download a pre-trained English STT model.
2. Download data for Telugu and Kannada.
3. Fine-tune the English model to Telugu, then fine-tune the Telugu model to Kannada.
4. Test the new model and display its performance on the test set.
5. Train the model after changing parameters to get a better CER on the test set.
6. Display the new result.

So, let's jump right in!

*PS - If you just want a working, off-the-shelf model, check out the [🐸 Model Zoo](https://www.coqui.ai/models)*

In [None]:
## Install Coqui STT
! pip install -U pip
! pip install coqui_stt_training


## ✅ Download pre-trained English model

We're going to download a very small (but very accurate) pre-trained STT model for English. This model was trained to only transcribe the English words "yes" and "no", but with transfer learning we can train a new model which could transcribe any words in any language. In this notebook, we will turn this "constrained vocabulary" English model into an "open vocabulary" Telugu model, and then fine-tune that model again for Kannada.

Coqui STT models are typically stored as checkpoints (for training) and protobufs (for deployment). For transfer learning, we want the **model checkpoints**.
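
Before pointing `load_checkpoint_dir` at a directory, it can help to confirm that it actually contains checkpoints. The sketch below assumes the usual TensorFlow layout, in which every saved checkpoint comes with a `<prefix>.index` file. `list_checkpoint_prefixes` is a hypothetical helper written for this notebook, not part of 🐸 STT.

```python
import os

def list_checkpoint_prefixes(checkpoint_dir):
    """Return the sorted checkpoint prefixes found in checkpoint_dir.

    Assumes TensorFlow-style checkpoints, where each checkpoint
    is accompanied by a '<prefix>.index' file.
    """
    if not os.path.isdir(checkpoint_dir):
        return []
    return sorted(
        name[: -len(".index")]
        for name in os.listdir(checkpoint_dir)
        if name.endswith(".index")
    )

# Prints an empty list until the model has been downloaded and extracted
print(list_checkpoint_prefixes("english/coqui-yesno-checkpoints"))
```

An empty list means there is nothing to fine-tune from, which is cheaper to find out here than halfway into a training run.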


In [None]:
### Download pre-trained model

import os
import tarfile
from coqui_stt_training.util.downloader import maybe_download

def download_pretrained_model():
    model_dir = "english/"
    checkpoint_dir = os.path.join(model_dir, "coqui-yesno-checkpoints")
    if not os.path.exists(checkpoint_dir):
        print('\nNo extracted pre-trained model found. Downloading and extracting now...')
        maybe_download("model.tar.gz", model_dir, "https://github.com/coqui-ai/STT-models/releases/download/english%2Fcoqui%2Fyesno-v0.0.1/coqui-yesno-checkpoints.tar.gz")
        # The context manager closes the archive even if extraction fails
        with tarfile.open(os.path.join(model_dir, "model.tar.gz")) as tar:
            tar.extractall(model_dir)
    else:
        print(f'Found "{checkpoint_dir}" - not extracting.')

# Download + extract pre-trained English model
download_pretrained_model()

Found "english/coqui-yesno-checkpoints" - not extracting.


## ✅ Download data for Telugu and Kannada

**First things first**: we need some data.

We're training a Speech-to-Text model, so we need some _speech_ and we need some _text_. Specifically, we want _transcribed speech_. Let's download an archive of audio files and their transcripts.

**Second things second**: we want an alphabet for each target language. The output layer of a typical* 🐸 STT model represents letters in the alphabet. We will write the Telugu and Kannada alphabet files ourselves in the cells below.

*_If you are working with languages with large character sets (e.g. Chinese), you can set `bytes_output_mode=True` instead of supplying an `alphabet.txt` file. In this case, the output layer of the STT model will correspond to individual UTF-8 bytes instead of individual characters._
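
To make the difference between the two output modes concrete, here is a small sketch in plain Python (it does not call 🐸 STT). `char_labels` and `byte_labels` are hypothetical helper names: the first lists the labels an alphabet-based model would need for a piece of text, the second lists the UTF-8 byte values a bytes-output model would emit for the same text.

```python
def char_labels(text):
    """Sorted unique characters: one output label per character."""
    return sorted(set(text))

def byte_labels(text):
    """Sorted unique UTF-8 byte values: the labels in bytes output mode."""
    return sorted(set(text.encode("utf-8")))

sample = "అ"  # a single Telugu letter
print(char_labels(sample))
print(byte_labels(sample))
print(len(sample.encode("utf-8")))  # 3: this letter takes three UTF-8 bytes
```

With characters, the label set grows with the script. With bytes, it can never exceed 256 distinct values, at the cost of longer output sequences.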

In [None]:
### Download sample data
import tarfile
from coqui_stt_training.util.downloader import maybe_download

def download_sample_data():
    # Fetch the archive into the current directory (skipped if it is already there)
    maybe_download("ml-data.tar.gz", "./", "https://www.cse.iitb.ac.in/~pjyothi/cs753/ml-data.tgz")
    # Extract into the current directory; the context manager closes the archive
    with tarfile.open("ml-data.tar.gz") as tar:
        tar.extractall("./")

# Download the sample data
download_sample_data()

Found archive "./ml-data.tar.gz" - not downloading.


In [None]:
import tarfile
from coqui_stt_training.util.downloader import maybe_download

def download_generatescorer():
    # Fetch the 🐸 STT v1.3.0 native client archive (skipped if already downloaded)
    maybe_download("native_client.tflite.Linux.tar.xz3.07", "./", "https://github.com/coqui-ai/STT/releases/download/v1.3.0/native_client.tflite.Linux.tar.xz")
    # Extract it into native_client.tflite.Linux/
    with tarfile.open("native_client.tflite.Linux.tar.xz3.07") as tar:
        tar.extractall("native_client.tflite.Linux/")


# Download the native client archive
download_generatescorer()

Found archive "./native_client.tflite.Linux.tar.xz3.07" - not downloading.


In [None]:


f = open('ml-data/telugu-alphabet.txt', 'w', encoding="utf8")
f.write(' \nఀ\nఁ\n ం\n ః	\nఅ\nఆ\nఇ\nఈ\nఉ\nఊ\nఋ\nఌ\nఎ\nఏ\nఐ\nఒ\nఓ\nఔ\nక\nఖ\nగ\nఘ\nఙ\nచ\nఛ\nజ\nఝ\nఞ\nట\nఠ\nడ\nఢ\nణ\nత\nథ\nద\nధ\nన\nప\nఫ\nబ\nభ\nమ\nయ\nర\nఱ\nల\nళ\nఴ\nవ\nశ\nష\nస\nహ\nఽ\nా\nి\nీ\nు\nూ\nృ\nౄ\nె\nే\nై\nొ\nో\nౌ\n్\nౕ\nౖ\nౘ\nౙ\nౚ\nౠ\nౡ\nౢ\nౣ\n౦\n౧\n౨\n౩\n౪\n౫\n౬\n౭\n౮\n౯\n౸\n౹\n౺\n౻\n౼\n౽\n౾\n౿\nం\n\\\nn')

f.close()
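
Hand-written alphabet strings are easy to get slightly wrong, for example by repeating a label or leaving a stray space or tab next to one. The following sketch is a plain-Python check with a hypothetical helper, `check_alphabet`. It only surfaces suspicious entries; how 🐸 STT itself treats such lines is not verified here.

```python
from collections import Counter

def check_alphabet(text):
    """Report suspicious entries in alphabet text (one label per line)."""
    labels = text.split("\n")
    if labels and labels[-1] == "":
        labels = labels[:-1]  # ignore the empty string after a final newline
    counts = Counter(labels)
    # Labels that appear more than once
    duplicates = sorted(label for label, n in counts.items() if n > 1)
    # A lone space is a legitimate label; flag any other label carrying whitespace
    padded = sorted(
        label for label in counts if label != " " and label != label.strip()
    )
    return {"duplicates": duplicates, "padded": padded}

print(check_alphabet(" \na\nb\n a\nb\n"))
# {'duplicates': ['b'], 'padded': [' a']}
```

To check a file written by this notebook, read it with `open(path, encoding="utf8").read()` and pass the result to `check_alphabet`.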

In [None]:


import os

# Kannada alphabet: one label per line, starting with the space character
kannada_alphabet = ' \nಅ\nಆ\nಇ\nಈ\nಉ\nಊ\nಋ\nೠ\nಎ\nಏ\nಐ\nಒ\nಓ\nಔ\nಅಂ\nಅಃ\nಕ\nಖ\nಗ\nಘ\nಙ\nಚ\nಛ\nಜ\nಝ\nಞ\nಜ಼\nಟ\nಠ\nಡ\nಢ\nಣ\nತ\nಥ\nದ\nಧ\nನ\nಪ\nಫ\nಬ\nಭ\nಮ\nಫ಼\nಯ\nರ\nಱ\nಲ\nಳ\nೞ\nವ\nಶ\nಷ\nಸ\nಹ\nಾ\nಿ\nೀ\nು\nೂ\nೃ\nೄ\nೆ\nೇ\n ೈ\nೊ\nೋ\nೌ\nಂ\nಃ\n್\n.\nೈ\n'

# Create the checkpoint directory; no error if it already exists
os.makedirs('ml-data/checkpoints', exist_ok=True)

# Write the same alphabet next to the data and into the checkpoint directory
for path in ('ml-data/kannada-alphabet.txt', 'ml-data/checkpoints/alphabet.txt'):
    with open(path, 'w', encoding="utf8") as f:
        f.write(kannada_alphabet)


## ✅ Configure the training run

Make the `tg.csv` training file and the `tgdev.csv` dev file from `train.tsv`.

In [None]:
# Build the Telugu training and dev CSVs from rows of train.tsv
header = 'wav_filename,wav_filesize,transcript\n'
with open('ml-data/train.tsv', 'r', encoding="utf8") as rfile:
    lines = rfile.readlines()

with open('ml-data/tg.csv', 'w', encoding="utf8") as f:
    f.write(header)
    for l in lines[10:15]:  # training rows
        l_ = l.split('\t')
        # wav_filesize is written as a placeholder 0
        f.write('train/telugu/'+l_[0]+'.wav,0,'+l_[1])

with open('ml-data/tgdev.csv', 'w', encoding="utf8") as f2:
    f2.write(header)
    for l in lines[15:20]:  # dev rows
        l_ = l.split('\t')
        f2.write('train/telugu/'+l_[0]+'.wav,0,'+l_[1])
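
A slightly more defensive way to build the same kind of manifest is sketched below. It is an illustration, not the notebook's method, and it rests on two assumptions: each TSV line is `utterance_id<TAB>transcript`, and relative `wav_filename` entries are resolved against the directory that holds the CSV. `write_manifest` is a hypothetical helper. It uses the `csv` module so that transcripts containing commas are quoted, and it records the real file size when the audio file exists.

```python
import csv
import os

def write_manifest(tsv_lines, wav_rel_dir, csv_path):
    """Write a wav_filename,wav_filesize,transcript CSV.

    tsv_lines: iterable of 'utterance_id<TAB>transcript' lines (assumed layout).
    wav_rel_dir: audio directory, relative to the directory of csv_path.
    """
    base_dir = os.path.dirname(csv_path)
    with open(csv_path, "w", encoding="utf8", newline="") as out:
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for line in tsv_lines:
            utt_id, transcript = line.rstrip("\n").split("\t")[:2]
            wav_rel = f"{wav_rel_dir}/{utt_id}.wav"
            wav_abs = os.path.join(base_dir, wav_rel)
            # Fall back to 0 when the audio file is missing
            size = os.path.getsize(wav_abs) if os.path.exists(wav_abs) else 0
            writer.writerow([wav_rel, size, transcript])
```

For example, `write_manifest(lines[10:15], "train/telugu", "ml-data/tg.csv")` would produce a file with the same columns as the cell above.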

Initialize the model configuration. It is the same as the given configuration, with the file names changed.

In [None]:
from coqui_stt_training.util.config import initialize_globals_from_args

initialize_globals_from_args(
    n_hidden=64,  # must match the layer size of the pre-trained checkpoint
    load_checkpoint_dir="english/coqui-yesno-checkpoints",
    save_checkpoint_dir="ml-data/telugu/checkpoints",
    drop_source_layers=1,  # drop the English output layer and re-initialize it for the new alphabet
    alphabet_config_path="ml-data/telugu-alphabet.txt",
    train_files=["ml-data/tg.csv"],
    dev_files=["ml-data/tgdev.csv"],
    epochs=100,
    load_cudnn=True,
)
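
Why `drop_source_layers=1`? A rough sketch of the reasoning, assuming a CTC model whose final layer has one output unit per alphabet label plus one for the CTC blank, fed by `n_hidden` inputs. The label counts below are made up for illustration and `output_layer_shape` is a hypothetical helper.

```python
def output_layer_shape(n_hidden, n_labels):
    """Shape of the final weight matrix: one column per label, plus the CTC blank."""
    return (n_hidden, n_labels + 1)

source = output_layer_shape(64, 3)    # hypothetical 3-label source alphabet
target = output_layer_shape(64, 100)  # hypothetical 100-label target alphabet
print(source, target, source == target)
# (64, 4) (64, 101) False
```

Because the shapes differ, the source output layer cannot be reused for a new alphabet. The earlier layers do not depend on the alphabet, which is why they can be loaded from the checkpoint.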

### View all Config settings (*Optional*) 

In [None]:
from coqui_stt_training.util.config import Config
# Config.max_to_keep = 10
print(Config.to_json())

{
    "train_files": [
        "ml-data/tg.csv"
    ],
    "dev_files": [
        "ml-data/tgdev.csv"
    ],
    "test_files": [],
    "metrics_files": [],
    "auto_input_dataset": "",
    "vocab_file": "",
    "read_buffer": 1048576,
    "feature_cache": "",
    "cache_for_epochs": 0,
    "shuffle_batches": false,
    "shuffle_start": 1,
    "shuffle_buffer": 1000,
    "feature_win_len": 32,
    "feature_win_step": 20,
    "audio_sample_rate": 16000,
    "normalize_sample_rate": true,
    "augment": null,
    "epochs": 100,
    "dropout_rate": 0.05,
    "dropout_rate2": 0.05,
    "dropout_rate3": 0.05,
    "dropout_rate4": 0.0,
    "dropout_rate5": 0.0,
    "dropout_rate6": 0.05,
    "relu_clip": 20.0,
    "beta1": 0.9,
    "beta2": 0.999,
    "epsilon": 1e-08,
    "learning_rate": 0.001,
    "train_batch_size": 1,
    "dev_batch_size": 1,
    "test_batch_size": 1,
    "export_batch_size": 1,
    "inter_op_parallelism_threads": 0,
    "intra_op_parallelism_threads": 0,
    "use_allow

## ✅ Train a new Telugu model

Let's kick off a training run 🚀🚀🚀 (using the configuration you set above).


In [None]:
from coqui_stt_training.train import train
import os

# Use at most one GPU
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Candidate values for the optional sweep below (currently disabled)
learning_rates = [0.0001]
dropout = [0.2,0.4,0.6]
spec = [True, False]
# Config.epochs = 10
# for l in learning_rates:
#   for d in dropout:
#     for s in spec:
#       Config.learning_rate = l
#       Config.dropout_rate = d
#       Config.augment = ["frequency_mask[p=0.8, n=2:4, size=2:4]","time_mask[p=0.8, n=2:4, size=10:50, \
#       domain=spectrogram]"]
#       train()
#       print(f"l={l}:::: d = {d}:::: s = {s}\n")

train()

I Performing dummy training to check for memory problems.
I If the following process crashes, you likely have batch sizes that are too big for your available system memory (or GPU memory).
I Loading best validating checkpoint from english/coqui-yesno-checkpoints/best_dev-1909
W CUDNN variable not found: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam_1
W CUDNN variable not found: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
W CUDNN variable not found: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam
W CUDNN variable not found: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam_1
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
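
The commented-out sweep in the training cell can be expressed more compactly with `itertools.product`. The sketch below is generic: `sweep` and `fake_run` are hypothetical names, and `fake_run` only returns a made-up score. In the notebook, the callback would set `Config.learning_rate` and `Config.dropout_rate` and then call `train()`; whether `train()` can safely be called several times in one process is not verified here.

```python
from itertools import product

def sweep(learning_rates, dropouts, run):
    """Call run(learning_rate, dropout) for every combination; collect the results."""
    results = []
    for lr, dropout in product(learning_rates, dropouts):
        results.append(((lr, dropout), run(lr, dropout)))
    return results

def fake_run(lr, dropout):
    # Stand-in for a real training run; the returned score is made up
    return round(lr * 1000 + dropout, 3)

for params, score in sweep([0.0001], [0.2, 0.4, 0.6], fake_run):
    print(params, score)
```

Keeping the parameters next to each result makes it easy to pick the best combination afterwards, for example with `min(results, key=lambda item: item[1])`.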

## ✅ Configure the testing run

Let's add the path to our testing data and update `load_checkpoint_dir` to our new model checkpoints.

## ✅ Test the new Kannada model

We made it! 🙌

Let's kick off the testing run on the **training set**, which displays performance metrics.


Test on the **test set**

In [None]:
# Build the Kannada training and dev CSVs from rows of train.tsv.
# The test CSV (kdtest.csv) is written in the next cell.
header = 'wav_filename,wav_filesize,transcript\n'
with open('ml-data/train.tsv', 'r', encoding="utf8") as rfile:
    lines = rfile.readlines()

with open('ml-data/kannada.csv', 'w', encoding="utf8") as f:
    f.write(header)
    for l in lines[30:35]:  # training rows
        l_ = l.split('\t')
        # wav_filesize is written as a placeholder 0
        f.write('train/kannada/'+l_[0]+'.wav,0,'+l_[1])

with open('ml-data/kddev.csv', 'w', encoding="utf8") as f2:
    f2.write(header)
    for l in lines[35:40]:  # dev rows
        l_ = l.split('\t')
        f2.write('train/kannada/'+l_[0]+'.wav,0,'+l_[1])

In [None]:
# Build the test CSV from the first 11 lines of test.tsv
with open('ml-data/test/test.tsv', 'r', encoding="utf8") as rfile:
    lines = rfile.readlines()

with open('ml-data/kdtest.csv', 'w', encoding="utf8") as ft:
    ft.write('wav_filename,wav_filesize,transcript\n')
    for l in lines[:11]:
        l_ = l.split('\t')
        ft.write('test/'+l_[0]+'.wav,0,'+l_[1])

In [None]:
from coqui_stt_training.util.config import initialize_globals_from_args

initialize_globals_from_args(
    n_hidden=64,
    load_checkpoint_dir="ml-data/telugu/checkpoints",
    save_checkpoint_dir="ml-data/checkpoints",
    drop_source_layers=1,
    alphabet_config_path="ml-data/kannada-alphabet.txt",
    train_files=["ml-data/kannada.csv"],
    dev_files=["ml-data/kddev.csv"],
    epochs=100,
    load_cudnn=True,
    augment = ["reverb[p=0.1,delay=50.0~30.0,decay=10.0:2.0~1.0]", "resample[p=0.1,rate=12000:8000~4000]", "codec[p=0.1,bitrate=48000:16000]"]
)

Parsed augmentations: [Reverb(p=0.1, delay=ValueRange(start=50.0, end=50.0, r=30.0), decay=ValueRange(start=10.0, end=2.0, r=1.0)), Resample(p=0.1, rate=ValueRange(start=12000, end=8000, r=4000)), Codec(p=0.1, bitrate=ValueRange(start=48000, end=16000, r=0))]
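
The "Parsed augmentations" output above shows how the range syntax in the augmentation strings is read: `50.0~30.0` becomes `start=50.0, end=50.0, r=30.0`, and `48000:16000` becomes `start=48000, end=16000, r=0`. The sketch below mimics that `start[:end][~r]` reading in plain Python. It is an illustration inferred from the printed output, not the library's own parser, and `parse_value_range` is a hypothetical name.

```python
def parse_value_range(spec, cast=float):
    """Parse 'start[:end][~r]' into a (start, end, r) tuple."""
    body, _, radius = spec.partition("~")
    start_text, _, end_text = body.partition(":")
    start = cast(start_text)
    end = cast(end_text) if end_text else start  # no ':end' means end == start
    r = cast(radius) if radius else cast(0)      # no '~r' means r == 0
    return (start, end, r)

print(parse_value_range("50.0~30.0"))         # (50.0, 50.0, 30.0)
print(parse_value_range("10.0:2.0~1.0"))      # (10.0, 2.0, 1.0)
print(parse_value_range("48000:16000", int))  # (48000, 16000, 0)
```

The three printed tuples match the `delay`, `decay`, and `bitrate` ranges in the output above.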


In [None]:
from coqui_stt_training.train import train
import os

# Use at most one GPU
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Candidate values for the optional sweep below (currently disabled)
learning_rates = [0.0001]
dropout = [0.2,0.4,0.6]
spec = [True, False]
# Config.epochs = 10
# for l in learning_rates:
#   for d in dropout:
#     for s in spec:
#       Config.learning_rate = l
#       Config.dropout_rate = d
#       Config.augment = ["frequency_mask[p=0.8, n=2:4, size=2:4]","time_mask[p=0.8, n=2:4, size=10:50, \
#       domain=spectrogram]"]
#       train()
#       print(f"l={l}:::: d = {d}:::: s = {s}\n")

train()

I Performing dummy training to check for memory problems.
I If the following process crashes, you likely have batch sizes that are too big for your available system memory (or GPU memory).
I Loading best validating checkpoint from ml-data/checkpoints/best_dev-2214
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam_1
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lst

In [None]:
from coqui_stt_training.util.config import Config

Config.test_files=["ml-data/kdtest.csv"]
Config.load_checkpoint_dir="ml-data/checkpoints"

In [None]:
from coqui_stt_training.evaluate import test

test()

I Loading best validating checkpoint from ml-data/checkpoints/best_dev-2224
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/weights
Testing model on ml-data/kdtest.csv
Test epoch | Steps: 10 | Elapsed Time: 0:01:29                                 
Test on ml-data/kdtest.csv - WER: 1.000000,
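
To put the reported number in context: WER and CER are both edit distances divided by the length of the reference, counted in words and in characters respectively. The following is a minimal per-utterance sketch in plain Python. The test report may aggregate over the whole test set differently, so treat this as an illustration of the metric rather than a reimplementation of the report.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character error rate; ref must be non-empty."""
    return edit_distance(ref, hyp) / len(ref)

def wer(ref, hyp):
    """Word error rate; ref must contain at least one word."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)

print(cer("kitten", "sitting"))  # 0.5 (3 edits over 6 characters)
print(wer("yes no", ""))         # 1.0 (both reference words are missing)
```

A WER of 1.0 therefore means the number of word errors equals the number of reference words. The CER is often the more informative number for a small fine-tuning experiment, since partially correct words still count as errors in WER.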