<a href="https://colab.research.google.com/github/divyankiitb/Asr_files/blob/main/final_of_easy_transfer_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Easy transfer learning with 🐸 STT ⚡

You want to train a Coqui (🐸) STT model, but you don't have a lot of data. What do you do?

The answer 💡: Grab a pre-trained model and fine-tune it to your data. This is called `"Transfer Learning"` ⚡

🐸 STT comes with transfer learning support out of the box.

You can even take a pre-trained model and fine-tune it to _any new language_, even if the alphabets are completely different. Likewise, you can fine-tune a model to your own data and improve performance if the language is the same.

In this notebook, we will:

1. Download a pre-trained English STT model.
2. Download data for the Marathi language.
3. Fine-tune the English model to Marathi.
4. Test the new Marathi model and display its performance.

So, let's jump right in!

*PS - If you just want a working, off-the-shelf model, check out the [🐸 Model Zoo](https://www.coqui.ai/models)*

In [None]:
## Install Coqui STT
! pip install -U pip
! pip install coqui_stt_training

Collecting pip
  Downloading pip-22.0.4-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 21.1.3
    Uninstalling pip-21.1.3:
      Successfully uninstalled pip-21.1.3
Successfully installed pip-22.0.4
Collecting coqui_stt_training
  Downloading coqui_stt_training-1.3.0-py3-none-any.whl (87 kB)
Collecting coqpit
  Downloading coqpit-0.0.15-py3-none-any.whl (13 kB)
Collecting sox
  Downloading sox-1.4.1-py2.py3-none-any.whl (39 kB)
Collecting optuna
  Downloading optuna-2.10.0-py3-none-any.whl (308 kB)
Collecting opuslib==2.0.0
  Downloading opuslib-2.0.0.tar.gz (7.3 kB)
  Preparing metadata (setup.

## ✅ Download pre-trained English model

We're going to download a very small (but very accurate) pre-trained STT model for English. This model was trained to transcribe only the English words "yes" and "no", but with transfer learning we can train a new model that can transcribe any words in any language. In this notebook, we will turn this "constrained vocabulary" English model into an "open vocabulary" Marathi model.

Coqui STT models are typically stored as checkpoints (for training) and protobufs (for deployment). For transfer learning, we want the **model checkpoints**.
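To get a feel for what a checkpoint directory holds, here is a minimal sketch that lists checkpoint prefixes by looking for `.index` files. The helper name `list_checkpoints` is hypothetical, and the sketch assumes the archive contains TensorFlow-style checkpoints; the exact set of files inside the Coqui archive is not confirmed here.

```python
import os


def list_checkpoints(checkpoint_dir):
    """Return sorted checkpoint prefixes found in a directory.

    Assumes TensorFlow-style checkpoints, where each saved step has a
    "<prefix>.index" file next to its data shards.
    """
    if not os.path.isdir(checkpoint_dir):
        return []
    return sorted(
        name[: -len(".index")]
        for name in os.listdir(checkpoint_dir)
        if name.endswith(".index")
    )


# Hypothetical usage: prints an empty list until the archive is extracted
print(list_checkpoints("english/coqui-yesno-checkpoints"))
```

A prefix such as `best_dev-1909` is what you would expect to see in the log line that reports which checkpoint was loaded.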


In [None]:
### Download pre-trained model
import os
import csv
import re
import tarfile
import pandas as pd
import wave
import contextlib

from coqui_stt_training.util.downloader import maybe_download

def download_pretrained_model():
    model_dir="english/"
    if not os.path.exists("english/coqui-yesno-checkpoints"):
        maybe_download("model.tar.gz", model_dir, "https://github.com/coqui-ai/STT-models/releases/download/english%2Fcoqui%2Fyesno-v0.0.1/coqui-yesno-checkpoints.tar.gz")
        print('\nNo extracted pre-trained model found. Extracting now...')
        tar = tarfile.open("english/model.tar.gz")
        tar.extractall("english/")
        tar.close()
    else:
        print('Found "english/coqui-yesno-checkpoints" - not extracting.')

# Download + extract pre-trained English model
download_pretrained_model()

No path "english/" - creating ...
No archive "english/model.tar.gz" - downloading...


100%|██████████| 1160502/1160502 [00:00<00:00, 29793408.91it/s]


No extracted pre-trained model found. Extracting now...





## ✅ Download data for Marathi

**First things first**: we need some data.

We're training a Speech-to-Text model, so we need some _speech_ and we need some _text_. Specifically, we want _transcribed speech_. Let's download a small set of Marathi audio files with their transcripts, and format them for 🐸 STT.

**Second things second**: we want a Marathi alphabet. The output layer of a typical* 🐸 STT model represents letters in the alphabet. Let's download a Marathi alphabet file and use that.

*_If you are working with languages with large character sets (e.g. Chinese), you can set `bytes_output_mode=True` instead of supplying an `alphabet.txt` file. In this case, the output layer of the STT model will correspond to individual UTF-8 bytes instead of individual characters._
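As a rough illustration of the two options, the sketch below builds a character alphabet from a few transcripts and then shows what the same text looks like as UTF-8 bytes. The helpers `build_alphabet` and `write_alphabet` are hypothetical, the transcripts are made up, and the one-character-per-line file layout is an assumption about `alphabet.txt`.

```python
def build_alphabet(transcripts):
    """Collect every distinct character used in the transcripts, sorted."""
    return sorted({char for text in transcripts for char in text})


def write_alphabet(chars, path):
    """Write one character per line (the layout this sketch assumes)."""
    with open(path, "w", encoding="utf-8") as handle:
        for char in chars:
            handle.write(char + "\n")


transcripts = ["मराठी भाषा", "भाषा"]  # hypothetical transcripts
alphabet = build_alphabet(transcripts)
print(len(alphabet), "distinct characters")

# In bytes output mode the model predicts UTF-8 bytes instead of characters;
# a single Devanagari letter takes three bytes.
print(list("म".encode("utf-8")))
```

Note that vowel signs and other combining marks count as separate characters here, so the alphabet must include them as well as the base letters.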

In [None]:
### Download sample data
import itertools

from coqui_stt_training.util.downloader import maybe_download


def write_manifest(tsv_path, wav_dir, csv_name, start, stop):
    """Write a 🐸 STT CSV manifest from lines [start, stop) of the TSV file."""
    csv_path = os.path.join(wav_dir, csv_name)
    with open(tsv_path, "r", encoding="utf-8") as tsv_file, \
         open(csv_path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for line in itertools.islice(tsv_file, start, stop):
            # Each TSV line is "<utterance id>\t<transcript>"
            utt_id, transcript = line.rstrip("\n").split("\t", 1)
            wav_name = utt_id + ".wav"
            wav_size = os.path.getsize(os.path.join(wav_dir, wav_name))
            # csv.writer quotes transcripts that contain commas
            writer.writerow([wav_name, wav_size, transcript])


def download_sample_data():
    data_dir = "marathi/"
    maybe_download("data.tgz", data_dir, "https://www.cse.iitb.ac.in/~pjyothi/cs753/data.tgz")
    !tar -xzvf "marathi/data.tgz" -C "marathi/"

    # The first 10 utterances are used for training, the next 5 for testing
    write_manifest("marathi/data/marathi.tsv", "marathi/data/wavs/train/", "marathi.csv", 0, 10)
    write_manifest("marathi/data/marathi.tsv", "marathi/data/wavs/test/", "marathi_test.csv", 10, 15)
    #fname = 'marathi/data/wavs/train/mrt_01523_00028548203.wav'
    
    #alpha="अऄआइईउऊऋऌऍऎएऐऑऒओऔअंअःाोंे्ीूोौॏॐिुृॄॆॉॊॉॎैःँॅऀऻ़ॕॖॗॢॣ॰ॱ_क़ख़ग़ज़ड़ढ़फ़य़ॠॡ।॥०१२३४५६७८९ॲॳॴॵॶॷॸॹॺॻॼॽॾॿ?कखगघङचछजझञटठडढणतथदधनऩपफबभमयरऱलवशषसहळऴक्षज्ञ"
     
    
    #maybe_download("ru.wav", data_dir, "https://raw.githubusercontent.com/coqui-ai/STT/main/data/smoke_test/russian_sample_data/ru.wav")
    #maybe_download("ru.csv", data_dir, "https://raw.githubusercontent.com/coqui-ai/STT/main/data/smoke_test/russian_sample_data/ru.csv")
    #maybe_download("alphabet.txt", "marathi/data/wavs/train/", "https://github.com/divyankiitb/Asr_files/blob/main/alphabet.txt")
    !wget https://github.com/divyankiitb/Asr_files/raw/main/files.zip

    !unzip files.zip
# Download and prepare the sample Marathi data
download_sample_data()

No path "marathi/" - creating ...
No archive "marathi/data.tgz" - downloading...


100%|██████████| 5796505/5796505 [00:01<00:00, 3518189.47it/s]


data/
data/wavs/
data/marathi.tsv
data/wavs/test/
data/wavs/train/
data/wavs/train/._mrt_02624_00000391676.wav
data/wavs/train/mrt_02624_00000391676.wav
data/wavs/train/._mrt_02436_00013484215.wav
data/wavs/train/mrt_02436_00013484215.wav
data/wavs/train/._mrt_03349_00062847458.wav
data/wavs/train/mrt_03349_00062847458.wav
data/wavs/train/._mrt_02484_00002806507.wav
data/wavs/train/mrt_02484_00002806507.wav
data/wavs/train/._mrt_02484_00007602377.wav
data/wavs/train/mrt_02484_00007602377.wav
data/wavs/train/._mrt_02624_00007390408.wav
data/wavs/train/mrt_02624_00007390408.wav
data/wavs/train/._mrt_02436_00013089849.wav
data/wavs/train/mrt_02436_00013089849.wav
data/wavs/train/._mrt_01523_00028548203.wav
data/wavs/train/mrt_01523_00028548203.wav
data/wavs/train/._mrt_03349_00062047674.wav
data/wavs/train/mrt_03349_00062047674.wav
data/wavs/train/._mrt_01523_00029882518.wav
data/wavs/train/mrt_01523_00029882518.wav
data/wavs/test/._mrt_03397_02119986802.wav
data/wavs/test/mrt_03397_02119
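Before training, it can help to sanity-check the generated manifests. The sketch below is one possible check, not part of 🐸 STT: `validate_manifest` is a hypothetical helper that verifies the three-column header, that each audio file exists, and that the recorded size matches the file on disk. It assumes relative `wav_filename` entries are resolved against the folder that contains the CSV.

```python
import csv
import os

EXPECTED_COLUMNS = ["wav_filename", "wav_filesize", "transcript"]


def validate_manifest(csv_path):
    """Return a list of problems found in a manifest; empty means it looks fine."""
    problems = []
    base_dir = os.path.dirname(os.path.abspath(csv_path))
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames != EXPECTED_COLUMNS:
            return ["unexpected header: {}".format(reader.fieldnames)]
        for row_number, row in enumerate(reader, start=2):
            # Assumption: relative paths are relative to the CSV's own folder
            wav_path = os.path.join(base_dir, row["wav_filename"] or "")
            if not os.path.isfile(wav_path):
                problems.append("line {}: missing {}".format(row_number, wav_path))
            elif str(os.path.getsize(wav_path)) != row["wav_filesize"]:
                problems.append("line {}: size mismatch".format(row_number))
            if not (row["transcript"] or "").strip():
                problems.append("line {}: empty transcript".format(row_number))
    return problems


# Hypothetical usage, guarded so the cell also runs before the data exists
manifest = "marathi/data/wavs/train/marathi.csv"
if os.path.exists(manifest):
    print(validate_manifest(manifest))
```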

## ✅ Configure the training run

Coqui STT comes with a long list of hyperparameters you can tweak. We've set default values, but you can use `initialize_globals_from_args()` to set your own. 

You must **always** configure the paths to your data, and you must **always** configure your alphabet. For transfer learning, it's good practice to define different `load_checkpoint_dir` and `save_checkpoint_dir` paths so that you keep your new model (Marathi STT) separate from the old one (English STT). The parameter `drop_source_layers` allows you to remove layers from the original (aka "source") model and re-initialize them from scratch. If you are fine-tuning to a new alphabet, you will have to use _at least_ `drop_source_layers=1` to remove the output layer and add a new output layer that matches your new alphabet.

We are fine-tuning a pre-existing model, so `n_hidden` should be the same as the original English model.
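To make `drop_source_layers` more concrete, here is a small illustrative model of the idea, not the library's own code. It assumes layers are dropped counting back from the output, which is consistent with `drop_source_layers=1` removing only the output layer, and it assumes the output layer has one unit per alphabet character plus one CTC blank. The layer names follow the variable names that appear in the checkpoint logs; the function names are hypothetical.

```python
# Layer order, input to output, following the checkpoint variable names
LAYERS = ["layer_1", "layer_2", "layer_3", "lstm", "layer_5", "layer_6"]


def layers_to_reinitialize(drop_source_layers):
    """Layers that start from scratch, assuming drops count back from the output."""
    if drop_source_layers <= 0:
        return []
    return LAYERS[-drop_source_layers:]


def output_units(alphabet_size):
    """Output layer width, assuming one unit per character plus a CTC blank."""
    return alphabet_size + 1


print(layers_to_reinitialize(1))
print(layers_to_reinitialize(3))

# Two alphabets of different sizes need output layers of different widths,
# which is why the source output layer cannot be reused as-is.
print(output_units(3), output_units(70))  # hypothetical alphabet sizes
```

Under these assumptions, the setting `drop_source_layers=3` keeps the three dense input layers from the English model and retrains the recurrent layer and everything after it.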

In [None]:
from coqui_stt_training.util.config import initialize_globals_from_args

initialize_globals_from_args(
    n_hidden=64,
    load_checkpoint_dir="english/coqui-yesno-checkpoints",
    save_checkpoint_dir="marathi/data/wavs/train/checkpoints",
    drop_source_layers=3,
    alphabet_config_path="files/alphabet.txt",
    train_files=["marathi/data/wavs/train/marathi.csv"],
    dev_files=["marathi/data/wavs/train/marathi.csv"],
    epochs=100,
    learning_rate=.0001,
    dropout_rate=.05,
    early_stop=True,
    load_cudnn=True,
)

### View all Config settings (*Optional*) 

In [None]:
from coqui_stt_training.util.config import Config

print(Config.to_json())

{
    "train_files": [
        "marathi/data/wavs/train/marathi.csv"
    ],
    "dev_files": [
        "marathi/data/wavs/train/marathi.csv"
    ],
    "test_files": [],
    "metrics_files": [],
    "auto_input_dataset": "",
    "vocab_file": "",
    "read_buffer": 1048576,
    "feature_cache": "",
    "cache_for_epochs": 0,
    "shuffle_batches": false,
    "shuffle_start": 1,
    "shuffle_buffer": 1000,
    "feature_win_len": 32,
    "feature_win_step": 20,
    "audio_sample_rate": 16000,
    "normalize_sample_rate": true,
    "augment": null,
    "epochs": 100,
    "dropout_rate": 0.08,
    "dropout_rate2": 0.08,
    "dropout_rate3": 0.08,
    "dropout_rate4": 0.0,
    "dropout_rate5": 0.0,
    "dropout_rate6": 0.08,
    "relu_clip": 20.0,
    "beta1": 0.9,
    "beta2": 0.999,
    "epsilon": 1e-08,
    "learning_rate": 0.001,
    "train_batch_size": 1,
    "dev_batch_size": 1,
    "test_batch_size": 1,
    "export_batch_size": 1,
    "inter_op_parallelism_threads": 0,
    "intra_op_

## ✅ Train a new Marathi model

Let's kick off a training run 🚀🚀🚀 (using the configuration you set above).

This notebook should work on either a GPU or a CPU. However, if you're running this on _multiple_ GPUs, we want to use only one, because the sample dataset (ten audio files) is too small to split across multiple GPUs.

In [None]:
from coqui_stt_training.train import train

# use maximum one GPU
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

train()

I Performing dummy training to check for memory problems.
I If the following process crashes, you likely have batch sizes that are too big for your available system memory (or GPU memory).
I Loading best validating checkpoint from english/coqui-yesno-checkpoints/best_dev-1909
W CUDNN variable not found: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam_1
W CUDNN variable not found: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam
W CUDNN variable not found: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
W CUDNN variable not found: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam_1
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel

## ✅ Configure the testing run

Let's add the path to our testing data and update `load_checkpoint_dir` to our new model checkpoints.

In [None]:
from coqui_stt_training.util.config import Config

Config.test_files=["marathi/data/wavs/test/marathi_test.csv"]
Config.load_checkpoint_dir="marathi/data/wavs/train/checkpoints"

## ✅ Test the new Marathi model

We made it! 🙌

Let's kick off the testing run, which displays performance metrics.
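Speech recognizers are usually scored with word error rate (WER) and character error rate (CER). The sketch below shows the textbook definitions based on edit distance; it is an assumption that the numbers in the test report follow the same definitions, and any text normalization the library applies is not reproduced here.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (strings or lists)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """Character-level edits divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference transcript is empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One substituted word out of three
print(word_error_rate("a b c", "a x c"))
```

Lower is better for both metrics, and a WER above 1.0 is possible when the hypothesis contains many inserted words.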

We're committing a cardinal sin of ML 😈 (aka - validating on our training data, and training on only ten utterances) so you don't want to deploy this model into production. In this notebook we're focusing on the workflow itself, so it's forgivable 😇

With this little data, expect the tiny model to overfit to its training sentences rather than generalize to the five held-out test utterances.

When you start training your own models, make sure your validation and testing data don't include your training data 😅
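One simple way to check for leakage between two manifests is to compare their `wav_filename` columns. This is a minimal sketch with hypothetical helper names; it only catches files listed under the same name, not the same recording stored under two different names.

```python
import csv
import os


def wav_filenames(csv_path):
    """Return the set of audio file names listed in a manifest."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return {row["wav_filename"] for row in csv.DictReader(handle)}


def shared_utterances(train_csv, test_csv):
    """Audio files that appear in both manifests (ideally an empty set)."""
    return wav_filenames(train_csv) & wav_filenames(test_csv)


# Hypothetical usage, guarded so the cell also runs before the data exists
train_csv = "marathi/data/wavs/train/marathi.csv"
test_csv = "marathi/data/wavs/test/marathi_test.csv"
if os.path.exists(train_csv) and os.path.exists(test_csv):
    print(shared_utterances(train_csv, test_csv))
```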

In [None]:
from coqui_stt_training.evaluate import test

test()

I Loading best validating checkpoint from marathi/data/wavs/train/checkpoints/best_dev-2849
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/weights
Testing model on marathi/data/wavs/test/marathi_test.csv
Test epoch | Steps: 5 | Elapsed Time: 0:01:14                                  
Test o