<a href="https://colab.research.google.com/github/IndraLukasTjahaja/Speech_Text/blob/main/DeepSpeech_train_a_model%2C_CV_Indonesia.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tutorial: Training an Indonesian speech-to-text model
## Using DeepSpeech and Common Voice, based on DeepSpeech 0.9.3

### Updated as of 5 May 2022

*This tutorial is based on https://colab.research.google.com/github/acabunoc/Tutorial-train-dutch-model/blob/master/DeepSpeech_train_a_model%2C_CV_Dutch.ipynb with revisions.*

*If you would like to run this on your local computer, you can follow this guideline to prepare your system: https://docs.google.com/document/d/1sg4bm6xx9ZJc4GVPaVpUAQJ4GyPHPMk1Jp5W_AfqjrE/edit?usp=sharing. However, I strongly suggest using a cloud solution. I spent a week trying to prepare a local system without success, because the combination of hardware, Linux, Python, dependencies, and CUDA always conflicted with one another.*

*In this tutorial, we're going to use Mozilla's DeepSpeech and Common Voice to train an Indonesian speech-to-text model. The instructions are taken directly from the DeepSpeech documentation (https://deepspeech.readthedocs.io/en/r0.9/TRAINING.html), linked in each section. Any changes from the docs, and any comments of mine, are written in italics or highlighted in a comment on Colab.*

*This uses the free tier on Google Colab. I turned on the GPU hardware accelerator in the Notebook settings and did not add any additional file storage.*

## Training Your Own Model
Taken from the [DeepSpeech docs - Training Your Own Model](https://deepspeech.readthedocs.io/en/v0.7.4/TRAINING.html#training-your-own-model)

For the newer version of this document, see https://deepspeech.readthedocs.io/en/r0.9/TRAINING.html

### Prerequisites for training a model


* Python 3.6
* CUDA 10.0 / CuDNN v7.6 per Dockerfile (for local)
* Mac or Linux environment (for local)
* Git Large File Storage, `git-lfs` (for Google Colab)

In [None]:
# Checking Python version
!python --version
import sys
sys.version

Python 3.7.13


'3.7.13 (default, Apr 24 2022, 01:04:09) \n[GCC 7.5.0]'
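*The docs list Python 3.6, while Colab ships 3.7 here. Below is a minimal sketch of a guard cell that warns early when the interpreter is outside the versions you have tried. The `SUPPORTED` set is my own assumption (3.6 from the prerequisites, 3.7 from this run), not an official compatibility list, and `is_supported` is a hypothetical helper name.*

```python
import sys

# Assumption: 3.6 comes from the prerequisites list, 3.7 from this Colab run.
SUPPORTED = {(3, 6), (3, 7)}


def is_supported(version_info=sys.version_info, supported=SUPPORTED):
    """Return True if the (major, minor) pair is in the set of tried versions."""
    return (version_info[0], version_info[1]) in supported


if not is_supported():
    major, minor = sys.version_info[0], sys.version_info[1]
    print(f"Warning: Python {major}.{minor} has not been tried with DeepSpeech 0.9.3 here")
```

*The check only warns; it does not stop the notebook, so you can still experiment with other versions.*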

In [None]:
# The docs ask for Python 3.6. To downgrade, see the following guideline: https://colab.research.google.com/github/Dene33/mlcourse.ai/blob/master/jupyter_english/tutorials/Useful_Google_Colab_snippets.ipynb
# This notebook stays on Python 3.7 to test whether it works.

In [None]:
# Install Git LFS, which is needed for `git lfs pull` later on
! sudo apt-get install -y git-lfs

Reading package lists... Done
Building dependency tree       
Reading state information... Done
git-lfs is already the newest version (2.3.4-1).
The following packages were automatically installed and are no longer required:
  libnvidia-common-460 nsight-compute-2020.2.0
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 42 not upgraded.


### Get the training code

Clone the DeepSpeech repository at the `v0.9.3` tag, then run `git lfs pull`.

In [None]:
! git clone https://github.com/mozilla/DeepSpeech --branch v0.9.3
# If you get "fatal: destination path 'DeepSpeech' already exists", the repository has already been cloned into your Colab session

Cloning into 'DeepSpeech'...
remote: Enumerating objects: 23888, done.
remote: Total 23888 (delta 0), reused 0 (delta 0), pack-reused 23888
Receiving objects: 100% (23888/23888), 49.36 MiB | 22.86 MiB/s, done.
Resolving deltas: 100% (16417/16417), done.
Note: checking out 'f2e9c85880dff94115ab510cde9ca4af7ee51c19'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>



In [None]:
%cd /content/DeepSpeech
! git lfs pull

/content/DeepSpeech


### Creating a virtual environment

Creating a virtual environment gives you a directory containing a python3 binary and everything needed to run deepspeech. You can use whatever directory you want. For the purpose of the documentation, we will rely on `$HOME/tmp/deepspeech-train-venv`.

In [None]:
! pip3 install virtualenv
! virtualenv -p python3 $HOME/tmp/deepspeech-train-venv/

Collecting virtualenv
  Downloading virtualenv-20.14.1-py2.py3-none-any.whl (8.8 MB)
     |████████████████████████████████| 8.8 MB 14.4 MB/s
Collecting distlib<1,>=0.3.1
  Downloading distlib-0.3.4-py2.py3-none-any.whl (461 kB)
     |████████████████████████████████| 461 kB 68.4 MB/s
Collecting platformdirs<3,>=2
  Downloading platformdirs-2.5.2-py3-none-any.whl (14 kB)
Installing collected packages: platformdirs, distlib, virtualenv
Successfully installed distlib-0.3.4 platformdirs-2.5.2 virtualenv-20.14.1
created virtual environment CPython3.7.13.final.0-64 in 753ms
  creator CPython3Posix(dest=/root/tmp/deepspeech-train-venv, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
    added seed packages: pip==22.0.4, setuptools==62.1.0, wheel==0.37.1
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivato

Once this command completes successfully, the environment will be ready to be activated.

#### Activating the environment

Each time you need to work with DeepSpeech, you have to activate this virtual environment. This is done with this simple command:

In [None]:
! source $HOME/tmp/deepspeech-train-venv/bin/activate
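*A caveat, as far as I understand Colab: each `!` line runs in its own shell, so the `source` above does not carry over to later cells. The pip output further down does show packages landing in the system `dist-packages`. Below is a minimal sketch of one way around this, by activating and running a command in the same bash process. `venv_command` and `run_in_venv` are hypothetical helper names of mine, not part of DeepSpeech or Colab.*

```python
import shlex
import subprocess
from pathlib import Path

# Same directory as used when creating the environment above.
VENV_DIR = Path.home() / "tmp" / "deepspeech-train-venv"


def venv_command(command, venv_dir=VENV_DIR):
    """Build one bash command line that activates the venv, then runs `command`."""
    activate = Path(venv_dir) / "bin" / "activate"
    return f"source {shlex.quote(str(activate))} && {command}"


def run_in_venv(command, venv_dir=VENV_DIR):
    """Run `command` inside the venv within a single bash process."""
    return subprocess.run(["bash", "-c", venv_command(command, venv_dir)], check=True)
```

*For example, `run_in_venv("python --version")` would report the interpreter of the virtual environment rather than the system one. The rest of this notebook keeps using plain `!` cells, as in the original run.*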

### Installing DeepSpeech Training Code and its dependencies

Install the required dependencies using pip3:

In [None]:
%cd /content/DeepSpeech
! pip3 install --upgrade pip==20.2.2 wheel==0.34.2 setuptools==49.6.0

/content/DeepSpeech
Collecting pip==20.2.2
  Downloading pip-20.2.2-py2.py3-none-any.whl (1.5 MB)
     |████████████████████████████████| 1.5 MB 12.3 MB/s
Collecting wheel==0.34.2
  Downloading wheel-0.34.2-py2.py3-none-any.whl (26 kB)
Collecting setuptools==49.6.0
  Downloading setuptools-49.6.0-py3-none-any.whl (803 kB)
     |████████████████████████████████| 803 kB 66.4 MB/s
Installing collected packages: wheel, setuptools, pip
  Attempting uninstall: wheel
    Found existing installation: wheel 0.37.1
    Uninstalling wheel-0.37.1:
      Successfully uninstalled wheel-0.37.1
  Attempting uninstall: setuptools
    Found existing installation: setuptools 57.4.0
    Uninstalling setuptools-57.4.0:
      Successfully uninstalled setuptools-57.4.0
  Attempting uninstall: pip
    Found existing installation: pip 21.1.3
    Uninstalling pip-21.1.3:
      Successfully uninstalled pip-21.1.3
ERROR: pip's dependency resolver does not currently take into account all t

In [None]:
%cd /content/DeepSpeech
! pip3 install --upgrade -e .

/content/DeepSpeech
Obtaining file:///content/DeepSpeech
Collecting pyxdg
  Downloading pyxdg-0.27-py2.py3-none-any.whl (49 kB)
     |████████████████████████████████| 49 kB 4.9 MB/s
Collecting attrdict
  Downloading attrdict-2.0.1-py2.py3-none-any.whl (9.9 kB)
Collecting opuslib==2.0.0
  Downloading opuslib-2.0.0.tar.gz (7.3 kB)
Collecting optuna
  Downloading optuna-2.10.0-py3-none-any.whl (308 kB)
     |████████████████████████████████| 308 kB 28.4 MB/s
Collecting sox
  Downloading sox-1.4.1-py2.py3-none-any.whl (39 kB)
Collecting numba==0.47.0
  Downloading numba-0.47.0-cp37-cp37m-manylinux1_x86_64.whl (3.7 MB)
     |████████████████████████████████| 3.7 MB 53.9 MB/s
Collecting llvmlite==0.31.0
  Downloading llvmlite-0.31.0-cp37-cp37m-manylinux1_x86_64.whl (20.2 MB)
     |████████████████████████████████| 20.2 MB 1.2 MB/s
Collecting ds_ctcdecoder==0.9.3
  Downloading ds_ctcdecoder-0.9.3-cp37-cp37m-manylinux1_x86_64.whl (2.1 MB)
     |███████████

#### Recommendations

If you have a capable (NVIDIA, at least 8GB of VRAM) GPU, it is highly recommended to install TensorFlow with GPU support. Training will be significantly faster than using the CPU. To enable GPU support, you can do:

In [None]:
! yes | pip3 uninstall tensorflow
! pip3 install 'tensorflow-gpu==1.15.4'

Found existing installation: tensorflow 1.15.4
Uninstalling tensorflow-1.15.4:
  Would remove:
    /usr/local/bin/estimator_ckpt_converter
    /usr/local/bin/freeze_graph
    /usr/local/bin/saved_model_cli
    /usr/local/bin/tensorboard
    /usr/local/bin/tf_upgrade_v2
    /usr/local/bin/tflite_convert
    /usr/local/bin/toco
    /usr/local/bin/toco_from_protos
    /usr/local/lib/python3.7/dist-packages/tensorflow-1.15.4.dist-info/*
    /usr/local/lib/python3.7/dist-packages/tensorflow/*
    /usr/local/lib/python3.7/dist-packages/tensorflow_core/*
Proceed (y/n)?   Successfully uninstalled tensorflow-1.15.4
Collecting tensorflow-gpu==1.15.4
  Downloading tensorflow_gpu-1.15.4-cp37-cp37m-manylinux2010_x86_64.whl (411.0 MB)
     |████████████████████████████████| 411.0 MB 26 kB/s
Collecting numpy<1.19.0,>=1.16.0
  Downloading numpy-1.18.5-cp37-cp37m-manylinux1_x86_64.whl (20.1 MB)
     |████████████████████████████████| 20.1 MB 1.3 MB/s
Installing collected packages: numpy, tensor
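*After installing `tensorflow-gpu`, it can be worth confirming that TensorFlow actually sees the GPU before starting a long training run. This is a minimal sketch, assuming the TensorFlow 1.x call `tf.test.is_gpu_available()`; `gpu_status` is a hypothetical helper name of mine.*

```python
def gpu_status():
    """Report in one short string whether TensorFlow can see a GPU."""
    try:
        import tensorflow as tf  # imported lazily so the cell works without TF
    except ImportError:
        return "tensorflow is not installed"
    if tf.test.is_gpu_available():
        return "GPU visible to TensorFlow"
    return "no GPU visible to TensorFlow"


print(gpu_status())
```

*If this reports no GPU on Colab, check that the hardware accelerator is set to GPU in the Notebook settings.*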

### Common Voice Training Data
The Common Voice corpus consists of voice samples that were donated through Mozilla’s [Common Voice Initiative](https://voice.mozilla.org/). You can download individual CommonVoice v2.0 language data sets from [here](https://voice.mozilla.org/en/datasets). After extraction of such a data set, you’ll find the following contents:

* the `*.tsv` files output by CorporaCreator for the downloaded language
* the mp3 audio files they reference in a `clips` sub-directory.
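*To get a feel for the extracted data, here is a minimal sketch that reads a few rows from one of the `*.tsv` files. It assumes the files are tab-separated with `path` and `sentence` columns; `read_cv_tsv` is a hypothetical helper name of mine.*

```python
import csv


def read_cv_tsv(tsv_path, limit=5):
    """Return up to `limit` (clip file name, sentence) pairs from a Common Voice TSV."""
    pairs = []
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quotation marks inside sentences as plain text.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            pairs.append((row["path"], row["sentence"]))
            if len(pairs) >= limit:
                break
    return pairs
```

*Once the dataset is extracted in the cells below, `read_cv_tsv("/content/in/cv-corpus-9.0-2022-04-27/id/train.tsv")` would list the first few clips and their transcripts.*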

*To retrieve the Indonesian dataset, I went to the [Common Voice datasets page](https://voice.mozilla.org/en/datasets), selected Indonesian from the dropdown, provided my email, then right-clicked the download link to copy the URL of the `.tar.gz` file.*

In [None]:
%cd /content
%mkdir in
%cd in
! wget https://mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com/cv-corpus-9.0-2022-04-27/cv-corpus-9.0-2022-04-27-id.tar.gz

/content
/content/in
--2022-05-07 00:36:17--  https://mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com/cv-corpus-9.0-2022-04-27/cv-corpus-9.0-2022-04-27-id.tar.gz
Resolving mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com (mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com)... 52.218.177.225, 2600:1fa0:406c:1109:34da:ed09::
Connecting to mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com (mozilla-common-voice-datasets.s3.dualstack.us-west-2.amazonaws.com)|52.218.177.225|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1253048208 (1.2G) [application/octet-stream]
Saving to: ‘cv-corpus-9.0-2022-04-27-id.tar.gz’


2022-05-07 00:37:27 (17.1 MB/s) - ‘cv-corpus-9.0-2022-04-27-id.tar.gz’ saved [1253048208/1253048208]



In [None]:
%cd /content/in
! tar xvzf cv-corpus-9.0-2022-04-27-id.tar.gz

Streaming output truncated to the last 5000 lines.
cv-corpus-9.0-2022-04-27/id/clips/common_voice_id_27457844.mp3
cv-corpus-9.0-2022-04-27/id/clips/common_voice_id_27457845.mp3
cv-corpus-9.0-2022-04-27/id/clips/common_voice_id_27457846.mp3
cv-corpus-9.0-2022-04-27/id/clips/common_voice_id_27457847.mp3
cv-corpus-9.0-2022-04-27/id/clips/common_voice_id_27457848.mp3
cv-corpus-9.0-2022-04-27/id/clips/common_voice_id_27457849.mp3
cv-corpus-9.0-2022-04-27/id/clips/common_voice_id_27457859.mp3
cv-corpus-9.0-2022-04-27/id/clips/common_voice_id_27457860.mp3
cv-corpus-9.0-2022-04-27/id/clips/common_voice_id_27457861.mp3
cv-corpus-9.0-2022-04-27/id/clips/common_voice_id_27457862.mp3
cv-corpus-9.0-2022-04-27/id/clips/common_voice_id_27457863.mp3
cv-corpus-9.0-2022-04-27/id/clips/common_voice_id_27457864.mp3
cv-corpus-9.0-2022-04-27/id/clips/common_voice_id_27457865.mp3
cv-corpus-9.0-2022-04-27/id/clips/common_voice_id_27457866.mp3
cv-corpus-9.0-2022-04-27/id/clips/common_voice_id_274

In [None]:
! rm cv-corpus-9.0-2022-04-27-id.tar.gz

*Install sox before running the CommonVoice Importer.*

In [None]:
! sudo apt-get install sox libsox-fmt-mp3

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libnvidia-common-460 nsight-compute-2020.2.0
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  libid3tag0 libmad0 libmagic-mgc libmagic1 libopencore-amrnb0
  libopencore-amrwb0 libsox-fmt-alsa libsox-fmt-base libsox3
Suggested packages:
  file libsox-fmt-all
The following NEW packages will be installed:
  libid3tag0 libmad0 libmagic-mgc libmagic1 libopencore-amrnb0
  libopencore-amrwb0 libsox-fmt-alsa libsox-fmt-base libsox-fmt-mp3 libsox3
  sox
0 upgraded, 11 newly installed, 0 to remove and 42 not upgraded.
Need to get 872 kB of archives.
After this operation, 7,087 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libopencore-amrnb0 amd64 0.1.3-2.1 [92.0 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/univer
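*Before running the importer, a quick check that the command-line tools installed so far are on the `PATH` can save a failed run. This is a minimal sketch; `missing_tools` is a hypothetical helper name of mine, and it cannot detect the MP3 format library (`libsox-fmt-mp3`), only executables.*

```python
import shutil


def missing_tools(names=("git-lfs", "sox")):
    """Return the names from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


print("Missing tools:", missing_tools() or "none")
```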

To bring this data into a form that DeepSpeech understands, you have to run the CommonVoice v2.0 importer (`bin/import_cv2.py`). First, prepare the alphabet file that the importer uses as a filter.

#### Preparing the alphabet filter
Transcript quality varies, so the importer is given an alphabet file: any sample whose transcript contains a character that is not in the alphabet is skipped.

Upload the alphabet file to `id/alphabet.txt`. It can be downloaded from https://drive.google.com/file/d/19X2Phj1Ckgon6Tnpj9EEENsgdrqbggw7/view?usp=sharing

Alternatively, mount your Google Drive and keep the file there:

In [None]:
from google.colab import drive
drive.mount('/content/drive')
# Saving it to Google Drive so we don't lose progress

Mounted at /content/drive
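*To make the alphabet filter more concrete, here is a simplified sketch of the idea. It assumes the DeepSpeech alphabet format of one symbol per line, with lines starting with `#` treated as comments and `\#` standing for a literal hash. The real importer also normalizes transcripts before checking them, so this is an illustration only; `load_alphabet` and `passes_filter` are hypothetical names of mine.*

```python
def load_alphabet(path):
    """Read one symbol per line; lines starting with '#' are comments."""
    symbols = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\r\n")  # keep a lone space, which is a valid symbol
            if line.startswith("\\#"):
                symbols.add("#")  # escaped hash means a literal '#'
            elif line and not line.startswith("#"):
                symbols.add(line)
    return symbols


def passes_filter(transcript, symbols):
    """Return True if every character of the transcript is in the alphabet."""
    return all(char in symbols for char in transcript)
```

*With an alphabet of lowercase letters and the space character, `passes_filter("selamat pagi", symbols)` would be true, while a transcript containing a digit or an accented letter would be skipped.*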


In [None]:
# If this fails, check the path to the alphabet file
%cd /content/DeepSpeech/
! bin/import_cv2.py ../in/cv-corpus-9.0-2022-04-27/id --filter_alphabet /content/drive/MyDrive/DS_Project_NLP/alphabet.txt

/content/DeepSpeech
Loading TSV file:  /content/in/cv-corpus-9.0-2022-04-27/id/test.tsv
Importing mp3 files...
Progress |######################################################| 100% completedImported 3608 samples.
Skipped 14 samples that failed on transcript validation.
Final amount of imported audio: 4:04:53 from 4:06:01.
Saving new DeepSpeech-formatted CSV file to:  /content/in/cv-corpus-9.0-2022-04-27/id/clips/test.csv
Writing CSV file for DeepSpeech.py as:  /content/in/cv-corpus-9.0-2022-04-27/id/clips/test.csv
Progress |######################################################| 100% completed
Loading TSV file:  /content/in/cv-corpus-9.0-2022-04-27/id/dev.tsv
Importing mp3 files...
Progress |##################################################### |  98% completedImported 3184 samples.
Skipped 31 samples that failed on transcript validation.
Skipped 3 samples that were longer than 10 seconds.
Final amount of imported audio: 3:40:15 from 3:43:53.
Saving new DeepSpeech-formatted CSV file t
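*The importer writes one CSV per split next to the clips. Below is a minimal sketch for a sanity check of such a file, assuming the DeepSpeech CSV columns `wav_filename`, `wav_filesize`, and `transcript`; `summarize_csv` is a hypothetical helper name of mine.*

```python
import csv


def summarize_csv(csv_path):
    """Return (number of samples, total WAV bytes) for a DeepSpeech CSV."""
    count, total_bytes = 0, 0
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            count += 1
            total_bytes += int(row["wav_filesize"])
    return count, total_bytes
```

*For the test split above, `summarize_csv("/content/in/cv-corpus-9.0-2022-04-27/id/clips/test.csv")` should return a sample count that matches the "Imported ... samples" line in the importer output.*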

### Simple Training of Model using only CV

In [None]:
# To continue from your latest checkpoint, download it and upload it to the right folder,
# then put that folder into load_checkpoint


In [None]:
%cd /content/DeepSpeech/
! python3 DeepSpeech.py \
  --train_files /content/in/cv-corpus-9.0-2022-04-27/id/clips/train.csv \
  --dev_files /content/in/cv-corpus-9.0-2022-04-27/id/clips/dev.csv \
  --test_files /content/in/cv-corpus-9.0-2022-04-27/id/clips/test.csv \
  --train_batch_size 1 \
  --test_batch_size 1 \
  --n_hidden 100 \
  --epochs 20 \
  --checkpoint_dir /content/drive/MyDrive/DS_Project_NLP/checkpoint \
  --export_dir /content/drive/MyDrive/DS_Project_NLP/model

/content/DeepSpeech
I Loading best validating checkpoint from /content/drive/MyDrive/DS_Project_NLP/checkpoint/best_dev-315575
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam_1
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable f
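*When experimenting with different settings, it can help to keep the training flags in one place. This is a minimal sketch that builds the same command as the cell above from a dictionary; `build_train_command`, `CLIPS`, and `DRIVE` are hypothetical names of mine, while the flags and paths are the ones used in this notebook.*

```python
def build_train_command(flags, script="DeepSpeech.py"):
    """Turn a {flag: value} mapping into an argv list, e.g. for subprocess.run."""
    argv = ["python3", script]
    for name, value in flags.items():
        argv.extend([f"--{name}", str(value)])
    return argv


CLIPS = "/content/in/cv-corpus-9.0-2022-04-27/id/clips"
DRIVE = "/content/drive/MyDrive/DS_Project_NLP"

flags = {
    "train_files": f"{CLIPS}/train.csv",
    "dev_files": f"{CLIPS}/dev.csv",
    "test_files": f"{CLIPS}/test.csv",
    "train_batch_size": 1,
    "test_batch_size": 1,
    "n_hidden": 100,
    "epochs": 20,
    "checkpoint_dir": f"{DRIVE}/checkpoint",
    "export_dir": f"{DRIVE}/model",
}

print(" ".join(build_train_command(flags)))
```

*The printed line can be pasted after `!` in a cell, or the list can be passed to `subprocess.run` from `/content/DeepSpeech`. As far as I understand, a run that resumes from an existing checkpoint needs the same `n_hidden` as the run that created it.*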

### STOP HERE