<a href="https://colab.research.google.com/github/acabunoc/Tutorial-train-dutch-model/blob/master/DeepSpeech_train_a_model%2C_CV_Dutch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tutorial: Training a Dutch speech-to-text model
## Using DeepSpeech and Common Voice

*In this tutorial, we're going to use Mozilla's DeepSpeech and Common Voice to train a Dutch speech-to-text model. The instructions are taken directly from the DeepSpeech documentation, linked in each section. Any changes from the docs, and any comments of my own, are written in italics or highlighted in a comment on Colab.*

*This uses the free tier of Google Colab. I turned on the GPU hardware accelerator in the notebook settings and did not add any additional file storage.*

*I chose Dutch because I wanted to use only the disk space available on Google Colab's free tier, and the English Common Voice dataset was too large. Also, [MozFest](https://mozillafestival.org/) will be in Amsterdam in 2021! This seemed like a good time to pick up some Dutch.*

## Training Your Own Model
*Taken from the [DeepSpeech docs - Training Your Own Model](https://deepspeech.readthedocs.io/en/v0.7.4/TRAINING.html#training-your-own-model)*

### Prerequisites for training a model


* Python 3.6
* Git Large File Storage
* Mac or Linux environment

Install [Git Large File Storage](https://git-lfs.github.com/).

In [1]:
import sys
! sudo apt-get install git-lfs

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-440
Use 'sudo apt autoremove' to remove it.
The following NEW packages will be installed:
  git-lfs
0 upgraded, 1 newly installed, 0 to remove and 33 not upgraded.
Need to get 2,129 kB of archives.
After this operation, 7,662 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 git-lfs amd64 2.3.4-1 [2,129 kB]
Fetched 2,129 kB in 1s (2,544 kB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 1.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-p
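
*As a quick sanity check, the prerequisites listed above can also be inspected from Python. This is only a sketch: `check_prerequisites` is a hypothetical helper of mine, not part of DeepSpeech, and it only looks for the executables on `PATH` and compares the interpreter version with the 3.6 named in the docs.*

```python
import shutil
import sys

def check_prerequisites(required_python=(3, 6)):
    """Return a dict describing whether each listed prerequisite looks satisfied."""
    return {
        # The docs name Python 3.6, so compare major.minor exactly.
        "python_matches": sys.version_info[:2] == tuple(required_python),
        # shutil.which returns None when the executable is not on PATH.
        "git_found": shutil.which("git") is not None,
        "git_lfs_found": shutil.which("git-lfs") is not None,
        # The docs list Mac or Linux as supported environments.
        "platform_ok": sys.platform.startswith(("linux", "darwin")),
    }

status = check_prerequisites()
for name, ok in status.items():
    print(f"{name}: {'yes' if ok else 'NO'}")
```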

### Get the training code

Then clone the DeepSpeech repository and run `git lfs pull`.

In [2]:
! git clone https://github.com/mozilla/DeepSpeech --branch v0.7.4

Cloning into 'DeepSpeech'...
remote: Enumerating objects: 145, done.
remote: Counting objects: 100% (145/145), done.
remote: Compressing objects: 100% (85/85), done.
remote: Total 19625 (delta 87), reused 100 (delta 59), pack-reused 19480
Receiving objects: 100% (19625/19625), 48.01 MiB | 27.89 MiB/s, done.
Resolving deltas: 100% (13372/13372), done.
Note: checking out 'fcd9563fcd8b47ee5719b24a9d7f0d9a4eaf372f'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>



In [3]:
%cd /content/DeepSpeech
! git lfs pull

/content/DeepSpeech
Git LFS: (1 of 1 files) 909.20 MB / 909.20 MB


### Creating a virtual environment

In creating a virtual environment you will create a directory containing a python3 binary and everything needed to run deepspeech. You can use whatever directory you want. For the purpose of the documentation, we will rely on $HOME/tmp/deepspeech-train-venv. 

In [4]:
! pip3 install virtualenv
! virtualenv -p python3 $HOME/tmp/deepspeech-train-venv/

Collecting virtualenv
  Downloading https://files.pythonhosted.org/packages/9f/8f/cc19852c5636db0c4b1f1d8f6b7c3ebabffe89afa411ffe915cf9005c918/virtualenv-20.0.26-py2.py3-none-any.whl (4.9MB)
     |████████████████████████████████| 4.9MB 660kB/s 
Collecting importlib-resources>=1.0; python_version < "3.7"
  Downloading https://files.pythonhosted.org/packages/ba/03/0f9595c0c2ef12590877f3c47e5f579759ce5caf817f8256d5dcbd8a1177/importlib_resources-3.0.0-py2.py3-none-any.whl
Collecting appdirs<2,>=1.4.3
  Downloading https://files.pythonhosted.org/packages/3b/00/2344469e2084fb287c2e0b57b72910309874c3245463acd6cf5e3db69324/appdirs-1.4.4-py2.py3-none-any.whl
Collecting distlib<1,>=0.3.1
  Downloading https://files.pythonhosted.org/packages/f5/0a/490fa011d699bb5a5f3a0cf57de82237f52a6db9d40f33c53b2736c9a1f9/distlib-0.3.1-py2.py3-none-any.whl (335kB)
     |████████████████████████████████| 337kB 11.5MB/s 
Installing collected packages: importlib-resources, appdirs, distlib, virt

Once this command completes successfully, the environment will be ready to be activated.

#### Activating the environment

Each time you need to work with DeepSpeech, you have to activate this virtual environment. This is done with this simple command:

In [5]:
! source $HOME/tmp/deepspeech-train-venv/bin/activate
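
*One caveat when following this step in a notebook: each `!` line runs in its own shell process, so the effect of `source .../activate` ends with that line. The `pip3` output further down, which installs into `/usr/local/lib/python3.6/dist-packages`, is consistent with the system Python being used rather than the virtualenv. The sketch below illustrates the behaviour with plain `sh`; `DEMO_VENV` is a made-up variable standing in for what `activate` exports.*

```python
import subprocess

def run_in_fresh_shell(command):
    """Run `command` in a brand-new `sh` process and return its stdout, stripped."""
    result = subprocess.run(
        ["sh", "-c", command],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()

# The first shell sets the variable and then exits, taking its state with it.
run_in_fresh_shell("export DEMO_VENV=active")

# A second shell starts from scratch, so the variable is not set there.
separate = run_in_fresh_shell('echo "${DEMO_VENV:-unset}"')

# Chaining both steps inside one shell keeps the state.
chained = run_in_fresh_shell('export DEMO_VENV=active && echo "${DEMO_VENV:-unset}"')

print(separate, chained)
```

*To actually work inside the virtualenv from a notebook, one option is to chain the activation and the command on a single `!` line with `&&`; another is to call the virtualenv's own `bin/python3` and `bin/pip3` directly.*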

### Installing DeepSpeech Training Code and its dependencies

Install the required dependencies using pip3:

In [1]:
%cd /content/DeepSpeech
! pip3 install --upgrade pip==20.0.2 wheel==0.34.2 setuptools==46.1.3

/content/DeepSpeech
Requirement already up-to-date: pip==20.0.2 in /usr/local/lib/python3.6/dist-packages (20.0.2)
Requirement already up-to-date: wheel==0.34.2 in /usr/local/lib/python3.6/dist-packages (0.34.2)
Requirement already up-to-date: setuptools==46.1.3 in /usr/local/lib/python3.6/dist-packages (46.1.3)


In [2]:
%cd /content/DeepSpeech
! pip3 install --upgrade -e .

/content/DeepSpeech
Obtaining file:///content/DeepSpeech
Collecting tensorflow==1.15.2
  Using cached tensorflow-1.15.2-cp36-cp36m-manylinux2010_x86_64.whl (110.5 MB)
Installing collected packages: tensorflow, deepspeech-training
  Attempting uninstall: deepspeech-training
    Found existing installation: deepspeech-training 0.7.4
    Can't uninstall 'deepspeech-training'. No files were found to uninstall.
  Running setup.py develop for deepspeech-training
Successfully installed deepspeech-training tensorflow-1.15.2


#### Recommendations

If you have a capable (NVIDIA, at least 8GB of VRAM) GPU, it is highly recommended to install TensorFlow with GPU support. Training will be significantly faster than using the CPU. To enable GPU support, you can do:

In [3]:
! pip3 uninstall tensorflow
! pip3 install 'tensorflow-gpu==1.15.2'

Found existing installation: tensorflow 1.15.2
Uninstalling tensorflow-1.15.2:
  Would remove:
    /usr/local/bin/estimator_ckpt_converter
    /usr/local/bin/freeze_graph
    /usr/local/bin/saved_model_cli
    /usr/local/bin/tensorboard
    /usr/local/bin/tf_upgrade_v2
    /usr/local/bin/tflite_convert
    /usr/local/bin/toco
    /usr/local/bin/toco_from_protos
    /usr/local/lib/python3.6/dist-packages/tensorflow-1.15.2.dist-info/*
    /usr/local/lib/python3.6/dist-packages/tensorflow/*
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/*
Proceed (y/n)? y
  Successfully uninstalled tensorflow-1.15.2
Collecting tensorflow-gpu==1.15.2
  Downloading tensorflow_gpu-1.15.2-cp36-cp36m-manylinux2010_x86_64.whl (411.0 MB)
     |████████████████████████████████| 411.0 MB 35 kB/s 
Installing collected packages: tensorflow-gpu
Successfully installed tensorflow-gpu-1.15.2


### Common Voice Training Data
The Common Voice corpus consists of voice samples that were donated through Mozilla’s [Common Voice Initiative](https://voice.mozilla.org/). You can download individual CommonVoice v2.0 language data sets from [here](https://voice.mozilla.org/en/datasets). After extraction of such a data set, you’ll find the following contents:

* the `*.tsv` files output by CorporaCreator for the downloaded language
* the mp3 audio files they reference in a `clips` sub-directory.
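
*To get a feel for the `*.tsv` files, here is a minimal sketch that reads clip names and sentences with the standard `csv` module. It assumes the TSV has `path` and `sentence` columns; the sample rows are invented, and `read_clips` is a hypothetical helper, not part of the importer.*

```python
import csv
import io

# Tiny in-memory stand-in for a Common Voice TSV file; the rows are invented.
SAMPLE_TSV = (
    "client_id\tpath\tsentence\tup_votes\tdown_votes\n"
    "abc123\tcommon_voice_nl_0000001.mp3\tGoedemorgen allemaal.\t2\t0\n"
    "def456\tcommon_voice_nl_0000002.mp3\tDank je wel.\t3\t1\n"
)

def read_clips(tsv_file):
    """Yield (clip filename, sentence) pairs from an open TSV file object."""
    # QUOTE_NONE keeps quotation marks inside sentences as literal characters.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row["path"], row["sentence"]

clips = list(read_clips(io.StringIO(SAMPLE_TSV)))
for path, sentence in clips:
    print(path, "->", sentence)
```

*For a real file, pass `open("test.tsv", encoding="utf-8", newline="")` instead of the `StringIO` object.*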

*To retrieve the Dutch dataset, I went to the [CommonVoice 2.0 language datasets](https://voice.mozilla.org/en/datasets), selected Dutch from the dropdown, provided my email then right-clicked to get the gz file.*

In [4]:
%cd /content
%mkdir nl
%cd nl
! wget https://voice-prod-bundler-ee1969a6ce8178826482b88e843c335139bd3fb4.s3.amazonaws.com/cv-corpus-5-2020-06-22/nl.tar.gz

/content
/content/nl
--2020-07-12 02:26:28--  https://voice-prod-bundler-ee1969a6ce8178826482b88e843c335139bd3fb4.s3.amazonaws.com/cv-corpus-5-2020-06-22/nl.tar.gz
Resolving voice-prod-bundler-ee1969a6ce8178826482b88e843c335139bd3fb4.s3.amazonaws.com (voice-prod-bundler-ee1969a6ce8178826482b88e843c335139bd3fb4.s3.amazonaws.com)... 52.218.232.219
Connecting to voice-prod-bundler-ee1969a6ce8178826482b88e843c335139bd3fb4.s3.amazonaws.com (voice-prod-bundler-ee1969a6ce8178826482b88e843c335139bd3fb4.s3.amazonaws.com)|52.218.232.219|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1260671455 (1.2G) [application/octet-stream]
Saving to: ‘nl.tar.gz’


2020-07-12 02:27:02 (35.8 MB/s) - ‘nl.tar.gz’ saved [1260671455/1260671455]



In [5]:
%cd /content/nl
! tar xvzf nl.tar.gz

Streaming output truncated to the last 5000 lines.
cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_21105774.mp3
cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_21105775.mp3
cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_21105776.mp3
cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_21105777.mp3
cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_21105783.mp3
cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_21105784.mp3
cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_21105785.mp3
cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_21105786.mp3
cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_21105787.mp3
cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_21108271.mp3
cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_21108272.mp3
cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_21108273.mp3
cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_21108274.mp3
cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_21108275.mp3
cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_21135508.mp3
cv-corpus-5-2020-06-

In [6]:
! rm nl.tar.gz

*Install sox before running the CommonVoice Importer.*

In [7]:
! sudo apt-get install sox libsox-fmt-mp3

Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following package was automatically installed and is no longer required:
  libnvidia-common-440
Use 'sudo apt autoremove' to remove it.
The following additional packages will be installed:
  libid3tag0 libmad0 libmagic-mgc libmagic1 libopencore-amrnb0
  libopencore-amrwb0 libsox-fmt-alsa libsox-fmt-base libsox3
Suggested packages:
  file libsox-fmt-all
The following NEW packages will be installed:
  libid3tag0 libmad0 libmagic-mgc libmagic1 libopencore-amrnb0
  libopencore-amrwb0 libsox-fmt-alsa libsox-fmt-base libsox-fmt-mp3 libsox3
  sox
0 upgraded, 11 newly installed, 0 to remove and 33 not upgraded.
Need to get 872 kB of archives.
After this operation, 7,087 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libopencore-amrnb0 amd64 0.1.3-2.1 [92.0 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libopencore-amrwb0 a

To bring this data into a form that DeepSpeech understands, you have to run the CommonVoice v2.0 importer (bin/import_cv2.py):

In [8]:
%cd /content/DeepSpeech/
! bin/import_cv2.py ../nl/cv-corpus-5-2020-06-22/nl

/content/DeepSpeech
Loading TSV file:  /content/nl/cv-corpus-5-2020-06-22/nl/test.tsv
Importing mp3 files...
Progress |##################################################### |  98% completedImported 2196 samples.
Skipped 2 samples that were longer than 10 seconds.
Final amount of imported audio: 2:38:07 from 2:38:28.
Saving new DeepSpeech-formatted CSV file to:  /content/nl/cv-corpus-5-2020-06-22/nl/clips/test.csv
Writing CSV file for DeepSpeech.py as:  /content/nl/cv-corpus-5-2020-06-22/nl/clips/test.csv
Progress |######################################################| 100% completed
Loading TSV file:  /content/nl/cv-corpus-5-2020-06-22/nl/dev.tsv
Importing mp3 files...
Progress |##################################################### |  98% completedImported 2196 samples.
Skipped 1 samples that failed on transcript validation.
Skipped 1 samples that were longer than 10 seconds.
Final amount of imported audio: 2:33:50 from 2:34:07.
Saving new DeepSpeech-formatted CSV file to:  /content/n

Providing a filter alphabet is optional. It will exclude all samples whose transcripts contain characters not in the specified alphabet. Running the importer with -h will show you some additional options.

In [9]:
! python3 training/deepspeech_training/util/check_characters.py -alpha -unicode -csv ../nl/cv-corpus-5-2020-06-22/nl/clips/train.csv,../nl/cv-corpus-5-2020-06-22/nl/clips/dev.csv,../nl/cv-corpus-5-2020-06-22/nl/clips/test.csv >> /content/nl/alphabet.txt
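
*The idea behind the alphabet file can be sketched in a few lines: collect every character that occurs in the `transcript` column of the importer's CSV files, then use that set to decide whether a transcript can be represented. This is an illustration under my own assumptions, not the code of `check_characters.py`; the sample rows and both helper names are invented.*

```python
import csv
import io

# In-memory stand-in for a DeepSpeech-formatted CSV; the rows are invented.
SAMPLE_CSV = (
    "wav_filename,wav_filesize,transcript\n"
    "a.wav,1000,ik ben hier\n"
    "b.wav,2000,één café\n"
)

def collect_alphabet(csv_file):
    """Return the sorted list of unique characters used in all transcripts."""
    characters = set()
    for row in csv.DictReader(csv_file):
        characters.update(row["transcript"])
    return sorted(characters)

def is_in_alphabet(transcript, alphabet):
    """True when every character of `transcript` appears in `alphabet`."""
    return set(transcript) <= set(alphabet)

alphabet = collect_alphabet(io.StringIO(SAMPLE_CSV))
print(alphabet)
print(is_in_alphabet("ik ben", alphabet), is_in_alphabet("quiz", alphabet))
```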

### Create a Language Model / Scorer
*Taken from [DeepSpeech documentation - External scorer scripts](https://deepspeech.readthedocs.io/en/v0.7.4/Scorer.html)*

The LibriSpeech LM training text used by our scorer is around 4GB uncompressed, which should give an idea of the size of a corpus needed for a reasonable language model for general speech recognition. For more constrained use cases with smaller vocabularies, you don’t need as much data, but you should still try to gather as much as you can.

*We're using data from the [Europarl corpus](https://www.statmt.org/europarl/) for our Dutch scorer. This was the largest Dutch language dataset already formatted that I could find.*

In [10]:
%cd /content/DeepSpeech/data/lm
! wget https://www.statmt.org/europarl/v7/nl-en.tgz
! tar -xzvf nl-en.tgz

/content/DeepSpeech/data/lm
--2020-07-12 02:39:19--  https://www.statmt.org/europarl/v7/nl-en.tgz
Resolving www.statmt.org (www.statmt.org)... 129.215.197.184
Connecting to www.statmt.org (www.statmt.org)|129.215.197.184|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 199077856 (190M) [application/x-gzip]
Saving to: ‘nl-en.tgz’


2020-07-12 02:40:05 (4.18 MB/s) - ‘nl-en.tgz’ saved [199077856/199077856]

europarl-v7.nl-en.en
europarl-v7.nl-en.nl


Build [KenLM](https://github.com/kpu/kenlm) first.

In [11]:
%cd /content
!wget -O - https://kheafield.com/code/kenlm.tar.gz |tar xz
%mkdir -p /content/kenlm/build


/content
--2020-07-12 02:40:12--  https://kheafield.com/code/kenlm.tar.gz
Resolving kheafield.com (kheafield.com)... 35.196.63.85
Connecting to kheafield.com (kheafield.com)|35.196.63.85|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 490441 (479K) [application/x-gzip]
Saving to: ‘STDOUT’


2020-07-12 02:40:12 (16.4 MB/s) - written to stdout [490441/490441]



In [12]:
%cd /content/kenlm/build
! cmake ..

/content/kenlm/build
-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Boost version: 1.65.1
-- Found the follow

In [13]:
! make -j 4

Scanning dependencies of target kenlm_filter
Scanning dependencies of target kenlm_util
[  1%] Building CXX object lm/filter/CMakeFiles/kenlm_filter.dir/arpa_io.cc.o
[  2%] Building CXX object lm/filter/CMakeFiles/kenlm_filter.dir/vocab.cc.o
[  3%] Building CXX object lm/filter/CMakeFiles/kenlm_filter.dir/phrase.cc.o
[  5%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/bignum-dtoa.cc.o
[  6%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/bignum.cc.o
[  7%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/cached-powers.cc.o
[  8%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/diy-fp.cc.o
[ 10%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/double-conversion.cc.o
[ 11%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/fast-dtoa.cc.o
[ 12%] Building

Then use the `generate_lm.py` script to generate `lm.binary` and `vocab-500000.txt`.

As input you can use a plain text (e.g. file.txt) or gzipped (e.g. file.txt.gz) text file with one sentence in each line.

Pass the KenLM build directory to the script.

In [14]:
%cd /content/DeepSpeech/data/lm
! python3 generate_lm.py --input_txt europarl-v7.nl-en.nl --output_dir . \
  --top_k 500000 --kenlm_bins /content/kenlm/build/bin/ \
  --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" \
  --binary_a_bits 255 --binary_q_bits 8 --binary_type trie

/content/DeepSpeech/data/lm

Converting to lowercase and counting word occurrences ...
| |                       #                      | 1997774 Elapsed Time: 0:01:02

Saving top 500000 words ...

Calculating word statistics ...
  Your text file has 50632932 words in total
  It has 496345 unique words
  Your top-500000 words are 100.0000 percent of all words
  Your most common word "de" occurred 3846593 times
  The least common word in your top-k is "nee"," with 1 times
  The first word with 2 occurrences is "tadzjikische" at place 224563

Creating ARPA file ...
=== 1/5 Counting and sorting n-grams ===
Reading /content/DeepSpeech/data/lm/lower.txt.gz
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
tcmalloc: large alloc 2043600896 bytes == 0x55aef9778000 @  0x7f5d4b8a91e7 0x55aef7f87772 0x55aef7f1b358 0x55aef7efa290 0x55aef7ee6096 0x7f5d49a42b97 0x55aef7ee7ada
tcmalloc: large alloc 9536774144 bytes == 0x55af73466000 @  0x7f5d4b8a91e7
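
*The first lines of the output above (lowercasing, counting word occurrences, saving the top-k words and reporting their coverage) can be mimicked on a toy corpus. This is a rough sketch of that one step only, not the code of `generate_lm.py`; `top_k_vocabulary` and the sample sentences are my own.*

```python
from collections import Counter

def top_k_vocabulary(lines, top_k):
    """Lowercase the text, count words, and return (top-k words, coverage).

    Coverage is the fraction of all word occurrences that the top-k words
    account for.
    """
    counter = Counter()
    for line in lines:
        counter.update(line.lower().split())
    top_words = [word for word, _ in counter.most_common(top_k)]
    total = sum(counter.values())
    covered = sum(counter[word] for word in top_words)
    coverage = covered / total if total else 0.0
    return top_words, coverage

# One sentence per line, as the script expects; the sentences are invented.
sample_lines = ["De commissie stemt", "de Raad stemt niet", "De voorzitter"]
vocab, coverage = top_k_vocabulary(sample_lines, top_k=2)
print(vocab, f"{coverage:.4f}")
```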

Afterwards you can use `generate_package.py` to generate the scorer package using the `lm.binary` and `vocab-500000.txt` files:

In [15]:
%cd /content/DeepSpeech/data/lm/
! python3 ./generate_package.py --alphabet /content/nl/alphabet.txt --lm lm.binary --vocab vocab-500000.txt \
  --package kenlm.scorer --default_alpha 0.931289039105002 --default_beta 1.1834137581510284

/content/DeepSpeech/data/lm
496345 unique words read from vocabulary file.
Doesn't look like a character based model.
Using detected UTF-8 mode: False
Package created in kenlm.scorer


Note that we have a `lm_optimizer.py` script which can be used to find good default values for alpha and beta. To use it, you must first generate a package with any value set for default alpha and beta flags. For this step, it doesn’t matter what values you use, as they’ll be overridden by `lm_optimizer.py`. Then, use `lm_optimizer.py` with this scorer file to find good alpha and beta values. Finally, use `generate_package.py` again, this time with the new values.
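
*For intuition about what alpha and beta control: the decoder's score for a candidate transcript is commonly described as the acoustic log-probability, plus alpha times the language model log-probability, plus beta for every word. The sketch below applies that formula to two candidates. The probabilities are invented and `combined_score` is a hypothetical helper, so treat it as an illustration of the weighting rather than the decoder's actual code.*

```python
import math

def combined_score(acoustic_logp, lm_logp, word_count, alpha, beta):
    """Acoustic log-prob + alpha * LM log-prob + beta * number of words."""
    return acoustic_logp + alpha * lm_logp + beta * word_count

# candidate -> (acoustic log-prob, LM log-prob, word count); numbers are invented.
candidates = {
    "ik ben tegen de doodstraf": (math.log(0.20), math.log(0.010), 5),
    "ik ben tegen de dood straf": (math.log(0.25), math.log(0.0001), 6),
}

def best_candidate(alpha, beta):
    """Return the candidate with the highest combined score."""
    return max(
        candidates,
        key=lambda text: combined_score(*candidates[text], alpha, beta),
    )

# Without the language model, the acoustically stronger candidate wins.
print(best_candidate(alpha=0.0, beta=0.0))
# With the default values used in this notebook, the LM tips the balance.
print(best_candidate(alpha=0.931289039105002, beta=1.1834137581510284))
```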

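*// NOTE: I couldn't get `lm_optimizer.py` to run, but the scorer was good enough to move on to the next step. The log below reports that no checkpoint could be found, which is likely the cause, since no model has been trained at this point.*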

In [16]:
%cd /content/DeepSpeech/data/lm/
! python3 ../../lm_optimizer.py \
  --test_files /content/nl/clips/validated.csv \
  --checkpoint_dir /content/checkpoint-lm \
  --load_evaluate init \
  --scorer kenlm.scorer \
  --alphabet_config_path /content/nl/alphabet.txt

/content/DeepSpeech/data/lm
I0712 02:45:41.706854 139992918103936 utils.py:141] NumExpr defaulting to 2 threads.
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
E All initialization methods failed (['best', 'last']).

### Training a Model
*Taken from [DeepSpeech documentation - Training a Model](https://deepspeech.readthedocs.io/en/v0.7.4/TRAINING.html#training-a-model)*

The central (Python) script is DeepSpeech.py in the project’s root directory. For its list of command line options, you can call:

```python3 DeepSpeech.py --helpfull```

In [17]:
%cd /content/DeepSpeech/
! python3 DeepSpeech.py \
  --train_files /content/nl/cv-corpus-5-2020-06-22/nl/clips/train.csv \
  --dev_files /content/nl/cv-corpus-5-2020-06-22/nl/clips/dev.csv \
  --test_files /content/nl/cv-corpus-5-2020-06-22/nl/clips/test.csv \
  --train_batch_size 1 \
  --test_batch_size 1 \
  --n_hidden 100 \
  --epochs 100 \
  --checkpoint_dir ../checkpoint \
  --export_dir ../model \
  --alphabet_config_path ../nl/alphabet.txt \
  --scorer data/lm/kenlm.scorer

/content/DeepSpeech
I0712 02:45:47.845843 139746079238016 utils.py:141] NumExpr defaulting to 2 threads.
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:02:38 | Steps: 2528 | Loss: 144.328282   
Epoch 0 | Validation | Elapsed Time: 0:01:01 | Steps: 2196 | Loss: 122.380548 | Dataset: /content/nl/cv-corpus-5-2020-06-22/nl/clips/dev.csv
I Saved new best validating model with loss 122.380548 to: ../checkpoint/best_dev-2528
--------------------------------------------------------------------------------
Epoch 1 |   Training | Elapsed Time: 0:02:35 | Steps: 2528 | Loss: 116.832910   
Epoch 1 | Validation | Elapsed Time: 0:01:47 | Steps: 2196 | Loss: 107.090517 | Dataset: /content/nl/cv-corpus-5-2020-06-22/nl/clips/dev.csv
I Saved new best validating model with loss 107.090517 to: ../checkpoint/best_dev-5056
--------------------------------------------------------
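
*With 100 epochs the log gets long, so it can help to pull the per-epoch losses out of it. Here is a small sketch that parses lines in the format shown above. The regular expression is my own guess at a robust pattern, `parse_losses` is a hypothetical helper, and the dataset path in the sample is shortened.*

```python
import re

# Matches e.g. "Epoch 0 |   Training | Elapsed Time: 0:02:38 | Steps: 2528 | Loss: 144.328282"
EPOCH_LINE = re.compile(
    r"Epoch (?P<epoch>\d+) \|\s+(?P<phase>Training|Validation) \|"
    r".*?Loss: (?P<loss>\d+\.\d+)"
)

def parse_losses(log_text):
    """Return {epoch: {"training": loss, "validation": loss}} from a training log."""
    losses = {}
    for match in EPOCH_LINE.finditer(log_text):
        epoch = int(match["epoch"])
        losses.setdefault(epoch, {})[match["phase"].lower()] = float(match["loss"])
    return losses

SAMPLE_LOG = """\
Epoch 0 |   Training | Elapsed Time: 0:02:38 | Steps: 2528 | Loss: 144.328282
Epoch 0 | Validation | Elapsed Time: 0:01:01 | Steps: 2196 | Loss: 122.380548 | Dataset: dev.csv
Epoch 1 |   Training | Elapsed Time: 0:02:35 | Steps: 2528 | Loss: 116.832910
Epoch 1 | Validation | Elapsed Time: 0:01:47 | Steps: 2196 | Loss: 107.090517 | Dataset: dev.csv
"""

losses = parse_losses(SAMPLE_LOG)
for epoch, values in sorted(losses.items()):
    print(epoch, values["training"], values["validation"])
```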

#### Making a mmap-able model for inference

The `output_graph.pb` model file generated in the above step will be loaded in memory to be dealt with when running inference. This will result in extra loading time and memory consumption. One way to avoid this is to directly read data from the disk.

TensorFlow has tooling to achieve this: it requires building the target `//tensorflow/contrib/util:convert_graphdef_memmapped_format`. Binaries are produced by our TaskCluster for some systems, including Linux/amd64 and macOS/amd64; use the `util/taskcluster.py` tool to download one:

In [18]:
%cd /content/DeepSpeech/
! python3 util/taskcluster.py --source tensorflow --artifact convert_graphdef_memmapped_format --branch r1.15 --target .

/content/DeepSpeech
Downloading https://community-tc.services.mozilla.com/api/index/v1/task/project.deepspeech.tensorflow.pip.r1.15.cpu/artifacts/public/convert_graphdef_memmapped_format ...
Downloading: 100%



Producing a mmap-able model is as simple as:

In [19]:
! ./convert_graphdef_memmapped_format --in_graph=/content/model/output_graph.pb --out_graph=/content/model/output_graph.pbmm

2020-07-12 10:07:26.357845: I tensorflow/contrib/util/convert_graphdef_memmapped_format_lib.cc:171] Converted 6 nodes


Upon a successful run, it should report the conversion of a non-zero number of nodes. If it reports converting `0` nodes, something is wrong: make sure your model is a frozen one, and that you have not applied any incompatible changes (this includes `quantize_weights`).
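
*To illustrate the general idea of reading data straight from disk, here is a sketch using Python's standard `mmap` module. It has nothing to do with the `.pbmm` format itself: the temporary file and its contents are invented, and `read_slice_mmap` is a hypothetical helper. The point is only that a slice can be read without first loading the whole file into memory.*

```python
import mmap
import os
import tempfile

def read_slice_mmap(path, offset, length):
    """Read `length` bytes at `offset` through a read-only memory map."""
    with open(path, "rb") as handle:
        # Length 0 maps the whole file; pages are loaded lazily on access.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]

# Build a small stand-in file: a header, some padding, then a payload.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"header" + b"\x00" * 4096 + b"weights")
    demo_path = tmp.name

tail = read_slice_mmap(demo_path, offset=6 + 4096, length=7)
os.remove(demo_path)
print(tail)
```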

## Using a Trained Model
*Taken from [DeepSpeech documentation - Using a Pre-trained Model](https://deepspeech.readthedocs.io/en/v0.7.4/USING.html#usage-docs)*

*Now that we've trained a model, let's use it!*

### Installing DeepSpeech Python bindings

Once your environment has been set up and loaded, you can use `pip3` to manage packages locally. On a fresh setup of the `virtualenv`, you will have to install the DeepSpeech wheel. You can check if `deepspeech` is already installed with `pip3 list`.

To perform the installation, just use `pip3` as such:

In [20]:
! pip3 install deepspeech-gpu

Collecting deepspeech-gpu
  Downloading deepspeech_gpu-0.7.4-cp36-cp36m-manylinux1_x86_64.whl (19.2 MB)
     |████████████████████████████████| 19.2 MB 5.3 kB/s 
Installing collected packages: deepspeech-gpu
Successfully installed deepspeech-gpu-0.7.4


*Using the model and scorer we created on one of the Common Voice test files: "ik ben tegen de doodstraf" ("I am against the death penalty").*

In [21]:
! deepspeech --model /content/model/output_graph.pbmm --audio /content/nl/cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_19505086.wav --scorer /content/DeepSpeech/data/lm/kenlm.scorer

2020-07-12 10:07:34.401550: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
Loading model from file /content/model/output_graph.pbmm
TensorFlow: v1.15.0-24-gceb46aa
DeepSpeech: v0.7.4-0-gfcd9563
2020-07-12 10:07:34.550456: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-12 10:07:34.563643: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-12 10:07:34.578508: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-12 10:07:34.579119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID

*Let's try the model on a new audio file we haven't used yet! Here's a YouTube clip saying 'Stroopwafel'.*

In [22]:
%cd /content
! pip install youtube-dl
yt="https://www.youtube.com/watch?v=oXcvvuO1C7c"
! youtube-dl --extract-audio --audio-format wav {yt}

/content
Collecting youtube-dl
  Downloading youtube_dl-2020.6.16.1-py2.py3-none-any.whl (1.8 MB)
     |████████████████████████████████| 1.8 MB 2.8 MB/s 
Installing collected packages: youtube-dl
Successfully installed youtube-dl-2020.6.16.1
[youtube] oXcvvuO1C7c: Downloading webpage
[youtube] oXcvvuO1C7c: Downloading MPD manifest
[dashsegments] Total fragments: 2
[download] Destination: How to pronounce the word stroopwafel - pronunciation_prononciation_pronunciación-oXcvvuO1C7c.m4a
[download] 100% of 80.66KiB in 00:00
[ffmpeg] Correcting container in "How to pronounce the word stroopwafel - pronunciation_prononciation_pronunciación-oXcvvuO1C7c.m4a"
[ffmpeg] Destination: How to pronounce the word stroopwafel - pronunciation_prononciation_pronunciación-oXcvvuO1C7c.wav
Deleting original file How to pronounce the word stroopwafel - pronunciation_prononciation_pronunciación-oXcvvuO1C7c.m4a (pass -k to keep)


In [23]:
from IPython.display import Audio
Audio("/content/How to pronounce the word stroopwafel - pronunciation_prononciation_pronunciación-oXcvvuO1C7c.wav")
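
*Before transcribing audio from an arbitrary source, it can be worth checking its format. The sketch below uses the standard `wave` module and assumes the model was trained with DeepSpeech's default of 16 kHz, mono, 16-bit audio; `describe_wav` is a hypothetical helper, and the one-second silent clip is generated in memory just for the demo.*

```python
import io
import wave

def describe_wav(source, expected_rate=16000):
    """Return basic format information for a WAV file path or file object."""
    with wave.open(source, "rb") as wav:
        info = {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }
    # Assumed target format: mono, 16-bit samples, at the expected sample rate.
    info["matches_expected_format"] = (
        info["channels"] == 1
        and info["sample_width_bytes"] == 2
        and info["sample_rate"] == expected_rate
    )
    return info

# Write one second of 16 kHz mono 16-bit silence into an in-memory buffer.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)
buffer.seek(0)

info = describe_wav(buffer)
print(info)
```

*For the downloaded clip, pass the path of the `.wav` file instead of the buffer.*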

In [24]:
! deepspeech --model /content/model/output_graph.pbmm --audio /content/How\ to\ pronounce\ the\ word\ stroopwafel\ -\ pronunciation_prononciation_pronunciación-oXcvvuO1C7c.wav --scorer /content/DeepSpeech/data/lm/kenlm.scorer

2020-07-12 10:09:06.472385: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
Loading model from file /content/model/output_graph.pbmm
TensorFlow: v1.15.0-24-gceb46aa
DeepSpeech: v0.7.4-0-gfcd9563
2020-07-12 10:09:06.622512: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-12 10:09:06.623612: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-07-12 10:09:06.637065: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-07-12 10:09:06.637719: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID

*Not the most accurate model, but this is an example of what you can build using DeepSpeech, Common Voice, the Europarl corpus, and the free GPUs and space available on Google Colab.*

In [25]:
!zip -r /content/DeepSpeech.zip /content/DeepSpeech
!zip -r /content/checkpoint.zip /content/checkpoint
!zip -r /content/kenlm.zip /content/kenlm
!zip -r /content/model.zip /content/model
!zip -r /content/nl.zip /content/nl

!cp /content/*.zip /content/drive/My\ Drive/

Streaming output truncated to the last 5000 lines.
  adding: content/nl/cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_19723345.mp3 (deflated 3%)
  adding: content/nl/cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_17694612.mp3 (deflated 3%)
  adding: content/nl/cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_22451980.mp3 (deflated 3%)
  adding: content/nl/cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_17797301.wav (deflated 15%)
  adding: content/nl/cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_21689859.mp3 (deflated 3%)
  adding: content/nl/cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_18883919.mp3 (deflated 4%)
  adding: content/nl/cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_20538218.wav (deflated 22%)
  adding: content/nl/cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_17702395.wav (deflated 36%)
  adding: content/nl/cv-corpus-5-2020-06-22/nl/clips/common_voice_nl_20223929.mp3 (deflated 3%)
  adding: content/nl/cv-corpus-5-2020-06-22/nl/clips/common_voice_nl
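
*The same backup can be sketched in Python with `shutil.make_archive`, which avoids the long `zip` listing. Note that the `cp` into `/content/drive/My Drive/` presumes Google Drive has already been mounted in the Colab session, a step this notebook does not show. `archive_directories` is a hypothetical helper, and the demo below runs on a throwaway temporary directory rather than the real `/content` folders.*

```python
import os
import shutil
import tempfile

def archive_directories(directories, destination):
    """Zip each existing directory into `destination`; return the archive paths."""
    os.makedirs(destination, exist_ok=True)
    archives = []
    for directory in directories:
        directory = os.path.normpath(directory)
        if not os.path.isdir(directory):
            continue  # skip folders that were never created
        name = os.path.basename(directory)
        archive = shutil.make_archive(
            os.path.join(destination, name),  # ".zip" is appended automatically
            "zip",
            root_dir=os.path.dirname(directory) or ".",
            base_dir=name,
        )
        archives.append(archive)
    return archives

# Demo with stand-in folders; in the notebook these would be under /content.
with tempfile.TemporaryDirectory() as work:
    model_dir = os.path.join(work, "model")
    os.makedirs(model_dir)
    with open(os.path.join(model_dir, "output_graph.pbmm"), "wb") as handle:
        handle.write(b"demo")
    created = archive_directories(
        [model_dir, os.path.join(work, "missing")],
        os.path.join(work, "backup"),
    )
    created_names = [os.path.basename(path) for path in created]

print(created_names)
```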