# Building a Coqui Speech to Text Client

The Coqui project uses Bazel as its main build system, but it's a complex and daunting tool for most users. Happily, the team also provide binary releases of all the libraries, so this notebook shows how you can use the existing makefile to build the example `stt` client executable from the binary release alone, without Bazel.
The goal of this notebook is to demonstrate how easily you can integrate speech to text into your own programs.

## Download Binary Release

First we download the example executable, `stt`, and the shared library, `libstt.so`, which contains the compiled framework code. Both are part of the `native_client` package.

In [None]:
!mkdir -p stt_binary_release
!wget --quiet https://github.com/coqui-ai/STT/releases/download/v1.1.0/native_client.tflite.Linux.tar.xz
!unxz native_client.tflite.Linux.tar.xz
!tar -xf native_client.tflite.Linux.tar --directory stt_binary_release
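
Before moving on, it can be worth confirming that the archive unpacked as expected. The following is a minimal Python sketch rather than part of the official workflow: `missing_release_files` is a hypothetical helper, and its default file names are simply the two files mentioned above.

```python
from pathlib import Path


def missing_release_files(release_dir, expected=("stt", "libstt.so")):
    """Return the expected file names that are absent from release_dir."""
    root = Path(release_dir)
    return [name for name in expected if not (root / name).is_file()]


# Hypothetical usage in the notebook, after the download cell has run:
missing = missing_release_files("stt_binary_release")
if missing:
    print("Missing from stt_binary_release:", ", ".join(missing))
else:
    print("Binary release looks complete.")
```

Pass a different `expected` tuple if you also want to check for the other shared libraries that the build links against.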

## Download Source Code

Here we download the full source code package for STT, although we'll only use a few files inside the `native_client` folder. I chose this route rather than forking the repo and cherry-picking just the files we need because it should be easier to keep in sync with code updates from Coqui.

In [None]:
!wget --quiet https://github.com/coqui-ai/STT/archive/refs/tags/v1.1.0.tar.gz
!tar -xzf v1.1.0.tar.gz
!mv STT-1.1.0 stt_source_code

## Audio Example Files

To demonstrate how the speech to text tool works, we need some WAV files to try it out on. Luckily Coqui provide some examples, together with transcripts of the expected output.

In [None]:
!wget --quiet https://github.com/coqui-ai/STT/releases/download/v1.1.0/audio-1.1.0.tar.gz
!tar -xzf audio-1.1.0.tar.gz

tar: Ignoring unknown extended header keyword 'LIBARCHIVE.xattr.com.apple.quarantine'
tar: Ignoring unknown extended header keyword 'LIBARCHIVE.xattr.com.apple.macl'


## Recognition Model Download

For this example I've chosen the English large vocabulary model, but there are over 80 different versions available for many languages at [coqui.ai/models](https://coqui.ai/models). Note that this is the *recognition* model, not the *language* model. Language models are used to post-process the results of the neural network, and are optional. To keep things simple, in this example we're just using the raw recognition model output, but there are lots of options to improve the quality for a particular application if you investigate things like language models and hotwords.

In [None]:
!wget --quiet https://github.com/coqui-ai/STT-models/releases/download/english/coqui/v1.0.0-large-vocab/model.tflite
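
If you later want to experiment with a language model or hotwords, they are passed to the client as extra command-line arguments. The sketch below assembles such a command in Python. Only `--model` and `--audio` are used elsewhere in this notebook; the `--scorer` and `--hot_words` flag names, and the `word:boost` format, are assumptions that should be confirmed against `stt --help` for your release. `build_stt_command` is a hypothetical helper.

```python
def build_stt_command(binary, model, audio, scorer=None, hot_words=None):
    """Assemble the argument list for the stt client.

    scorer and hot_words are optional. The flag names used for them are
    assumptions to be checked against `stt --help`.
    """
    command = [binary, "--model", model, "--audio", audio]
    if scorer is not None:
        command += ["--scorer", scorer]
    if hot_words:
        # Assumed format: comma-separated word:boost pairs.
        pairs = ",".join(f"{word}:{boost}" for word, boost in hot_words.items())
        command += ["--hot_words", pairs]
    return command


# With no optional arguments, this is just the plain recognition command.
print(build_stt_command("stt", "model.tflite", "audio.wav"))
```

Returning a list rather than a single string means the result can be handed directly to `subprocess.run` without worrying about shell quoting.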

## SoX Library Installation

The example client program uses the SoX audio library to read WAV files and convert them to the sample rate the model expects. Since a Colab instance doesn't have SoX installed by default, we'll use `apt-get` to install it.

In [None]:
!apt-get -qq install -y libsox-dev

Selecting previously unselected package libopencore-amrnb0:amd64.
(Reading database ... 155632 files and directories currently installed.)
Preparing to unpack .../00-libopencore-amrnb0_0.1.3-2.1_amd64.deb ...
Unpacking libopencore-amrnb0:amd64 (0.1.3-2.1) ...
Selecting previously unselected package libopencore-amrwb0:amd64.
Preparing to unpack .../01-libopencore-amrwb0_0.1.3-2.1_amd64.deb ...
Unpacking libopencore-amrwb0:amd64 (0.1.3-2.1) ...
Selecting previously unselected package libmagic-mgc.
Preparing to unpack .../02-libmagic-mgc_1%3a5.32-2ubuntu0.4_amd64.deb ...
Unpacking libmagic-mgc (1:5.32-2ubuntu0.4) ...
Selecting previously unselected package libmagic1:amd64.
Preparing to unpack .../03-libmagic1_1%3a5.32-2ubuntu0.4_amd64.deb ...
Unpacking libmagic1:amd64 (1:5.32-2ubuntu0.4) ...
Selecting previously unselected package libao-common.
Preparing to unpack .../04-libao-common_1.2.2+20180113-1ubuntu1_all.deb ...
Unpacking libao-common (1.2.2+20180113-1ubuntu1) ...
Selecting previou
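
To see whether a particular recording will actually need converting, you can inspect its header before handing it to the client. This is a minimal sketch using Python's standard `wave` module, which only handles uncompressed PCM WAV files. `describe_wav` is a hypothetical helper, and the 16 kHz default for `expected_rate` is an assumption, so substitute whatever rate your model expects.

```python
import wave


def describe_wav(path, expected_rate=16000):
    """Report basic WAV properties and whether resampling would be needed.

    expected_rate is an assumed default; use the rate your model expects.
    """
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav_file.getnchannels(),
            "bits_per_sample": wav_file.getsampwidth() * 8,
            "needs_resampling": rate != expected_rate,
        }


# Hypothetical usage, with a placeholder file name:
# print(describe_wav("my_recording.wav"))
```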

## Compiling the STT Client

Now that we have all the dependencies, we can invoke `make` to build the `stt` client program. Because the makefile expects to be run from inside the `native_client` folder, we use the `-C` option to direct `make` to the right path. The only change from the standard build is overriding `LINK_PATH_STT`, the expected location of the libraries, so that it points to the folder containing the downloaded binary release rather than the default location inside a Bazel build folder.

With v1.1.0 you'll see some log messages like `objdump: '/usr/lib/x86_64-linux-gnu/libmagic.so': No such file`, but these appear to be harmless warnings rather than true errors. This notebook's not official documentation, just an accompaniment to the blog post at [petewarden.com/2021/12/27/how-to-get-started-with-coquis-open-source-on-device-speech-to-text-tool/](https://petewarden.com/2021/12/27/how-to-get-started-with-coquis-open-source-on-device-speech-to-text-tool/).


In [None]:
!make -C stt_source_code/native_client stt LINK_PATH_STT=-L../../stt_binary_release

make: Entering directory '/content/stt_source_code/native_client'
objdump: '/usr/lib/x86_64-linux-gnu/libmagic.so': No such file
objdump: '/usr/lib/x86_64-linux-gnu/libmagic.so': No such file
c++   -std=c++11 -o stt -I/content/stt_source_code/sox-build/include client.cc  -Wl,--no-as-needed -Wl,-rpath,\$ORIGIN -L../../stt_binary_release  -lstt -lkenlm -ltflitedelegates -ltensorflowlite  -L/content/stt_source_code/sox-build/lib -lsox
make: Leaving directory '/content/stt_source_code/native_client'
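
One detail that is easy to trip over: because `-C` makes `make` change into `native_client` before doing anything else, the `-L` path has to be written relative to that folder, which is why the command uses `../../stt_binary_release`. If your folders are laid out differently, the relative path can be computed instead of worked out by hand. This is a small sketch, with `make_command` as a hypothetical helper:

```python
import os


def make_command(source_dir, release_dir, target="stt"):
    """Build the make invocation, with the library path relative to the makefile."""
    native_client = os.path.join(source_dir, "native_client")
    # make -C changes into native_client first, so the -L path must be
    # expressed relative to that folder rather than to the notebook's cwd.
    link_path = os.path.relpath(release_dir, start=native_client)
    return ["make", "-C", native_client, target, f"LINK_PATH_STT=-L{link_path}"]


print(" ".join(make_command("stt_source_code", "stt_binary_release")))
```

On a POSIX system, with the folder names used in this notebook, this reproduces the command in the cell above.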


## Running the Tool

We should now have the `stt` example program built, so let's run it! To keep this notebook self-contained, I've avoided installing the shared libraries from the binary release into system paths, so we have to tell the shell where to find our downloaded versions using the `LD_LIBRARY_PATH` environment variable. In a typical deployment, these would have been copied to somewhere like `/usr/lib` during installation, so the variable wouldn't be needed.

If everything has worked, you should see some version information followed by a transcript of the audio file, "why should one halt on the way". Now you can start modifying the `stt_source_code/native_client/client.cc` source file to build your own program using speech to text. There's a GitHub repo showing a minimal set of files at [github.com/petewarden/stt_standalone_client](https://github.com/petewarden/stt_standalone_client).

In [None]:
!LD_LIBRARY_PATH=stt_binary_release stt_source_code/native_client/stt --model ./model.tflite --audio ./audio/Recording_1.wav

TensorFlow: v2.3.0-14-g4bdd3955115
 Coqui STT: v1.1.0-0-gf3605e23
if everything works successfully you should see some vorshan in formation followed by a transcripe of dhogo pire why should one halt on doiy now your can start more difying the source wile the wild or own progrem using thes spitch to texts
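
If you'd rather drive the client from Python than from a shell cell, the same `LD_LIBRARY_PATH` trick applies: pass a modified environment to the child process. The following is a minimal sketch, not an official API. `library_env` and `transcribe` are hypothetical helpers, and `transcribe` returns whatever the client writes to standard output.

```python
import os
import subprocess


def library_env(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + existing if existing else "")
    return env


def transcribe(binary, lib_dir, model, audio):
    """Run the stt client and return its standard output as text."""
    result = subprocess.run(
        [binary, "--model", model, "--audio", audio],
        env=library_env(lib_dir),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the client exits non-zero
    )
    return result.stdout.strip()


# Usage with the paths from this notebook:
# print(transcribe("stt_source_code/native_client/stt", "stt_binary_release",
#                  "./model.tflite", "./audio/Recording_1.wav"))
```

Copying the environment rather than modifying `os.environ` keeps the change local to the child process, which mirrors what the one-off `LD_LIBRARY_PATH=...` prefix does in the shell.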
