<img src="http://developer.download.nvidia.com/notebooks/dlsw-notebooks/riva_asr_asr-python-advanced-finetune-am-citrinet-tao-deployment/nvidia_logo.png" style="width: 90px; float: right;">

# How to deploy a Riva Speech Recognition Pipeline
In this tutorial, you will learn how to deploy Riva speech recognition models, specifically the pre-trained **Acoustic model (Citrinet)**, **Language model (n-gram)**, and **Inverse Text Normalization (WFST)** models downloaded from NVIDIA NGC.

This will serve as a primer for customization tutorials in this lab, which may require configuring the Riva speech pipeline.

---
## Prerequisites

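Before we get started, ensure you have access to [**NVIDIA NGC**](https://ngc.nvidia.com/signin). The cells below also call the NGC CLI (`ngc`) and Docker, so both need to be installed and configured on this machine.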


---
## Fetch ASR models from NGC
### Download CitriNet Acoustic Model

The CitriNet Acoustic Model is located on NGC [here](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/speechtotext_en_us_citrinet/files?version=deployable_v3.0). Let's download it to a local path.

In [None]:
# Imports
import os

# Create a local directory to save models
ASR_MODEL_DIR = os.path.join(os.getcwd(), "asr-models")
!mkdir -p $ASR_MODEL_DIR

In [None]:
!ngc registry model download-version "nvidia/tao/speechtotext_en_us_citrinet:deployable_v3.0" --dest $ASR_MODEL_DIR

In [None]:
# Inspect downloaded files
AM_PATH = os.path.join(ASR_MODEL_DIR, "speechtotext_en_us_citrinet_vdeployable_v3.0")
!ls $AM_PATH

### Download the n-gram Language Model

The n-gram LM is located on NGC [here](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/speechtotext_en_us_lm/files?version=deployable_v1.1).

`NOTE:` This may take a couple of minutes to download.

In [None]:
!ngc registry model download-version "nvidia/tao/speechtotext_en_us_lm:deployable_v1.1" --dest $ASR_MODEL_DIR

In [None]:
# Inspect downloaded files
LM_PATH = os.path.join(ASR_MODEL_DIR, "speechtotext_en_us_lm_vdeployable_v1.1")
!ls $LM_PATH

### Download Inverse Text Normalization (ITN) Model

The ITN model is located on NGC [here](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/inverse_normalization_en_us/files?version=deployable_v1.0).

In [None]:
!ngc registry model download-version "nvidia/tao/inverse_normalization_en_us:deployable_v1.0" --dest $ASR_MODEL_DIR

In [None]:
# Inspect downloaded files
ITN_PATH = os.path.join(ASR_MODEL_DIR, "inverse_normalization_en_us_vdeployable_v1.0")
!ls $ITN_PATH
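
Before moving on, it can help to confirm that every file the `riva-build` step reads is actually on disk. The following is a minimal sketch, not part of the Riva tooling: `EXPECTED_ARTIFACTS` and `find_missing_artifacts` are hypothetical names introduced here, and the relative paths are the ones passed to `riva-build` later in this tutorial.

```python
import os

# Files that the riva-build cell later in this tutorial reads,
# relative to ASR_MODEL_DIR. (Hypothetical constant, for illustration.)
EXPECTED_ARTIFACTS = [
    "speechtotext_en_us_citrinet_vdeployable_v3.0/citrinet-1024-Jarvis-asrset-3_0-encrypted.riva",
    "speechtotext_en_us_lm_vdeployable_v1.1/riva_asr_train_datasets_3gram.binary",
    "speechtotext_en_us_lm_vdeployable_v1.1/flashlight_decoder_vocab.txt",
    "inverse_normalization_en_us_vdeployable_v1.0/tokenize_and_classify.far",
    "inverse_normalization_en_us_vdeployable_v1.0/verbalize.far",
]


def find_missing_artifacts(model_dir, relative_paths=EXPECTED_ARTIFACTS):
    """Return the relative paths that do not exist as files under model_dir."""
    return [
        rel for rel in relative_paths
        if not os.path.isfile(os.path.join(model_dir, rel))
    ]
```

Calling `find_missing_artifacts(ASR_MODEL_DIR)` should return an empty list once all three downloads have completed; any path it returns points at a download to repeat.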

---
## Riva ServiceMaker
Riva ServiceMaker is a set of tools that aggregates all the necessary artifacts (models, files, configurations, and user settings) for Riva deployment to a target environment. It has two main components: `riva-build` and `riva-deploy`.

### Riva-build

This step helps build a Riva-ready version of the model. Its only output is an intermediate format (called an RMIR) of an end-to-end pipeline for the supported services within Riva. <br>

`riva-build` is responsible for the combination of one or more exported models (`.riva` files) into a single file containing an intermediate format called Riva Model Intermediate Representation (`.rmir`). This file contains a deployment-agnostic specification of the whole end-to-end pipeline along with all the assets required for the final deployment and inference. For more information, refer to the [documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-customizing.html#pipeline-configuration).

In [None]:
# ServiceMaker Docker
RIVA_SM_CONTAINER = "nvcr.io/nvidia/riva/riva-speech:2.2.1-servicemaker"

# Directory where the Acoustic .riva model is stored $MODEL_LOC/*.riva
MODEL_LOC = AM_PATH

# Name of the .riva file
MODEL_NAME = "citrinet-1024-Jarvis-asrset-3_0-encrypted.riva"

# Key the model was encrypted with when it was exported with TAO
KEY = "tlt_encode"

# Get the ServiceMaker docker
! docker pull $RIVA_SM_CONTAINER

Below, we run `riva-build` to create a pipeline configured for offline recognition. The same command appears, for reference, in the [pipeline configuration](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-customizing.html#pipeline-configuration) section of the docs. <br>
The parameters to `riva-build` are described in the [riva-build optional parameters](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-custom.html?highlight=riva%20build#riva-build-optional-parameters) documentation and are also listed by the `riva-build speech_recognition -h` command.

In [None]:
# We use the Riva ServiceMaker Docker container to run riva-build.
! docker run --rm --gpus 0 -v $ASR_MODEL_DIR:/data $RIVA_SM_CONTAINER -- \
            riva-build speech_recognition /data/asr.rmir:$KEY /data/speechtotext_en_us_citrinet_vdeployable_v3.0/$MODEL_NAME:$KEY --offline \
            --streaming=False \
            --wfst_tokenizer_model=/data/inverse_normalization_en_us_vdeployable_v1.0/tokenize_and_classify.far \
            --wfst_verbalizer_model=/data/inverse_normalization_en_us_vdeployable_v1.0/verbalize.far \
            --name=citrinet-1024-en-US-asr-streaming \
            --ms_per_timestep=80 \
            --featurizer.use_utterance_norm_params=False \
            --featurizer.precalc_norm_time_steps=0 \
            --featurizer.precalc_norm_params=False \
            --vad.residue_blanks_at_start=-2 \
            --chunk_size=300 \
            --left_padding_size=0. \
            --right_padding_size=0. \
            --decoder_type=flashlight \
            --flashlight_decoder.asr_model_delay=-1 \
            --decoding_language_model_binary=/data/speechtotext_en_us_lm_vdeployable_v1.1/riva_asr_train_datasets_3gram.binary  \
            --decoding_vocab=/data/speechtotext_en_us_lm_vdeployable_v1.1/flashlight_decoder_vocab.txt  \
            --flashlight_decoder.lm_weight=0.2 \
            --flashlight_decoder.word_insertion_score=0.2 \
            --flashlight_decoder.beam_threshold=20. \
            --language_code=en-US

In [None]:
# Inspect the .rmir
!ls -lt $ASR_MODEL_DIR/*.rmir
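
The customization tutorials re-run `riva-build` with different options, so it can be convenient to keep the options in a dictionary and render the command from it. The sketch below is one possible way to do that, not something ServiceMaker provides: `riva_build_args` and `OFFLINE_OPTIONS` are hypothetical names, and the options shown are a subset of the flags used in the cell above.

```python
def riva_build_args(output_rmir, model_riva, key, options, flags=()):
    """Assemble the argument list for `riva-build speech_recognition`.

    `options` maps option names (without the leading `--`) to values and is
    rendered as `--name=value`. `flags` lists bare switches such as "offline".
    """
    args = [
        "riva-build", "speech_recognition",
        f"{output_rmir}:{key}",   # output RMIR, encrypted with `key`
        f"{model_riva}:{key}",    # input .riva model, encrypted with `key`
    ]
    args += [f"--{flag}" for flag in flags]
    args += [f"--{name}={value}" for name, value in options.items()]
    return args


# A subset of the offline options from the cell above, for illustration only.
OFFLINE_OPTIONS = {
    "streaming": False,
    "chunk_size": 300,
    "decoder_type": "flashlight",
    "language_code": "en-US",
}
```

Joining the result with spaces, for example `" ".join(riva_build_args("/data/asr.rmir", "/data/speechtotext_en_us_citrinet_vdeployable_v3.0/" + MODEL_NAME, KEY, OFFLINE_OPTIONS, flags=["offline"]))`, gives a string that can be appended to the `docker run ... --` prefix used above.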

### Riva-deploy

The deployment tool takes as input one or more Riva Model Intermediate Representation (RMIR) files and a target model repository directory. It creates an ensemble configuration specifying the pipeline for the execution and finally writes all those assets to the output model repository directory.

`NOTE:` This step may take about 10 minutes to complete.

In [None]:
# Syntax: riva-deploy -f dir-for-rmir/model.rmir:key output-dir-for-repository
! docker run --rm --gpus 0 -v $ASR_MODEL_DIR:/data $RIVA_SM_CONTAINER -- \
            riva-deploy -f  /data/asr.rmir:$KEY /data/models/

In [None]:
# Inspect the output directory; riva-deploy writes the model repository to its `models` subdirectory
!ls -lt $ASR_MODEL_DIR
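
To see which pipeline components `riva-deploy` wrote, you can list the entries of the generated repository. This sketch assumes the repository follows the usual Triton layout, in which each model is a subdirectory holding a `config.pbtxt` file; treat that layout as an assumption to check against your own output. `list_repository_models` is a hypothetical helper name.

```python
import os


def list_repository_models(repo_dir):
    """Return sorted names of subdirectories of repo_dir that contain a config.pbtxt."""
    if not os.path.isdir(repo_dir):
        return []
    return sorted(
        name for name in os.listdir(repo_dir)
        if os.path.isfile(os.path.join(repo_dir, name, "config.pbtxt"))
    )
```

For this tutorial the call would be `list_repository_models(os.path.join(ASR_MODEL_DIR, "models"))`. An empty list suggests that the deploy step did not finish or wrote to a different directory.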

---
## Start the Riva Server
After the model repository is generated, we are ready to start the Riva server. First, download the Riva Quick Start resources from NGC. 

### Download the Riva Quick Start guide
The Riva Quick Start guide contains easy-to-use scripts to download and deploy models. 

`NOTE:` The scripts in Quick Start can download and deploy the default models. We downloaded the ASR models above just to demonstrate how to use Riva ServiceMaker tools, which will be used during customization tutorials to re-deploy the pipeline.

In [None]:
# Downloads the quick start directory to a folder in the current directory and uncompresses it
!ngc registry resource download-version "nvidia/riva/riva_quickstart:2.2.1"

In [None]:
# Set the Riva Quick Start directory
RIVA_QSG = os.path.join(os.getcwd(), "riva_quickstart_v2.2.1")

### Configure Riva Quick Start 
This step configures the scripts to deploy the ASR models we produced with the Riva ServiceMaker tools in the previous section. <br>
To do so, we modify the `config.sh` file to enable the relevant Riva services (ASR for the Citrinet model) and to provide the encryption key and the path to the model repository (`riva_model_loc`) generated in the previous step, among other settings.

In [None]:
!ls $RIVA_QSG/config.sh

For example, since the model repository above was generated at `$ASR_MODEL_DIR/models`, set `riva_model_loc` to the same directory as `ASR_MODEL_DIR`. <br>

Pretrained versions of the models listed in `models_asr`, `models_nlp`, and `models_tts` are fetched from NGC. Since we are using our custom model, we can comment out the entries in `models_asr` (and any others that are not relevant to your use case). <br>

#### config.sh snippet
```
# Enable or Disable Riva Services 
service_enabled_asr=true
service_enabled_nlp=false                                                      ## MAKE CHANGES HERE - SET TO FALSE
service_enabled_tts=false                                                     ## MAKE CHANGES HERE - SET TO FALSE

# Specify one or more GPUs to use
# specifying more than one GPU is currently an experimental feature, and may result in undefined behaviours.
gpus_to_use="device=0"

# Specify the encryption key to use to deploy models
MODEL_DEPLOY_KEY="tlt_encode"

# Locations to use for storing models artifacts
#
# If an absolute path is specified, the data will be written to that location
# Otherwise, a docker volume will be used (default).
#
# riva_init.sh will create a `rmir` and `models` directory in the volume or
# path specified. 
#
# RMIR ($riva_model_loc/rmir)
# Riva uses an intermediate representation (RMIR) for models
# that are ready to deploy but not yet fully optimized for deployment. Pretrained
# versions can be obtained from NGC (by specifying NGC models below) and will be
# downloaded to $riva_model_loc/rmir by `riva_init.sh`
# 
# Custom models produced by NeMo or TAO and prepared using riva-build
# may also be copied manually to this location ($riva_model_loc/rmir).
#
# Models ($riva_model_loc/models)
# During the riva_init process, the RMIR files in $riva_model_loc/rmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $riva_model_loc/models. The riva server exclusively uses these
# optimized versions.
riva_model_loc="<add path>"                              ## MAKE CHANGES HERE (Replace with ASR_MODEL_DIR)                      
```

**Make sure to do the following before moving forward:**
1. In the file navigator in Jupyter Lab, navigate to riva_quickstart_v2.* and open config.sh
2. Configure settings as shown in the snippet above
   - Set nlp and tts services to false
   - Configure the riva_model_loc path to where the models resulting from riva-deploy are stored
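
If you prefer to apply these settings from the notebook rather than by hand, the edit can be scripted. The sketch below is one way to do it under a simple assumption: each setting appears in `config.sh` as a line that starts with `name=`. `set_config_values` is a hypothetical helper, not part of the Quick Start scripts.

```python
import re


def set_config_values(config_text, values):
    """Return config_text with each `name=...` assignment replaced by `name=value`.

    Only lines that start with the variable name are rewritten, so comment
    lines are left alone; anything after `=` on a matching line is replaced.
    Raises KeyError if a name is not assigned anywhere in the text.
    """
    for name, value in values.items():
        pattern = re.compile(rf"^{re.escape(name)}=.*$", flags=re.MULTILINE)
        # A function replacement keeps backslashes in `value` literal.
        config_text, count = pattern.subn(
            lambda _match, n=name, v=value: f"{n}={v}", config_text
        )
        if count == 0:
            raise KeyError(name)
    return config_text
```

To use it, read `config.sh` from the Quick Start directory into a string, pass a dictionary such as `{"service_enabled_nlp": "false", "service_enabled_tts": "false", "riva_model_loc": f'"{ASR_MODEL_DIR}"'}`, and write the returned text back to the same file. Review the result afterwards, since the helper also drops any trailing comment on the lines it rewrites.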

In [None]:
# Set `riva_model_loc` to where the models resulting from riva-deploy are stored. In our case, it is ASR_MODEL_DIR
!echo $ASR_MODEL_DIR

In [None]:
# Ensure you have permission to execute these scripts
! cd $RIVA_QSG && chmod +x ./riva_start.sh

In [None]:
# Run Riva Start to start the server. This will deploy your model(s).
! cd $RIVA_QSG && ./riva_start.sh config.sh
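
The server can take a while to come up, so it is worth checking that something is listening before sending requests. The sketch below polls the gRPC port used later in this tutorial (`localhost:50051`) with a plain TCP connection. `wait_for_port` is a hypothetical helper, and an open port only shows that a process is accepting connections, not that every model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=120.0, interval_s=2.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # The connection is closed again as soon as it has been established.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For example, `wait_for_port("localhost", 50051)` returns `True` as soon as the port accepts a connection and `False` if it is still closed after two minutes.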

---
## Run Inference
Once the Riva server is up and running with the models, you can send inference requests to it.

To send gRPC requests, you can install the Riva Python API bindings for the client. This is available as a `pip` `.whl` file with the Quick Start resources.

In [None]:
# Install the Client API Bindings
! cd $RIVA_QSG && pip3 install riva_api-2.2.1-py3-none-any.whl

### Connect to the Riva Server and Run Automatic Speech Recognition
The following cells query the Riva server (using gRPC) with an input audio file to yield a transcript.

In [None]:
import io
import IPython.display as ipd
import grpc
import time

try:
    import riva_api.riva_audio_pb2 as ra # RIVA 2.0.0 and above
except ImportError:
    import riva_api.audio_pb2 as ra
import riva_api.riva_asr_pb2 as rasr
import riva_api.riva_asr_pb2_grpc as rasr_srv
import wave

In [None]:
# Load a sample audio file from local disk
# This example uses a .wav file with LINEAR_PCM encoding.
audio_file = "audio_samples/en-US_wordboosting_sample1.wav"
with io.open(audio_file, 'rb') as fh:
    content = fh.read()
    
# Listen to the sample audio we are looking to transcribe
ipd.Audio(audio_file)
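
The recognition request below declares the audio as single-channel `LINEAR_PCM` and takes the sample rate from the file, so a quick look at the WAV header can catch a mismatched input early. This is a small sketch using only the standard-library `wave` module; `describe_wav` is a hypothetical helper name.

```python
import wave


def describe_wav(path):
    """Return channel count, sample width, sample rate, and duration of a WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate if rate else 0.0,
        }
```

Calling `describe_wav(audio_file)` on the sample should report one channel if the file matches the `audio_channel_count=1` setting used in the request.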

In [None]:
server = "localhost:50051"

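# Read the sample rate from the WAV header, then load the whole file as bytes
with wave.open(audio_file, 'rb') as wf:
    sample_rate = wf.getframerate()
with open(audio_file, 'rb') as fh:
    data = fh.read()

# Open an unencrypted gRPC channel to the server and build the request config
channel = grpc.insecure_channel(server)
client = rasr_srv.RivaSpeechRecognitionStub(channel)
config = rasr.RecognitionConfig(
    encoding=ra.AudioEncoding.LINEAR_PCM,
    sample_rate_hertz=sample_rate,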
    language_code="en-US",
    max_alternatives=1,
    enable_automatic_punctuation=False,
    audio_channel_count=1
)

request = rasr.RecognizeRequest(config=config, audio=data)

response = client.Recognize(request)
print(response)
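
Printing the response shows the full protobuf message. To pull out only the text, you can read the first alternative of each result, which is the only one returned when `max_alternatives=1`. The helper below is a minimal sketch and `best_transcripts` is a hypothetical name; it relies only on the `results`, `alternatives`, and `transcript` fields of the response.

```python
def best_transcripts(response):
    """Return the top-alternative transcript of each result in a Recognize response."""
    return [
        result.alternatives[0].transcript
        for result in response.results
        if len(result.alternatives) > 0  # skip results with no hypothesis
    ]
```

For the request above, `" ".join(best_transcripts(response))` gives the transcript as a single string.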

Now, you are all set with a speech recognition pipeline!