<img src="http://developer.download.nvidia.com/notebooks/dlsw-notebooks/riva_asr_asr-python-advanced-finetune-am-citrinet-tao-deployment/nvidia_logo.png" style="width: 90px; float: right;">

# How to deploy a custom Acoustic Model (Citrinet) trained with TAO Toolkit on Riva
This tutorial walks you through deploying a custom acoustic model (Citrinet), trained with TAO Toolkit, on Riva.

---
## Riva ServiceMaker
Riva ServiceMaker is a set of tools that aggregates all the necessary artifacts (models, files, configurations, and user settings) for Riva deployment to a target environment. It has two main components:

### Riva-build

This step helps build a Riva-ready version of the model. Its only output is an intermediate format (called an RMIR) of an end-to-end pipeline for the supported services within Riva. Let's consider an ASR Citrinet model. <br>

We'll use the n-gram language model (trained in the fourth notebook) and the customized acoustic model (from the previous notebook) to deploy the Riva ASR pipeline.

Let's set the paths to the language model (`.binary` file) and the acoustic model (`.riva` file), which are used by `riva-build`.<br>
We also need the decoder vocabulary file used by the language model. We downloaded it in the first notebook and will reuse it.

In [None]:
# IMPORTANT: UPDATE AM_MODEL_LOC with the ABSOLUTE PATH of the directory containing `asr-model.riva`
# IMPORTANT: UPDATE LM_MODEL_LOC with the ABSOLUTE PATH of the directory containing `exported-model.binary`
 
import os
# ServiceMaker Docker
RIVA_SM_CONTAINER = "nvcr.io/nvidia/riva/riva-speech:2.2.1-servicemaker"

# Directory where asr-model.riva is stored ($AM_MODEL_LOC/*.riva)
AM_WORKING_DIR = os.path.join(os.getcwd(), "asr_am_finetuning")
AM_MODEL_LOC = AM_WORKING_DIR + "/results/citrinet/riva/"

# Directory where the exported-model.binary is stored
LM_WORKING_DIR = os.path.join(os.getcwd(), "lm-pretraining-artifacts")
LM_MODEL_LOC = LM_WORKING_DIR + "/results/n_gram/export/"

# Directory where the decoder vocab is downloaded
VOCAB_DIR = os.path.join(os.getcwd(), "asr-models")
VOCAB_LOC = VOCAB_DIR + "/speechtotext_en_us_lm_vdeployable_v1.1/"

# Name of the model files
AM_MODEL_NAME = "asr-model.riva"
LM_MODEL_NAME = "exported-model.binary"
VOCAB_FILE = "flashlight_decoder_vocab.txt"

# Key the model was encrypted with when it was exported from TAO
KEY = "tlt_encode"
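
Before running `riva-build`, it can help to confirm that the three input files are where the variables above say they are, because Docker will mount a missing host directory as an empty one without complaint. The helper below is a minimal sketch; `find_missing_files` is a hypothetical name introduced for illustration, not part of Riva or TAO.

```python
import os


def find_missing_files(paths):
    """Return the entries of `paths` that do not point to an existing regular file."""
    return [p for p in paths if not os.path.isfile(p)]
```

In this notebook it could be called as `find_missing_files([os.path.join(AM_MODEL_LOC, AM_MODEL_NAME), os.path.join(LM_MODEL_LOC, LM_MODEL_NAME), os.path.join(VOCAB_LOC, VOCAB_FILE)])`. An empty list means all three inputs were found.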

In [None]:
! docker run --rm --gpus 0 -v $AM_MODEL_LOC:/data_am -v $LM_MODEL_LOC:/data_lm \
            -v $VOCAB_LOC:/data_vocab $RIVA_SM_CONTAINER -- \
            riva-build speech_recognition /data_am/asr.rmir:$KEY /data_am/$AM_MODEL_NAME:$KEY --offline \
            --decoder_type=flashlight \
            --chunk_size=0.16 \
            --padding_size=1.92 \
            --ms_per_timestep=80 \
            --flashlight_decoder.asr_model_delay=-1 \
            --vad.residue_blanks_at_start=-2 \
            --featurizer.use_utterance_norm_params=False \
            --featurizer.precalc_norm_time_steps=0 \
            --featurizer.precalc_norm_params=False \
            --decoding_language_model_binary=/data_lm/$LM_MODEL_NAME \
            --decoding_vocab=/data_vocab/$VOCAB_FILE \
            --force

### Riva-deploy

The deployment tool takes as input one or more Riva Model Intermediate Representation (RMIR) files and a target model repository directory. It creates an ensemble configuration specifying the pipeline for the execution and finally writes all those assets to the output model repository directory.

Be patient! This step could take 10-15 minutes!

In [None]:
# Syntax: riva-deploy -f dir-for-rmir/model.rmir:key output-dir-for-repository
! docker run --rm --gpus 0 -v $AM_MODEL_LOC:/data $RIVA_SM_CONTAINER -- \
            riva-deploy -f  /data/asr.rmir:$KEY /data/models/
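
Once `riva-deploy` finishes, a quick look at the output directory shows whether a model repository was actually written. The sketch below assumes the repository follows the usual Triton layout of one subdirectory per model, each holding a `config.pbtxt`; that layout is an assumption here, and `list_deployed_models` is a hypothetical helper name.

```python
import os


def list_deployed_models(repo_dir):
    """Return the sorted names of subdirectories of `repo_dir` that contain a config.pbtxt.

    Returns an empty list if `repo_dir` does not exist.
    """
    if not os.path.isdir(repo_dir):
        return []
    return sorted(
        name
        for name in os.listdir(repo_dir)
        if os.path.isfile(os.path.join(repo_dir, name, "config.pbtxt"))
    )
```

With the paths used in this notebook, `list_deployed_models(os.path.join(AM_MODEL_LOC, "models"))` should return a non-empty list after a successful deployment.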

---
## Start the Riva Server
After the model repository is generated, we are ready to start the Riva server. First, download the Riva Quick Start resource from NGC. 
Set the path to the directory here:

In [None]:
# Set the Riva Quick Start directory
RIVA_DIR = os.path.join(os.getcwd(), "riva_quickstart_v2.2.1")

# Check whether the Quick Start scripts exist; download them otherwise
if os.path.exists(RIVA_DIR):
    print("Quick Start scripts exist, skipping download")
else:
    print("Quick Start scripts do not exist, downloading")
    ! ngc registry resource download-version "nvidia/riva/riva_quickstart:2.2.1"

Next, we modify the `config.sh` file to enable the relevant Riva services (ASR for the Citrinet model) and to provide, among other settings, the encryption key and the path to the model repository (`riva_model_loc`) generated in the previous step.

For example, if the model repository above was generated at `$MODEL_LOC/models`, set `riva_model_loc` to the same directory as `MODEL_LOC`. In this notebook, that directory is `AM_MODEL_LOC`. <br>

Pretrained versions of the models specified in `models_asr/nlp/tts` are fetched from NGC. Since we are using our custom model, we can comment it out in `models_asr` (along with any others that are not relevant to your use case). <br>

#### config.sh snippet
```
# Enable or Disable Riva Services 
service_enabled_asr=true                                                      ## MAKE CHANGES HERE
service_enabled_nlp=false                                                      ## MAKE CHANGES HERE
service_enabled_tts=false                                                     ## MAKE CHANGES HERE

# Specify one or more GPUs to use
# specifying more than one GPU is currently an experimental feature, and may result in undefined behaviours.
gpus_to_use="device=0"

# Specify the encryption key to use to deploy models
MODEL_DEPLOY_KEY="tlt_encode"

# Locations to use for storing models artifacts
#
# If an absolute path is specified, the data will be written to that location
# Otherwise, a docker volume will be used (default).
#
# riva_init.sh will create a `rmir` and `models` directory in the volume or
# path specified. 
#
# RMIR ($riva_model_loc/rmir)
# Riva uses an intermediate representation (RMIR) for models
# that are ready to deploy but not yet fully optimized for deployment. Pretrained
# versions can be obtained from NGC (by specifying NGC models below) and will be
# downloaded to $riva_model_loc/rmir by `riva_init.sh`
# 
# Custom models produced by NeMo or TAO and prepared using riva-build
# may also be copied manually to this location $(riva_model_loc/rmir).
#
# Models ($riva_model_loc/models)
# During the riva_init process, the RMIR files in $riva_model_loc/rmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $riva_model_loc/models. The riva server exclusively uses these
# optimized versions.
riva_model_loc="<add path>"                              ## MAKE CHANGES HERE (Replace with MODEL_LOC)                      
```

**Make sure to do the following before moving forward:**
1. In the file navigator in Jupyter Lab, navigate to `riva_quickstart_v2.*` and open `config.sh`
2. Configure settings as shown in the snippet above
   - Set the NLP and TTS services (`service_enabled_nlp`, `service_enabled_tts`) to `false`
   - Set `riva_model_loc` to the path where the models resulting from `riva-deploy` are stored

In [None]:
# Set `riva_model_loc` to where the models resulting from riva-deploy are stored. In our case, it is AM_MODEL_LOC
!echo $AM_MODEL_LOC
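
The manual edits to `config.sh` described above could also be scripted. The following is a minimal sketch that assumes the settings appear as plain, uncommented `name=value` lines, as in the snippet; `set_shell_variable` is a hypothetical helper, not something shipped with the Quick Start scripts. It rewrites the whole line, so any trailing comment on that line is dropped.

```python
import re


def set_shell_variable(script_text, name, value):
    """Return `script_text` with the first uncommented `name=...` line set to `name=value`.

    `value` is inserted verbatim, so include the quotes if the shell needs them.
    Raises KeyError if no such assignment exists, so a typo in `name` is not silently ignored.
    """
    pattern = re.compile(rf"^{re.escape(name)}=.*$", flags=re.MULTILINE)
    # A callable replacement avoids backslash interpretation in `value` (e.g. Windows paths).
    new_text, count = pattern.subn(lambda m: f"{name}={value}", script_text, count=1)
    if count == 0:
        raise KeyError(f"no assignment to {name!r} found")
    return new_text
```

To apply it, read `config.sh` from `RIVA_DIR` into a string, call for example `set_shell_variable(text, "riva_model_loc", f'"{AM_MODEL_LOC}"')` and `set_shell_variable(text, "service_enabled_nlp", "false")`, then write the result back. Keeping a backup copy of the original file is a sensible precaution.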

In [None]:
# Ensure you have permission to execute these scripts
! cd $RIVA_DIR && chmod +x ./riva_stop.sh && chmod +x ./riva_start.sh

In [None]:
# Stop existing Riva deployments. 
! cd $RIVA_DIR && ./riva_stop.sh config.sh 
# Run Riva Start. This will deploy your model(s).
! cd $RIVA_DIR && ./riva_start.sh config.sh
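
If inference is run from a separate session, it can be useful to check that something is listening on the server port before sending requests. The sketch below polls a TCP port until it accepts a connection; `wait_for_port` and its parameters are hypothetical names, and an open port only shows that the server process is listening, not that every model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=300.0, poll_interval_s=2.0):
    """Poll `host:port` until a TCP connection succeeds or `timeout_s` elapses.

    Returns True once a connection is accepted, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Open and immediately close a connection; success means the port is listening.
            with socket.create_connection((host, port), timeout=poll_interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval_s)
```

For the server address used later in this notebook, the call would be `wait_for_port("localhost", 50051)`.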

---
## Run Inference
Once the Riva server is up and running with your models, you can send it inference requests.


### Connect to the Riva Server and Run Inference
The following cell sends the sample audio file to the Riva server over gRPC and prints the recognition result.

In [None]:
import grpc
try:
    import riva_api.riva_audio_pb2 as ra  # Riva 2.0.0 and above
except ImportError:
    import riva_api.audio_pb2 as ra  # older Riva releases
import riva_api.riva_asr_pb2 as rasr
import riva_api.riva_asr_pb2_grpc as rasr_srv
import wave

audio_file = "audio_samples/en-US_sample.wav"
server = "localhost:50051"

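# Read the sample rate from the WAV header, closing the file afterwards
with wave.open(audio_file, 'rb') as wf:
    sample_rate = wf.getframerate()

# Read the complete file contents to send as the request audio
with open(audio_file, 'rb') as fh:
    data = fh.read()

channel = grpc.insecure_channel(server)
client = rasr_srv.RivaSpeechRecognitionStub(channel)
config = rasr.RecognitionConfig(
    encoding=ra.AudioEncoding.LINEAR_PCM,
    sample_rate_hertz=sample_rate,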
    language_code="en-US",
    max_alternatives=1,
    enable_automatic_punctuation=False,
    audio_channel_count=1
)

request = rasr.RecognizeRequest(config=config, audio=data)

response = client.Recognize(request)
print(response)
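
Printing the whole response shows every field. To keep only the recognized text, the top alternative of each result can be pulled out, as in the sketch below. `best_transcripts` is a hypothetical helper; it relies only on the `results`, `alternatives`, and `transcript` fields of the response.

```python
def best_transcripts(response):
    """Return the top-ranked transcript of every result in a Recognize response.

    Results that carry no alternatives are skipped.
    """
    return [
        result.alternatives[0].transcript
        for result in response.results
        if result.alternatives
    ]
```

After the cell above has run, `print(" ".join(best_transcripts(response)))` would print the transcript on a single line.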

---
### Cleanup

You can stop all Docker containers before shutting down the Jupyter kernel. **Caution: The following command will stop all running containers.**

In [None]:
! docker stop $(docker ps -a -q)