<a href="https://www.nvidia.com/dli"> <img src="images/DLI_Header.png" alt="Header" style="width: 400px;"/> </a>

# 5.0 ASR Deployment with NVIDIA Riva
## (part of Lab 1)

In this notebook, you'll take the `.riva` QuartzNet ASR model you exported with TAO Toolkit and deploy it with [NVIDIA Riva](https://developer.nvidia.com/riva). After the model is deployed in Riva, you can issue inference requests to the Riva server from a client.

**[5.1 NVIDIA Riva](#5.1-NVIDIA-Riva)<br>**
**[5.2 Riva ServiceMaker](#5.2-Riva-ServiceMaker)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[5.2.1 `riva-build`](#5.2.1-riva-build)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[5.2.2 `riva-deploy`](#5.2.2-riva-deploy)<br>
**[5.3 Riva Server](#5.3-Riva-Server)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[5.3.1 Riva Configuration](#5.3.1-Riva-Configuration)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[5.3.2 Exercise: Configure Riva for a Custom ASR](#5.3.2-Exercise:-Configure-Riva-for-a-Custom-ASR)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[5.3.3 Riva Start Services](#5.3.3-Riva-Start-Services)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[5.3.4 Riva Available Services Check](#5.3.4-Riva-Available-Services-Check)<br>
**[5.4 Riva ASR Service Request](#5.4-Riva-ASR-Service-Request)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[5.4.1 Python Client Demo](#5.4.1-Python-Client-Demo)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[5.4.2 Request Riva ASR service](#5.4.2-Request-Riva-ASR-service)<br>
**[5.5 Streaming ASR](#5.5-Streaming-ASR)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[5.5.1 Stop Riva Services](#5.5.1-Stop-Riva-Services)<br>

### Notebook Dependencies
The steps in this notebook assume that you have set up your NGC credential and that you successfully completed the `.riva` model export with TAO Toolkit in the [previous notebook](004_ASR_TAO_Inference.ipynb).  If you've done that, you are ready to go!  Skip to [section 5.1](#5.1-NVIDIA-Riva).

If you have not exported the ASR model, or if you have restarted your DLI course instance since you exported, you will need:
1. **NGC Credentials**<br>Be sure you have added your NGC credential as described in the [NGC Setup notebook](003_Intro_NGC_Setup.ipynb)
2. **asr-model.riva**<br>Execute the next cell to load a copy of the exported model into the correct location.

In [1]:
%%bash
mkdir -p /dli/task/tao/results/quartznet/export
cp /dli/task/tao/backup_riva/asr-model.riva /dli/task/tao/results/quartznet/export/

---
# 5.1 NVIDIA Riva

NVIDIA Riva is a platform for building and deploying AI applications that fuse vision, speech and other sensors. It offers a complete workflow to build, train and deploy AI systems that can use visual cues such as gestures and gaze along with speech in context. With the NVIDIA Riva conversational AI platform, you can:

- Build speech and NLP AI applications using pretrained NVIDIA models available at NVIDIA GPU Cloud ([NGC](https://ngc.nvidia.com/catalog/models?orderBy=modifiedDESC&query=%20label%3A%22NeMo%2FPyTorch%22&quickFilter=models&filters=)).

- Train and fine-tune: retrain your model on domain-specific data with NVIDIA [NeMo](https://github.com/NVIDIA/NeMo) and [TAO Toolkit](https://docs.nvidia.com/tao/tao-toolkit/index.html#tao-toolkit).

- Optimize neural network performance and latency using [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt).

- Deploy AI applications with [Triton Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server).

For more detailed information on Riva, please refer to the [Riva developer documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/index.html).

---
# 5.2 Riva ServiceMaker
ServiceMaker is the set of tools that aggregate all the necessary artifacts (models, files, configurations, and user settings) for Riva deployment to a target environment. In this lab, we are going to use the [riva-speech:1.4.0-beta-servicemaker](https://ngc.nvidia.com/catalog/containers/nvidia:riva:riva-speech) container available on NGC.

<img src="images/asr/servicemaker.png" width=1000>

## 5.2.1 `riva-build`

`riva-build` combines one or more exported models (`.riva` format) into a single file in RMIR (Riva Model Intermediate Representation) format. The RMIR file contains a deployment-agnostic specification of the entire pipeline along with all the assets required for the final deployment and inference. Check out the [documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/model-overview.html) to find out more.

Start by setting up the relevant paths:

In [16]:
# Set the workspace path to "/path/to/your/workspace"
WORKSPACE = "/dli/task"

# ServiceMaker container
RIVA_SM_CONTAINER = "nvcr.io/nvidia/riva/riva-speech:1.4.0-beta-servicemaker"

# Directory where the exported .riva model is stored $MODEL_LOC/*.riva
MODEL_LOC = WORKSPACE + "/tao/results/quartznet/export"

# Riva model repo 
RIVA_MODEL_LOC = WORKSPACE + "/riva/riva_quickstart/models_repo"

# Directory where the .rmir model will be stored $RMIR_LOC/*.rmir
RMIR_LOC = RIVA_MODEL_LOC+ "/rmir"

# Name of the .riva file
MODEL_NAME = "asr-model.riva"

# Key that model is encrypted with, while exporting with TAO
KEY='tlt_encode'
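
Before pulling the container, it can help to confirm that the exported model is where these paths say it is. The following is a minimal sketch, not part of the original lab: the helper name `check_layout` is hypothetical, and it only mirrors the directory layout defined in the cell above.

```python
from pathlib import Path


def check_layout(workspace, model_name="asr-model.riva"):
    """Derive the lab's paths from a workspace root and report what exists."""
    root = Path(workspace)
    # Same layout as MODEL_LOC, RIVA_MODEL_LOC and RMIR_LOC in the cell above
    model_loc = root / "tao" / "results" / "quartznet" / "export"
    riva_model_loc = root / "riva" / "riva_quickstart" / "models_repo"
    rmir_loc = riva_model_loc / "rmir"
    model_file = model_loc / model_name
    return {
        "model_file": model_file,
        "model_exists": model_file.is_file(),
        "rmir_dir": rmir_loc,
        "rmir_dir_exists": rmir_loc.is_dir(),
    }
```

Calling `check_layout("/dli/task")` in the lab environment would report whether `asr-model.riva` and the `rmir` directory are present before `riva-build` is run.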

Pull the Riva ServiceMaker container image (see [NGC riva-speech image repository](https://ngc.nvidia.com/catalog/containers/nvidia:riva:riva-speech)).

In [17]:
# Get the ServiceMaker container
!docker pull $RIVA_SM_CONTAINER

1.4.0-beta-servicemaker: Pulling from nvidia/riva/riva-speech
Digest: sha256:1b3e518158e13af3157478d519ba1025d9b4e704c819d1b05c33b03535edd1c1
Status: Image is up to date for nvcr.io/nvidia/riva/riva-speech:1.4.0-beta-servicemaker
nvcr.io/nvidia/riva/riva-speech:1.4.0-beta-servicemaker


Run the container we just pulled with the `riva-build` command.  The [`--decoder_type` is required](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/release-notes.html?highlight=decoder_type#riva-speech-skills-1-4-0-beta).

In [18]:
# Syntax: riva-build <task-name> --decoder_type=<decoder> output-dir-for-rmir/model.rmir:key dir-for-riva/model.riva:key
!docker run --rm --gpus 1 \
    -v $MODEL_LOC:/tao \
    -v $RMIR_LOC:/riva \
    $RIVA_SM_CONTAINER -- \
    riva-build speech_recognition --decoder_type=greedy /riva/asr.rmir:$KEY /tao/$MODEL_NAME:$KEY


=== Riva Speech Skills ===

NVIDIA Release devel (build 22382700)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: Legacy NVIDIA Driver detected.  Compatibility mode ENABLED.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

2022-06-22 09:53:07,013 [ERROR] /riva/asr.rmir exists.  Use --force/-f to overwrite.


In [19]:
# check the generated rmir model
!ls $RMIR_LOC

asr.rmir
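
The build log above ends with an `[ERROR]` because `/riva/asr.rmir` already exists from an earlier run, and the message itself points to `--force`/`-f`. As a minimal sketch, the same command can be assembled in Python so that overwriting becomes an explicit option. The helper name `riva_build_command` is hypothetical, and placing `--force` before the positional arguments is an assumption rather than documented syntax.

```python
def riva_build_command(container, model_dir, rmir_dir, model_name, key,
                       decoder_type="greedy", force=False):
    """Return the docker argv for riva-build, mirroring the notebook cell above."""
    cmd = [
        "docker", "run", "--rm", "--gpus", "1",
        "-v", f"{model_dir}:/tao",    # exported .riva model directory
        "-v", f"{rmir_dir}:/riva",    # output directory for the .rmir file
        container, "--",
        "riva-build", "speech_recognition",
        f"--decoder_type={decoder_type}",
    ]
    if force:
        # Flag name taken from the error message in the log above;
        # its position in the argument list is an assumption.
        cmd.append("--force")
    cmd += [f"/riva/asr.rmir:{key}", f"/tao/{model_name}:{key}"]
    return cmd
```

The returned list could then be passed to `subprocess.run(cmd, check=True)`, using the `RIVA_SM_CONTAINER`, `MODEL_LOC`, `RMIR_LOC`, `MODEL_NAME`, and `KEY` values defined earlier.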


## 5.2.2 `riva-deploy`

The deployment tool takes as input one or more RMIR files and an output directory for the finished models. It creates an ensemble configuration specifying the pipeline for the execution, then writes all those assets to the output model repository. This step will take a few minutes as it generates optimized models with [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt).

[`riva-asr/config.pbtxt`](riva/riva_quickstart/models_repo/models/riva-asr/config.pbtxt) describes the model input/output format and the ensemble scheduling for the ASR task, which includes the following models:

- Feature extractor `riva-asr-feature-extractor-streaming`
- ASR `riva-trt-riva-asr-am-streaming`
- Voice activity detector `riva-asr-voice-activity-detector-ctc-streaming`
- CTC (Connectionist Temporal Classification) decoder `riva-asr-ctc-decoder-cpu-streaming`

<img src="images/asr/riva_asr.png">

In [20]:
# This can take a few minutes
# Syntax: riva-deploy -f dir-for-rmir/model.rmir:key output-dir-for-repository
!docker run --rm --gpus 1 \
     -v $RIVA_MODEL_LOC:/data \
     $RIVA_SM_CONTAINER -- \
     riva-deploy -f  /data/rmir/asr.rmir:$KEY /data/models/


=== Riva Speech Skills ===

NVIDIA Release devel (build 22382700)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: Legacy NVIDIA Driver detected.  Compatibility mode ENABLED.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

2022-06-22 09:54:46,278 [INFO] Writing Riva model repository to '/data/models/'...
2022-06-22 09:54:46,278 [INFO] The riva model repo target directory is /data/models/
2022-06-22 09:54:46,525 [INFO] Extract_binaries for featurizer -> /data/models/riva-asr-feature-extractor-streaming/1
2022-06-22 09:54:46,527 [INFO] Extract_binaries for nn -> /data/models/riv

You can check the generated models in the Riva models location, `$RIVA_MODEL_LOC/models`. The ASR pipeline description can be found in `$RIVA_MODEL_LOC/models/riva-asr/config.pbtxt`.

In [21]:
# Check optimized models 
!ls $RIVA_MODEL_LOC/models

riva-asr
riva-asr-ctc-decoder-cpu-streaming
riva-asr-feature-extractor-streaming
riva-asr-voice-activity-detector-ctc-streaming
riva-trt-riva-asr-am-streaming


In [22]:
# Check ASR ensembling
!cat $RIVA_MODEL_LOC/models/riva-asr/config.pbtxt

name: "riva-asr"
platform: "ensemble"
max_batch_size: 64
input {
  name: "AUDIO_SIGNAL"
  data_type: TYPE_FP32
  dims: [-1]
}
input {
  name: "SAMPLE_RATE"
  data_type: TYPE_UINT32
  dims: [1]
}
input {
  name: "END_FLAG"
  data_type: TYPE_UINT32
  dims: [1]
}
input {
  name: "CUSTOM_CONFIGURATION"
  data_type: TYPE_STRING
  dims: [-1, 2]
}
output {
  name: "FINAL_TRANSCRIPTS"
  data_type: TYPE_STRING
  dims: [-1]
}
output {
  name: "FINAL_TRANSCRIPTS_SCORE"
  data_type: TYPE_FP32
  dims: [-1]
}
output {
  name: "FINAL_WORDS_START_END"
  data_type: TYPE_INT32
  dims: [-1, 2]
}
output {
  name: "PARTIAL_TRANSCRIPTS"
  data_type: TYPE_STRING
  dims: [-1]
}
output {
  name: "PARTIAL_TRANSCRIPTS_STABILITY"
  data_type: TYPE_FP32
  dims: [-1]
}
output {
  name: "PARTIAL_WORDS_START_END"
  data_type: TYPE_INT32
  dims: [-1, 2]
}
output {
  name: "AUDIO_PROCESSED"
  data_type: TYPE_FP32
  dims: [1]
}
parameters {
  key: "chunk_size"
  value {
    string_value: "0.1"
  }
}
parameters {
  key: 

---
# 5.3 Riva Server
After the model repository is generated, we are ready to start the Riva server.  For this step, we use Riva Quick Start scripts downloaded from NGC.  The scripts have already been downloaded for the class.  You can download them yourself, either directly from NGC while logged in, or using the NGC command line tool (see [Riva Speech Skills Quick Start Guide](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/quick-start-guide.html#local-deployment-using-quick-start-scripts)). 

Set `RIVA_QS` to the `riva_quickstart` location:  

In [23]:
# Set the Riva Quick Start directory
RIVA_QS = WORKSPACE + "/riva/riva_quickstart"

In [24]:
!ls $RIVA_QS

asr_lm_tools			    riva_api-1.4.0b0-py3-none-any.whl
config.sh			    riva_clean.sh
examples			    riva_init.sh
models_repo			    riva_start.sh
nb_demo_speech_api.ipynb	    riva_start_client.sh
nemo2riva-1.4.0b0-py3-none-any.whl  riva_stop.sh
protos


There are a number of scripts available for managing Riva services. We can initialize the models using `riva_init.sh`, then start and stop the server with `riva_start.sh` and `riva_stop.sh`. We also need to set flags and values in `config.sh` to specify which services and models we want to initiate and start. 

## 5.3.1 Riva Configuration

Open [config.sh](riva/riva_quickstart/config.sh) and note the following important sections:

##### Enable/Disable Riva Services
For each service, a `true` value means that the service is enabled on the server.  For example, if we just want to run an ASR server, we can set the `service_enabled_asr` parameter to `true` and all other parameters to `false`.  Enabling a service also means that all NGC models listed for that service later in the config file will be downloaded.
```yaml
# Enable or Disable Riva Services
service_enabled_asr=true
service_enabled_nlp=true
service_enabled_tts=true
```

##### Set the Encryption Key
We want encryption to be consistent across our projects, so this key must match the one used to export the original model (it already does).  For the purposes of this class, this setting won't change.
```yaml
# Specify the encryption key to use to deploy models
MODEL_DEPLOY_KEY="tlt_encode"
```

##### Set the Model Location
`riva_model_loc` should be the folder that contains both the `rmir` and `models` folders.  This value will need to be changed to the actual absolute path for a given project.
```yaml
# Custom models produced by NeMo or TLT and prepared using riva-build
# may also be copied manually to this location $(riva_model_loc/rmir).
#
# Models ($riva_model_loc/models)
# During the riva_init process, the RMIR files in $riva_model_loc/rmir
# are inspected and optimized for deployment. The optimized versions are
# stored in $riva_model_loc/models. The riva server exclusively uses these
# optimized versions.
riva_model_loc="riva-model-repo"
```
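
The settings described in this section are plain shell assignments, so they can be inspected from Python. The sketch below is a simplified reader, not a shell parser: it assumes one `name=value` assignment per line, skips comment lines, and ignores anything more complex. The function names `read_settings` and `enabled_services` are hypothetical.

```python
import re

# A simple shell assignment: optional indent, a variable name, "=", then the value
ASSIGN_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def read_settings(config_text):
    """Collect name=value assignments from config.sh-style text."""
    settings = {}
    for line in config_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = ASSIGN_RE.match(line)
        if match:
            # Drop surrounding whitespace and quotes from the value
            settings[match.group(1)] = match.group(2).strip().strip("\"'")
    return settings


def enabled_services(settings):
    """Return the service names whose service_enabled_* flag is 'true'."""
    prefix = "service_enabled_"
    return sorted(
        key[len(prefix):]
        for key, value in settings.items()
        if key.startswith(prefix) and value == "true"
    )
```

Reading the file with `read_settings(open(path).read())` and looking at `enabled_services(...)` and `settings["riva_model_loc"]` gives a quick view of what the server will start with.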

## 5.3.2 Exercise: Configure Riva for a Custom ASR
Using what you've learned, modify [config.sh](riva/riva_quickstart/config.sh) to 
- Enable only the ASR service
- Provide the correct encryption key (should already be correct)
- Set `riva_model_loc` to the correct model repository path

Check your work against the [solution](solutions/ex5.3.2.sh) before moving on to the next section.  You can verify it with `diff` in the next cell. You should get no "difference" (an empty output) if your config file matches the solution.

In [25]:
# TODO modify config.sh so that this cell verifies changes are correct
# There should be no output if the files match
!diff $RIVA_QS/config.sh solutions/ex5.3.2.sh

11,12c11,12
< service_enabled_nlp=true
< service_enabled_tts=true
---
> service_enabled_nlp=false
> service_enabled_tts=false
43c43
< riva_model_loc="riva-model-repo"
---
> riva_model_loc="/dli/task/riva/riva_quickstart/models_repo"


## 5.3.3 Riva Start Services

The `riva_init.sh` script downloads the Riva containers needed, downloads models listed in `config.sh`, and optimizes  models as required with [NVIDIA TensorRT](https://developer.nvidia.com/tensorrt). Since we've already used the ServiceMaker `riva-deploy` tool to optimize the models we are using, `riva_init.sh` won't have much to do, but it is provided here for completeness.

The `riva_start.sh` script starts the server.

In [29]:
# Ensure you have permission to execute these scripts
!cd $RIVA_QS && chmod +x *.sh

In [30]:
# Initialize Riva
!cd $RIVA_QS && bash riva_init.sh config.sh

Logging into NGC docker registry if necessary...
Pulling required docker images if necessary...
Note: This may take some time, depending on the speed of your Internet connection.
> Pulling Riva Speech Server images.
  > Image nvcr.io/nvidia/riva/riva-speech:1.4.0-beta-server exists. Skipping.
  > Image nvcr.io/nvidia/riva/riva-speech-client:1.4.0-beta exists. Skipping.
  > Image nvcr.io/nvidia/riva/riva-speech:1.4.0-beta-servicemaker exists. Skipping.

Downloading models (RMIRs) from NGC...
Note: this may take some time, depending on the speed of your Internet connection.
To skip this process and use existing RMIRs set the location and corresponding flag in config.sh.

=== Riva Speech Skills ===

NVIDIA Release devel (build 22382700)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE

In [31]:
# Run Riva Start. This will deploy your model(s).
!cd $RIVA_QS && bash riva_start.sh config.sh

Starting Riva Speech Services. This may take several minutes depending on the number of models deployed.
Waiting for Riva server to load all models...retrying in 10 seconds
Waiting for Riva server to load all models...retrying in 10 seconds
Riva server is ready...


Riva ASR services should be running when you get "Riva server is ready..." (about 30 seconds).
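
A client script can also wait for the server's port before sending requests. The sketch below polls a TCP port using only the standard library. It assumes the server listens on `localhost:50051`, which is the default endpoint used by the client script in section 5.4.1; an open port only shows that the server process is accepting connections, not that every model has finished loading. The function name `wait_for_port` is hypothetical.

```python
import socket
import time


def wait_for_port(host="localhost", port=50051, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not up yet, retry after a short pause
    return False
```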

##### Troubleshooting:
If the server fails to start, open a terminal and clean the Riva model repository with:

```
cd /dli/task/riva/riva_quickstart && bash riva_clean.sh config.sh
```
   
Run `riva-deploy` again as explained in [section 5.2.2](#5.2.2-riva-deploy).

## 5.3.4 Riva Available Services Check

To check the exposed Riva services, run the `docker logs riva-speech` command. 

You should see the following models ready:

<img src="images/asr/riva_speech_logs.png">

In [32]:
!docker logs riva-speech


=== Riva Speech Skills ===

NVIDIA Release 21.07 (build 25292380)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: Legacy NVIDIA Driver detected.  Compatibility mode ENABLED.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

  > Riva waiting for Triton server to load all models...retrying in 1 second
I0622 10:05:20.405663 74 metrics.cc:228] Collecting metrics for GPU 0: Tesla T4
I0622 10:05:20.456804 74 onnxruntime.cc:1722] TRITONBACKEND_Initialize: onnxruntime
I0622 10:05:20.457637 74 onnxruntime.cc:1732] Triton TRITONBACKEND API version: 1.0
I0622 10:05:20.457652 74 onnxruntim

---
# 5.4 Riva ASR Service Request
Now that the Riva server is up and running with your models, you can send inference requests querying the server. 
To send gRPC (a remote procedure call protocol) requests, install the Riva Python API bindings on the client. They ship as a pip wheel in the Riva Quick Start directory.  For this class, the API has already been installed.  In your own environment, you could install it with:
```sh
cd $RIVA_QS && pip install riva_api-1.4.0b0-py3-none-any.whl
```


In [33]:
!cd $RIVA_QS && pip install riva_api-1.4.0b0-py3-none-any.whl


## 5.4.1 Python Client Demo
The following cell creates a Python file that queries the Riva server (using gRPC) to yield a result.

In [34]:
%%writefile $RIVA_QS/asr_client.py

import argparse
import wave
import sys
import grpc
import time
import riva_api.audio_pb2 as ra
import riva_api.riva_asr_pb2 as rasr
import riva_api.riva_asr_pb2_grpc as rasr_srv


def get_args():
    parser = argparse.ArgumentParser(description="Streaming transcription via Riva AI Services")
    parser.add_argument("--server", default="localhost:50051", type=str, help="URI to GRPC server endpoint")
    parser.add_argument("--audio-file", required=True, help="path to local file to stream")
    parser.add_argument(
        "--show-intermediate", action="store_true", help="show intermediate transcripts as they are available"
    )
    return parser.parse_args()


def listen_print_loop(responses, show_intermediate=False):
    num_chars_printed = 0
    idx = 0
    for response in responses:
        idx += 1
        if not response.results:
            continue

        result = response.results[0]
        if not result.alternatives:
            continue

        transcript = result.alternatives[0].transcript

        if show_intermediate:
            overwrite_chars = ' ' * (num_chars_printed - len(transcript))

            if not result.is_final:
                sys.stdout.write(">> " + transcript + overwrite_chars + '\r')
                sys.stdout.flush()

                num_chars_printed = len(transcript) + 3

            else:
                print("## " + transcript + overwrite_chars + "\n")
                num_chars_printed = 0
        else:
            if result.is_final:
                print(f"## {transcript.encode('utf-8')}\n")
                sys.stdout.buffer.write(transcript.encode('utf-8'))

CHUNK = 1024
args = get_args()
wf = wave.open(args.audio_file, 'rb')

channel = grpc.insecure_channel(args.server)
client = rasr_srv.RivaSpeechRecognitionStub(channel)


config = rasr.RecognitionConfig(
    encoding=ra.AudioEncoding.LINEAR_PCM,
    sample_rate_hertz=wf.getframerate(),
    language_code="en-US",
    max_alternatives=1,
    enable_automatic_punctuation=True,
)
streaming_config = rasr.StreamingRecognitionConfig(config=config, interim_results=True)

# Yield the streaming config first, then the audio in CHUNK-frame pieces
def generator(w, s):
    yield rasr.StreamingRecognizeRequest(streaming_config=s)
    d = w.readframes(CHUNK)
    while len(d) > 0:
        yield rasr.StreamingRecognizeRequest(audio_content=d)
        d = w.readframes(CHUNK)


responses = client.StreamingRecognize(generator(wf, streaming_config))
listen_print_loop(responses, show_intermediate=args.show_intermediate)

Writing /dli/task/riva/riva_quickstart/asr_client.py
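
The client streams audio by reading `CHUNK` frames at a time and sending each piece as its own request. The chunking can be tried in isolation with the standard `wave` module, without a Riva server. This sketch generates a short silent WAV in memory as test data; the 16 kHz mono 16-bit format and the helper names `iter_chunks` and `make_silence` are assumptions made for illustration.

```python
import io
import wave


def iter_chunks(wav_file, frames_per_chunk=1024):
    """Yield successive blocks of raw audio, as the client's generator does."""
    data = wav_file.readframes(frames_per_chunk)
    while data:
        yield data
        data = wav_file.readframes(frames_per_chunk)


def make_silence(seconds, rate=16000):
    """Build an in-memory mono 16-bit WAV of silence (test data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


with wave.open(make_silence(2.2), "rb") as wf:
    chunks = list(iter_chunks(wf))
    # Every chunk except possibly the last holds frames_per_chunk frames
    print(len(chunks), "chunks;", len(chunks[-1]), "bytes in the last one")
```

A smaller chunk size means more requests and finer-grained partial results; a larger one means fewer, bigger requests.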


## 5.4.2 Request Riva ASR service
Listen to a sample audio file and test it with the ASR client.

In [35]:
# change path of the file here
import IPython.display as ipd
path = WORKSPACE + '/tao/data/an4_converted/wavs/cen8-fkai-b.wav'
ipd.Audio(path)

In [36]:
# Query the Riva ASR service
!python3 $RIVA_QS/asr_client.py --audio-file $path

## b'October 1 1969. '

October 1 1969. 

---
# 5.5 Streaming ASR
Riva Quick Start includes a directory of examples. We can use the streaming client to see how the ASR transcribes words as they are spoken in a stream.  Try it!

In [37]:
!python3 $RIVA_QS/examples/riva_streaming_asr_client.py  --input-file $path

Number of clients: 1
Number of iteration: 1
Input file: /dli/task/tao/data/an4_converted/wavs/cen8-fkai-b.wav
File duration: 2.20s
1 threads done, output written to output_<thread_id>.txt


In [38]:
!cat output_0.txt

>>>Time 0.06s: i to
>>>Time 0.06s: a to
>>>Time 0.07s: october
>>>Time 0.07s:  octoberfer
>>>Time 0.07s: october fer
>>>Time 0.09s: october first
>>>Time 0.10s: october first
>>>Time 0.10s: october first night
>>>Time 0.12s: october first nightien
>>>Time 0.12s: october first ninetenh
>>>Time 0.12s: october first ninetene tet
>>>Time 0.14s: october first ninetee sext
>>>Time 0.14s: october first nineteen sixtiny
>>>Time 0.14s: october first nineteen sixtenen
>>>Time 0.16s: october first nineteen sixty nine
>>>Time 0.16s: october first nineteen sixty nine
>>>Time 0.16s: october first nineteen sixty nine
>>>Time 0.18s: october first nineteen sixty nine
Time 0.18s: Transcript 0: october first nineteen sixty nine 
>>>Time 0.18s: 
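
In the output above, lines starting with `>>>` are partial hypotheses that get revised as more audio arrives, and the line containing `Transcript 0:` is the final result. A small sketch can separate the two. The line formats are inferred from this one output file, not from a documented specification, and the function name `split_transcripts` is hypothetical.

```python
import re

# Line formats inferred from the output shown above
PARTIAL_RE = re.compile(r"^>>>Time (\d+\.\d+)s: (.*)$")
FINAL_RE = re.compile(r"^Time (\d+\.\d+)s: Transcript (\d+): (.*)$")


def split_transcripts(lines):
    """Split streaming client output into (time, text) partials and finals."""
    partials, finals = [], []
    for line in lines:
        line = line.rstrip("\n")
        match = FINAL_RE.match(line)
        if match:
            finals.append((float(match.group(1)), match.group(3).strip()))
            continue
        match = PARTIAL_RE.match(line)
        if match and match.group(2).strip():  # ignore empty partials
            partials.append((float(match.group(1)), match.group(2).strip()))
    return partials, finals
```

Usage could look like `partials, finals = split_transcripts(open("output_0.txt"))`, after which `finals[-1][1]` holds the last final transcript.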


## 5.5.1 Stop Riva Services 
We need to stop the Riva services because we will be modifying the deployed models.

In [39]:
# Run Riva Stop. 
!bash $RIVA_QS/riva_stop.sh

Shutting down docker containers...


---
<h2 style="color:green;">Congratulations!</h2>

In this notebook, you have:
- Transformed the ASR model to RMIR format
- Configured and exposed the Riva ASR service
- Requested the ASR service using a Python client API

Next, you'll connect Riva services to a web application.  Move on to the [Riva Contact Application](006_ASR_Application.ipynb).
