<a href="https://www.nvidia.com/dli"> <img src="images/DLI_Header.png" alt="Header" style="width: 400px;"/> </a>

# 6.0 The Riva Contact Application
## (part of Lab 1)

In this notebook, you'll launch the Riva contact app with default ASR and NER (named entity recognition) services that you can try out for yourself!

<img src="images/asr/contact-app-default.png">

**[6.1 Riva Contact](#6.1-Riva-Contact)<br>**
**[6.2 Start the Riva ASR/NLP Services](#6.2-Start-the-Riva-ASR/NLP-Services)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[6.2.1 Exercise: Configure Riva for Streaming ASR and NER](#6.2.1-Exercise:-Configure-Riva-for-Streaming-ASR-and-NER)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[6.2.2 Start Riva Services](#6.2.2-Start-Riva-Services)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[6.2.3 Riva Available Services Check](#6.2.3-Riva-Available-Services-Check)<br>
**[6.3 Run the Application Service](#6.3-Run-the-Application-Service)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[6.3.1 Start the Contact Web Server](#6.3.1-Start-the-Contact-Web-Server)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[6.3.2 Direct NER](#6.3.2-Direct-NER)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[6.3.3 Stop Riva Services](#6.3.3-Stop-Riva-Services)<br>

### Notebook Dependencies
To run this app on your system, you will need:
1. **Microphone**<br>
For best ASR results, a headset is recommended.  
1. **Chrome browser**<br>
To use the app over HTTP in our class setup, you will need to override the browser's block on camera and microphone access.  Instructions are included later in the notebook.
1. **NGC Credentials**<br>Be sure you have added your NGC credential as described in the [NGC Setup notebook](003_Intro_NGC_Setup.ipynb)

In [11]:
# Check running docker containers. This should be empty.
!docker ps

CONTAINER ID   IMAGE                                               COMMAND                  CREATED          STATUS          PORTS                                                                                                 NAMES
cf9703a1ed3a   nvcr.io/nvidia/riva/riva-speech:1.4.0-beta-server   "/opt/riva/nvidia_en…"   10 minutes ago   Up 10 minutes   0.0.0.0:50051->50051/tcp, 0.0.0.0:49164->8000/tcp, 0.0.0.0:49163->8001/tcp, 0.0.0.0:49162->8002/tcp   riva-speech


In [12]:
# If not empty,
# Clear Docker containers to start fresh...
!docker kill $(docker ps -q)
# Check for clean environment - this should be empty
!docker ps

cf9703a1ed3a
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
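
Note that `docker kill $(docker ps -q)` prints a usage error when no containers are running, because the inner command expands to nothing. The following is a minimal sketch of a guarded cleanup in Python. The helper names `running_container_ids` and `kill_running_containers` are made up for illustration, and the sketch assumes the `docker` CLI is on the path (it returns an empty list otherwise).

```python
import shutil
import subprocess


def running_container_ids() -> list:
    """Return IDs of running containers, or [] if the docker CLI is unavailable."""
    if shutil.which("docker") is None:
        return []
    result = subprocess.run(
        ["docker", "ps", "-q"], capture_output=True, text=True, check=False
    )
    # `docker ps -q` prints one container ID per line
    return result.stdout.split()


def kill_running_containers() -> list:
    """Kill every running container; do nothing when none are running."""
    ids = running_container_ids()
    if ids:
        subprocess.run(["docker", "kill", *ids], check=False)
    return ids


# Only list the containers here; call kill_running_containers() to clean up.
print(running_container_ids())
```

Calling `kill_running_containers()` and then `running_container_ids()` should leave you with an empty list, matching the clean `docker ps` output above.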


---
# 6.1 Riva Contact

Riva provides sample applications showing several use cases for virtual assistants and call centers. More about those samples can be found in the [Riva samples documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/). 


Riva Contact, from the [Riva Contact Center Video Conference sample](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/samples/callcenter.html), is a web-based demonstration of a contact center application powered by conversational AI technologies.
It is a peer-to-peer video conferencing app that uses streaming ASR and NLP.
The application is built on a lightweight Node.js server and backed by the robust NVIDIA Riva AI Services.

In the background, each user’s web client sends a separate audio stream to the Riva Contact server. The server makes a streaming gRPC call to hosted Riva AI Services, which return an ongoing stream of ASR transcripts. This stream of transcripts is handed back to the speaker’s web client, with in-progress results that may change as the user speaks.

When ASR results are marked as "final" (typically during short pauses in speech), the server hands the resulting transcript over to the NLP service for named entity recognition (NER). If Riva is configured to use a general-domain NER model, the service will recognize entities like the name of a person, location, or organization. 

Once the NER results are complete, the application server returns the final transcript and its NER annotation back to the web client. The web client then exchanges transcripts with the other user for an ongoing, annotated transcript of the conversation.  

<img src="images/asr/riva-contact-architecture.png" width=1000>
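
The routing described above (in-progress transcripts go straight back to the speaker, final transcripts go through NER first) can be sketched in a few lines of Python. This is only an illustration of the control flow, not the app's actual Node.js code: `AsrResult`, `ClientView`, `route_results`, and `toy_ner` are hypothetical names, and `toy_ner` is a hard-coded stand-in for the Riva NER service.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Tuple

Entity = Tuple[str, str]  # (entity text, label); illustrative shape only


@dataclass
class AsrResult:
    """Hypothetical stand-in for one streaming ASR response."""
    transcript: str
    is_final: bool


@dataclass
class ClientView:
    """What the speaker's web client would display."""
    interim: str = ""
    finalized: List[Tuple[str, List[Entity]]] = field(default_factory=list)


def route_results(
    results: Iterable[AsrResult], ner: Callable[[str], List[Entity]]
) -> ClientView:
    view = ClientView()
    for result in results:
        if result.is_final:
            # Final transcripts are annotated by NER before being shown
            view.finalized.append((result.transcript, ner(result.transcript)))
            view.interim = ""
        else:
            # In-progress results simply overwrite the previous guess
            view.interim = result.transcript
    return view


def toy_ner(text: str) -> List[Entity]:
    """Stand-in for the NER service: tags a tiny hard-coded vocabulary."""
    known = {"NVIDIA": "ORG", "Paris": "LOC"}
    words = text.replace(".", "").split()
    return [(word, known[word]) for word in words if word in known]


stream = [
    AsrResult("I work", False),
    AsrResult("I work at NVIDIA", False),
    AsrResult("I work at NVIDIA.", True),
    AsrResult("See you in", False),
]
view = route_results(stream, toy_ner)
print(view.finalized)  # [('I work at NVIDIA.', [('NVIDIA', 'ORG')])]
print(view.interim)    # See you in
```

Keeping the interim text separate from the finalized list mirrors the behavior described above: interim text may still change, while finalized entries are stable and carry their annotations.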

Riva Contact is a Node.js application, intended to run in a Linux environment. It requires Riva Speech Services to be running with two primary models:
- Streaming ASR
- Named Entity Recognition (NER)

You can use the default ASR and NER models available in `riva/models`.  Alternatively, you can deploy your own trained custom models for a specific domain - something you will try for yourself in a later notebook!

---
# 6.2 Start the Riva ASR/NLP Services
To deploy Riva AI Services, we can use the Riva Quick Start scripts, as we did in the earlier [ASR deployment notebook](005_ASR_Riva_Deployment.ipynb), to set up a local workspace and deploy the Riva services using Docker.

In [13]:
# Set the workspace path to "/path/to/your/workspace"
WORKSPACE = "/dli/task"

# Set the location of the models directory
RIVA_MODEL_LOC = WORKSPACE + "/riva"

# Set the Riva Quick Start directory
RIVA_QS = WORKSPACE + "/riva/riva_quickstart"

The Riva Quick Start configuration offers a list of available ASR, NLP, and TTS models that can be downloaded from NGC and optimized for the target hardware.
This process can take several minutes.  To save time, _the default Riva models have already been downloaded and optimized for the platform in `riva/models`_.
If you wish to download any of these on your own system, you can uncomment the desired models in the Riva `config.sh` file and download them with the [`riva_init.sh` command](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/custom-model-deployment.html?highlight=riva_init%20sh).

In [14]:
# Check available default riva models 
!ls $RIVA_MODEL_LOC/models

denoiser
quartznet-asr-trt-ensemble-vad-streaming
quartznet-asr-trt-ensemble-vad-streaming-ctc-decoder-cpu-streaming
quartznet-asr-trt-ensemble-vad-streaming-feature-extractor-streaming
quartznet-asr-trt-ensemble-vad-streaming-offline
quartznet-asr-trt-ensemble-vad-streaming-offline-ctc-decoder-cpu-streaming-offline
quartznet-asr-trt-ensemble-vad-streaming-offline-feature-extractor-streaming-offline
quartznet-asr-trt-ensemble-vad-streaming-offline-voice-activity-detector-ctc-streaming-offline
quartznet-asr-trt-ensemble-vad-streaming-voice-activity-detector-ctc-streaming
riva-trt-quartznet
riva-trt-riva_intent_weather-nn-bert-base-uncased
riva-trt-riva_ner-nn-bert-base-uncased
riva-trt-riva_punctuation-nn-bert-base-uncased
riva-trt-riva_qa-nn-bert-base-uncased
riva-trt-riva_text_classification_domain-nn-bert-base-uncased
riva-trt-tacotron2_encoder
riva-trt-waveglow
riva_detokenize
riva_intent_weather
riva_label_tokens_weather
riva_ner
riva_ner_label_tokens
riva_punctuation
riva_punctuat

Since the models are already downloaded and optimized, the slow download-and-optimize work is already done, and we can move on to configuring and starting the server.

## 6.2.1 Exercise: Configure Riva for Streaming ASR and NER
To expose the Riva ASR and NLP services, modify [config.sh](riva/riva_quickstart/config.sh) to:
- Enable the ASR and NLP Riva services
- Provide the encryption key (should already be correct)
- Set `riva_model_loc` to the correct model repository path

Check your work against the [solution](solutions/ex6.2.1.sh) before moving on to the next section.  You can verify it with `diff` in the next cell. You should get no "difference" (an empty output) if your config file matches the solution.  

Note that it would generally be necessary to also uncomment the models needed from NGC for streaming ASR and NER in the configuration file.  In this lab, it is not necessary because they've already been downloaded.
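
If the `diff` output is hard to read, a Python equivalent can list only the changed lines. The sketch below uses the standard `difflib` module on two in-memory strings. The sample contents are placeholders rather than the real `config.sh` or solution: `service_enabled_asr` is an assumed variable name, and the paths are deliberately fake.

```python
import difflib


def config_diff(current: str, solution: str) -> list:
    """Return unified-diff lines between two config texts ([] when identical)."""
    return list(
        difflib.unified_diff(
            current.splitlines(),
            solution.splitlines(),
            fromfile="config.sh",
            tofile="ex6.2.1.sh",
            lineterm="",
            n=0,  # no context lines, only the changes
        )
    )


# Placeholder contents for illustration only
current_text = 'service_enabled_asr=true\nriva_model_loc="/old/path"\n'
solution_text = 'service_enabled_asr=true\nriva_model_loc="/new/path"\n'

for line in config_diff(current_text, solution_text):
    print(line)
```

To compare the real files, read them with `open(...).read()` and pass the two strings to `config_diff`. An empty list means the files match.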

In [15]:
# TODO modify config.sh so that this cell verifies changes are correct
# There should be no output if the files match
!diff $RIVA_QS/config.sh solutions/ex6.2.1.sh

## 6.2.2 Start Riva Services

Now we are ready to start the Riva server with ASR and NLP services. First, `riva_init.sh` initializes the environment; then `riva_start.sh` starts the server.

In [16]:
# Ensure you have permission to execute these scripts.
!cd $RIVA_QS && chmod +x *.sh

In [17]:
# Initialize Riva
!cd $RIVA_QS && bash riva_init.sh config.sh

Logging into NGC docker registry if necessary...
Pulling required docker images if necessary...
Note: This may take some time, depending on the speed of your Internet connection.
> Pulling Riva Speech Server images.
  > Image nvcr.io/nvidia/riva/riva-speech:1.4.0-beta-server exists. Skipping.
  > Image nvcr.io/nvidia/riva/riva-speech-client:1.4.0-beta exists. Skipping.
  > Image nvcr.io/nvidia/riva/riva-speech:1.4.0-beta-servicemaker exists. Skipping.

Downloading models (RMIRs) from NGC...
Note: this may take some time, depending on the speed of your Internet connection.
To skip this process and use existing RMIRs set the location and corresponding flag in config.sh.

=== Riva Speech Skills ===

NVIDIA Release devel (build 22382700)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE

In [18]:
# Start the Riva server
!cd $RIVA_QS && bash riva_start.sh config.sh

Starting Riva Speech Services. This may take several minutes depending on the number of models deployed.
Waiting for Riva server to load all models...retrying in 10 seconds
Waiting for Riva server to load all models...retrying in 10 seconds
Waiting for Riva server to load all models...retrying in 10 seconds
Waiting for Riva server to load all models...retrying in 10 seconds
Riva server is ready...


The Riva ASR and NLP services are running once you see "Riva server is ready...", which takes about 40 seconds.
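
From your own code, a rough way to wait for the server is to poll its port until it accepts connections. The sketch below does this with the standard `socket` module. The helper `wait_for_port` is a made-up name, and an open port only shows that something is listening, not that every model has finished loading. The demo uses a throwaway local listener so that it runs anywhere.

```python
import socket
import time


def wait_for_port(
    host: str, port: int, timeout_s: float = 60.0, interval_s: float = 1.0
) -> bool:
    """Poll until host:port accepts a TCP connection or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


# Demo against a temporary local listener on an OS-assigned port
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]
print(wait_for_port("127.0.0.1", demo_port, timeout_s=2.0))
listener.close()
```

In this lab, the `docker ps` output earlier shows the Riva container publishing port 50051, so `wait_for_port("localhost", 50051)` would be the matching call.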

## 6.2.3 Riva Available Services Check

To check the exposed Riva services, execute the `docker logs riva-speech` command. 

In [19]:
!docker logs riva-speech


=== Riva Speech Skills ===

NVIDIA Release 21.07 (build 25292380)

Copyright (c) 2018-2021, NVIDIA CORPORATION.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.
NVIDIA modifications are covered by the license terms that apply to the underlying
project or file.

NOTE: Legacy NVIDIA Driver detected.  Compatibility mode ENABLED.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for the inference server.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 ...

  > Riva waiting for Triton server to load all models...retrying in 1 second
I0622 10:24:33.262959 74 metrics.cc:228] Collecting metrics for GPU 0: Tesla T4
I0622 10:24:33.275503 74 onnxruntime.cc:1722] TRITONBACKEND_Initialize: onnxruntime
I0622 10:24:33.275537 74 onnxruntime.cc:1732] Triton TRITONBACKEND API version: 1.0
I0622 10:24:33.275542 74 onnxruntim
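
The Triton lines in this log follow a glog-style layout: a severity letter (`I`, `W`, `E`, or `F`), the month and day, a timestamp, an ID, and the source file and line, followed by the message. When scanning a long log for warnings or errors, a small parser can help. This is a sketch based only on the lines shown above; the field names are chosen for illustration.

```python
import re
from typing import Optional

# Pattern for lines such as:
# I0622 10:24:33.262959 74 metrics.cc:228] Collecting metrics for GPU 0: Tesla T4
GLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month_day>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+(?P<task_id>\d+) "
    r"(?P<source>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)


def parse_log_line(text: str) -> Optional[dict]:
    """Return the parsed fields, or None for lines in any other format."""
    match = GLOG_LINE.match(text)
    return match.groupdict() if match else None


sample = (
    "I0622 10:24:33.262959 74 metrics.cc:228] "
    "Collecting metrics for GPU 0: Tesla T4"
)
record = parse_log_line(sample)
print(record["severity"], record["source"], record["message"])
```

Filtering on `record["severity"] in ("W", "E", "F")` is one way to surface problems without reading the full log.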

---
# 6.3 Run the Application Service

In the app, we use Riva for two purposes:
1. To get a streaming transcript of the conversation using ASR
1. To tag key phrases (named entities) in that transcript with NER. 

The flow is:
- Extract an audio stream from the client and pass that audio to the Node.js server (step 1).
- The Node.js server calls Riva services, using gRPC, to get transcripts (step 2) and named entities (step 4).
- The Node.js server sends the results back to the client (steps 3 and 5).
- The client can then render the transcripts in the browser and pass the transcripts over the peer-to-peer connection so that both users can see the whole conversation.

Riva Contact uses environment variables to manage its configuration parameters. These are kept in a configuration file, [env.txt](contact-app/env.txt). In your own project, you may wish to change some of these configurations.  For example, the NER entity list would change if you were to deploy your own custom domain-specific NER model.
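
To inspect such a configuration from Python, a small parser is enough. The sketch below assumes a simple `KEY=VALUE` layout with `#` comments, which may not match `env.txt` exactly. The keys `EXAMPLE_PORT` and `EXAMPLE_NER_ENTITIES` are hypothetical and are not taken from the real file.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Hypothetical contents for illustration only
sample_env = """
# web server settings
EXAMPLE_PORT=8009
EXAMPLE_NER_ENTITIES="per,loc,org"
"""

config = parse_env(sample_env)
print(config["EXAMPLE_PORT"])
print(config["EXAMPLE_NER_ENTITIES"].split(","))
```

All values come back as strings, so numeric settings such as a port need an explicit `int(...)` conversion.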

## 6.3.1 Start the Contact Web Server
To start the web service, open a JupyterLab terminal.  You can do this by first opening the JupyterLab Launcher (small '+' sign at the top of the file browser) and clicking the "Terminal" icon.  Next, enter the following in the terminal to start the app server:  

```sh
cd /dli/task/contact-app
npm install
npm run start
```

In [20]:
# Run this cell, then click the link to open a terminal
# Enter the commands provided above in the terminal window to start the web server
from IPython.display import HTML
HTML('<a href="" data-commandlinker-command="terminal:create-new">Open Terminal</a>')

In the terminal window, you should see that the server has started running on port 8009:

<img src="images/asr/webserver_running.png">

After you have started the server, execute the following cell to create a link to open the app! 

In [21]:
%%js
const href = window.location.hostname + '/app/';
let a = document.createElement('a');
let link = document.createTextNode('Open Riva Contact!');
a.appendChild(link);
a.href = "http://" + href;
a.style.color = "navy"
a.target = "_blank"
element.append(a);

<IPython.core.display.Javascript object>

When you open the app, you may see a "Lost connection to server" alert.  Just click "OK".  Other than that, your initial view should look like the following:

<img src="images/asr/riva_contact_start.png">

<h3> ***WARNING Browser Restrictions WARNING***</h3>

To use the web app, access to your microphone is required (camera is optional).<br>
Several browsers allow camera and microphone access only for applications served from a secure origin (HTTPS or a local address such as localhost).
For your own development purposes, you can set up a self-signed certificate or proxy.  

Some browsers provide a way to treat specific URLs as secure:
- Chrome browser: <br>
Configure the "treat insecure origin as secure" flag by adding the application URL on the following page: <br>
   ***chrome://flags/#unsafely-treat-insecure-origin-as-secure***<br>
(Copy and paste this "chrome://" link to a tab on your browser to open the page) <br>
The flag appears at the top of the page with a text box.  Add your own course URL to the box.<br>
More discussion can be found in [this blog](https://medium.com/@Carmichaelize/enabling-the-microphone-camera-in-chrome-for-local-unsecure-origins-9c90c3149339). Here is an example (your URL will be different):

<img src="images/asr/chrome_override_example.png">

- Safari browser: <br>
Enable in the menu: Develop > WebRTC > Allow Media Capture on Insecure Sites

This will require you to restart the browser. No worries, nothing will be lost.

Go to the Application URL. Once the page is loaded, you're welcome to start the Riva transcription.
In the box titled "Riva transcription," hit the "Start" button, then start speaking. You'll see in-progress transcripts in the text field at the bottom. As those transcripts are finalized, they'll appear, with NER annotations, in the transcription box.

## 6.3.2 Direct NER 

It is also possible to call the NER service directly without speaking. Type into the text field at the bottom and hit the "Submit" button. This will directly call the NER capability exposed by Riva with the submitted text (without calling the ASR service).

## 6.3.3 Stop Riva Services

In [None]:
# Shut down Riva 
!bash $RIVA_QS/riva_stop.sh
# Shut down web app
!pkill -9 node

---
<h2 style="color:green;">Congratulations!</h2>

In this notebook, you have:
- Started ASR and NER default Riva services
- Launched the Riva Contact app 
- Demonstrated live streaming ASR with NER for yourself!

This concludes the ASR portion of the course.  Next, you'll start a deeper dive into NLP services, beginning with the [NER fine-tuning notebook](007_NLP_Finetune_NER.ipynb).
