<a href="https://www.nvidia.com/dli"> <img src="images/DLI_Header.png" alt="Header" style="width: 400px;"/> </a>

# 11.0 Deploying Riva Services within a Kubernetes Cluster and Further Riva API Examples 
## (part of Lab 3)

In this notebook, you'll deploy NVIDIA Riva within Kubernetes, and try some API queries for text-to-speech (TTS) and natural language processing (NLP).

**[11.1 Deploy NVIDIA Riva](#11.1-Deploy-NVIDIA-Riva)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[11.1.1 Exercise: Configure Helm Values and Deploy](#11.1.1-Exercise:-Configure-Helm-Values-and-Deploy)<br>
**[11.2 Riva Services](#11.2-Riva-Services)<br>**
**[11.3 Riva TTS Example](#11.3-Riva-TTS-Example)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[11.3.1 Exercise: Pod IP with Port 50051](#11.3.1-Exercise:-Pod-IP-with-Port-50051)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[11.3.2 Exercise: LoadBalancer IP with Port 50051](#11.3.2-Exercise:-LoadBalancer-IP-with-Port-50051)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[11.3.3 Exercise: Localhost with Mapped Port](#11.3.3-Exercise:-Localhost-with-Mapped-Port)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[11.3.4 Upgrade the Service with Helm](#11.3.4-Upgrade-the-Service-with-Helm)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[11.3.4.1 Exercise: Upgrade the Service Type to NodePort](#11.3.4.1-Exercise:-Upgrade-the-Service-Type-to-NodePort)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[11.3.4.2 Verify the Upgrade](#11.3.4.2-Verify-the-Upgrade)<br>
**[11.4 Riva NLP Examples](#11.4-Riva-NLP-Examples)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[11.4.1 `AnalyzeIntent` API](#11.4.1-AnalyzeIntent-API)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[11.4.2 `TextTransform` API](#11.4.2-TextTransform-API)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[11.4.3 Shutdown](#11.4.3-Shutdown)<br>

In the previous parts of the class, you have deployed Riva using very basic shell commands. 
You have also deployed a basic CUDA application to a Kubernetes cluster.
Now it is time to put it all together and deploy Riva into production!

### Notebook Dependencies
1. The steps in this notebook assume that you are starting with a GPU-enabled K8s cluster with feature discovery.  Let's ensure that by deleting and restarting the cluster to bring it to a known state. 
2. As with earlier NVIDIA Riva deployments, you need NGC API credentials.  In this case, you'll also need your email address.

In [None]:
# Delete and restart K8s
!minikube delete
!minikube start --driver=none
# Install the GPU device plugin with Helm
!helm repo add nvdp https://nvidia.github.io/k8s-device-plugin \
    && helm repo update
!helm install \
    --generate-name nvdp/nvidia-device-plugin
# Install GPU feature discovery with Helm
!helm repo add nvgfd https://nvidia.github.io/gpu-feature-discovery \
    && helm repo update
!helm install \
    --version=0.4.1 \
    --generate-name nvgfd/gpu-feature-discovery

In [None]:
# Fill in your personal API key and email address (valid in the scope of this notebook)
NGC_API_KEY = "YOUR_NGC_API_KEY"
NGC_EMAIL = "YOUR_EMAIL"

---
# 11.1 Deploy NVIDIA Riva

To deploy NVIDIA Riva on Kubernetes, start by fetching `riva-api` with Helm, and examining the assets downloaded.

In [None]:
# Fetch riva-api with Helm
# You may already have fetched this earlier in the course through Riva
!helm fetch https://helm.ngc.nvidia.com/nvidia/riva/charts/riva-api-1.4.0-beta.tgz \
    --username='$oauthtoken' --password=$NGC_API_KEY --untar

In [None]:
!ls -l riva-api

The configuration file, `values.yaml`, contains a number of settings for the service, including image details, credentials, and service type.  Under `ngcModelConfigs:`, it also lists the ASR, NLP, and TTS models that will be downloaded and optimized upon initialization.

In [None]:
!cat riva-api/values.yaml

The Helm Chart starts two containers:
* `riva-model-init` - Responsible for fetching all of the model assets configured in `values.yaml` and their optimization for the target platform (appropriate TensorRT optimization will be executed).  After initialization is complete, this container will self-terminate.
* `riva-speech-api` - Hosts Riva services after initialization is complete. 

Before proceeding, we'll need to make some edits to set the configurations in `values.yaml` to match our environment and limit the models deployed.  If we deploy all the possible models, we may run out of memory!

In [None]:
# Here is where Riva models are located in our class environment
!ls -al /dli/task/riva

## 11.1.1 Exercise: Configure Helm Values and Deploy
Modify the YAML file for our environment and deploy `riva-api` with Helm.  For our environment, the host path location for Riva `models`, `rmir`, and `artifacts` is `/dli/task/riva`.  We also need to comment out all of the listed models to avoid unnecessary downloads, since the models we need are already in the `/dli/task/riva` directory.

Exercise:
* Open the [values.yaml](riva-api/values.yaml) config file
* Comment out all uncommented models under `ngcModelConfigs:`
* Modify the `modelDeployVolume.hostPath.path` to reflect our environment
* Modify `artifactDeployVolume.hostPath.path` to reflect our environment
* Save the file
* Check your work against the [solution](solutions/ex11.1.1.yaml) before moving on
* Deploy it!

In [None]:
# TODO modify values.yaml so that this cell verifies changes are correct
# Check your work - your file should have the same uncommented models (none!) and folder paths as the solution!
print("YOUR SETTINGS\n=============")
!cat riva-api/values.yaml | grep -v "^\s*[#;]" | sed -n '/ngcModelConfigs:/,/modelDeployVolume:/p' | sed ';$d'
!cat riva-api/values.yaml | grep -v "^\s*[#;]" | grep -A 20 modelDeployVolume: | grep 'DeployVolume\|path'
print("\nSOLUTION SETTINGS\n=================")
!cat solutions/ex11.1.1.yaml | grep -v "^\s*[#;]" | sed -n '/ngcModelConfigs:/,/modelDeployVolume:/p' | sed ';$d'
!cat solutions/ex11.1.1.yaml | grep -v "^\s*[#;]" | grep -A 20 modelDeployVolume: | grep 'DeployVolume\|path'
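
The shell pipelines in this check can be hard to read at a glance. As a rough Python equivalent, the sketch below drops comment lines and then keeps the lines between two keys. The helper names `active_lines` and `section` and the sample YAML text are illustrative assumptions, not part of the course material or the real `values.yaml`.

```python
import re

# Matches lines whose first non-blank character is '#' or ';',
# the same lines that grep -v "^\s*[#;]" removes.
COMMENT_LINE = re.compile(r"^\s*[#;]")


def active_lines(yaml_text):
    """Return the lines of yaml_text that are not comment lines."""
    return [ln for ln in yaml_text.splitlines() if not COMMENT_LINE.match(ln)]


def section(lines, start_key, end_key):
    """Return lines from the start_key line up to, but excluding, the end_key line."""
    selected, inside = [], False
    for ln in lines:
        if not inside and start_key in ln:
            inside = True
        elif inside and end_key in ln:
            break
        if inside:
            selected.append(ln)
    return selected


# Hypothetical sample, not the real values.yaml
sample = """\
ngcModelConfigs:
  exampleGroup:
    models:
    #- example/commented_out_model:1.0
modelDeployVolume:
  hostPath:
    path: /dli/task/riva
"""

lines = active_lines(sample)
print(section(lines, "ngcModelConfigs:", "modelDeployVolume:"))
print([ln.strip() for ln in lines if "path:" in ln])
```

If the first list contains any model entries, at least one model is still uncommented.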

In [None]:
%env model_key_string=tlt_encode

# Pass the NGC credentials and the base64-encoded model key to the chart
!helm install riva-api \
    --generate-name \
    --set ngcCredentials.password=`echo -n $NGC_API_KEY | base64 -w0` \
    --set ngcCredentials.email=$NGC_EMAIL \
    --set modelRepoGenerator.modelDeployKey=`echo -n $model_key_string | base64 -w0`
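
The `--set` flags in this cell pass base64-encoded values. The `-n` option to `echo` matters because a trailing newline would otherwise become part of the encoded string. A small Python sketch of the same encoding follows; the helper name `encode_for_helm` is an assumption made for illustration.

```python
import base64


def encode_for_helm(value):
    """Base64-encode a string with no trailing newline, like `echo -n value | base64 -w0`."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


key = "tlt_encode"
encoded = encode_for_helm(key)
with_newline = encode_for_helm(key + "\n")  # what plain `echo` (without -n) would feed to base64

print(encoded)
print(encoded != with_newline)                     # the newline changes the encoded value
print(base64.b64decode(encoded).decode() == key)   # decoding recovers the original key
```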

---
# 11.2 Riva Services

In [None]:
!kubectl describe pods riva-api

At first, the models are downloading (this is reflected in the status), so we have to wait. Wait a minute and look at the status again.

In [None]:
!kubectl describe pods riva-api | grep -A 2 'Containers:\|State:'

In [None]:
!kubectl describe pods riva-api 

We need to wait until the status of the `riva-model-init` container changes from "Waiting" to "Running". You can keep executing the previous command to check as many times as needed.  Once `riva-model-init` is "Running", we should be able to view the Docker container logs. We need the name of the pod to view the logs, which we'll grab with a Linux `grep` command.

In [None]:
%%bash
# Grab the name
RIVA_API_LONGNAME=$(kubectl describe pods riva-api | grep "Name:         riva-api-" | awk '{print $2}')
echo "The pod name is $RIVA_API_LONGNAME"
# Check the logs
kubectl logs $RIVA_API_LONGNAME --container=riva-model-init

The logs should say that the models are already deployed and optimized and that the initialization has finished.  For example, they should consist of lines like:

```
    Directory rmir_text_classification_v1.0.0-b.1 already exists, skipping. Use '--force' option to override.
    Directory rmir_named_entity_recognition_v1.0.0-b.1 already exists, skipping. Use '--force' option to override.
    Directory rmir_riva_tts_ljspeech_v1.0.0-b.1 already exists, skipping. Use '--force' option to override.
    /opt/riva
    2021-05-17 17:25:37,336 [INFO] Writing Riva model repository to ...
    2021-05-17 17:25:37,336 [INFO] The riva model repo target directory is /data/models
    2021-05-17 17:25:39,892 [WARNING] /data/models/riva_tokenizer already exists, skipping deployment.
    2021-05-17 17:25:39,892 [WARNING] /data/models/riva-trt-riva_intent_weather-nn-bert-base-uncased already exists, skipping deployment. 
```
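
A quick way to confirm that nothing is being downloaded or rebuilt is to collect the "already exists, skipping" lines. The sketch below assumes the log text has already been captured into a string (for example from `kubectl logs`). The function name `skipped_assets` is hypothetical, and the sample lines are taken from the excerpt above.

```python
import re

# Captures the asset name or path that precedes "already exists, skipping"
SKIP_PATTERN = re.compile(r"(\S+) already exists, skipping")


def skipped_assets(log_text):
    """Return the assets that the init container reported as already present."""
    return [match.group(1) for match in SKIP_PATTERN.finditer(log_text)]


sample_log = """\
Directory rmir_text_classification_v1.0.0-b.1 already exists, skipping. Use '--force' option to override.
2021-05-17 17:25:37,336 [INFO] The riva model repo target directory is /data/models
2021-05-17 17:25:39,892 [WARNING] /data/models/riva_tokenizer already exists, skipping deployment.
"""

for asset in skipped_assets(sample_log):
    print(asset)
```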
    
Troubleshooting note:<br>
If there is a mistake in the path configuration, then the initialization container will attempt to download all of the assets. The models take approximately 6 GB of space and their target-specific optimization is a non-trivial task.  Therefore, this step can take 45+ minutes. If the logs say that Riva is downloading models, you can uninstall this Helm deployment by executing `!helm uninstall <release-name>` (the release name was generated at install time and is shown by `!helm list`), correct the [values.yaml](riva-api/values.yaml) file, and try deploying again. 

When Riva model initialization is complete, Riva services will initialize. This can also take a while as all models need to be loaded to memory and verified, and there are quite a few models!

In [None]:
!ls -l riva/models
!du -sh riva/models

Check to see if the service container, `riva-speech-api` is running yet.<br>
Once it is, take a look at the logs for the container.  The logs should list all the models loaded and confirm that "Riva Conversational AI Server listening on 0.0.0.0:50051" in the last line.

In [None]:
# Repeat execution of this cell until riva-speech-api is "Running" and "Ready"
!kubectl describe pods riva-api | grep -A 2 'Containers:\|State:'
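
Instead of reading the `describe` output by eye, container states can also be read from `kubectl get pods -o json`. The following is a sketch under stated assumptions: the JSON has been captured into a string, the helper name `container_states` is made up for this example, and the sample is a trimmed, hypothetical pod list that only mimics the fields used (`metadata.name`, `status.initContainerStatuses`, `status.containerStatuses`). Both status lists are checked because the sketch does not assume which one holds `riva-model-init`.

```python
import json


def container_states(pods_json, pod_prefix="riva-api"):
    """Map 'pod/container' to (state, ready) for pods whose name starts with pod_prefix."""
    result = {}
    for pod in json.loads(pods_json).get("items", []):
        name = pod["metadata"]["name"]
        if not name.startswith(pod_prefix):
            continue
        status = pod.get("status", {})
        statuses = status.get("initContainerStatuses", []) + status.get("containerStatuses", [])
        for cs in statuses:
            # "state" holds a single key: waiting, running, or terminated
            state = next(iter(cs.get("state", {})), "unknown")
            result[f"{name}/{cs['name']}"] = (state, cs.get("ready", False))
    return result


# Hypothetical, trimmed pod list for illustration only
sample = json.dumps({
    "items": [{
        "metadata": {"name": "riva-api-example-pod"},
        "status": {
            "initContainerStatuses": [
                {"name": "riva-model-init", "ready": True,
                 "state": {"terminated": {"reason": "Completed"}}}
            ],
            "containerStatuses": [
                {"name": "riva-speech-api", "ready": True, "state": {"running": {}}}
            ],
        },
    }]
})

for key, value in container_states(sample).items():
    print(key, value)
```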

In [None]:
%%bash
# Grab the name
RIVA_API_LONGNAME=$(kubectl describe pods riva-api | grep "Name:         riva-api-" | awk '{print $2}')
echo "The pod name is $RIVA_API_LONGNAME"
# Check the logs
kubectl logs $RIVA_API_LONGNAME --container=riva-speech-api

---
# 11.3 Riva TTS Example

If you have observed "Riva Conversational AI Server listening on 0.0.0.0:50051" in the logs, we are ready to run an application. We will query the API with a TTS example. <br>
First, import the dependencies:

In [None]:
import io
import librosa
from time import time
import numpy as np
import IPython.display as ipd
import grpc
import requests

# NLP proto
import riva_api.riva_nlp_pb2 as rnlp
import riva_api.riva_nlp_pb2_grpc as rnlp_srv

# ASR proto
import riva_api.riva_asr_pb2 as rasr
import riva_api.riva_asr_pb2_grpc as rasr_srv

# TTS proto
import riva_api.riva_tts_pb2 as rtts
import riva_api.riva_tts_pb2_grpc as rtts_srv
import riva_api.riva_audio_pb2 as ra 

Configure the connection to our server. As you might recall, the service is listening on port 50051. Let's try configuring `localhost:50051`, then call the service and play back the audio it returns.

In [None]:
channel = grpc.insecure_channel('localhost:50051')

Next, we'll create a little function that takes a channel, submits a line of text to the TTS service in a `SynthesizeSpeechRequest`, and returns the audio samples.  Then play the audio!

In [None]:
def test_tts(input_channel, input_text):
    riva_tts = rtts_srv.RivaSpeechSynthesisStub(input_channel)
    
    req = rtts.SynthesizeSpeechRequest()
    req.text = input_text
    req.language_code = "en-US"                    # currently required to be "en-US"
    req.encoding = ra.AudioEncoding.LINEAR_PCM     # Supports LINEAR_PCM, FLAC, MULAW and ALAW audio encodings
    req.sample_rate_hz = 22050                     # ignored, audio returned will be 22.05KHz
    req.voice_name = "ljspeech"                    # ignored

    resp = riva_tts.Synthesize(req)
    audio_samples = np.frombuffer(resp.audio, dtype=np.float32)
    return audio_samples

In [None]:
ipd.Audio(test_tts(channel, "Is it recognize speech or wreck a nice beach?"), rate=22050)

Well, that didn't work... Why?

## 11.3.1 Exercise: Pod IP with Port 50051

When running Riva from within Kubernetes, our "localhost" IP (127.0.0.1) is not connected to the Riva services.  There are a few different pathways we could use to send our request.  The first is to select the Riva API pod IP address and send our requests there. The IP is listed in the pod description. 
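
Before looking up the pod IP, a plain TCP check can confirm whether anything is accepting connections on a given address, independent of gRPC. The sketch below uses only the Python standard library. `port_is_open` is a hypothetical helper, and a temporary local listener stands in for a reachable service.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Demonstration with a throwaway listener on a free local port
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

open_while_listening = port_is_open("127.0.0.1", demo_port)
listener.close()
open_after_close = port_is_open("127.0.0.1", demo_port)

print(open_while_listening, open_after_close)
```

The same helper could be pointed at `localhost` and port 50051, and later at the pod IP once it is known.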

In [None]:
!kubectl get pod -o wide 

In [None]:
!kubectl get service --all-namespaces

Replace the `POD_IP` with the actual IP value in the next cell and try it this way.

In [None]:
#TODO replace the POD_IP
channel = grpc.insecure_channel('POD_IP:50051')

ipd.Audio(test_tts(channel, "Is it recognize speech or wreck a nice beach?"), rate=22050)

Did that work?  There is another way as well.  

## 11.3.2 Exercise: LoadBalancer IP with Port 50051

In [None]:
!kubectl get services

Alternatively, we could use the load balancer IP that is set up with a 50051 port mapping for requests.  

Replace the `LOADBALANCER_IP` with the actual value in the next cell and try it this way.

In [None]:
#TODO replace the LOADBALANCER_IP
channel = grpc.insecure_channel('LOADBALANCER_IP:50051')

ipd.Audio(test_tts(channel, "Is it recognize speech or wreck a nice beach?"), rate=22050)

## 11.3.3 Exercise: Localhost with Mapped Port
Connect to the external-facing port mapped from the load balancer to localhost. In this case, the port is assigned randomly, so let's check what it is by looking at the port mapped to 50051 in the services list:
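
The `PORT(S)` column of `kubectl get services` lists each mapping in the form `port:nodePort/protocol`, separated by commas. A small parsing sketch follows. The function name and the sample string are illustrative assumptions, and the node ports in it are made up.

```python
def parse_port_mappings(ports_column):
    """Parse a PORT(S) value such as '50051:31234/TCP' into {service_port: node_port}."""
    mappings = {}
    for entry in ports_column.split(","):
        ports, _, _protocol = entry.strip().partition("/")
        service_port, _, node_port = ports.partition(":")
        # Entries without a node port (for example '80/TCP') map to None
        mappings[int(service_port)] = int(node_port) if node_port else None
    return mappings


sample = "8000:30123/TCP,50051:31234/TCP"   # hypothetical values
mapped_port = parse_port_mappings(sample)[50051]
print(f"localhost:{mapped_port}")
```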

In [None]:
!kubectl get services

Replace the `MAPPED_PORT` with the actual value in the next cell and try it this way.

In [None]:
#TODO replace the MAPPED_PORT
channel = grpc.insecure_channel('localhost:MAPPED_PORT')

ipd.Audio(test_tts(channel, "Is it recognize speech or wreck a nice beach?"), rate=22050)

## 11.3.4 Upgrade the Service with Helm
Load balancing is used to distribute tasks over a set of compute resources.  Since we have just one GPU and one pod in our example, we do not need the load balancer.  We can turn it off by changing the service type in the `values.yaml` file and executing the upgrade command. Here's what we have now:

In [None]:
!cat ./riva-api/values.yaml | grep -A 3 service:

The [`helm upgrade` command](https://helm.sh/docs/helm/helm_upgrade/) has the form:

```
helm upgrade [RELEASE] [CHART] [flags]
```

   * CHART is the location of the chart, here the `riva-api` directory that contains `Chart.yaml`
   * RELEASE is the specific name of the deployed `riva-api` release
   
RELEASE also appears in the service names, so we can grab it from there. 
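
Helm can also report release names directly. The sketch below assumes the output of `helm list -o json` has been captured as a string. `find_releases` is a hypothetical helper, and the sample entries (names and versions) are invented for illustration.

```python
import json


def find_releases(helm_list_json, chart_prefix="riva-api"):
    """Return names of releases whose chart name starts with chart_prefix."""
    return [
        release["name"]
        for release in json.loads(helm_list_json)
        if release.get("chart", "").startswith(chart_prefix)
    ]


# Hypothetical output shape; real entries carry more fields
sample = json.dumps([
    {"name": "riva-api-1234567890", "namespace": "default",
     "chart": "riva-api-1.4.0-beta", "status": "deployed"},
    {"name": "nvidia-device-plugin-1111111111", "namespace": "default",
     "chart": "nvidia-device-plugin-0.0.0", "status": "deployed"},
])

print(find_releases(sample))
```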

In [None]:
%%bash
# Show the RELEASE value
RELEASE=$(kubectl get svc -A | grep "riva-api"| awk '{print $2}')
echo $RELEASE

### 11.3.4.1 Exercise: Upgrade the Service Type to NodePort
Modify the YAML file to change the service type from `LoadBalancer` to `NodePort` and upgrade the release with Helm.

Exercise:
* Open the [values.yaml](riva-api/values.yaml) config file
* Modify the "service.type" to "NodePort"
* Save the file
* Check your work against the [solution](solutions/ex11.3.4.1.yaml) before moving on
* Upgrade the service!

In [None]:
# TODO modify values.yaml so that this cell verifies changes are correct
# Check your work - your file should have the same values as the solution!
print("YOUR SETTING")
!cat ./riva-api/values.yaml | grep -A 3 service:
print("\nSOLUTION SETTING")
!cat solutions/ex11.3.4.1.yaml | grep -A 3 service:

In [None]:
%%bash
RELEASE=$(kubectl get svc -A | grep "riva-api"| awk '{print $2}')
helm upgrade $RELEASE riva-api

### 11.3.4.2 Verify the Upgrade
Since we have configured port 32222 as our NodePort, we should see the change now in our live service list:

In [None]:
!kubectl get services

As a consequence, we have a known IP:PORT value to reliably expose the Riva server.

In [None]:
channel = grpc.insecure_channel('localhost:32222')

ipd.Audio(test_tts(channel, "Is it recognize speech or wreck a nice beach?"), rate=22050)

What did the code actually do? It sent the sentence to the TTS service, which synthesized speech from the text and returned audio samples for the notebook to play.
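
To keep the synthesized audio, the returned samples could be written out as a WAV file. The sketch below uses only the standard library and rests on assumptions: the raw bytes are mono float32 samples in native byte order with values in [-1.0, 1.0] (matching how `test_tts` interprets `resp.audio`), and the helper name `float32_to_wav_bytes` is made up for this example. A few hand-written samples stand in for a real response.

```python
import array
import io
import sys
import wave


def float32_to_wav_bytes(raw_audio, sample_rate_hz=22050):
    """Convert raw float32 mono samples (assumed in [-1.0, 1.0]) to 16-bit PCM WAV bytes."""
    samples = array.array("f")
    samples.frombytes(raw_audio)
    # Clip to [-1, 1] and scale to the signed 16-bit range
    pcm16 = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm16.byteswap()  # WAV sample data is little-endian
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(pcm16.tobytes())
    return buffer.getvalue()


# Stand-in for resp.audio: four float32 samples
fake_audio = array.array("f", [0.0, 0.5, -0.5, 1.0]).tobytes()
wav_bytes = float32_to_wav_bytes(fake_audio)
print(len(wav_bytes))
```

The resulting bytes could then be written to disk with a regular binary file write.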

---
# 11.4 Riva NLP Examples

In the TTS example, we used a `SynthesizeSpeechRequest` to synthesize speech.  We can make requests to other APIs in the same way, and we'll try a couple of the NLP examples.  You can find more in the [Riva Speech Skills documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/notebooks/Riva_speech_API_demo.html#).

## 11.4.1 `AnalyzeIntent` API
The `AnalyzeIntent` API can be used to query an "intent slot" classifier. If we don't have a specific domain, this API can be leveraged with an additional text classification model to classify the domain of the input query before routing the text to the appropriate intent slot model.

We'll keep things simple and use an example where the domain is known. This example skips execution of the domain classifier
and proceeds directly to the intent slot model for the requested domain.

In [None]:
channel = grpc.insecure_channel('localhost:32222')
riva_nlp = rnlp_srv.RivaLanguageUnderstandingStub(channel)
req = rnlp.AnalyzeIntentRequest()
req.query = "How is the humidity in San Francisco?"
req.options.domain = "weather"  # The <domain_name> is appended to "riva_intent_" to look for a
                                # model "riva_intent_<domain_name>". In this example, the model
                                # "riva_intent_weather" must be preloaded in the Riva server. To deploy
                                # a custom Joint Intent and Slot model, use the `--domain_name` parameter
                                # in ServiceMaker's `riva-build intent_slot` command.

resp = riva_nlp.AnalyzeIntent(req)
print(resp)

In [None]:
# Some weather Intent queries
queries = [
    "Is it currently cloudy in Tokyo?",
    "What is the annual rainfall in Pune?",
    "What is the humidity going to be tomorrow?"
]
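for q in queries:
    req = rnlp.AnalyzeIntentRequest()
    req.query = q                       # no domain is set for these requests
    start = time()
    resp = riva_nlp.AnalyzeIntent(req)
    elapsed_ms = (time() - start) * 1000   # round-trip time of the request

    print(f"[{resp.intent.class_name}]\t{req.query}\t({elapsed_ms:.0f} ms)")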

## 11.4.2 `TextTransform` API
We can use this API to run the punctuation and capitalization model as follows:

In [None]:
# Use the TextTransform API to run the punctuation and capitalization model
channel = grpc.insecure_channel('localhost:32222')
riva_nlp = rnlp_srv.RivaLanguageUnderstandingStub(channel)

req = rnlp.TextTransformRequest()
req.model.model_name = "riva_punctuation"
req.text.append("add punctuation to this sentence")
req.text.append("do you have any red nvidia shirts")
req.text.append("i need one cpu four gpus and lots of memory "
                "for my new computer it's going to be very cool")

nlp_resp = riva_nlp.TransformText(req)
print("TransformText Output:")
print("\n".join([f" {x}" for x in nlp_resp.text]))

## 11.4.3 Shutdown
Clean up your environment by shutting down Riva and K8s.

In [None]:
# Shut down K8s
!minikube delete
!docker kill $(docker ps -q)
# Check for clean environment - this should be empty
!docker ps

---
<h2 style="color:green;">Congratulations!</h2>

In this notebook, you have:
- Deployed Riva on K8s
- Queried the TTS API, `SynthesizeSpeechRequest`
- Learned how to access the Riva server from various IP:Port combinations
- Queried the `AnalyzeIntent` and `TextTransform` NLP APIs

Now that you've finished the hands-on portion of the course, you can work on the assessments to test your understanding and obtain a certificate!  Move on to the assessment questions in the course dashboard or the [coding assessment notebook](assessment.ipynb).
