In [None]:
# Copyright 2021 NVIDIA Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

<img src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png" style="width: 90px; float: right;">

# Your First Steps to Design an Intelligent Assistant for Hands-free Applications 

Advances in natural language processing (NLP) have brought the performance of speech-to-text (STT) systems into the realm of human-level accuracy, while today’s deep-learning based language understanding models accurately answer naturally phrased questions about a body of text, such as a product manual in a foreign language. When coupled with accelerated computing, practical conversational systems like Intelligent Assistants (IA) now provide spoken responses to users’ requests in a short fraction of a second. This talk covers the initial steps to create an example IA for a hands-free application: Performing an upgrade to an aircraft while keeping maintenance notes. 

<img src="maint.jpg" style="float: center;">


## Problem Description

We examine the enablers for an IA based on NVIDIA’s Riva SDK, including a consideration of design trade-offs for the STT subsystem, options for Question Answering (QA), the application of Neural Machine Translation in QA, and how to provide verbal responses in high-quality synthetic speech.

This example will help you begin programming with Riva, highlighting the APIs you need to call on your path to creating your own Conversational AI system. Please note that this notebook doesn't cover the realtime aspects of creating an IA; instead, it isolates the major pieces to aid your understanding.

## Part 0: Setup
Our first step is to import the required Python libraries and connect to the Riva speech server. We import a few additional libraries later on, too. Some imports may print warnings; let's ignore those for now, as we're not going to venture into experimental territory!

In [66]:
import io
import os

import timeit
from time import time
from datetime import datetime
import pytz
import numpy as np
import IPython.display as ipd
import grpc
import requests

%matplotlib inline
import matplotlib.pyplot as plt
import soundfile as sf
import re
import librosa, librosa.display

# NLP proto
import riva_api.riva_nlp_pb2 as rnlp
import riva_api.riva_nlp_pb2_grpc as rnlp_srv

# ASR proto
import riva_api.riva_asr_pb2 as rasr
import riva_api.riva_asr_pb2_grpc as rasr_srv

# TTS proto
import riva_api.riva_tts_pb2 as rtts
import riva_api.riva_tts_pb2_grpc as rtts_srv
import riva_api.riva_audio_pb2 as ra


channel = grpc.insecure_channel('localhost:50051')

riva_asr = rasr_srv.RivaSpeechRecognitionStub(channel)
riva_nlp = rnlp_srv.RivaLanguageUnderstandingStub(channel)
riva_tts = rtts_srv.RivaSpeechSynthesisStub(channel)

## Part 1: Processing Speech-to-Text in Riva for Spoken Commands

Using STT, our Assistant needs to recognize the actual words spoken by the technician before any further processing or understanding can take place. In practice, a production Assistant would probably run in realtime mode, which means that speech audio is processed into text as it arrives. The text is then processed further by NLP components such as Named Entity Recognition, Intent Slotting, and Query Answering. A dialog manager would track the state of the discourse and issue actions as needed, like storing maintenance notes or finding the answers to questions.

But first things first. Here are example commands and queries our technician might speak:
* Commence work on case 62239.
* Maintenance note. Paint is peeling from the tail fin surface.
* How do I loosen the crimp?
* What kind of screwdriver to adjust the pushrod screw?
* Which tool unlocks the nut?
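
Before touching any audio, here is a minimal sketch of how a dialog manager might route transcripts like these to an action. It is a rule-based illustration only: the trigger phrases, the intent names, and the `route_command` helper are assumptions made for this example and are not part of Riva.

```python
import re

# Hypothetical trigger phrases; tune these for your own application.
NOTE_TRIGGER = "maintenance note"
CASE_PATTERN = re.compile(r"^commence work on case\s+(.+)$")
QUESTION_WORDS = ("how", "what", "which", "where", "when", "who", "why")


def route_command(transcript):
    """Map an ASR transcript to an (intent, payload) pair."""
    text = transcript.strip().lower()
    if text.startswith(NOTE_TRIGGER):
        # Everything after the trigger phrase is the body of the note
        return "maintenance_note", text[len(NOTE_TRIGGER):].strip(" .")
    match = CASE_PATTERN.match(text)
    if match:
        return "open_case", match.group(1).rstrip(".")
    if text.split(" ", 1)[0] in QUESTION_WORDS:
        return "question", text
    return "unknown", text


for spoken in [
    "commence work on case six two two three nine",
    "maintenance note paint is peeling from the tail fin surface",
    "which tool unlocks the nut",
]:
    print(route_command(spoken))
```

A production system would more likely use an intent and slot model than string matching, but the shape of the result (an intent plus its payload) stays the same.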

Let's load an example audio clip, listen to it, and then process it.

In [67]:
audio_path = '/data/wav/a31186/'
audio_files = os.listdir(audio_path)

# Read and play an audio file
path = audio_path + audio_files[1]
audio, sr = librosa.core.load(path, sr=None)
with io.open(path, 'rb') as fh:
    content = fh.read()
ipd.Audio(path)

The speech audio is now loaded and ready to send to Riva for transcription.

It's important to point out that a general ASR model is used for this example, excluding special jargon pertaining to aircraft maintenance. Further, the model was trained on speech audio with normal, room-level noise in the background, not an aircraft hangar or tarmac! You can improve the performance of your ASR by fine-tuning for your expected vocabulary and environment using [NVIDIA TAO](https://developer.nvidia.com/tao)!
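
The recognition request that follows needs the clip's sample rate and channel count. The notebook takes the sample rate from librosa; as an alternative, the sketch below reads both values straight from the WAV header with Python's standard `wave` module. The `wav_properties` helper and the generated `silence.wav` file are illustrative assumptions, not part of the Riva API.

```python
import os
import tempfile
import wave


def wav_properties(path):
    """Return (sample_rate_hz, channel_count, bits_per_sample) from a WAV header."""
    with wave.open(path, "rb") as wav_file:
        return (
            wav_file.getframerate(),
            wav_file.getnchannels(),
            wav_file.getsampwidth() * 8,
        )


# Self-contained demonstration: write one second of 16 kHz mono silence, then inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(demo_path, "wb") as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM
    wav_file.setframerate(16000)
    wav_file.writeframes(b"\x00\x00" * 16000)

print(wav_properties(demo_path))
```

Reading the header this way only works for uncompressed PCM WAV files, which matches the `LINEAR_PCM` encoding used in this notebook.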

In [68]:
# Set up an offline/batch recognition request
req = rasr.RecognizeRequest()
req.audio = content                                   # raw bytes
req.config.encoding = ra.AudioEncoding.LINEAR_PCM     # Only PCM is supported in this release
req.config.sample_rate_hertz = sr                     # Audio will be resampled if necessary
req.config.language_code = "en-US"                    # Ignored, will route to correct model in future release
req.config.max_alternatives = 1                       # How many top-N hypotheses to return
req.config.enable_automatic_punctuation = False       # Add punctuation when end of VAD detected
req.config.audio_channel_count = 1                    # Mono channel

response = riva_asr.Recognize(req)
asr_best_transcript = response.results[0].alternatives[0].transcript

print("ASR Transcript:\n", asr_best_transcript)

ASR Transcript:
 commence work on case six two two three nine


You can implement [some number parsing](https://stackoverflow.com/questions/493174/is-there-a-way-to-convert-number-words-to-integers)  to handle the case number as digits and pull up any necessary instructions and documentation for the technician.

In [69]:
# From a solution on Stack Overflow

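def is_number(x):
    """Return True if x parses as a float, ignoring thousands separators."""
    if isinstance(x, str):
        x = x.replace(',', '')
    try:
        float(x)
    except (TypeError, ValueError):
        return False
    return True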

def text2int (textnum, numwords={}):
    units = [
        'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight',
        'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen',
        'sixteen', 'seventeen', 'eighteen', 'nineteen',
    ]
    tens = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']
    scales = ['hundred', 'thousand', 'million', 'billion', 'trillion']
    ordinal_words = {'first':1, 'second':2, 'third':3, 'fifth':5, 'eighth':8, 'ninth':9, 'twelfth':12}
    ordinal_endings = [('ieth', 'y'), ('th', '')]

    if not numwords:
        numwords['and'] = (1, 0)
        for idx, word in enumerate(units): numwords[word] = (1, idx)
        for idx, word in enumerate(tens): numwords[word] = (1, idx * 10)
        for idx, word in enumerate(scales): numwords[word] = (10 ** (idx * 3 or 2), 0)

    textnum = textnum.replace('-', ' ')

    current = result = 0
    curstring = ''
    onnumber = False
    lastunit = False
    lastscale = False

    def is_numword(x):
        if is_number(x):
            return True
        if x in numwords:
            return True
        return False

    def from_numword(x):
        if is_number(x):
            scale = 0
            increment = int(x.replace(',', ''))
            return scale, increment
        return numwords[x]

    for word in textnum.split():
        if word in ordinal_words:
            scale, increment = (1, ordinal_words[word])
            current = current * scale + increment
            if scale > 100:
                result += current
                current = 0
            onnumber = True
            lastunit = False
            lastscale = False
        else:
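            for ending, replacement in ordinal_endings:
                if word.endswith(ending):
                    candidate = "%s%s" % (word[:-len(ending)], replacement)
                    # Only strip the ending when a number remains, so ordinary
                    # words such as "with" are not truncated
                    if candidate in numwords or is_number(candidate):
                        word = candidate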

            if (not is_numword(word)) or (word == 'and' and not lastscale):
                if onnumber:
                    # Flush the current number we are building
                    curstring += repr(result + current) + " "
                curstring += word + " "
                result = current = 0
                onnumber = False
                lastunit = False
                lastscale = False
            else:
                scale, increment = from_numword(word)
                onnumber = True

                if lastunit and (word not in scales):
                    # Assume this is part of a string of individual numbers to
                    # be flushed, such as a zipcode "one two three four five"
                    curstring += repr(result + current)
                    result = current = 0

                if scale > 1:
                    current = max(1, current)

                current = current * scale + increment
                if scale > 100:
                    result += current
                    current = 0

                lastscale = False
                lastunit = False
                if word in scales:
                    lastscale = True
                elif word in units:
                    lastunit = True

    if onnumber:
        curstring += repr(result + current)

    return curstring

In [52]:
command_digits = text2int(asr_best_transcript)
print(command_digits)

commence work on case 62239


That transcript is indeed correct, but it lacks capitalization and punctuation. While that's a minor detail at this point, it will become important for automatic maintenance notes, where human eyes will presumably read the entries.

Let's call the TextTransform API and consider the output.

In [70]:
# Use the TextTransform API to run the punctuation model
req = rnlp.TextTransformRequest()
req.model.model_name = "riva_punctuation"

# command_digits = "where should we get lunch"
# command_digits = "i was thinking about pizza but now i'm not sure"

req.text.append(command_digits)
nlp_resp = riva_nlp.TransformText(req)
command = "\n".join([f" {x}" for x in nlp_resp.text])
print("TransformText Output:")
print(command)

TransformText Output:
 Commence work on case 62239.


Nice. Now we have a more natural entry for the notes, probably the way the technician meant it, grammatically speaking.

Note that this punctuation (re)insertion step was separated for illustration, but it could have been included in the original ASR API call:

```
req.config.enable_automatic_punctuation = False       # Make this True instead
```
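
With the command normalized to digits and punctuated, pulling out the case number becomes a small pattern-matching task. The sketch below is one possible approach; the `extract_case_number` helper and the assumption that the number always follows the word "case" are illustrative choices, not something the notebook prescribes.

```python
import re


def extract_case_number(command_text):
    """Return the case number as an int, or None if no case number is present."""
    match = re.search(r"\bcase\s+(\d+)", command_text, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


# The punctuated command produced earlier in this part
print(extract_case_number(" Commence work on case 62239."))
```

The returned integer could then serve as the key for looking up instructions and documentation for the technician.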

## Part 2: Using Riva's Text-to-Speech API for Spoken Responses

It's useful for the Assistant to acknowledge verbally that a technician's command was understood, including details such as the case number that the technician said earlier. 

Let's use Riva's TTS API and pretrained "ljspeech" voice model to speak the confirmation of the case number.

In [54]:
def riva_speak(text):
    req = rtts.SynthesizeSpeechRequest()
    req.text = text
    req.language_code = "en-US"                    # currently required to be "en-US"
    req.encoding = ra.AudioEncoding.LINEAR_PCM     # Supports LINEAR_PCM, FLAC, MULAW and ALAW audio encodings
    req.sample_rate_hz = 22050                     # ignored, audio returned will be 22.05KHz
    req.voice_name = "ljspeech"                    # ignored

    resp = riva_tts.Synthesize(req)
    audio_samples = np.frombuffer(resp.audio, dtype=np.float32)
    return audio_samples

In [55]:
ipd.Audio(riva_speak("Acknowledged. " + command), rate=22050)

Other voice models besides ljspeech are expected to become available in Riva, possibly including customizable voices at some point.
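
If you want to keep a spoken response rather than only play it in the notebook, the samples can be written to a WAV file. The sketch below uses only the standard library and assumes the samples are floats in the range -1.0 to 1.0, as the float32 output of `riva_speak` is treated above; the `save_pcm16_wav` helper and the synthetic tone are illustrative stand-ins.

```python
import math
import os
import struct
import tempfile
import wave


def save_pcm16_wav(samples, path, rate=22050):
    """Write float samples in [-1.0, 1.0] to a mono 16-bit PCM WAV file."""
    # Clip first so out-of-range values cannot overflow the 16-bit range
    clipped = [max(-1.0, min(1.0, float(s))) for s in samples]
    frames = struct.pack(
        "<%dh" % len(clipped), *(int(round(s * 32767)) for s in clipped)
    )
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(rate)
        wav_file.writeframes(frames)


# A synthetic one-second 440 Hz tone stands in for the output of riva_speak()
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 22050) for n in range(22050)]
tone_path = os.path.join(tempfile.mkdtemp(), "tone.wav")
save_pcm16_wav(tone, tone_path)

with wave.open(tone_path, "rb") as wav_file:
    print(wav_file.getframerate(), wav_file.getnframes())
```

In the notebook, the same helper could be called as `save_pcm16_wav(riva_speak(text), "response.wav")`, where the file name is only an example.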

## Part 3: Keeping Maintenance Notes
Let's say you wish to use a command like "maintenance note" to have a spoken comment recorded as text in a log and then acknowledged by the Assistant.

Let's try a simple example, just to illustrate the API. First, the spoken audio: **Maintenance note. Paint is peeling from the tail fin surface.**

In [56]:
# Read another .wav file for the maintenance note command
path = audio_path + audio_files[0]
audio, sr = librosa.core.load(path, sr=None)
with io.open(path, 'rb') as fh:
    content = fh.read()
ipd.Audio(path)

In [71]:
# Set up an offline/batch recognition request
req = rasr.RecognizeRequest()
req.audio = content                                   # raw bytes
req.config.encoding = ra.AudioEncoding.LINEAR_PCM     # Only PCM is supported in this release
req.config.sample_rate_hertz = sr                     # Audio will be resampled if necessary
req.config.language_code = "en-US"                    # Ignored, will route to correct model in future release
req.config.max_alternatives = 1                       # How many top-N hypotheses to return
req.config.enable_automatic_punctuation = False       # Add punctuation when end of VAD detected
req.config.audio_channel_count = 1                    # Mono channel

response = riva_asr.Recognize(req)
asr_best_transcript = response.results[0].alternatives[0].transcript
print(asr_best_transcript)
# Drop the leading "maintenance note " trigger phrase (17 characters)
asr_best_transcript = asr_best_transcript[17:]

# Punctuation now
req = rnlp.TextTransformRequest()
req.model.model_name = "riva_punctuation"
req.text.append(asr_best_transcript)
nlp_resp = riva_nlp.TransformText(req)
note_text = "\n".join([f" {x}" for x in nlp_resp.text])
    
tz_ORD = pytz.timezone('America/Chicago') 
datetime_ORD = datetime.now(tz_ORD)
current_time = datetime_ORD.strftime("%H:%M")
response = "\u2192 Note saved at " + current_time +  ". "+ note_text
print(response)
ipd.Audio(riva_speak(response), rate=22050)

commence work on case six two two three nine
→ Note saved at 14:31.  Case six, two, two, three, nine.


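The note wasn't actually saved anywhere, but you could do that as part of a larger, formal maintenance log application. The possibilities are endless! Note that the output shown above appears to come from a run in which `content` still held the earlier case-number clip; make sure the cell that loads the maintenance-note audio runs immediately before this one.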
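
As a minimal sketch of what "saving" could look like, the code below appends each note to a JSON Lines file with a UTC timestamp. The file layout, the field names, and the `save_note` and `load_notes` helpers are assumptions chosen for illustration; a real maintenance system would have its own schema and storage.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def save_note(log_path, note_text, case_number=None):
    """Append one maintenance note to a JSON Lines log and return the stored entry."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "case": case_number,
        "note": note_text.strip(),
    }
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(entry) + "\n")
    return entry


def load_notes(log_path):
    """Read every note back from the log, oldest first."""
    with open(log_path, encoding="utf-8") as log_file:
        return [json.loads(line) for line in log_file if line.strip()]


# Hypothetical log location used only for this demonstration
log_path = os.path.join(tempfile.mkdtemp(), "maintenance_log.jsonl")
save_note(log_path, " Paint is peeling from the tail fin surface.", case_number=62239)
print(load_notes(log_path))
```

Appending one JSON object per line keeps each write independent, so a partially written entry cannot corrupt earlier notes.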

## Part 3: Question Answering using Riva

This example notebook only scratches the surface of Riva QA [(deeper here)](https://developer.nvidia.com/blog/developing-a-question-answering-application-quickly-using-riva/), in which a computer system automatically answers a question posed by a human in plain language. The answers are pulled from structured (tables or databases) or unstructured text, like this "context" paragraph about the rainforest:

<em>In 2010 the Amazon rainforest experienced another severe drought, in some ways more extreme than the 2005 drought. The affected region was approximate 1,160,000 square miles (3,000,000 km2) of rainforest, compared to 734,000 square miles (1,900,000 km2) in 2005. The 2010 drought had three epicenters where vegetation died off, whereas in 2005 the drought was focused on the southwestern part. The findings were published in the journal Science. In a typical year the Amazon absorbs 1.5 gigatons of carbon dioxide; during 2005 instead 5 gigatons were released and in 2010 8 gigatons were released.</em>

The "context" is the background text that the language model reads for question answering, while the "query" is the question we're asking. Let's try a few questions: uncomment each of the queries, one at a time, and run the cell each time. Notice the confidence score returned with each answer.

In [73]:
riva_uri = "localhost:50051"
grpc_server = riva_uri
channel = grpc.insecure_channel(grpc_server)
riva_nlp = rnlp_srv.RivaLanguageUnderstandingStub(channel)
req = rnlp.NaturalQueryRequest()

test_context = "In 2010 the Amazon rainforest experienced another severe drought, in some ways more extreme than the 2005 drought. The affected region was approximate 1,160,000 square miles (3,000,000 km2) of rainforest, compared to 734,000 square miles (1,900,000 km2) in 2005. The 2010 drought had three epicenters where vegetation died off, whereas in 2005 the drought was focused on the southwestern part. The findings were published in the journal Science. In a typical year the Amazon absorbs 1.5 gigatons of carbon dioxide; during 2005 instead 5 gigatons were released and in 2010 8 gigatons were released."

#query = "How many tons of carbon are absorbed by the Amazon in a typical year?"
#query = "When did the rainforest experience a severe drought?"
query = "What's dying off?"
req.query = query
req.context = test_context
resp = riva_nlp.NaturalQuery(req)
print(resp)

results {
  answer: "vegetation"
  score: 0.8144050240516663
}
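
The score in that response can drive a fallback, so the Assistant admits uncertainty instead of speaking a low-confidence answer. The sketch below uses a small stand-in type rather than the real response objects; the `QAResult` tuple, the `choose_answer` helper, the fallback wording, and the 0.5 threshold are all assumptions to be tuned for a real application.

```python
from collections import namedtuple

# Stand-in for one entry of resp.results; the real objects come from NaturalQuery
QAResult = namedtuple("QAResult", ["answer", "score"])

FALLBACK = "I'm not sure. Please check the manual."


def choose_answer(results, min_score=0.5):
    """Return the best answer, or a fallback when the top score is below min_score."""
    if not results:
        return FALLBACK
    best = max(results, key=lambda r: r.score)
    return best.answer if best.score >= min_score else FALLBACK


print(choose_answer([QAResult("vegetation", 0.8144050240516663)]))
print(choose_answer([QAResult("vegetation", 0.12)]))
```

Because the helper only reads `.answer` and `.score`, it should also accept the entries of `resp.results` directly, assuming they expose those two fields as the printed response suggests.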



That's fascinating and important, but let's return to our maintenance example. 

Here's our new context from a random online repair manual from the [US Army Publishing Directorate](https://armypubs.army.mil/):

<img src="repair.jpg" style="width: 300px; float: right;">
<em>Unlock the jamnut using the SCT32084 tool (supplied by the manufacturer). Adjust the pushrod adjustment screw using a 1/4-inch straight edge screwdriver (Figure 37). Turn the screw clockwise to loosen the crimp (enlarge the gaging dimension), or counterclockwise to tighten the crimp (reduce the gaging dimension). After each adjustment, securely tighten the jam nut using the SCT32084 tool (while holding the adjustment screw tight with the screwdriver). Reinstall the tool nose being sure to tighten the 8-32 socket cap screws securely.</em>

Let's ask Riva's QA API some questions against that context, using the default language model (not fine-tuned for Army repair lingo). Here are our questions, but feel free to experiment with your own:
* Which tool unlocks the nut?
* Which screws do I use for reinstallation?
* What kind of screwdriver to adjust the pushrod screw?
* How do I loosen the crimp?

In [74]:
# Set up a Riva QA query
riva_uri = "localhost:50051"
grpc_server = riva_uri
channel = grpc.insecure_channel(grpc_server)
riva_nlp = rnlp_srv.RivaLanguageUnderstandingStub(channel)
req = rnlp.NaturalQueryRequest()

# This long line is the context (repair manual excerpt) pasted from above.
test_context = "Unlock the jamnut using the SCT32084 tool (supplied by the manufacturer). Adjust the pushrod adjustment screw using a 1/4-inch straight edge screwdriver (Figure 37). Turn the screw clockwise to loosen the crimp (enlarge the gaging dimension), or counterclockwise to tighten the crimp (reduce the gaging dimension). After each adjustment, securely tighten the jam nut using the SCT32084 tool (while holding the adjustment screw tight with the screwdriver). Reinstall the tool nose being sure to tighten the 8-32 socket cap screws securely."
req.context = test_context

# Here's the battery of questions.
questions = [
    "Which tool unlocks the nut?",
    "Which screws do I use for reinstallation?",
    "What kind of screwdriver to adjust the pushrod screw?",
    "How do I loosen the crimp?",
]

# Ask each question against the same context and print the top answer
for question in questions:
    req.query = question
    resp = riva_nlp.NaturalQuery(req)
    print(req.query + "\n" + "\u2192 " + resp.results[0].answer + "\n")

Which tool unlocks the nut?
→ SCT32084

Which screws do I use for reinstallation?
→ 8-32 socket cap screws

What kind of screwdriver to adjust the pushrod screw?
→ 1/4-inch straight edge screwdriver

How do I loosen the crimp?
→ Turn the screw clockwise



Once you have your answers, you could have Riva speak the information aloud, as we showed before. But for now, we're about to experience a plot twist in our project.

## Part 4: Plot twist. ¿Hablas español?

If only it were always so easy! As it would turn out, we have a little complication. Let's say our technical documentation turns out to be in Spanish and was not from the US Army. 

Can we still make a query in English against a text in another language? Yes, and there's an entire field of research and practice for [Cross-language Information Retrieval](https://en.wikipedia.org/wiki/Cross-language_information_retrieval). CLIR techniques strive for robust answering through methods like [query expansion](https://en.wikipedia.org/wiki/Query_expansion), which adds terms to the query to minimize mismatch and improve retrieval accuracy. CLIR techniques aim for robust responses even with not-so-robust queries.
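
To make the idea of query expansion concrete, here is a toy sketch that appends synonyms to a query. It is not a CLIR implementation: the synonym table and the `expand_query` helper are hypothetical, and a real system would draw on a domain glossary or bilingual dictionary rather than a hand-written dict.

```python
# Hypothetical synonym table written by hand for this example
SYNONYMS = {
    "nut": ["jamnut", "jam nut", "counter nut"],
    "screwdriver": ["driver"],
    "loosen": ["slacken", "release"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Return the query followed by synonyms of any words it contains."""
    words = [w.strip("?.,!").lower() for w in query.split()]
    extra = []
    for word in words:
        for alternative in synonyms.get(word, []):
            if alternative not in extra:
                extra.append(alternative)
    if not extra:
        return query
    return query + " (" + ", ".join(extra) + ")"


print(expand_query("Which tool unlocks the nut?"))
```

Whether expanded queries help an extractive QA model is an open question for your own data; the technique is better established for retrieval, where extra terms increase the chance of matching the right passage.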

By contrast, in this proof of concept, we simply apply Neural Machine Translation (NMT) to translate the Spanish maintenance passage into English using [NVIDIA NeMo](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/machine_translation.html), then run the English queries against the translated text to demonstrate the processing flow. If you're a subject-matter expert (SME) in CLIR, treat this as an appetizer; NVIDIA researchers are collaborating on robust CLIR leveraging NeMo's NMT resources.

In [75]:
# Let's summon NeMo and load the Spanish-to-English NMT model.
# It will take a while to load the NMT model the first time.
import nemo
import nemo.collections.nlp as nemo_nlp

nmt_es_en_model = nemo_nlp.models.MTEncDecModel.from_pretrained(model_name='nmt_es_en_transformer12x2')

[NeMo I 2021-10-18 19:33:59 cloud:56] Found existing object /root/.cache/torch/NeMo/NeMo_1.4.0/nmt_es_en_transformer12x2/42fbff52240a2c8cb1127d2a97201f6d/nmt_es_en_transformer12x2.nemo.
[NeMo I 2021-10-18 19:33:59 cloud:62] Re-using file from: /root/.cache/torch/NeMo/NeMo_1.4.0/nmt_es_en_transformer12x2/42fbff52240a2c8cb1127d2a97201f6d/nmt_es_en_transformer12x2.nemo
[NeMo I 2021-10-18 19:33:59 common:702] Instantiating model from pre-trained checkpoint
[NeMo I 2021-10-18 19:34:10 tokenizer_utils:136] Getting YouTokenToMeTokenizer with model: /tmp/tmpx2zw0zve/tokenizer.32000.BPE.model with r2l: False.
[NeMo I 2021-10-18 19:34:10 tokenizer_utils:136] Getting YouTokenToMeTokenizer with model: /tmp/tmpx2zw0zve/tokenizer.32000.BPE.model with r2l: False.


[NeMo W 2021-10-18 19:34:10 modelPT:130] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    src_file_name: /raid/sharded_tarfiles_60_even/batches.tokens.16000._OP_1..302_CL_.tar
    tgt_file_name: /raid/sharded_tarfiles_60_even/batches.tokens.16000._OP_1..302_CL_.tar
    tokens_in_batch: 16000
    clean: true
    max_seq_length: 512
    cache_ids: false
    cache_data_per_node: false
    use_cache: false
    shuffle: true
    num_samples: -1
    drop_last: false
    pin_memory: false
    num_workers: 8
    load_from_cached_dataset: false
    reverse_lang_direction: true
    load_from_tarred_dataset: true
    metadata_path: /raid/sharded_tarfiles_60_even/metadata.json
    tar_shuffle_n: 100
    
[NeMo W 2021-10-18 19:34:10 modelPT:137] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validat

[NeMo I 2021-10-18 19:34:13 save_restore_connector:143] Model MTEncDecModel was successfully restored from /root/.cache/torch/NeMo/NeMo_1.4.0/nmt_es_en_transformer12x2/42fbff52240a2c8cb1127d2a97201f6d/nmt_es_en_transformer12x2.nemo.


Let's list the NMT models available in NeMo. At the time of writing, these include English to and from Spanish, French, German, Russian, and Mandarin Chinese. More are on the way, and with some work, you can use models from Hugging Face and other sources as well. The model name reflects its architecture: for example, "nmt_en_es_transformer12x2" denotes a transformer with 12 encoder layers and 2 decoder layers.

In [76]:
# List available NMT models
nemo_nlp.models.MTEncDecModel.list_available_models()

[PretrainedModelInfo(
 	pretrained_model_name=nmt_en_de_transformer12x2,
 	description=En->De translation model. See details here: https://ngc.nvidia.com/catalog/models/nvidia:nemo:nmt_en_de_transformer12x2,
 	location=https://api.ngc.nvidia.com/v2/models/nvidia/nemo/nmt_en_de_transformer12x2/versions/1.0.0rc1/files/nmt_en_de_transformer12x2.nemo
 ),
 PretrainedModelInfo(
 	pretrained_model_name=nmt_de_en_transformer12x2,
 	description=De->En translation model. See details here: https://ngc.nvidia.com/catalog/models/nvidia:nemo:nmt_de_en_transformer12x2,
 	location=https://api.ngc.nvidia.com/v2/models/nvidia/nemo/nmt_de_en_transformer12x2/versions/1.0.0rc1/files/nmt_de_en_transformer12x2.nemo
 ),
 PretrainedModelInfo(
 	pretrained_model_name=nmt_en_es_transformer12x2,
 	description=En->Es translation model. See details here: https://ngc.nvidia.com/catalog/models/nvidia:nemo:nmt_en_es_transformer12x2,
 	location=https://api.ngc.nvidia.com/v2/models/nvidia/nemo/nmt_en_es_transformer12x2/

Here are the instructions in Spanish, verified as correct by a native speaker.

<em>Desbloquee la contratuerca con la herramienta SCT32084 (suministrada por el fabricante). Ajuste el tornillo de ajuste de la varilla de empuje con un destornillador de borde recto de 1/4 de pulgada (Figura 37). Gire el tornillo en el sentido de las agujas del reloj para aflojar el engarzado (ampliar la dimensión de calibre) o en sentido antihorario para apretar el engarzado (reducir la dimensión de calibre). Después de cada ajuste, apriete firmemente la contratuerca con la herramienta SCT32084 (mientras mantiene apretado el tornillo de ajuste con el destornillador). Vuelva a instalar la punta de la herramienta asegurándose de apretar firmemente los tornillos de cabeza hueca 8-32.</em>


In [77]:
# Load the text passage into a variable
es_text = ['Desbloquee la contratuerca con la herramienta SCT32084 (suministrada por el fabricante). Ajuste el tornillo de ajuste de la varilla de empuje con un destornillador de borde recto de 1/4 de pulgada (Figura 37). Gire el tornillo en el sentido de las agujas del reloj para aflojar el engarzado (ampliar la dimensión de calibre) o en sentido antihorario para apretar el engarzado (reducir la dimensión de calibre). Después de cada ajuste, apriete firmemente la contratuerca con la herramienta SCT32084 (mientras mantiene apretado el tornillo de ajuste con el destornillador). Reinstale la punta de la herramienta asegurándose de apretar firmemente los tornillos de cabeza hueca 8-32.']

Let's translate the instructions into English using NeMo's standard es-en model. You'll note that it just takes one line of code at this point.

In [78]:
en_text = nmt_es_en_model.translate(es_text) # <--- The actual line of code to translate text
print(en_text)

['Unlock the counter nut with the SCT32084 tool (supplied with the manufacturer). Adjust the push rod adjustment screw with a 1 / 4-inch straight-edge screwdriver (Figure 37). Turn the screw clockwise to loosen the crimping (widen the gauge dimension) or counterclockwise to tighten the crimping (reduce the gauge dimension). After each adjustment, tighten the counter nut firmly with the SCT32084 tool (while holding the adjustment screw with the screwdriver). Re-install the tip of the tool making sure to tighten the 8-32 hollow head screws firmly.']
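
The call above passes the whole passage as a single list element. For longer documents, one option is to split the text into sentences first and pass them as a list, which is the input type `translate` is given above. The sketch below shows only the splitting step; the `split_sentences` helper is an assumption, its regular expression is deliberately naive, and whether sentence-level input improves quality for this model is untested here.

```python
import re


def split_sentences(passage):
    """Split a passage on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", passage.strip())
    return [part for part in parts if part]


sample = (
    "Desbloquee la contratuerca con la herramienta SCT32084 "
    "(suministrada por el fabricante). "
    "Gire el tornillo en el sentido de las agujas del reloj."
)
sentences = split_sentences(sample)
print(len(sentences))
for sentence in sentences:
    print(sentence)
```

In the notebook, the result could be used as `nmt_es_en_model.translate(split_sentences(es_text[0]))`, with the translated sentences joined back together afterward. Abbreviations and decimal numbers followed by a space would need a more careful splitter.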


Now, let's use Riva's QA to query against the freshly translated context in English (the instructions). As a reminder, here's what we're going to ask:
* Which tool unlocks the nut?
* Which screws do I use for reinstallation?
* What kind of screwdriver to adjust the pushrod screw?
* How do I loosen the crimp?

In [79]:
riva_uri = "localhost:50051"
grpc_server = riva_uri
channel = grpc.insecure_channel(grpc_server)
riva_nlp = rnlp_srv.RivaLanguageUnderstandingStub(channel)
req = rnlp.NaturalQueryRequest()

test_context = " ".join(en_text)  # translate() returns a list; join it into one plain string
req.context = test_context

questions = [
    "Which tool unlocks the nut?",
    "Which screws do I use for reinstallation?",
    "What kind of screwdriver to adjust the pushrod screw?",
    "How do I loosen the crimp?",
]

# Ask each question against the translated context and print the top answer
for question in questions:
    req.query = question
    resp = riva_nlp.NaturalQuery(req)
    print(req.query + "\n" + "\u2192 " + resp.results[0].answer + "\n")

Which tool unlocks the nut?
→ SCT32084

Which screws do I use for reinstallation?
→ 8-32

What kind of screwdriver to adjust the pushrod screw?
→ 1 / 4-inch straight-edge screwdriver

How do I loosen the crimp?
→ Turn the screw clockwise



Now compare the answers obtained from the En (English) and Es (Español, Spanish) contexts. Here's what we had before with the original English context:

    Which tool unlocks the nut?
    → SCT32084

    Which screws do I use for reinstallation?
    → 8-32 socket cap screws

    What kind of screwdriver to adjust the pushrod screw?
    → 1/4-inch straight edge screwdriver

    How do I loosen the crimp?
    → Turn the screw clockwise

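The answers from the translated context are correct and closely match the results from the original English context. The only differences are a shorter screw answer ("8-32" rather than "8-32 socket cap screws") and minor spacing and hyphenation changes in the screwdriver answer. ¡Muy excelente!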
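
To put a number on how closely two answers agree, a token-overlap F1 score in the style commonly used for extractive QA evaluation is a reasonable starting point. The sketch below is a simplified version: the `tokens` and `token_f1` helpers and the normalization rules (lowercasing, removing spaces around "/", treating hyphens as spaces) are assumptions chosen to suit the answers in this notebook.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase, remove spaces around '/', treat hyphens as spaces, then split."""
    text = re.sub(r"\s*/\s*", "/", text.lower())
    return text.replace("-", " ").split()


def token_f1(candidate, reference):
    """Harmonic mean of token precision and recall between two answers."""
    cand, ref = tokens(candidate), tokens(reference)
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# (answer from the translated context, answer from the original English context)
pairs = [
    ("SCT32084", "SCT32084"),
    ("8-32", "8-32 socket cap screws"),
    ("1 / 4-inch straight-edge screwdriver", "1/4-inch straight edge screwdriver"),
    ("Turn the screw clockwise", "Turn the screw clockwise"),
]
for translated, original in pairs:
    print(round(token_f1(translated, original), 2), translated)
```

A score of 1.0 means the two answers contain the same tokens after normalization; the shortened screw answer scores lower because it omits three of the reference tokens.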

## Closing Remarks

This notebook showed you some of the basic elements to experiment with as you work to create your Conversational AI system. We isolated each element for experimentation, as a stepping stone on your way to integrating Riva into your realtime Intelligent Assistant.

## Next steps
- Streaming using Riva chatbot tools: https://docs.nvidia.com/deeplearning/riva/user-guide/docs/samples/weather.html
- Dialog manager to improve responses: https://docs.nvidia.com/deeplearning/riva/user-guide/docs/samples/dialogflow.html
- Use proven CLIR techniques, such as query expansion, to make foreign-language queries more robust to wording mismatches.
- Fine-tune your own speech or NLP model using NeMo https://github.com/NVIDIA/NeMo or the Transfer Learning Toolkit https://developer.nvidia.com/transfer-learning-toolkit
