# gQuant NeMo Chatbot Example

In the previous example notebooks, we used gQuant to perform ETL, backtesting, machine learning, and hyperparameter tuning tasks. As we have seen, gQuant can easily be extended to accelerate other types of workflows, as long as the computation can be organized as a graph.

[NeMo](https://github.com/NVIDIA/NeMo) is a toolkit for creating Conversational AI applications. Most importantly, NeMo abstracts a neural network into neural modules with input and output ports, similar to gQuant nodes, so the neural network computation is organized as a computation graph made of neural modules. Because gQuant nodes and neural modules are so similar, a neural module can easily be wrapped into a gQuant node. NeMo is integrated into gQuant so that neural modules can be wired together, trained, and run for inference visually with the help of the gQuant UI.

In this tutorial, we will show how to use gQuant to train an RNN model for chatbot applications. 

## NeMo gQuant integration

Converting a NeMo neural module to a gQuant node is easy. gQuant provides a base NeMo node that node classes for neural modules can inherit from. The child class just needs to pass in the Neural Module class in the constructor. Here is one example that converts the `EncoderRNN` Neural Module to a gQuant node.

```python
from gquant.plugin_nodes.nemo_util.nemoBaseNode import NeMoBase
import nemo
import nemo.backends.pytorch.tutorials

class EncoderRNNNode(NeMoBase):
    def init(self):
        super().init(nemo.backends.pytorch.tutorials.chatbot.modules.EncoderRNN)
```

`NeMoBase` inspects the `EncoderRNN` constructor signature and converts it into a JSON schema for the node configuration. It automatically infers the type of each parameter. If the automatic inference gets a type wrong, it can be fixed by modifying the `self.fix_type` dictionary in the constructor. The keys of the dictionary are the parameter names and the values are the type strings.
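To make the idea concrete, here is a minimal sketch of signature-based schema inference. It is not the `NeMoBase` implementation: `ToyEncoder`, `JSON_TYPES`, and `signature_to_schema` are hypothetical names used only for illustration, and the override argument merely mimics the role of `self.fix_type`.

```python
import inspect


class ToyEncoder:
    """Hypothetical stand-in for a neural module constructor."""

    def __init__(self, voc_size: int, hidden_size=512, dropout=0, n_layers=2):
        self.voc_size = voc_size


# Map Python types to JSON schema type strings.
JSON_TYPES = {bool: "boolean", int: "integer", float: "number", str: "string"}


def signature_to_schema(cls, fix_type=None):
    """Build a JSON-schema-like dict from the constructor signature of cls."""
    fix_type = fix_type or {}
    properties, required = {}, []
    for name, param in inspect.signature(cls.__init__).parameters.items():
        if name == "self":
            continue
        if name in fix_type:
            # A manual override wins over automatic inference.
            type_str = fix_type[name]
        elif param.annotation is not inspect.Parameter.empty:
            type_str = JSON_TYPES.get(param.annotation, "string")
        elif param.default is not inspect.Parameter.empty and param.default is not None:
            type_str = JSON_TYPES.get(type(param.default), "string")
        else:
            # Nothing to infer from: fall back to a string.
            type_str = "string"
        properties[name] = {"type": type_str}
        if param.default is inspect.Parameter.empty:
            required.append(name)
        else:
            properties[name]["default"] = param.default
    return {"type": "object", "properties": properties, "required": required}


# The default `dropout=0` is inferred as an integer; the override corrects it.
schema = signature_to_schema(ToyEncoder, fix_type={"dropout": "number"})
print(schema["properties"]["dropout"])
```

In this sketch the default value `0` misleads the inference into choosing `integer` for `dropout`, which is the kind of error a type override is meant to correct.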

In the `util` directory, we provide a script `auto_gen.py` that automatically converts the NeMo Neural Modules into gQuant nodes. Currently, it converts 90% of the Neural Modules. The converted Python files are exported to the `modules` directory. By design, gQuant can load any externally defined nodes once their module files are listed in the `gquantrc` file. Here is an example `gquantrc` file that loads the converted NeMo neural module files:

```ini
nemo.asr= %(MODULEPATH)s/asr.py
nemo.common= %(MODULEPATH)s/common.py
nemo.cv= %(MODULEPATH)s/cv.py
nemo.nlp= %(MODULEPATH)s/nlp.py
nemo.simple_gan= %(MODULEPATH)s/simple_gan.py
nemo.tts= %(MODULEPATH)s/tts.py
nemo.tutorials= %(MODULEPATH)s/tutorials.py
```
Note that gQuant automatically interprets the `.` in the module names to form hierarchical menus.
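As a rough illustration of how dotted names can become a menu hierarchy, the sketch below nests the module names from the example above into a dictionary tree. `build_menu` is a hypothetical helper, not part of gQuant, and the actual menu construction in the UI may differ.

```python
def build_menu(module_names):
    """Nest dotted module names into a dictionary tree, one level per name part."""
    menu = {}
    for name in module_names:
        level = menu
        for part in name.split("."):
            # Reuse the sub-menu if it already exists, otherwise create it.
            level = level.setdefault(part, {})
    return menu


names = ["nemo.asr", "nemo.common", "nemo.cv", "nemo.nlp",
         "nemo.simple_gan", "nemo.tts", "nemo.tutorials"]
menu = build_menu(names)
print(sorted(menu["nemo"]))
```

All seven entries end up under a single top-level `nemo` menu, each as its own sub-menu.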


## Data Preparation

Let's first prepare the chatbot dataset that will be used to train an RNN model.

In [1]:
import gzip
import os
import shutil
data_file = "movie_data.txt"
# Decompress the data file shipped with NeMo if it has not been extracted yet.
if not os.path.isfile(data_file):
    with gzip.open("../../NeMo/tests/data/movie_lines.txt.gz", 'rb') as f_in:
        with open(data_file, 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)

This assumes you are running the example in the Docker container, where the `movie_lines.txt.gz` file is located at `../../NeMo/tests/data/`. If that is not the case, the data file can be fetched with the following commands.
```bash
wget https://github.com/NVIDIA/NeMo/raw/v0.11.1/tests/data/movie_lines.txt.gz
gzip -d movie_lines.txt.gz
mv movie_lines.txt movie_data.txt
```

Let's load the necessary NeMo and gQuant modules and set up the Neural Module Factory environment.

In [2]:
# start the neural module factory
import nemo
nemo.core.NeuralModuleFactory()
import json
from gquant.dataframe_flow import TaskGraph

Later we will use Ray Tune for hyperparameter tuning. Let's set up the Ray environment so that we can use all the GPUs in the host node for the hyperparameter search.

In [3]:
import ray
ray.init(dashboard_host='0.0.0.0')

2020-10-09 18:19:07,391	INFO resource_spec.py:231 -- Starting Ray with 52.64 GiB memory available for workers and up to 26.34 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-10-09 18:19:07,958	INFO services.py:1193 -- View the Ray dashboard at 172.17.0.2:8265


{'node_ip_address': '172.17.0.2',
 'raylet_ip_address': '172.17.0.2',
 'redis_address': '172.17.0.2:6379',
 'object_store_address': '/tmp/ray/session_2020-10-09_18-19-07_390022_8297/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2020-10-09_18-19-07_390022_8297/sockets/raylet',
 'webui_url': '172.17.0.2:8265',
 'session_dir': '/tmp/ray/session_2020-10-09_18-19-07_390022_8297'}

## Chatbot Model

The chatbot model is taken from the NeMo tutorial example. It consists of a data layer, an RNN encoder, an RNN decoder used with the cross-entropy loss, and a greedy RNN decoder for text inference. Load the TaskGraph into gQuant:

In [4]:
taskGraph=TaskGraph.load_taskgraph('../taskgraphs/nemo_examples/chatbot_example.gq.yaml')
taskGraph.draw()

GQuantWidget(sub=HBox(), value=[OrderedDict([('id', 'data'), ('type', 'DialogDataLayerNode'), ('conf', {'batch…

You may notice that each trainable neural module has an `in_nm` input port that takes a neural module as input. This input port is mainly used for sharing weights between neural modules. The gQuant Neural Module wrapper supports all 3 weight sharing mechanisms described in the [NeMo documentation](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/v0.11.0/tutorials/weightsharing.html): Module Reuse, Weight Copying, and Weight Tying. If you click on a trainable neural module node above, you will see that the weight sharing method can be selected from a dropdown list. In the above example, the encoder and decoder modules for the evaluation dataset reuse the encoder and decoder modules used for training. The greedy decoder shares its weights with the decoder by tying the weights.
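The difference between the three mechanisms can be sketched in plain Python. This is a conceptual illustration only: `ToyModule` is a hypothetical stand-in whose weights live in a dictionary, and it does not use the NeMo API.

```python
import copy


class ToyModule:
    """Hypothetical stand-in for a trainable module; weights live in a dict."""

    def __init__(self, weights):
        self.weights = weights

    def train_step(self):
        # Pretend one optimizer step nudges every weight in place.
        for key in self.weights:
            self.weights[key] += 1.0


train_decoder = ToyModule({"w": 0.0})

# 1. Module reuse: the very same instance appears in two places of the graph.
eval_decoder = train_decoder

# 2. Weight copying: a new module starts from a snapshot of the current values.
copied_decoder = ToyModule(copy.deepcopy(train_decoder.weights))

# 3. Weight tying: a different module object that points at the same storage.
greedy_decoder = ToyModule(train_decoder.weights)

train_decoder.train_step()

print(eval_decoder is train_decoder)   # reuse: one object
print(copied_decoder.weights["w"])     # copy: keeps the old snapshot
print(greedy_decoder.weights["w"])     # tying: follows the training update
```

After the training step, the reused and tied modules see the updated weight, while the copied module keeps the value it was initialized with.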

All the neural modules have an `out_nm` port that outputs the neural module instance itself for other nodes to consume. For example, it can be used for weight sharing, or to extract other important information from the neural module. Later, in the inference stage, we use the data layer neural module from this port to extract the text dictionary information.

There are two special nodes in the above graph: `NeMo Train Node` and `NeMo Infer Node`. As the names indicate, they are used for training and inference on a NeMo module graph. Both nodes can connect to any number of `NmTensors`, which are the outputs of the Neural Modules. In the `NeMo Train Node`, the user can specify the `NmTensor` for the training loss and a list of `NmTensors` for logging. Users can select the optimization method from a list of pre-defined optimization methods, and the `WarmUp Policy` from a list of pre-defined policies. Other settings, such as checkpoints and training epochs, can be configured in a similar way. The output port of the `NeMo Train Node` is the directory for the checkpoint files. In the above example, it is used by the `NeMo Infer Node` to load the checkpoint files and run inference.

Click on the `run` button to see the training and inference of this chatbot model in action. During the run, the log console shows informative messages from the Ray Tune library. Click on the 'list' button to see the log, or go to "View -> Show Log Console". The model is trained for 3 epochs by default. If better accuracy is needed, increase the number of epochs in the `NeMo Train Node` and run it again.

The chatbot graph is a little complicated. We can wrap the whole computation graph into a `Context Composite Node`, as explained in the `08_gquant_machine_learning` notebook. The number of layers for the encoder and decoder and the drop-out rate are exposed as context parameters of this context composite node. We will tune these two parameters later.

Load this simplified graph:

In [5]:
taskGraph=TaskGraph.load_taskgraph('../taskgraphs/nemo_examples/chatbot_simplified.gq.yaml')
taskGraph.draw()

GQuantWidget(sub=HBox(), value=[OrderedDict([('id', 'rnn_train'), ('type', 'ContextCompositeNode'), ('conf', {…

The context composite node outputs the evaluation results from the inference node, together with the evaluation data layer neural module that is needed to extract the text dictionary. It produces the same result as the previous, more complicated graph. The graph can also be run programmatically. Let's verify the result with the following command:

In [6]:
result = taskGraph.run()

[NeMo I 2020-10-09 17:50:57 data:132] Start preparing training data ...
[NeMo I 2020-10-09 17:50:57 data:102] Reading lines...
[NeMo I 2020-10-09 17:51:09 data:134] Read 150000 sentence pairs
[NeMo I 2020-10-09 17:51:09 data:136] Trimmed to 43840 sentence pairs
[NeMo I 2020-10-09 17:51:09 data:137] Counting words...
[NeMo I 2020-10-09 17:51:09 data:141] Bad message (TypeError('not all arguments converted during string formatting')): {'name': 'nemo_logger', 'msg': 'Counted words:', 'args': (14455,), 'levelname': 'INFO', 'levelno': 20, 'pathname': '/home/quant/NeMo/nemo/backends/pytorch/tutorials/chatbot/data.py', 'filename': 'data.py', 'module': 'data', 'exc_info': None, 'exc_text': None, 'stack_info': None, 'lineno': 141, 'funcName': 'loadPrepareData', 'created': 1602265869.9934738, 'msecs': 993.4737682342529, 'relativeCreated': 267813.71688842773, 'thread': 140232394950464, 'threadName': 'MainThread', 'processName': 'MainProcess', 'process': 7025}
[NeMo I 2020-10-09 17:51:09 data:58] 

[NeMo W 2020-10-09 17:51:24 callbacks:415] No checkpoints will be saved because step_freq and epoch_freq are both -1.


[NeMo I 2020-10-09 17:51:24 callbacks:534] Found 3 modules with weights:
[NeMo I 2020-10-09 17:51:24 callbacks:536] GreedyLuongAttnDecoderRNN
[NeMo I 2020-10-09 17:51:24 callbacks:536] LuongAttnDecoderRNN
[NeMo I 2020-10-09 17:51:24 callbacks:536] EncoderRNN
[NeMo I 2020-10-09 17:51:24 callbacks:537] Total model parameters: 30876086
[NeMo I 2020-10-09 17:51:24 callbacks:473] Found checkpoint folder nemo_log. Will attempt to restore checkpoints from it.


[NeMo W 2020-10-09 17:51:24 callbacks:499] For module EncoderRNN, no file matches  in nemo_log
[NeMo W 2020-10-09 17:51:24 callbacks:501] Checkpoint folder nemo_log was present but nothing was restored. Continuing training from random initialization.


[NeMo I 2020-10-09 17:51:24 callbacks:232] loss: 8.730251
[NeMo I 2020-10-09 17:51:24 deprecated_callbacks:316] Doing Evaluation ..............................




eval: loss 8.307969
[NeMo I 2020-10-09 17:51:26 deprecated_callbacks:321] Evaluation time: 2.3719964027404785 seconds
[NeMo I 2020-10-09 17:51:32 callbacks:232] loss: 4.24844
[NeMo I 2020-10-09 17:51:32 deprecated_callbacks:316] Doing Evaluation ..............................




eval: loss 4.2280946
[NeMo I 2020-10-09 17:51:34 deprecated_callbacks:321] Evaluation time: 2.2655580043792725 seconds
[NeMo I 2020-10-09 17:51:39 callbacks:232] loss: 3.9356227
[NeMo I 2020-10-09 17:51:39 deprecated_callbacks:316] Doing Evaluation ..............................




eval: loss 3.783767
[NeMo I 2020-10-09 17:51:42 deprecated_callbacks:321] Evaluation time: 2.654122829437256 seconds
[NeMo I 2020-10-09 17:51:47 callbacks:232] loss: 3.8796775
[NeMo I 2020-10-09 17:51:47 deprecated_callbacks:316] Doing Evaluation ..............................




eval: loss 3.5885432
[NeMo I 2020-10-09 17:51:49 deprecated_callbacks:321] Evaluation time: 2.314290761947632 seconds
[NeMo I 2020-10-09 17:51:56 callbacks:232] loss: 3.4285002
[NeMo I 2020-10-09 17:51:56 deprecated_callbacks:316] Doing Evaluation ..............................




eval: loss 3.3833182
[NeMo I 2020-10-09 17:51:58 deprecated_callbacks:321] Evaluation time: 2.6722679138183594 seconds
[NeMo I 2020-10-09 17:52:04 callbacks:232] loss: 3.3505182
[NeMo I 2020-10-09 17:52:04 deprecated_callbacks:316] Doing Evaluation ..............................




eval: loss 3.2234194
[NeMo I 2020-10-09 17:52:07 deprecated_callbacks:321] Evaluation time: 2.358950138092041 seconds
[NeMo I 2020-10-09 17:52:12 callbacks:232] loss: 3.018219
[NeMo I 2020-10-09 17:52:12 deprecated_callbacks:316] Doing Evaluation ..............................




eval: loss 3.0797098
[NeMo I 2020-10-09 17:52:14 deprecated_callbacks:321] Evaluation time: 2.337757110595703 seconds
[NeMo I 2020-10-09 17:52:20 callbacks:232] loss: 3.0476317
[NeMo I 2020-10-09 17:52:20 deprecated_callbacks:316] Doing Evaluation ..............................




eval: loss 2.915382
[NeMo I 2020-10-09 17:52:23 deprecated_callbacks:321] Evaluation time: 2.6708412170410156 seconds
[NeMo I 2020-10-09 17:52:29 callbacks:232] loss: 2.988197
[NeMo I 2020-10-09 17:52:29 deprecated_callbacks:316] Doing Evaluation ..............................




eval: loss 2.773505
[NeMo I 2020-10-09 17:52:31 deprecated_callbacks:321] Evaluation time: 2.327364444732666 seconds
[NeMo I 2020-10-09 17:52:32 deprecated_callbacks:339] Final Evaluation ..............................




eval: loss 2.7484186
[NeMo I 2020-10-09 17:52:34 deprecated_callbacks:344] Evaluation time: 2.294581413269043 seconds
node inference,  from <NodeInTaskGraph nemo_gquant_modules.nemo_util.trainNemo.NemoTrainNode object at 0x7f824e4e3dd0> oport checkpoint_dir not in out cols
{}
[NeMo I 2020-10-09 17:52:34 actions:695] Evaluating batch 0 out of 273
[NeMo I 2020-10-09 17:52:35 actions:695] Evaluating batch 27 out of 273
[NeMo I 2020-10-09 17:52:36 actions:695] Evaluating batch 54 out of 273
[NeMo I 2020-10-09 17:52:37 actions:695] Evaluating batch 81 out of 273
[NeMo I 2020-10-09 17:52:38 actions:695] Evaluating batch 108 out of 273
[NeMo I 2020-10-09 17:52:39 actions:695] Evaluating batch 135 out of 273
[NeMo I 2020-10-09 17:52:39 actions:695] Evaluating batch 162 out of 273
[NeMo I 2020-10-09 17:52:40 actions:695] Evaluating batch 189 out of 273
[NeMo I 2020-10-09 17:52:41 actions:695] Evaluating batch 216 out of 273
[NeMo I 2020-10-09 17:52:42 actions:695] Evaluating batch 243 out of 27

The graph computation results are stored in the `result` variable. It can be used as a named tuple or a dictionary. The keys of the result dictionary can be queried with the following command:

In [7]:
result.get_keys()

('rnn_train.inference@torch_tensor', 'rnn_train.eval_data@out_nm')
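Judging from the two keys above, each key appears to combine a dotted node path with an output port name, separated by `@`. The helper below is a small sketch built on that reading; `split_result_key` is a hypothetical function, not part of gQuant, and the naming scheme is an inference from this output rather than a documented contract.

```python
def split_result_key(key):
    """Split 'node.path@port' into (list of node ids, port name)."""
    node_path, port = key.rsplit("@", 1)
    return node_path.split("."), port


keys = ('rnn_train.inference@torch_tensor', 'rnn_train.eval_data@out_nm')
for key in keys:
    print(split_result_key(key))
```

Under this reading, the first key refers to the `torch_tensor` port of the `inference` node inside the `rnn_train` composite node.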

The evaluated greedy tensors are indices of the predicted words in the dictionary. We can translate the indices into human-readable text with the following convenience function:

In [8]:
from nemo.utils import logging
# Define the callback function which prints intermediate results to console.
def outputs2words(tensors, vocab):
    source_ids = tensors[1][:, 0].cpu().numpy().tolist()
    response_ids = tensors[2][:, 0].cpu().numpy().tolist()
    tgt_ids = tensors[3][:, 0].cpu().numpy().tolist()
    source = list(map(lambda x: vocab[x], source_ids))
    response = list(map(lambda x: vocab[x], response_ids))
    target = list(map(lambda x: vocab[x], tgt_ids))
    source = ' '.join([s for s in source if s != 'EOS' and s != 'PAD'])
    response = ' '.join([s for s in response if s != 'EOS' and s != 'PAD'])
    target = ' '.join([s for s in target if s != 'EOS' and s != 'PAD'])
    logging.info(f"Train Loss:{str(tensors[0].item())}")
    logging.info(f"SOURCE: {source} <---> PREDICTED RESPONSE: {response} " f"<---> TARGET: {target}")

We are now ready to check the quality of the model. Note that, by default, the model is trained for only 3 epochs. For better results, increase the number of epochs.

In [9]:
for batch in range(len(result['rnn_train.inference@torch_tensor'][0])):
    outputs2words([i[batch] for i in result['rnn_train.inference@torch_tensor']], result['rnn_train.eval_data@out_nm'].voc.index2word)

[NeMo I 2020-10-09 17:53:04 <ipython-input-8-d6a2d1ea5b1b>:13] Train Loss:2.8878540992736816
[NeMo I 2020-10-09 17:53:04 <ipython-input-8-d6a2d1ea5b1b>:14] SOURCE: oh . just not at school . . . <---> PREDICTED RESPONSE: what ? ? ? ? ? <---> TARGET: yeah
[NeMo I 2020-10-09 17:53:04 <ipython-input-8-d6a2d1ea5b1b>:13] Train Loss:2.8501431941986084
[NeMo I 2020-10-09 17:53:04 <ipython-input-8-d6a2d1ea5b1b>:14] SOURCE: you know what they re doin now lou . <---> PREDICTED RESPONSE: i m sorry . i m sorry . <---> TARGET: this i know benny .
[NeMo I 2020-10-09 17:53:04 <ipython-input-8-d6a2d1ea5b1b>:13] Train Loss:2.992612600326538
[NeMo I 2020-10-09 17:53:04 <ipython-input-8-d6a2d1ea5b1b>:14] SOURCE: are you sure this is the river road ? <---> PREDICTED RESPONSE: i m not . . . . <---> TARGET: i saw the sign .
[NeMo I 2020-10-09 17:53:04 <ipython-input-8-d6a2d1ea5b1b>:13] Train Loss:2.9164655208587646
[NeMo I 2020-10-09 17:53:04 <ipython-input-8-d6a2d1ea5b1b>:14] SOURCE: it didn t hurt too much

## Chatbot Model Hyperparameter Tuning

If you haven't gone through the `09_gquant_machine_hpo` notebook, we recommend going back to it first. It explains in detail how to use gQuant to perform hyperparameter tuning. We will use the same approach to tune the hyperparameters of the chatbot model.

Since the context composite node exposes the number of layers and the drop-out rate as context parameters, the `hpo` node, a `Nemo Hyper Tune Node`, does a grid search over the number of layers and a random uniform search over the drop-out rate. The eval loss is used as the hyperparameter tuning metric.
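To see how a grid search combines with random sampling, consider the following sketch. It is not the `Nemo Hyper Tune Node` or Ray Tune code: `build_trials`, the grid values `[1, 2]`, and the drop-out range `(0.0, 0.5)` are assumptions chosen for illustration. It follows the Ray Tune convention that the whole grid is repeated once per sample.

```python
import itertools
import random


def build_trials(grid, uniform_ranges, num_samples, seed=0):
    """Repeat the full grid num_samples times, drawing random values each time."""
    rng = random.Random(seed)
    grid_names = list(grid)
    trials = []
    for _ in range(num_samples):
        for combo in itertools.product(*(grid[n] for n in grid_names)):
            trial = dict(zip(grid_names, combo))
            for name, (low, high) in uniform_ranges.items():
                # Every trial gets its own uniform draw.
                trial[name] = rng.uniform(low, high)
            trials.append(trial)
    return trials


# Hypothetical search space: two grid values and a uniform drop-out range.
grid = {"n_layers": [1, 2]}
uniform_ranges = {"dropout": (0.0, 0.5)}

print(len(build_trials(grid, uniform_ranges, num_samples=1)))   # 2
print(len(build_trials(grid, uniform_ranges, num_samples=10)))  # 20
```

With two grid values, the total number of trials is always twice `num_samples`.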

In the following example, we run the hyperparameter tuning with `1` sample. Click on the `run` button to see the hyperparameter tuning in action.

In [10]:
taskGraph=TaskGraph.load_taskgraph('../taskgraphs/nemo_examples/chatbot_hpo.gq.yaml')
taskGraph.draw()

GQuantWidget(sub=HBox(), value=[OrderedDict([('id', 'rnn_train'), ('type', 'ContextCompositeNode'), ('conf', {…

It is satisfying to see all the GPUs in the host node busy searching through different hyperparameters. The best set of hyperparameters is reported at the end. It can also be fed into other context composite nodes.

Since we are dealing with deep learning model training, the model is trained in batches. It is sometimes wasteful to run the training to the end just to get a metric. Luckily, the `Ray Tune` library provides a few scheduler algorithms that can stop unpromising trials early during the optimization process. The `Nemo Hyper Tune Node` integrates a few of these scheduler algorithms from `Ray Tune`, ready to be used.

In the following example, we will use [ASHA](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#tune-scheduler-hyperband), a scalable algorithm for [principled early stopping](https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/). At a high level, ASHA terminates trials that are less promising and allocates more time and resources to more promising trials. As our optimization process becomes more efficient, we can afford to enlarge the search by raising the `num_samples` parameter from `1` to `10`.
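The early-stopping idea can be sketched with plain synchronous successive halving, which is a simplification: ASHA itself promotes trials asynchronously, and this is not the Ray Tune implementation. The names `successive_halving` and `toy_loss`, the reduction factor of 3, and the iteration budgets are all assumptions made for this example.

```python
import random


def successive_halving(configs, evaluate, min_iters=1, max_iters=9, reduction=3):
    """Train all configs briefly, keep the best 1/reduction, repeat with more iterations."""
    survivors = list(configs)
    iters = min_iters
    history = []  # (iterations per trial, number of trials) for each round
    while survivors:
        # Lower loss is better, so sort ascending.
        scored = sorted(survivors, key=lambda cfg: evaluate(cfg, iters))
        history.append((iters, len(scored)))
        if iters >= max_iters or len(scored) == 1:
            return scored[0], history
        keep = max(1, len(scored) // reduction)
        survivors = scored[:keep]
        iters = min(max_iters, iters * reduction)
    raise ValueError("no configurations supplied")


def toy_loss(cfg, iters):
    # Hypothetical loss: improves with more iterations, lowest near dropout 0.2.
    return abs(cfg["dropout"] - 0.2) + 1.0 / iters


rng = random.Random(0)
configs = [{"dropout": rng.uniform(0.0, 0.5)} for _ in range(9)]
best, history = successive_halving(configs, toy_loss)
print(history)  # [(1, 9), (3, 3), (9, 1)]
```

In this toy run, six of the nine trials stop after a single iteration, and only one trial receives the full budget.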

Click on the `run` button to see the large scale hyperparameter search in action.

In [11]:
taskGraph=TaskGraph.load_taskgraph('../taskgraphs/nemo_examples/chatbot_large_hpo_search.gq.yaml')
taskGraph.draw()

GQuantWidget(sub=HBox(), value=[OrderedDict([('id', 'rnn_train'), ('type', 'ContextCompositeNode'), ('conf', {…

Checking the dynamically updated logs, you can see that there are `20` search trials in total, because of the `10` samples and the `2` values in the grid search. Most of the trials are terminated early, after `1` iteration. Only a few run until the end.

After tuning is done, the best hyperparameter set is used to train the model again and run inference. This should give better model performance.

## Clean up

In [12]:
import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}