# Morpheus Murano App 1.0.0

This Murano App uses a microk8s cluster to provide a space to test NVIDIA's Morpheus on the Nectar cloud. To separate Morpheus from any other deployments you may wish to test, everything is deployed in the `morpheus` namespace. The deployment also includes a JupyterLab server installed with the SDK to make experimentation more user-friendly.

The morpheus stack includes:

- [MLFlow](https://mlflow.org/) to manage models
- [Triton](https://developer.nvidia.com/nvidia-triton-inference-server) as an inference server
- [Kafka](https://kafka.apache.org/) as a data broker

You can check the health of the cluster using:
`microk8s kubectl -n morpheus get all`

If you would like to update Morpheus, or want more information on usage, have a read of the [quick start guide](https://docs.nvidia.com/morpheus/morpheus_quickstart_guide.html#set-up-ngc-api-key-and-install-ngc-registry-cli), which this notebook is based on.

NVIDIA recommends running only one pipeline at a time, which means you will need to uninstall any active pipeline before replacing it during testing. By default this app runs the `Morpheus SDK Client` in 'sleep mode' under the `helper` release name. Rather than destroying and recreating the SDK, we will interface with it directly through the CLI.


In [1]:
!microk8s kubectl -n morpheus get all

NAME                             READY   STATUS    RESTARTS      AGE
pod/mlflow-7889bfd95f-djsvn      1/1     Running   1 (23h ago)   23h
pod/zookeeper-5bdfd5ff4d-8j692   1/1     Running   0             23h
pod/broker-6f4d759474-7kn4p      1/1     Running   0             23h
pod/ai-engine-868b768b99-fz7c6   1/1     Running   0             23h
pod/sdk-cli-helper               1/1     Running   0             22h

NAME                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/mlflow      NodePort    10.152.183.67    <none>        5000:30500/TCP               23h
service/zookeeper   ClusterIP   10.152.183.16    <none>        2181/TCP                     23h
service/ai-engine   ClusterIP   10.152.183.152   <none>        8000/TCP,8001/TCP,8002/TCP   23h
service/broker      NodePort    10.152.183.198   <none>        9092:30092/TCP               23h

NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mlflow      1/1     1  
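
If you would rather check the cluster health programmatically than read the table by eye, the sketch below parses the text printed by `microk8s kubectl -n morpheus get all` and lists any pod that is not `Running` or not fully ready. The function name `unhealthy_pods` and the `Pending` row in the sample are made up for illustration; the column layout is assumed to match the output shown above.

```python
from typing import List


def unhealthy_pods(kubectl_output: str) -> List[str]:
    """Return the names of pods that are not Running or not fully ready."""
    problems = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Pod rows look like: pod/<name>  <ready>/<total>  <status>  ...
        if len(fields) < 3 or not fields[0].startswith("pod/"):
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, _, total = ready.partition("/")
        if status != "Running" or ready_count != total:
            problems.append(name)
    return problems


# Hypothetical sample: the second row was altered to show a failing pod.
sample = """\
NAME                             READY   STATUS    RESTARTS      AGE
pod/mlflow-7889bfd95f-djsvn      1/1     Running   1 (23h ago)   23h
pod/broker-6f4d759474-7kn4p      0/1     Pending   0             23h
"""
print(unhealthy_pods(sample))
```

An empty list means every pod row reported `Running` with all of its containers ready.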

# Using Morpheus

Morpheus can be run in two ways:

- Run off file
  
    NVIDIA provides several example models and datasets to test Morpheus with. You can browse these on their [GitHub](https://github.com/nv-morpheus/Morpheus/tree/branch-22.06/models). The quick start guide includes several examples that load this data into the input topics and store the outputs in a file. You can also upload your own data to the container using Jupyter's interface.

- Run off incoming packets
  
    We have provided you with a naive Python script which scrapes any incoming TCP traffic and loads it into the `tcpdump_naiive` Kafka topic. You will need to open the file and edit the IP to match the address of Kafka's bootstrap server (the cluster IP of `service/broker`). By default this will only send the body of the message, though the script also supports sending the rest of the packet information: simply uncomment the relevant line.


This implementation of Morpheus is deployed using five containers:

- The *ai-engine* container, which runs NVIDIA Triton, listening for HTTP requests on port 8000, gRPC on port 8001 and metrics in the Prometheus format on port 8002.
- The *sdk-cli-helper* container, which runs the SDK and JupyterLab (this container). We are using NGINX to manage the connections to Jupyter to make your life easier.
- The *broker* container, which runs Kafka, listening on port 9092. It is also exposed on 30092 on the main machine, allowing you to feed it data, though this port is blocked by default by Nectar's security. You will need to enable a security rule for it if you'd like to do this. Ensure that you are not working with sensitive data if you experiment with this.
- The *zookeeper* container, which runs ZooKeeper for Kafka, listening on port 2181.
- The *mlflow* container, which runs MLFlow, listening on port 5000. Its UI is also exposed on 30500 on the main machine, allowing you to view the models you have deployed.

In [28]:
# Let's set up some aliases to execute commands in the containers to make our lives easier
x_sdk = "microk8s kubectl -n morpheus exec -it sdk-cli-helper -- "
x_morpheus = "microk8s kubectl -n morpheus exec -it sdk-cli-helper -- /opt/conda/envs/morpheus/bin/morpheus"
x_broker = "microk8s kubectl -n morpheus exec deploy/broker -c broker -- "
x_mlflow_python = "microk8s kubectl -n morpheus exec -it deploy/mlflow -- /opt/conda/envs/mlflow/bin/python"
x_mlflow = "microk8s kubectl -n morpheus exec -it deploy/mlflow -- /opt/conda/envs/mlflow/bin/mlflow"
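
The string aliases above are convenient for `!` shell magics. If you prefer calling the containers from plain Python, a small helper that builds the same `kubectl exec` argument list is one option. This is only a sketch: `kubectl_exec` is a hypothetical name, and the defaults simply mirror the namespace and flags used in the aliases.

```python
import shlex
from typing import List, Optional


def kubectl_exec(target: str, *command: str, container: Optional[str] = None,
                 namespace: str = "morpheus", tty: bool = False) -> List[str]:
    """Build the argv for running `command` inside a pod or deployment."""
    argv = ["microk8s", "kubectl", "-n", namespace, "exec"]
    if tty:
        argv.append("-it")          # interactive terminal, as in x_sdk
    argv.append(target)             # e.g. "sdk-cli-helper" or "deploy/broker"
    if container:
        argv += ["-c", container]   # pick a container, as in x_broker
    argv.append("--")               # everything after this runs in the container
    argv.extend(command)
    return argv


# Equivalent of `$x_broker kafka-topics.sh`, printed as a shell command line.
print(shlex.join(kubectl_exec("deploy/broker", "kafka-topics.sh", container="broker")))
```

The returned list can be passed to `subprocess.run(argv, capture_output=True, text=True)` when you want to capture the output instead of printing it in the notebook.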

In [3]:
!$x_sdk curl -v ai-engine:8000/v2/health/ready

*   Trying 10.152.183.152:8000...
* Connected to ai-engine (10.152.183.152) port 8000 (#0)
> GET /v2/health/ready HTTP/1.1
> Host: ai-engine:8000
> User-Agent: curl/7.83.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
< 
* Connection #0 to host ai-engine left intact
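
The same readiness probe can be written in Python with only the standard library. This is a minimal sketch, assuming it runs somewhere the `ai-engine` service name resolves (for example inside the cluster, which is why the cell above runs `curl` through `x_sdk`). The helper names `health_url` and `is_ready` are illustrative.

```python
import http.client
import urllib.request


def health_url(host: str, port: int = 8000, probe: str = "ready") -> str:
    """Build the Triton HTTP health endpoint URL, e.g. /v2/health/ready."""
    return f"http://{host}:{port}/v2/health/{probe}"


def is_ready(host: str, port: int = 8000, timeout: float = 5.0) -> bool:
    """Return True only if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(host, port), timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers DNS failures, refused connections, timeouts and HTTP error codes.
        return False


print(health_url("ai-engine"))
```

Calling `is_ready("ai-engine")` should correspond to the `200 OK` seen in the `curl` output when the server is up.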


In [4]:
!$x_morpheus

Usage: morpheus [OPTIONS] COMMAND [ARGS]...

Options:
  --debug / --no-debug            [default: no-debug]
                                  Specify the logging level to use.  [default:
  --log_config_file FILE          Config file to use to configure logging. Use
                                  only for advanced situations. Can accept
                                  both JSON and ini style configurations
  --version                       Show the version and exit.
  --help                          Show this message and exit.

Commands:
  run    Run one of the available pipelines
  tools  Run a utility tool


In [30]:
!$x_broker kafka-topics.sh

Create, delete, describe, or change a topic.
Option                                   Description                            
------                                   -----------                            
--alter                                  Alter the number of partitions,        
                                           replica assignment, and/or           
                                           configuration for the topic.         
--at-min-isr-partitions                  if set when describing topics, only    
                                           show partitions whose isr count is   
                                           equal to the configured minimum. Not 
                                           supported with the --zookeeper       
                                           option.                              
--bootstrap-server <String: server to    REQUIRED: The Kafka server to connect  
  connect to>                              to. In case of provid

In [6]:
!$x_mlflow

Usage: mlflow [OPTIONS] COMMAND [ARGS]...

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  artifacts    Upload, list, and download artifacts from an MLflow...
  azureml      Serve models on Azure ML.
  db           Commands for managing an MLflow tracking database.
  deployments  Deploy MLflow models to custom targets.
  experiments  Manage experiments.
  gc           Permanently delete runs in the `deleted` lifecycle stage.
  models       Deploy MLflow models locally.
  run          Run an MLflow project from the given URI.
  runs         Manage runs.
  sagemaker    Serve models on SageMaker.
  server       Run the MLflow tracking server.
  ui           Launch the MLflow tracking UI for local viewing of run...


# Sensitive Information Detection (SID)

NVIDIA has provided us with a set of example models and datasets to get started in the `/models` and `/models/datasets` directories of the SDK container. In this example we will look at the SID model.

To share these files with MLFlow, copy them to the `/common` directory, which is mapped to `/opt/morpheus/common` on the host.

In [7]:
!$x_sdk cp -RL /workspace/models /common

## MLFlow

The MLFlow service is used for managing and deploying models. We can use NVIDIA's scripts to deploy models to the Triton ai-engine.

In [16]:
!$x_mlflow_python publish_model_to_mlflow.py \
      --model_name sid-minibert-onnx \
      --model_directory /common/models/triton-model-repo/sid-minibert-onnx \
      --flavor triton 

Registered model 'sid-minibert-onnx' already exists. Creating a new version of this model...
2022/09/20 05:16:23 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: sid-minibert-onnx, version 4
Created version '4' of model 'sid-minibert-onnx'.
/mlflow/artifacts/0/c7afb7d6e5014f3b8fed303effb2c8f3/artifacts


In [9]:
!$x_mlflow deployments create -t triton \
      --flavor triton \
      --name sid-minibert-onnx \
      -m models:/sid-minibert-onnx/1 \
      -C "version=1"

Traceback (most recent call last):
  File "/opt/conda/envs/mlflow/bin/mlflow", line 8, in <module>
    sys.exit(cli())
  File "/opt/conda/envs/mlflow/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/envs/mlflow/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/opt/conda/envs/mlflow/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/mlflow/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/mlflow/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/envs/mlflow/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/envs/mlflow/l
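
Note that the publish step above reported creating version 4, while the deployment command pins version 1. If you want to deploy whichever version was just published, one approach is to read the version number out of the publish output and build the deployment arguments from it. The sketch below does only that; the helper names are hypothetical, and it makes no claim about the cause of the truncated traceback above.

```python
import re
from typing import List, Optional


def created_version(publish_output: str) -> Optional[int]:
    """Extract N from a line such as: Created version 'N' of model '...'."""
    match = re.search(r"Created version '(\d+)' of model", publish_output)
    return int(match.group(1)) if match else None


def deployment_args(model_name: str, version: int) -> List[str]:
    """Arguments for the mlflow CLI, mirroring the deployment cell above."""
    return [
        "deployments", "create", "-t", "triton",
        "--flavor", "triton",
        "--name", model_name,
        "-m", f"models:/{model_name}/{version}",
        "-C", f"version={version}",
    ]


log = "Created version '4' of model 'sid-minibert-onnx'."
version = created_version(log)
print(deployment_args("sid-minibert-onnx", version))
```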

## Kafka

To run our pipeline we will need to create Kafka topics for input and output.

In [31]:
# Create an input topic
!$x_broker kafka-topics.sh\
      --create \
      --bootstrap-server broker:9092 \
      --replication-factor 1 \
      --partitions 3 \
      --topic input

Error while executing topic command : Topic 'input' already exists.
[2022-09-20 05:27:45,706] ERROR org.apache.kafka.common.errors.TopicExistsException: Topic 'input' already exists.
 (kafka.admin.TopicCommand$)
command terminated with exit code 1


In [32]:
# Create an output topic
!$x_broker kafka-topics.sh\
      --create \
      --bootstrap-server broker:9092 \
      --replication-factor 1 \
      --partitions 3 \
      --topic output

Error while executing topic command : Topic 'output' already exists.
[2022-09-20 05:27:48,438] ERROR org.apache.kafka.common.errors.TopicExistsException: Topic 'output' already exists.
 (kafka.admin.TopicCommand$)
command terminated with exit code 1
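
Both cells above exit with code 1 only because the topics already existed from an earlier run. When re-running the notebook, it can help to treat that specific error as harmless. The sketch below builds the same `kafka-topics.sh --create` arguments used above and recognises the "already exists" message; the function names are illustrative, and the detection is a plain string match on the error text shown above.

```python
from typing import List


def create_topic_command(topic: str, partitions: int = 3, replication_factor: int = 1,
                         bootstrap_server: str = "broker:9092") -> List[str]:
    """Arguments matching the topic-creation cells above."""
    return [
        "kafka-topics.sh", "--create",
        "--bootstrap-server", bootstrap_server,
        "--replication-factor", str(replication_factor),
        "--partitions", str(partitions),
        "--topic", topic,
    ]


def is_already_exists_error(output: str) -> bool:
    """True if the command failed only because the topic is already there."""
    return "TopicExistsException" in output or "already exists" in output


error_text = "Error while executing topic command : Topic 'input' already exists."
print(create_topic_command("input"))
print(is_already_exists_error(error_text))
```

A wrapper could run the command, and raise only when the exit code is non-zero and `is_already_exists_error` returns `False`.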


In [12]:
# View the topics we've just created
!$x_broker kafka-topics.sh --list --zookeeper zookeeper:2181

__consumer_offsets
input
output


In [13]:
!$x_sdk ls examples/data/

abp_pcap_dump.jsonlines		nvsmi.jsonlines
appshield			pcap_dump.jsonlines
email.jsonlines			sid_training_data_truth.csv
email_with_addresses.jsonlines


## Pipeline examples
Now that we've finished setting up, we can set up some pipelines! Morpheus constructs pipelines made up of 'stages', including preprocessing and postprocessing steps accelerated with NVIDIA RAPIDS, which will allow you to build something to handle your data stream in near real time. If Morpheus's provided pipeline stages do not fit your needs, it also allows you to extend its capabilities with a [custom stage written in Python or C++](https://docs.nvidia.com/morpheus/developer_guide/guides/1_simple_python_stage.html#background).

Morpheus provides scripts to simulate an input stream from a file or by streaming a file into the data broker, but in a production setting you would feed your data into the pipeline using the input topic we set up earlier.

For more examples of these pipelines have a read through these [example workflows](https://docs.nvidia.com/morpheus/morpheus_quickstart_guide.html#example-workflows).
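
For the production-style path mentioned above, where data arrives through the `input` topic, a producer could look roughly like the sketch below. It assumes the third-party `kafka-python` package is installed wherever the code runs and that `broker:9092` is reachable from there (it is the in-cluster address used earlier). The function names `iter_messages` and `stream_file` are hypothetical.

```python
from typing import Iterable, Iterator


def iter_messages(lines: Iterable[str]) -> Iterator[bytes]:
    """Yield each non-empty line as UTF-8 bytes, one Kafka message per line."""
    for line in lines:
        line = line.strip()
        if line:
            yield line.encode("utf-8")


def stream_file(path: str, topic: str = "input",
                bootstrap_servers: str = "broker:9092") -> int:
    """Send every line of a .jsonlines file to the topic; return the count."""
    # Assumption: kafka-python is available; imported here so the rest of
    # the sketch still loads without it.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    sent = 0
    try:
        with open(path, encoding="utf-8") as handle:
            for message in iter_messages(handle):
                producer.send(topic, value=message)
                sent += 1
        producer.flush()  # block until buffered messages are delivered
    finally:
        producer.close()
    return sent


print(list(iter_messages(['{"data": "hello"}\n', "\n"])))
```

For example, `stream_file("./examples/data/pcap_dump.jsonlines")` would push the sample dataset used below into the `input` topic.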

In [49]:
# Simulate a datastream using the pcap_dump dataset file
!$x_morpheus --log_level=DEBUG run \
      --num_threads=3 \
      --edge_buffer_size=4 \
      --use_cpp=True \
      --pipeline_batch_size=1024 \
      --model_max_batch_size=32 \
      pipeline-nlp \
        --model_seq_length=256 \
        from-file --filename=./examples/data/pcap_dump.jsonlines \
        monitor --description 'FromFile Rate' --smoothing=0.001 \
        deserialize \
        preprocess --vocab_hash_file=data/bert-base-uncased-hash.txt --truncation=True --do_lower_case=True --add_special_tokens=False \
        monitor --description='Preprocessing rate' \
        inf-triton --force_convert_inputs=True --model_name=sid-minibert-onnx --server_url=ai-engine:8001 \
        monitor --description='Inference rate' --smoothing=0.001 --unit inf \
        add-class \
        serialize --exclude '^ts_' \
        to-file --filename=/common/data/sid-minibert-onnx-output.jsonlines --overwrite

Parameter, 'labels_file', with relative path, 'data/labels_nlp.txt', does not exist. Using package relative location: '/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/data/labels_nlp.txt'
Configuring Pipeline via CLI
Loaded labels file. Current labels: [['address', 'bank_acct', 'credit_card', 'email', 'govt_id', 'name', 'password', 'phone_num', 'secret_keys', 'user']]
Parameter, 'vocab_hash_file', with relative path, 'data/bert-base-uncased-hash.txt', does not exist. Using package relative location: '/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/data/bert-base-uncased-hash.txt'
Starting pipeline via CLI... Ctrl+C to Quit
Config: 
{
  "ae": null,
  "class_labels": [
    "address",
    "bank_acct",
    "credit_card",
    "email",
    "govt_id",
    "name",
    "password",
    "phone_num",
    "secret_keys",
    "user"
  ],
  "debug": false,
  "edge_buffer_size": 4,
  "feature_length": 256,
  "fil": null,
  "log_confi

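Once the pipeline has finished, the file written by the `to-file` stage can be summarised with a few lines of Python. The sketch below counts how many records were flagged for each class label. It assumes each output line is a JSON object in which the `add-class` stage has added one boolean field per label; check a line of your own output file before relying on that. The label list is copied from the log above, and `count_detections` is an illustrative name.

```python
import json
from collections import Counter
from typing import Iterable, Sequence

LABELS = ["address", "bank_acct", "credit_card", "email", "govt_id",
          "name", "password", "phone_num", "secret_keys", "user"]


def count_detections(lines: Iterable[str], labels: Sequence[str] = LABELS) -> Counter:
    """Count, per label, the records whose field for that label is true."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for label in labels:
            # Assumption: one boolean column per class label.
            if record.get(label) is True:
                counts[label] += 1
    return counts


# Hypothetical records, not real pipeline output.
sample = [
    '{"data": "x", "email": true, "password": false}',
    '{"data": "y", "email": true, "password": true}',
]
print(count_detections(sample))
```

To run it on real output, pass an open file handle, for example `count_detections(open("/common/data/sid-minibert-onnx-output.jsonlines"))`.
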
# Next Steps
Using NVIDIA's pretrained models and examples is a good way to get a grasp of the framework, but fine-tuning is necessary for any model deployed in a production setting.

NVIDIA has provided training scripts and notebooks for each of their models, which you can use to retrain them. If their models do not adapt well to your use case, you can also look at [Hugging Face's model repository](https://huggingface.co/models) as a starting point for training your own model.

### WARNING
Note that many of these scripts and notebooks were moved from their original repositories when being compiled into the Morpheus container. This means that any paths present within these scripts and notebooks **will not accurately reflect where they are in the container**. NVIDIA also **has not provided any environment files** to help us recreate the environments used to train these models. The datasets will likely be somewhere in the folder we copied to `/opt/morpheus/common/` earlier. It may be easier to browse the file structure on the [NVIDIA Morpheus GitHub](https://github.com/nv-morpheus/Morpheus/tree/branch-22.09/models). It will be up to you to track down the datasets and recreate the environments if you wish to use the tools that NVIDIA has provided as a starting point. Alternatively, you can look through these resources for inspiration on how to train your own models to make use of this framework.

Good luck!

In [53]:
# Copy NVIDIA's training scripts into our root directory so that we can easily access their scripts and notebooks
!cp -r /opt/morpheus/common/models/training-tuning-scripts/ ./