<img src="http://developer.download.nvidia.com/compute/machine-learning/frameworks/nvidia_logo.png" width=60 height=60 align="left"/>
<img src="https://info.nvidia.com/rs/156-OFN-742/images/Red_Hat_new_BW.jpg" width=100 height=100 align="left"/>

<br><br>

# Deploying the Traffic Analyser IVA application with the NVIDIA Metropolis platform on an OCP Kubernetes cluster

## Overview

This notebook demonstrates how to: 
1. Set up an OpenShift cluster on AWS and run the GPU Operator
2. Pull pre-trained TAO models from NGC
3. Optimize the models with NVIDIA TensorRT
4. Scale the DeepStream-Triton application on the OCP Kubernetes cluster
5. Observe inference throughput on 8 A100 GPUs

## Requirements

- NVIDIA GPU 
  - A100 (p4d instance on AWS)
- OpenShift Platform
- Ubuntu system to run this notebook on
- Python3 environment to run this notebook in


## Links

**NVIDIA NGC resources**


* Model - TrafficCamNet: 

  https://ngc.nvidia.com/catalog/models/nvidia:tao:trafficcamnet
  
* Model - VehicleTypeNet: 
  
  https://ngc.nvidia.com/catalog/models/nvidia:tao:vehicletypenet

* Container - DeepStream-Triton:

  https://ngc.nvidia.com/catalog/containers/nvidia:deepstream


**Red Hat OpenShift resources**

* Red Hat OpenShift link

  http://openshift.com/

* OpenShift operators

  https://www.openshift.com/learn/topics/operators


## Topology of the cluster

We will first deploy one OCP cluster without a GPU node using the `openshift-install` command. This will spawn the following nodes:

- 3 x Master nodes (m5.xlarge by default)
- 2 x Worker nodes

After this, we will scale the cluster up with a GPU worker node (p4d instance type).


## Setup OpenShift Cluster

To set up the OpenShift cluster, we will use the `openshift-install` CLI tool, which can both create and delete the cluster as required.
For this demo, we deploy the cluster on AWS, so AWS credentials are required.

Let's run the following commands to setup the cluster.

In [None]:
# download the cli tool
!wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.8.11/openshift-install-linux.tar.gz
!tar xvf openshift-install-linux.tar.gz

# create cluster
! ./openshift-install create cluster

# enter your AWS access key and secret key when prompted
# select "eu-central-1" as the region
# it will take up to 40 minutes to set up the cluster
# note the location of the "kubeconfig" file that is generated once the cluster is up

## Install OpenShift CLI

To communicate with this cluster, we will use the OpenShift client tool (`oc`). We need it to add the A100 node and to install the GPU Operator.

In [None]:
# download the tool
!wget https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.8.11/openshift-client-linux.tar.gz
!tar xvf openshift-client-linux.tar.gz

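# move it to another directory and add that directory to PATH
!mkdir -p bin
!mv oc bin
# `!export` runs in a throwaway subshell, so update PATH from Python instead
import os
os.environ["PATH"] += os.pathsep + os.path.join(os.getcwd(), "bin")
!oc version # to verify that the CLI & cluster can be accessed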
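
`oc` also needs to know which cluster to talk to. The following is a minimal sketch, assuming the installer was run from the notebook's working directory so that the generated kubeconfig sits under `auth/kubeconfig` there. The helper names `kubeconfig_path` and `use_cluster` are hypothetical and only serve this illustration.

```python
import os
from pathlib import Path


def kubeconfig_path(install_dir: str = ".") -> Path:
    """Location of the kubeconfig written by openshift-install for this install dir."""
    return Path(install_dir) / "auth" / "kubeconfig"


def use_cluster(install_dir: str = ".") -> str:
    """Point KUBECONFIG at the new cluster for this kernel and its `!` subshells."""
    path = kubeconfig_path(install_dir).resolve()
    os.environ["KUBECONFIG"] = str(path)
    return os.environ["KUBECONFIG"]


# Assumption: `openshift-install create cluster` was run in the current directory.
use_cluster(".")
```

Because the variable is set on the kernel process itself, every later `!oc ...` line inherits it.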


## Add NVIDIA A100 GPU node to the cluster

Now that our cluster is up and the OpenShift client is installed, we will add a GPU node (p4d instance) to the cluster. This node will run the Traffic Analyser application.

Though the worker nodes can be easily added through the GUI, we are going to use utility scripts developed by OpenShift's developers.

In [None]:
# clone the git repo (branch "mig")
!git clone https://github.com/kpouget/ci-artifacts -b mig
# `!cd` would only change directory in a throwaway subshell; `%cd` persists
%cd ci-artifacts

# install the Python dependencies
!pip3 install -r requirements.txt

# run the tool
!./run_toolbox.py cluster capture_environment # verify that things are correctly setup

In [None]:
%%bash
# Run the whole cell in one shell so the variables stay visible to later lines
# (each `!` line would run in its own subshell and lose them).

# availability-zone suffix and instance type for the GPU node
P4D_REGION=1b
MACHINE_TYPE=p4d.24xlarge

# get the machineset for that zone
REGION_MACHINESET=$(oc get machinesets -n openshift-machine-api -oname | grep -- "$P4D_REGION"'$' | head -1 | cut -d/ -f2)

# scale up: start one p4d instance based on that machineset
./run_toolbox.py cluster set_scale "$MACHINE_TYPE" 1 --base-machineset="${REGION_MACHINESET}"
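
The machineset lookup above chains `grep`, `head` and `cut`. As a sketch of what that pipeline selects, here is the same logic in Python. The function name `pick_machineset` and the sample machineset names are made up for illustration; real names embed a generated cluster ID.

```python
from typing import Iterable, Optional


def pick_machineset(names: Iterable[str], zone_suffix: str) -> Optional[str]:
    """Return the first machineset whose name ends with zone_suffix, without the resource prefix."""
    for line in names:
        line = line.strip()
        if line.endswith(zone_suffix):
            # `-oname` prints "<resource>/<name>"; keep what follows the "/"
            return line.split("/", 1)[1] if "/" in line else line
    return None


# Hypothetical `oc get machinesets -n openshift-machine-api -oname` output.
sample = """\
machineset.machine.openshift.io/demo-abc12-worker-eu-central-1a
machineset.machine.openshift.io/demo-abc12-worker-eu-central-1b
machineset.machine.openshift.io/demo-abc12-worker-eu-central-1c
"""

print(pick_machineset(sample.splitlines(), "1b"))  # demo-abc12-worker-eu-central-1b
```

If no machineset matches, the function returns `None`. In the shell version the variable would be empty instead, which is worth checking before scaling.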

In [None]:
# check that there are now 3 master and 3 worker nodes in total
! oc get nodes
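
Rather than counting rows by eye, the node table can be tallied per role. This is a minimal sketch that parses the plain-text table printed by `oc get nodes`. The function name `count_roles` is hypothetical, and the sample table (node names, ages, versions) is invented for illustration.

```python
from collections import Counter


def count_roles(oc_get_nodes_output: str) -> Counter:
    """Count nodes per role from the whitespace-separated `oc get nodes` table."""
    lines = [line for line in oc_get_nodes_output.splitlines() if line.strip()]
    role_col = lines[0].split().index("ROLES")  # locate the ROLES column in the header
    counts = Counter()
    for line in lines[1:]:
        # a node may carry several comma-separated roles
        for role in line.split()[role_col].split(","):
            counts[role] += 1
    return counts


# Invented sample output with the usual NAME/STATUS/ROLES/AGE/VERSION columns.
sample_output = """\
NAME        STATUS   ROLES    AGE   VERSION
node-m-0    Ready    master   60m   v1.21.1
node-m-1    Ready    master   60m   v1.21.1
node-m-2    Ready    master   60m   v1.21.1
node-w-0    Ready    worker   50m   v1.21.1
node-w-1    Ready    worker   50m   v1.21.1
node-gpu-0  Ready    worker   5m    v1.21.1
"""

roles = count_roles(sample_output)
print(roles["master"], roles["worker"])  # 3 3
```

In the notebook, the real table could be captured with `out = !oc get nodes` and passed in as `"\n".join(out)`.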

## Install NVIDIA GPU Operator

To use NVIDIA GPUs on OpenShift, you have to install the NVIDIA GPU Operator. This Operator exposes GPUs to Kubernetes as extended resources that can be requested by Pods and containers. It also lets the OpenShift cluster administrator decide which MIG geometry to apply to the MIG-capable GPUs of a node: the administrator applies a specific label to the node, then waits for the GPU Operator to reconfigure the GPUs and advertise the new MIG devices as resources to Kubernetes.

To install the NVIDIA GPU Operator on this OpenShift cluster, follow the instructions on the [NVIDIA official page](https://docs.nvidia.com/datacenter/cloud-native/openshift/steps-overview.html).

## Enable MIG strategy on A100

Follow the instructions on the [NVIDIA official page](https://docs.nvidia.com/datacenter/cloud-native/openshift/steps-overview.html) and apply the same MIG configuration, `2g.10gb`, to all the GPUs. We found this configuration to be the most performant for this use case.

## Deploy the Traffic Analyser - Metropolis IVA application

Now that our cluster has an 8 x A100 GPU node with the right drivers and operators, we will deploy the use-case application.

We have developed a simple deployment YAML that can be used to schedule pods on this cluster. It executes an automation script that pulls the models and other assets from NGC, optimizes them with TensorRT, and then runs the video analytics pipeline through DeepStream.

In [None]:
%%writefile deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: metropolis
spec:
  selector:
    matchLabels:
      app: ds
  replicas: 24
  template:
    metadata:
      labels:
        app: ds
    spec:
      restartPolicy: Always
      containers:
      - image: nvcr.io/nvidia/deepstream:5.1-21.02-triton
        name: cnt
        command: ["/bin/sh","-c"]
        args: ["git clone https://github.com/AshishSardana/ds_triton.git && cd ds_triton && bash -x automate_script.sh"]
        resources:
          limits:
            nvidia.com/gpu: 1
          requests:
            nvidia.com/gpu: 1
      nodeSelector:
        nvidia.com/gpu.product: NVIDIA-A100-SXM4-40GB-MIG-2g.10gb
        nvidia.com/mig.config: all-2g.10gb

We encourage you to read `automate_script.sh` in this [GitHub repo](https://github.com/AshishSardana/ds_triton) to understand the workflow in detail.
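
The value `replicas: 24` is consistent with one pod per MIG device. The arithmetic below is a sketch under two assumptions: an A100 exposes 7 MIG compute slices, of which each `2g.10gb` instance uses 2, and every MIG device is advertised to Kubernetes as one `nvidia.com/gpu` resource, as the pod's resource request suggests.

```python
A100_COMPUTE_SLICES = 7        # MIG compute slices available on one A100
SLICES_PER_2G_INSTANCE = 2     # the "2g" in the 2g.10gb profile
GPUS_PER_NODE = 8              # A100 GPUs in the p4d node used here

# Whole instances that fit on one GPU; the leftover slice stays unused.
instances_per_gpu = A100_COMPUTE_SLICES // SLICES_PER_2G_INSTANCE
mig_devices = GPUS_PER_NODE * instances_per_gpu

print(instances_per_gpu, mig_devices)  # 3 24
```

With each pod requesting `nvidia.com/gpu: 1`, 24 replicas fill the node exactly. Any further replicas would stay in `Pending` until another GPU node is added.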

You can run this deployment file using:

In [None]:
! oc create -f deployment.yaml

View running pods in namespace 'default'


In [None]:
! oc get pods --namespace default

View logs from the application pod


In [None]:
# pods created by the Deployment get generated names, so address the Deployment itself
! oc logs -f deployment/metropolis

## Optimizing a fine-tuned BERT QA model with TensorRT (TRT)

Steps:

#### 1. Clone TensorRT Github repository on OCP node: https://github.com/NVIDIA/TensorRT.git

#### 2. We'll use TensorRT container from NGC: https://ngc.nvidia.com/catalog/containers/nvidia:tensorrt 

  This container does not come preinstalled with all the Python dependencies. Install them by running the following script from within the container: `/opt/tensorrt/python/python_setup.sh`

#### 3. We'll be using /TensorRT/demo/BERT/builder.py script to build our optimized TensorRT engine with the following arguments:


```
# Create the output directory, then build the TRT engine with builder.py:
#   -m      fine-tuned BERT checkpoint
#   -o      path where the TRT engine will be written
#   -b      batch size
#   -s      sequence length
#   --fp32  precision
#   -c      config directory
mkdir -p /home/engines && \
python3 builder.py \
  -m /home/bert-fine-tuned/model.ckpt-8144 \
  -o /home/engines/bert_large_128.engine \
  -b 1 \
  -s 128 \
  --fp32 \
  -c /home/bert-fine-tuned/

```
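
When the same build has to be repeated for other batch sizes or sequence lengths, it can help to assemble the command programmatically. The sketch below only reuses the flags shown above. The function name `builder_command` is hypothetical, and the list is meant to be passed to `subprocess.run` from inside the TensorRT container.

```python
import shlex


def builder_command(checkpoint: str, engine_path: str, config_dir: str,
                    batch_size: int = 1, seq_len: int = 128) -> list:
    """Build the argv list for builder.py using only the flags shown in this notebook."""
    return [
        "python3", "builder.py",
        "-m", checkpoint,        # fine-tuned BERT checkpoint
        "-o", engine_path,       # where the TRT engine is written
        "-b", str(batch_size),   # batch size
        "-s", str(seq_len),      # sequence length
        "--fp32",                # precision, as in the command above
        "-c", config_dir,        # config directory
    ]


cmd = builder_command(
    "/home/bert-fine-tuned/model.ckpt-8144",
    "/home/engines/bert_large_128.engine",
    "/home/bert-fine-tuned/",
)
print(shlex.join(cmd))
```

Passing an argument list instead of a single shell string avoids quoting problems if a path ever contains spaces.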

Now we are ready to package the above steps in a YAML file for deployment on OCP. We can reuse the YAML used for training, changing only the image and the command run in the pod. Save it as `trt_export.yaml`, the file name passed to `oc create` in the next cell.



```
apiVersion: v1
kind: Pod
metadata:
  name: trt
  namespace: default
spec:
  restartPolicy: OnFailure
  containers:
    - name: trt
      image: "nvcr.io/nvidia/tensorrt:20.09-py3"
      command: ["/bin/bash", "-ec", " bash /opt/tensorrt/python/python_setup.sh; cd /home/TensorRT/demo/BERT; mkdir -p /home/engines && python3 builder.py -m /home/bert-fine-tuned/model.ckpt-8144 -o /home/engines/bert_large_128.engine -b 1 -s 128 --fp32 -c /home/bert-fine-tuned/;"]
      env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: all
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: "compute,utility"
        - name: NVIDIA_REQUIRE_CUDA
          value: "cuda>=5.0"
      securityContext:
        privileged: true
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
      volumeMounts:
      - mountPath: /home
        name: ocs-ml-data
  volumes:
  - name: ocs-ml-data
    persistentVolumeClaim:
      # name of the PersistentVolumeClaim mounted at /home
      claimName: ocs-ml-data
      readOnly: false
```




In [None]:
! oc create -f trt_export.yaml

Let's check the status of the pod

In [None]:
! oc get pods

Finally, let's check the logs and make sure the engine is created in the output directory

In [None]:
! oc logs trt