<a href="https://www.nvidia.com/dli"> <img src="images/DLI_Header.png" alt="Header" style="width: 400px;"/> </a>

# 9.0 Enabling GPU within a Kubernetes (K8s) Cluster
## (part of Lab 3)

<img src="images/k8s/kubernetes_stack_0.png" style="float: right;">
In this notebook, you'll learn how to prepare a Kubernetes cluster for GPU acceleration and full production deployment of conversational AI applications.<br><br>

**[9.1 Launch a K8s Cluster](#9.1-Launch-a-K8s-Cluster)<br>**
**[9.2 Deploy a CUDA Test Application](#9.2-Deploy-a-CUDA-Test-Application)<br>**
**[9.3 Add GPU Awareness to K8s](#9.3-Add-GPU-Awareness-to-K8s)<br>**
**[9.4 Interact with GPU Resources in K8s](#9.4-Interact-with-GPU-Resources-in-K8s)<br>**
&nbsp;&nbsp;&nbsp;&nbsp;[9.4.1 Exercise: Configure Pod](#9.4.1-Exercise:-Configure-Pod)<br>
&nbsp;&nbsp;&nbsp;&nbsp;[9.4.2 Final Checks and Shutdown](#9.4.2-Final-Checks-and-Shutdown)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[9.4.2.1 Exercise: Delete a Pod](#9.4.2.1-Exercise:-Delete-a-Pod)<br>

In the previous parts of the class, you deployed NVIDIA Riva using basic shell commands. As convenient as this method is during development, it becomes impractical when deploying to production, that is, when managing larger numbers of servers and services. 

[Kubernetes](https://kubernetes.io/), also known as K8s, is an open-source system for automating deployment, scaling, and management of containerized applications. 
In this part of the class, we will first launch a K8s cluster, then enable it for GPU acceleration, and finally interact with its GPU resources. This is our first step toward monitoring, managing, and deploying conversational AI applications in production. Monitoring and deployment will be covered in later notebooks.

### Notebook Dependencies
The steps in this notebook assume that you are starting with a clean environment. Ensure this by stopping any previous Kubernetes installation and all Docker containers, then checking the state of the environment.

In [None]:
# Check running docker containers. This should be empty.
!docker ps

In [None]:
# If not empty,
# Clear Docker containers to start fresh...
!docker kill $(docker ps -q)

# Check for clean environment - this should be empty
!docker ps

In [None]:
# Deletes local Kubernetes cluster if it exists
!minikube delete

--- 
# 9.1 Launch a K8s Cluster

A [Kubernetes cluster](https://kubernetes.io/docs/concepts/overview/components/) consists of a set of worker machines (physical or virtual), called nodes, that run containerized applications. Every cluster has at least one worker node, though it can also support thousands of nodes! For this class, we will use [Minikube](https://minikube.sigs.k8s.io/docs/), which allows us to deploy a local and self-contained Kubernetes cluster with a single node. 

Review the class hardware resources available and launch the K8s cluster.

We can see details and status of the available GPU using the `nvidia-smi` command.

<img src="images/k8s/nvidia_smi.png">

In [None]:
# What GPU are we using and how much memory does it have?
!nvidia-smi

In [None]:
# What type of CPU processor(s) are we using?
!cat /proc/cpuinfo | grep "model name"

In [None]:
# How many processors are available?
!nproc

In [None]:
# Launch the K8s cluster using Minikube
!minikube start --driver=none

Once the cluster is successfully launched, we expect to see a number of containers running.  Check this by executing `docker ps` again.

In [None]:
# List the Kubernetes components deployed
!docker ps

We should now have access to the [kubectl command line tool](https://kubernetes.io/docs/reference/kubectl/overview/), which is used to interact with the cluster. List the nodes and services in the cluster using the `kubectl get` command:

In [None]:
# List nodes in the cluster
!kubectl get nodes

In [None]:
# List all services deployed
!kubectl get services

--- 
# 9.2 Deploy a CUDA Test Application

Next, we will deploy a simple GPU-accelerated application. This is a toy application which randomly generates two very large vectors and adds them. Print out the YAML configuration file needed to deploy the application:

In [None]:
# Set the configuration directory
CONFIG_DIR='/dli/task/kubernetes-config'

In [None]:
# Review the application we will deploy
!cat $CONFIG_DIR/gpu-pod.yaml

The main difference between a YAML file for a GPU-accelerated application and one for a non-GPU application is the declaration of the GPU resources required. In our case, we have created a basic configuration requesting a single NVIDIA GPU by setting `resources: limits: nvidia.com/gpu:` to 1.
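
To make the shape of such a request concrete, here is a minimal sketch that builds an equivalent Pod manifest as a Python dictionary and prints it as JSON (which `kubectl apply -f` also accepts). It is not the contents of `gpu-pod.yaml`: the container name, the image reference, and the `restartPolicy` are hypothetical placeholders chosen for illustration. Only the pod name and the `nvidia.com/gpu` limit come from this notebook.

```python
import json


def gpu_pod_manifest(name, image, gpu_count=1):
    """Build a minimal Pod manifest that requests `gpu_count` NVIDIA GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Assumption: a run-to-completion job that should not restart on success
            "restartPolicy": "OnFailure",
            "containers": [
                {
                    "name": "cuda-vector-add",  # hypothetical container name
                    "image": image,
                    # The GPU request: this is what makes the pod "GPU-accelerated"
                    "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
                }
            ],
        },
    }


# The image below is a placeholder, not the image used in this class
manifest = gpu_pod_manifest("gpu-operator-test", "example.com/cuda-vector-add:latest")
print(json.dumps(manifest, indent=2))
```

Removing the `resources` entry would turn this into an ordinary CPU-only pod, which is why that single field is the one to look for when reviewing the YAML.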

To deploy an application, execute the `kubectl apply` command, specifying the YAML configuration file with the `-f` file option.

In [None]:
# Deploy the application
!kubectl apply -f $CONFIG_DIR/gpu-pod.yaml

Once deployed, we can observe the status of a pod created with `kubectl get`:

In [None]:
# Get the status of the pod deployed
!kubectl get pods gpu-operator-test

At this stage, the application is in the "Pending" state. <br>
Why do you think this is the case? Is it just that we have not given the application enough time to launch, or are there other reasons for this behavior? Try executing the same command again to see if the status changes.

In [None]:
# Checking again. Is it still pending?
!kubectl get pods gpu-operator-test

The application is indeed in the "Pending" state, and it will stay that way no matter how long we wait. Why? Begin to answer this by looking at the configuration of the available nodes (in our case we have just one). In particular, look for any NVIDIA-specific configuration using the `kubectl describe` command, as this will help us identify GPU resources:

In [None]:
# Can we see the GPU?
!kubectl describe nodes

Can you find anything? Try again, filtering the output with `grep`:

In [None]:
# Let's look for the lines containing the word "nvidia"
!kubectl describe nodes | grep nvidia

We did not find anything, which explains why the application is still pending: our cluster is not aware of the GPU. The scheduler cannot place the pod because our YAML requests GPU resources that no node currently advertises. We need to add the NVIDIA GPU device plugin.
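
The same check can be done programmatically. The sketch below assumes node data in the shape returned by `kubectl get nodes -o json`, where each node lists its schedulable resources under `status.allocatable`. The node names and resource values in the sample are made up for illustration.

```python
import json


def allocatable_gpus(node_list):
    """Map node name -> allocatable nvidia.com/gpu count (0 if not advertised)."""
    result = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Kubernetes reports resource quantities as strings, e.g. "1"
        result[name] = int(allocatable.get("nvidia.com/gpu", "0"))
    return result


# Illustrative sample shaped like `kubectl get nodes -o json` (values are made up)
sample = json.loads("""
{"items": [
  {"metadata": {"name": "node-without-plugin"},
   "status": {"allocatable": {"cpu": "4", "memory": "16Gi"}}},
  {"metadata": {"name": "node-with-plugin"},
   "status": {"allocatable": {"cpu": "4", "memory": "16Gi", "nvidia.com/gpu": "1"}}}
]}
""")
print(allocatable_gpus(sample))
```

A node that reports zero allocatable GPUs cannot satisfy a pod whose limits include `nvidia.com/gpu`, so such a pod stays pending. To run this against the live cluster, the output of `kubectl get nodes -o json` could be captured and passed through `json.loads`.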

--- 
# 9.3 Add GPU Awareness to K8s
To take advantage of GPU acceleration on Kubernetes, install the [NVIDIA GPU plugin](https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#deploying-nvidia-gpu-device-plugin) on the cluster. Before adding it, look at the status without the plugin using `kubectl get`:

In [None]:
# Try to find the GPU device plugin. It should not be listed yet.
!kubectl get pods -A

To install the NVIDIA GPU plugin, we can use the Kubernetes package manager [Helm](https://helm.sh/).

In [None]:
# Install the device plugin with the Helm package manager
!helm repo add nvdp https://nvidia.github.io/k8s-device-plugin \
   && helm repo update
!helm install --version=0.9.0 --generate-name nvdp/nvidia-device-plugin

Check the status again to make sure the plugin was deployed:

In [None]:
# Now the device plugin "nvidia-device-plugin-*" should be "Running" after a "ContainerCreating" status
!kubectl get pods -A

We should now see the NVIDIA-specific configuration listed against the nodes:

In [None]:
# Now we should see allocatable GPUs
!kubectl describe nodes

In [None]:
# Let's look for the lines containing the word nvidia
!kubectl describe nodes | grep nvidia

Now that we have deployed the GPU device plugin, what do you think happened to our application?

In [None]:
# Let's check the application again
!kubectl get pods gpu-operator-test

Our application executed successfully when the GPU resources became available. In fact, it has now completed so we can have a look at its execution logs with `kubectl logs`:

In [None]:
# Let's look at the output
!kubectl logs gpu-operator-test

Check the list of Helm charts installed with the `helm list` command (see the [Helm documentation](https://helm.sh/docs/helm/helm_list/)). The `--filter` option allows filtering by name.  Use the `--output` option to specify the output format ("json", "table", or "yaml").  
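
As a sketch of how the JSON output can be consumed, the function below picks out the releases installed from a given chart. It assumes the text comes from `helm list --output json`, where each release carries `name` and `chart` fields. The sample release name and timestamp suffix are made up, since `--generate-name` produces a different name on every install.

```python
import json


def releases_for_chart(helm_list_json, chart_prefix):
    """Return the names of releases whose chart field starts with `chart_prefix`."""
    releases = json.loads(helm_list_json)
    return [r["name"] for r in releases if r.get("chart", "").startswith(chart_prefix)]


# Illustrative sample shaped like `helm list --output json` (values are made up)
sample_output = """
[
  {"name": "nvidia-device-plugin-1700000000", "namespace": "default",
   "status": "deployed", "chart": "nvidia-device-plugin-0.9.0"}
]
"""
print(releases_for_chart(sample_output, "nvidia-device-plugin"))
```

Knowing the generated release name is useful later, for example if the plugin needs to be removed with `helm uninstall`.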

Now, let's delete the Kubernetes pod `gpu-operator-test`:

In [None]:
# Let's delete the pod
!kubectl delete pod gpu-operator-test 

Congratulations! You deployed a GPU-accelerated application with Kubernetes. So far, we have requested a single GPU without specifying which GPU we want.

--- 
# 9.4 Interact with GPU Resources in K8s

Now, let's see how to get more control over the GPU-accelerated cluster. Being able to select the GPU type, or the MIG ([Multi-Instance GPU](https://www.nvidia.com/en-us/technologies/multi-instance-gpu/)) partition on an Ampere GPU, is very important because GPUs vary in computational capability, memory, and cost. MIG allows users to partition a GPU into as many as seven instances (on the A100). This gives more granular control over the resources in the cluster and better application isolation.

In order to control the GPU type, we'll add the `gpu-feature-discovery` plugin and deploy it with Helm. This plugin can be configured with several options, as described in the [gpu-feature-discovery repository](https://github.com/NVIDIA/gpu-feature-discovery#deployment-via-helm). One of the most interesting options when working with Ampere GPUs is the ability to support MIG partitions. The feature discovery plugin can be deployed with the following configurable features:


|Feature|Description|Default|
|-|-|-|
|`failOnInitError`|Fail if there is an error during initialization of any label sources|"true"|
|`sleepInterval`|Time to sleep between labeling|"60s"|
|`migStrategy`|Pass the desired strategy for labeling MIG devices on GPUs that support it (`none`, `single`, or `mixed`)|"none"|
|`nfd.deploy`|When set to true, deploy NFD as a subchart with all of the proper parameters set for it|"true"|
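
Options like these are typically passed to Helm with `--set key=value`. The following sketch assembles such a command line as an argument list. The helper name `helm_install_args` is hypothetical, and the option values shown are examples rather than recommendations. The chart name and version are the ones used for the install in this notebook.

```python
def helm_install_args(chart, version, values=None):
    """Assemble a `helm install` argument list with one --set flag per option."""
    args = ["helm", "install", f"--version={version}", "--generate-name"]
    # Sort the keys so the generated command is stable between runs
    for key, value in sorted((values or {}).items()):
        args += ["--set", f"{key}={value}"]
    args.append(chart)
    return args


# Example values only; on a non-MIG GPU the default migStrategy of "none" is appropriate
args = helm_install_args(
    "nvgfd/gpu-feature-discovery",
    "0.4.1",
    {"migStrategy": "mixed", "sleepInterval": "30s"},
)
print(" ".join(args))
```

Building the command as a list rather than a single string avoids quoting problems if it is later handed to `subprocess.run`.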

In this class, we are not using Ampere GPUs, so we will do a simple install:

In [None]:
!helm repo add nvgfd https://nvidia.github.io/gpu-feature-discovery \
    && helm repo update
!helm install \
    --version=0.4.1 \
    --generate-name \
    nvgfd/gpu-feature-discovery

Let's look at additional information that we have about our system:

In [None]:
# Looking for all of the NVIDIA related information
!kubectl describe nodes | grep "nvidia.com" -A 15

You should see a wide range of GPU-specific information, including the driver and CUDA information, as well as which GPU is in use from `nvidia.com/gpu.product`.
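
If you prefer structured data over `grep`, the labels can also be read from the node object itself. This sketch assumes a single node in the shape of one entry from `kubectl get nodes -o json`, with labels under `metadata.labels`. The sample label values are illustrative, not taken from the class machine.

```python
def nvidia_labels(node):
    """Return only the node labels in the nvidia.com/ namespace."""
    labels = node.get("metadata", {}).get("labels", {})
    return {key: value for key, value in labels.items() if key.startswith("nvidia.com/")}


# Illustrative node object (label values are made up)
sample_node = {
    "metadata": {
        "name": "example-node",
        "labels": {
            "kubernetes.io/hostname": "example-node",
            "nvidia.com/gpu.product": "Tesla-T4",
            "nvidia.com/gpu.count": "1",
        },
    }
}
print(nvidia_labels(sample_node))
```

Labels such as `nvidia.com/gpu.product` are what make it possible to target a specific GPU model when scheduling a pod.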

This is probably a Tesla-T4, unless you are running the class on an alternative GPU. Recall that we deployed our test application `gpu-operator-test` with a generic "GPU".  It is possible to deploy it with more specific information regarding the GPU. 

A new YAML file, `gpu-pod-T4.yaml`, is already prepared. Let's inspect it first:

In [None]:
# Review the application we are deploying
!cat $CONFIG_DIR/gpu-pod-T4.yaml

As you might have noticed, the YAML is configured to deploy on an A100 GPU, which is not available in this class. Go ahead and deploy the application anyway.

In [None]:
!kubectl apply -f $CONFIG_DIR/gpu-pod-T4.yaml

In [None]:
!kubectl get pods gpu-operator-test-a100

Just as we saw earlier, before the cluster was GPU-aware, the pod is in the "Pending" state, and it will remain in this state until an A100 GPU becomes available or the pod is deleted.
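
The reason can be sketched in a few lines. This example assumes the manifest pins the GPU model with a `nodeSelector` on the `nvidia.com/gpu.product` label, which is an assumption about how the YAML is written. The A100 product string below is a hypothetical value. A node is eligible only if every key in the selector matches one of its labels exactly.

```python
def selector_matches(node_labels, node_selector):
    """Return True if every selector key/value pair is present in the node labels."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Labels of the single node in this class, as reported by gpu-feature-discovery
node_labels = {"nvidia.com/gpu.product": "Tesla-T4"}

a100_selector = {"nvidia.com/gpu.product": "A100-SXM4-40GB"}  # hypothetical product string
t4_selector = {"nvidia.com/gpu.product": "Tesla-T4"}

print(selector_matches(node_labels, a100_selector))  # no eligible node: pod stays Pending
print(selector_matches(node_labels, t4_selector))    # node is eligible: pod can be scheduled
```

Because the match is an exact string comparison, the value in the manifest must be spelled exactly as it appears in the node's label.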

## 9.4.1 Exercise: Configure Pod

Modify the YAML file and deploy the `gpu-operator-test` application on the correct GPU.
Open the [gpu-pod-T4.yaml](kubernetes-config/gpu-pod-T4.yaml) config file and make these changes:
* Change the pod name to "gpu-operator-test-t4"
* Set the GPU product to "Tesla-T4" instead of the A100

Check your work against the [solution](solutions/ex9.4.1.yaml) before moving on:

In [None]:
# TODO: modify gpu-pod-T4.yaml, then run this cell to verify that your changes are correct
# Check your work - you'll get no output if the files match
!diff $CONFIG_DIR/gpu-pod-T4.yaml solutions/ex9.4.1.yaml

Next, deploy the `gpu-operator-test-t4` pod using the modified [gpu-pod-T4.yaml](kubernetes-config/gpu-pod-T4.yaml).

In [None]:
!kubectl apply -f $CONFIG_DIR/gpu-pod-T4.yaml

## 9.4.2 Final Checks and Shutdown
It might take a few seconds, but the application should deploy and finish successfully.
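
Rather than re-running the status cell by hand, the wait can be automated. This is a minimal sketch that assumes pod data in the shape of `kubectl get pod <name> -o json`, where `status.phase` is one of `Pending`, `Running`, `Succeeded`, `Failed`, or `Unknown`. The `fetch_pod` callable is a hypothetical hook for whatever retrieves that JSON, and here it is replaced by a fake that steps through three phases.

```python
import time

TERMINAL_PHASES = ("Succeeded", "Failed")


def wait_until_finished(fetch_pod, timeout_s=60.0, interval_s=2.0):
    """Poll `fetch_pod()` until the pod reaches a terminal phase, then return it."""
    deadline = time.monotonic() + timeout_s
    while True:
        phase = fetch_pod().get("status", {}).get("phase", "Unknown")
        if phase in TERMINAL_PHASES:
            return phase
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pod still in phase {phase!r} after {timeout_s}s")
        time.sleep(interval_s)


# Fake fetcher standing in for a real kubectl call (phases are simulated)
phases = iter(["Pending", "Running", "Succeeded"])
final_phase = wait_until_finished(lambda: {"status": {"phase": next(phases)}}, interval_s=0)
print(final_phase)
```

Note that `kubectl get pods` displays a pod in the `Succeeded` phase with the status `Completed`.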

In [None]:
# Get the status of the pod deployed
!kubectl get pods gpu-operator-test-t4

In [None]:
# Let's look at the output
!kubectl logs gpu-operator-test-t4

### 9.4.2.1 Exercise: Delete a Pod

Delete the Kubernetes pod `gpu-operator-test-t4`. Check the [solution](solutions/ex9.4.2.ipynb) before moving on:

In [None]:
# TODO delete the pod
!kubectl ??

Before moving forward to the next notebook, shut down K8s and clean up the docker environment.

In [None]:
# Shut down K8s
!minikube delete
# Shut down running docker containers
!docker kill $(docker ps -q)
# Check for clean environment - this should be empty
!docker ps

---
<h2 style="color:green;">Congratulations!</h2>

In this notebook, you have:
- Launched a K8s cluster
- Interacted with K8s using `kubectl`
- Installed plugins with Helm
- Enabled GPU acceleration and GPU feature discovery
- Deployed an application

Next, you'll monitor activity on the cluster. Move on to [Monitoring GPU within Kubernetes Cluster](010_K8s_Monitor.ipynb).
