# Level 3: Deploy a new GPU workload

Now that we have a new GPU node, let's safely isolate a new workload on it.

We talked about <a href="https://github.com/odh-labs/rhoai-roadshow/blob/main/site/docs/5-gpuaas/notebooks/Level1_add_gpu_node.ipynb" target="_blank">GPU Concurrency</a> earlier on. Now we want to create a new workload on our new GPU node.

If you recall, the new AWS instance type had the following capacity:

| Instance Name | vCPUs | Memory (GiB) | NVIDIA A10 GPU | GPU Memory (GiB) | Network Bandwidth (Gbps) | EBS Bandwidth (Gbps) |
|---------------|-------|--------------|----------------|------------------|--------------------------|----------------------|
| g5.xlarge     | 4     | 16           | 1              | 24               | 10                       | 3.5                  |

So, we are quite limited in terms of vCPU, RAM and GPU.

We want to configure our new GPU node for our marketing department.

They want to do image generation using a model.

We have <a href="https://github.com/odh-labs/rhoai-roadshow/blob/main/site/docs/5-gpuaas/notebooks/Level2_gpu_operator.ipynb" target="_blank">configured our GPU</a> for use. However, given the limited resources, what else can we do to ensure that only our marketing department's applications make use of this resource?

## Workload Scheduling

There are many levels of isolation within OpenShift. Common patterns for separating tenant workloads include:

- Give each tenant their own OpenShift cluster (this has become a lot easier with [OpenShift Hosted Control Planes](https://www.redhat.com/en/topics/containers/what-are-hosted-control-planes))
- Use OpenShift's projects and namespaces - this leverages a cluster's Role Based Access Control (RBAC)

We also need to ensure that workloads land on the nodes we want. 

To do this we are going to make use of node selectors, taints and tolerations and network policy.

We will explain what these are and how to use them as we go through this notebook.

First, log in to OpenShift.

In [1]:
!oc login -u admin -p ${ADMIN_PASSWORD} --server=https://api.sno.${BASE_DOMAIN}:6443 --insecure-skip-tls-verify


Login successful.

You have access to 106 projects, the list has been suppressed. You can list all projects with 'oc projects'

Using project "agent-demo".
Welcome! See 'oc help' to get started.


We should see that both our SNO node, and extra GPU node are running OK.

In [2]:
!oc get machines.machine.openshift.io -A

NAMESPACE               NAME                                    PHASE     TYPE         REGION      ZONE         AGE
openshift-machine-api   sno-5dqmr-master-0                      Running   g6.8xlarge   us-east-2   us-east-2a   4d19h
openshift-machine-api   sno-5dqmr-worker-us-east-2a-gpu-kcx4p   Running   g5.xlarge    us-east-2   us-east-2a   32m


To talk about taints and tolerations in OpenShift/Kubernetes, we first need a basic understanding of the term `node affinity`.

Node affinity attracts pods to a specific set of nodes.

This can work as a hard requirement or merely a scheduling preference.

A **taint** works in the opposite way to this - it is used to repel a given set of pods from a node.

You can apply one or more taints to a node. 

That way, you mark that the node shouldn’t accept any pods that happen not to **tolerate** these taints.

So **tolerations** then are applied to pods and let the Kubernetes scheduler schedule pods on nodes with matching taints.

A toleration allows scheduling but doesn’t _guarantee_ it. That’s because the scheduler takes into account other parameters as well.

So, let's check out our nodes for any **taints** they may have:

In [3]:
!oc get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints --no-headers

ip-10-0-15-75.us-east-2.compute.internal    <none>
ip-10-0-29-181.us-east-2.compute.internal   <none>


OK, so currently the nodes have **none** - no taints. A node can have one or more taints.

Now we want to use taints to repel pods from the g5.xlarge A10 GPU node.

Taints have a key, value, and effect.

- The effect determines how the taint affects pod scheduling:
  - `NoSchedule`: The Kubernetes scheduler will only allow pods with a matching toleration to be scheduled on the node. 
  - `PreferNoSchedule`: The scheduler will try to avoid scheduling pods without a matching toleration, but it's not a hard requirement. 
  - `NoExecute`: New pods without a matching toleration are not scheduled on the node, and pods already running there without one are evicted.
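
The rule for when a toleration matches a taint can be sketched in a few lines of Python. This is a simplified illustration of the matching rule described in the Kubernetes documentation, not the scheduler's actual code. The function name `tolerates` and the plain-dict representation are assumptions made for this example, and the sample taint uses the key and value that this notebook applies in the next step.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint (simplified)."""
    # A toleration with no effect matches taints of every effect.
    effect = toleration.get("effect")
    if effect and effect != taint.get("effect"):
        return False

    operator = toleration.get("operator", "Equal")  # Equal is the default operator
    key = toleration.get("key")

    if operator == "Exists":
        # Exists ignores the value; an empty key matches every taint key.
        return not key or key == taint.get("key")

    # Equal: both key and value must be identical.
    return key == taint.get("key") and toleration.get("value", "") == taint.get("value", "")


taint = {"key": "gpu", "value": "NVIDIA-A10G-SHARED", "effect": "PreferNoSchedule"}
matching = {"key": "gpu", "operator": "Equal", "value": "NVIDIA-A10G-SHARED", "effect": "PreferNoSchedule"}
wrong_value = {"key": "gpu", "operator": "Equal", "value": "NVIDIA-L4", "effect": "PreferNoSchedule"}

print(tolerates(matching, taint))     # True
print(tolerates(wrong_value, taint))  # False
```

A pod is only treated as tolerating a node when every taint on that node is matched by at least one of the pod's tolerations.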

Let's create a taint for our A10 GPU node. We will use the `key: gpu`, `value: NVIDIA-A10G-SHARED` and an effect of `PreferNoSchedule`.

In [4]:
!oc adm taint nodes -l nvidia.com/gpu.product=NVIDIA-A10G-SHARED gpu=NVIDIA-A10G-SHARED:PreferNoSchedule

node/ip-10-0-15-75.us-east-2.compute.internal tainted


Great, let's check the nodes again for **taints**:

In [5]:
!oc get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints --no-headers

ip-10-0-15-75.us-east-2.compute.internal    [map[effect:PreferNoSchedule key:gpu value:NVIDIA-A10G-SHARED]]
ip-10-0-29-181.us-east-2.compute.internal   <none>


Nice, our node taint is there.
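
The `custom-columns` output above prints taints as Go maps, which is awkward to consume from a script. As a rough sketch, the same information can be pulled out of the JSON form of the node list (for example the output of `oc get nodes -o json`). The helper name `taints_by_node` is hypothetical, and the sample dict below is a hand-trimmed stand-in for real cluster output.

```python
def taints_by_node(node_list: dict) -> dict:
    """Map each node name to a list of 'key=value:effect' strings."""
    result = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        # Untainted nodes have no `taints` field under `spec` at all.
        taints = node.get("spec", {}).get("taints", [])
        result[name] = [f"{t['key']}={t.get('value', '')}:{t['effect']}" for t in taints]
    return result


# Hand-trimmed sample shaped like `oc get nodes -o json` (only the fields used here).
sample = {
    "items": [
        {
            "metadata": {"name": "ip-10-0-15-75.us-east-2.compute.internal"},
            "spec": {"taints": [{"key": "gpu", "value": "NVIDIA-A10G-SHARED", "effect": "PreferNoSchedule"}]},
        },
        {
            "metadata": {"name": "ip-10-0-29-181.us-east-2.compute.internal"},
            "spec": {},
        },
    ]
}

for name, taints in taints_by_node(sample).items():
    print(name, taints or "<none>")
```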

In the <a href="https://github.com/odh-labs/rhoai-roadshow/blob/main/site/docs/5-gpuaas/notebooks/Level2_gpu_operator.ipynb" target="_blank">previous notebook</a> we configured a `HardwareProfile` for our GPU. Let's update that with a matching **toleration**, i.e. any notebook that uses this hardware profile will tolerate the node taint.

In [6]:
%%bash
oc apply -f- << EOF
apiVersion: dashboard.opendatahub.io/v1alpha1
kind: HardwareProfile
metadata:
  annotations:
    opendatahub.io/dashboard-feature-visibility: '[]'
  name: nvidia-a10-shared
  namespace: redhat-ods-applications
spec:
  description: ""
  displayName: Nvidia A10 (Shared)
  enabled: true
  identifiers:
  - defaultCount: 2
    displayName: CPU
    identifier: cpu
    maxCount: 4
    minCount: 1
    resourceType: CPU
  - defaultCount: 4Gi
    displayName: Memory
    identifier: memory
    maxCount: 8Gi
    minCount: 2Gi
    resourceType: Memory
  - defaultCount: 1
    displayName: nvidia.com/gpu
    identifier: nvidia.com/gpu
    minCount: 1
    resourceType: Accelerator
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-A10G-SHARED
  tolerations:
    - effect: PreferNoSchedule
      operator: Equal
      key: gpu
      value: NVIDIA-A10G-SHARED
EOF

hardwareprofile.dashboard.opendatahub.io/nvidia-a10-shared configured
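
To make the `minCount` and `maxCount` fields in the profile more concrete, here is a minimal sketch of how a requested size could be checked against one identifier. It is an illustration only, not how the dashboard validates requests. The helper names are hypothetical, only binary suffixes such as `Gi` are parsed, and treating a missing `maxCount` as "no upper bound" is an assumption.

```python
# Binary suffixes used by Kubernetes-style quantities (a subset, for illustration).
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_number(quantity) -> float:
    """Convert an int or a binary-suffixed string such as '4Gi' to a plain number."""
    text = str(quantity)
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * factor
    return float(text)


def within_profile(identifier: dict, requested) -> bool:
    """Check a requested amount against an identifier's minCount/maxCount."""
    value = to_number(requested)
    if value < to_number(identifier["minCount"]):
        return False
    max_count = identifier.get("maxCount")  # assumption: missing means unbounded
    return max_count is None or value <= to_number(max_count)


# Values copied from the HardwareProfile above.
memory = {"identifier": "memory", "minCount": "2Gi", "maxCount": "8Gi"}
gpu = {"identifier": "nvidia.com/gpu", "minCount": 1}

print(within_profile(memory, "4Gi"))   # True
print(within_profile(memory, "12Gi"))  # False
print(within_profile(gpu, 1))          # True
```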


Now we can create some workloads that make use of these node affinity settings.

It is common to use taints and tolerations along with a `nodeSelector` (which attracts pods to nodes). 

This way we can be sure that the workload lands on the A10 GPU node, whilst pods without the toleration are steered away from it (with `PreferNoSchedule` this is a scheduling preference rather than a hard rule).
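
As a small illustration of the `nodeSelector` half of this pairing: a node is only a candidate when every label in the selector is present on the node with the same value. The sketch below assumes plain dicts for labels, and the label value on the second node is hypothetical.

```python
def selector_matches(node_labels: dict, node_selector: dict) -> bool:
    """True when every selector label is present on the node with the same value."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


selector = {
    "nvidia.com/gpu.present": "true",
    "nvidia.com/gpu.product": "NVIDIA-A10G-SHARED",
}

a10_node = {
    "nvidia.com/gpu.present": "true",
    "nvidia.com/gpu.product": "NVIDIA-A10G-SHARED",
}
# Hypothetical labels for a node with a different GPU product.
other_node = {
    "nvidia.com/gpu.present": "true",
    "nvidia.com/gpu.product": "NVIDIA-L4",
}

print(selector_matches(a10_node, selector))    # True
print(selector_matches(other_node, selector))  # False
```

Unlike a toleration, which only permits scheduling, a `nodeSelector` is a hard requirement: a pod stays `Pending` if no node carries all of the requested labels.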

We can run a single pod that has CUDA libraries loaded for NVIDIA (this is a vLLM serving pod we are using for one of our LLMs already).

We specify the command to be `sleep inf` i.e. the pod waits forever doing nothing, as well as setting resource limits and our node affinity settings.

In [7]:
%%bash
oc -n agent-demo run tools --image=quay.io/eformat/vllm:latest-bnb --overrides='
{
"apiVersion": "v1",
"kind": "Pod",
"spec": {
  "containers": [
    {
      "name": "kserve-container",
      "image": "quay.io/eformat/vllm:latest-bnb",
      "command": ["/bin/bash", "-c", "sleep inf"],
      "resources": {
        "limits": {
          "nvidia.com/gpu": "1"
        },
        "requests": {
          "nvidia.com/gpu": "1"
        }
      }
    }
  ],
  "nodeSelector": {
    "nvidia.com/gpu.present": "true",
    "nvidia.com/gpu.product": "NVIDIA-A10G-SHARED"
  },
  "tolerations": [
    {
      "effect": "PreferNoSchedule",
      "operator": "Equal",
      "key": "gpu",
      "value": "NVIDIA-A10G-SHARED"
    }
  ]
}}
'


pod/tools created


Let's check the pod ran OK and is scheduled on the right node with the A10 GPU. It may take a minute or two to pull the image to the Node. 

In [8]:
!oc -n agent-demo get pods tools -o wide

NAME    READY   STATUS    RESTARTS   AGE     IP            NODE                                       NOMINATED NODE   READINESS GATES
tools   1/1     Running   0          7m33s   10.129.0.27   ip-10-0-15-75.us-east-2.compute.internal   <none>           <none>


Great. As a further check, we could `oc rsh tools` into the pod (or use the `Terminal` in the OpenShift console) and run `nvtop`,
or we can check the output of the `nvidia-smi` command:

In [9]:
!oc -n agent-demo exec tools -- nvidia-smi

Sun Jun 29 23:25:21 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.148.08             Driver Version: 570.148.08     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A10G                    On  |   00000000:00:1E.0 Off |                    0 |
|  0%   35C    P8             24W /  300W |       0MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

We see the `NVIDIA A10G` listed - so we know the pod can see the GPU OK.
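
If you would rather check this from a script than read the table, `nvidia-smi` can also emit CSV, for example with `--query-gpu=name,memory.total,memory.used --format=csv,noheader,nounits`. The following sketch parses one such line. The function name is hypothetical, and the sample line is written by hand from the values in the table above rather than captured from the pod.

```python
def parse_gpu_line(line: str) -> dict:
    """Parse one CSV line of 'name, memory.total, memory.used' (MiB, no units)."""
    name, total, used = [field.strip() for field in line.split(",")]
    return {
        "name": name,
        "memory_total_mib": int(total),
        "memory_used_mib": int(used),
    }


# Hand-written sample using the values shown in the nvidia-smi table above.
sample = "NVIDIA A10G, 23028, 0"

gpu = parse_gpu_line(sample)
print(gpu["name"], gpu["memory_total_mib"] - gpu["memory_used_mib"], "MiB free")
```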

## Modify the workbench

We are going to modify the workbench so it runs on our new GPU node.

From the OpenShift web console - stop (or recreate) the `gpuaas` notebook, and assign the `Nvidia A10 (Shared)` Hardware Profile. 

![images/gpuaas-a10-workbench.png](images/gpuaas-a10-workbench.png)

Now (re)start the workbench (I know, you are probably reading this in that workbench !!)

Because we correctly set up node affinity for our HardwareProfile, you should see the workbench pod correctly assigned to your A10 GPU node.

In [3]:
!oc get pods gpuaas-0 -o wide

NAME       READY   STATUS    RESTARTS   AGE     IP            NODE                                       NOMINATED NODE   READINESS GATES
gpuaas-0   2/2     Running   0          4m29s   10.129.0.28   ip-10-0-15-75.us-east-2.compute.internal   <none>           <none>


## Let's have some fun generating Images

Our marketing department are creative cats 🐈 and love to use AI to generate images for their marketing campaigns.

They want to use the open source [ComfyUI](https://www.comfy.org/) tool. So let's try it out now in our notebook.

Clone the following repo.

In [4]:
!git clone https://github.com/comfyanonymous/ComfyUI

Cloning into 'ComfyUI'...
remote: Enumerating objects: 20719, done.
remote: Total 20719 (delta 0), reused 0 (delta 0), pack-reused 20719 (from 1)
Receiving objects: 100% (20719/20719), 70.86 MiB | 31.56 MiB/s, done.
Resolving deltas: 100% (13815/13815), done.


Change directories to `ComfyUI`

In [5]:
%cd ComfyUI

/opt/app-root/src/rhoai-roadshow/site/docs/5-gpuaas/notebooks/ComfyUI


Now install the Python dependencies ComfyUI needs to run.

In [6]:
!pip install xformers!=0.0.18 -r requirements.txt

Collecting xformers!=0.0.18
  Downloading xformers-0.0.31-cp39-abi3-manylinux_2_28_x86_64.whl.metadata (1.0 kB)
Collecting comfyui-frontend-package==1.23.4 (from -r requirements.txt (line 1))
  Downloading comfyui_frontend_package-1.23.4-py3-none-any.whl.metadata (117 bytes)
Collecting comfyui-workflow-templates==0.1.30 (from -r requirements.txt (line 2))
  Downloading comfyui_workflow_templates-0.1.30-py3-none-any.whl.metadata (55 kB)
Collecting comfyui-embedded-docs==0.2.3 (from -r requirements.txt (line 3))
  Downloading comfyui_embedded_docs-0.2.3-py3-none-any.whl.metadata (2.9 kB)
Collecting torch (from -r requirements.txt (line 4))
  Downloading torch-2.7.1-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (29 kB)
Collecting torchsde (from -r requirements.txt (line 5))
  Downloading torchsde-0.2.6-py3-none-any.whl.metadata (5.3 kB)
Collecting torchvision (from -r requirements.txt (line 6))
  Downloading torchvision-0.22.1-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (6.1 kB)
Colle

We need to grab a diffusion model that can generate images. There are a lot of them out there.

A nice list of models can be seen in the [Colab notebook](https://github.com/comfyanonymous/ComfyUI/blob/master/notebooks/comfyui_colab.ipynb) that is part of the ComfyUI repo.

We will use the basic Stable Diffusion model from [**stabilityai**](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) which has a permissive license for usage.

Let's download it locally (it's about 6.5 GB in size).
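
Before starting a download of this size, it can be worth confirming that the workbench storage has room for it. A minimal sketch using the standard library follows. The helper name `has_room` and the 10% headroom factor are assumptions for illustration.

```python
import shutil


def has_room(path: str, needed_bytes: int, headroom: float = 1.1) -> bool:
    """True if the filesystem holding `path` has space for the file plus some headroom."""
    free = shutil.disk_usage(path).free
    return free >= needed_bytes * headroom


# Approximate size of the checkpoint file mentioned above.
MODEL_SIZE = int(6.5 * 1000**3)

if has_room(".", MODEL_SIZE):
    print("Enough free space for the model download.")
else:
    print("Not enough free space; consider a larger workbench volume.")
```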

In [7]:
!wget -c https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors -P ./models/checkpoints/

--2025-06-29 23:35:40--  https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors
Resolving huggingface.co (huggingface.co)... 3.160.5.25, 3.160.5.76, 3.160.5.109, ...
Connecting to huggingface.co (huggingface.co)|3.160.5.25|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cas-bridge.xethub.hf.co/xet-bridge-us/64bfcd5ff462a99a04fd1ec8/3d6f740fa52572e1071b8ecb7c5f8a8e2cbef596a51121102877bd9900078891?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=cas%2F20250629%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250629T233540Z&X-Amz-Expires=3600&X-Amz-Signature=4f40ffaa8fb62381bad996a9328fb9ab92f8f961bc73815d9d3082d59187c340&X-Amz-SignedHeaders=host&X-Xet-Cas-Uid=public&response-content-disposition=inline%3B+filename*%3DUTF-8%27%27sd_xl_base_1.0.safetensors%3B+filename%3D%22sd_xl_base_1.0.safetensors%22%3B&x-id=GetObject&Expires=1751243740&Policy=eyJTdGF0ZW1lbnQiOlt7IkNv

Before we run ComfyUI we need to install the `localtunnel` nodejs package so we can connect to the user interface.

In [8]:
!npm install localtunnel

added 22 packages in 2s

3 packages are looking for funding
  run `npm fund` for details

OK, we are nearly there. We will create a Kubernetes `Service` that targets the port in this workbench on which ComfyUI will run.

In [9]:
%%bash
oc apply -f- << EOF
kind: Service
apiVersion: v1
metadata:
  name: comfy
  namespace: agent-demo
spec:
  ipFamilies:
    - IPv4
  ports:
    - protocol: TCP
      port: 8188
      targetPort: 8188
  internalTrafficPolicy: Cluster
  type: ClusterIP
  ipFamilyPolicy: SingleStack
  sessionAffinity: None
  selector:
    statefulset: gpuaas
EOF

service/comfy unchanged


And in OpenShift, we can expose this Service externally using a `Route`.

In [10]:
%%bash
oc apply -f- << EOF
kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: comfy
  namespace: agent-demo
spec:
  to:
    kind: Service
    name: comfy
    weight: 100
  port:
    targetPort: 8188
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect
  wildcardPolicy: None
EOF

route.route.openshift.io/comfy unchanged


Because the Data Science project already has some `NetworkPolicy` defined, we must also allow all `ingress` traffic to our workbench pod on the port we are going to use.

In [11]:
%%bash
oc apply -f- << EOF
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: gpuaas-comfy
  namespace: agent-demo
spec:
  podSelector:
    matchLabels:
      notebook-name: gpuaas
  ingress:
    - ports:
        - protocol: TCP
          port: 8188
      from:
        - namespaceSelector:
            matchLabels:
              network.openshift.io/policy-group: ingress
  policyTypes:
    - Ingress
EOF

networkpolicy.networking.k8s.io/gpuaas-comfy unchanged


We can check the Route URL in the OpenShift console, or directly from the cli.

This is where we will connect using our web browser after we start ComfyUI.

In [12]:
!oc -n agent-demo get route comfy

NAME    HOST/PORT                                           PATH   SERVICES   PORT   TERMINATION     WILDCARD
comfy   comfy-agent-demo.apps.sno.sandbox2964.opentlc.com          comfy      8188   edge/Redirect   None
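
If you want the URL in a script rather than reading it from the table, one option is to fetch the Route as JSON (for example with `oc -n agent-demo get route comfy -o json`) and build the URL from its `spec`. The sketch below shows only the URL-building step. The function name is hypothetical, and the sample dict is a hand-trimmed stand-in for the real object.

```python
def route_url(route: dict) -> str:
    """Build the external URL for a Route from its spec."""
    spec = route["spec"]
    # A Route with a `tls` section is served over HTTPS by the router.
    scheme = "https" if spec.get("tls") else "http"
    return f"{scheme}://{spec['host']}{spec.get('path', '')}"


# Hand-trimmed sample using the host shown in the output above.
sample = {
    "spec": {
        "host": "comfy-agent-demo.apps.sno.sandbox2964.opentlc.com",
        "tls": {"termination": "edge", "insecureEdgeTerminationPolicy": "Redirect"},
    }
}

print(route_url(sample))  # https://comfy-agent-demo.apps.sno.sandbox2964.opentlc.com
```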


OK, using a bit of Python, we can start up ComfyUI.

This cell will continue running until you stop or restart it using the notebook controls.

In [None]:
import socket
import subprocess
import threading
import time
import urllib.request

def iframe_thread(port):
  # Wait until ComfyUI is accepting connections on the local port.
  while True:
      time.sleep(0.5)
      sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      result = sock.connect_ex(('127.0.0.1', port))
      sock.close()
      if result == 0:
        break
  print("\nComfyUI finished loading, trying to launch localtunnel (if it gets stuck here localtunnel is having issues)\n")

  # localtunnel asks for the public IP of this host as its password.
  print("The password/endpoint ip for localtunnel is:", urllib.request.urlopen('https://ipv4.icanhazip.com').read().decode('utf8').strip("\n"))
  p = subprocess.Popen(["lt", "--port", "{}".format(port)], stdout=subprocess.PIPE)
  for line in p.stdout:
    print(line.decode(), end='')


# Run the readiness check in the background; the command below blocks while ComfyUI runs.
threading.Thread(target=iframe_thread, daemon=True, args=(8188,)).start()

!python main.py --listen 0.0.0.0 --port 8188 # --dont-print-server

Checkpoint files will always be loaded safely.
Total VRAM 22599 MB, total RAM 15803 MB
pytorch version: 2.7.1+cu126
xformers version: 0.0.31
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA A10G : cudaMallocAsync
Using xformers attention
Python version: 3.11.11 (main, Feb 10 2025, 00:00:00) [GCC 11.5.0 20240719 (Red Hat 11.5.0-5)]
ComfyUI version: 0.3.43
****** User settings have been changed to be stored on the server instead of browser storage. ******
****** For multi-user setups add the --multi-user CLI argument to enable multiple user profiles. ******
ComfyUI frontend version: 1.23.4
[Prompt Server] web root: /opt/app-root/lib64/python3.11/site-packages/comfyui_frontend_package/static

Import times for custom nodes:
   0.0 seconds: /opt/app-root/src/rhoai-roadshow/site/docs/5-gpuaas/notebooks/ComfyUI/custom_nodes/websocket_image_save.py

Context impl SQLiteImpl.
Will assume non-transactional DDL.
No target revision found.
Starting server

To see the GUI go to: http://0.0.0.0:81

Open up the ComfyUI (in this example https://comfy-agent-demo.apps.sno.sandbox2964.opentlc.com) using the Route URL from above. It should look something like this.

![images/comfyui.png](images/comfyui.png)

If you do not see a workflow, you can easily create one by **dragging and dropping** this image - `images/ComfyUI_00005_.png` - into the workflow webpage.

Set the `Load Checkpoint` to be the safetensor model `sd_xl_base_1.0.safetensors` we downloaded earlier:

![images/comfyui-checkpoint.png](images/comfyui-checkpoint.png)

You can also change the `prompt` used to generate the image.

![images/comfyui-prompt.png](images/comfyui-prompt.png)

If you hit the `Run` button, ComfyUI will load the prompt and the model, then generate an image. This may take a minute or two for the first run (subsequent runs should be quicker).

You can check the GPU is in use by running `nvtop` from the `tools` pod Terminal.

![images/nvtop-comfyui.png](images/nvtop-comfyui.png)

The ComfyUI workflow should complete and output an image to the `ComfyUI/output` folder.

![images/comfyui-success.png](images/comfyui-success.png)

If the Python kernel dies, it may be that your pod needs more RAM - check the metrics in OpenShift to find out.

<div class="alert alert-block alert-success">
<b>Success:</b> Our marketing department can generate images for their new marketing campaign !!
</div>