# Kueue Knowledge Share Demo
This demo shows how Kueue is used within the CodeFlare SDK.

### Requirements
* RHOAI installed on a ROSA cluster (this should install KubeRay and Kueue).
* Nvidia GPU Operator installed.
* Node Feature Discovery Operator installed.
* Accelerator profile CR set appropriately for `nvidia.com/gpu`.
* Additional relevant CRs for the above (ClusterPolicy etc.; see the GPU doc if needed).
* A `p3.2xlarge` machine pool with its node count set to 3 on the cluster.

In [None]:
# Import pieces from codeflare-sdk
from codeflare_sdk import Cluster, ClusterConfiguration, TokenAuthentication

In [None]:
# Create authentication object for user permissions using oc credentials
auth = TokenAuthentication(
    token = "",
    server = "",
    skip_tls=False
)
auth.login()

We'll define a RayCluster using the SDK as normal. Note that we've commented out the local_queue value. By default, the SDK will try to locate the name of your default local queue based on the annotation: `"kueue.x-k8s.io/default-queue": "true"` unless you specify otherwise.
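
To make the annotation-based lookup concrete, here is a minimal sketch of how a default local queue could be selected from a list of LocalQueue objects. It works on plain dictionaries and is not the SDK's actual implementation; the helper name `find_default_local_queue` and the sample queue `team-a` are hypothetical.

```python
from typing import Optional

DEFAULT_QUEUE_ANNOTATION = "kueue.x-k8s.io/default-queue"


def find_default_local_queue(local_queues: list[dict]) -> Optional[str]:
    """Return the name of the first LocalQueue annotated as default, else None."""
    for lq in local_queues:
        metadata = lq.get("metadata", {})
        annotations = metadata.get("annotations") or {}
        if annotations.get(DEFAULT_QUEUE_ANNOTATION) == "true":
            return metadata.get("name")
    return None


# Hypothetical LocalQueue objects, shaped like the CRs shown in this demo
queues = [
    {"metadata": {"name": "team-a", "annotations": {}}},
    {"metadata": {"name": "default", "annotations": {DEFAULT_QUEUE_ANNOTATION: "true"}}},
]
print(find_default_local_queue(queues))
```

If no queue carries the annotation, the helper returns `None`, which corresponds to the case where you would pass `local_queue` explicitly.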

If you don't have the relevant CRs you can create instances using the following:

### CRs Required by Kueue
___
**ClusterQueue CR**
```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: default
spec:
  namespaceSelector: {}
  resourceGroups:
    - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
      flavors:
        - name: "default"
          resources:
            - name: "cpu"
              nominalQuota: 2
            - name: "memory"
              nominalQuota: 32Gi
            - name: "nvidia.com/gpu"
              nominalQuota: 2
```
___
**LocalQueue CR**

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: default
  namespace: my-namespace  # must match the namespace where the job is created
  annotations:
    kueue.x-k8s.io/default-queue: "true"  # Optional: allows the SDK to pick this up as explained above.
spec:
  clusterQueue: default  # must match the name of your ClusterQueue
```
___
**ResourceFlavor CR**

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default
```
___

Once these have been created, KubeRay and Kueue (which are both installed on RHOAI by default) will be able to interact. Now that we have the relevant CRs in place, let's use the SDK to create a RayCluster. We'll intentionally set our CPU requests higher than the quota defined in the `ClusterQueue`.

In [None]:
# Create and configure our cluster object
# The SDK will try to find the name of your default local queue based on the annotation "kueue.x-k8s.io/default-queue": "true" unless you specify the local queue manually below
cluster_name = "kueue-ks-raycluster"
cluster1 = Cluster(ClusterConfiguration(
    name=cluster_name,
    head_cpu_requests=3,  # Requesting 3 CPUs, more than the available 2 CPUs in the ClusterQueue
    head_cpu_limits=3,
    head_memory_requests=6,
    head_memory_limits=6,
    head_extended_resource_requests={'nvidia.com/gpu': 1},  # 1 GPU for the head node
    worker_extended_resource_requests={'nvidia.com/gpu': 1},  # 1 GPU for each worker node
    num_workers=2,
    worker_cpu_requests='1',  # Requesting 1 CPU per worker (this adds up to 5 CPUs total)
    worker_cpu_limits=1,
    worker_memory_requests=4,
    worker_memory_limits=6,
    write_to_file=False,  # Set to True to write the Ray cluster files to disk (optional)
    # local_queue="kueue-ks" # Commented out for reasons outlined above
))

Click the Cluster Up button to start the RayCluster. Alternatively, run the below cell.

In [None]:
# Bring up the cluster
cluster1.up()

Let's check the cluster's status via the below cell. Because we've exceeded the CPU quota defined in our `ClusterQueue` (5 CPUs requested against a quota of 2, and likewise 3 GPUs against a quota of 2), we should expect an inactive RayCluster with a status of `SUSPENDED`.

In [None]:
cluster1.status()
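
The reason for the suspension can be checked with some quick arithmetic. The sketch below sums the head and worker requests from the cluster definition above and compares them with the nominal quotas in the `ClusterQueue`. It only covers CPU and GPU, and the helper names `total_request` and `over_quota` are hypothetical, not part of the SDK or Kueue.

```python
def total_request(head: dict, worker: dict, num_workers: int) -> dict:
    """Sum per-resource requests for one head pod plus num_workers worker pods."""
    resources = set(head) | set(worker)
    return {r: head.get(r, 0) + num_workers * worker.get(r, 0) for r in resources}


def over_quota(request: dict, quota: dict) -> dict:
    """Map each resource that exceeds its quota to (requested, quota)."""
    return {
        r: (amount, quota.get(r, 0))
        for r, amount in request.items()
        if amount > quota.get(r, 0)
    }


# Nominal quotas from the initial ClusterQueue (CPU and GPU only)
quota = {"cpu": 2, "nvidia.com/gpu": 2}

# Requests from the first cluster definition: 1 head + 2 workers
request = total_request(
    head={"cpu": 3, "nvidia.com/gpu": 1},
    worker={"cpu": 1, "nvidia.com/gpu": 1},
    num_workers=2,
)
exceeded = over_quota(request, quota)
print(request)
print(exceeded)
```

Any non-empty result from `over_quota` means the workload cannot be admitted until the quota is raised or resources are freed.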

### Verification
___

* You can verify this by running the command:
```
oc get localqueue <your-local-queue-name> -n <namespace> -o yaml
```

* If you check the bottom of the CR, you should see the following:
```yaml
pendingWorkloads: 1
```
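
If you would rather check this value from Python, one option is to request JSON instead of YAML (`-o json`) and parse it with the standard library. The sketch below only covers the parsing step; the `sample` document is a hypothetical, heavily trimmed stand-in for real `oc` output, and the helper name `pending_workloads` is an assumption.

```python
import json


def pending_workloads(queue_json: str) -> int:
    """Extract status.pendingWorkloads from a queue CR serialised as JSON."""
    queue = json.loads(queue_json)
    return int(queue.get("status", {}).get("pendingWorkloads", 0))


# Hypothetical, trimmed output of: oc get localqueue <name> -n <namespace> -o json
sample = '{"kind": "LocalQueue", "metadata": {"name": "default"}, "status": {"pendingWorkloads": 1}}'
print(pending_workloads(sample))
```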

### Adjusting the Quota
___
* We can unblock the workload by increasing the quotas in our `ClusterQueue`. Adjust the CR to match the below values:
```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: default
spec:
  resourceGroups:
    - coveredResources:
        - cpu
        - memory
        - nvidia.com/gpu
      flavors:
        - name: default
          resources:
            - name: cpu
              nominalQuota: "8"  # Increased CPU quota to unblock the job
            - name: memory
              nominalQuota: 64Gi  # Increased memory quota
            - name: nvidia.com/gpu
              nominalQuota: "4"  # Increased GPU quota
```

* Run the above oc command again. This time you should see output similar to the following:
```yaml
status:
  admittedWorkloads: 1
  conditions:
  - lastTransitionTime: "2025-04-25T09:03:27Z"
    message: Can admit new workloads
    observedGeneration: 3
    reason: Ready
    status: "True"
    type: Active
  flavorsReservation:
  - name: default
    resources:
    - borrowed: "0"
      name: cpu
      total: "5"
    - borrowed: "0"
      name: memory
      total: 13671875Ki
    - borrowed: "0"
      name: nvidia.com/gpu
      total: "3"
  flavorsUsage:
  - name: default
    resources:
    - borrowed: "0"
      name: cpu
      total: "5"
    - borrowed: "0"
      name: memory
      total: 13671875Ki
    - borrowed: "0"
      name: nvidia.com/gpu
      total: "3"
  pendingWorkloads: 0
  reservingWorkloads: 1
```

* Next, we can verify this further by checking the cluster status via the below cell. NOTE: Allow some time for the cluster to be created.

In [None]:
cluster1.status()
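
The memory total of `13671875Ki` in the status output above can look odd at first. It matches the 6 + 4 + 4 = 14 units of memory requested by the first cluster if those integers are read as decimal gigabytes (`G`), which is an inference from the numbers rather than something this demo states. The sketch below converts integer quantities with a small subset of suffixes to bytes to check that; `quantity_to_bytes` is a hypothetical helper, not a Kubernetes API.

```python
# Subset of binary and decimal suffixes used by Kubernetes resource quantities
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def quantity_to_bytes(quantity: str) -> int:
    """Convert a quantity such as '13671875Ki' or '14G' to bytes (integers only)."""
    # Check two-letter suffixes first so "Ki" is never mistaken for a shorter one
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return int(quantity)


print(quantity_to_bytes("13671875Ki"))
print(quantity_to_bytes("13671875Ki") == quantity_to_bytes("14G"))
```

Note that the quota itself is written in `Gi` (binary), so the two units should not be compared without converting to bytes first.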

### Resource Contention
We can further illustrate how Kueue allocates resources by creating an additional RayCluster that will exceed the quota we set.
Execute the below cell to create the new RayCluster.

In [None]:
cluster_name_2 = "kueue-ks-raycluster-2"
cluster2 = Cluster(ClusterConfiguration(
    name=cluster_name_2,
    head_cpu_requests=3,
    head_cpu_limits=3,
    head_memory_requests=6,
    head_memory_limits=6,
    head_extended_resource_requests={'nvidia.com/gpu': 0},
    worker_extended_resource_requests={'nvidia.com/gpu': 0},
    num_workers=2,
    worker_cpu_requests='1',  # 1 CPU per worker (5 CPUs total); added to the first cluster's 5 CPUs, this exceeds our adjusted quota of 8
    worker_cpu_limits=1,
    worker_memory_requests=4,
    worker_memory_limits=6,
    write_to_file=False,
))

In [None]:
cluster2.up()

### Verifying the Pending Workload
Now that we have a second cluster competing for resources, let's check the `ClusterQueue` again by running:
```
oc get clusterqueue default -o yaml
```
In the resulting yaml, you should be able to observe the below output:
```yaml
  flavorsReservation:
  - name: default
    resources:
    - borrowed: "0"
      name: cpu
      total: "5"
    - borrowed: "0"
      name: memory
      total: 13671875Ki
    - borrowed: "0"
      name: nvidia.com/gpu
      total: "3"
  flavorsUsage:
  - name: default
    resources:
    - borrowed: "0"
      name: cpu
      total: "5"
    - borrowed: "0"
      name: memory
      total: 13671875Ki
    - borrowed: "0"
      name: nvidia.com/gpu
      total: "3"
  pendingWorkloads: 1
  reservingWorkloads: 1
```
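Here we can see that we have one workload pending. `flavorsReservation` and `flavorsUsage` are unchanged because they only count resources held by admitted workloads (the first cluster); admitting the second cluster's 5 CPUs would bring the total to 10, above our quota of 8.
We can verify this further by executing the below cell. The output should list the RayCluster as `SUSPENDED`.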

In [None]:
cluster2.status()

Let's create a third RayCluster that requires significantly fewer resources. We can use this to demonstrate how Kueue allocates resources based on requirements when there is resource contention.

In [None]:
cluster_name_3 = "kueue-ks-raycluster-3"
cluster3 = Cluster(ClusterConfiguration(
    name=cluster_name_3,
    head_cpu_requests='500m',
    head_cpu_limits='500m',
    head_memory_requests=2,
    head_memory_limits=2,
    head_extended_resource_requests={'nvidia.com/gpu':0}, # For GPU enabled workloads set the head_extended_resource_requests and worker_extended_resource_requests
    worker_extended_resource_requests={'nvidia.com/gpu':0},
    num_workers=2,
    worker_cpu_requests='250m',
    worker_cpu_limits=1,
    worker_memory_requests=4,
    worker_memory_limits=4,
))

In [None]:
cluster3.up()

The above cluster only requires 1 CPU in total (500m for the head plus 250m for each of the two workers). This brings us to 6 CPUs, 2 below our quota of 8. Because of our `queueingStrategy` of `BestEffortFIFO` (the default, which can be changed in the ClusterQueue spec), Kueue will prioritise a workload that it has resources available for rather than waiting for sufficient resources for RayCluster 2. This means that the above cluster will immediately be served the resources it requires, as it's within our quota. We can check via oc as earlier using:
```
oc get clusterqueue default -o yaml
```
You should be able to observe the below:
```yaml
  flavorsUsage:
  - name: default
    resources:
    - borrowed: "0"
      name: cpu
      total: "6"
    - borrowed: "0"
      name: memory
      total: 23437500Ki
    - borrowed: "0"
      name: nvidia.com/gpu
      total: "3"
  pendingWorkloads: 1
  reservingWorkloads: 2
```
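
The admission behaviour described above can be illustrated with a small simulation. This is a simplified sketch that tracks a single CPU quota only; real Kueue also considers other resources, priorities, cohorts and preemption. The function `admit` and its `strict` flag are hypothetical and merely contrast a `BestEffortFIFO`-like policy with a `StrictFIFO`-like one.

```python
from collections import deque


def admit(pending, quota, used, strict=False):
    """Try to admit pending (name, cpu) workloads in arrival order.

    strict=False: a workload that does not fit is skipped and later
    workloads are still considered (BestEffortFIFO-like).
    strict=True: the first workload that does not fit blocks everything
    queued behind it (StrictFIFO-like).
    """
    admitted, still_pending = [], []
    queue = deque(pending)
    while queue:
        name, cpu = queue.popleft()
        if used + cpu <= quota:
            used += cpu
            admitted.append(name)
        elif strict:
            still_pending.append(name)
            still_pending.extend(n for n, _ in queue)
            break
        else:
            still_pending.append(name)
    return admitted, still_pending, used


# Cluster 1 already holds 5 CPUs; clusters 2 and 3 arrive in that order
pending = [("kueue-ks-raycluster-2", 5), ("kueue-ks-raycluster-3", 1)]
print(admit(pending, quota=8, used=5))
print(admit(pending, quota=8, used=5, strict=True))
```

In the first call the third cluster is admitted while the second stays pending; in the second call both stay pending because the head of the queue does not fit.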

The second cluster we created will still be pending, but the others will have the requisite resources. We can also observe that our total CPU usage has risen to 6. We can finalize our verification by running the following command:
```
oc get rayclusters -n rhods-notebooks
```
You should see output similar to the below:
```
NAME                    DESIRED WORKERS   AVAILABLE WORKERS   CPUS   MEMORY   GPUS   STATUS      AGE
kueue-ks-raycluster     2                 2                   5      14G      3      ready       87m
kueue-ks-raycluster-2   2                                     5      14G      0      suspended   32m
kueue-ks-raycluster-3   2                 2                   1      10G      0      ready       2m2s
```
We can now see that cluster 3 is ready and 2 is still suspended due to its greater resource requirements. We can verify this further by running the below cell.

In [None]:
cluster3.status()

### Cleanup
Let's close this out by shutting down the three clusters we created.

In [None]:
cluster1.down()
cluster2.down()
cluster3.down()

In [None]:
auth.logout()