# Notes on Distributed Workloads

This Jupyter notebook contains basic information and findings about running distributed workloads on OpenShift AI using the following components:

* **CodeFlare Operator**: Secures deployed Ray clusters and grants access to their URLs.
* **KubeRay**: Manages remote Ray clusters on OpenShift for running distributed compute workloads.
* **Kueue**: Manages quotas and how distributed workloads consume them, and manages the queueing of distributed workloads with respect to quotas.
* **Training Operator**: Lets you use the Kubeflow Training Operator to tune models.

In summary, there are two ways to tune models on OpenShift AI; choose the one you prefer:
* If you want to use the CodeFlare framework to tune models, enable the `codeflare`, `kueue`, and `ray` components.
* If you want to use the Kubeflow Training Operator to tune models, enable the `kueue` and `trainingoperator` components.
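
The two options above can be written down as a small lookup, which is handy for checking which components still need to be enabled. This is only a sketch: the component names come from the list above, while the framework keys (`"codeflare"`, `"training-operator"`) and the helper name `missing_components` are hypothetical labels chosen for illustration.

```python
# Component names are taken from the two options listed above.
# The dictionary keys are hypothetical labels, not official identifiers.
REQUIRED_COMPONENTS = {
    "codeflare": {"codeflare", "kueue", "ray"},
    "training-operator": {"kueue", "trainingoperator"},
}


def missing_components(framework, enabled):
    """Return the required components that are not yet enabled, sorted by name."""
    required = REQUIRED_COMPONENTS[framework]
    return sorted(required - set(enabled))


# Example: only kueue is enabled, and we want the CodeFlare route.
print(missing_components("codeflare", ["kueue"]))  # ['codeflare', 'ray']
```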


NOTE: The **CodeFlare SDK** defines and controls the remote distributed compute jobs and infrastructure for any Python-based environment. It must be installed in your Jupyter notebook environment.


## 1. Import the CodeFlare SDK

* Defines and controls the remote distributed compute jobs and infrastructure for any Python-based environment.
* This component must be installed in your Jupyter notebook environment.

In [4]:
import os
import codeflare_sdk
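
If the import fails, it can help to check whether the SDK is present in the notebook environment before going further. The sketch below uses only the standard library. The distribution name `codeflare-sdk` is an assumption about how the package is published, and `sdk_status` is a hypothetical helper name.

```python
import importlib.util
from importlib import metadata


def sdk_status(module_name="codeflare_sdk", dist_name="codeflare-sdk"):
    """Report whether a module is importable and, if known, its version.

    Returns None if the module cannot be found, the version string if the
    distribution metadata is available, and "unknown" otherwise.
    """
    if importlib.util.find_spec(module_name) is None:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Importable, but no metadata under the assumed distribution name.
        return "unknown"


print(sdk_status())
```

A result of `None` suggests the SDK still needs to be installed in the environment that backs the notebook kernel.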

## 2. Download the example guides
The demo notebooks from the CodeFlare SDK provide guidelines on how to use the CodeFlare stack in your own notebooks. Download them so that you can learn how to run them locally.

In [8]:
# Copy the demo notebooks only if they have not been downloaded already
if not os.path.exists('demo-notebooks'):
    codeflare_sdk.copy_demo_nbs()
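
After the copy, a quick listing shows what was downloaded. This is a minimal sketch that assumes the notebooks are in the `demo-notebooks` directory checked above. It makes no assumption about the subdirectory layout, and `list_notebooks` is a hypothetical helper name.

```python
from pathlib import Path


def list_notebooks(root="demo-notebooks"):
    """Return the .ipynb files under root as sorted paths relative to root."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(
        p.relative_to(base).as_posix()
        for p in base.rglob("*.ipynb")
        # Skip Jupyter's autosave copies.
        if ".ipynb_checkpoints" not in p.parts
    )


for name in list_notebooks():
    print(name)
```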

## 3. List Kueue queues
Check whether your cluster administrator has defined a default local queue for the Ray cluster.

You can use the `codeflare_sdk.list_local_queues()` function to view all local queues in your current namespace, and the resource flavors associated with each local queue.

In [10]:
codeflare_sdk.list_local_queues()

[{'name': 'rhoai-playground-queue', 'flavors': ['default-flavor']}]
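
The result is a list of dictionaries, so it is easy to filter. As a small sketch, the helper below (a hypothetical name, `queues_with_flavor`) picks the queues that offer a given resource flavor. It assumes only the `name` and `flavors` keys visible in the output above.

```python
def queues_with_flavor(queues, flavor):
    """Return the names of local queues that list the given resource flavor."""
    return [q["name"] for q in queues if flavor in q.get("flavors", [])]


# Sample data copied from the output above.
queues = [{'name': 'rhoai-playground-queue', 'flavors': ['default-flavor']}]

print(queues_with_flavor(queues, "default-flavor"))  # ['rhoai-playground-queue']
print(queues_with_flavor(queues, "gpu-flavor"))      # []
```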

## 4. List Ray Clusters

In [None]:
# Create an authentication object for user permissions.
# If unused, the SDK automatically checks for the default kubeconfig, then the in-cluster config.
# KubeConfigFileAuthentication can also be used to specify the kubeconfig path manually.
auth = codeflare_sdk.TokenAuthentication(
    token="XXXX",   # replace with your API token
    server="XXXX",  # replace with your cluster API server URL
    skip_tls=False,
)
auth.login()
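
To avoid pasting a token into the notebook, the same three arguments can be read from environment variables. This is a sketch under stated assumptions: the variable names `OPENSHIFT_TOKEN`, `OPENSHIFT_SERVER`, and `OPENSHIFT_SKIP_TLS` and the helper `auth_settings_from_env` are hypothetical and not defined by the SDK.

```python
import os


def auth_settings_from_env(env=os.environ):
    """Build the token, server, and skip_tls arguments from environment variables.

    The variable names are hypothetical; choose whatever fits your setup.
    """
    missing = [k for k in ("OPENSHIFT_TOKEN", "OPENSHIFT_SERVER") if not env.get(k)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "token": env["OPENSHIFT_TOKEN"],
        "server": env["OPENSHIFT_SERVER"],
        # TLS verification stays on unless the variable is explicitly "true".
        "skip_tls": env.get("OPENSHIFT_SKIP_TLS", "false").lower() == "true",
    }
```

The resulting dictionary can then be passed along as `codeflare_sdk.TokenAuthentication(**auth_settings_from_env())`, followed by `auth.login()` as in the cell above.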

In [16]:
codeflare_sdk.view_clusters()

VBox(children=(ToggleButtons(description='Select an existing cluster:', options=('jobtest',), value='jobtest')…

HBox(children=(Button(description='Delete Cluster', icon='trash', style=ButtonStyle(), tooltip='Delete the sel…
