feat(k8s): adds Kubernetes support to Runhouse via Codeflare #120

Draft
wants to merge 14 commits into
base: main

RohanSreerama5 commented Oct 12, 2023

POC: Adds Kubernetes support to Runhouse.
Utilizes codeflare-sdk to spin up Ray clusters.

Example Usage:

import runhouse as rh

cluster = rh.kubernetes_cluster(
    name="cpu-cluster-demo",
    namespace="default",
    instance_type="CPU:1",
    memory=4,
    num_workers=0,
    # kube_config_path="./my-config",  # optional; defaults to ~/.kube/config
)

Prerequisites:
Ensure you have helm, kubectl, make, gnu-sed, and go installed and set up.
You will also need an EKS cluster and its kubeconfig.
Make sure port 50052 on your local machine is free. To check, run lsof -i :50052 (no output means nothing is using the port).
Tested on Python 3.8.13
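
If lsof is not available, the port check above can be approximated in Python. This is a minimal sketch, not part of this PR: port_is_free is a hypothetical helper, and it only detects a process that is listening on the port, which is a narrower check than lsof.

```python
import socket

PORT = 50052  # port named in the prerequisites above


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is accepting connections on host:port.

    Hypothetical helper for illustration; not part of Runhouse.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection,
        # which means the port is already in use.
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    state = "free" if port_is_free(PORT) else "in use"
    print(f"Port {PORT} is {state}")
```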

Refer to this PR if you need to set up a fresh EKS cluster:
Terraform script: https://github.com/run-house/runhouse/blob/rs/k8s-poc-skypilot/runhouse/scripts/kubernetes_cluster/main.tf
PR: #109

NOTE about kube_config:

  • You have the option of passing your kube_config to the rh.kubernetes_cluster(...) constructor as a file path (kube_config_path).
  • Otherwise, we automatically pick it up from the default location, ~/.kube/config.
    If you plan to use the default location, make sure your kube_config is up to date and that its current-context points to the intended cluster.
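
The lookup order described above (explicit path first, then the default location) can be sketched as follows. resolve_kube_config is a hypothetical name used only for illustration; the actual logic in this PR may differ.

```python
from pathlib import Path
from typing import Optional

DEFAULT_KUBE_CONFIG = "~/.kube/config"  # default location named above


def resolve_kube_config(kube_config_path: Optional[str] = None) -> Path:
    """Return the explicit path if one was given, else the default location.

    Hypothetical helper; raises if the chosen file does not exist.
    """
    path = Path(kube_config_path or DEFAULT_KUBE_CONFIG).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No kubeconfig found at {path}")
    return path
```

Failing early on a missing file is a design choice of this sketch: it surfaces a wrong path before any cluster is requested.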

If you have the AWS CLI and are using EKS, you can update it like so:
aws eks update-kubeconfig --region {REGION_NAME} --name {NAME_OF_EKS_CLUSTER}
Example: aws eks update-kubeconfig --region us-east-1 --name my-eks-cluster
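
If you prefer to script this step, the same command can be assembled in Python. This is a sketch under the assumption that the aws CLI is on your PATH; build_update_kubeconfig_cmd is a hypothetical helper, and the subprocess call is left commented out so that nothing changes your kubeconfig by accident.

```python
import shlex
import subprocess
from typing import List


def build_update_kubeconfig_cmd(region: str, cluster_name: str) -> List[str]:
    """Build the argv for the aws eks update-kubeconfig command shown above."""
    return [
        "aws", "eks", "update-kubeconfig",
        "--region", region,
        "--name", cluster_name,
    ]


if __name__ == "__main__":
    cmd = build_update_kubeconfig_cmd("us-east-1", "my-eks-cluster")
    print(shlex.join(cmd))
    # Uncomment to actually update ~/.kube/config:
    # subprocess.run(cmd, check=True)
```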

Setup process:

  • mkdir tester_directory && cd tester_directory
  • git clone git@github.com:RohanSreerama5/codeflare-sdk.git
  • cd codeflare-sdk
  • pip install ray pydantic kubernetes executing rich
  • pip install -e .

NOTE: You may need to first run python -m pip install --upgrade pip to install the codeflare-sdk in editable mode.

  • Start a Python shell and make sure you can run the following without errors:
    from codeflare_sdk.cluster.cluster import Cluster, ClusterConfiguration
    from codeflare_sdk.cluster.auth import KubeConfigFileAuthentication

  • From the tester_directory level, run git clone git@github.com:run-house/runhouse.git

  • cd runhouse. Then, run git checkout rs/k8s-testing

  • conda install grpcio -y

  • pip install -e .

Run import runhouse as rh in a Python shell to make sure it works.
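
The import checks from the setup steps can also be run in one go. A minimal sketch: the module names are the ones listed above, and check_imports is a hypothetical helper, not something shipped with Runhouse or codeflare-sdk.

```python
import importlib
from typing import Dict, Iterable

# Modules the setup steps above ask you to import by hand.
REQUIRED_MODULES = (
    "codeflare_sdk.cluster.cluster",
    "codeflare_sdk.cluster.auth",
    "runhouse",
)


def check_imports(modules: Iterable[str]) -> Dict[str, str]:
    """Map each module that fails to import to a short error description."""
    failures = {}
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or an error raised at import time
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    failures = check_imports(REQUIRED_MODULES)
    for name, err in failures.items():
        print(f"FAILED {name}: {err}")
    if not failures:
        print("All imports OK")
```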

Runhouse and codeflare-sdk are now set up.

Now open the Runhouse folder and go to CodeflareTest.ipynb.

Run each of the cells, noting that cluster.install_k8s_operators() should only be run once.
Even if you restart the kernel, you do not need to run it again.
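
One way to avoid re-running the operator install by accident is a local marker file. This is only a sketch of the idea: run_once and the marker path are hypothetical and not part of this PR, and the marker records what happened on your machine, not the actual state of the EKS cluster.

```python
from pathlib import Path
from typing import Callable


def run_once(marker: Path, action: Callable[[], None]) -> bool:
    """Run action only if marker does not exist yet; return True if it ran.

    The marker is written after the action returns, so a failed install
    (an exception) is retried on the next call.
    """
    if marker.exists():
        return False
    action()
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True


# Usage in the notebook (the marker path is an assumption):
# run_once(Path("./.k8s_operators_installed"), cluster.install_k8s_operators)
```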

This notebook should set up a cluster, run a function, report the status of the cluster, and tear it down.

If you need to clean up your EKS cluster, do the following.
To remove the K8s operators from the EKS cluster:
Uninstall KubeRay Operator:

  • Run helm uninstall kuberay-operator

Uninstall Codeflare Operator:
From inside the codeflare-operator repo, run

  • make uninstall -e SED=/opt/homebrew/opt/gnu-sed/libexec/gnubin/sed
  • make undeploy -e SED=/opt/homebrew/opt/gnu-sed/libexec/gnubin/sed
    (If one of them errors out, it just means it's already been removed from EKS.)

NOTE: The codeflare-operator repo will already be inside your runhouse directory as it was automatically cloned down during setup.

Check your EKS cluster and make sure kuberay-operator has been removed from the default namespace.
Also, make sure that the openshift-operators namespace is gone. (This namespace contained
codeflare-operator.)
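
The cleanup commands above can be chained in a small script that keeps going when a step fails, since a failure may just mean the operator is already gone. A sketch under assumptions: run_cleanup is a hypothetical helper, the codeflare-operator path assumes you run it from the runhouse directory, and the SED path is the Homebrew one from the commands above.

```python
import subprocess
from typing import Callable, List, Optional, Sequence, Tuple

SED = "/opt/homebrew/opt/gnu-sed/libexec/gnubin/sed"  # path from the commands above

# (command, working directory) pairs. The relative repo path is an assumption:
# it expects the script to be run from inside the runhouse directory.
CLEANUP_STEPS: List[Tuple[List[str], Optional[str]]] = [
    (["helm", "uninstall", "kuberay-operator"], None),
    (["make", "uninstall", "-e", f"SED={SED}"], "codeflare-operator"),
    (["make", "undeploy", "-e", f"SED={SED}"], "codeflare-operator"),
]


def _run(cmd: List[str], cwd: Optional[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(cmd, cwd=cwd).returncode


def run_cleanup(
    steps: Sequence[Tuple[List[str], Optional[str]]],
    runner: Callable[[List[str], Optional[str]], int] = _run,
) -> List[Tuple[str, int]]:
    """Run every step even if an earlier one fails; return (command, exit code)."""
    results = []
    for cmd, cwd in steps:
        try:
            code = runner(cmd, cwd)
        except OSError:  # missing binary or missing working directory
            code = -1
        results.append((" ".join(cmd), code))
    return results


# Usage (this really uninstalls the operators):
# for command, code in run_cleanup(CLEANUP_STEPS):
#     print(code, command)
```

Afterwards, kubectl get namespace openshift-operators should report that the namespace is not found.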

PENDING

  • Add an argument to accept a k8s context instead of a kube_config path
  • More code cleanup (create smaller helper functions, e.g., port-forwarding in a single function)
  • Add pytest fixture tests
  • Test more rigorously for all Runhouse feature support

Post-POC activities

  • Persisting kube_config to secrets management (Vault)

Links:
Codeflare SDK: https://github.com/RohanSreerama5/codeflare-sdk
Codeflare Operator: https://github.com/RohanSreerama5/codeflare-operator

RohanSreerama5 marked this pull request as draft November 30, 2023 14:46
RohanSreerama5 changed the title from "feat(k8s): adds Kubernetes support to Runhouse" to "feat(k8s): adds Kubernetes support to Runhouse via Codeflare" Dec 6, 2023