Jetstream + RayServe deployment for interleave mode #146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
New file: Dockerfile (+16 lines)

```dockerfile
FROM rayproject/ray:2.22.0-py310

RUN pip install flax==0.8.3
RUN pip install jax[tpu]==0.4.30 -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
RUN pip install tensorflow-text
RUN pip install tensorflow

RUN pip install torch==2.3.1+cpu --index-url https://download.pytorch.org/whl/cpu
RUN pip install tensorflow flatbuffers absl-py sentencepiece seqio google-cloud-storage
RUN pip install safetensors colorama coverage humanize

RUN git clone https://github.com/google/jetstream-pytorch
WORKDIR jetstream-pytorch

RUN git submodule update --init --recursive
RUN pip install -e .
```
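Per the review note below about the private image, the Dockerfile above can be built and pushed to a registry the reader controls. A minimal sketch; the Artifact Registry path (`PROJECT`, `REPO`) and the `jetstream-ray` name are placeholders, not part of this PR:

```shell
# Build the JetStream + Ray image from the Dockerfile above and push it
# to a registry you control (the path below is a placeholder).
docker build -t us-docker.pkg.dev/PROJECT/REPO/jetstream-ray:latest .
docker push us-docker.pkg.dev/PROJECT/REPO/jetstream-ray:latest
```

After pushing, the `image:` fields in the cluster manifests would need to point at this tag instead of the private `gcr.io/tpu-vm-gke-testing` image.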
New file: RayCluster manifest for a 2x2x2 TPU v4 PodSlice (+144 lines)

```yaml
# This template contains a Kuberay cluster using a 2x2x2 TPU v4 PodSlice.
# To get access to TPU resources, please follow instructions in this link:
# https://cloud.google.com/kubernetes-engine/docs/how-to/tpus
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: example-cluster-kuberay
spec:
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        imagePullSecrets: []
        serviceAccountName: ray-ksa
        containers:
        - volumeMounts:
          - name: gcs-fuse-checkpoint
            mountPath: /llama
            readOnly: true
          - mountPath: /tmp/ray
            name: ray-logs
          name: ray-head
          image: gcr.io/tpu-vm-gke-testing/ricliu-jetstream:20240709
```
Review comment: This image is not publicly available and references an internal project. Can we host it on a public registry or provide the Dockerfile so users can build it themselves?

cc @ryanaoleary

Follow-up: Ah, I see the Dockerfile now. I suggest not referencing private images, though, because users will just apply the YAML without updating the image per the instructions.
```yaml
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              cpu: "4"
              ephemeral-storage: 30Gi
              memory: 40G
            requests:
              cpu: "4"
              ephemeral-storage: 30Gi
              memory: 40G
          securityContext: {}
          env:
          - name: JAX_PLATFORMS
            value: "cpu"
          - name: RAY_memory_monitor_refresh_ms
            value: "0"
          - name: RAY_GRAFANA_IFRAME_HOST
            value: http://${grafana_host}
          - name: RAY_GRAFANA_HOST
            value: http://grafana:80
          - name: RAY_PROMETHEUS_HOST
            value: http://frontend:9090
          ports:
          - containerPort: 6379
            name: gcs
          - containerPort: 8265
            name: dashboard
          - containerPort: 10001
            name: client
          - containerPort: 8000
            name: serve
          - containerPort: 8471
            name: slicebuilder
          - containerPort: 8081
            name: mxla
          - containerPort: 8888
            name: grpc
        volumes:
        - emptyDir: {}
          name: ray-logs
        - name: gcs-fuse-checkpoint
          csi:
            driver: gcsfuse.csi.storage.gke.io
            readOnly: true
            volumeAttributes:
              bucketName: ricliu-llama2-70b-chat
```
Review comment: This bucket is not publicly available. Can we host it in a public bucket or provide instructions to push the model weights to a bucket?

cc @ryanaoleary
```yaml
              mountOptions: "implicit-dirs"
      metadata:
        annotations:
          gke-gcsfuse/volumes: "true"
        labels:
          cloud.google.com/gke-ray-node-type: head
          app.kubernetes.io/name: kuberay
          app.kubernetes.io/instance: example-cluster

  workerGroupSpecs:
  - rayStartParams: {}
    replicas: 1
    minReplicas: 1
    maxReplicas: 1
    numOfHosts: 2
    groupName: workergroup
    template:
      spec:
        imagePullSecrets: []
        serviceAccountName: ray-ksa
        containers:
        - volumeMounts:
          - mountPath: /tmp/ray
            name: ray-logs
          - name: gcs-fuse-checkpoint
            mountPath: /llama
            readOnly: true
          name: ray-worker
          image: gcr.io/tpu-vm-gke-testing/ricliu-jetstream:20240709
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              cpu: "8"
              ephemeral-storage: 30Gi
              google.com/tpu: "4"
              memory: 200G
            requests:
              cpu: "8"
              ephemeral-storage: 30Gi
              google.com/tpu: "4"
              memory: 200G
          securityContext: {}
          env:
          - name: JAX_PLATFORMS
            value: "cpu"
          ports: null
        volumes:
        - emptyDir: {}
          name: ray-logs
        - name: gcs-fuse-checkpoint
          csi:
            driver: gcsfuse.csi.storage.gke.io
            readOnly: true
            volumeAttributes:
              bucketName: ricliu-llama2-70b-chat
              mountOptions: "implicit-dirs"
        nodeSelector:
          cloud.google.com/gke-tpu-accelerator: tpu-v4-podslice
          cloud.google.com/gke-tpu-topology: 2x2x2
          iam.gke.io/gke-metadata-server-enabled: "true"
      metadata:
        annotations:
          gke-gcsfuse/volumes: "true"
        labels:
          cloud.google.com/gke-ray-node-type: worker
          app.kubernetes.io/name: kuberay
          app.kubernetes.io/instance: example-cluster
```
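Assuming the manifest above is saved locally (the filename below is a placeholder) and the KubeRay operator is already installed in the cluster, it can be brought up and its dashboard exposed roughly as follows:

```shell
# Create the RayCluster and wait for the head pod to become Ready.
# The filename is a placeholder; save the manifest above under any name.
kubectl apply -f ray-cluster-tpu-v4-2x2x2.yaml
kubectl wait --for=condition=Ready pod \
  -l cloud.google.com/gke-ray-node-type=head --timeout=10m
# Forward the Ray dashboard (port 8265) for job submission.
kubectl port-forward svc/example-cluster-kuberay-head-svc 8265:8265
```

The head service name follows the KubeRay convention `<cluster-name>-head-svc` for the `example-cluster-kuberay` cluster defined above.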
New file: RayCluster manifest for a 2x2x1 TPU v4 PodSlice (+140 lines)

```yaml
# This template contains a Kuberay cluster using a 2x2x1 TPU v4 PodSlice.
# To get access to TPU resources, please follow instructions in this link:
# https://cloud.google.com/kubernetes-engine/docs/how-to/tpus
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: example-cluster-kuberay
spec:
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        imagePullSecrets: []
        serviceAccountName: ray-ksa
        containers:
        - volumeMounts:
          - name: gcs-fuse-checkpoint
            mountPath: /llama
            readOnly: true
          - mountPath: /tmp/ray
            name: ray-logs
          name: ray-head
          image: gcr.io/tpu-vm-gke-testing/ricliu-jetstream:20240709
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              cpu: "4"
              ephemeral-storage: 30Gi
              memory: 40G
            requests:
              cpu: "4"
              ephemeral-storage: 30Gi
              memory: 40G
          securityContext: {}
          env:
          - name: JAX_PLATFORMS
            value: "cpu"
          - name: RAY_memory_monitor_refresh_ms
            value: "0"
          - name: RAY_GRAFANA_IFRAME_HOST
            value: http://${grafana_host}
          - name: RAY_GRAFANA_HOST
            value: http://grafana:80
          - name: RAY_PROMETHEUS_HOST
            value: http://frontend:9090
          ports:
          - containerPort: 6379
            name: gcs
          - containerPort: 8265
            name: dashboard
          - containerPort: 10001
            name: client
          - containerPort: 8000
            name: serve
          - containerPort: 8888
            name: grpc
        volumes:
        - emptyDir: {}
          name: ray-logs
        - name: gcs-fuse-checkpoint
          csi:
            driver: gcsfuse.csi.storage.gke.io
            readOnly: true
            volumeAttributes:
              bucketName: ricliu-llama2
              mountOptions: "implicit-dirs"
      metadata:
        annotations:
          gke-gcsfuse/volumes: "true"
        labels:
          cloud.google.com/gke-ray-node-type: head
          app.kubernetes.io/name: kuberay
          app.kubernetes.io/instance: example-cluster

  workerGroupSpecs:
  - rayStartParams: {}
    replicas: 1
    minReplicas: 1
    maxReplicas: 1
    numOfHosts: 1
    groupName: workergroup
    template:
      spec:
        imagePullSecrets: []
        serviceAccountName: ray-ksa
        containers:
        - volumeMounts:
          - mountPath: /tmp/ray
            name: ray-logs
          - name: gcs-fuse-checkpoint
            mountPath: /llama
            readOnly: true
          name: ray-worker
          image: gcr.io/tpu-vm-gke-testing/ricliu-jetstream:20240709
          imagePullPolicy: IfNotPresent
          resources:
            limits:
              cpu: "8"
              ephemeral-storage: 30Gi
              google.com/tpu: "4"
              memory: 200G
            requests:
              cpu: "8"
              ephemeral-storage: 30Gi
              google.com/tpu: "4"
              memory: 200G
          securityContext: {}
          env:
          - name: JAX_PLATFORMS
            value: "cpu"
          ports: null
        volumes:
        - emptyDir: {}
          name: ray-logs
        - name: gcs-fuse-checkpoint
          csi:
            driver: gcsfuse.csi.storage.gke.io
            readOnly: true
            volumeAttributes:
              bucketName: ricliu-llama2
              mountOptions: "implicit-dirs"
        nodeSelector:
          cloud.google.com/gke-tpu-accelerator: tpu-v4-podslice
          cloud.google.com/gke-tpu-topology: 2x2x1
          iam.gke.io/gke-metadata-server-enabled: "true"
      metadata:
        annotations:
          gke-gcsfuse/volumes: "true"
        labels:
          cloud.google.com/gke-ray-node-type: worker
          app.kubernetes.io/name: kuberay
          app.kubernetes.io/instance: example-cluster
```
Review comment: I don't think `ray job submit` was intended to be used with Ray Serve in this way. If you run it like this, the Ray Serve application will be treated as a Ray job and will not survive a restart. Could we provide an example that uses the `serve run` CLI or the KubeRay RayService? cc @ryanaoleary
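One way to follow this suggestion: run the Serve application with the `serve run` CLI on the head pod, so Serve rather than the job machinery owns the deployment. A minimal sketch; the import path `jetstream_serve:app` is a placeholder, since the actual Serve module name in this PR is not shown here:

```shell
# Launch the Serve application with the `serve run` CLI on the Ray head.
# "jetstream_serve:app" is a placeholder import path, not from this PR.
kubectl exec -it svc/example-cluster-kuberay-head-svc -- \
  serve run jetstream_serve:app
```

The other option the comment mentions, a KubeRay RayService, manages the Serve application declaratively and recreates it after cluster restarts.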