From 7098f84aca4e56cf5a354d87987a4d28013e98a7 Mon Sep 17 00:00:00 2001 From: Vassilis Vassiliadis Date: Thu, 26 Mar 2026 11:20:06 +0000 Subject: [PATCH 1/3] docs(ordered_pip): ordered pip plugin usage guide Signed-off-by: Vassilis Vassiliadis --- backend/kuberay/README.md | 130 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 130 insertions(+) diff --git a/backend/kuberay/README.md b/backend/kuberay/README.md index 5f9c58a77..ed6c320b8 100644 --- a/backend/kuberay/README.md +++ b/backend/kuberay/README.md @@ -302,3 +302,133 @@ with 4 Nodes each with 8 NVIDIA-A100-SXM4-80GB GPUs, 64 CPU cores, and 1TB memor > your HuggingFace home directory. On Kubernetes with RayClusters, avoid S3-like > filesystems as that is known to cause failures in **transformers**. Use a NFS > or GPFS-backed PersistentVolumeClaim instead. + +## Using the OrderedPip Ray Runtime Environment Plugin + +The `OrderedPipPlugin` is a Ray RuntimeEnvPlugin that enables you to control the +installation order of Python packages. This is particularly useful for packages +that require other packages to be installed during their **build phase**, not +just at runtime. + +### Why OrderedPip is Needed + +Some Python packages, such as `flash-attn`, `mamba-ssm`, and `causal-conv1d`, +import packages during their wheel building process like `torch`. During +standard pip installation, these packages may install a different version of +`torch` than the desired one because `torch` is not yet available in the +environment. The `OrderedPipPlugin` solves this by allowing you to install +packages in multiple phases, ensuring that build-time dependencies are +available when needed. + +#### Configuration Details + +The `ordered_pip` runtime environment accepts a dictionary with a `phases` key: + +- **`phases`**: A list where **each element follows the exact same schema as the + standard Ray `pip` field**. 
This means each phase can be: + - A list of package names (e.g., `["torch==2.6.0"]`) + - A dictionary with `packages` and optional `pip_install_options` fields + - Any other valid `pip` specification format + +> [!IMPORTANT] +> +> Each entry in `phases` uses the **identical schema** as Ray's standard `pip` +> runtime environment field. If you know how to configure `pip`, you already +> know how to configure each phase in `ordered_pip`. + +### Availability + +The `OrderedPipPlugin` is pre-installed in ado Docker images (both CPU and GPU +variants). It is also available in any local virtual environment that has +`ado-core` installed. However, it is switched off by default. + +### Enabling the Plugin + +To enable the `OrderedPipPlugin`, set the `RAY_RUNTIME_ENV_PLUGINS` environment +variable before starting Ray: + +```bash +export RAY_RUNTIME_ENV_PLUGINS='[{"class":"orchestrator.utilities.ray_env.ordered_pip.OrderedPipPlugin"}]' +``` + +> [!NOTE] +> +> Even if you are using ado Docker images for your RayCluster, or if you have +> `ado-core` installed in your local virtual environment, the plugin is not +> switched on by default. You need to set the environment variable to enable it. + +### Usage Examples + +#### Using ordered_pip in Python Code + +Here's a complete example showing how to use `ordered_pip` in a Ray task: + +```python +import ray + +@ray.remote( + runtime_env={ + "ordered_pip": { + "phases": [ + # Phase 1: Install PyTorch first + ["torch==2.6.0"], + # Phase 2: Install packages that depend on PyTorch during build + { + "packages": ["mamba-ssm==2.2.5"], + # IMPORTANT. 
+ # --no-build-isolation tells pip to build the wheel + # in the same venv where torch is already installed + "pip_install_options": ["--no-build-isolation"], + } + ] + } + } +) +def my_task(): + import torch + import mamba_ssm + return torch.__version__ + +result = ray.get(my_task.remote()) +print(f"PyTorch version: {result}") +``` + +#### Using ordered_pip with ray job submit + +You can also use `ordered_pip` with `ray job submit` by providing a runtime +environment YAML file: + +```yaml +# ray_runtime_env.yaml +ordered_pip: + phases: + # Phase 1: Install PyTorch first + - packages: + - torch==2.6.0 + # Phase 2: Install packages that depend on PyTorch during build + - packages: + - mamba-ssm==2.2.5 + pip_install_options: + # IMPORTANT. + # --no-build-isolation tells pip to build the wheel + # in the same venv where torch is already installed + - --no-build-isolation +``` + +Then submit your job with: + +```bash +ray job submit --runtime-env ray_runtime_env.yaml -- python my_script.py +``` + +**Key points:** + +- Phases execute sequentially in the order specified +- All phases reuse the same virtual environment +- The `--no-build-isolation` flag is critical for packages that need build-time + dependencies. It instructs pip to build wheels in the existing virtual + environment rather than in an isolated one +- Package order within a phase doesn't matter, but the order of phases does + +Actuators like `SFTTrainer` automatically use `OrderedPipPlugin` when available +to ensure correct installation of their dependencies.
From b3810d466e90790c65bd6705029d69d05e134c9e Mon Sep 17 00:00:00 2001 From: Vassilis Vassiliadis Date: Thu, 26 Mar 2026 13:31:25 +0000 Subject: [PATCH 2/3] docs(ordered_pip): apply feedback from review Signed-off-by: Vassilis Vassiliadis --- backend/kuberay/README.md | 38 ++++++++++---------------------------- 1 file changed, 10 insertions(+), 28 deletions(-) diff --git a/backend/kuberay/README.md b/backend/kuberay/README.md index ed6c320b8..f1c0b5891 100644 --- a/backend/kuberay/README.md +++ b/backend/kuberay/README.md @@ -306,21 +306,16 @@ with 4 Nodes each with 8 NVIDIA-A100-SXM4-80GB GPUs, 64 CPU cores, and 1TB memor ## Using the OrderedPip Ray Runtime Environment Plugin The `OrderedPipPlugin` is a Ray RuntimeEnvPlugin that enables you to control the -installation order of Python packages. This is particularly useful for packages -that require other packages to be installed during their **build phase**, not -just at runtime. +build order of Python packages. This is useful when installing packages packages +with build-time dependencies. For example, `mamba-ssm` and `torch`. -### Why OrderedPip is Needed +### Configuration Details -Some Python packages, such as `flash-attn`, `mamba-ssm`, and `causal-conv1d`, -import packages during their wheel building process like `torch`. During -standard pip installation, these packages may install a different version of -`torch` than the desired one because `torch` is not yet available in the -environment. The `OrderedPipPlugin` solves this by allowing you to install -packages in multiple phases, ensuring that build-time dependencies are -available when needed. - -#### Configuration Details +> [!IMPORTANT] +> +> Each entry in `phases` uses the **identical schema** as Ray's standard `pip` +> runtime environment field. If you know how to configure `pip`, you already +> know how to configure each phase in `ordered_pip`. 
The `ordered_pip` runtime environment accepts a dictionary with a `phases` key: @@ -330,17 +325,10 @@ The `ordered_pip` runtime environment accepts a dictionary with a `phases` key: - A dictionary with `packages` and optional `pip_install_options` fields - Any other valid `pip` specification format -> [!IMPORTANT] -> -> Each entry in `phases` uses the **identical schema** as Ray's standard `pip` -> runtime environment field. If you know how to configure `pip`, you already -> know how to configure each phase in `ordered_pip`. - ### Availability -The `OrderedPipPlugin` is pre-installed in ado Docker images (both CPU and GPU -variants). It is also available in any local virtual environment that has -`ado-core` installed. However, it is switched off by default. +The `OrderedPipPlugin` is pre-installed in ado Docker images. +However, it is switched off by default. ### Enabling the Plugin @@ -351,12 +339,6 @@ variable before starting Ray: export RAY_RUNTIME_ENV_PLUGINS='[{"class":"orchestrator.utilities.ray_env.ordered_pip.OrderedPipPlugin"}]' ``` -> [!NOTE] -> -> Even if you are using ado Docker images for your RayCluster, or if you have -> `ado-core` installed in your local virtual environment, the plugin is not -> switched on by default. You need to set the environment variable to enable it. 
- ### Usage Examples #### Using ordered_pip in Python Code From 51b6c5f26e0ae27caed87664b4aefed9e65d8e69 Mon Sep 17 00:00:00 2001 From: Vassilis Vassiliadis Date: Fri, 27 Mar 2026 11:32:46 +0000 Subject: [PATCH 3/3] docs(ordered_pip): decouple Kuberay docs from ordered_pip docs Signed-off-by: Vassilis Vassiliadis --- backend/kuberay/README.md | 99 ++++----------- backend/kuberay/vanilla-ray.yaml | 11 +- orchestrator/utilities/ray_env/README.md | 147 +++++++++++++++++++++++ 3 files changed, 181 insertions(+), 76 deletions(-) create mode 100644 orchestrator/utilities/ray_env/README.md diff --git a/backend/kuberay/README.md b/backend/kuberay/README.md index f1c0b5891..ce5068a5f 100644 --- a/backend/kuberay/README.md +++ b/backend/kuberay/README.md @@ -134,6 +134,8 @@ with 4 Nodes each with 8 NVIDIA-A100-SXM4-80GB GPUs, 64 CPU cores, and 1TB memor num-gpus: '1' resources: '"{\"NVIDIA-A100-SXM4-80GB\": 1}"' containerEnv: + - name: RAY_RUNTIME_ENV_PLUGINS + value: '[{"class":"orchestrator.utilities.ray_env.ordered_pip.OrderedPipPlugin"}]' - name: OMP_NUM_THREADS value: "1" - name: OPENBLAS_NUM_THREADS @@ -173,6 +175,8 @@ with 4 Nodes each with 8 NVIDIA-A100-SXM4-80GB GPUs, 64 CPU cores, and 1TB memor num-gpus: '2' resources: '"{\"NVIDIA-A100-SXM4-80GB\": 2}"' containerEnv: + - name: RAY_RUNTIME_ENV_PLUGINS + value: '[{"class":"orchestrator.utilities.ray_env.ordered_pip.OrderedPipPlugin"}]' - name: OMP_NUM_THREADS value: "1" - name: OPENBLAS_NUM_THREADS @@ -212,6 +216,8 @@ with 4 Nodes each with 8 NVIDIA-A100-SXM4-80GB GPUs, 64 CPU cores, and 1TB memor num-gpus: '4' resources: '"{\"NVIDIA-A100-SXM4-80GB\": 4}"' containerEnv: + - name: RAY_RUNTIME_ENV_PLUGINS + value: '[{"class":"orchestrator.utilities.ray_env.ordered_pip.OrderedPipPlugin"}]' - name: OMP_NUM_THREADS value: "1" - name: OPENBLAS_NUM_THREADS @@ -251,6 +257,8 @@ with 4 Nodes each with 8 NVIDIA-A100-SXM4-80GB GPUs, 64 CPU cores, and 1TB memor num-gpus: '8' resources: '"{\"NVIDIA-A100-SXM4-80GB\": 8, 
\"full-worker\": 1}"' containerEnv: + - name: RAY_RUNTIME_ENV_PLUGINS + value: '[{"class":"orchestrator.utilities.ray_env.ordered_pip.OrderedPipPlugin"}]' - name: OMP_NUM_THREADS value: "1" - name: OPENBLAS_NUM_THREADS @@ -305,79 +313,31 @@ with 4 Nodes each with 8 NVIDIA-A100-SXM4-80GB GPUs, 64 CPU cores, and 1TB memor ## Using the OrderedPip Ray Runtime Environment Plugin -The `OrderedPipPlugin` is a Ray RuntimeEnvPlugin that enables you to control the -build order of Python packages. This is useful when installing packages packages -with build-time dependencies. For example, `mamba-ssm` and `torch`. - -### Configuration Details - -> [!IMPORTANT] -> -> Each entry in `phases` uses the **identical schema** as Ray's standard `pip` -> runtime environment field. If you know how to configure `pip`, you already -> know how to configure each phase in `ordered_pip`. - -The `ordered_pip` runtime environment accepts a dictionary with a `phases` key: - -- **`phases`**: A list where **each element follows the exact same schema as the - standard Ray `pip` field**. This means each phase can be: - - A list of package names (e.g., `["torch==2.6.0"]`) - - A dictionary with `packages` and optional `pip_install_options` fields - - Any other valid `pip` specification format - -### Availability - -The `OrderedPipPlugin` is pre-installed in ado Docker images. -However, it is switched off by default. +The `OrderedPipPlugin` is a Ray RuntimeEnvPlugin bundled with `ado-core` that +enables you to control the build order of Python packages. This is useful when +installing packages with build-time dependencies, such as `mamba-ssm` which +requires `torch` to be installed before it can be built. 
### Enabling the Plugin To enable the `OrderedPipPlugin`, set the `RAY_RUNTIME_ENV_PLUGINS` environment -variable before starting Ray: +variable before starting the Ray head node and workers: ```bash export RAY_RUNTIME_ENV_PLUGINS='[{"class":"orchestrator.utilities.ray_env.ordered_pip.OrderedPipPlugin"}]' ``` -### Usage Examples - -#### Using ordered_pip in Python Code - -Here's a complete example showing how to use `ordered_pip` in a Ray task: - -```python -import ray - -@ray.remote( - runtime_env={ - "ordered_pip": { - "phases": [ - # Phase 1: Install PyTorch first - ["torch==2.6.0"], - # Phase 2: Install packages that depend on PyTorch during build - { - "packages": ["mamba-ssm==2.2.5"], - # IMPORTANT. - # --no-build-isolation tells pip to build the wheel - # in the same venv where torch is already installed - "pip_install_options": ["--no-build-isolation"], - } - ] - } - } -) -def my_task(): - import torch - import mamba_ssm - return torch.__version__ - -result = ray.get(my_task.remote()) -print(f"PyTorch version: {result}") -``` +When deploying a RayCluster via KubeRay, add this environment variable to both +head and worker node configurations (see examples below). + +### Documentation and Usage + +For detailed documentation, configuration details, and usage examples, see the +[OrderedPip Plugin README](https://github.com/IBM/ado/blob/main/orchestrator/utilities/ray_env/README.md). 
-#### Using ordered_pip with ray job submit +### Example: Using ordered_pip with ray job submit -You can also use `ordered_pip` with `ray job submit` by providing a runtime +You can use `ordered_pip` with `ray job submit` by providing a runtime environment YAML file: ```yaml @@ -403,14 +363,7 @@ Then submit your job with: ray job submit --runtime-env ray_runtime_env.yaml -- python my_script.py ``` -**Key points:** - -- Phases execute sequentially in the order specified -- All phases reuse the same virtual environment -- The `--no-build-isolation` flag is critical for packages that need build-time - dependencies. It instructs pip to build wheels in the existing virtual - environment rather than in an isolated one -- Package order within a phase doesn't matter, but the order of phases does - -Actuators like `SFTTrainer` automatically use `OrderedPipPlugin` when available -to ensure correct installation of their dependencies. +> [!NOTE] +> +> Actuators like `SFTTrainer` automatically use `OrderedPipPlugin` when available +> to ensure correct installation of their dependencies.
diff --git a/backend/kuberay/vanilla-ray.yaml b/backend/kuberay/vanilla-ray.yaml index a1e0eca7c..cfde201f7 100644 --- a/backend/kuberay/vanilla-ray.yaml +++ b/backend/kuberay/vanilla-ray.yaml @@ -18,14 +18,17 @@ head: requests: cpu: "500m" memory: "512Mi" + containerEnv: + - name: RAY_RUNTIME_ENV_PLUGINS + value: '[{"class":"orchestrator.utilities.ray_env.ordered_pip.OrderedPipPlugin"}]' lifecycle: #https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/config.html#pod-and-container-lifecyle-prestophook preStop: exec: command: ["/bin/sh", "-c", "ray stop"] rayStartParams: - dashboard-host: '0.0.0.0' + dashboard-host: "0.0.0.0" num-cpus: "0" - block: 'true' + block: "true" resources: limits: cpu: 4 @@ -39,8 +42,10 @@ worker: minReplicas: 1 maxReplicas: 4 rayStartParams: - block: 'true' + block: "true" containerEnv: + - name: RAY_RUNTIME_ENV_PLUGINS + value: '[{"class":"orchestrator.utilities.ray_env.ordered_pip.OrderedPipPlugin"}]' - name: OMP_NUM_THREADS value: "1" - name: OPENBLAS_NUM_THREADS diff --git a/orchestrator/utilities/ray_env/README.md b/orchestrator/utilities/ray_env/README.md new file mode 100644 index 000000000..d1bb83954 --- /dev/null +++ b/orchestrator/utilities/ray_env/README.md @@ -0,0 +1,147 @@ +# OrderedPip Ray Runtime Environment Plugin + +The `OrderedPipPlugin` is a Ray RuntimeEnvPlugin that enables you to control the +build order of Python packages. This is useful when installing packages with +build-time dependencies. For example, `mamba-ssm` requires `torch` to be +installed before it can be built. We suggest using +`pip_install_options: ["--no-build-isolation"]` +which ensures that `pip` will use the same virtual environment to build and +install the wheels it builds. + +## Overview + +The plugin allows you to define multiple installation phases, where each phase +is executed sequentially. 
All phases install the wheels in the same virtual +environment, ensuring that packages installed in earlier phases are available +during the build of packages in later phases provided you also use +`--no-build-isolation`. + +## Availability + +The `OrderedPipPlugin` is pre-installed in ado Docker images and bundled with +`ado-core`. + +## Enabling the Plugin + +To enable the `OrderedPipPlugin`, set the `RAY_RUNTIME_ENV_PLUGINS` environment +variable before starting the Ray head and workers. + +```bash +export RAY_RUNTIME_ENV_PLUGINS='[{"class":"orchestrator.utilities.ray_env.ordered_pip.OrderedPipPlugin"}]' +``` + +### Enabling in KubeRay + +When deploying a RayCluster via KubeRay, add the environment variable to both +head and worker node configurations: + +```yaml +head: + containerEnv: + - name: RAY_RUNTIME_ENV_PLUGINS + value: '[{"class":"orchestrator.utilities.ray_env.ordered_pip.OrderedPipPlugin"}]' + +worker: + containerEnv: + - name: RAY_RUNTIME_ENV_PLUGINS + value: '[{"class":"orchestrator.utilities.ray_env.ordered_pip.OrderedPipPlugin"}]' +``` + +## Configuration Details + +> [!IMPORTANT] +> +> Each entry in `phases` uses the **identical schema** as Ray's standard `pip` +> runtime environment field. If you know how to configure `pip`, you already +> know how to configure each phase in `ordered_pip`. 
+ +The `ordered_pip` runtime environment accepts a dictionary with a `phases` key: + +- **`phases`**: Each phase can be one of: + - A list of package names (e.g., `["torch==2.6.0"]`) + - A dictionary with `packages` and optional `pip_install_options` fields + - Any other valid `pip` specification format + +## Usage Examples + +### Using ordered_pip in Python Code + +Here's a complete example showing how to use `ordered_pip` in a Ray task: + +```python +import ray + +@ray.remote( + runtime_env={ + "ordered_pip": { + "phases": [ + # Phase 1: Install PyTorch first + ["torch==2.6.0"], + # Phase 2: Install packages that depend on PyTorch during build + { + "packages": ["mamba-ssm==2.2.5"], + # IMPORTANT. + # --no-build-isolation tells pip to build the wheel + # in the same venv where torch is already installed + "pip_install_options": ["--no-build-isolation"], + } + ] + } + } +) +def my_task(): + import torch + import mamba_ssm + return torch.__version__ + +result = ray.get(my_task.remote()) +print(f"PyTorch version: {result}") +``` + +### Using ordered_pip with ray job submit + +You can also use `ordered_pip` with `ray job submit` by providing a runtime +environment YAML file: + +```yaml +# ray_runtime_env.yaml +ordered_pip: + phases: + # Phase 1: Install PyTorch first + - packages: + - torch==2.6.0 + # Phase 2: Install packages that depend on PyTorch during build + - packages: + - mamba-ssm==2.2.5 + pip_install_options: + # IMPORTANT. 
+ # --no-build-isolation tells pip to build the wheel + # in the same venv where torch is already installed + - --no-build-isolation +``` + +Then submit your job with: + +```bash +ray job submit --runtime-env ray_runtime_env.yaml -- python my_script.py +``` + +## Key Points + +- **Sequential Execution**: Phases execute sequentially in the order specified +- **Shared Environment**: All phases reuse the same virtual environment +- **Build Isolation**: The `--no-build-isolation` flag is critical for packages + that need build-time dependencies. It instructs pip to build wheels in the + existing virtual environment rather than in an isolated one +- **Phase Order Matters**: Package order within a phase doesn't matter, but the + order of phases does + +## Integration with ado Actuators + +Actuators like `SFTTrainer` automatically use `OrderedPipPlugin` when available +to ensure correct installation of their dependencies. + +## Technical Details + +For implementation details, see the source code in +[`ordered_pip.py`](./ordered_pip.py).
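
The two JSON strings this README relies on (the `RAY_RUNTIME_ENV_PLUGINS` value and a runtime environment with `ordered_pip` phases) can also be generated rather than typed by hand. The following is a minimal sketch using only the Python standard library. It is not part of the plugin, and the helper names `plugins_env_value` and `ordered_pip_runtime_env` are hypothetical, introduced here purely for illustration.

```python
import json

# Class path copied from the README above.
PLUGIN_CLASS = "orchestrator.utilities.ray_env.ordered_pip.OrderedPipPlugin"


def plugins_env_value(plugin_class: str = PLUGIN_CLASS) -> str:
    """Hypothetical helper: build the JSON text for RAY_RUNTIME_ENV_PLUGINS."""
    # Compact separators reproduce the string shown in the README.
    return json.dumps([{"class": plugin_class}], separators=(",", ":"))


def ordered_pip_runtime_env(phases: list) -> dict:
    """Hypothetical helper: wrap phases in the ordered_pip layout."""
    # Phase order is preserved because phases are kept in a list.
    return {"ordered_pip": {"phases": list(phases)}}


runtime_env = ordered_pip_runtime_env(
    [
        # Phase 1: install PyTorch first.
        ["torch==2.6.0"],
        # Phase 2: build against the torch installed in phase 1.
        {
            "packages": ["mamba-ssm==2.2.5"],
            "pip_install_options": ["--no-build-isolation"],
        },
    ]
)

print(plugins_env_value())
print(json.dumps(runtime_env))
```

The second printed line is a JSON rendering of the same phases as `ray_runtime_env.yaml`. `ray job submit` accepts such a string inline through `--runtime-env-json`, whereas `--runtime-env` expects a path to a YAML file.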