cp: fix: update the custom vllm instructions (1116) into r0.4.0#1377
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
📝 Walkthrough

This PR introduces an optional custom vLLM build flow for NeMo RL by adding a conditional Docker build flag (`BUILD_CUSTOM_VLLM`).

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Docker as Docker Build
    participant Script as build-custom-vllm.sh
    participant PyProj as pyproject.toml
    participant vLLM as 3rdparty/vllm
    User->>Docker: docker build --build-arg BUILD_CUSTOM_VLLM=true
    Docker->>Script: Copy & conditionally execute
    alt BUILD_CUSTOM_VLLM set
        Script->>vLLM: Clone/verify repository
        Script->>PyProj: Update with local vLLM path
        PyProj->>PyProj: Add setuptools_scm dependency<br/>Unpin vllm<br/>Configure editable source<br/>Set no-build-isolation
        Script->>vLLM: Build custom vLLM
        Script->>Docker: Generate nemo-rl.env
        Docker->>Docker: Source nemo-rl.env<br/>Continue install
    else BUILD_CUSTOM_VLLM not set
        Docker->>Docker: Use default vLLM installation
    end
    Docker-->>User: Build complete
```
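The gate in the diagram boils down to a shell conditional like the sketch below. The flag name `BUILD_CUSTOM_VLLM` and the `nemo-rl.env` file are from this PR; the exact Dockerfile wiring is an assumption, so the branch bodies are placeholder `echo`s rather than the real build steps.

```shell
# Illustrative sketch of the BUILD_CUSTOM_VLLM gate; not the PR's literal Dockerfile step.
# ${VAR:-} expands to "" when the variable is unset, so this is safe under `set -u`.
if [[ -n "${BUILD_CUSTOM_VLLM:-}" ]]; then
  echo "custom vLLM: run tools/build-custom-vllm.sh, then source nemo-rl.env"
else
  echo "default vLLM installation"
fi
```

Because the check is on non-emptiness, passing `--build-arg BUILD_CUSTOM_VLLM=true` (or any non-empty value) selects the custom path, while omitting the build arg falls through to the default install.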
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

The build script rewrite introduces substantial logic density, with embedded Python configuration via tomlkit, parameter scheme changes, and environment setup. The conditional Docker flow adds moderate complexity. The documentation requires verification for correctness and completeness. Because the changes span build automation, Docker orchestration, and documentation, each component needs to be reasoned about separately.
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- .gitignore (1 hunks)
- docker/Dockerfile (2 hunks)
- docs/guides/use-custom-vllm.md (1 hunks)
- tools/build-custom-vllm.sh (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.sh
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.sh: Follow the Google Shell Style Guide for all shell scripts
Use `uv run` to execute Python scripts in shell/driver scripts instead of activating virtualenvs and calling `python` directly
Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts
Files:
tools/build-custom-vllm.sh
docs/**/*.md
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
When a markdown doc under docs/**/*.md is added or renamed, update docs/index.md to include it in the appropriate section
Files:
docs/guides/use-custom-vllm.md
🪛 LanguageTool
docs/guides/use-custom-vllm.md
[grammar] ~31-~31: There might be a mistake here.
Context: ... ## Verify Your Custom vLLM in Isolation Test your setup to ensure your custom vL...
(QB_NEW_EN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Lint check
- GitHub Check: Lint check
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
```sh
OLD_UV_PROJECT_ENVIRONMENT=$UV_PROJECT_ENVIRONMENT
unset UV_PROJECT_ENVIRONMENT
uv venv

# Remove all comments from requirements files to prevent use_existing_torch.py from incorrectly removing xformers
echo "Removing comments from requirements files..."
find requirements/ -name "*.txt" -type f -exec sed -i 's/#.*$//' {} \; 2>/dev/null || true
find requirements/ -name "*.txt" -type f -exec sed -i '/^[[:space:]]*$/d' {} \; 2>/dev/null || true
# Replace xformers==.* (but preserve any platform markers at the end)
# NOTE: xformers is bumped from 0.0.30 to 0.0.31 to work with torch==2.7.1. This version may need to change when we upgrade torch.
find requirements/ -name "*.txt" -type f -exec sed -i -E 's/^(xformers)==[^;[:space:]]*/\1==0.0.31/' {} \; 2>/dev/null || true

uv run --no-project use_existing_torch.py

# Install dependencies
echo "Installing dependencies..."
uv pip install --upgrade pip
uv pip install numpy setuptools setuptools_scm
uv pip install torch==2.7.0 --torch-backend=cu128
uv pip install torch==2.7.1 --torch-backend=cu128

# Install vLLM using precompiled wheel
echo "Installing vLLM with precompiled wheel..."
uv pip install --no-build-isolation -e .

echo "Build completed successfully!"
echo "The built vLLM is available in: $BUILD_DIR"
echo "You can now update your pyproject.toml to use this local version."
echo "Follow instructions on https://github.com/NVIDIA-NeMo/RL/blob/main/docs/guides/use-custom-vllm.md for how to configure your local NeMo RL environment to use this custom vLLM."

echo "Updating repo pyproject.toml to point vLLM to local clone..."

PYPROJECT_TOML="$REPO_ROOT/pyproject.toml"
if [[ ! -f "$PYPROJECT_TOML" ]]; then
  echo "[ERROR] pyproject.toml not found at $PYPROJECT_TOML. This script must be run from the repo root and pyproject.toml must exist."
  exit 1
fi

cd "$REPO_ROOT"

export UV_PROJECT_ENVIRONMENT=$OLD_UV_PROJECT_ENVIRONMENT
if [[ -n "$UV_PROJECT_ENVIRONMENT" ]]; then
  # We optionally set this if the project environment is outside of the project directory.
  # If we do not set this then uv pip install commands will fail
  export VIRTUAL_ENV=$UV_PROJECT_ENVIRONMENT
fi
```
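The three `sed` passes in the script above can be exercised in isolation. The sketch below runs them against a throwaway file; the sample requirements contents are invented for the demo, while the `sed` commands mirror the script.

```shell
# Demo of the requirements-cleaning seds on a throwaway file.
# The sample contents are invented; only the sed invocations mirror the script.
tmp=$(mktemp)
printf 'torch==2.7.1  # pinned\n\nxformers==0.0.30; platform_system != "Darwin"\n' > "$tmp"
sed -i 's/#.*$//' "$tmp"                  # strip inline comments
sed -i '/^[[:space:]]*$/d' "$tmp"         # drop now-blank lines
sed -i -E 's/^(xformers)==[^;[:space:]]*/\1==0.0.31/' "$tmp"  # bump pin, keep platform marker
cat "$tmp"
```

Note how the `[^;[:space:]]*` pattern stops the version match at the `;`, so the `platform_system` environment marker survives the pin bump.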
Don't dereference an unset UV_PROJECT_ENVIRONMENT under set -u.
With set -u active, the assignment on Line 51 explodes whenever UV_PROJECT_ENVIRONMENT is not already exported (the common case), so the whole script aborts before cloning vLLM. Wrap the read in safe parameter expansion and only restore the variable when it previously existed.
```diff
-OLD_UV_PROJECT_ENVIRONMENT=$UV_PROJECT_ENVIRONMENT
-unset UV_PROJECT_ENVIRONMENT
+OLD_UV_PROJECT_ENVIRONMENT=${UV_PROJECT_ENVIRONMENT-}
+if [[ -v UV_PROJECT_ENVIRONMENT ]]; then
+  unset UV_PROJECT_ENVIRONMENT
+fi
@@
-export UV_PROJECT_ENVIRONMENT=$OLD_UV_PROJECT_ENVIRONMENT
-if [[ -n "$UV_PROJECT_ENVIRONMENT" ]]; then
-  # We optionally set this if the project environment is outside of the project directory.
-  # If we do not set this then uv pip install commands will fail
-  export VIRTUAL_ENV=$UV_PROJECT_ENVIRONMENT
-fi
+if [[ -n "$OLD_UV_PROJECT_ENVIRONMENT" ]]; then
+  export UV_PROJECT_ENVIRONMENT=$OLD_UV_PROJECT_ENVIRONMENT
+  # We optionally set this if the project environment is outside of the project directory.
+  # If we do not set this then uv pip install commands will fail
+  export VIRTUAL_ENV=$UV_PROJECT_ENVIRONMENT
+fi
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```sh
OLD_UV_PROJECT_ENVIRONMENT=${UV_PROJECT_ENVIRONMENT-}
if [[ -v UV_PROJECT_ENVIRONMENT ]]; then
  unset UV_PROJECT_ENVIRONMENT
fi
uv venv

# Remove all comments from requirements files to prevent use_existing_torch.py from incorrectly removing xformers
echo "Removing comments from requirements files..."
find requirements/ -name "*.txt" -type f -exec sed -i 's/#.*$//' {} \; 2>/dev/null || true
find requirements/ -name "*.txt" -type f -exec sed -i '/^[[:space:]]*$/d' {} \; 2>/dev/null || true
# Replace xformers==.* (but preserve any platform markers at the end)
# NOTE: xformers is bumped from 0.0.30 to 0.0.31 to work with torch==2.7.1. This version may need to change when we upgrade torch.
find requirements/ -name "*.txt" -type f -exec sed -i -E 's/^(xformers)==[^;[:space:]]*/\1==0.0.31/' {} \; 2>/dev/null || true

uv run --no-project use_existing_torch.py

# Install dependencies
echo "Installing dependencies..."
uv pip install --upgrade pip
uv pip install numpy setuptools setuptools_scm
uv pip install torch==2.7.1 --torch-backend=cu128

# Install vLLM using precompiled wheel
echo "Installing vLLM with precompiled wheel..."
uv pip install --no-build-isolation -e .

echo "Build completed successfully!"
echo "The built vLLM is available in: $BUILD_DIR"

echo "Updating repo pyproject.toml to point vLLM to local clone..."

PYPROJECT_TOML="$REPO_ROOT/pyproject.toml"
if [[ ! -f "$PYPROJECT_TOML" ]]; then
  echo "[ERROR] pyproject.toml not found at $PYPROJECT_TOML. This script must be run from the repo root and pyproject.toml must exist."
  exit 1
fi

cd "$REPO_ROOT"

if [[ -n "$OLD_UV_PROJECT_ENVIRONMENT" ]]; then
  export UV_PROJECT_ENVIRONMENT=$OLD_UV_PROJECT_ENVIRONMENT
  # We optionally set this if the project environment is outside of the project directory.
  # If we do not set this then uv pip install commands will fail
  export VIRTUAL_ENV=$UV_PROJECT_ENVIRONMENT
fi
```
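The failure mode the reviewer describes is easy to reproduce in isolation. A minimal bash sketch (the variable name is kept from the script; everything else is invented for the demo):

```shell
# Under `set -u`, expanding an unset variable aborts the shell; `${VAR-}` does not.
unset UV_PROJECT_ENVIRONMENT 2>/dev/null || true
if bash -c 'set -u; : "$UV_PROJECT_ENVIRONMENT"' 2>/dev/null; then
  echo "bare expansion survived"
else
  echo "bare expansion aborted"   # what the original script hit under set -u
fi
saved=$(bash -c 'set -u; echo "${UV_PROJECT_ENVIRONMENT-}"')  # safe: empty string
echo "saved='${saved}'"
```

This is why the suggestion reads the variable with `${UV_PROJECT_ENVIRONMENT-}` and guards the `unset`/restore with `[[ -v ... ]]` and a non-empty check instead of touching the bare expansion.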
beep boop [🤖]: Hi @terrykong 👋,
Summary by CodeRabbit
New Features
- `BUILD_CUSTOM_VLLM` build flag for Docker to enable custom vLLM installation.
- Environment variables (`VLLM_PRECOMPILED_WHEEL_LOCATION`, `NRL_FORCE_REBUILD_VENVS`) for custom vLLM workflows.

Documentation