
cp: fix: update the custom vllm instructions (1116) into r0.4.0#1377

Merged
terrykong merged 1 commit into r0.4.0 from cherry-pick-1116-r0.4.0
Oct 16, 2025

Conversation


@chtruong814 chtruong814 commented Oct 16, 2025

beep boop [🤖]: Hi @terrykong 👋,

we've cherry-picked #1116 into  for you! 🚀

Please review and approve this cherry-pick at your convenience!

Summary by CodeRabbit

  • New Features

    • Added optional BUILD_CUSTOM_VLLM build flag for Docker to enable custom vLLM installation.
    • Introduced new environment variables (VLLM_PRECOMPILED_WHEEL_LOCATION, NRL_FORCE_REBUILD_VENVS) for custom vLLM workflows.
  • Documentation

    • Updated guides with streamlined custom vLLM build process, new parameter signatures, and Docker rebuild instructions.
    • Added expanded verification steps and example commands for validating custom vLLM installations.

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@chtruong814 chtruong814 requested a review from a team as a code owner October 16, 2025 17:02
@chtruong814 chtruong814 requested a review from terrykong October 16, 2025 17:02
@chtruong814 chtruong814 requested review from a team as code owners October 16, 2025 17:02
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Oct 16, 2025

coderabbitai bot commented Oct 16, 2025

📝 Walkthrough

This PR introduces an optional custom vLLM build flow for NeMo RL by adding a conditional Docker build flag (BUILD_CUSTOM_VLLM), substantially rewriting the build script with new parameter handling and pyproject.toml modification capabilities, and providing comprehensive documentation for the workflow.

Changes

  • Git configuration (.gitignore): Updated the pattern from .git to /.git for repository-root specificity; removed the 3rdparty/vllm entry.
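For illustration, the difference between the two patterns (both shown side by side here; a real .gitignore would carry only the anchored form):

```gitignore
# Unanchored: matches a ".git" entry at any depth, e.g. inside vendored checkouts
.git

# Root-anchored: matches only the ".git" directory at the repository root
/.git
```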
  • Docker build system (docker/Dockerfile): Added an optional BUILD_CUSTOM_VLLM build argument to the hermetic stage; copies tools/build-custom-vllm.sh and conditionally runs the custom vLLM build script, sourcing the environment from 3rdparty/vllm/nemo-rl.env when the flag is set.
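A minimal sketch of what such a conditional step can look like (the flag default, stage layout, and paths here are assumptions for illustration, not the merged docker/Dockerfile):

```dockerfile
# Hypothetical sketch; the real docker/Dockerfile may differ in stage names and paths.
ARG BUILD_CUSTOM_VLLM=""
COPY tools/build-custom-vllm.sh tools/build-custom-vllm.sh
# When the flag is set, build the custom vLLM; the script generates
# 3rdparty/vllm/nemo-rl.env, which later install steps can source.
RUN if [ -n "$BUILD_CUSTOM_VLLM" ]; then \
      bash tools/build-custom-vllm.sh; \
    fi
```

A build would then be invoked with something like `docker build --build-arg BUILD_CUSTOM_VLLM=true .` (see the updated guide for the authoritative command).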
  • Build automation (tools/build-custom-vllm.sh): Replaced hard-coded defaults with explicit parameters (GIT_URL, GIT_REF, VLLM_WHEEL_COMMIT); added idempotent pyproject.toml update logic via an embedded Python script using tomlkit to configure the local vLLM path, unpin vLLM dependencies, ensure setuptools_scm inclusion, and set no-build-isolation flags; added repository-root discovery and environment-file generation for Docker integration.
  • Documentation (docs/guides/use-custom-vllm.md): Restructured and expanded the usage guide with an updated script signature, condensed workflow steps, detailed verification procedures, new environment-variable guidance (VLLM_PRECOMPILED_WHEEL_LOCATION, NRL_FORCE_REBUILD_VENVS), and Docker rebuild instructions with a BUILD_CUSTOM_VLLM flag example.
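As a sketch of how the two new environment variables might be set before re-running the installation (the values below are invented; consult docs/guides/use-custom-vllm.md for their exact semantics):

```shell
# Hypothetical values for illustration only; the wheel path is made up.
export VLLM_PRECOMPILED_WHEEL_LOCATION="/tmp/wheels/vllm-custom.whl"
export NRL_FORCE_REBUILD_VENVS=true

echo "Using precompiled wheel: ${VLLM_PRECOMPILED_WHEEL_LOCATION}"
echo "Force venv rebuild: ${NRL_FORCE_REBUILD_VENVS}"
```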

Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Docker as Docker Build
    participant Script as build-custom-vllm.sh
    participant PyProj as pyproject.toml
    participant vLLM as 3rdparty/vllm

    User->>Docker: docker build --build-arg BUILD_CUSTOM_VLLM=true
    Docker->>Script: Copy & conditionally execute
    alt BUILD_CUSTOM_VLLM set
        Script->>vLLM: Clone/verify repository
        Script->>PyProj: Update with local vLLM path
        PyProj->>PyProj: Add setuptools_scm dependency<br/>Unpin vllm<br/>Configure editable source<br/>Set no-build-isolation
        Script->>vLLM: Build custom vLLM
        Script->>Docker: Generate nemo-rl.env
        Docker->>Docker: Source nemo-rl.env<br/>Continue install
    else BUILD_CUSTOM_VLLM not set
        Docker->>Docker: Use default vLLM installation
    end
    Docker-->>User: Build complete
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

The build script rewrite introduces substantial logic density with embedded Python configuration via tomlkit, parameter scheme changes, and environmental setup. Docker conditional flow adds moderate complexity. Documentation requires verification for correctness and completeness. The heterogeneous nature of changes across build automation, Docker orchestration, and documentation necessitates separate reasoning for each component.

Possibly related PRs

  • NVIDIA-NeMo/RL#1299: Modifies Docker build flow to adjust vLLM-related build and install steps, complementing the custom vLLM build infrastructure introduced here.

Suggested labels

CI:docs, r0.4.0

Suggested reviewers

  • terrykong
  • yfw

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title Check | ⚠️ Warning | The title, though referencing the cherry-pick and change, is cluttered with backticks, a PR number, and target-branch details, making it overly verbose rather than a clear, concise summary of the primary update to the custom vLLM instructions. | Simplify the title to state the main change without backticks or branch/PR metadata, for example "Update custom vLLM instructions for r0.4.0 release." |
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changes; docstring coverage check skipped. |
| Test Results For Major Changes | ✅ Passed | The changes in this PR are limited to build configuration, scripts, and documentation for an optional custom vLLM installation flow and do not modify core algorithmic code, numerical behavior, convergence, or performance of the library. As primarily tooling and documentation updates rather than new features or breaking runtime changes, they are considered minor and do not require performance benchmarks or regression test data in the PR description. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9088e55 and 3c0b098.

📒 Files selected for processing (4)
  • .gitignore (1 hunks)
  • docker/Dockerfile (2 hunks)
  • docs/guides/use-custom-vllm.md (1 hunks)
  • tools/build-custom-vllm.sh (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Follow the Google Shell Style Guide for all shell scripts
Use uv run to execute Python scripts in shell/driver scripts instead of activating virtualenvs and calling python directly
Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts

Files:

  • tools/build-custom-vllm.sh
docs/**/*.md

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When a markdown doc under docs/**/*.md is added or renamed, update docs/index.md to include it in the appropriate section

Files:

  • docs/guides/use-custom-vllm.md
🪛 LanguageTool
docs/guides/use-custom-vllm.md

[grammar] ~31-~31: There might be a mistake here.
Context: ... ## Verify Your Custom vLLM in Isolation Test your setup to ensure your custom vL...

(QB_NEW_EN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR

Comment on lines +51 to +93
```sh
OLD_UV_PROJECT_ENVIRONMENT=$UV_PROJECT_ENVIRONMENT
unset UV_PROJECT_ENVIRONMENT
uv venv

# Remove all comments from requirements files to prevent use_existing_torch.py from incorrectly removing xformers
echo "Removing comments from requirements files..."
find requirements/ -name "*.txt" -type f -exec sed -i 's/#.*$//' {} \; 2>/dev/null || true
find requirements/ -name "*.txt" -type f -exec sed -i '/^[[:space:]]*$/d' {} \; 2>/dev/null || true
# Replace xformers==.* (but preserve any platform markers at the end)
# NOTE: xformers is bumped from 0.0.30 to 0.0.31 to work with torch==2.7.1. This version may need to change when we upgrade torch.
find requirements/ -name "*.txt" -type f -exec sed -i -E 's/^(xformers)==[^;[:space:]]*/\1==0.0.31/' {} \; 2>/dev/null || true

uv run --no-project use_existing_torch.py

# Install dependencies
echo "Installing dependencies..."
uv pip install --upgrade pip
uv pip install numpy setuptools setuptools_scm
uv pip install torch==2.7.0 --torch-backend=cu128
uv pip install torch==2.7.1 --torch-backend=cu128

# Install vLLM using precompiled wheel
echo "Installing vLLM with precompiled wheel..."
uv pip install --no-build-isolation -e .

echo "Build completed successfully!"
echo "The built vLLM is available in: $BUILD_DIR"
echo "You can now update your pyproject.toml to use this local version."
echo "Follow instructions on https://github.com/NVIDIA-NeMo/RL/blob/main/docs/guides/use-custom-vllm.md for how to configure your local NeMo RL environment to use this custom vLLM."

echo "Updating repo pyproject.toml to point vLLM to local clone..."

PYPROJECT_TOML="$REPO_ROOT/pyproject.toml"
if [[ ! -f "$PYPROJECT_TOML" ]]; then
    echo "[ERROR] pyproject.toml not found at $PYPROJECT_TOML. This script must be run from the repo root and pyproject.toml must exist."
    exit 1
fi

cd "$REPO_ROOT"

export UV_PROJECT_ENVIRONMENT=$OLD_UV_PROJECT_ENVIRONMENT
if [[ -n "$UV_PROJECT_ENVIRONMENT" ]]; then
    # We optionally set this if the project environment is outside of the project directory.
    # If we do not set this then uv pip install commands will fail
    export VIRTUAL_ENV=$UV_PROJECT_ENVIRONMENT
fi
```

⚠️ Potential issue | 🔴 Critical

Don't dereference an unset UV_PROJECT_ENVIRONMENT under set -u.

With set -u active, the assignment on Line 51 explodes whenever UV_PROJECT_ENVIRONMENT is not already exported (the common case), so the whole script aborts before cloning vLLM. Wrap the read in safe parameter expansion and only restore the variable when it previously existed.
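The failure mode and the fix can be reproduced in a standalone snippet, independent of the build script (variable names follow the review comment):

```shell
set -u    # referencing an unset variable is now a fatal error

unset UV_PROJECT_ENVIRONMENT || true

# Unsafe: OLD=$UV_PROJECT_ENVIRONMENT would abort here, since the variable is unset.
# Safe: "${VAR-}" expands to the empty string when VAR is unset, even under set -u.
OLD_UV_PROJECT_ENVIRONMENT=${UV_PROJECT_ENVIRONMENT-}

# Restore only if it previously existed, so we never export an empty value.
if [ -n "$OLD_UV_PROJECT_ENVIRONMENT" ]; then
  export UV_PROJECT_ENVIRONMENT=$OLD_UV_PROJECT_ENVIRONMENT
fi
echo "survived: OLD='${OLD_UV_PROJECT_ENVIRONMENT}'"
```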

```diff
-OLD_UV_PROJECT_ENVIRONMENT=$UV_PROJECT_ENVIRONMENT
-unset UV_PROJECT_ENVIRONMENT
+OLD_UV_PROJECT_ENVIRONMENT=${UV_PROJECT_ENVIRONMENT-}
+if [[ -v UV_PROJECT_ENVIRONMENT ]]; then
+  unset UV_PROJECT_ENVIRONMENT
+fi
@@
-export UV_PROJECT_ENVIRONMENT=$OLD_UV_PROJECT_ENVIRONMENT
-if [[ -n "$UV_PROJECT_ENVIRONMENT" ]]; then
-    # We optionally set this if the project environment is outside of the project directory.
-    # If we do not set this then uv pip install commands will fail
-    export VIRTUAL_ENV=$UV_PROJECT_ENVIRONMENT
-fi
+if [[ -n "$OLD_UV_PROJECT_ENVIRONMENT" ]]; then
+    export UV_PROJECT_ENVIRONMENT=$OLD_UV_PROJECT_ENVIRONMENT
+    # We optionally set this if the project environment is outside of the project directory.
+    # If we do not set this then uv pip install commands will fail
+    export VIRTUAL_ENV=$UV_PROJECT_ENVIRONMENT
+fi
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```sh
OLD_UV_PROJECT_ENVIRONMENT=${UV_PROJECT_ENVIRONMENT-}
if [[ -v UV_PROJECT_ENVIRONMENT ]]; then
    unset UV_PROJECT_ENVIRONMENT
fi
uv venv

# Remove all comments from requirements files to prevent use_existing_torch.py from incorrectly removing xformers
echo "Removing comments from requirements files..."
find requirements/ -name "*.txt" -type f -exec sed -i 's/#.*$//' {} \; 2>/dev/null || true
find requirements/ -name "*.txt" -type f -exec sed -i '/^[[:space:]]*$/d' {} \; 2>/dev/null || true
# Replace xformers==.* (but preserve any platform markers at the end)
# NOTE: xformers is bumped from 0.0.30 to 0.0.31 to work with torch==2.7.1. This version may need to change when we upgrade torch.
find requirements/ -name "*.txt" -type f -exec sed -i -E 's/^(xformers)==[^;[:space:]]*/\1==0.0.31/' {} \; 2>/dev/null || true

uv run --no-project use_existing_torch.py

# Install dependencies
echo "Installing dependencies..."
uv pip install --upgrade pip
uv pip install numpy setuptools setuptools_scm
uv pip install torch==2.7.1 --torch-backend=cu128

# Install vLLM using precompiled wheel
echo "Installing vLLM with precompiled wheel..."
uv pip install --no-build-isolation -e .

echo "Build completed successfully!"
echo "The built vLLM is available in: $BUILD_DIR"

echo "Updating repo pyproject.toml to point vLLM to local clone..."

PYPROJECT_TOML="$REPO_ROOT/pyproject.toml"
if [[ ! -f "$PYPROJECT_TOML" ]]; then
    echo "[ERROR] pyproject.toml not found at $PYPROJECT_TOML. This script must be run from the repo root and pyproject.toml must exist."
    exit 1
fi

cd "$REPO_ROOT"

if [[ -n "$OLD_UV_PROJECT_ENVIRONMENT" ]]; then
    export UV_PROJECT_ENVIRONMENT=$OLD_UV_PROJECT_ENVIRONMENT
    # We optionally set this if the project environment is outside of the project directory.
    # If we do not set this then uv pip install commands will fail
    export VIRTUAL_ENV=$UV_PROJECT_ENVIRONMENT
fi
```

@terrykong terrykong added the CI:docs Run doctest label Oct 16, 2025
@terrykong terrykong enabled auto-merge (squash) October 16, 2025 17:14
@terrykong terrykong merged commit 6989bc3 into r0.4.0 Oct 16, 2025
66 of 69 checks passed
@terrykong terrykong deleted the cherry-pick-1116-r0.4.0 branch October 16, 2025 17:36
terrykong added a commit that referenced this pull request Nov 19, 2025
…1377)

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>

Labels

  • cherry-pick
  • CI:docs (Run doctest)
  • documentation (Improvements or additions to documentation)
  • Run CICD
