[TRTLLM-10695][ci] add verl stage in CI by Superjomn · Pull Request #11306 · NVIDIA/TensorRT-LLM

Superjomn · 2026-02-05T06:43:14Z

Changes

This PR adds a CI stage that runs the TRT-LLM rollout-related tests from the verl repository.

The CI stage

This is modeled after the Triton server CI. Environment setup is consolidated in tests/integration/defs/verl/test_verl_cases.py, and each VERL test file is wrapped with a dedicated test wrapper so the stage can be plugged into TRT-LLM like a standard CI stage.
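A minimal sketch of what one such wrapper might look like, assuming each wrapper simply shells out to pytest from inside the verl checkout. The function names, the `verl_root` parameter, and the way the `-k` filter is passed are illustrative and not taken from the PR.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def build_pytest_command(test_file: str, k_filter: Optional[str] = None) -> List[str]:
    """Build the pytest invocation for a single verl test file."""
    cmd = [sys.executable, "-m", "pytest", "-v", test_file]
    if k_filter:
        # Optional pytest keyword expression, e.g. to deselect some tests.
        cmd += ["-k", k_filter]
    return cmd


def run_verl_test_file(verl_root: Path, test_file: str,
                       k_filter: Optional[str] = None) -> None:
    """Run one verl test file in a child process; fail if pytest fails."""
    # cwd is the verl checkout so relative config lookups resolve there.
    result = subprocess.run(build_pytest_command(test_file, k_filter), cwd=verl_root)
    assert result.returncode == 0, (
        f"{test_file} failed with exit code {result.returncode}")
```

A wrapper test in the TRT-LLM tree would then be a one-line call to `run_verl_test_file`, which is what lets the stage look like any other pytest-based CI stage.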

Activated test cases

Test	Status	Duration
`test_adapter`	PASSED	331.7s
`test_async_server`	PASSED	252.8s
`test_rollout_utils`	PASSED	356.6s

Here's the full inventory of verl TRT-LLM tests at tag 4cda6af:

test_async_server.py (4 tests) — all enabled

Test	Requirements
`test_placement_group_with_sub_ray_resource_pool`	mocked, no GPU
`test_placement_group_with_ray_resource_pool`	mocked, no GPU
`test_async_generate`	GPU + Qwen2.5-0.5B-Instruct
`test_async_memory_management`	GPU + Qwen2.5-0.5B-Instruct

test_adapter.py (5 tests) — all enabled

Test	Requirements
`test_make_async_request_get_method`	mocked
`test_make_async_request_post_method`	mocked
`test_make_async_request_http_error`	mocked
`test_make_async_request_max_attempts_exceeded`	mocked
`test_init_without_device_mesh`	GPU + Ray + Hydra config

test_trtllm_rollout_utils.py (8 tests, 23 after parametrize) — partially enabled

Test	Requirements	Status
`test_unimodal_generate` (×3 prompts)	GPU + Qwen2.5-Math-7B	excluded (`-k not ...`)
`test_unimodal_batch_generate`	GPU + Qwen2.5-Math-7B	excluded (`-k not ...`)
`test_multimodal_generate_with_image` (×3)	GPU + Qwen2.5-VL-7B-Instruct	enabled
`test_multimodal_different_image_sizes` (×3)	GPU + Qwen2.5-VL-7B-Instruct	enabled
`test_multimodal_text_only_fallback`	GPU + Qwen2.5-VL-7B-Instruct	enabled
`test_wake_sleep_cycle`	GPU + Qwen2.5-Math-7B	enabled*

Currently excluded: test_unimodal_generate and test_unimodal_batch_generate. Both require Qwen2.5-Math-7B, which isn't in the CI cache.
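The exact `-k` expression is elided in the table above. As a sketch under that caveat, one plausible way to build such an expression from a list of test names is shown below; `build_exclusion_filter` is a hypothetical helper, not code from the PR.

```python
from typing import Iterable


def build_exclusion_filter(excluded: Iterable[str]) -> str:
    """Join test names into a pytest -k expression that deselects all of them."""
    return " and ".join(f"not {name}" for name in excluded)


# The two unimodal tests named in the PR description.
EXCLUDED_TESTS = ["test_unimodal_generate", "test_unimodal_batch_generate"]
K_FILTER = build_exclusion_filter(EXCLUDED_TESTS)
```

Because pytest matches `-k` keywords as substrings of test names, an expression like this should be checked against the full test list so that it does not deselect more than intended.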

Note: test_wake_sleep_cycle also uses Qwen2.5-Math-7B. It passed in build #29880, so it may have a fallback, but it remains a potential issue.

Summary by CodeRabbit

Release Notes

New Features
- Added support for Verl backend integration testing with new test configurations
- Enabled Verl-based testing on DGX B200 GPUs in post-merge pipelines
Tests
- Added Verl test suite configuration with environment setup for dependency installation and build steps
- Extended test infrastructure to recognize and process Verl-specific test paths

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

Superjomn · 2026-02-10T06:16:14Z

/bot run --stage-list "DGX_B200-4_GPUs-Verl-Post-Merge-1"

coderabbitai · 2026-02-10T06:21:23Z

📝 Walkthrough

Walkthrough

This change adds Verl backend support to the Jenkins test framework by transforming verl-prefixed test paths to actual paths, extending test configuration to recognize Verl stage names, and implementing Verl environment setup with repository cloning and configuration management. New test shards and configuration files are introduced for Verl integration testing.

Changes

Cohort / File(s)	Summary
Jenkins Groovy Script `jenkins/L0_Test.groovy`	Added Verl backend recognition in getMakoArgsFromStageName by checking for "-Verl-" in stage names. Implemented processShardTestList to transform verl:: prefixed test paths using VERL_ROOT. Extended runLLMTestlistOnPlatformImpl with Verl environment setup including verl_config.yml parsing, repo cloning, and environment variable configuration. Added "DGX_B200-4_GPUs-Verl-Post-Merge-1" test shard entries for regular and Slurm mappings.
Verl Test Configuration `tests/integration/test_lists/test-db/l0_verl.yml`	New test selection file defining conditions for 4-GPU B200 systems running post-merge tests with Verl backend and MPI orchestration. Includes single test entry with verl:: prefix for async server rollout testing.
Verl Environment Setup `tests/integration/test_lists/test-db/verl_config.yml`	New Verl CI configuration specifying repository location and tag, install commands for gdrcopy, nvshmem, DeepEP with patching, and Python dependencies. Defines environment variables (NVSHMEM_DIR, LD_LIBRARY_PATH, PATH) for container setup.
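The `verl::` path transformation described for `processShardTestList` lives in Groovy. The sketch below restates the idea in Python for illustration only; the function name and the exact joining rule are assumptions rather than a transcription of the Jenkins code.

```python
VERL_PREFIX = "verl::"


def resolve_verl_path(entry: str, verl_root: str) -> str:
    """Rewrite a verl::-prefixed test entry to a path under verl_root."""
    if not entry.startswith(VERL_PREFIX):
        # Ordinary TRT-LLM test entries pass through unchanged.
        return entry
    relative = entry[len(VERL_PREFIX):]
    return f"{verl_root.rstrip('/')}/{relative}"
```

For example, with a hypothetical checkout at `/opt/verl`, the entry `verl::tests/a.py::test_x` would become `/opt/verl/tests/a.py::test_x`.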

Sequence Diagram

sequenceDiagram
    participant Jenkins as Jenkins Pipeline
    participant GroovyScript as L0_Test.groovy
    participant Config as verl_config.yml
    participant Repo as Verl Repository
    participant Env as Environment Setup
    participant TestRunner as Test Execution

    Jenkins->>GroovyScript: runLLMTestlistOnPlatformImpl(stageName="-Verl-")
    GroovyScript->>Config: Read verl_config.yml
    Config-->>GroovyScript: repo_url, install_commands
    GroovyScript->>Repo: Clone Verl repository
    Repo-->>GroovyScript: Repo cloned, set VERL_ROOT
    GroovyScript->>GroovyScript: processShardTestList: Transform verl:: paths
    GroovyScript->>Env: Export environment variables
    GroovyScript->>Env: Execute install_commands (gdrcopy, nvshmem, DeepEP)
    Env-->>GroovyScript: Environment configured
    GroovyScript->>GroovyScript: getMakoArgsFromStageName: backend=verl
    GroovyScript->>TestRunner: Execute Verl tests with MPI orchestration
    TestRunner-->>Jenkins: Test results

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	PR description is incomplete, missing required sections. Only provides 'Changes' and 'Activated test cases'; lacks PR title format, Description, Test Coverage, and checklist completion.	Add PR title in format [TRTLLM-10695][ci] Add VERL stage in CI, fill Description section, list Test Coverage details, and verify all PR Checklist items are addressed.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically identifies the main change: adding a Verl stage to the CI pipeline, with proper JIRA ticket reference and infra type notation.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@tests/integration/test_lists/test-db/verl_config.yml`:
- Around line 1-6: Remove the unused test_dir key from the verl_config YAML
since L0_Test.groovy and the codebase only read repo_url, repo_tag,
install_commands and env_vars; open the file, delete the line containing
"test_dir: \"tests\"" so the verl_config block contains only the fields actually
consumed (repo_url and repo_tag), and run a quick grep for "test_dir" to confirm
there are no remaining references.

🧹 Nitpick comments (2)

tests/integration/test_lists/test-db/verl_config.yml (1)
18-21: Hardcoded Python 3.12 path is fragile.

Lines 19 and 39 embed /usr/local/lib/python3.12/dist-packages/.... If the CI container ever moves to a different Python version, these paths will silently break. Consider deriving the path dynamically, e.g.:
- >-
  NVSHMEM_SITE=$(python3 -c "import nvidia.nvshmem; print(nvidia.nvshmem.__path__[0])")
or at minimum add a comment noting the Python 3.12 dependency.

Also applies to: 39-39
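As a sketch of the reviewer's suggestion, a package directory can be looked up through the import system instead of being spelled out with a Python version in the path. `package_dir` is a hypothetical helper; it only assumes that the target package (here `nvidia.nvshmem`, as named in the review comment) is importable in the CI container.

```python
import importlib.util
from pathlib import Path


def package_dir(module_name: str) -> Path:
    """Return the directory of an importable package or module."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        raise ModuleNotFoundError(f"cannot locate {module_name!r}")
    if spec.submodule_search_locations:
        # Packages (including namespace packages) expose their directories here.
        return Path(next(iter(spec.submodule_search_locations)))
    if not spec.origin:
        raise ModuleNotFoundError(f"{module_name!r} has no file location")
    # Plain modules: fall back to the directory containing the file.
    return Path(spec.origin).parent
```

In the container, `package_dir("nvidia.nvshmem")` would play the role of the hardcoded `/usr/local/lib/python3.12/dist-packages/...` prefix and would keep working if the interpreter version changes.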
jenkins/L0_Test.groovy (1)
2704-2723: Env var resolution is hardcoded for only $LD_LIBRARY_PATH and $PATH — fragile for future additions.

Lines 2717–2718 only resolve two specific bare $VAR references. If a future env_vars entry references a different existing env var (e.g., $HOME, $CUDA_HOME), it will be left as a literal string in the Jenkins environment. The ${VAR} syntax (curly-brace) is handled generically via resolvedVars on lines 2713–2715, but the bare $VAR syntax is not.

Consider a general resolution loop over resolvedVars and env to replace any $KEY pattern, or at minimum, document that only ${...} syntax should be used in verl_config.yml:
♻️ Suggested improvement for more general resolution
-                        // Resolve references to existing env vars
-                        value = value.replace('$LD_LIBRARY_PATH', env.LD_LIBRARY_PATH ?: '')
-                        value = value.replace('$PATH', env.PATH ?: '')
+                        // Resolve any $VAR references to previously resolved vars (bare syntax)
+                        resolvedVars.each { k, v ->
+                            value = value.replace('$' + k, v)
+                        }
+                        // Resolve remaining $VAR references against Jenkins env
+                        def varPattern = /\$([A-Za-z_][A-Za-z0-9_]*)/
+                        value = value.replaceAll(varPattern) { match, varName ->
+                            env."${varName}" ?: match
+                        }
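The same resolution idea can be sketched in Python, which makes the matching rule easy to test in isolation. This is an illustrative restatement, not the Groovy code from the PR: `resolve_env_refs` is a hypothetical name, and it matches whole variable names so that, for example, `$PATH` is not substituted inside `$PATHEXT`.

```python
import re
from typing import Mapping

# Matches ${NAME} (group 1) or bare $NAME (group 2).
_VAR_PATTERN = re.compile(
    r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))")


def resolve_env_refs(value: str, resolved: Mapping[str, str],
                     env: Mapping[str, str]) -> str:
    """Replace $VAR and ${VAR} using previously resolved vars, then env."""
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1) or match.group(2)
        if name in resolved:
            return resolved[name]
        if name in env:
            return env[name]
        # Unknown variables are left as written.
        return match.group(0)

    return _VAR_PATTERN.sub(substitute, value)
```

Leaving unknown references untouched mirrors the `?: match` fallback in the suggested diff. The original code instead substituted an empty string for an unset `$LD_LIBRARY_PATH` or `$PATH`, so the two behaviors differ in that case.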

tests/integration/defs/verl/verl_config.yml

tensorrt-cicd · 2026-02-10T06:21:56Z

PR_Github #35437 [ run ] triggered by Bot. Commit: 36ff776

tensorrt-cicd · 2026-02-10T06:56:08Z

PR_Github #35437 [ run ] completed with state FAILURE. Commit: 36ff776
/LLM/main/L0_MergeRequest_PR pipeline #27371 (Partly Tested) completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Superjomn · 2026-02-10T10:24:39Z

/bot run --stage-list "DGX_B200-4_GPUs-Verl-Post-Merge-1"

tensorrt-cicd · 2026-02-10T10:30:13Z

PR_Github #35498 [ run ] triggered by Bot. Commit: bfbd3bf

tensorrt-cicd · 2026-02-10T11:02:31Z

PR_Github #35498 [ run ] completed with state FAILURE. Commit: bfbd3bf
/LLM/main/L0_MergeRequest_PR pipeline #27408 (Partly Tested) completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

tests/integration/defs/verl/verl_config.yml

tests/integration/test_lists/test-db/l0_verl.yml

hchings · 2026-02-10T20:25:37Z

/bot run --stage-list "DGX_B200-4_GPUs-Verl-Post-Merge-1"

hchings · 2026-02-10T22:38:58Z

/bot run --stage-list "DGX_B200-4_GPUs-Verl-Post-Merge-1"

tensorrt-cicd · 2026-02-10T22:44:55Z

PR_Github #35550 [ run ] triggered by Bot. Commit: 0ad1836

tensorrt-cicd · 2026-02-10T23:29:26Z

PR_Github #35550 [ run ] completed with state FAILURE. Commit: 0ad1836
/LLM/main/L0_MergeRequest_PR pipeline #27454 (Partly Tested) completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

jenkins/L0_Test.groovy

Superjomn · 2026-02-13T08:50:13Z

/bot run --stage-list "DGX_B200-4_GPUs-Verl-Post-Merge-1"

tensorrt-cicd · 2026-02-13T08:55:51Z

PR_Github #35896 [ run ] triggered by Bot. Commit: 70a215b

tensorrt-cicd · 2026-02-13T09:28:41Z

PR_Github #35896 [ run ] completed with state FAILURE. Commit: 70a215b
/LLM/main/L0_MergeRequest_PR pipeline #27721 (Partly Tested) completed with status: 'FAILURE'

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Superjomn · 2026-02-24T01:41:17Z

/bot run --stage-list "DGX_B200-4_GPUs-Verl-Post-Merge-1"

Tests with the verl:: prefix live in the external verl repository and are only available at Jenkins runtime (resolved to ${VERL_ROOT}/ by L0_Test.groovy). The local pre-merge validation script has no access to that repo, so these entries were flagged as invalid. Filter them out before pytest collection so the CI check passes cleanly. Signed-off-by: Chunwei Yan <yanchunwei@outlook.com> Made-with: Cursor
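A minimal sketch of that filtering step, assuming the validation script works on a list of test-entry strings. `split_external_entries` is a hypothetical helper and does not reproduce the actual change to check_test_list.py.

```python
from typing import Iterable, List, Tuple


def split_external_entries(entries: Iterable[str],
                           prefix: str = "verl::") -> Tuple[List[str], List[str]]:
    """Separate locally checkable entries from externally hosted ones."""
    local: List[str] = []
    external: List[str] = []
    for entry in entries:
        stripped = entry.strip()
        if not stripped:
            continue  # ignore blank lines
        if stripped.startswith(prefix):
            external.append(stripped)  # resolved only at Jenkins runtime
        else:
            local.append(stripped)
    return local, external
```

Only the `local` list would be handed to pytest collection; the `external` list can be logged so that skipped entries stay visible.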

Signed-off-by: Chunwei Yan <yanchunwei@outlook.com> Made-with: Cursor

…_config.yml The verl conftest.py runs install commands via subprocess.run(shell=True), which uses /bin/sh. pushd/popd are bash builtins and fail with exit code 127 under /bin/sh. Replace with POSIX-compatible (cd dir && ...) subshells. Signed-off-by: Chunwei Yan <yanchunwei@outlook.com> Made-with: Cursor
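The `pushd`/`popd` replacement can be sketched as follows. The helper names are illustrative; the only facts relied on are that `subprocess.run(..., shell=True)` uses `/bin/sh` on POSIX systems and that a parenthesized subshell confines the `cd` to that one command.

```python
import shlex
import subprocess


def in_directory(directory: str, command: str) -> str:
    """Wrap command so it runs inside directory without pushd/popd."""
    # The subshell ends after the command, so the caller's cwd is unchanged.
    return f"(cd {shlex.quote(directory)} && {command})"


def run_install_command(command: str) -> int:
    """Run one install command through /bin/sh and return its exit code."""
    return subprocess.run(command, shell=True).returncode
```

An alternative would be to pass `executable="/bin/bash"` to `subprocess.run` and keep the bash builtins, at the cost of requiring bash in every container that runs the setup.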

Replace the verl:: prefix mechanism with a local wrapper test file that invokes verl repo tests via subprocess, eliminating the need for special CI infrastructure to handle external test paths. Signed-off-by: Chunwei Yan <chunweiy@nvidia.com> Signed-off-by: Chunwei Yan <yanchunwei@outlook.com>

…L_ROOT env Clone the verl repo into tests/integration/defs/verl/verl_repo/ so the wrapper test discovers it by relative path (__file__), avoiding Jenkins env var propagation issues in Docker-on-Slurm execution. Signed-off-by: Chunwei Yan <chunweiy@nvidia.com> Signed-off-by: Chunwei Yan <yanchunwei@outlook.com>
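Locating the checkout relative to the wrapper file might look like the sketch below. The directory name `verl_repo` comes from the commit message; the function name is hypothetical.

```python
from pathlib import Path


def locate_verl_repo(wrapper_file: str, dirname: str = "verl_repo") -> Path:
    """Return the verl checkout expected next to the wrapper test file."""
    return Path(wrapper_file).resolve().parent / dirname
```

Inside the wrapper module this would be called as `locate_verl_repo(__file__)`, so the result depends only on where the file lives and not on any environment variable reaching the test process.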

… fixture The Verl stage runs via the sbatch path which does not execute runLLMTestlistOnPlatformImpl, so the Groovy setup block never ran. Move all setup (env vars, dependency install, repo clone) into a session-scoped pytest fixture in test_verl_cases.py, following the triton-server-ci pattern. Signed-off-by: Chunwei Yan <yanchunwei@outlook.com>
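To illustrate the setup sequence, the sketch below turns a config mapping into an ordered list of shell commands. The keys `repo_url`, `repo_tag`, and `install_commands` are named in the review comments, but the flat layout, the ordering, and the function name are assumptions rather than the structure used in test_verl_cases.py.

```python
import shlex
from typing import Any, List, Mapping


def build_setup_commands(config: Mapping[str, Any], checkout_dir: str) -> List[str]:
    """Return the shell commands for a one-time verl environment setup."""
    target = shlex.quote(checkout_dir)
    commands = [
        # Clone the repository, then pin it to the configured tag or commit.
        f"git clone {shlex.quote(config['repo_url'])} {target}",
        f"(cd {target} && git checkout {shlex.quote(config['repo_tag'])})",
    ]
    # Dependency installs listed in the config (assumed to be plain strings).
    commands += list(config.get("install_commands", []))
    # Install verl itself in editable mode so its modules can be imported.
    commands.append(f"pip install -e {target}")
    return commands
```

A session-scoped pytest fixture could run these commands once per test session and apply the configured environment variables, so the setup no longer depends on which Jenkins code path launched the stage.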

The verl test_async_server.py imports ray, which was not listed in verl_config.yml install_commands. Signed-off-by: Chunwei Yan <yanchunwei@outlook.com>

The verl test imports verl.single_controller which requires the verl package to be installed. Add pip install -e after cloning the repo. Signed-off-by: Chunwei Yan <yanchunwei@outlook.com>

Hydra resolves config paths relative to cwd. The verl tests need cwd=VERL_ROOT so the trainer/config directory is found correctly. Signed-off-by: Chunwei Yan <yanchunwei@outlook.com>

The verl test uses TRTLLM_TEST_MODEL_PATH_ROOT to locate model weights (defaults to ~/models). In CI, models are at /scratch.trt_llm_data/llm-models. Signed-off-by: Chunwei Yan <yanchunwei@outlook.com>
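A sketch of the lookup this commit relies on, assuming the model directory is formed by joining the root with the model name. The variable name and the `~/models` default come from the commit message; the joining rule and the function name are assumptions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def model_path(model_name: str,
               environ: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve a model directory under TRTLLM_TEST_MODEL_PATH_ROOT."""
    environ = os.environ if environ is None else environ
    # Falls back to ~/models when the variable is not set.
    root = environ.get("TRTLLM_TEST_MODEL_PATH_ROOT", "~/models")
    return Path(root).expanduser() / model_name
```

Passing `environ` explicitly keeps the function easy to exercise in tests without mutating the process environment.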

The verl test needs Qwen/Qwen2.5-0.5B-Instruct at a local path. Add model download step using huggingface_hub.snapshot_download to TRTLLM_TEST_MODEL_PATH_ROOT before running tests. Signed-off-by: Chunwei Yan <yanchunwei@outlook.com>

The pinned verl commit 4ef45d0 uses OpenAIServer(llm=...) but TRT-LLM now expects OpenAIServer(generator=...). Update to 4cda6af which has the compatible API call. Signed-off-by: Chunwei Yan <yanchunwei@outlook.com>

Add wrapper tests for test_adapter.py (HTTP adapter + server init) and test_trtllm_rollout_utils.py (multimodal rollout + lifecycle). Unimodal tests requiring Qwen2.5-Math-7B are excluded via -k filter since the model is not in the CI cache. Use CI model cache paths with symlinks to bridge HF-style naming to flat CI cache structure. Signed-off-by: Chunwei Yan <yanchunwei@outlook.com>

The CI model cache at /scratch.trt_llm_data/llm-models is read-only. Instead of creating symlinks there, use /tmp/verl-models as a writable staging directory with symlinks pointing back to the read-only cache. Signed-off-by: Chunwei Yan <yanchunwei@outlook.com>
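The staging approach can be sketched as follows. `stage_model` is a hypothetical helper, and the idea that the cache stores a model under a flat directory name while the test expects an HF-style `org/name` path is inferred from the commit messages.

```python
from pathlib import Path


def stage_model(cache_path: Path, staging_root: Path, hf_name: str) -> Path:
    """Expose a read-only cached model under an HF-style name via a symlink."""
    link = staging_root / hf_name
    # Create e.g. <staging_root>/Qwen/ in the writable staging area.
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        link.unlink()  # refresh a link left over from an earlier run
    # Raises FileExistsError if a real file or directory already sits here.
    link.symlink_to(cache_path, target_is_directory=True)
    return link
```

Nothing is written into the cache itself; only the staging directory (such as `/tmp/verl-models`) needs to be writable, and the model root variable would then point at that staging directory.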

Superjomn · 2026-03-11T13:17:04Z

/bot run --stage-list "DGX_B200-4_GPUs-Verl-Post-Merge-1"

tensorrt-cicd · 2026-03-11T13:22:52Z

PR_Github #38589 [ run ] triggered by Bot. Commit: 828a891 Link to invocation

tensorrt-cicd · 2026-03-11T14:23:43Z

PR_Github #38589 [ run ] completed with state SUCCESS. Commit: 828a891
/LLM/main/L0_MergeRequest_PR pipeline #29925 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

ZhanruiSunCh

LGTM for L0_Test.groovy. If you want this stage to be auto-triggered in pre-merge, you need to modify here: https://github.com/NVIDIA/TensorRT-LLM/blob/main/jenkins/L0_MergeRequest.groovy#L642-L647

hchings

LGTM

Superjomn · 2026-03-12T05:38:05Z

/bot run

tensorrt-cicd · 2026-03-12T05:43:55Z

PR_Github #38676 [ run ] triggered by Bot. Commit: 828a891 Link to invocation

tensorrt-cicd · 2026-03-12T08:19:01Z

PR_Github #38676 [ run ] completed with state SUCCESS. Commit: 828a891
/LLM/main/L0_MergeRequest_PR pipeline #29999 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Superjomn · 2026-03-12T12:14:19Z

/bot run

tensorrt-cicd · 2026-03-12T12:20:45Z

PR_Github #38727 [ run ] triggered by Bot. Commit: 828a891 Link to invocation

tensorrt-cicd · 2026-03-12T19:04:32Z

PR_Github #38727 [ run ] completed with state SUCCESS. Commit: 828a891
/LLM/main/L0_MergeRequest_PR pipeline #30045 completed with status: 'SUCCESS'

CI Report

Link to invocation

Superjomn requested review from a team as code owners February 5, 2026 06:43

Superjomn requested review from mlefeb01 and ruodil February 5, 2026 06:43

Superjomn marked this pull request as draft February 5, 2026 06:43

Superjomn force-pushed the add-verl-stage branch 2 times, most recently from d1343f3 to 36ff776 Compare February 10, 2026 06:02

Superjomn marked this pull request as ready for review February 10, 2026 06:16

coderabbitai bot reviewed Feb 10, 2026

tests/integration/defs/verl/verl_config.yml (resolved)

Superjomn force-pushed the add-verl-stage branch from 36ff776 to bfbd3bf Compare February 10, 2026 10:24

hchings reviewed Feb 10, 2026

mlefeb01 approved these changes Feb 11, 2026

jenkins/L0_Test.groovy (outdated, resolved)

Superjomn force-pushed the add-verl-stage branch 2 times, most recently from 1aa18da to 70a215b Compare February 13, 2026 08:49

Superjomn force-pushed the add-verl-stage branch from 70a215b to 0128576 Compare February 24, 2026 01:41

Superjomn added 15 commits March 11, 2026 18:23

[None][fix] apply yapf formatting to check_test_list.py

3640a44

Signed-off-by: Chunwei Yan <yanchunwei@outlook.com> Made-with: Cursor

[None][fix] change verl orchestrator from ray to mpi in l0_verl.yml

e11eccb

Signed-off-by: Chunwei Yan <yanchunwei@outlook.com> Made-with: Cursor

[None][fix] Add ray to verl CI install commands

800b7f9

The verl test_async_server.py imports ray, which was not listed in verl_config.yml install_commands. Signed-off-by: Chunwei Yan <yanchunwei@outlook.com>

[None][fix] Install verl package after cloning repo in pytest fixture

243f0f9

The verl test imports verl.single_controller which requires the verl package to be installed. Add pip install -e after cloning the repo. Signed-off-by: Chunwei Yan <yanchunwei@outlook.com>

[None][fix] Set cwd to verl repo root when running verl tests

d05eb01

Hydra resolves config paths relative to cwd. The verl tests need cwd=VERL_ROOT so the trainer/config directory is found correctly. Signed-off-by: Chunwei Yan <yanchunwei@outlook.com>

[None][fix] Set TRTLLM_TEST_MODEL_PATH_ROOT for verl CI

b43e614

The verl test uses TRTLLM_TEST_MODEL_PATH_ROOT to locate model weights (defaults to ~/models). In CI, models are at /scratch.trt_llm_data/llm-models. Signed-off-by: Chunwei Yan <yanchunwei@outlook.com>

[None][fix] Update verl repo tag for OpenAIServer API compat

8d6e28f

The pinned verl commit 4ef45d0 uses OpenAIServer(llm=...) but TRT-LLM now expects OpenAIServer(generator=...). Update to 4cda6af which has the compatible API call. Signed-off-by: Chunwei Yan <yanchunwei@outlook.com>

Superjomn force-pushed the add-verl-stage branch from 0cd4b27 to 828a891 Compare March 11, 2026 13:17

ZhanruiSunCh approved these changes Mar 11, 2026

hchings approved these changes Mar 11, 2026

Superjomn enabled auto-merge (squash) March 13, 2026 00:52

Superjomn merged commit 0507609 into NVIDIA:main Mar 13, 2026
5 checks passed

Superjomn deleted the add-verl-stage branch March 13, 2026 00:55
