
[None][chore] Add Dynamo configs to TRTLLM CI - Disagg - Part 1 #13167

Merged
brb-nv merged 1 commit into NVIDIA:main from brb-nv:user/brb/mirror-dynamo-configs-in-trtllm-disagg-part1
Apr 20, 2026

Conversation

@brb-nv
Collaborator

@brb-nv brb-nv commented Apr 17, 2026

Description

This PR adds Dynamo configs to the TRTLLM CI so that issues are caught early. Part 1 covers the disaggregated (disagg) configs for GB200.

Test Coverage

N/A

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@brb-nv brb-nv requested a review from a team as a code owner April 17, 2026 22:27
@coderabbitai
Contributor

coderabbitai Bot commented Apr 17, 2026

📝 Walkthrough

Walkthrough

Extended performance sanity test infrastructure by adding two new model definitions to the test configuration and introducing three new disaggregated benchmark configurations for GB200 hardware with associated test list entries. Changes are configuration and data-driven additions with no control-flow modifications.
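As a rough illustration of the first change, the sketch below shows how a name-to-path table such as MODEL_PATH_DICT might resolve a model name to a local directory. Only the dictionary name and the two keys (super_fp8, qwen3_32b_fp8) come from this PR. The path values, the models root, and the resolve_model_path helper are hypothetical placeholders, not the repository's actual values.

```python
from pathlib import Path

# Keys are taken from the PR summary; the path values are hypothetical
# placeholders and do not reflect the real entries in test_perf_sanity.py.
MODEL_PATH_DICT = {
    "super_fp8": "placeholder/super-fp8",
    "qwen3_32b_fp8": "placeholder/qwen3-32b-fp8",
}


def resolve_model_path(model_name: str, models_root: str) -> Path:
    """Hypothetical helper: map a model name to a directory under models_root."""
    try:
        relative = MODEL_PATH_DICT[model_name]
    except KeyError:
        known = ", ".join(sorted(MODEL_PATH_DICT))
        raise ValueError(
            f"Unknown model {model_name!r}; known models: {known}"
        ) from None
    return Path(models_root) / relative
```

Failing loudly on an unknown name is one possible design choice: a benchmark config that references a model missing from the table then fails at lookup time rather than later in the run.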

Changes

| Cohort / File(s) | Summary |
|---|---|
| Model Definition Extension: tests/integration/defs/perf/test_perf_sanity.py | Added two new model entries (super_fp8 and qwen3_32b_fp8) to MODEL_PATH_DICT for resolving model names to local paths. |
| Test List Updates: tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu1_gen1_node1_gpu4.yml, tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node2_gpu8_gen1_node2_gpu8.yml, tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node8_gpu32.yml | Added enabled and commented-out test entries for disaggregated perf sanity benchmarks with corresponding timeout configurations. |
| Benchmark Configuration Files: tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con1024_ctx1_dep4_gen1_dep32_eplb0_mtp0_ccb-DEFAULT.yaml, tests/scripts/perf-sanity/disaggregated/gb200_deepseek-v32-fp4_32k4k_con256_ctx1_dep8_gen1_dep8_eplb0_mtp0_ccb-UCX.yaml, tests/scripts/perf-sanity/disaggregated/gb200_gpt-oss-120b-fp4_8k1k_con1024_ctx1_tp1_gen1_tp4_eplb0_mtp0_ccb-UCX.yaml | Added three new disaggregated GB200 benchmark configurations defining Slurm parameters, model specifications, benchmark settings (concurrency, rounds, token lengths), worker/server parallelism, KV cache settings, MoE backends, and cache transceiver configurations. |
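The three benchmark config filenames appear to follow a common pattern, and a small parser can make that pattern explicit. The sketch below is an illustration only: the regular expression is inferred from the three filenames in this PR, and the field meanings are assumptions (isl/osl as input/output token lengths, con as concurrency, ctx/gen as context and generation server counts, dep/tp as the parallelism mode and size, ccb as the cache transceiver backend). The parse_config_name helper and its field names are hypothetical and not part of the repository.

```python
import re

# Pattern inferred from the three filenames in this PR; field meanings are
# assumptions, not documented semantics.
_NAME_RE = re.compile(
    r"^(?P<gpu>[^_]+)_(?P<model>[^_]+)_(?P<isl>\d+k)(?P<osl>\d+k)"
    r"_con(?P<concurrency>\d+)"
    r"_ctx(?P<ctx_servers>\d+)_(?P<ctx_mode>[a-z]+)(?P<ctx_size>\d+)"
    r"_gen(?P<gen_servers>\d+)_(?P<gen_mode>[a-z]+)(?P<gen_size>\d+)"
    r"_eplb(?P<eplb>\d+)_mtp(?P<mtp>\d+)_ccb-(?P<ccb>[A-Za-z0-9]+)\.yaml$"
)

_INT_FIELDS = (
    "concurrency", "ctx_servers", "ctx_size",
    "gen_servers", "gen_size", "eplb", "mtp",
)


def parse_config_name(filename: str) -> dict:
    """Hypothetical helper: split a config filename into named fields."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized config name: {filename!r}")
    fields = match.groupdict()
    # Convert the purely numeric fields from strings to integers.
    for key in _INT_FIELDS:
        fields[key] = int(fields[key])
    return fields
```

Under this pattern, parse_config_name("gb200_gpt-oss-120b-fp4_8k1k_con1024_ctx1_tp1_gen1_tp4_eplb0_mtp0_ccb-UCX.yaml") yields ctx_mode "tp" with ctx_size 1 and gen_mode "tp" with gen_size 4.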

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description contains only the template with empty sections and unchecked checkboxes; no actual implementation details, motivation, or test coverage information was provided by the author. | Fill in the Description section explaining what Dynamo configs were added and why, and the Test Coverage section listing the relevant tests that validate these configuration changes. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title clearly and specifically describes the main change: adding Dynamo configs to TRTLLM CI for the Disagg workflow (Part 1), which aligns with the file changes showing new disaggregated performance-sanity benchmark configurations and test list updates. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/integration/defs/perf/test_perf_sanity.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Update the copyright year on this modified Python file.

Line 1 still ends at 2025, but this file has meaningful changes in this PR and should reflect the latest year.

🔧 Proposed fix
-# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

As per coding guidelines: All TensorRT-LLM source files must contain an NVIDIA copyright header with the year of latest meaningful modification.
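The header fix proposed above could be automated with a few lines of Python. The following is a minimal sketch, assuming the header is a single line in the exact SPDX form shown in the proposed fix and is passed in without a trailing newline. The bump_copyright_year helper is hypothetical and is not an existing TensorRT-LLM tool.

```python
import re

# Matches the single-line SPDX header form quoted in the review comment.
_SPDX_RE = re.compile(
    r"^(?P<prefix># SPDX-FileCopyrightText: Copyright \(c\) )"
    r"(?P<start>\d{4})(?:-(?P<end>\d{4}))?"
    r"(?P<suffix> NVIDIA CORPORATION & AFFILIATES\..*)$"
)


def bump_copyright_year(header_line: str, year: int) -> str:
    """Hypothetical helper: extend the copyright range so it ends at `year`.

    Lines that do not match the expected form, or whose end year is already
    `year` or later, are returned unchanged.
    """
    match = _SPDX_RE.match(header_line)
    if match is None:
        return header_line
    start = int(match["start"])
    # A header with a single year has no end group; treat start as the end.
    end = int(match["end"] or match["start"])
    if end >= year:
        return header_line
    return f"{match['prefix']}{start}-{year}{match['suffix']}"
```

Returning non-matching lines unchanged keeps the helper safe to run over files whose first line is not a copyright header.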

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/defs/perf/test_perf_sanity.py` at line 1, Update the SPDX
header year in the top-of-file comment for
tests/integration/defs/perf/test_perf_sanity.py from 2025 to 2026 to reflect the
latest meaningful modification; edit the first line that currently reads
"...2022-2025 NVIDIA CORPORATION & AFFILIATES." and change the end year to 2026
so the copyright range becomes 2022-2026.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 58b8261c-ab58-4e65-8222-365c6144c0c2

📥 Commits

Reviewing files that changed from the base of the PR and between 813d877 and c51bff8.

📒 Files selected for processing (7)
  • tests/integration/defs/perf/test_perf_sanity.py
  • tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu1_gen1_node1_gpu4.yml
  • tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node8_gpu32.yml
  • tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node2_gpu8_gen1_node2_gpu8.yml
  • tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con1024_ctx1_dep4_gen1_dep32_eplb0_mtp0_ccb-DEFAULT.yaml
  • tests/scripts/perf-sanity/disaggregated/gb200_deepseek-v32-fp4_32k4k_con256_ctx1_dep8_gen1_dep8_eplb0_mtp0_ccb-UCX.yaml
  • tests/scripts/perf-sanity/disaggregated/gb200_gpt-oss-120b-fp4_8k1k_con1024_ctx1_tp1_gen1_tp4_eplb0_mtp0_ccb-UCX.yaml

Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
@brb-nv brb-nv force-pushed the user/brb/mirror-dynamo-configs-in-trtllm-disagg-part1 branch from 8d85337 to a447b4e Compare April 18, 2026 00:14
@brb-nv
Collaborator Author

brb-nv commented Apr 19, 2026

/bot skip --comment "Only updating perf test configs, no need to run the whole CI pipeline."

@tensorrt-cicd
Collaborator

PR_Github #44228 [ skip ] triggered by Bot. Commit: a447b4e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44228 [ skip ] completed with state SUCCESS. Commit: a447b4e
Skipping testing for commit a447b4e

Link to invocation

@brb-nv brb-nv merged commit a757580 into NVIDIA:main Apr 20, 2026
5 checks passed
