[None][infra] Add K2.5 Perf Tests into CI #12931
Conversation
📝 Walkthrough

Updates Jenkins test parameters for SBSA multi-node Disagg PerfSanity configurations, adds a model mapping for k25_thinking_fp4, extends test lists to include new kimi-k25-thinking-fp4 test cases, and introduces six benchmark configuration files for k25_thinking_fp4 disaggregated performance-sanity testing on GB200 hardware.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~28 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_8k1k_con4096_ctx1_dep4_gen1_dep16_eplb384_mtp0_ccb-UCX.yaml (1)
42-46: Align accuracy `max_length` with the 8k1k workload.

Line 46 sets `max_length=4096`, but this profile uses `input_length=8192` (Line 24). If accuracy is later enabled, this can truncate prompts and skew validation.

Proposed adjustment:

```diff
- model_args_extra: num_concurrent=512,max_retries=3,tokenized_requests=false,timeout=1200,max_gen_toks=256,max_length=4096
+ model_args_extra: num_concurrent=512,max_retries=3,tokenized_requests=false,timeout=1200,max_gen_toks=256,max_length=9216
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_8k1k_con4096_ctx1_dep4_gen1_dep16_eplb384_mtp0_ccb-UCX.yaml` around lines 42 - 46, The accuracy block's model_args_extra currently sets max_length=4096 which will truncate prompts for this 8k1k workload; update the accuracy section's model_args_extra (the max_length key) to match the profile's input_length (set max_length=8192) so accuracy runs use the full prompt length (look for the accuracy: block and the model_args_extra line referencing max_length).
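To make the comment concrete, the fields it refers to can be sketched as a YAML fragment. This is a hypothetical sketch, not the actual file contents: only `input_length`, `max_gen_toks`, and the `model_args_extra` line come from the review comment above; the section names and any other keys are illustrative assumptions.

```yaml
# Hypothetical sketch of the relevant fields in the 8k1k perf-sanity profile.
# Section layout is an assumption; values are from the review comment above.
benchmark:
  input_length: 8192     # the "8k" in 8k1k: prompts are up to 8192 tokens

accuracy:
  # max_length caps total sequence length for accuracy runs. At 4096 it
  # would truncate 8192-token prompts; the proposed diff raises it to 9216
  # (8192 input plus headroom for generated tokens).
  model_args_extra: num_concurrent=512,max_retries=3,tokenized_requests=false,timeout=1200,max_gen_toks=256,max_length=9216
```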
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integration/defs/perf/test_perf_sanity.py`:
- Line 47: Update the SPDX/copyright header in test_perf_sanity.py to show the latest modification year 2026 (replace 2025 with 2026); locate the file header line that begins with the SPDX or copyright comment (e.g., the top-of-file SPDX-FileCopyrightText/© line) and change the year token to 2026 so the file header matches the recent modification.
---
Nitpick comments:
In `@tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_8k1k_con4096_ctx1_dep4_gen1_dep16_eplb384_mtp0_ccb-UCX.yaml`:
- Around line 42-46: The accuracy block's model_args_extra currently sets max_length=4096 which will truncate prompts for this 8k1k workload; update the accuracy section's model_args_extra (the max_length key) to match the profile's input_length (set max_length=8192) so accuracy runs use the full prompt length (look for the accuracy: block and the model_args_extra line referencing max_length).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5809c343-ca5c-4d53-9756-9edabcde1255
📒 Files selected for processing (13)
- `jenkins/L0_Test.groovy`
- `tests/integration/defs/perf/test_perf_sanity.py`
- `tests/integration/test_lists/test-db/l0_gb200_multi_gpus_perf_sanity.yml`
- `tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node1_gpu4.yml`
- `tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node2_gpu8.yml`
- `tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node4_gpu16.yml`
- `tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node8_gpu32.yml`
- `tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_1k1k_con2048_ctx1_dep4_gen1_dep32_eplb384_mtp0_ccb-UCX.yaml`
- `tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_1k1k_con4096_ctx1_dep4_gen1_dep8_eplb0_mtp0_ccb-UCX.yaml`
- `tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_1k1k_con4_ctx1_dep4_gen1_tep4_eplb0_mtp0_ccb-UCX.yaml`
- `tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_8k1k_con1024_ctx1_dep4_gen1_dep32_eplb416_mtp3_ccb-UCX.yaml`
- `tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_8k1k_con4096_ctx1_dep4_gen1_dep16_eplb384_mtp0_ccb-UCX.yaml`
- `tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_8k1k_con4_ctx1_dep4_gen1_tep8_eplb0_mtp3_ccb-UCX.yaml`
2837b39 to 9af44c1 (Compare)
e4ba01d to af55b68 (Compare)
|
/bot run --disable-fail-fast --stage-list "GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-2,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-3,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-4,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-5,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-6,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-3,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-7,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-8" |
PR_Github #43174 [ run ] triggered by Bot. Commit:
PR_Github #43174 [ run ] completed with state
a1f99b0 to 12cf3a3 (Compare)
/bot run --disable-fail-fast --stage-list "GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-2,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-3,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-4,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-5,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-6,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-3,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-7" |
PR_Github #43409 [ run ] triggered by Bot. Commit:
PR_Github #43409 [ run ] completed with state
Already added `use_low_precision_moe_combine: true` in the MoE config.
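For context, if that flag lives in a MoE section of one of the benchmark YAML files, the fragment would look roughly like this. Only the `use_low_precision_moe_combine: true` key/value comes from the comment above; the enclosing section name is an assumption for illustration:

```yaml
# Hypothetical placement; only the use_low_precision_moe_combine setting is
# taken from the PR comment. The moe_config section name is an assumption.
moe_config:
  use_low_precision_moe_combine: true  # use the low-precision MoE combine path
```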
/bot skip --comment "Only add CI perf sanity tests, no need to run the whole CI pipeline" |
PR_Github #43505 [ skip ] triggered by Bot. Commit:
PR_Github #43505 [ skip ] completed with state
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Summary by CodeRabbit
New Features
- `k25_thinking_fp4` model variant (Kimi K2.5 with FP4 precision) on GB200 hardware for disaggregated inference.

Tests
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
- PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- Test cases are provided for new code paths (see test instructions).
- Any new dependencies have been scanned for license and vulnerabilities.
- CODEOWNERS updated if ownership changes.
- Documentation updated as needed.
- Update tava architecture diagram if there is a significant design change in PR.
- The reviewers assigned automatically/manually are appropriate for the PR.
- Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.