
[None][infra] Add K2.5 Perf Tests into CI #12931

Merged
chenfeiz0326 merged 3 commits into NVIDIA:main from chenfeiz0326:chenfeiz/add-k2.5-to-test-ci
Apr 15, 2026

Conversation

@chenfeiz0326 (Collaborator) commented Apr 10, 2026

Summary by CodeRabbit

  • New Features

    • Added support for the k25_thinking_fp4 model variant (Kimi K2.5 with FP4 precision) on GB200 hardware for disaggregated inference.
  • Tests

    • Expanded performance sanity test coverage with multiple new test configurations for the new model.
    • Added benchmark configurations targeting various concurrency levels and performance scenarios.
    • Updated CI pipeline parameters for multi-node performance testing on GB200.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@coderabbitai bot (Contributor) commented Apr 10, 2026

📝 Walkthrough

Walkthrough

Updates Jenkins test parameters for SBSA multi-node Disagg PerfSanity configurations, adds model mapping for k25_thinking_fp4, extends test lists to include new kimi-k25-thinking-fp4 test cases, and introduces six benchmark configuration files for k25_thinking_fp4 disaggregated performance sanity testing on GB200 hardware.

Changes

Cohort / File(s) Summary
Jenkins Configuration
jenkins/L0_Test.groovy
Four numeric parameters updated in buildStageConfigs calls for SBSA multi-node Disagg PerfSanity stages (3→4, 8→10, 1→2, 7→9).
Model Mapping
tests/integration/defs/perf/test_perf_sanity.py
Added k25_thinking_fp4 model-name mapping to MODEL_PATH_DICT pointing to "Kimi-K2.5-NVFP4" directory.
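Based on the summary above, the new mapping entry plausibly looks like the sketch below. The dictionary name MODEL_PATH_DICT and the "Kimi-K2.5-NVFP4" directory come from the walkthrough; the helper function, base path, and the neighboring kimi_k2_thinking entry are illustrative assumptions, not the actual file contents:

```python
# Hypothetical sketch of the model-name mapping described above.
# MODEL_PATH_DICT and the "Kimi-K2.5-NVFP4" target are from the PR summary;
# the pre-existing entry and the base directory are assumptions.
MODEL_PATH_DICT = {
    "kimi_k2_thinking": "Kimi-K2-Thinking",  # assumed pre-existing entry
    "k25_thinking_fp4": "Kimi-K2.5-NVFP4",   # new entry added by this PR
}

def model_path(model_name: str, base_dir: str = "/models") -> str:
    """Resolve a short model name to its checkpoint directory."""
    return f"{base_dir}/{MODEL_PATH_DICT[model_name]}"
```

A test would then reference the checkpoint as, e.g., `model_path("k25_thinking_fp4")`.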
Test Lists (GB200 Multi-GPU)
tests/integration/test_lists/test-db/l0_gb200_multi_gpus_perf_sanity.yml
Added seven new test cases for kimi-k25-thinking-fp4 with ctx_only mode and various concurrency/context configurations.
Test Lists (GB200 Multi-Node)
tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_*.yml (4 files)
Added kimi-k25-thinking-fp4 test cases across four node/GPU configurations; existing kimi-k2-thinking entries unchanged.
Disaggregated Benchmark Configs
tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_*.yaml (6 files)
Added six new YAML benchmark configurations with varying concurrency, context/generation parallelism, and input/output lengths; includes Slurm job settings, worker configuration, KV cache settings, UCX cache transceiver parameters, and MoE backend specifications.
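The benchmark config filenames pack the whole scenario into underscore-separated tokens, e.g. gb200_kimi-k25-thinking-fp4_8k1k_con4096_ctx1_dep4_gen1_dep16_eplb384_mtp0_ccb-UCX.yaml. A small sketch of how those tokens could be decoded — the field meanings (con = concurrency, ctx/gen = context/generation server counts, dep/tep = parallelism mode and width, eplb = EPLB slot count, mtp = MTP draft length, ccb = cache transceiver backend) are inferred from the names, not documented semantics:

```python
import re

def parse_config_name(filename: str) -> dict:
    """Decode the fields packed into a perf-sanity config filename.

    Token meanings are inferred from the naming convention used in this
    PR's config files and are assumptions, not documented semantics.
    """
    stem = filename.removesuffix(".yaml")
    hw, model, workload, *tokens = stem.split("_")
    info = {"hardware": hw, "model": model, "workload": workload}
    section = "global"  # switches to "ctx" or "gen" as those tokens appear
    for tok in tokens:
        if m := re.fullmatch(r"con(\d+)", tok):
            info["concurrency"] = int(m.group(1))
        elif m := re.fullmatch(r"(ctx|gen)(\d+)", tok):
            section = m.group(1)
            info[f"{section}_servers"] = int(m.group(2))
        elif m := re.fullmatch(r"(dep|tep)(\d+)", tok):
            # parallelism applies to the most recent ctx/gen section
            info[f"{section}_parallel"] = (m.group(1), int(m.group(2)))
        elif m := re.fullmatch(r"eplb(\d+)", tok):
            info["eplb_slots"] = int(m.group(1))
        elif m := re.fullmatch(r"mtp(\d+)", tok):
            info["mtp_draft_len"] = int(m.group(1))
        elif tok.startswith("ccb-"):
            info["cache_transceiver"] = tok[len("ccb-"):]
    return info
```

For the 8k1k/con4096 file above, this yields one context server with dep4 and one generation server with dep16, EPLB 384, MTP off, UCX cache transceiver.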

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~28 minutes

Suggested reviewers

  • longlee0622
  • niukuo
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Description check — ❓ Inconclusive. The PR description is mostly empty template boilerplate with only a checklist item marked; the Description and Test Coverage sections lack substantive detail about changes, rationale, and test safeguards. Resolution: fill in the Description section explaining what K2.5/GLM5 tests are being added, why they're needed, and which CI stages/platforms are affected, and detail the Test Coverage section with the specific test cases and configurations being introduced.
✅ Passed checks (2 passed)
  • Docstring Coverage — ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.
  • Title check — ✅ Passed. The title clearly describes the main change: adding K2.5 (Kimi-K2.5) performance tests into the CI pipeline, which aligns with the file changes that add k25_thinking_fp4 model configurations and test entries.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_8k1k_con4096_ctx1_dep4_gen1_dep16_eplb384_mtp0_ccb-UCX.yaml (1)

42-46: Align accuracy max_length with the 8k1k workload.

Line 46 sets max_length=4096, but this profile uses input_length=8192 (Line 24). If accuracy is later enabled, this can truncate prompts and skew validation.

Proposed adjustment
-  model_args_extra: num_concurrent=512,max_retries=3,tokenized_requests=false,timeout=1200,max_gen_toks=256,max_length=4096
+  model_args_extra: num_concurrent=512,max_retries=3,tokenized_requests=false,timeout=1200,max_gen_toks=256,max_length=9216
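As a quick arithmetic check on the suggested value (an inference, not something stated in the review): 9216 appears to be the 8k prompt budget plus the 1k output budget of the 8k1k workload:

```python
# Assumed reading of the reviewer's proposed max_length for the 8k1k profile.
input_length = 8192    # prompt budget from the profile (the "8k")
output_length = 1024   # completion budget (the "1k")
proposed_max_length = input_length + output_length

print(proposed_max_length)  # 9216, matching the suggested value
# The original max_length=4096 is smaller than input_length alone,
# which is why prompts would be truncated if accuracy runs were enabled.
assert proposed_max_length == 9216
assert input_length > 4096
```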
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_8k1k_con4096_ctx1_dep4_gen1_dep16_eplb384_mtp0_ccb-UCX.yaml`
around lines 42 - 46, The accuracy block's model_args_extra currently sets
max_length=4096 which will truncate prompts for this 8k1k workload; update the
accuracy section's model_args_extra (the max_length key) to match the profile's
input_length (set max_length=8192) so accuracy runs use the full prompt length
(look for the accuracy: block and the model_args_extra line referencing
max_length).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/defs/perf/test_perf_sanity.py`:
- Line 47: Update the SPDX/copyright header in test_perf_sanity.py to show the
latest modification year 2026 (replace 2025 with 2026); locate the file header
line that begins with the SPDX or copyright comment (e.g., the top-of-file
SPDX-FileCopyrightText/© line) and change the year token to 2026 so the file
header matches the recent modification.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5809c343-ca5c-4d53-9756-9edabcde1255

📥 Commits

Reviewing files that changed from the base of the PR and between 4811704 and 03db088.

📒 Files selected for processing (13)
  • jenkins/L0_Test.groovy
  • tests/integration/defs/perf/test_perf_sanity.py
  • tests/integration/test_lists/test-db/l0_gb200_multi_gpus_perf_sanity.yml
  • tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node1_gpu4.yml
  • tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node2_gpu8.yml
  • tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node4_gpu16.yml
  • tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node8_gpu32.yml
  • tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_1k1k_con2048_ctx1_dep4_gen1_dep32_eplb384_mtp0_ccb-UCX.yaml
  • tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_1k1k_con4096_ctx1_dep4_gen1_dep8_eplb0_mtp0_ccb-UCX.yaml
  • tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_1k1k_con4_ctx1_dep4_gen1_tep4_eplb0_mtp0_ccb-UCX.yaml
  • tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_8k1k_con1024_ctx1_dep4_gen1_dep32_eplb416_mtp3_ccb-UCX.yaml
  • tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_8k1k_con4096_ctx1_dep4_gen1_dep16_eplb384_mtp0_ccb-UCX.yaml
  • tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_8k1k_con4_ctx1_dep4_gen1_tep8_eplb0_mtp3_ccb-UCX.yaml

@chenfeiz0326 chenfeiz0326 requested a review from a team as a code owner April 13, 2026 03:12
@chenfeiz0326 chenfeiz0326 requested a review from yuxianq April 13, 2026 03:12
@chenfeiz0326 chenfeiz0326 force-pushed the chenfeiz/add-k2.5-to-test-ci branch from 2837b39 to 9af44c1 Compare April 13, 2026 03:17
@chenfeiz0326 chenfeiz0326 changed the title [None][infra] Add K2.5 Perf Tests into CI [None][infra] Add K2.5 and GLM5 Perf Tests into CI Apr 13, 2026
@chenfeiz0326 chenfeiz0326 changed the title [None][infra] Add K2.5 and GLM5 Perf Tests into CI [None][infra] Add K2.5 Perf Tests into CI Apr 14, 2026
@chenfeiz0326 chenfeiz0326 force-pushed the chenfeiz/add-k2.5-to-test-ci branch from e4ba01d to af55b68 Compare April 14, 2026 03:38
@chenfeiz0326 (Collaborator, Author) commented:

/bot run --disable-fail-fast --stage-list "GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-2,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-3,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-4,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-5,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-6,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-3,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-7,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-8"

@tensorrt-cicd (Collaborator) commented:

PR_Github #43174 [ run ] triggered by Bot. Commit: af55b68 Link to invocation

@tensorrt-cicd (Collaborator) commented:

PR_Github #43174 [ run ] completed with state SUCCESS. Commit: af55b68
/LLM/main/L0_MergeRequest_PR pipeline #33802 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

dc3671 previously requested changes Apr 15, 2026
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
@chenfeiz0326 chenfeiz0326 force-pushed the chenfeiz/add-k2.5-to-test-ci branch from a1f99b0 to 12cf3a3 Compare April 15, 2026 06:00
@chenfeiz0326 chenfeiz0326 requested a review from dc3671 April 15, 2026 06:01
@chenfeiz0326 (Collaborator, Author) commented:

/bot run --disable-fail-fast --stage-list "GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-2,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-3,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-4,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-5,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-6,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-3,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-7"

@tensorrt-cicd (Collaborator) commented:

PR_Github #43409 [ run ] triggered by Bot. Commit: 12cf3a3 Link to invocation

@tensorrt-cicd (Collaborator) commented:

PR_Github #43409 [ run ] completed with state SUCCESS. Commit: 12cf3a3
/LLM/main/L0_MergeRequest_PR pipeline #33943 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
@chenfeiz0326 chenfeiz0326 dismissed dc3671’s stale review April 15, 2026 13:47

Already added use_low_precision_moe_combine: true in moe config

Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
@chenfeiz0326 (Collaborator, Author) commented:

/bot skip --comment "Only add CI perf sanity tests, no need to run the whole CI pipeline"

@chenfeiz0326 chenfeiz0326 enabled auto-merge (squash) April 15, 2026 14:02
@tensorrt-cicd (Collaborator) commented:

PR_Github #43505 [ skip ] triggered by Bot. Commit: 3de8362 Link to invocation

@tensorrt-cicd (Collaborator) commented:

PR_Github #43505 [ skip ] completed with state SUCCESS. Commit: 3de8362
Skipping testing for commit 3de8362

Link to invocation

@chenfeiz0326 chenfeiz0326 merged commit 67fde60 into NVIDIA:main Apr 15, 2026
5 checks passed
chienchunhung pushed a commit to chienchunhung/TensorRT-LLM that referenced this pull request Apr 16, 2026
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>


6 participants