[None][infra] Add K2.5 Perf Tests into CI #12931
Conversation
📝 Walkthrough

Updates Jenkins test parameters for SBSA multi-node Disagg PerfSanity configurations, adds a model mapping for k25_thinking_fp4, extends test lists to include new kimi-k25-thinking-fp4 test cases, and introduces six benchmark configuration files for k25_thinking_fp4 disaggregated performance-sanity testing on GB200 hardware.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~28 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_8k1k_con4096_ctx1_dep4_gen1_dep16_eplb384_mtp0_ccb-UCX.yaml (1)
42-46: Align accuracy `max_length` with the 8k1k workload.

Line 46 sets `max_length=4096`, but this profile uses `input_length=8192` (Line 24). If accuracy is later enabled, this can truncate prompts and skew validation.

Proposed adjustment:

```diff
- model_args_extra: num_concurrent=512,max_retries=3,tokenized_requests=false,timeout=1200,max_gen_toks=256,max_length=4096
+ model_args_extra: num_concurrent=512,max_retries=3,tokenized_requests=false,timeout=1200,max_gen_toks=256,max_length=9216
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_8k1k_con4096_ctx1_dep4_gen1_dep16_eplb384_mtp0_ccb-UCX.yaml` around lines 42 - 46, The accuracy block's model_args_extra currently sets max_length=4096 which will truncate prompts for this 8k1k workload; update the accuracy section's model_args_extra (the max_length key) to match the profile's input_length (set max_length=8192) so accuracy runs use the full prompt length (look for the accuracy: block and the model_args_extra line referencing max_length).
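To make the comment concrete, the fields it refers to can be sketched as a YAML fragment. This is a hypothetical sketch, not the actual file contents: only `input_length`, `max_gen_toks`, and the `model_args_extra` line come from the review comment above; the section names and any other keys are illustrative assumptions.

```yaml
# Hypothetical sketch of the relevant fields in the 8k1k perf-sanity profile.
# Section layout is an assumption; values are from the review comment above.
benchmark:
  input_length: 8192     # the "8k" in 8k1k: prompts are up to 8192 tokens

accuracy:
  # max_length caps total sequence length for accuracy runs. At 4096 it
  # would truncate 8192-token prompts; the proposed diff raises it to 9216
  # (8192 input plus headroom for generated tokens).
  model_args_extra: num_concurrent=512,max_retries=3,tokenized_requests=false,timeout=1200,max_gen_toks=256,max_length=9216
```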
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integration/defs/perf/test_perf_sanity.py`:
- Line 47: Update the SPDX/copyright header in test_perf_sanity.py to show the latest modification year 2026 (replace 2025 with 2026); locate the file header line that begins with the SPDX or copyright comment (e.g., the top-of-file SPDX-FileCopyrightText/© line) and change the year token to 2026 so the file header matches the recent modification.
---
Nitpick comments:
In `@tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_8k1k_con4096_ctx1_dep4_gen1_dep16_eplb384_mtp0_ccb-UCX.yaml`:
- Around line 42-46: The accuracy block's model_args_extra currently sets max_length=4096 which will truncate prompts for this 8k1k workload; update the accuracy section's model_args_extra (the max_length key) to match the profile's input_length (set max_length=8192) so accuracy runs use the full prompt length (look for the accuracy: block and the model_args_extra line referencing max_length).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5809c343-ca5c-4d53-9756-9edabcde1255
📒 Files selected for processing (13)
- `jenkins/L0_Test.groovy`
- `tests/integration/defs/perf/test_perf_sanity.py`
- `tests/integration/test_lists/test-db/l0_gb200_multi_gpus_perf_sanity.yml`
- `tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node1_gpu4.yml`
- `tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node2_gpu8.yml`
- `tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node4_gpu16.yml`
- `tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu4_gen1_node8_gpu32.yml`
- `tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_1k1k_con2048_ctx1_dep4_gen1_dep32_eplb384_mtp0_ccb-UCX.yaml`
- `tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_1k1k_con4096_ctx1_dep4_gen1_dep8_eplb0_mtp0_ccb-UCX.yaml`
- `tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_1k1k_con4_ctx1_dep4_gen1_tep4_eplb0_mtp0_ccb-UCX.yaml`
- `tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_8k1k_con1024_ctx1_dep4_gen1_dep32_eplb416_mtp3_ccb-UCX.yaml`
- `tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_8k1k_con4096_ctx1_dep4_gen1_dep16_eplb384_mtp0_ccb-UCX.yaml`
- `tests/scripts/perf-sanity/disaggregated/gb200_kimi-k25-thinking-fp4_8k1k_con4_ctx1_dep4_gen1_tep8_eplb0_mtp3_ccb-UCX.yaml`
2837b39 to 9af44c1 (Compare)
e4ba01d to af55b68 (Compare)
|
/bot run --disable-fail-fast --stage-list "GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-2,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-3,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-4,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-5,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-6,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-3,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-7,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-8" |
PR_Github #43174 [ run ] triggered by Bot. Commit:
PR_Github #43174 [ run ] completed with state
a1f99b0 to 12cf3a3 (Compare)
/bot run --disable-fail-fast --stage-list "GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-2,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-3,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-4,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-5,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-6,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-3,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE2-GPU8-Post-Merge-7" |
PR_Github #43409 [ run ] triggered by Bot. Commit:
PR_Github #43409 [ run ] completed with state
Already added `use_low_precision_moe_combine: true` in the MoE config.
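For context, if that flag lives in a MoE section of one of the benchmark YAML files, the fragment would look roughly like this. Only the `use_low_precision_moe_combine: true` key/value comes from the comment above; the enclosing section name is an assumption for illustration:

```yaml
# Hypothetical placement; only the use_low_precision_moe_combine setting is
# taken from the PR comment. The moe_config section name is an assumption.
moe_config:
  use_low_precision_moe_combine: true  # use the low-precision MoE combine path
```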
/bot skip --comment "Only add CI perf sanity tests, no need to run the whole CI pipeline" |
PR_Github #43505 [ skip ] triggered by Bot. Commit:
PR_Github #43505 [ skip ] completed with state
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Summary by CodeRabbit
New Features
- `k25_thinking_fp4` model variant (Kimi K2.5 with FP4 precision) on GB200 hardware for disaggregated inference.

Tests
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
- PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
- PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
- Test cases are provided for new code paths (see test instructions).
- Any new dependencies have been scanned for license and vulnerabilities.
- CODEOWNERS updated if ownership changes.
- Documentation updated as needed.
- Update tava architecture diagram if there is a significant design change in PR.
- The reviewers assigned automatically/manually are appropriate for the PR.
- Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.