[https://nvbugs/5949033][fix] Add 3 Disagg gen_only tests back #12159
📝 Walkthrough
This PR expands GB200 multi-node performance sanity test coverage by increasing test counts in two existing SBSA configurations, introducing a new 3-node test variant, enabling previously commented-out performance tests in test database files, and adding documentation for the process.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
🧹 Nitpick comments (4)
tests/scripts/perf-sanity/README.md (4)
309-309: Remove extra spaces inside code spans.
The code spans have trailing spaces that trigger MD038 linting warnings.
📝 Suggested fix
-- If the test line already exists but is **commented out** (prefixed with `# `), remove the `# ` prefix.
+- If the test line already exists but is **commented out** (prefixed with `#`), remove the `#` prefix.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf-sanity/README.md` at line 309, the Markdown line contains code spans with trailing spaces (e.g., "`# `") that trigger MD038; edit the README.md line so code spans have no inner trailing spaces (change backtick-wrapped "`# `" to "`#`" and similarly remove any other extra spaces inside backticks on that line), ensuring the visible text and meaning remain the same but the code spans contain no leading/trailing spaces.
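As a rough illustration of the MD038 finding above, the following Python sketch flags and trims padding inside single-backtick code spans on one line. It is a simplification, not markdownlint's actual implementation: the regex and the helper names `find_padded_code_spans` and `strip_code_span_padding` are assumptions made for this example, and multi-backtick spans and fenced blocks are not handled.

```python
import re

# Matches a single-backtick code span on one line (simplified; ignores
# multi-backtick spans and fenced code blocks).
CODE_SPAN = re.compile(r"`([^`\n]+)`")


def find_padded_code_spans(line: str) -> list[str]:
    """Return code spans whose content has leading or trailing whitespace."""
    return [
        m.group(0)
        for m in CODE_SPAN.finditer(line)
        if m.group(1) != m.group(1).strip()
    ]


def strip_code_span_padding(line: str) -> str:
    """Rewrite padded code spans without the inner padding."""

    def _fix(m: re.Match) -> str:
        inner = m.group(1).strip()
        # Leave whitespace-only spans untouched rather than emitting ``.
        return f"`{inner}`" if inner else m.group(0)

    return CODE_SPAN.sub(_fix, line)


if __name__ == "__main__":
    readme_line = (
        "- If the test line already exists but is **commented out** "
        "(prefixed with `# `), remove the `# ` prefix."
    )
    print(find_padded_code_spans(readme_line))
    print(strip_code_span_padding(readme_line))
```

Run against the README line quoted in the suggested fix, the rewrite step yields the same text as the `+` line of that diff.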
332-334: Add language identifier to fenced code block.
This stage naming convention block should have a language identifier.
📝 Suggested fix
-```
+```text
 GB200-{gpuCount}_GPUs-{nodeCount}_Nodes-PyTorch-Disagg-PerfSanity-CTX{num_ctx}-NODE{nodes_per_ctx}-GPU{ctx_tp}-GEN{num_gen}-NODE{nodes_per_gen}-GPU{gen_tp}-Post-Merge
-```
+```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf-sanity/README.md` around lines 332 - 334, the fenced code block containing the stage name string GB200-{gpuCount}_GPUs-{nodeCount}_Nodes-PyTorch-Disagg-PerfSanity-CTX{num_ctx}-NODE{nodes_per_ctx}-GPU{ctx_tp}-GEN{num_gen}-NODE{nodes_per_gen}-GPU{gen_tp}-Post-Merge should include a language identifier (e.g., add "text" after the opening triple backticks) so the block becomes ```text ... ```, ensuring consistent markdown rendering.
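To make the stage naming convention above concrete, here is a minimal Python sketch that fills in the template. The helper `build_stage_name` is hypothetical. The way `nodeCount` and `gpuCount` are derived (sum of worker nodes, 4 GPUs per node) is an assumption inferred from the stage names in the `/bot run` command in this thread, not something the README excerpt states. The trailing numeric suffix seen in those stage names (for example `-3`) is not modeled.

```python
# Assumption: 4 GPUs per GB200 node, inferred from "8_GPUs-2_Nodes" and
# "12_GPUs-3_Nodes" in the stage names used in this thread.
GPUS_PER_NODE = 4


def build_stage_name(
    num_ctx: int,
    nodes_per_ctx: int,
    ctx_tp: int,
    num_gen: int,
    nodes_per_gen: int,
    gen_tp: int,
    gpus_per_node: int = GPUS_PER_NODE,
) -> str:
    """Fill in the disaggregated perf-sanity stage name template."""
    # Assumption: total nodes = nodes used by ctx workers + gen workers.
    node_count = num_ctx * nodes_per_ctx + num_gen * nodes_per_gen
    gpu_count = node_count * gpus_per_node
    return (
        f"GB200-{gpu_count}_GPUs-{node_count}_Nodes-PyTorch-Disagg-PerfSanity"
        f"-CTX{num_ctx}-NODE{nodes_per_ctx}-GPU{ctx_tp}"
        f"-GEN{num_gen}-NODE{nodes_per_gen}-GPU{gen_tp}-Post-Merge"
    )


if __name__ == "__main__":
    # 1 ctx worker (1 node, TP1) and 1 gen worker (2 nodes, TP8).
    print(build_stage_name(1, 1, 1, 1, 2, 8))
```

Under these assumptions, the two configurations checked in the tests reproduce the stage-name prefixes that appear in the CI command further down the thread.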
283-290: Add language identifiers to fenced code blocks.
These code blocks showing file path patterns would benefit from language identifiers (e.g., `text`) for better rendering and to satisfy linting rules.
📝 Suggested fix
-```
+```text
 l0_{gpu_type}_multi_nodes_perf_sanity_ctx{num_ctx}_node{nodes_per_ctx}_gpu{ctx_tp}_gen{num_gen}_node{nodes_per_gen}_gpu{gen_tp}.yml
-```
+```

 Example: for ctx_tp=1, gen_tp=8, 1 ctx worker, 1 gen worker on GB200:
-```
+```text
 l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu1_gen1_node2_gpu8.yml
-```
+```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf-sanity/README.md` around lines 283 - 290, update the two fenced code blocks that show filename patterns by adding a language identifier (use "text") to each opening triple-backtick fence; specifically change the block starting with "```" before "l0_{gpu_type}_multi_nodes_perf_sanity..." and the block before "l0_gb200_multi_nodes_perf_sanity..." to "```text" so the example filename patterns are rendered and linted correctly.
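The test-db filename pattern quoted above can be sketched as a small Python helper. `build_test_db_filename` is a hypothetical name used only for illustration. The template itself and the GB200 example (ctx_tp=1, gen_tp=8, 1 ctx worker, 1 gen worker) come from the README excerpt in the suggested fix.

```python
def build_test_db_filename(
    gpu_type: str,
    num_ctx: int,
    nodes_per_ctx: int,
    ctx_tp: int,
    num_gen: int,
    nodes_per_gen: int,
    gen_tp: int,
) -> str:
    """Fill in the multi-node perf-sanity test-db filename template."""
    return (
        f"l0_{gpu_type.lower()}_multi_nodes_perf_sanity"
        f"_ctx{num_ctx}_node{nodes_per_ctx}_gpu{ctx_tp}"
        f"_gen{num_gen}_node{nodes_per_gen}_gpu{gen_tp}.yml"
    )


if __name__ == "__main__":
    # README example: 1 ctx worker with ctx_tp=1, 1 gen worker with
    # gen_tp=8 spread over 2 nodes, on GB200.
    print(build_test_db_filename("gb200", 1, 1, 1, 1, 2, 8))
```

Lowercasing `gpu_type` is a convenience chosen for this sketch so that `"GB200"` and `"gb200"` produce the same filename.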
246-248: Add language identifier to fenced code block.
This code block shows a filename pattern and would benefit from a language identifier for better rendering.
📝 Suggested fix
-```
+```text
 {gpu_type}_{model}-{precision}_{ISL}k{OSL}k_con{concurrency}_ctx{ctx_count}_tp{ctx_tp}_gen{gen_count}_{gen_parallelism}_eplb{N}_mtp{N}_ccb-{transport}.yaml
-```
+```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf-sanity/README.md` around lines 246 - 248, the fenced code block containing the filename pattern "{gpu_type}_{model}-{precision}_{ISL}k{OSL}k_con{concurrency}_ctx{ctx_count}_tp{ctx_tp}_gen{gen_count}_{gen_parallelism}_eplb{N}_mtp{N}_ccb-{transport}.yaml" should include a language identifier for better rendering; change the opening fence from ``` to ```text so the block becomes a plain-text fenced block while leaving the content unchanged.
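A minimal Python sketch of the config filename pattern above may help when reading it. The helper `build_config_filename` and every value in the usage example (model name, precision, parallelism string, transport) are placeholders invented for illustration; only the template layout comes from the README excerpt.

```python
def build_config_filename(
    *,
    gpu_type: str,
    model: str,
    precision: str,
    isl_k: int,
    osl_k: int,
    concurrency: int,
    ctx_count: int,
    ctx_tp: int,
    gen_count: int,
    gen_parallelism: str,
    eplb: int,
    mtp: int,
    transport: str,
) -> str:
    """Fill in the disaggregated perf-sanity config filename template."""
    return (
        f"{gpu_type}_{model}-{precision}_{isl_k}k{osl_k}k_con{concurrency}"
        f"_ctx{ctx_count}_tp{ctx_tp}_gen{gen_count}_{gen_parallelism}"
        f"_eplb{eplb}_mtp{mtp}_ccb-{transport}.yaml"
    )


if __name__ == "__main__":
    # All values below are hypothetical placeholders, not real configs.
    print(
        build_config_filename(
            gpu_type="gb200",
            model="mymodel",
            precision="fp4",
            isl_k=1,
            osl_k=1,
            concurrency=64,
            ctx_count=1,
            ctx_tp=1,
            gen_count=1,
            gen_parallelism="tp8",
            eplb=0,
            mtp=0,
            transport="UCX",
        )
    )
```

Keyword-only arguments are used here because the template has many similarly typed fields, which makes positional calls easy to get wrong.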
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ed514eb3-5211-4b5e-8443-cc8bfc1b1ec3
📒 Files selected for processing (5)
jenkins/L0_Test.groovy
jenkins/scripts/perf/README.md
tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu1_gen1_node1_gpu2.yml
tests/integration/test_lists/test-db/l0_gb200_multi_nodes_perf_sanity_ctx1_node1_gpu1_gen1_node1_gpu4.yml
tests/scripts/perf-sanity/README.md
/bot run --disable-fail-fast --stage-list "GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE1-GPU2-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE1-GPU4-Post-Merge-4,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE2-GPU8-Post-Merge-1"
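For readers assembling a similar comment by hand, this Python sketch builds the command string from a list of stage names. The helper `build_bot_run_command` is hypothetical. It uses only the two flags that appear in this thread (`--disable-fail-fast` and `--stage-list`) and makes no claim about other options the CI bot may support.

```python
def build_bot_run_command(stages: list[str], disable_fail_fast: bool = True) -> str:
    """Compose a '/bot run' comment limited to the given CI stages."""
    parts = ["/bot run"]
    if disable_fail_fast:
        parts.append("--disable-fail-fast")
    if stages:
        # Stage names are comma-separated inside one quoted argument,
        # mirroring the command posted in this thread.
        parts.append('--stage-list "' + ",".join(stages) + '"')
    return " ".join(parts)


if __name__ == "__main__":
    stages = [
        "GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE1-GPU2-Post-Merge-3",
        "GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE1-GPU4-Post-Merge-4",
        "GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE2-GPU8-Post-Merge-1",
    ]
    print(build_bot_run_command(stages))
```

With an empty stage list the helper falls back to the plain `/bot run --disable-fail-fast` form used later in the conversation.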
PR_Github #38715 [ run ] triggered by Bot. Commit:
PR_Github #38715 [ run ] completed with state
chienchunhung left a comment
Thanks for the PR!
/bot run --disable-fail-fast
PR_Github #38757 [ run ] triggered by Bot. Commit:
PR_Github #38757 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #38795 [ run ] triggered by Bot. Commit:
PR_Github #38795 [ run ] completed with state
Summary by CodeRabbit
Release Notes
New Features
Documentation
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.