[None][test] local wheel installation support and add gb300 cases demo#11742
Conversation
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
📝 WalkthroughWalkthroughThese changes introduce an install mode selection mechanism for performance test scripts that supports both source and wheel-based installation, alongside new performance benchmarking configuration files for Deepseek models on GB300 hardware. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/integration/defs/perf/disagg/test_configs/disagg/perf-sanity/gb300_deepseek-r1-fp4_1k1k_con1024_ctx1_dep4_gen1_dep32_eplb0_mtp3_ccb-UCX.yaml (1)
7-10: Use a single source of truth forscript_file.Line 7 and Line 10 duplicate the same launch script value. Keeping this in two places increases drift risk if one is edited later.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/perf/disagg/test_configs/disagg/perf-sanity/gb300_deepseek-r1-fp4_1k1k_con1024_ctx1_dep4_gen1_dep32_eplb0_mtp3_ccb-UCX.yaml` around lines 7 - 10, The file defines script_file in two places (top-level key "script_file" and "slurm.script_file"); remove the duplicate and keep a single source of truth (preferably under "slurm.script_file") so updates won’t drift—delete the top-level "script_file" entry (or conversely remove "slurm.script_file" if your codebase expects the top-level) and ensure any code that reads this config (look for code that references "script_file" or "slurm.script_file") uses the retained location.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@jenkins/scripts/perf/local/slurm_install.sh`:
- Around line 53-57: The WHEEL_FILE selection is nondeterministic because it
uses find ... | head -1; change the assignment of WHEEL_FILE (the variable set
where find "$llmSrcNode/build" -name "tensorrt_llm-*.whl" ... | head -1) to
deterministically pick the intended wheel (for example sort by version or
modification time) before piping to head: e.g., use find to list all matching
wheels and then select the newest (ls -t or sort -V) or explicitly filter for
the desired version, and keep the subsequent retry_command pip install
"$WHEEL_FILE" call unchanged so the install uses the deterministic selection.
In
`@tests/scripts/perf-sanity/gb300_deepseek_r1_fp4_v2_2_nodes_grace_blackwell.yaml`:
- Around line 96-99: The kv_cache_config block is missing the explicit
enable_block_reuse key; add kv_cache_config.enable_block_reuse with the same
boolean value used in the other two configs (so it matches their behavior) to
avoid relying on defaults and ensure reproducible perf comparisons; update the
kv_cache_config section containing dtype and free_gpu_memory_fraction to include
enable_block_reuse.
---
Nitpick comments:
In
`@tests/integration/defs/perf/disagg/test_configs/disagg/perf-sanity/gb300_deepseek-r1-fp4_1k1k_con1024_ctx1_dep4_gen1_dep32_eplb0_mtp3_ccb-UCX.yaml`:
- Around line 7-10: The file defines script_file in two places (top-level key
"script_file" and "slurm.script_file"); remove the duplicate and keep a single
source of truth (preferably under "slurm.script_file") so updates won’t
drift—delete the top-level "script_file" entry (or conversely remove
"slurm.script_file" if your codebase expects the top-level) and ensure any code
that reads this config (look for code that references "script_file" or
"slurm.script_file") uses the retained location.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
jenkins/scripts/perf/local/README.mdjenkins/scripts/perf/local/slurm_install.shjenkins/scripts/perf/local/submit.pytests/integration/defs/perf/disagg/test_configs/disagg/perf-sanity/gb300_deepseek-r1-fp4_1k1k_con1024_ctx1_dep4_gen1_dep32_eplb0_mtp3_ccb-UCX.yamltests/scripts/perf-sanity/gb300_deepseek_r1_fp4_v2_2_nodes_grace_blackwell.yaml
tests/scripts/perf-sanity/gb300_deepseek_r1_fp4_v2_2_nodes_grace_blackwell.yaml
Show resolved
Hide resolved
|
/bot skip --comment "skip test as just modify local mode installation, no affect sanity check" |
|
PR_Github #36896 [ skip ] triggered by Bot. Commit: |
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
|
PR_Github #36896 [ skip ] completed with state |
|
/bot skip --comment "skip test as just modify local mode installation, no affect sanity check" |
|
PR_Github #36906 [ skip ] triggered by Bot. Commit: |
|
PR_Github #36906 [ skip ] completed with state |
|
/bot reuse-pipeline |
|
PR_Github #36994 [ reuse-pipeline ] triggered by Bot. Commit: |
|
PR_Github #36994 [ reuse-pipeline ] completed with state |
|
/bot run --skip-test |
|
PR_Github #37003 [ run ] triggered by Bot. Commit: |
|
PR_Github #37003 [ run ] completed with state |
|
/bot reuse-pipeline |
|
PR_Github #37060 [ reuse-pipeline ] triggered by Bot. Commit: |
|
PR_Github #37060 [ reuse-pipeline ] completed with state |
NVIDIA#11742) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com> Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Summary by CodeRabbit
New Features
--install-modeoption to support source (default) or wheel installation methods during setupChores