[https://nvbugs/6143811][fix] AutoDeploy gate quantization tests #13846
📝 Walkthrough
This PR updates the AutoDeploy accuracy test module with consolidated imports from a shared conftest, refactored test parameterization using explicit skip markers for precision variants (FP8/NVFP4), and hardware-gated IR test classes. The waive list is adjusted accordingly.
Changes
AutoDeploy Accuracy Test Module Updates
🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/integration/defs/accuracy/test_llm_api_autodeploy.py (1)
559-566: Hardware gating changes look coherent; QA list updates should be unnecessary if nodeids remain unchanged.
These changes gate unsupported architectures via marks/decorators without renaming test classes/functions or explicit param IDs, so scheduled list references should remain stable. A quick sanity check against tests/integration/test_lists/qa/llm_function_core.txt and test-db nodeids is enough.
As per coding guidelines: “If a PR changes hardware gating/skip behavior for specific accuracy tests, update the corresponding node IDs here (or ensure they already match)… keep the exact node-id string consistent.”
Also applies to: 628-635, 1067-1067, 1217-1217, 1321-1321, 1374-1374
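To illustrate the nitpick above, here is a minimal sketch of a parametrization that gates `fp8` and `nvfp4` with skip marks while pinning explicit ids. The `get_sm_version` helper and both marker definitions are hypothetical stand-ins for whatever the shared conftest provides, and the SM thresholds (90 and 100) are assumptions for illustration only.

```python
import pytest


def get_sm_version() -> int:
    """Hypothetical stand-in for the repository's SM query.

    Returns a fixed value so the sketch runs without a GPU.
    """
    return 90


# Stand-in markers; the real ones are assumed to come from the shared conftest.
skip_pre_hopper = pytest.mark.skipif(
    get_sm_version() < 90, reason="fp8 case is gated to SM 90 or newer"
)
skip_pre_blackwell = pytest.mark.skipif(
    get_sm_version() < 100, reason="nvfp4 case is gated to SM 100 or newer"
)

# Explicit ids pin the node-id suffix, so a scheduled list entry such as
# "...::test_accuracy_case[fp8]" keeps matching even if the values change later.
PRECISION_PARAMS = [
    pytest.param("bf16", id="bf16"),
    pytest.param("fp8", marks=skip_pre_hopper, id="fp8"),
    pytest.param("nvfp4", marks=skip_pre_blackwell, id="nvfp4"),
]


@pytest.mark.parametrize("model_id", PRECISION_PARAMS)
def test_accuracy_case(model_id: str) -> None:
    # Placeholder body; the real test would run the accuracy evaluation.
    assert model_id in {"bf16", "fp8", "nvfp4"}
```

For plain string values pytest already derives the id from the value, so adding a mark alone does not change the node id; the explicit `id=` only guards against later edits to the values.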
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/defs/accuracy/test_llm_api_autodeploy.py` around lines 559 - 566, The parametrize change for "model_id" (values "bf16", pytest.param("fp8", marks=skip_pre_hopper), pytest.param("nvfp4", marks=skip_pre_blackwell)) alters skip behavior but may affect scheduled QA node-ids; verify that the test node-id strings for these parametrized cases still match the QA list (llm_function_core.txt) and the test-db nodeids, and if they differ either add explicit ids to the pytest.param entries or update the QA list/test-db nodeids to the new exact node-id strings so scheduled references remain stable.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/integration/defs/accuracy/test_llm_api_autodeploy.py`:
- Around line 572-577: The check that computes free_memory_mib only queries GPU
0 and may miss a lower-memory device; replace the single-device call by
computing the minimum free memory across all CUDA devices (e.g., iterate
torch.cuda.device_count() and call torch.cuda.mem_get_info(i)[0] for each) and
use that min value for the existing conditional that compares to 80000 and the
model_id "bf16"; ensure the variable free_memory_mib name stays the same and
keep the downstream call to low_memory_overrides(kwargs) and the model_id check
unchanged.
---
Nitpick comments:
In `@tests/integration/defs/accuracy/test_llm_api_autodeploy.py`:
- Around line 559-566: The parametrize change for "model_id" (values "bf16",
pytest.param("fp8", marks=skip_pre_hopper), pytest.param("nvfp4",
marks=skip_pre_blackwell)) alters skip behavior but may affect scheduled QA
node-ids; verify that the test node-id strings for these parametrized cases
still match the QA list (llm_function_core.txt) and the test-db nodeids, and if
they differ either add explicit ids to the pytest.param entries or update the QA
list/test-db nodeids to the new exact node-id strings so scheduled references
remain stable.
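The inline comment about querying only GPU 0 could be addressed along the following lines. This is a sketch under assumptions, not the code in the PR: `min_free_memory_mib` is a hypothetical helper name, and the memory query is passed in as a callable so the example runs without a GPU.

```python
from typing import Callable, Iterable, Tuple

MIB = 1024 * 1024


def min_free_memory_mib(
    device_ids: Iterable[int],
    mem_get_info: Callable[[int], Tuple[int, int]],
) -> float:
    """Return the smallest free memory, in MiB, across the given devices.

    `mem_get_info(i)` is expected to return (free_bytes, total_bytes) for
    device i, which is the shape torch.cuda.mem_get_info returns.
    """
    free_bytes = [mem_get_info(i)[0] for i in device_ids]
    if not free_bytes:
        raise ValueError("no devices to query")
    return min(free_bytes) / MIB


# Fake per-device readings (free_bytes, total_bytes) used only for the demo.
fake_readings = {
    0: (90_000 * MIB, 96_000 * MIB),
    1: (40_000 * MIB, 96_000 * MIB),
}
free_memory_mib = min_free_memory_mib(fake_readings, lambda i: fake_readings[i])
```

With PyTorch available, the same helper could be called as `min_free_memory_mib(range(torch.cuda.device_count()), torch.cuda.mem_get_info)`, and the resulting `free_memory_mib` would feed the existing comparison against 80000 unchanged.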
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 1c032a1b-add3-48fa-903b-dff336d6ecbc
📒 Files selected for processing (2)
- tests/integration/defs/accuracy/test_llm_api_autodeploy.py
- tests/integration/test_lists/waives.txt
💤 Files with no reviewable changes (1)
- tests/integration/test_lists/waives.txt
/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-Post-Merge-1,DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1"
PR_Github #47177 [ run ] triggered by Bot. Commit:
PR_Github #47177 [ run ] completed with state
MrGeva
left a comment
see my comment. other than that LGTM
/bot run --stage-list "DGX_H100-4_GPUs-AutoDeploy-1"
PR_Github #47204 [ run ] triggered by Bot. Commit:
PR_Github #47204 [ run ] completed with state
/bot run
PR_Github #47223 [ run ] triggered by Bot. Commit:
PR_Github #47223 [ run ] completed with state
/bot run
PR_Github #47264 [ run ] triggered by Bot. Commit:
PR_Github #47264 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #47319 [ run ] triggered by Bot. Commit:
PR_Github #47319 [ run ] completed with state
/bot help
GitHub Bot Help
Provide a user friendly way for developers to interact with a Jenkins server. Run See details below for each supported subcommand. Details
Launch build/test pipelines. All previously running jobs will be killed.
kill
Kill all running builds associated with pull request.
skip
Skip testing for latest commit on pull request.
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
/bot run skip --comment "pipeline passed, commit SHA updated due to waives.txt conflict"
/bot skip --comment "pipeline passed, commit SHA updated due to waives.txt conflict"
PR_Github #47655 [ run ] triggered by Bot. Commit:
PR_Github #47655 [ run ] completed with state
/bot run
/bot kill
/bot run --stage-list "DGX_H100-4_GPUs-AutoDeploy-1, DGX_B200-4_GPUs-AutoDeploy-1"
PR_Github #47683 [ run ] triggered by Bot. Commit:
PR_Github #47683 [ run ] completed with state
/bot run --extra-stage "DGX_H100-4_GPUs-AutoDeploy-1, DGX_B200-4_GPUs-AutoDeploy-1"
PR_Github #47760 [ run ] triggered by Bot. Commit:
PR_Github #47760 [ run ] completed with state
/bot run --extra-stage "DGX_H100-4_GPUs-AutoDeploy-1, DGX_B200-4_GPUs-AutoDeploy-1"
/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-Post-Merge-1,DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1"
PR_Github #48110 [ run ] triggered by Bot. Commit:
PR_Github #48110 [ run ] completed with state
Gate AutoDeploy FP8 and NVFP4 accuracy cases by supported GPU architecture. Keep lower-memory Nano V3 runs on reduced settings and remove the obsolete waive.
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
67fd647 to 6634c47
/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-Post-Merge-1,DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1"
PR_Github #48172 [ run ] triggered by Bot. Commit:
PR_Github #48172 [ run ] completed with state
/bot run
PR_Github #48189 [ run ] triggered by Bot. Commit:
PR_Github #48189 [ run ] completed with state
/bot run
PR_Github #48220 [ run ] triggered by Bot. Commit:
PR_Github #48220 [ run ] completed with state
Summary by CodeRabbit
Tests
Chores
Description
Gate AutoDeploy FP8 and NVFP4 accuracy cases by supported GPU architecture.
Replace the Nano V3 check for SM<90, which was a proxy for catching low-memory architectures, with a direct GPU memory check.
Unwaive the affected test.
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.