[https://nvbugs/6093714][fix] Reduce batch size and add memory guard for test #13402
Conversation
abcd454 to 3c3e724
📝 Walkthrough: These changes modify test configurations for an LLM autodeploy test.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integration/defs/accuracy/test_llm_api_autodeploy.py`:
- Around lines 248-260: Increase the torch memory guard to 65000 MiB and make it
tuple-specific. Remove the broad skip on the standalone "torch" param and instead
apply pytest.param(..., marks=pytest.mark.skip_less_device_memory(65000)) to the
specific failing combination (the "torch-True-1" tuple) in the parametrization.
Update the attn_backend parametrization so the generic "torch" entry is unmarked
and only the exact "torch-True-1" tuple carries the skip_less_device_memory(65000)
mark; a sketch of this change follows below.
ℹ️ Review info
⚙️ Run configuration
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b00bc46a-e537-4581-a496-c5808996c55d
📒 Files selected for processing (2)
- tests/integration/defs/accuracy/test_llm_api_autodeploy.py
- tests/integration/test_lists/waives.txt
💤 Files with no reviewable changes (1)
- tests/integration/test_lists/waives.txt
/bot run
PR_Github #45285 [ run ] triggered by Bot. Commit:
PR_Github #45285 [ run ] completed with state
bef7c1b to 110b96a
/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1"
PR_Github #45431 [ run ] triggered by Bot. Commit:
PR_Github #45431 [ run ] completed with state
…ry. Include comment so that memory gating can be updated properly in the future.
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
…orch attn backend
Reducing torch attention's max_batch_size to 32 in the test_auto_dtype
parametrization triggered a pydantic ValidationError on construction:
The top-level `max_batch_size` (32) must be greater than or equal to
`cuda_graph_config.max_batch_size` (128).
Root cause: the test set cuda_graph_batch_sizes inside
transforms.compile_model, but the LlmArgs validator
(sync_cuda_graph_batch_sizes_to_compile_config) unconditionally overrides
that field with cuda_graph_config.batch_sizes. cuda_graph_config has
default_factory=CudaGraphConfig, which auto-generates batch_sizes from
max_batch_size=128 when no value is provided. With max_batch_size=32,
cg.max_batch_size=128 violated the validator's invariant.
Fix: declare cuda_graph_config.batch_sizes directly so the validator
accepts the configured values and derives cuda_graph_config.max_batch_size
from max(batch_sizes), matching the top-level max_batch_size.
Verified locally on H100: test_auto_dtype[torch-True-1] passes with
rouge1=25.172 (threshold 22.370).
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
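A minimal before/after sketch of the configuration change described in this commit message; the field names come from the message itself, while the concrete batch-size list and the dict layout are illustrative assumptions rather than the test's actual code:

```python
batch_sizes = [1, 2, 4, 8, 16, 32]  # illustrative; max must equal max_batch_size

# Before (rejected): batch sizes were set under transforms.compile_model, so
# sync_cuda_graph_batch_sizes_to_compile_config overrode them with the default
# CudaGraphConfig's batch_sizes (derived from max_batch_size=128), and the
# validator failed because top-level max_batch_size (32) < 128.
before = {
    "max_batch_size": 32,
    "transforms": {"compile_model": {"cuda_graph_batch_sizes": batch_sizes}},
}

# After (accepted): declare batch_sizes on cuda_graph_config directly, so the
# validator derives cuda_graph_config.max_batch_size = max(batch_sizes) = 32,
# matching the top-level max_batch_size.
after = {
    "max_batch_size": 32,
    "cuda_graph_config": {"batch_sizes": batch_sizes},
}
```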
389af1c to 7ec2660
/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1"
PR_Github #45460 [ run ] triggered by Bot. Commit:
PR_Github #45460 [ run ] completed with state
/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1"
PR_Github #45513 [ run ] triggered by Bot. Commit:
PR_Github #45513 [ run ] completed with state
/bot skip --comment "fixes unit test by reducing batch size for un-paged attention case. Other misc fixes to the test related to bug fixes that merged while the test was waived."
PR_Github #45543 [ skip ] triggered by Bot. Commit:
PR_Github #45543 [ skip ] completed with state
waive with VRAM guard
The global waive for TestLlama3_1_8B::test_auto_dtype[torch-True-1] was masking a GPU-memory bug: the torch attention backend is unpaged and materializes a dense KV tensor plus cuda-graph workspace (~47 GiB before KV allocation on Llama-3.1-8B), which OOMs on 48 GiB cards (L40S/L20/RTX 6000 Ada) during KV-cache probing.
Reduced the batch size so the test can run on L40S in post-merge. Replaced the blanket waive with an exact-tuple inline skip when get_device_memory() < 32000 MiB, so the test runs (and guards against regressions) on 48 GiB+ cards; a sketch of the guard follows below. Verified on a local H100 80 GB: the test passes (rouge1=25.099, threshold=22.370) with peak memory of 50.7 GiB. Verified on L40S that the test does not OOM.
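A hedged sketch of the inline VRAM guard described above; get_device_memory() is named in the description, but its import path, the parameter names, and the helper function shown here are assumptions for illustration only:

```python
import pytest
from defs.conftest import get_device_memory  # assumed location of the helper

def _skip_if_unpaged_torch_needs_more_vram(attn_backend, use_cuda_graph, world_size):
    # The torch attention backend is unpaged and materializes a dense KV tensor
    # plus cuda-graph workspace, so only the exact torch-True-1 tuple is gated.
    if (attn_backend, use_cuda_graph, world_size) == ("torch", True, 1) \
            and get_device_memory() < 32000:
        pytest.skip("torch attention backend needs >= 32000 MiB of device memory")
```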
Summary by CodeRabbit
Tests
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.