
[https://nvbugs/6093714][fix] Reduce batch size and add memory guard for test #13402

Merged
govind-ramnarayan merged 3 commits into NVIDIA:main from nv-auto-deploy:gramnarayan/nvbug-6093714
Apr 26, 2026

Conversation

@govind-ramnarayan
Collaborator

@govind-ramnarayan govind-ramnarayan commented Apr 24, 2026

Replace waive with VRAM guard

The global waive for TestLlama3_1_8B::test_auto_dtype[torch-True-1] was masking a GPU-memory bug: the torch attention backend is unpaged and materializes a dense KV tensor plus a CUDA-graph workspace (~47 GiB before KV allocation on Llama-3.1-8B), which OOMs on 48 GiB cards (L40S/L20/RTX 6000 Ada) during KV-cache probing.

Reduced the batch size so the test can run on L40S in Post-Merge. Replaced the blanket waive with an exact-tuple inline skip on get_device_memory() < 32000 MiB, so the test runs (and guards against regressions) on 48 GiB+ cards. Verified on a local H100 80 GB: the test passes (rouge1=25.099, threshold=22.370) with 50.7 GiB peak memory. Verified on L40S that the test does not OOM.
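
A minimal sketch of the guard described above, assuming the repo's get_device_memory() helper (import path assumed) and hypothetical parametrize values; this is not the verbatim diff:

    import pytest

    # Assumed import location for the helper named above; the real path may differ.
    from defs.conftest import get_device_memory

    # Hypothetical parametrization; the guarded entry mirrors the failing tuple torch-True-1.
    params = []
    for backend, use_cuda_graph, tp_size in [("flashinfer", True, 1), ("torch", True, 1)]:
        marks = ()
        if backend == "torch" and get_device_memory() < 32000:
            # Unpaged torch attention must fit a dense KV tensor; skip on cards
            # below ~32 GiB instead of waiving the test on every machine.
            marks = (pytest.mark.skip(reason="torch backend needs >= 32000 MiB VRAM"),)
        params.append(pytest.param(backend, use_cuda_graph, tp_size, marks=marks))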

Summary by CodeRabbit

Tests

  • Updated and optimized test configurations for model auto-deployment accuracy validation with improved batch sizing logic and enhanced hardware compatibility handling.
  • Re-enabled a previously waived test case for automatic data type selection to expand validation coverage and ensure more thorough testing across varied hardware configurations.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@coderabbitai
Contributor

coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough

Walkthrough

These changes modify test configurations for an LLM autodeploy test. The torch backend's maximum batch size is reduced from 128 to 32, CUDA graph batch sizes are now dynamically computed based on the backend's limit, and test parametrization conditionally skips the torch backend on machines with insufficient VRAM. A corresponding test waiver is removed.

Changes

  • Test Configuration Updates (tests/integration/defs/accuracy/test_llm_api_autodeploy.py): Reduced the torch backend's max_batch_size from 128 to 32; derived cuda_graph_batch_sizes dynamically from candidate sizes filtered by the backend's maximum batch size; updated the test_auto_dtype parametrization to skip the torch backend on machines with <32 GB device memory.
  • Test Waiver Removal (tests/integration/test_lists/waives.txt): Removed the waiver line for Llama3.1 8B auto-dtype torch-True-1, enabling the previously skipped test to run.
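
A minimal sketch of the dynamic derivation described above (candidate list and variable names are assumptions, not the repo's exact code):

    TORCH_MAX_BATCH_SIZE = 32  # reduced from 128 for the unpaged torch backend
    CANDIDATE_BATCH_SIZES = [1, 2, 4, 8, 16, 32, 64, 128]

    # Keep only CUDA-graph capture sizes the backend can actually serve, so a
    # lower max_batch_size automatically prunes the larger graph sizes.
    cuda_graph_batch_sizes = [
        b for b in CANDIDATE_BATCH_SIZES if b <= TORCH_MAX_BATCH_SIZE
    ]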

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check: ❓ Inconclusive. The PR description addresses the issue and solution but does not follow the required template structure with distinct Description, Test Coverage, and PR Checklist sections. Resolution: structure the description using those template sections, with a clear 'Description' explaining the issue and solution and a 'Test Coverage' section listing relevant tests.
✅ Passed checks (3 passed)
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Title check: ✅ Passed. The title accurately reflects the main changes: reducing batch size for the torch backend and adding a memory guard (skip) for the test_auto_dtype test.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/defs/accuracy/test_llm_api_autodeploy.py`:
- Around line 248-260: Increase the torch memory guard to 65000 MiB and make it
tuple-specific by removing the broad skip on the standalone "torch" param and
instead apply pytest.param(...,
marks=pytest.mark.skip_less_device_memory(65000)) to the specific failing combo
(the "torch-True-1" tuple) in the parametrization; update the attn_backend
parametrization so the generic "torch" entry is unmarked and the exact tuple
"torch-True-1" uses pytest.param with skip_less_device_memory(65000).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b00bc46a-e537-4581-a496-c5808996c55d

📥 Commits

Reviewing files that changed from the base of the PR and between 0a27cf9 and abcd454.

📒 Files selected for processing (2)
  • tests/integration/defs/accuracy/test_llm_api_autodeploy.py
  • tests/integration/test_lists/waives.txt
💤 Files with no reviewable changes (1)
  • tests/integration/test_lists/waives.txt

Comment thread tests/integration/defs/accuracy/test_llm_api_autodeploy.py
@govind-ramnarayan
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #45285 [ run ] triggered by Bot. Commit: bef7c1b Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45285 [ run ] completed with state SUCCESS. Commit: bef7c1b
/LLM/main/L0_MergeRequest_PR pipeline #35540 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@govind-ramnarayan govind-ramnarayan changed the title from "[https://nvbugs/6093714][fix] Replace test_auto_dtype[torch-True-1]" to "[https://nvbugs/6093714][fix] Reduce batch size and add memory guard for test" on Apr 24, 2026
Comment thread tests/integration/defs/accuracy/test_llm_api_autodeploy.py
@govind-ramnarayan govind-ramnarayan force-pushed the gramnarayan/nvbug-6093714 branch from bef7c1b to 110b96a on April 24, 2026 18:08
@govind-ramnarayan
Collaborator Author

/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1"

@tensorrt-cicd
Collaborator

PR_Github #45431 [ run ] triggered by Bot. Commit: 110b96a Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45431 [ run ] completed with state SUCCESS. Commit: 110b96a
/LLM/main/L0_MergeRequest_PR pipeline #35664 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

…ry. Include comment so that memory gating can be updated properly in the future

Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
…orch attn backend

Reducing torch attention's max_batch_size to 32 in the test_auto_dtype
parametrization triggered a pydantic ValidationError on construction:

    The top-level `max_batch_size` (32) must be greater than or equal to
    `cuda_graph_config.max_batch_size` (128).

Root cause: the test set cuda_graph_batch_sizes inside
transforms.compile_model, but the LlmArgs validator
(sync_cuda_graph_batch_sizes_to_compile_config) unconditionally overrides
that field with cuda_graph_config.batch_sizes. cuda_graph_config has
default_factory=CudaGraphConfig, which auto-generates batch_sizes from
max_batch_size=128 when no value is provided. With max_batch_size=32,
cg.max_batch_size=128 violated the validator's invariant.

Fix: declare cuda_graph_config.batch_sizes directly so the validator
accepts the configured values and derives cuda_graph_config.max_batch_size
from max(batch_sizes), matching the top-level max_batch_size.
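
A minimal sketch of the fixed construction, assuming the LlmArgs/CudaGraphConfig names from this message (import paths and exact signatures may differ from the repo):

    # Hypothetical sketch; CudaGraphConfig is the name used in this commit
    # message, not necessarily the exact public import path.
    from tensorrt_llm.llmapi import CudaGraphConfig

    llm_kwargs = dict(
        max_batch_size=32,
        cuda_graph_config=CudaGraphConfig(
            # Declaring batch_sizes directly lets the validator derive
            # cuda_graph_config.max_batch_size = max(batch_sizes) = 32,
            # which matches the top-level max_batch_size.
            batch_sizes=[1, 2, 4, 8, 16, 32],
        ),
    )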

Verified locally on H100: test_auto_dtype[torch-True-1] passes with
rouge1=25.172 (threshold 22.370).

Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
@govind-ramnarayan govind-ramnarayan force-pushed the gramnarayan/nvbug-6093714 branch from 389af1c to 7ec2660 on April 24, 2026 23:38
@govind-ramnarayan
Collaborator Author

/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1"

@govind-ramnarayan govind-ramnarayan enabled auto-merge (squash) April 24, 2026 23:41
@tensorrt-cicd
Collaborator

PR_Github #45460 [ run ] triggered by Bot. Commit: 7ec2660 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45460 [ run ] completed with state SUCCESS. Commit: 7ec2660
/LLM/main/L0_MergeRequest_PR pipeline #35692 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@govind-ramnarayan
Collaborator Author

/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1"

@tensorrt-cicd
Collaborator

PR_Github #45513 [ run ] triggered by Bot. Commit: 933f477 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45513 [ run ] completed with state SUCCESS. Commit: 933f477
/LLM/main/L0_MergeRequest_PR pipeline #35737 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

@govind-ramnarayan
Collaborator Author

/bot skip --comment "fixes unit test by reducing batch size for un-paged attention case. Other misc fixes to the test related to bug fixes that merged while the test was waived."

@tensorrt-cicd
Collaborator

PR_Github #45543 [ skip ] triggered by Bot. Commit: 933f477 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #45543 [ skip ] completed with state SUCCESS. Commit: 933f477
Skipping testing for commit 933f477

Link to invocation

@govind-ramnarayan govind-ramnarayan merged commit eeba2eb into NVIDIA:main Apr 26, 2026
5 checks passed