[https://nvbugs/6093714][fix] Reduce batch size and add memory guard for test #13402
Conversation
abcd454 to 3c3e724
📝 Walkthrough: These changes modify test configurations for an LLM autodeploy test.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integration/defs/accuracy/test_llm_api_autodeploy.py`:
- Around lines 248-260: Increase the torch memory guard to 65000 MiB and make it
tuple-specific. Remove the broad skip on the standalone "torch" param and instead
apply pytest.param(..., marks=pytest.mark.skip_less_device_memory(65000)) to the
specific failing combination (the "torch-True-1" tuple) in the parametrization.
Update the attn_backend parametrization so the generic "torch" entry is unmarked
and only the exact "torch-True-1" tuple carries the skip_less_device_memory(65000)
mark; a sketch of this change follows below.
ℹ️ Review info
⚙️ Run configuration
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b00bc46a-e537-4581-a496-c5808996c55d
📒 Files selected for processing (2)
- tests/integration/defs/accuracy/test_llm_api_autodeploy.py
- tests/integration/test_lists/waives.txt
💤 Files with no reviewable changes (1)
- tests/integration/test_lists/waives.txt
/bot run
PR_Github #45285 [ run ] triggered by Bot. Commit:
PR_Github #45285 [ run ] completed with state
bef7c1b to 110b96a
/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1"
PR_Github #45431 [ run ] triggered by Bot. Commit:
PR_Github #45431 [ run ] completed with state
…ry. Include comment so that memory gating can be updated properly in the future.
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
…orch attn backend
Reducing torch attention's max_batch_size to 32 in the test_auto_dtype
parametrization triggered a pydantic ValidationError on construction:
The top-level `max_batch_size` (32) must be greater than or equal to
`cuda_graph_config.max_batch_size` (128).
Root cause: the test set cuda_graph_batch_sizes inside
transforms.compile_model, but the LlmArgs validator
(sync_cuda_graph_batch_sizes_to_compile_config) unconditionally overrides
that field with cuda_graph_config.batch_sizes. cuda_graph_config has
default_factory=CudaGraphConfig, which auto-generates batch_sizes from
max_batch_size=128 when no value is provided. With max_batch_size=32,
cg.max_batch_size=128 violated the validator's invariant.
Fix: declare cuda_graph_config.batch_sizes directly so the validator
accepts the configured values and derives cuda_graph_config.max_batch_size
from max(batch_sizes), matching the top-level max_batch_size.
Verified locally on H100: test_auto_dtype[torch-True-1] passes with
rouge1=25.172 (threshold 22.370).
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
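A minimal before/after sketch of the configuration change described in this commit message; the field names come from the message itself, while the concrete batch-size list and the dict layout are illustrative assumptions rather than the test's actual code:

```python
batch_sizes = [1, 2, 4, 8, 16, 32]  # illustrative; max must equal max_batch_size

# Before (rejected): batch sizes were set under transforms.compile_model, so
# sync_cuda_graph_batch_sizes_to_compile_config overrode them with the default
# CudaGraphConfig's batch_sizes (derived from max_batch_size=128), and the
# validator failed because top-level max_batch_size (32) < 128.
before = {
    "max_batch_size": 32,
    "transforms": {"compile_model": {"cuda_graph_batch_sizes": batch_sizes}},
}

# After (accepted): declare batch_sizes on cuda_graph_config directly, so the
# validator derives cuda_graph_config.max_batch_size = max(batch_sizes) = 32,
# matching the top-level max_batch_size.
after = {
    "max_batch_size": 32,
    "cuda_graph_config": {"batch_sizes": batch_sizes},
}
```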
389af1c to 7ec2660
/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1"
PR_Github #45460 [ run ] triggered by Bot. Commit:
PR_Github #45460 [ run ] completed with state
/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1"
PR_Github #45513 [ run ] triggered by Bot. Commit:
PR_Github #45513 [ run ] completed with state
/bot skip --comment "fixes unit test by reducing batch size for un-paged attention case. Other misc fixes to the test related to bug fixes that merged while the test was waived."
PR_Github #45543 [ skip ] triggered by Bot. Commit:
PR_Github #45543 [ skip ] completed with state
waive with VRAM guard
The global waive for TestLlama3_1_8B::test_auto_dtype[torch-True-1] was masking a GPU-memory bug: the torch attention backend is unpaged and materializes a dense KV tensor plus cuda-graph workspace (~47 GiB before KV allocation on Llama-3.1-8B), which OOMs on 48 GiB cards (L40S/L20/RTX 6000 Ada) during KV-cache probing.
Reduced the batch size so the test can run on L40S in post-merge. Replaced the blanket waive with an exact-tuple inline skip when get_device_memory() < 32000 MiB, so the test runs (and guards against regressions) on 48 GiB+ cards; a sketch of the guard follows below. Verified on a local H100 80 GB: the test passes (rouge1=25.099, threshold=22.370) with peak memory of 50.7 GiB. Verified on L40S that the test does not OOM.
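A hedged sketch of the inline VRAM guard described above; get_device_memory() is named in the description, but its import path, the parameter names, and the helper function shown here are assumptions for illustration only:

```python
import pytest
from defs.conftest import get_device_memory  # assumed location of the helper

def _skip_if_unpaged_torch_needs_more_vram(attn_backend, use_cuda_graph, world_size):
    # The torch attention backend is unpaged and materializes a dense KV tensor
    # plus cuda-graph workspace, so only the exact torch-True-1 tuple is gated.
    if (attn_backend, use_cuda_graph, world_size) == ("torch", True, 1) \
            and get_device_memory() < 32000:
        pytest.skip("torch attention backend needs >= 32000 MiB of device memory")
```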
Summary by CodeRabbit
Tests
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.