[https://nvbugs/5570575][fix] : Use less kv cache memory on SM120 #9054

peaceh-nv · 2025-11-11T01:35:55Z

Use less kv cache memory on SM120 for nemotron test

Summary by CodeRabbit

Tests
- Updated internal test configuration to optimize memory cache performance testing.

No user-facing changes in this release.

… nemotron test Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>

peaceh-nv · 2025-11-11T01:36:12Z

/bot run

coderabbitai · 2025-11-11T01:38:40Z

📝 Walkthrough

Walkthrough

A command-line argument --kv_cache_fraction={_MEM_FRACTION_50} was added to the test_ptp_quickstart_advanced_2gpus_sm120 test invocation, setting the kv_cache_fraction parameter to 0.5 for that specific test run.

Changes

Cohort / File(s)	Summary
Test parameter configuration `tests/integration/defs/test_e2e.py`	Added `--kv_cache_fraction={_MEM_FRACTION_50}` command-line argument to the test_ptp_quickstart_advanced_2gpus_sm120 test invocation to set kv_cache_fraction to 0.5

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Single parameter addition to an existing test invocation with no logic or control flow changes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description is minimal and lacks required sections from the template (Description details, Test Coverage, PR Checklist).	Add detailed description explaining the issue and solution, list relevant test cases, and complete the PR checklist items.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly identifies the issue (NVBugs ticket), type (fix), and main change (reduce KV cache memory on SM120).
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0649b77 and 086146f.

📒 Files selected for processing (1)

tests/integration/defs/test_e2e.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (3)

**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

tests/integration/defs/test_e2e.py

**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

tests/integration/defs/test_e2e.py

**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

tests/integration/defs/test_e2e.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (1)

tests/integration/defs/test_e2e.py (1)

2416-2416: LGTM! Memory reduction change is appropriate for SM120 architecture.

The addition of --kv_cache_fraction={_MEM_FRACTION_50} effectively reduces KV cache memory usage to 50% for this SM120-specific test, addressing the PR objective. The change is consistent with similar memory management patterns used throughout the test suite (e.g., lines 2018, 2257, 3395).

Given that this is a temporary workaround test for 2-GPU SM120 configurations (as noted in the TODO comment at line 2394), applying a uniform 0.5 fraction across all parameterized models is a reasonable approach.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

tensorrt-cicd · 2025-11-11T01:43:31Z

PR_Github #24068 [ run ] triggered by Bot. Commit: 086146f

tensorrt-cicd · 2025-11-11T05:02:45Z

PR_Github #24068 [ run ] completed with state SUCCESS. Commit: 086146f
/LLM/release-1.1/L0_MergeRequest_PR pipeline #468 completed with status: 'SUCCESS'

…IDIA#9054) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>

…IDIA#9054) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>

…IDIA#9054) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Mike Iovine <miovine@nvidia.com>

[https://nvbugs/5570575][fix] : Use less kv cache memory on SM120 for…

086146f

… nemotron test Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>

peaceh-nv requested a review from a team as a code owner November 11, 2025 01:35

litaotju approved these changes Nov 11, 2025

View reviewed changes

peaceh-nv merged commit f1d02b5 into NVIDIA:release/1.1 Nov 11, 2025
7 checks passed

mikeiovine pushed a commit to mikeiovine/TensorRT-LLM that referenced this pull request Nov 20, 2025

[https://nvbugs/5570575][fix] : Use less kv cache memory on SM120 (NV…

2b1e833

…IDIA#9054) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[https://nvbugs/5570575][fix] : Use less kv cache memory on SM120 #9054

[https://nvbugs/5570575][fix] : Use less kv cache memory on SM120 #9054

Uh oh!

peaceh-nv commented Nov 11, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

peaceh-nv commented Nov 11, 2025

Uh oh!

coderabbitai bot commented Nov 11, 2025

Walkthrough

Changes

Estimated code review effort

Uh oh!

tensorrt-cicd commented Nov 11, 2025

Uh oh!

tensorrt-cicd commented Nov 11, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[https://nvbugs/5570575][fix] : Use less kv cache memory on SM120 #9054

[https://nvbugs/5570575][fix] : Use less kv cache memory on SM120 #9054

Uh oh!

Conversation

peaceh-nv commented Nov 11, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

peaceh-nv commented Nov 11, 2025

Uh oh!

coderabbitai bot commented Nov 11, 2025

Walkthrough

Changes

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

tensorrt-cicd commented Nov 11, 2025

Uh oh!

tensorrt-cicd commented Nov 11, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

peaceh-nv commented Nov 11, 2025 •

edited by coderabbitai bot

Loading