Skip to content

Conversation

@peaceh-nv
Copy link
Collaborator

@peaceh-nv peaceh-nv commented Nov 11, 2025

Use less kv cache memory on SM120 for nemotron test

Summary by CodeRabbit

  • Tests
    • Updated internal test configuration to optimize memory cache performance testing.

No user-facing changes in this release.

… nemotron test

Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
@peaceh-nv peaceh-nv requested a review from a team as a code owner November 11, 2025 01:35
@peaceh-nv
Copy link
Collaborator Author

/bot run

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 11, 2025

📝 Walkthrough

Walkthrough

A command-line argument --kv_cache_fraction={_MEM_FRACTION_50} was added to the test_ptp_quickstart_advanced_2gpus_sm120 test invocation, setting the kv_cache_fraction parameter to 0.5 for that specific test run.

Changes

Cohort / File(s) Summary
Test parameter configuration
tests/integration/defs/test_e2e.py
Added --kv_cache_fraction={_MEM_FRACTION_50} command-line argument to the test_ptp_quickstart_advanced_2gpus_sm120 test invocation to set kv_cache_fraction to 0.5

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

  • Single parameter addition to an existing test invocation with no logic or control flow changes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description check ⚠️ Warning The description is minimal and lacks required sections from the template (Description details, Test Coverage, PR Checklist). Add detailed description explaining the issue and solution, list relevant test cases, and complete the PR checklist items.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly identifies the issue (NVBugs ticket), type (fix), and main change (reduce KV cache memory on SM120).
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0649b77 and 086146f.

📒 Files selected for processing (1)
  • tests/integration/defs/test_e2e.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • tests/integration/defs/test_e2e.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

  • tests/integration/defs/test_e2e.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • tests/integration/defs/test_e2e.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (1)
tests/integration/defs/test_e2e.py (1)

2416-2416: LGTM! Memory reduction change is appropriate for SM120 architecture.

The addition of --kv_cache_fraction={_MEM_FRACTION_50} effectively reduces KV cache memory usage to 50% for this SM120-specific test, addressing the PR objective. The change is consistent with similar memory management patterns used throughout the test suite (e.g., lines 2018, 2257, 3395).

Given that this is a temporary workaround test for 2-GPU SM120 configurations (as noted in the TODO comment at line 2394), applying a uniform 0.5 fraction across all parameterized models is a reasonable approach.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@tensorrt-cicd
Copy link
Collaborator

PR_Github #24068 [ run ] triggered by Bot. Commit: 086146f

@tensorrt-cicd
Copy link
Collaborator

PR_Github #24068 [ run ] completed with state SUCCESS. Commit: 086146f
/LLM/release-1.1/L0_MergeRequest_PR pipeline #468 completed with status: 'SUCCESS'

@peaceh-nv peaceh-nv merged commit f1d02b5 into NVIDIA:release/1.1 Nov 11, 2025
7 checks passed
mikeiovine pushed a commit to mikeiovine/TensorRT-LLM that referenced this pull request Nov 20, 2025
…IDIA#9054)

Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
mikeiovine pushed a commit to mikeiovine/TensorRT-LLM that referenced this pull request Nov 20, 2025
…IDIA#9054)

Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
mikeiovine pushed a commit to mikeiovine/TensorRT-LLM that referenced this pull request Nov 20, 2025
…IDIA#9054)

Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
mikeiovine pushed a commit to mikeiovine/TensorRT-LLM that referenced this pull request Nov 20, 2025
…IDIA#9054)

Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
mikeiovine pushed a commit to mikeiovine/TensorRT-LLM that referenced this pull request Nov 21, 2025
…IDIA#9054)

Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
mikeiovine pushed a commit to mikeiovine/TensorRT-LLM that referenced this pull request Nov 21, 2025
…IDIA#9054)

Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
mikeiovine pushed a commit to mikeiovine/TensorRT-LLM that referenced this pull request Nov 23, 2025
…IDIA#9054)

Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
mikeiovine pushed a commit to mikeiovine/TensorRT-LLM that referenced this pull request Nov 23, 2025
…IDIA#9054)

Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
mikeiovine pushed a commit to mikeiovine/TensorRT-LLM that referenced this pull request Nov 23, 2025
…IDIA#9054)

Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
mikeiovine pushed a commit to mikeiovine/TensorRT-LLM that referenced this pull request Nov 23, 2025
…IDIA#9054)

Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
mikeiovine pushed a commit to mikeiovine/TensorRT-LLM that referenced this pull request Nov 24, 2025
…IDIA#9054)

Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants