[None][chore] Remove onboard block switch for KV cache manager #12449

Merged: eopXD merged 1 commit into NVIDIA:main from eopXD:remove-onboard-blocks-switch on Apr 16, 2026

@eopXD (Collaborator) commented Mar 23, 2026

Description

This PR intends no functional change.

Dead code elimination. The secondary block pool is derived when kv_cache_config::host_cache_size is specified. Whether a KV cache block is onboarded or offloaded can be inferred from whether the manager has a secondary block pool. The onboardBlocks toggle only adds complication, so this commit removes it.
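
The inference described above can be sketched with a small Python model. This is only an illustration: `ToyBlockManager`, `bytes_per_block`, and `should_offload` are hypothetical names, not TensorRT-LLM APIs, and the sketch assumes the secondary pool is sized purely from `host_cache_size`.

```python
from dataclasses import dataclass


@dataclass
class ToyBlockManager:
    """Hypothetical stand-in for a block manager; not the real TensorRT-LLM class."""

    host_cache_size: int  # bytes reserved for the secondary (host) pool
    bytes_per_block: int  # hypothetical fixed block size

    @property
    def num_secondary_blocks(self) -> int:
        # Assumption: the secondary pool is sized purely from host_cache_size.
        return self.host_cache_size // self.bytes_per_block

    def should_offload(self) -> bool:
        # No separate onboard/offload switch: the pool's existence decides.
        return self.num_secondary_blocks > 0


without_host_cache = ToyBlockManager(host_cache_size=0, bytes_per_block=4096)
with_host_cache = ToyBlockManager(host_cache_size=8192, bytes_per_block=4096)
```

In this model a separate boolean would be redundant, since its only meaningful value already follows from the pool size.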

Follows up on #7469, rebased onto current main.

Changes

  • Removed onboardBlocks / onboard_blocks parameter from KvCacheConfig, BlockManager, WindowBlockManager, and KVCacheManager constructors
  • Removed mOnboardBlocks member variable and associated getter/setter
  • Simplified offloading/onboarding conditions to check secondary pool existence directly
  • Updated serialization, Python/nanobind bindings, benchmarks, triton backend, and all tests
  • 31 files changed, 129 insertions, 233 deletions

Test Coverage

Since no functional change is intended, existing tests are updated to remove the parameter. No new test logic is needed.

PR Checklist

  • PR description clearly explains what and why.
  • PR Follows TRT-LLM CODING GUIDELINES
  • Test cases updated for changed code paths
  • No new dependencies
  • Documentation not affected (internal API only)
  • The reviewers assigned automatically/manually are appropriate for the PR.

Summary by CodeRabbit

Release Notes

  • Chores
    • Removed the onboard_blocks configuration parameter from KV cache settings across all benchmarks, bindings, and configuration interfaces. The system now automatically handles KV cache block onboarding behavior without requiring explicit configuration. This simplifies the KV cache setup while maintaining core functionality.

@eopXD eopXD requested review from a team as code owners March 23, 2026 09:51
@eopXD eopXD force-pushed the remove-onboard-blocks-switch branch from 255d4e6 to bda8824 Compare March 23, 2026 09:54
@coderabbitai bot (Contributor) commented Mar 23, 2026

📝 Walkthrough

Systematic removal of onboardBlocks parameter from KV cache configuration throughout the codebase. Constructor signatures, public accessors/mutators, and all call sites are updated to eliminate this boolean configuration flag.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| KV Cache Manager Headers | cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h | Removed bool onboardBlocks parameter from WindowBlockManager, BlockManager, and all KVCacheManager constructor overloads. Updated parameter ordering with CacheType and retained defaults. |
| Executor KV Cache Config Headers | cpp/include/tensorrt_llm/executor/executor.h | Removed onboardBlocks parameter from KvCacheConfig constructor and deleted public getOnboardBlocks() and setOnboardBlocks() methods. |
| KV Cache Manager Implementation | cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp | Updated constructors to remove onboardBlocks parameter forwarding. Removed mOnboardBlocks member checks from getFreeBlock(), onboardBlock(), and offloadBlock() methods. Removed onboarding status from logging. |
| Executor Configuration Implementation | cpp/tensorrt_llm/executor/kvCacheConfig.cpp | Removed onboardBlocks parameter from constructor and deleted corresponding getter/setter method implementations. |
| Serialization | cpp/tensorrt_llm/executor/serialization.cpp | Updated serialization and deserialization to stop reading/writing onboardBlocks field. Adjusted serializedSize calculation. |
| Model Initialization | cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp | Removed validation requiring onboardBlocks true for non-paged context FMHA. Updated KVCacheManager constructor call to omit onboardBlocks argument. |
| Python Bindings - Batch Manager | cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp | Removed bool parameter from nanobind init<...> template and removed nb::arg("onboard_blocks") = true declaration. Shifted remaining parameter bindings. |
| Python Bindings - Executor | cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp | Removed onboard_blocks parameter from constructor binding. Removed Python property onboard_blocks. Updated __setstate__ to validate tuple length 14 instead of 15. Adjusted state indexing. |
| Python API | tensorrt_llm/llmapi/llm_args.py, tensorrt_llm/_torch/pyexecutor/resource_manager.py | Removed onboard_blocks field from KvCacheConfig dataclass and stopped passing it to C++ backend implementations. |
| C++ Test Files | cpp/tests/unit_tests/batch_manager/cacheTransBufferTest.cpp, cpp/tests/unit_tests/batch_manager/capacitySchedulerTest.cpp, cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp, cpp/tests/unit_tests/batch_manager/kvCacheUtilsTest.cpp | Removed onboardBlocks variable declarations and updated BlockManager/KVCacheManager constructor calls to omit the parameter. |
| Additional C++ Tests | cpp/tests/unit_tests/executor/agentCommTest.cpp, cpp/tests/unit_tests/executor/serializeUtilsTest.cpp, cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp | Removed onboardBlocks variable declarations and constructor arguments. Removed equality assertions involving getOnboardBlocks(). |
| Benchmark Code | benchmarks/cpp/disaggServerBenchmark.cpp, benchmarks/cpp/gptManagerBenchmark.cpp, benchmarks/cpp/utils/utils.h | Removed kv_onboard_blocks CLI option parsing and kvOnboardBlocks field from BenchmarkParams. Updated KvCacheConfig construction calls. |
| Python Test Files | tests/unittest/llmapi/test_llm_args.py, tests/unittest/llmapi/test_llm_kv_cache_events.py, tests/unittest/_torch/executor/test_resource_manager.py, tests/unittest/bindings/test_bindings_ut.py, tests/unittest/bindings/test_executor_bindings.py, tests/unittest/disaggregated/test_extractor.py, tests/unittest/disaggregated/test_kv_transfer.py | Removed onboard_blocks parameter from KvCacheConfig construction and removed assertions validating onboarding behavior. |
| Triton Backend | triton_backend/all_models/inflight_batcher_llm/tensorrt_llm/1/model.py, triton_backend/all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt, triton_backend/all_models/tests/test_python_backend.py, triton_backend/inflight_batcher_llm/src/model_instance_state.cc | Removed kv_cache_onboard_blocks parameter reading from configuration. Removed protobuf parameter definition. Updated KvCacheConfig instantiation to pass std::nullopt for onboarding position. Removed test assertions. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 13.75% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main objective: removing the onboard block switch from the KV cache manager. It is concise, specific, and directly reflects the primary change. |
| Description check | ✅ Passed | The PR description adequately covers the purpose (dead code elimination), changes made, test coverage approach, and includes a completed checklist. All required sections are addressed. |


@eopXD eopXD force-pushed the remove-onboard-blocks-switch branch from bda8824 to ddb19ab Compare March 23, 2026 09:59
@eopXD (Collaborator, Author) commented Mar 23, 2026

/bot run

@tensorrt-cicd (Collaborator): PR_Github #39920 [ run ] triggered by Bot. Commit: ddb19ab

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (17)
cpp/tests/unit_tests/batch_manager/capacitySchedulerTest.cpp (1)

2-2: ⚠️ Potential issue | 🟡 Minor

Update the SPDX copyright year in the modified test files.

This file is changed in this PR, but the header still ends at 2025. The same update is needed in cpp/tests/unit_tests/batch_manager/cacheTransBufferTest.cpp and cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp.

As per coding guidelines, "Add NVIDIA copyright header on ALL new files, and update year on modified files."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/unit_tests/batch_manager/capacitySchedulerTest.cpp` at line 2,
Update the SPDX copyright header year in the modified test files from 2025 to
2026: edit the top-of-file header in capacitySchedulerTest.cpp and also make the
same change in cacheTransBufferTest.cpp and cacheTransceiverTest.cpp so the
SPDX-FileCopyrightText year range ends with 2026.
cpp/tensorrt_llm/executor/serialization.cpp (2)

1255-1273: ⚠️ Potential issue | 🟠 Major

Version or preserve the removed onboardBlocks field in the wire format.

This change shortens the serialized KvCacheConfig payload, but the reader still has no format discriminator. Any buffer produced by an older build will now be misaligned from hostCacheSize onward, which can break persisted configs/pickles or mixed-version components. Please either keep a reserved bool in the wire format for compatibility or gate the new layout behind an explicit serialization version.
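
The misalignment concern can be illustrated with a reduced Python model built on the standard `struct` module. The two-field layouts and the leading version byte below are assumptions made for illustration; the real KvCacheConfig wire format has more fields and may encode them differently.

```python
import struct

# Hypothetical two-field layouts; the real KvCacheConfig has more fields.
OLD_LAYOUT = "<?q"  # onboardBlocks (bool), hostCacheSize (int64)
NEW_LAYOUT = "<q"   # hostCacheSize (int64) only


def read_new(buf: bytes) -> int:
    """Read hostCacheSize using the new layout, ignoring trailing bytes."""
    (host_cache_size,) = struct.unpack_from(NEW_LAYOUT, buf, 0)
    return host_cache_size


def read_versioned(buf: bytes) -> int:
    """Read a buffer whose first byte is an explicit format version."""
    version = buf[0]
    if version == 1:  # legacy layout: read and discard the reserved bool
        _, host_cache_size = struct.unpack_from(OLD_LAYOUT, buf, 1)
    elif version == 2:
        (host_cache_size,) = struct.unpack_from(NEW_LAYOUT, buf, 1)
    else:
        raise ValueError(f"unknown KvCacheConfig format version {version}")
    return host_cache_size


old_buf = struct.pack(OLD_LAYOUT, True, 1024)
misread = read_new(old_buf)  # the bool byte is consumed as part of the int64
legacy_ok = read_versioned(bytes([1]) + old_buf)
current_ok = read_versioned(bytes([2]) + struct.pack(NEW_LAYOUT, 1024))
```

Without a discriminator, `read_new` returns a value other than 1024 for the old buffer; with the version byte, both layouts decode to 1024.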

Also applies to: 1276-1310

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/executor/serialization.cpp` around lines 1255 - 1273,
deserializeKvCacheConfig currently removes the onboardBlocks field from the wire
format causing older serialized buffers to be parsed incorrectly; update
deserializeKvCacheConfig (and the corresponding serialization) to preserve
compatibility by either: 1) reading a reserved bool placeholder (e.g., read a
dummy "onboardBlocks" bool value) before hostCacheSize so the field order
matches older binaries, or 2) introduce and read an explicit serialization
version discriminator at the start of the KvCacheConfig wire format and branch
deserialization accordingly; ensure the KvCacheConfig constructor ordering and
the sequence of su::deserialize calls (including onboardBlocks or a version
check) match the writer so older persisted configs remain correctly parsed.

1-16: ⚠️ Potential issue | 🟡 Minor

Update the SPDX copyright year to include 2026.

This file is modified in this PR, so the header should reflect the latest meaningful modification year.

As per coding guidelines, "Add NVIDIA copyright header on ALL new files, and update year on modified files" and "All TensorRT-LLM source files should contain an NVIDIA copyright header with the year of the latest meaningful modification."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/executor/serialization.cpp` around lines 1 - 16, Update the
SPDX header year range in the file's top comment block from "2025" to include
2026 (e.g., change "Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES" to
"Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES" or similar) so the
header in serialization.cpp reflects the latest modification year; preserve the
SPDX-License-Identifier and surrounding license text exactly while only
adjusting the year in the comment block.
cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp (1)

1-16: ⚠️ Potential issue | 🟡 Minor

Update the SPDX copyright year to include 2026.

This file is modified in this PR, so the header should reflect the latest meaningful modification year.

As per coding guidelines, "Add NVIDIA copyright header on ALL new files, and update year on modified files" and "All TensorRT-LLM source files should contain an NVIDIA copyright header with the year of the latest meaningful modification."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp` around lines
1 - 16, Update the SPDX copyright year in the file header from 2025 to 2026 by
changing the year value in the SPDX-FileCopyrightText comment at the top of the
file (the SPDX header block containing "SPDX-FileCopyrightText" and
"SPDX-License-Identifier"), ensuring the header now reflects 2026 as the latest
meaningful modification year.
cpp/tests/unit_tests/executor/agentCommTest.cpp (1)

1-16: ⚠️ Potential issue | 🟡 Minor

Update the SPDX copyright year to include 2026.

This file is modified in this PR, so the header should reflect the latest meaningful modification year.

As per coding guidelines, "Add NVIDIA copyright header on ALL new files, and update year on modified files" and "All TensorRT-LLM source files should contain an NVIDIA copyright header with the year of the latest meaningful modification."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/unit_tests/executor/agentCommTest.cpp` around lines 1 - 16, Update
the SPDX header year range to include 2026 by changing the
"SPDX-FileCopyrightText" line that currently reads "Copyright (c) 2023-2025
NVIDIA CORPORATION & AFFILIATES." to include 2026 (e.g., "2023-2026"), and
ensure any other occurrences of the 2023-2025 year range in the file
header/comments are updated similarly so the SPDX and license header reflect the
latest modification year.
benchmarks/cpp/disaggServerBenchmark.cpp (1)

2-2: ⚠️ Potential issue | 🟡 Minor

Update the copyright year to 2026.

The copyright header shows 2022-2024, but this file is being modified in 2026. As per coding guidelines, the copyright year should reflect the year of the latest meaningful modification (2022-2026).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/cpp/disaggServerBenchmark.cpp` at line 2, Update the copyright
header comment at the top of disaggServerBenchmark.cpp from "2022-2024" to
"2022-2026" so the header reflects the latest modification year; locate the
top-of-file copyright block (the file header comment containing
"SPDX-FileCopyrightText" and the NVIDIA CORPORATION & AFFILIATES line) and
change the year range accordingly.
triton_backend/inflight_batcher_llm/src/model_instance_state.cc (1)

1-1: ⚠️ Potential issue | 🟡 Minor

Update the copyright year to 2026.

The copyright header shows 2024, but this file is being modified in 2026. As per coding guidelines, "All TensorRT-LLM source files should contain an NVIDIA copyright header with the year of the latest meaningful modification."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@triton_backend/inflight_batcher_llm/src/model_instance_state.cc` at line 1,
Update the copyright header at the top of the file by changing the year from
2024 to 2026; locate the top-of-file copyright comment (the initial comment line
starting with "Copyright 2024, NVIDIA CORPORATION & AFFILIATES") and replace
"2024" with "2026" so the header reflects the latest modification year.
benchmarks/cpp/gptManagerBenchmark.cpp (1)

2-2: ⚠️ Potential issue | 🟡 Minor

Update the copyright year to 2026.

The copyright header shows 2022-2024, but this file is being modified in 2026. As per coding guidelines, the copyright year should reflect the year of the latest meaningful modification (2022-2026).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/cpp/gptManagerBenchmark.cpp` at line 2, Update the copyright
header string in gptManagerBenchmark.cpp from "2022-2024" to "2022-2026" so the
top-of-file SPDX header reflects the current modification year; locate the
header comment (the SPDX-FileCopyrightText line) and replace the end year
accordingly.
cpp/tests/unit_tests/batch_manager/kvCacheUtilsTest.cpp (1)

2-2: ⚠️ Potential issue | 🟡 Minor

Update the copyright year to 2026.

The copyright header shows 2023-2025, but this file is being modified in 2026. As per coding guidelines, the copyright year should reflect the year of the latest meaningful modification (2023-2026).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/unit_tests/batch_manager/kvCacheUtilsTest.cpp` at line 2, Update
the file header's copyright year range by replacing the existing "2023-2025"
occurrence in the SPDX header line (the line containing
"SPDX-FileCopyrightText") with "2023-2026" so the header reflects the 2026
modification.
cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp (1)

2-2: ⚠️ Potential issue | 🟠 Major

Update the NVIDIA copyright year for this modified file.

The header still ends at 2025; this file was modified and should include 2026.

Suggested fix
- * SPDX-FileCopyrightText: Copyright (c) 2023-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2023-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

As per coding guidelines, "Add NVIDIA copyright header on ALL new files, and update year on modified files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp` at line 2, Update
the copyright header in the modified test file by changing the year range in the
SPDX header string from "2023-2025" to "2023-2026" in
cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp; locate the line
containing "SPDX-FileCopyrightText: Copyright (c) 2023-2025 NVIDIA CORPORATION &
AFFILIATES." and replace 2025 with 2026 so the file header reflects the
modification year.
cpp/tensorrt_llm/executor/kvCacheConfig.cpp (1)

1-15: ⚠️ Potential issue | 🟡 Minor

Update the copyright year to include 2026.

This file is modified in this PR, but the header still ends at 2025.

As per coding guidelines, "Add NVIDIA copyright header on ALL new files, and update year on modified files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/executor/kvCacheConfig.cpp` around lines 1 - 15, Update the
file header in kvCacheConfig.cpp to include 2026 in the SPDX copyright line(s)
so the copyright range reads through 2026 (e.g., change "2025" to "2025-2026" or
similar per project convention); ensure the SPDX-License-Identifier and
surrounding license block remain unchanged and keep the header formatting
consistent with other modified files.
cpp/include/tensorrt_llm/executor/executor.h (1)

1038-1048: ⚠️ Potential issue | 🟠 Major

Preserve a compatibility path for the removed onboardBlocks slot.

Line 1042 now places hostCacheSize where the old boolean used to live. Because the trailing parameters are mostly implicitly convertible std::optional numerics, older positional callers can still compile and silently reinterpret onboardBlocks as hostCacheSize, shifting the rest of the offload arguments. The nanobind constructor in cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp Lines 131-142 mirrors the same positional layout, so older Python positional calls have the same risk. Please keep a deprecated forwarding overload/signature that ignores the legacy flag instead of removing this slot outright.
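
A minimal Python sketch of the positional hazard, following the parameter order the review describes. `ToyKvCacheConfig` and `later_option` are hypothetical and heavily reduced; they are not the real constructor signature.

```python
from typing import Optional


class ToyKvCacheConfig:
    """Hypothetical, heavily reduced config; parameter order is illustrative only."""

    def __init__(self, host_cache_size: Optional[int] = None,
                 later_option: Optional[int] = None):
        self.host_cache_size = host_cache_size
        self.later_option = later_option


# A caller written against the old (onboard_blocks, host_cache_size) order.
legacy_positional = ToyKvCacheConfig(True, 1 << 30)

# Keyword arguments are unaffected by the removed slot.
by_keyword = ToyKvCacheConfig(host_cache_size=1 << 30)
```

Because Python's `bool` is a subclass of `int`, the legacy call is accepted without error: `True` lands in `host_cache_size` and the intended size shifts into the next slot. Keyword-based callers do not have this problem.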

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/include/tensorrt_llm/executor/executor.h` around lines 1038 - 1048, The
new KvCacheConfig constructor removed the boolean onboardBlocks slot causing
legacy positional calls to misbind subsequent std::optional numeric parameters;
restore a deprecated overload that accepts the original bool onboardBlocks as
the same positional parameter list and simply forwards to the new
KvCacheConfig(...) ignoring onboardBlocks, so existing C++ callers still compile
and maintain correct argument alignment; also update the nanobind constructor
mapping in cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (the binding
around lines 131-142) to expose the deprecated positional signature for Python
callers to preserve compatibility while marking it as deprecated.
cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (1)

1-15: ⚠️ Potential issue | 🟡 Minor

Update the copyright year to include 2026.

This file is modified in this PR, but the header still ends at 2025.

As per coding guidelines, "Add NVIDIA copyright header on ALL new files, and update year on modified files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp` around lines 1 - 15,
The file header's copyright range in the SPDX-FileCopyrightText line currently
reads "2022-2025"; update that range to include 2026 (e.g., "2022-2026") and
ensure the SPDX-License-Identifier and license block remain unchanged so the
header matches the repo guideline; locate the header by the
SPDX-FileCopyrightText and SPDX-License-Identifier lines at the top of
executorConfig.cpp and only change the year range.
cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp (1)

1104-1131: ⚠️ Potential issue | 🟠 Major

Mirror the secondary-pool availability check before offloading.

Unlike getFreeBlock() at Lines 984-1004, this path now calls getFreeBlock(kSecondaryLevel) for any primary block without first checking that a secondary slot is actually available. With hostCacheSize == 0 or an exhausted secondary pool, offloadBlock() now depends on a secondary block that may not exist. This should no-op unless secondary capacity is available.

Suggested guard
 void WindowBlockManager::offloadBlock(
     BlockPtr const& block, executor::KvCacheTransferMode mode, std::string const& directory)
 {
@@
-    if (block->isPrimary())
+    if (block->isPrimary() && mNumSecondaryBlocks > 0
+        && mEvictionPolicy->getNumFreeBlocks(kSecondaryLevel) > 0)
     {
         // Offload block in primary memory before repurposing
         auto offloadBlock = std::get<0>(mEvictionPolicy->getFreeBlock(kSecondaryLevel));
         // If we're swapping a block to secondary memory, maintain the prior priority values.
         mEvictionPolicy->claimBlock(offloadBlock);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp` around lines 1104 - 1131,
The offloadBlock path calls mEvictionPolicy->getFreeBlock(kSecondaryLevel)
unguarded which can return no secondary slot (e.g., hostCacheSize==0) and lead
to invalid offload; mirror the availability check used earlier by first
verifying a secondary free slot (either via
mEvictionPolicy->hasFreeBlock(kSecondaryLevel) or by checking the result of
getFreeBlock for a null/invalid BlockPtr) and no-op (return) if none exists;
keep the rest of the logic (claimBlock(offloadBlock),
mTransferManager->offload(...), block->swapMemoryPoolBlockOffset(offloadBlock),
event enqueue, releaseBlock(offloadBlock)) unchanged and only run them when a
valid offloadBlock is obtained.
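
The requested no-op behaviour can be modelled in a few lines of Python. This is a toy two-level pool under stated assumptions: `ToyWindowBlockManager`, `free_secondary`, and `offload_block` are hypothetical names, and the real eviction policy and transfer manager are not represented.

```python
from collections import deque


class ToyWindowBlockManager:
    """Hypothetical two-level pool; not the real WindowBlockManager."""

    def __init__(self, num_secondary_blocks: int):
        self.free_secondary = deque(range(num_secondary_blocks))
        self.offloaded = {}  # primary block id -> secondary slot

    def offload_block(self, block_id: int) -> bool:
        """Offload a primary block; return False (no-op) if no secondary slot is free."""
        if not self.free_secondary:
            return False
        self.offloaded[block_id] = self.free_secondary.popleft()
        return True


no_host_pool = ToyWindowBlockManager(num_secondary_blocks=0)
one_slot = ToyWindowBlockManager(num_secondary_blocks=1)
results = [
    no_host_pool.offload_block(7),  # no secondary pool at all
    one_slot.offload_block(7),      # takes the only slot
    one_slot.offload_block(8),      # pool exhausted
]
```

The guard covers both cases named in the comment: a pool that was never created and a pool that is currently full.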
cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h (1)

589-598: ⚠️ Potential issue | 🟠 Major

These public constructor removals are source-breaking for downstream C++ code.

Because this is an installed header under cpp/include, removing onboardBlocks from WindowBlockManager, BlockManager, and KVCacheManager breaks out-of-tree callers that still pass the old bool. Since the flag is now redundant rather than semantically removed, a deprecated forwarding overload would preserve source compatibility while still cleaning up the internal state.
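
The review asks for a C++ `[[deprecated]]` forwarding overload. A rough Python analogue of the same idea, using the standard `warnings` module, might look like the sketch below. `ToyKVCacheManager` and its reduced parameter list are hypothetical, not the real class.

```python
import warnings

_UNSET = object()  # sentinel: distinguishes "not passed" from any real value


class ToyKVCacheManager:
    """Hypothetical manager with a reduced signature, for illustration only."""

    def __init__(self, num_layers: int, tokens_per_block: int, onboard_blocks=_UNSET):
        if onboard_blocks is not _UNSET:
            # The legacy flag is accepted, reported, and otherwise ignored.
            warnings.warn(
                "onboard_blocks is deprecated and has no effect",
                DeprecationWarning,
                stacklevel=2,
            )
        self.num_layers = num_layers
        self.tokens_per_block = tokens_per_block


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = ToyKVCacheManager(32, 64, onboard_blocks=True)
    current = ToyKVCacheManager(32, 64)
```

Old call sites keep working and emit one warning each, while new call sites are silent.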

Also applies to: 1059-1070, 1754-1800

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h` around lines 589 -
598, Add deprecated forwarding overloads of the public constructors that accept
the removed bool parameter (onboardBlocks) for WindowBlockManager, BlockManager,
and KVCacheManager so existing downstream callers remain source-compatible;
implement each overload to call the new constructor signature, map the old
onboardBlocks boolean to the new internal behavior (same default behavior as
before), mark the overloads as deprecated (e.g., [[deprecated]] or a macro) and
forward all other parameters unchanged; update the three constructor
declarations for WindowBlockManager, BlockManager, and KVCacheManager to include
these deprecated overloads to preserve source compatibility while keeping the
new cleaned-up constructors as the primary implementations.
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (2)

1-15: ⚠️ Potential issue | 🟡 Minor

Update the copyright year on this modified file.

This file changed in 2026, but the header still ends at 2025. Please bump the latest year to match the current modification. As per coding guidelines, "Add NVIDIA copyright header on ALL new files, and update year on modified files."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp` around lines 1 -
15, Update the SPDX header years in the file header of kvCacheManager.cpp to
include 2026 (e.g., change "2022-2025" to "2022-2026" or the appropriate range),
ensuring the SPDX-FileCopyrightText line and any copyright-year occurrences
reflect the new modification year; locate the top-of-file comment block
containing SPDX-FileCopyrightText and SPDX-License-Identifier and update the end
year accordingly.

525-541: ⚠️ Potential issue | 🟡 Minor

Update the copyright year to 2022-2026 in the file header.

This is a new file being created in a 2026 PR. Per the coding guidelines, the copyright header should reflect the year of the latest meaningful modification (2026).

The binding change itself is correct—the onboard_blocks parameter was removed from the underlying C++ KVCacheManager class as dead code elimination per the commit message, and these new Python bindings properly reflect the current C++ class signature. Since the binding did not exist before, there is no backward compatibility concern for existing Python callers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp` around lines 525
- 541, Update the file header copyright range to "2022-2026" at the top of this
new file; locate the header comment block near the top of
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (above the
nb::class_<tbk::KVCacheManager,...> binding and includes) and change the year
range accordingly so the header reflects the 2026 modification.
🧹 Nitpick comments (1)
cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp (1)

223-226: Annotate the trailing KVCacheManager arguments here.

Dropping one positional bool makes the final std::nullopt, nullptr, true tail hard to audit. The later AsymmetricalCacheTest::setUpCacheManager call in this file already uses inline parameter comments for the boolean/pointer tail; mirroring that style here would make future signature churn less brittle.

As per coding guidelines, "In C++ function calls where parameters are not obvious from inspection, use inline C comments to document the parameter (e.g., doSomeOperation(/* checkForErrors = */ false);)."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp` around lines 223 -
226, Annotate the trailing arguments in the KVCacheManager constructor call to
make the boolean/pointer semantics explicit: replace the tail "std::nullopt,
nullptr, true" with inline C-style comments naming each parameter (e.g.,
"std::nullopt /* optional_param_name = */, nullptr /* allocator_or_callback =
*/, true /* enable_something = */") matching the style used in
AsymmetricalCacheTest::setUpCacheManager so the intent of these last parameters
is clear and robust to signature churn.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp`:
- Around line 117-128: kvCacheConfigSetstate currently rejects legacy 15-field
pickle tuples; update the logic in the lambda for
tle::KvCacheConfig::__setstate__ (kvCacheConfigSetstate) to accept both size 14
and 15 tuples, treat the legacy extra field (the removed onboard_blocks slot) as
ignorable when present, and map remaining fields to the existing constructor
arguments accordingly (i.e., if state.size()==15 skip the legacy index when
casting subsequent fields, otherwise use the existing indices). Ensure you still
throw for any other sizes and keep the constructor/new placement for
tle::KvCacheConfig unchanged.
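
The tuple handling suggested here can be sketched in Python. The position of the removed slot is not stated in the review, so `LEGACY_ONBOARD_INDEX = 6` is a placeholder assumption, and `normalize_state` is a hypothetical helper rather than part of the bindings.

```python
NEW_STATE_LEN = 14
LEGACY_STATE_LEN = 15
# Assumption: position of the removed onboard_blocks slot in legacy tuples.
# The real index is not stated in the review; 6 is a placeholder.
LEGACY_ONBOARD_INDEX = 6


def normalize_state(state: tuple) -> tuple:
    """Return a 14-field state, dropping the legacy slot from 15-field tuples."""
    if len(state) == NEW_STATE_LEN:
        return state
    if len(state) == LEGACY_STATE_LEN:
        return state[:LEGACY_ONBOARD_INDEX] + state[LEGACY_ONBOARD_INDEX + 1:]
    raise ValueError(f"Invalid state: expected 14 or 15 fields, got {len(state)}")


legacy_state = tuple(range(15))
normalized = normalize_state(legacy_state)
```

Normalizing first means the rest of `__setstate__` can keep a single set of indices for the 14-field layout.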

---

Outside diff comments:
In `@benchmarks/cpp/disaggServerBenchmark.cpp`:
- Line 2: Update the copyright header comment at the top of
disaggServerBenchmark.cpp from "2022-2024" to "2022-2026" so the header reflects
the latest modification year; locate the top-of-file copyright block (the file
header comment containing "SPDX-FileCopyrightText" and the NVIDIA CORPORATION &
AFFILIATES line) and change the year range accordingly.

In `@benchmarks/cpp/gptManagerBenchmark.cpp`:
- Line 2: Update the copyright header string in gptManagerBenchmark.cpp from
"2022-2024" to "2022-2026" so the top-of-file SPDX header reflects the current
modification year; locate the header comment (the SPDX-FileCopyrightText line)
and replace the end year accordingly.

In `@cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h`:
- Around line 589-598: Add deprecated forwarding overloads of the public
constructors that accept the removed bool parameter (onboardBlocks) for
WindowBlockManager, BlockManager, and KVCacheManager so existing downstream
callers remain source-compatible; implement each overload to call the new
constructor signature, map the old onboardBlocks boolean to the new internal
behavior (same default behavior as before), mark the overloads as deprecated
(e.g., [[deprecated]] or a macro) and forward all other parameters unchanged;
update the three constructor declarations for WindowBlockManager, BlockManager,
and KVCacheManager to include these deprecated overloads to preserve source
compatibility while keeping the new cleaned-up constructors as the primary
implementations.

In `@cpp/include/tensorrt_llm/executor/executor.h`:
- Around line 1038-1048: The new KvCacheConfig constructor removed the boolean
onboardBlocks slot causing legacy positional calls to misbind subsequent
std::optional numeric parameters; restore a deprecated overload that accepts the
original bool onboardBlocks as the same positional parameter list and simply
forwards to the new KvCacheConfig(...) ignoring onboardBlocks, so existing C++
callers still compile and maintain correct argument alignment; also update the
nanobind constructor mapping in
cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (the binding around lines
131-142) to expose the deprecated positional signature for Python callers to
preserve compatibility while marking it as deprecated.

In `@cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp`:
- Around line 1104-1131: The offloadBlock path calls
mEvictionPolicy->getFreeBlock(kSecondaryLevel) without a guard, so it can end up
with no secondary slot (e.g., when hostCacheSize==0) and perform an invalid
offload. Mirror the availability check used earlier: first verify that a
secondary free slot exists (either via
mEvictionPolicy->hasFreeBlock(kSecondaryLevel) or by checking the result of
getFreeBlock for a null/invalid BlockPtr), and return without doing anything if
none exists. Keep the rest of the logic (claimBlock(offloadBlock),
mTransferManager->offload(...), block->swapMemoryPoolBlockOffset(offloadBlock),
event enqueue, releaseBlock(offloadBlock)) unchanged, and run it only when a
valid offloadBlock is obtained.
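
A minimal sketch of the guard described above, using toy types rather than the real eviction policy or transfer manager. `ToySecondaryPool`, `tryGetFreeBlock`, and `offloadIfPossible` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct ToyBlock
{
    int id;
};

using ToyBlockPtr = std::shared_ptr<ToyBlock>;

// Hypothetical secondary (host) free list; an empty pool models hostCacheSize == 0.
class ToySecondaryPool
{
public:
    explicit ToySecondaryPool(int numBlocks)
    {
        for (int i = 0; i < numBlocks; ++i)
        {
            mFree.push_back(std::make_shared<ToyBlock>(ToyBlock{i}));
        }
    }

    // Returns nullptr when no secondary slot is free.
    ToyBlockPtr tryGetFreeBlock()
    {
        if (mFree.empty())
        {
            return nullptr;
        }
        ToyBlockPtr block = mFree.back();
        mFree.pop_back();
        return block;
    }

private:
    std::vector<ToyBlockPtr> mFree;
};

// Returns true only if an offload target was obtained; otherwise it is a no-op.
bool offloadIfPossible(ToySecondaryPool& pool, ToyBlockPtr const& primaryBlock)
{
    assert(primaryBlock != nullptr);
    ToyBlockPtr offloadTarget = pool.tryGetFreeBlock();
    if (!offloadTarget)
    {
        return false; // no secondary slot: skip the offload entirely
    }
    // Real code would copy the block's data and swap pool offsets here.
    return true;
}
```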

In `@cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp`:
- Around line 1-16: Update the SPDX copyright year in the file header from 2025
to 2026 by changing the year value in the SPDX-FileCopyrightText comment at the
top of the file (the SPDX header block containing "SPDX-FileCopyrightText" and
"SPDX-License-Identifier"), ensuring the header now reflects 2026 as the latest
meaningful modification year.

In `@cpp/tensorrt_llm/executor/kvCacheConfig.cpp`:
- Around line 1-15: Update the file header in kvCacheConfig.cpp to include 2026
in the SPDX copyright line(s) so the copyright range reads through 2026 (e.g.,
change "2025" to "2025-2026" or similar per project convention); ensure the
SPDX-License-Identifier and surrounding license block remain unchanged and keep
the header formatting consistent with other modified files.

In `@cpp/tensorrt_llm/executor/serialization.cpp`:
- Around line 1255-1273: deserializeKvCacheConfig no longer reads the
onboardBlocks field, so buffers serialized by older binaries (which still
contain it) will be parsed incorrectly. Update deserializeKvCacheConfig (and the
corresponding serialization) to preserve compatibility by either: 1) reading a
reserved bool placeholder (e.g., a dummy "onboardBlocks" value) before
hostCacheSize so the field order matches older binaries, or 2) introducing an
explicit serialization version discriminator at the start of the KvCacheConfig
wire format and branching deserialization on it. Ensure the KvCacheConfig
constructor ordering and the sequence of su::deserialize calls (including
onboardBlocks or a version check) match the writer so older persisted configs
remain correctly parsed.
- Around line 1-16: Update the SPDX header year range in the file's top comment
block from "2025" to include 2026 (e.g., change "Copyright (c) 2025 NVIDIA
CORPORATION & AFFILIATES" to "Copyright (c) 2025-2026 NVIDIA CORPORATION &
AFFILIATES" or similar) so the header in serialization.cpp reflects the latest
modification year; preserve the SPDX-License-Identifier and surrounding license
text exactly while only adjusting the year in the comment block.
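
The version-discriminator option from the wire-format comment above could be sketched as follows. The layout, the version constants, and the `writePod`/`readPod` helpers are all hypothetical; this does not use the real `su::serialize`/`su::deserialize` utilities.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical two-field config; the real KvCacheConfig has many more fields.
struct ToyConfig
{
    bool enableBlockReuse;
    std::uint64_t hostCacheSize;
};

constexpr std::uint8_t kToyVersionWithOnboardBlocks = 1; // assumed legacy layout
constexpr std::uint8_t kToyVersionCurrent = 2;           // assumed layout without the bool

template <typename T>
void writePod(std::ostream& os, T const& value)
{
    os.write(reinterpret_cast<char const*>(&value), sizeof(T));
}

template <typename T>
T readPod(std::istream& is)
{
    T value{};
    is.read(reinterpret_cast<char*>(&value), sizeof(T));
    return value;
}

void serialize(ToyConfig const& config, std::ostream& os)
{
    writePod(os, kToyVersionCurrent);
    writePod(os, config.enableBlockReuse);
    writePod(os, config.hostCacheSize);
}

ToyConfig deserialize(std::istream& is)
{
    auto const version = readPod<std::uint8_t>(is);
    assert(version == kToyVersionWithOnboardBlocks || version == kToyVersionCurrent);
    ToyConfig config{};
    config.enableBlockReuse = readPod<bool>(is);
    if (version == kToyVersionWithOnboardBlocks)
    {
        (void) readPod<bool>(is); // consume and discard the removed onboardBlocks field
    }
    config.hostCacheSize = readPod<std::uint64_t>(is);
    return config;
}
```

A version byte only helps with buffers written after it was introduced; buffers produced before any versioning existed would still need the placeholder approach.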

In `@cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp`:
- Around line 1-15: Update the SPDX header years in the file header of
kvCacheManager.cpp to include 2026 (e.g., change "2022-2025" to "2022-2026" or
the appropriate range), ensuring the SPDX-FileCopyrightText line and any
copyright-year occurrences reflect the new modification year; locate the
top-of-file comment block containing SPDX-FileCopyrightText and
SPDX-License-Identifier and update the end year accordingly.
- Around line 525-541: Update the file header copyright range to "2022-2026" at
the top of this file; locate the header comment block near the top of
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (above the includes
and the nb::class_<tbk::KVCacheManager,...> binding) and change the year range
accordingly so the header reflects the 2026 modification.

In `@cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp`:
- Around line 1-15: The file header's copyright range in the
SPDX-FileCopyrightText line currently reads "2022-2025"; update that range to
include 2026 (e.g., "2022-2026") and ensure the SPDX-License-Identifier and
license block remain unchanged so the header matches the repo guideline; locate
the header by the SPDX-FileCopyrightText and SPDX-License-Identifier lines at
the top of executorConfig.cpp and only change the year range.

In `@cpp/tests/unit_tests/batch_manager/capacitySchedulerTest.cpp`:
- Line 2: Update the SPDX copyright header year in the modified test files from
2025 to 2026: edit the top-of-file header in capacitySchedulerTest.cpp and also
make the same change in cacheTransBufferTest.cpp and cacheTransceiverTest.cpp so
the SPDX-FileCopyrightText year range ends with 2026.

In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp`:
- Line 2: Update the copyright header in the modified test file by changing the
year range in the SPDX header string from "2023-2025" to "2023-2026" in
cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp; locate the line
containing "SPDX-FileCopyrightText: Copyright (c) 2023-2025 NVIDIA CORPORATION &
AFFILIATES." and replace 2025 with 2026 so the file header reflects the
modification year.

In `@cpp/tests/unit_tests/batch_manager/kvCacheUtilsTest.cpp`:
- Line 2: Update the file header's copyright year range by replacing the
existing "2023-2025" occurrence in the SPDX header line (the line containing
"SPDX-FileCopyrightText") with "2023-2026" so the header reflects the 2026
modification.

In `@cpp/tests/unit_tests/executor/agentCommTest.cpp`:
- Around line 1-16: Update the SPDX header year range to include 2026 by
changing the "SPDX-FileCopyrightText" line that currently reads "Copyright (c)
2023-2025 NVIDIA CORPORATION & AFFILIATES." to include 2026 (e.g., "2023-2026"),
and ensure any other occurrences of the 2023-2025 year range in the file
header/comments are updated similarly so the SPDX and license header reflect the
latest modification year.

In `@triton_backend/inflight_batcher_llm/src/model_instance_state.cc`:
- Line 1: Update the copyright header at the top of the file by changing the
year from 2024 to 2026; locate the top-of-file copyright comment (the initial
comment line starting with "Copyright 2024, NVIDIA CORPORATION & AFFILIATES")
and replace "2024" with "2026" so the header reflects the latest modification
year.

---

Nitpick comments:
In `@cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp`:
- Around line 223-226: Annotate the trailing arguments in the KVCacheManager
constructor call to make the boolean/pointer semantics explicit: replace the
tail "std::nullopt, nullptr, true" with inline C-style comments naming each
parameter (e.g., "std::nullopt /* optional_param_name = */, nullptr /*
allocator_or_callback = */, true /* enable_something = */") matching the style
used in AsymmetricalCacheTest::setUpCacheManager so the intent of these last
parameters is clear and robust to signature churn.
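
As an illustration of the annotation style suggested in this nitpick, here is a sketch with a hypothetical constructor. The parameter names are invented and do not come from the real `KVCacheManager` signature.

```cpp
#include <cassert>
#include <optional>

// Hypothetical manager; parameter names are illustrative only.
struct ToyCacheManager
{
    ToyCacheManager(int blocksInPrimaryPool, std::optional<int> maxSequenceLength, void* eventManager,
        bool enablePartialReuse)
        : blocksInPrimaryPool(blocksInPrimaryPool)
        , maxSequenceLength(maxSequenceLength)
        , eventManager(eventManager)
        , enablePartialReuse(enablePartialReuse)
    {
    }

    int blocksInPrimaryPool;
    std::optional<int> maxSequenceLength;
    void* eventManager;
    bool enablePartialReuse;
};

ToyCacheManager makeToyManager()
{
    // Each trailing literal is annotated so a reader does not have to open the header.
    return ToyCacheManager(16, std::nullopt /* maxSequenceLength = */, nullptr /* eventManager = */,
        true /* enablePartialReuse = */);
}
```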

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0d50f1d8-c5be-4370-99f5-59ba4c0eff5d

📥 Commits

Reviewing files that changed from the base of the PR and between 4f929fe and ddb19ab.

📒 Files selected for processing (31)
  • benchmarks/cpp/disaggServerBenchmark.cpp
  • benchmarks/cpp/gptManagerBenchmark.cpp
  • benchmarks/cpp/utils/utils.h
  • cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h
  • cpp/include/tensorrt_llm/executor/executor.h
  • cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp
  • cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp
  • cpp/tensorrt_llm/executor/kvCacheConfig.cpp
  • cpp/tensorrt_llm/executor/serialization.cpp
  • cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp
  • cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp
  • cpp/tests/unit_tests/batch_manager/cacheTransBufferTest.cpp
  • cpp/tests/unit_tests/batch_manager/capacitySchedulerTest.cpp
  • cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp
  • cpp/tests/unit_tests/batch_manager/kvCacheUtilsTest.cpp
  • cpp/tests/unit_tests/executor/agentCommTest.cpp
  • cpp/tests/unit_tests/executor/serializeUtilsTest.cpp
  • cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp
  • tensorrt_llm/_torch/pyexecutor/resource_manager.py
  • tensorrt_llm/llmapi/llm_args.py
  • tests/unittest/_torch/executor/test_resource_manager.py
  • tests/unittest/bindings/test_bindings_ut.py
  • tests/unittest/bindings/test_executor_bindings.py
  • tests/unittest/disaggregated/test_extractor.py
  • tests/unittest/disaggregated/test_kv_transfer.py
  • tests/unittest/llmapi/test_llm_args.py
  • tests/unittest/llmapi/test_llm_kv_cache_events.py
  • triton_backend/all_models/inflight_batcher_llm/tensorrt_llm/1/model.py
  • triton_backend/all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt
  • triton_backend/all_models/tests/test_python_backend.py
  • triton_backend/inflight_batcher_llm/src/model_instance_state.cc
💤 Files with no reviewable changes (14)
  • tests/unittest/disaggregated/test_extractor.py
  • benchmarks/cpp/utils/utils.h
  • triton_backend/all_models/tests/test_python_backend.py
  • tests/unittest/bindings/test_bindings_ut.py
  • tests/unittest/llmapi/test_llm_kv_cache_events.py
  • tests/unittest/bindings/test_executor_bindings.py
  • triton_backend/all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt
  • cpp/tests/unit_tests/executor/serializeUtilsTest.cpp
  • tests/unittest/_torch/executor/test_resource_manager.py
  • tensorrt_llm/_torch/pyexecutor/resource_manager.py
  • tests/unittest/disaggregated/test_kv_transfer.py
  • tensorrt_llm/llmapi/llm_args.py
  • triton_backend/all_models/inflight_batcher_llm/tensorrt_llm/1/model.py
  • tests/unittest/llmapi/test_llm_args.py

Comment thread cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp
@tensorrt-cicd
Collaborator

PR_Github #39920 [ run ] completed with state SUCCESS. Commit: ddb19ab
/LLM/main/L0_MergeRequest_PR pipeline #31087 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@eopXD
Collaborator Author

eopXD commented Mar 23, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #39955 [ run ] triggered by Bot. Commit: 7324262 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39955 [ run ] completed with state SUCCESS. Commit: 7324262
/LLM/main/L0_MergeRequest_PR pipeline #31120 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@eopXD eopXD force-pushed the remove-onboard-blocks-switch branch from 7324262 to 43faad7 Compare March 24, 2026 02:42
@eopXD
Collaborator Author

eopXD commented Mar 24, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #40036 [ run ] triggered by Bot. Commit: 43faad7 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40036 [ run ] completed with state SUCCESS. Commit: 43faad7
/LLM/main/L0_MergeRequest_PR pipeline #31190 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@eopXD
Collaborator Author

eopXD commented Mar 24, 2026

/bot run --disable-fail-fast

@eopXD eopXD force-pushed the remove-onboard-blocks-switch branch from 43faad7 to 01ee2af Compare March 24, 2026 07:14
@eopXD
Collaborator Author

eopXD commented Mar 24, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40079 [ run ] triggered by Bot. Commit: 01ee2af Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40081 [ run ] triggered by Bot. Commit: 01ee2af Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40081 [ run ] completed with state SUCCESS. Commit: 01ee2af
/LLM/main/L0_MergeRequest_PR pipeline #31233 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@eopXD eopXD force-pushed the remove-onboard-blocks-switch branch from 01ee2af to 7cea747 Compare March 25, 2026 05:29
@eopXD
Collaborator Author

eopXD commented Mar 25, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40262 [ run ] triggered by Bot. Commit: 7cea747 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40262 [ run ] completed with state SUCCESS. Commit: 7cea747
/LLM/main/L0_MergeRequest_PR pipeline #31385 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@eopXD
Collaborator Author

eopXD commented Mar 31, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40830 [ run ] triggered by Bot. Commit: 2097c62 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40830 [ run ] completed with state SUCCESS. Commit: 2097c62
/LLM/main/L0_MergeRequest_PR pipeline #31841 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@eopXD eopXD force-pushed the remove-onboard-blocks-switch branch 3 times, most recently from a7010dc to b341e8d Compare April 7, 2026 08:04
@eopXD
Collaborator Author

eopXD commented Apr 7, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42112 [ run ] triggered by Bot. Commit: b341e8d Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42112 [ run ] completed with state FAILURE. Commit: b341e8d
/LLM/main/L0_MergeRequest_PR pipeline #32950 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@eopXD eopXD force-pushed the remove-onboard-blocks-switch branch from b341e8d to 67b87cf Compare April 8, 2026 02:45
Dead code elimination. The secondary block pool is derived when
kv_cache_config::host_cache_size is specified. Whether we
onboard/offload a KV cache block can be inferred from whether
the manager has a secondary block pool or not. The `onboardBlocks`
toggle itself only adds complexity. This commit removes it.

Signed-off-by: Yueh-Ting Chen <yueh.ting.chen@gmail.com>
@eopXD eopXD force-pushed the remove-onboard-blocks-switch branch from 67b87cf to 19639ed Compare April 8, 2026 02:47
@eopXD
Collaborator Author

eopXD commented Apr 8, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42247 [ run ] triggered by Bot. Commit: 19639ed Link to invocation

@eopXD
Collaborator Author

eopXD commented Apr 9, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42425 [ run ] triggered by Bot. Commit: 19639ed Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42425 [ run ] completed with state ABORTED. Commit: 19639ed

Link to invocation

@eopXD
Collaborator Author

eopXD commented Apr 10, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42627 [ run ] triggered by Bot. Commit: 19639ed Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42627 [ run ] completed with state SUCCESS. Commit: 19639ed
/LLM/main/L0_MergeRequest_PR pipeline #33345 completed with status: 'SUCCESS'

CI Report

Link to invocation

Collaborator

@thorjohnsen thorjohnsen left a comment

This field isn't entirely useless: it allows models that wouldn't otherwise fit to run on dinky hardware, since the KV cache can reside in CPU memory. In practice, the performance penalty of doing this was so severe that nobody used it. We can let it go now.

@pcastonguay
Collaborator

llm_args and backend changes lgtm.

Collaborator

@SimengLiu-nv SimengLiu-nv left a comment

LGTM.

@eopXD eopXD merged commit ec34644 into NVIDIA:main Apr 16, 2026
5 checks passed
chienchunhung pushed a commit to chienchunhung/TensorRT-LLM that referenced this pull request Apr 16, 2026
…A#12449)

Signed-off-by: Yueh-Ting Chen <yueh.ting.chen@gmail.com>
alyosha-swamy pushed a commit to alyosha-swamy/TensorRT-LLM that referenced this pull request Apr 17, 2026
…A#12449)

Signed-off-by: Yueh-Ting Chen <yueh.ting.chen@gmail.com>
Signed-off-by: alyosha-swamy <raghav@arcee.ai>
alyosha-swamy pushed a commit to alyosha-swamy/TensorRT-LLM that referenced this pull request Apr 17, 2026
…A#12449)

Signed-off-by: Yueh-Ting Chen <yueh.ting.chen@gmail.com>
Signed-off-by: alyosha-swamy <raghav@arcee.ai>

6 participants