[None][chore] Remove onboard block switch for KV cache manager#12449
[None][chore] Remove onboard block switch for KV cache manager#12449eopXD merged 1 commit intoNVIDIA:mainfrom
Conversation
255d4e6 to
bda8824
Compare
📝 WalkthroughWalkthroughSystematic removal of Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
bda8824 to
ddb19ab
Compare
|
/bot run |
|
PR_Github #39920 [ run ] triggered by Bot. Commit: |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (17)
cpp/tests/unit_tests/batch_manager/capacitySchedulerTest.cpp (1)
2-2:⚠️ Potential issue | 🟡 MinorUpdate the SPDX copyright year in the modified test files.
This file is changed in this PR, but the header still ends at
2025. The same update is needed incpp/tests/unit_tests/batch_manager/cacheTransBufferTest.cppandcpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp.As per coding guidelines, "Add NVIDIA copyright header on ALL new files, and update year on modified files."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/unit_tests/batch_manager/capacitySchedulerTest.cpp` at line 2, Update the SPDX copyright header year in the modified test files from 2025 to 2026: edit the top-of-file header in capacitySchedulerTest.cpp and also make the same change in cacheTransBufferTest.cpp and cacheTransceiverTest.cpp so the SPDX-FileCopyrightText year range ends with 2026.cpp/tensorrt_llm/executor/serialization.cpp (2)
1255-1273:⚠️ Potential issue | 🟠 MajorVersion or preserve the removed
onboardBlocksfield in the wire format.This change shortens the serialized
KvCacheConfigpayload, but the reader still has no format discriminator. Any buffer produced by an older build will now be misaligned fromhostCacheSizeonward, which can break persisted configs/pickles or mixed-version components. Please either keep a reserved bool in the wire format for compatibility or gate the new layout behind an explicit serialization version.Also applies to: 1276-1310
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tensorrt_llm/executor/serialization.cpp` around lines 1255 - 1273, deserializeKvCacheConfig currently removes the onboardBlocks field from the wire format causing older serialized buffers to be parsed incorrectly; update deserializeKvCacheConfig (and the corresponding serialization) to preserve compatibility by either: 1) reading a reserved bool placeholder (e.g., read a dummy "onboardBlocks" bool value) before hostCacheSize so the field order matches older binaries, or 2) introduce and read an explicit serialization version discriminator at the start of the KvCacheConfig wire format and branch deserialization accordingly; ensure the KvCacheConfig constructor ordering and the sequence of su::deserialize calls (including onboardBlocks or a version check) match the writer so older persisted configs remain correctly parsed.
1-16:⚠️ Potential issue | 🟡 MinorUpdate the SPDX copyright year to include 2026.
This file is modified in this PR, so the header should reflect the latest meaningful modification year.
As per coding guidelines, "Add NVIDIA copyright header on ALL new files, and update year on modified files" and "All TensorRT-LLM source files should contain an NVIDIA copyright header with the year of the latest meaningful modification."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tensorrt_llm/executor/serialization.cpp` around lines 1 - 16, Update the SPDX header year range in the file's top comment block from "2025" to include 2026 (e.g., change "Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES" to "Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES" or similar) so the header in serialization.cpp reflects the latest modification year; preserve the SPDX-License-Identifier and surrounding license text exactly while only adjusting the year in the comment block.cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp (1)
1-16:⚠️ Potential issue | 🟡 MinorUpdate the SPDX copyright year to include 2026.
This file is modified in this PR, so the header should reflect the latest meaningful modification year.
As per coding guidelines, "Add NVIDIA copyright header on ALL new files, and update year on modified files" and "All TensorRT-LLM source files should contain an NVIDIA copyright header with the year of the latest meaningful modification."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp` around lines 1 - 16, Update the SPDX copyright year in the file header from 2025 to 2026 by changing the year value in the SPDX-FileCopyrightText comment at the top of the file (the SPDX header block containing "SPDX-FileCopyrightText" and "SPDX-License-Identifier"), ensuring the header now reflects 2026 as the latest meaningful modification year.cpp/tests/unit_tests/executor/agentCommTest.cpp (1)
1-16:⚠️ Potential issue | 🟡 MinorUpdate the SPDX copyright year to include 2026.
This file is modified in this PR, so the header should reflect the latest meaningful modification year.
As per coding guidelines, "Add NVIDIA copyright header on ALL new files, and update year on modified files" and "All TensorRT-LLM source files should contain an NVIDIA copyright header with the year of the latest meaningful modification."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/unit_tests/executor/agentCommTest.cpp` around lines 1 - 16, Update the SPDX header year range to include 2026 by changing the "SPDX-FileCopyrightText" line that currently reads "Copyright (c) 2023-2025 NVIDIA CORPORATION & AFFILIATES." to include 2026 (e.g., "2023-2026"), and ensure any other occurrences of the 2023-2025 year range in the file header/comments are updated similarly so the SPDX and license header reflect the latest modification year.benchmarks/cpp/disaggServerBenchmark.cpp (1)
2-2:⚠️ Potential issue | 🟡 MinorUpdate the copyright year to 2026.
The copyright header shows 2022-2024, but this file is being modified in 2026. As per coding guidelines, the copyright year should reflect the year of the latest meaningful modification (2022-2026).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/cpp/disaggServerBenchmark.cpp` at line 2, Update the copyright header comment at the top of disaggServerBenchmark.cpp from "2022-2024" to "2022-2026" so the header reflects the latest modification year; locate the top-of-file copyright block (the file header comment containing "SPDX-FileCopyrightText" and the NVIDIA CORPORATION & AFFILIATES line) and change the year range accordingly.triton_backend/inflight_batcher_llm/src/model_instance_state.cc (1)
1-1:⚠️ Potential issue | 🟡 MinorUpdate the copyright year to 2026.
The copyright header shows 2024, but this file is being modified in 2026. As per coding guidelines, "All TensorRT-LLM source files should contain an NVIDIA copyright header with the year of the latest meaningful modification."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@triton_backend/inflight_batcher_llm/src/model_instance_state.cc` at line 1, Update the copyright header at the top of the file by changing the year from 2024 to 2026; locate the top-of-file copyright comment (the initial comment line starting with "Copyright 2024, NVIDIA CORPORATION & AFFILIATES") and replace "2024" with "2026" so the header reflects the latest modification year.benchmarks/cpp/gptManagerBenchmark.cpp (1)
2-2:⚠️ Potential issue | 🟡 MinorUpdate the copyright year to 2026.
The copyright header shows 2022-2024, but this file is being modified in 2026. As per coding guidelines, the copyright year should reflect the year of the latest meaningful modification (2022-2026).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@benchmarks/cpp/gptManagerBenchmark.cpp` at line 2, Update the copyright header string in gptManagerBenchmark.cpp from "2022-2024" to "2022-2026" so the top-of-file SPDX header reflects the current modification year; locate the header comment (the SPDX-FileCopyrightText line) and replace the end year accordingly.cpp/tests/unit_tests/batch_manager/kvCacheUtilsTest.cpp (1)
2-2:⚠️ Potential issue | 🟡 MinorUpdate the copyright year to 2026.
The copyright header shows 2023-2025, but this file is being modified in 2026. As per coding guidelines, the copyright year should reflect the year of the latest meaningful modification (2023-2026).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/unit_tests/batch_manager/kvCacheUtilsTest.cpp` at line 2, Update the file header's copyright year range by replacing the existing "2023-2025" occurrence in the SPDX header line (the line containing "SPDX-FileCopyrightText") with "2023-2026" so the header reflects the 2026 modification.cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp (1)
2-2:⚠️ Potential issue | 🟠 MajorUpdate the NVIDIA copyright year for this modified file.
The header still ends at
2025; this file was modified and should include2026.Suggested fix
- * SPDX-FileCopyrightText: Copyright (c) 2023-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + * SPDX-FileCopyrightText: Copyright (c) 2023-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.As per coding guidelines, "Add NVIDIA copyright header on ALL new files, and update year on modified files".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp` at line 2, Update the copyright header in the modified test file by changing the year range in the SPDX header string from "2023-2025" to "2023-2026" in cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp; locate the line containing "SPDX-FileCopyrightText: Copyright (c) 2023-2025 NVIDIA CORPORATION & AFFILIATES." and replace 2025 with 2026 so the file header reflects the modification year.cpp/tensorrt_llm/executor/kvCacheConfig.cpp (1)
1-15:⚠️ Potential issue | 🟡 MinorUpdate the copyright year to include 2026.
This file is modified in this PR, but the header still ends at 2025.
As per coding guidelines, "Add NVIDIA copyright header on ALL new files, and update year on modified files".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tensorrt_llm/executor/kvCacheConfig.cpp` around lines 1 - 15, Update the file header in kvCacheConfig.cpp to include 2026 in the SPDX copyright line(s) so the copyright range reads through 2026 (e.g., change "2025" to "2025-2026" or similar per project convention); ensure the SPDX-License-Identifier and surrounding license block remain unchanged and keep the header formatting consistent with other modified files.cpp/include/tensorrt_llm/executor/executor.h (1)
1038-1048:⚠️ Potential issue | 🟠 MajorPreserve a compatibility path for the removed
onboardBlocksslot.Line 1042 now places
hostCacheSizewhere the old boolean used to live. Because the trailing parameters are mostly implicitly convertiblestd::optionalnumerics, older positional callers can still compile and silently reinterpretonboardBlocksashostCacheSize, shifting the rest of the offload arguments. The nanobind constructor incpp/tensorrt_llm/nanobind/executor/executorConfig.cppLines 131-142 mirrors the same positional layout, so older Python positional calls have the same risk. Please keep a deprecated forwarding overload/signature that ignores the legacy flag instead of removing this slot outright.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/include/tensorrt_llm/executor/executor.h` around lines 1038 - 1048, The new KvCacheConfig constructor removed the boolean onboardBlocks slot causing legacy positional calls to misbind subsequent std::optional numeric parameters; restore a deprecated overload that accepts the original bool onboardBlocks as the same positional parameter list and simply forwards to the new KvCacheConfig(...) ignoring onboardBlocks, so existing C++ callers still compile and maintain correct argument alignment; also update the nanobind constructor mapping in cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (the binding around lines 131-142) to expose the deprecated positional signature for Python callers to preserve compatibility while marking it as deprecated.cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (1)
1-15:⚠️ Potential issue | 🟡 MinorUpdate the copyright year to include 2026.
This file is modified in this PR, but the header still ends at 2025.
As per coding guidelines, "Add NVIDIA copyright header on ALL new files, and update year on modified files".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp` around lines 1 - 15, The file header's copyright range in the SPDX-FileCopyrightText line currently reads "2022-2025"; update that range to include 2026 (e.g., "2022-2026") and ensure the SPDX-License-Identifier and license block remain unchanged so the header matches the repo guideline; locate the header by the SPDX-FileCopyrightText and SPDX-License-Identifier lines at the top of executorConfig.cpp and only change the year range.cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp (1)
1104-1131:⚠️ Potential issue | 🟠 MajorMirror the secondary-pool availability check before offloading.
Unlike
getFreeBlock()at Lines 984-1004, this path now callsgetFreeBlock(kSecondaryLevel)for any primary block without first checking that a secondary slot is actually available. WithhostCacheSize == 0or an exhausted secondary pool,offloadBlock()now depends on a secondary block that may not exist. This should no-op unless secondary capacity is available.Suggested guard
void WindowBlockManager::offloadBlock( BlockPtr const& block, executor::KvCacheTransferMode mode, std::string const& directory) { @@ - if (block->isPrimary()) + if (block->isPrimary() && mNumSecondaryBlocks > 0 + && mEvictionPolicy->getNumFreeBlocks(kSecondaryLevel) > 0) { // Offload block in primary memory before repurposing auto offloadBlock = std::get<0>(mEvictionPolicy->getFreeBlock(kSecondaryLevel)); // If we're swapping a block to secondary memory, maintain the prior priority values. mEvictionPolicy->claimBlock(offloadBlock);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp` around lines 1104 - 1131, The offloadBlock path calls mEvictionPolicy->getFreeBlock(kSecondaryLevel) unguarded which can return no secondary slot (e.g., hostCacheSize==0) and lead to invalid offload; mirror the availability check used earlier by first verifying a secondary free slot (either via mEvictionPolicy->hasFreeBlock(kSecondaryLevel) or by checking the result of getFreeBlock for a null/invalid BlockPtr) and no-op (return) if none exists; keep the rest of the logic (claimBlock(offloadBlock), mTransferManager->offload(...), block->swapMemoryPoolBlockOffset(offloadBlock), event enqueue, releaseBlock(offloadBlock)) unchanged and only run them when a valid offloadBlock is obtained.cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h (1)
589-598:⚠️ Potential issue | 🟠 MajorThese public constructor removals are source-breaking for downstream C++ code.
Because this is an installed header under
cpp/include, removingonboardBlocksfromWindowBlockManager,BlockManager, andKVCacheManagerbreaks out-of-tree callers that still pass the old bool. Since the flag is now redundant rather than semantically removed, a deprecated forwarding overload would preserve source compatibility while still cleaning up the internal state.Also applies to: 1059-1070, 1754-1800
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h` around lines 589 - 598, Add deprecated forwarding overloads of the public constructors that accept the removed bool parameter (onboardBlocks) for WindowBlockManager, BlockManager, and KVCacheManager so existing downstream callers remain source-compatible; implement each overload to call the new constructor signature, map the old onboardBlocks boolean to the new internal behavior (same default behavior as before), mark the overloads as deprecated (e.g., [[deprecated]] or a macro) and forward all other parameters unchanged; update the three constructor declarations for WindowBlockManager, BlockManager, and KVCacheManager to include these deprecated overloads to preserve source compatibility while keeping the new cleaned-up constructors as the primary implementations.cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (2)
1-15:⚠️ Potential issue | 🟡 MinorUpdate the copyright year on this modified file.
This file changed in 2026, but the header still ends at 2025. Please bump the latest year to match the current modification. As per coding guidelines, "Add NVIDIA copyright header on ALL new files, and update year on modified files."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp` around lines 1 - 15, Update the SPDX header years in the file header of kvCacheManager.cpp to include 2026 (e.g., change "2022-2025" to "2022-2026" or the appropriate range), ensuring the SPDX-FileCopyrightText line and any copyright-year occurrences reflect the new modification year; locate the top-of-file comment block containing SPDX-FileCopyrightText and SPDX-License-Identifier and update the end year accordingly.
525-541:⚠️ Potential issue | 🟡 MinorUpdate the copyright year to 2022-2026 in the file header.
This is a new file being created in a 2026 PR. Per the coding guidelines, the copyright header should reflect the year of the latest meaningful modification (2026).
The binding change itself is correct—the
onboard_blocksparameter was removed from the underlying C++KVCacheManagerclass as dead code elimination per the commit message, and these new Python bindings properly reflect the current C++ class signature. Since the binding did not exist before, there is no backward compatibility concern for existing Python callers.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp` around lines 525 - 541, Update the file header copyright range to "2022-2026" at the top of this new file; locate the header comment block near the top of cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (above the nb::class_<tbk::KVCacheManager,...> binding and includes) and change the year range accordingly so the header reflects the 2026 modification.
🧹 Nitpick comments (1)
cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp (1)
223-226: Annotate the trailingKVCacheManagerarguments here.Dropping one positional bool makes the final
std::nullopt, nullptr, truetail hard to audit. The laterAsymmetricalCacheTest::setUpCacheManagercall in this file already uses inline parameter comments for the boolean/pointer tail; mirroring that style here would make future signature churn less brittle.As per coding guidelines, "In C++ function calls where parameters are not obvious from inspection, use inline C comments to document the parameter (e.g.,
doSomeOperation(/* checkForErrors = */ false);)."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp` around lines 223 - 226, Annotate the trailing arguments in the KVCacheManager constructor call to make the boolean/pointer semantics explicit: replace the tail "std::nullopt, nullptr, true" with inline C-style comments naming each parameter (e.g., "std::nullopt /* optional_param_name = */, nullptr /* allocator_or_callback = */, true /* enable_something = */") matching the style used in AsymmetricalCacheTest::setUpCacheManager so the intent of these last parameters is clear and robust to signature churn.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp`:
- Around line 117-128: kvCacheConfigSetstate currently rejects legacy 15-field
pickle tuples; update the logic in the lambda for
tle::KvCacheConfig::__setstate__ (kvCacheConfigSetstate) to accept both size 14
and 15 tuples, treat the legacy extra field (the removed onboard_blocks slot) as
ignorable when present, and map remaining fields to the existing constructor
arguments accordingly (i.e., if state.size()==15 skip the legacy index when
casting subsequent fields, otherwise use the existing indices). Ensure you still
throw for any other sizes and keep the constructor/new placement for
tle::KvCacheConfig unchanged.
---
Outside diff comments:
In `@benchmarks/cpp/disaggServerBenchmark.cpp`:
- Line 2: Update the copyright header comment at the top of
disaggServerBenchmark.cpp from "2022-2024" to "2022-2026" so the header reflects
the latest modification year; locate the top-of-file copyright block (the file
header comment containing "SPDX-FileCopyrightText" and the NVIDIA CORPORATION &
AFFILIATES line) and change the year range accordingly.
In `@benchmarks/cpp/gptManagerBenchmark.cpp`:
- Line 2: Update the copyright header string in gptManagerBenchmark.cpp from
"2022-2024" to "2022-2026" so the top-of-file SPDX header reflects the current
modification year; locate the header comment (the SPDX-FileCopyrightText line)
and replace the end year accordingly.
In `@cpp/include/tensorrt_llm/batch_manager/kvCacheManager.h`:
- Around line 589-598: Add deprecated forwarding overloads of the public
constructors that accept the removed bool parameter (onboardBlocks) for
WindowBlockManager, BlockManager, and KVCacheManager so existing downstream
callers remain source-compatible; implement each overload to call the new
constructor signature, map the old onboardBlocks boolean to the new internal
behavior (same default behavior as before), mark the overloads as deprecated
(e.g., [[deprecated]] or a macro) and forward all other parameters unchanged;
update the three constructor declarations for WindowBlockManager, BlockManager,
and KVCacheManager to include these deprecated overloads to preserve source
compatibility while keeping the new cleaned-up constructors as the primary
implementations.
In `@cpp/include/tensorrt_llm/executor/executor.h`:
- Around line 1038-1048: The new KvCacheConfig constructor removed the boolean
onboardBlocks slot causing legacy positional calls to misbind subsequent
std::optional numeric parameters; restore a deprecated overload that accepts the
original bool onboardBlocks as the same positional parameter list and simply
forwards to the new KvCacheConfig(...) ignoring onboardBlocks, so existing C++
callers still compile and maintain correct argument alignment; also update the
nanobind constructor mapping in
cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (the binding around lines
131-142) to expose the deprecated positional signature for Python callers to
preserve compatibility while marking it as deprecated.
In `@cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp`:
- Around line 1104-1131: The offloadBlock path calls
mEvictionPolicy->getFreeBlock(kSecondaryLevel) unguarded which can return no
secondary slot (e.g., hostCacheSize==0) and lead to invalid offload; mirror the
availability check used earlier by first verifying a secondary free slot (either
via mEvictionPolicy->hasFreeBlock(kSecondaryLevel) or by checking the result of
getFreeBlock for a null/invalid BlockPtr) and no-op (return) if none exists;
keep the rest of the logic (claimBlock(offloadBlock),
mTransferManager->offload(...), block->swapMemoryPoolBlockOffset(offloadBlock),
event enqueue, releaseBlock(offloadBlock)) unchanged and only run them when a
valid offloadBlock is obtained.
In `@cpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cpp`:
- Around line 1-16: Update the SPDX copyright year in the file header from 2025
to 2026 by changing the year value in the SPDX-FileCopyrightText comment at the
top of the file (the SPDX header block containing "SPDX-FileCopyrightText" and
"SPDX-License-Identifier"), ensuring the header now reflects 2026 as the latest
meaningful modification year.
In `@cpp/tensorrt_llm/executor/kvCacheConfig.cpp`:
- Around line 1-15: Update the file header in kvCacheConfig.cpp to include 2026
in the SPDX copyright line(s) so the copyright range reads through 2026 (e.g.,
change "2025" to "2025-2026" or similar per project convention); ensure the
SPDX-License-Identifier and surrounding license block remain unchanged and keep
the header formatting consistent with other modified files.
In `@cpp/tensorrt_llm/executor/serialization.cpp`:
- Around line 1255-1273: deserializeKvCacheConfig currently removes the
onboardBlocks field from the wire format causing older serialized buffers to be
parsed incorrectly; update deserializeKvCacheConfig (and the corresponding
serialization) to preserve compatibility by either: 1) reading a reserved bool
placeholder (e.g., read a dummy "onboardBlocks" bool value) before hostCacheSize
so the field order matches older binaries, or 2) introduce and read an explicit
serialization version discriminator at the start of the KvCacheConfig wire
format and branch deserialization accordingly; ensure the KvCacheConfig
constructor ordering and the sequence of su::deserialize calls (including
onboardBlocks or a version check) match the writer so older persisted configs
remain correctly parsed.
- Around line 1-16: Update the SPDX header year range in the file's top comment
block from "2025" to include 2026 (e.g., change "Copyright (c) 2025 NVIDIA
CORPORATION & AFFILIATES" to "Copyright (c) 2025-2026 NVIDIA CORPORATION &
AFFILIATES" or similar) so the header in serialization.cpp reflects the latest
modification year; preserve the SPDX-License-Identifier and surrounding license
text exactly while only adjusting the year in the comment block.
In `@cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp`:
- Around line 1-15: Update the SPDX header years in the file header of
kvCacheManager.cpp to include 2026 (e.g., change "2022-2025" to "2022-2026" or
the appropriate range), ensuring the SPDX-FileCopyrightText line and any
copyright-year occurrences reflect the new modification year; locate the
top-of-file comment block containing SPDX-FileCopyrightText and
SPDX-License-Identifier and update the end year accordingly.
- Around line 525-541: Update the file header copyright range to "2022-2026" at
the top of this new file; locate the header comment block near the top of
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (above the
nb::class_<tbk::KVCacheManager,...> binding and includes) and change the year
range accordingly so the header reflects the 2026 modification.
In `@cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp`:
- Around line 1-15: The file header's copyright range in the
SPDX-FileCopyrightText line currently reads "2022-2025"; update that range to
include 2026 (e.g., "2022-2026") and ensure the SPDX-License-Identifier and
license block remain unchanged so the header matches the repo guideline; locate
the header by the SPDX-FileCopyrightText and SPDX-License-Identifier lines at
the top of executorConfig.cpp and only change the year range.
In `@cpp/tests/unit_tests/batch_manager/capacitySchedulerTest.cpp`:
- Line 2: Update the SPDX copyright header year in the modified test files from
2025 to 2026: edit the top-of-file header in capacitySchedulerTest.cpp and also
make the same change in cacheTransBufferTest.cpp and cacheTransceiverTest.cpp so
the SPDX-FileCopyrightText year range ends with 2026.
In `@cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp`:
- Line 2: Update the copyright header in the modified test file by changing the
year range in the SPDX header string from "2023-2025" to "2023-2026" in
cpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cpp; locate the line
containing "SPDX-FileCopyrightText: Copyright (c) 2023-2025 NVIDIA CORPORATION &
AFFILIATES." and replace 2025 with 2026 so the file header reflects the
modification year.
In `@cpp/tests/unit_tests/batch_manager/kvCacheUtilsTest.cpp`:
- Line 2: Update the file header's copyright year range by replacing the
existing "2023-2025" occurrence in the SPDX header line (the line containing
"SPDX-FileCopyrightText") with "2023-2026" so the header reflects the 2026
modification.
In `@cpp/tests/unit_tests/executor/agentCommTest.cpp`:
- Around line 1-16: Update the SPDX header year range to include 2026 by
changing the "SPDX-FileCopyrightText" line that currently reads "Copyright (c)
2023-2025 NVIDIA CORPORATION & AFFILIATES." to include 2026 (e.g., "2023-2026"),
and ensure any other occurrences of the 2023-2025 year range in the file
header/comments are updated similarly so the SPDX and license header reflect the
latest modification year.
In `@triton_backend/inflight_batcher_llm/src/model_instance_state.cc`:
- Line 1: Update the copyright header at the top of the file by changing the
year from 2024 to 2026; locate the top-of-file copyright comment (the initial
comment line starting with "Copyright 2024, NVIDIA CORPORATION & AFFILIATES")
and replace "2024" with "2026" so the header reflects the latest modification
year.
---
Nitpick comments:
In `@cpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpp`:
- Around line 223-226: Annotate the trailing arguments in the KVCacheManager
constructor call to make the boolean/pointer semantics explicit: replace the
tail "std::nullopt, nullptr, true" with inline C-style comments naming each
parameter (e.g., "std::nullopt /* optional_param_name = */, nullptr /*
allocator_or_callback = */, true /* enable_something = */") matching the style
used in AsymmetricalCacheTest::setUpCacheManager so the intent of these last
parameters is clear and robust to signature churn.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0d50f1d8-c5be-4370-99f5-59ba4c0eff5d
📒 Files selected for processing (31)
benchmarks/cpp/disaggServerBenchmark.cppbenchmarks/cpp/gptManagerBenchmark.cppbenchmarks/cpp/utils/utils.hcpp/include/tensorrt_llm/batch_manager/kvCacheManager.hcpp/include/tensorrt_llm/executor/executor.hcpp/tensorrt_llm/batch_manager/kvCacheManager.cppcpp/tensorrt_llm/batch_manager/trtGptModelInflightBatching.cppcpp/tensorrt_llm/executor/kvCacheConfig.cppcpp/tensorrt_llm/executor/serialization.cppcpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cppcpp/tensorrt_llm/nanobind/executor/executorConfig.cppcpp/tests/unit_tests/batch_manager/cacheTransBufferTest.cppcpp/tests/unit_tests/batch_manager/capacitySchedulerTest.cppcpp/tests/unit_tests/batch_manager/kvCacheManagerTest.cppcpp/tests/unit_tests/batch_manager/kvCacheUtilsTest.cppcpp/tests/unit_tests/executor/agentCommTest.cppcpp/tests/unit_tests/executor/serializeUtilsTest.cppcpp/tests/unit_tests/multi_gpu/cacheTransceiverTest.cpptensorrt_llm/_torch/pyexecutor/resource_manager.pytensorrt_llm/llmapi/llm_args.pytests/unittest/_torch/executor/test_resource_manager.pytests/unittest/bindings/test_bindings_ut.pytests/unittest/bindings/test_executor_bindings.pytests/unittest/disaggregated/test_extractor.pytests/unittest/disaggregated/test_kv_transfer.pytests/unittest/llmapi/test_llm_args.pytests/unittest/llmapi/test_llm_kv_cache_events.pytriton_backend/all_models/inflight_batcher_llm/tensorrt_llm/1/model.pytriton_backend/all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxttriton_backend/all_models/tests/test_python_backend.pytriton_backend/inflight_batcher_llm/src/model_instance_state.cc
💤 Files with no reviewable changes (14)
- tests/unittest/disaggregated/test_extractor.py
- benchmarks/cpp/utils/utils.h
- triton_backend/all_models/tests/test_python_backend.py
- tests/unittest/bindings/test_bindings_ut.py
- tests/unittest/llmapi/test_llm_kv_cache_events.py
- tests/unittest/bindings/test_executor_bindings.py
- triton_backend/all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt
- cpp/tests/unit_tests/executor/serializeUtilsTest.cpp
- tests/unittest/_torch/executor/test_resource_manager.py
- tensorrt_llm/_torch/pyexecutor/resource_manager.py
- tests/unittest/disaggregated/test_kv_transfer.py
- tensorrt_llm/llmapi/llm_args.py
- triton_backend/all_models/inflight_batcher_llm/tensorrt_llm/1/model.py
- tests/unittest/llmapi/test_llm_args.py
|
PR_Github #39920 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #39955 [ run ] triggered by Bot. Commit: |
|
PR_Github #39955 [ run ] completed with state
|
7324262 to
43faad7
Compare
|
/bot run |
|
PR_Github #40036 [ run ] triggered by Bot. Commit: |
|
PR_Github #40036 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
43faad7 to
01ee2af
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #40079 [ run ] triggered by Bot. Commit: |
|
PR_Github #40081 [ run ] triggered by Bot. Commit: |
|
PR_Github #40081 [ run ] completed with state
|
01ee2af to
7cea747
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #40262 [ run ] triggered by Bot. Commit: |
|
PR_Github #40262 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #40830 [ run ] triggered by Bot. Commit: |
|
PR_Github #40830 [ run ] completed with state
|
a7010dc to
b341e8d
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #42112 [ run ] triggered by Bot. Commit: |
|
PR_Github #42112 [ run ] completed with state
|
b341e8d to
67b87cf
Compare
Dead code elimination. The secondary block pool is derived when kv_cache_config::host_cache_size is specified. Whether we onboard/offload a kv cache block can be implicated from whether the manager has secondary block or not. The `onboardBlocks` toggle itself only adds complication. This commit removes it. Signed-off-by: Yueh-Ting Chen <yueh.ting.chen@gmail.com>
67b87cf to
19639ed
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #42247 [ run ] triggered by Bot. Commit: |
|
/bot run --disable-fail-fast |
|
PR_Github #42425 [ run ] triggered by Bot. Commit: |
|
PR_Github #42425 [ run ] completed with state |
|
/bot run --disable-fail-fast |
|
PR_Github #42627 [ run ] triggered by Bot. Commit: |
|
PR_Github #42627 [ run ] completed with state |
thorjohnsen
left a comment
There was a problem hiding this comment.
This field isn't entirely useless, it allows models that wouldn't otherwise fit to run on dinky hardware since KV cache can reside in CPU memory. In practice the performance penalty of doing this was so severe that nobody used it. We can let it go now.
|
llm_args and backend changes lgtm. |
…A#12449) Signed-off-by: Yueh-Ting Chen <yueh.ting.chen@gmail.com>
…A#12449) Signed-off-by: Yueh-Ting Chen <yueh.ting.chen@gmail.com> Signed-off-by: alyosha-swamy <raghav@arcee.ai>
…A#12449) Signed-off-by: Yueh-Ting Chen <yueh.ting.chen@gmail.com> Signed-off-by: alyosha-swamy <raghav@arcee.ai>
Description
This MR has no functional change intended.
Dead code elimination. The secondary block pool is derived when
kv_cache_config::host_cache_sizeis specified. Whether we onboard/offload a kv cache block can be implicated from whether the manager has secondary block or not. TheonboardBlockstoggle itself only adds complication. This commit removes it.Follows up on #7469, rebased onto current main.
Changes
onboardBlocks/onboard_blocksparameter fromKvCacheConfig,BlockManager,WindowBlockManager, andKVCacheManagerconstructorsmOnboardBlocksmember variable and associated getter/setterTest Coverage
Since no functional change is intended, existing tests are updated to remove the parameter. No new test logic is needed.
PR Checklist
Summary by CodeRabbit
Release Notes
onboard_blocksconfiguration parameter from KV cache settings across all benchmarks, bindings, and configuration interfaces. The system now automatically handles KV cache block onboarding behavior without requiring explicit configuration. This simplifies the KV cache setup while maintaining core functionality.