[https://nvbugs/5911143][fix] add async worker to MTP/Eagle3 sampler,…#11573
Conversation
/bot run --disable-fail-fast
📝 Walkthrough

Introduces a pinned memory policy enforcement mechanism comprising a pre-commit hook configuration, a static analysis checker script that detects direct pinned memory API usage, new utility functions for conditional memory pinning based on confidential compute state, and systematic replacement of hard-coded pinned memory directives throughout the codebase.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (17)
tensorrt_llm/_torch/modules/mamba/mamba2_metadata.py (1)
1-1: ⚠️ Potential issue | 🟡 Minor
Update SPDX copyright year range to include 2026.
Line 1 still ends at 2024.

🔧 Suggested update

-# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/modules/mamba/mamba2_metadata.py` at line 1, update the SPDX copyright header to include 2026 in the year range (e.g., change "2022-2024" to "2022-2026") so the file header complies with the repository's copyright/year-update guideline; locate the top-of-file SPDX comment and modify only the year range string.

tensorrt_llm/_torch/attention_backend/sparse/rocket.py (1)
1-1: ⚠️ Potential issue | 🟠 Major
Add the required NVIDIA Apache‑2.0 header.
Line 1 begins with imports; this file is missing the required header for modified source files.

🔧 Suggested header insertion

+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import math

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/attention_backend/sparse/rocket.py` at line 1, add the required NVIDIA Apache-2.0 copyright header to the top of the file (before any imports such as the existing "import math"); if this is a modified file, update the copyright year accordingly and ensure the header matches the project's standard NVIDIA Apache-2.0 template.

tensorrt_llm/_torch/models/modeling_clip.py (1)
1-1: ⚠️ Potential issue | 🟠 Major
Add the required NVIDIA Apache‑2.0 header.
Line 1 begins with imports; this file is missing the required header for modified source files.

🔧 Suggested header insertion

+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 from typing import Dict, Optional, Tuple, Union

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/models/modeling_clip.py` at line 1, add the required NVIDIA Apache-2.0 copyright header to the top of the file; insert the standard NVIDIA header block before any imports and ensure the copyright year is updated to the current year if this is a modified file, so the file begins with the full license header followed by the existing "from typing ..." import line.

tensorrt_llm/_torch/pyexecutor/sampling_utils.py (1)
1-1: ⚠️ Potential issue | 🟡 Minor
Update copyright year to 2026.
Line 1 still shows 2025 even though this file is modified in 2026.

🔧 Suggested update

-# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/pyexecutor/sampling_utils.py` at line 1, update the file header copyright year from 2025 to 2026: modify the top-of-file copyright line so it reads 2026 (ensuring the NVIDIA copyright header is present and reflects the current year).

tensorrt_llm/_torch/pyexecutor/mamba_cache_manager.py (1)
1-1: ⚠️ Potential issue | 🟡 Minor
Update SPDX copyright year range to include 2026.
Line 1 still ends at 2025.

🔧 Suggested update

-# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/pyexecutor/mamba_cache_manager.py` at line 1, update the SPDX copyright header by extending the year range to include 2026: modify the existing SPDX-FileCopyrightText line so the copyright year range ends with 2026 instead of 2025.

tensorrt_llm/tools/layer_wise_benchmarks/calibrator.py (1)
1-1: ⚠️ Potential issue | 🟠 Major
Add the required NVIDIA Apache‑2.0 header.
Line 1 begins with imports; this file is missing the required header for modified source files.

🔧 Suggested header insertion

+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import base64

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/tools/layer_wise_benchmarks/calibrator.py` at line 1, this file (calibrator.py) is missing the required NVIDIA Apache‑2.0 copyright header; add the standard NVIDIA Apache‑2.0 header block at the very top of the file (before the first import such as import base64), ensuring it includes the correct copyright owner and updated year for modified files and matches the project's canonical header text.

tensorrt_llm/_torch/attention_backend/sparse/dsa.py (1)
1-1: ⚠️ Potential issue | 🟠 Major
Add the required NVIDIA Apache‑2.0 header.
Line 1 begins with imports; this file is missing the required header for modified source files.

🔧 Suggested header insertion

+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import math

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/attention_backend/sparse/dsa.py` at line 1, this file (module tensorrt_llm._torch.attention_backend.sparse.dsa) is missing the required NVIDIA Apache-2.0 header; add the standard NVIDIA Apache‑2.0 license header block at the very top of the file (before any imports), include the correct copyright line and year (update the year if this is a modification), and ensure the header text matches the other repository files' exact wording/format.

tensorrt_llm/_torch/pyexecutor/sampling_utils_flashinfer.py (1)
1-1: ⚠️ Potential issue | 🟡 Minor
Update copyright year to 2026.
Line 1 still shows 2025 even though this file is modified in 2026.

🔧 Suggested update

-# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/pyexecutor/sampling_utils_flashinfer.py` at line 1, update the file header comment string "# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved." to use the current year 2026 so the top-of-file NVIDIA copyright line reflects the modification year (i.e., change "2025" to "2026"); ensure the header remains exactly in the same format as the existing NVIDIA copyright header.

tensorrt_llm/_torch/models/modeling_siglip.py (1)
1-12: ⚠️ Potential issue | 🟠 Major
Add the NVIDIA Apache‑2.0 header for this modified Python file.
This file is missing the required NVIDIA copyright header; please add the standard Apache‑2.0 header (with the correct start year and updated end year to 2026) at the top.

📄 Proposed header insertion

+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 from typing import Dict, Optional, Tuple

As per coding guidelines, "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/models/modeling_siglip.py` around lines 1 - 12, add the standard NVIDIA Apache-2.0 copyright header at the very top of the file (above all imports), using the correct original start year and updating the end year to 2026; ensure the header follows the Apache License 2.0 text format used across the repo and does not modify existing imports or symbols such as SiglipVisionConfig, SiglipVisionEmbeddings, or use_pinned_memory.

tensorrt_llm/_torch/models/modeling_qwen2vl.py (1)
1-32: ⚠️ Potential issue | 🟠 Major
Add the NVIDIA Apache‑2.0 header for this modified Python file.
This file is missing the required NVIDIA copyright header; please add the standard Apache‑2.0 header (with the correct start year and updated end year to 2026) at the top.

📄 Proposed header insertion

+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import copy

As per coding guidelines, "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/models/modeling_qwen2vl.py` around lines 1 - 32, this file is missing the required NVIDIA Apache-2.0 copyright header; add the standard NVIDIA Apache-2.0 license header (with the correct start year and updated end year 2026) as the very first lines of the file so it precedes all imports and code (e.g., before the existing imports like "import copy" and symbols such as Qwen2_5_VisionPatchEmbed, Qwen2VisionTransformerPretrainedModel, Attention, Linear, RMSNorm); ensure the header exactly matches the project's canonical Apache-2.0 NVIDIA header format and includes the correct copyright years.

tensorrt_llm/_torch/peft/lora/cuda_graph_lora_params.py (1)
1-8: ⚠️ Potential issue | 🟠 Major
Add the NVIDIA Apache‑2.0 header for this modified Python file.
This file is missing the required NVIDIA copyright header; please add the standard Apache‑2.0 header (with the correct start year and updated end year to 2026) at the top.

📄 Proposed header insertion

+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 from collections import namedtuple

As per coding guidelines, "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/peft/lora/cuda_graph_lora_params.py` around lines 1 - 8, add the standard NVIDIA Apache-2.0 copyright header at the very top of cuda_graph_lora_params.py (before any imports like "from collections import namedtuple"), using the Apache‑2.0 template and setting the copyright years to the original start year through 2026 (e.g., "2019-2026" or the correct start year for this file); ensure the header text and license notice exactly match the project's canonical NVIDIA header format.

tensorrt_llm/_torch/speculative/spec_tree_manager.py (1)
1-2: ⚠️ Potential issue | 🟠 Major
Add the NVIDIA Apache-2.0 header for 2026.
This modified file starts directly with imports, so the required NVIDIA copyright header is missing.
As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/speculative/spec_tree_manager.py` around lines 1 - 2, this file is missing the required NVIDIA Apache-2.0 copyright header for 2026; add the standard NVIDIA Apache License 2.0 header (with year 2026 and appropriate copyright holder) at the very top of the file before any imports (i.e., above the existing import math / from typing import List) so the file complies with the project's licensing header policy.

tensorrt_llm/_torch/models/modeling_qwen3vl.py (1)
1-3: ⚠️ Potential issue | 🟠 Major
Add the NVIDIA Apache-2.0 header for 2026.
This modified file starts directly with imports, so the required NVIDIA copyright header is missing.
As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/models/modeling_qwen3vl.py` around lines 1 - 3, this file (modeling_qwen3vl.py) is missing the required NVIDIA Apache-2.0 copyright header for 2026; add the standard NVIDIA Apache-2.0 license header (with year 2026 and the Apache License 2.0 boilerplate) at the very top of the file before any imports (above the existing import block that begins with "import copy"); ensure the header matches the project's canonical NVIDIA header format and includes the correct year and license link.

tensorrt_llm/_torch/attention_backend/interface.py (1)
1-2: ⚠️ Potential issue | 🟠 Major
Add the NVIDIA Apache-2.0 header for 2026.
This modified file starts directly with imports, so the required NVIDIA copyright header is missing.
As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/attention_backend/interface.py` around lines 1 - 2, add the NVIDIA Apache-2.0 copyright header (with year 2026) to the very top of the file so it precedes the existing imports; insert the standard NVIDIA Apache-2.0 license block before the first lines "import copy" and "import weakref", ensuring the header matches the project's Apache License 2.0 format and includes the correct copyright owner and year.

triton_backend/all_models/multimodal/multimodal_encoders/1/model.py (1)
1-25: ⚠️ Potential issue | 🟠 Major
Update the file header to Apache 2.0 format.
The current BSD‑style header doesn't match the required Apache 2.0 template for source files.
As per coding guidelines, "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@triton_backend/all_models/multimodal/multimodal_encoders/1/model.py` around lines 1 - 25, replace the existing BSD‑style copyright header at the top of the file with the NVIDIA Apache License 2.0 header template, update the copyright year to 2026, include the appropriate SPDX identifier (SPDX-License-Identifier: Apache-2.0) and the standard Apache 2.0 notice and URL; ensure this new header fully replaces the current block that begins with the copyright notice and ends before code begins so the file uses the required Apache 2.0 format.

tensorrt_llm/_torch/speculative/mtp.py (1)
1-10: ⚠️ Potential issue | 🟠 Major
Add/update NVIDIA copyright header in this source file.
This file appears to start immediately with imports (Line 1) and lacks the required NVIDIA Apache 2.0 header with the latest modification year. Please add/update the header here (and any other modified source files that are missing it).
As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/speculative/mtp.py` around lines 1 - 10, this source file is missing the required NVIDIA Apache-2.0 copyright header; add/update a standard NVIDIA Apache 2.0 header (with the latest meaningful modification year and "NVIDIA CORPORATION" copyright line) at the very top of the file before any imports in the module containing MambaHybridCacheManager and use_pinned_memory (tensorrt_llm._torch.speculative.mtp), and apply the same header to any other modified source files lacking it.

tensorrt_llm/runtime/model_runner_cpp.py (1)
853-858: ⚠️ Potential issue | 🟡 Minor
Update comment: "MUST be pinned" contradicts conditional pinning when CC is enabled.
The inline comment still says host memory must be page-locked for H2D/D2H transfers, but maybe_pin_memory deliberately skips pinning when confidential compute is active (all transfers become synchronous in that mode, per use_pinned_memory()'s docstring). The word "MUST" is now misleading.

✏️ Suggested comment update

 # CUDA Stream Overlapping Requirements:
 # 1. Both memory copy stream and kernel execution stream must be non-default streams
-# 2. For host<->device transfers (H2D/D2H), host memory MUST be page-locked (pinned)
+# 2. For host<->device transfers (H2D/D2H), host memory should be page-locked (pinned)
+#    when confidential compute is not active; pinning is skipped under CC because all
+#    H2D/D2H copies become synchronous in that mode regardless.
 prompt_table_data = maybe_pin_memory(

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/runtime/model_runner_cpp.py` around lines 853 - 858, The comment incorrectly states host memory "MUST be page-locked" even though maybe_pin_memory(_prepare_embedding_table(...)) conditionally skips pinning when confidential compute is enabled; update the inline comment near mm_embedding_offloading to say that host memory should be page-locked for async H2D/D2H transfers unless use_pinned_memory() indicates confidential compute/synchronous transfers, and reference maybe_pin_memory, use_pinned_memory, mm_embedding_offloading and _prepare_embedding_table so readers understand pinning is conditional.
🧹 Nitpick comments (1)
scripts/check_pinned_memory_usage.py (1)
52-52: `endswith` path check is slightly fragile.
path.as_posix().endswith("tensorrt_llm/_utils.py") would also match a hypothetical other_package/tensorrt_llm/_utils.py. In a pre-commit context the paths are always repository-relative so this is unlikely to matter, but a more explicit check (e.g. comparing the last two path components) would be more robust.

🔧 Optional hardening

-    allow_direct_pin_memory = path.as_posix().endswith("tensorrt_llm/_utils.py")
+    parts = path.parts
+    allow_direct_pin_memory = (
+        len(parts) >= 2
+        and parts[-2] == "tensorrt_llm"
+        and parts[-1] == "_utils.py"
+    )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/check_pinned_memory_usage.py` at line 52, The current fragile check uses path.as_posix().endswith("tensorrt_llm/_utils.py") to set allow_direct_pin_memory; replace it with a robust check that compares the last two path components instead (e.g. verify path.parent.name == "tensorrt_llm" and path.name == "_utils.py" or compare path.parts[-2:] to ("tensorrt_llm","_utils.py")) so only the exact repository-relative module triggers allow_direct_pin_memory.
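The component-wise check suggested above can be exercised in isolation. A minimal sketch (the `allows_direct_pin_memory` helper name is hypothetical; only the `tensorrt_llm/_utils.py` allow-list path comes from the PR):

```python
from pathlib import Path

def allows_direct_pin_memory(path: Path) -> bool:
    # Compare the last two path components instead of a string suffix,
    # so matches are anchored at a path-component boundary.
    parts = path.parts
    return (
        len(parts) >= 2
        and parts[-2] == "tensorrt_llm"
        and parts[-1] == "_utils.py"
    )

# The suffix-string check matches across component boundaries:
assert "my_tensorrt_llm/_utils.py".endswith("tensorrt_llm/_utils.py")
# The component-wise check does not:
assert not allows_direct_pin_memory(Path("my_tensorrt_llm/_utils.py"))
assert allows_direct_pin_memory(Path("tensorrt_llm/_utils.py"))
```

Note that nested copies such as `other_package/tensorrt_llm/_utils.py` still match either way; as the review says, repository-relative pre-commit paths make that a non-issue in practice.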
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/check_pinned_memory_usage.py`:
- Around line 15-38: In visit_Call, the exemption flag allow_direct_pin_memory
is only applied to the .pin_memory attribute-call branch but not to the
pin_memory=True keyword check; modify the keyword branch in visit_Call to skip
appending the violation when self.allow_direct_pin_memory is True (same
conditional used for the attribute case) so that when allow_direct_pin_memory is
set the function/method call (visit_Call) ignores both direct `.pin_memory()`
and `pin_memory=True` keyword usages; update the check around the keyword.arg ==
"pin_memory" block in visit_Call to include a guard referencing
self.allow_direct_pin_memory before calling self.violations.append so both cases
are symmetric.
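The symmetric-guard fix described above can be sketched as a minimal `ast.NodeVisitor`. The class shape below is an assumption based on the prompt's description of `visit_Call`, `allow_direct_pin_memory`, and `violations`, not the actual contents of `scripts/check_pinned_memory_usage.py`:

```python
import ast

class PinnedMemoryChecker(ast.NodeVisitor):
    """Flags direct pinned-memory API usage unless the file is exempt."""

    def __init__(self, allow_direct_pin_memory: bool = False):
        self.allow_direct_pin_memory = allow_direct_pin_memory
        self.violations: list[int] = []

    def visit_Call(self, node: ast.Call) -> None:
        # One guard covers BOTH forms, keeping the two cases symmetric.
        if not self.allow_direct_pin_memory:
            # Direct method call: tensor.pin_memory()
            if isinstance(node.func, ast.Attribute) and node.func.attr == "pin_memory":
                self.violations.append(node.lineno)
            # Keyword form: torch.empty(..., pin_memory=True)
            for kw in node.keywords:
                if kw.arg == "pin_memory":
                    self.violations.append(node.lineno)
        self.generic_visit(node)

source = "x = buf.pin_memory()\ny = torch.empty(4, pin_memory=True)\n"

checker = PinnedMemoryChecker(allow_direct_pin_memory=False)
checker.visit(ast.parse(source))
print(checker.violations)  # both usages flagged: [1, 2]

exempt = PinnedMemoryChecker(allow_direct_pin_memory=True)
exempt.visit(ast.parse(source))
print(exempt.violations)  # []
```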
In `@tensorrt_llm/_torch/attention_backend/interface.py`:
- Line 17: The import currently uses "from tensorrt_llm._utils import
maybe_pin_memory"; change it to import the module (e.g., "from tensorrt_llm
import _utils") and update all usages of maybe_pin_memory in this file
(interface.py) to be qualified as _utils.maybe_pin_memory so the _utils
namespace is preserved consistent with the project's import guideline.
In `@tensorrt_llm/_torch/attention_backend/trtllm.py`:
- Around line 16-17: This file (tensorrt_llm._torch.attention_backend.trtllm,
near the top where imports like get_sm_version, maybe_pin_memory,
use_pinned_memory are declared) is missing the NVIDIA Apache-2.0 header; add the
standard SPDX/Apache-2.0 header block (including "Copyright (c) 2026 NVIDIA
CORPORATION" and the SPDX-License-Identifier: Apache-2.0) at the very top of the
file before any imports or code, matching the project's canonical header format.
In `@tensorrt_llm/_torch/models/modeling_qwen3vl.py`:
- Line 20: The import added pulls symbols directly from _utils; change it to
import the module (e.g., import ..._utils as _utils) and update all usages of
nvtx_range, nvtx_range_debug, and use_pinned_memory in this file to be qualified
(e.g., _utils.nvtx_range, _utils.nvtx_range_debug, _utils.use_pinned_memory) so
the _utils namespace is preserved per the guideline.
In `@tensorrt_llm/_torch/models/modeling_radio.py`:
- Line 25: Replace the short copyright line at the top of
tensorrt_llm/_torch/models/modeling_radio.py with the standard NVIDIA Apache-2.0
header updated to year 2026; open the file containing the import
use_pinned_memory and ensure the full SPDX/Apache-2.0 header appears before any
imports or code, matching the project's canonical NVIDIA header format and
including the SPDX identifier and Apache-2.0 license text reference.
In `@tensorrt_llm/_torch/pyexecutor/py_executor.py`:
- Line 616: Remove the unnecessary f-string prefix on the logger call in the
async worker startup message: locate the logger.info call that logs "Starting
the async worker for sampler D2H copies" (used in the async worker for sampler
D2H copies) and change the string literal to a plain regular string (remove the
leading f) so it isn't an f-string with no placeholders.
- Around line 613-617: The async-worker start block should be moved inside the
start_worker() initialization guard so the sampler async worker is started only
when the main worker is first started: inside the existing "if not
self.worker_started:" block, after the main worker startup code, check
"isinstance(self.sampler, AsyncWorkerMixin) and
self.sampler.async_worker_enabled()" and call self.sampler.async_worker_start();
remove the current async-worker block that now sits outside the guard to avoid
repeated calls; keep references to sampler, AsyncWorkerMixin,
async_worker_start(), async_worker_enabled(), and worker_started to locate the
change.
In `@tensorrt_llm/_torch/pyexecutor/sampler.py`:
- Around line 33-39: The current import in sampler.py pulls individual symbols
from tensorrt_llm._utils; change it to import the module itself (import
tensorrt_llm._utils as _utils) and update all usages of maybe_pin_memory,
mpi_disabled, nvtx_range, torch_dtype_to_binding, and use_pinned_memory to be
qualified (e.g., _utils.maybe_pin_memory, _utils.mpi_disabled,
_utils.nvtx_range, _utils.torch_dtype_to_binding, _utils.use_pinned_memory) so
the _utils namespace is preserved per project guidelines.
In `@tensorrt_llm/_torch/speculative/model_drafter.py`:
- Line 8: Add the NVIDIA Apache-2.0 license header to the top of
tensorrt_llm/_torch/speculative/model_drafter.py (above the existing imports),
using the standard Apache-2.0 SPDX header and set the copyright year to 2026;
ensure the header includes the SPDX-License-Identifier: Apache-2.0 line and the
NVIDIA copyright notice, then leave the existing import line "from
tensorrt_llm._utils import nvtx_range, use_pinned_memory" unchanged below the
header.
In `@tensorrt_llm/_torch/speculative/spec_tree_manager.py`:
- Line 6: Replace the direct symbol import with a module import and qualify its
uses: change the import of use_pinned_memory to import the module (e.g., import
tensorrt_llm._utils as _utils or from tensorrt_llm import _utils) and update all
references to use_pinned_memory in spec_tree_manager.py to
_utils.use_pinned_memory so the _utils namespace is preserved (look for
occurrences of use_pinned_memory in this file and adjust them).
In `@tensorrt_llm/_utils.py`:
- Around line 1261-1265: The logger.error call in the ImportError handler uses
an unnecessary f-string prefix which triggers Ruff F541; update the ImportError
block around the import of pynvml so the logger.error call uses a normal string
(remove the leading f in "f\"pynvml not available; assuming CC=off\"") while
preserving the message and the surrounding try/except that returns False.
- Around line 1317-1320: Add a Google-style docstring to the public utility
function maybe_pin_memory explaining its purpose, parameters, return value, and
behavior: describe that it takes a torch.Tensor (parameter name: tensor), pins
and returns the tensor if use_pinned_memory() is True, otherwise returns the
original tensor, and mention the return type torch.Tensor and any side effects
(pinning memory). Place this docstring immediately above the maybe_pin_memory
function definition and follow Google docstring sections (Args, Returns, and
optionally Raises if relevant).
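A sketch of the requested Google-style docstring; `use_pinned_memory()` is stubbed here so the snippet is self-contained, and the one-line body is illustrative rather than the PR's actual implementation:

```python
def use_pinned_memory() -> bool:
    """Stub standing in for the PR's CC-aware check (assume CC is off)."""
    return True

def maybe_pin_memory(tensor):
    """Conditionally pin a host tensor's memory.

    Pinning is skipped when confidential compute is active, since all
    H2D/D2H copies are synchronous in that mode anyway.

    Args:
        tensor: A CPU ``torch.Tensor`` to pin.

    Returns:
        torch.Tensor: ``tensor.pin_memory()`` if ``use_pinned_memory()``
        returns True, otherwise ``tensor`` unchanged.
    """
    return tensor.pin_memory() if use_pinned_memory() else tensor
```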
In `@triton_backend/all_models/multimodal/multimodal_encoders/1/model.py`:
- Line 1: Add a Unix shebang to the top of the executable Python file or remove
its executable bit: open the file multimodal_encoders/1/model.py and either
insert a shebang line like #!/usr/bin/env python3 as the very first line, or
change the file permissions to remove the execute flag (chmod a-x) so it is not
treated as an executable; ensure the change is committed so static analysis
EXE002 is resolved.
---
Outside diff comments:
In `@tensorrt_llm/_torch/attention_backend/interface.py`:
- Around line 1-2: Add the NVIDIA Apache-2.0 copyright header (with year 2026)
to the very top of the file so it precedes the existing imports; insert the
standard NVIDIA Apache-2.0 license block before the first lines "import copy"
and "import weakref" in tensorrt_llm/_torch/attention_backend/interface.py,
ensuring the header matches the project's Apache License 2.0 format and includes
the correct copyright owner and year.
In `@tensorrt_llm/_torch/attention_backend/sparse/dsa.py`:
- Line 1: This file (module tensorrt_llm._torch.attention_backend.sparse.dsa) is
missing the required NVIDIA Apache-2.0 header; add the standard NVIDIA
Apache‑2.0 license header block at the very top of the file (before any
imports), include the correct copyright line and year (update the year if this
is a modification), and ensure the header text matches the other repository
files' exact wording/format.
In `@tensorrt_llm/_torch/attention_backend/sparse/rocket.py`:
- Line 1: Add the required NVIDIA Apache-2.0 copyright header to the top of
tensorrt_llm/_torch/attention_backend/sparse/rocket.py (before any imports such
as the existing "import math"); if this is a modified file update the copyright
year accordingly and ensure the header matches the project's standard NVIDIA
Apache-2.0 template.
In `@tensorrt_llm/_torch/models/modeling_clip.py`:
- Line 1: Add the required NVIDIA Apache-2.0 copyright header to the top of the
file tensorrt_llm._torch.models.modeling_clip (modeling_clip.py); insert the
standard NVIDIA header block before any imports and ensure the copyright year is
updated to the current year if this is a modified file so the file begins with
the full license/header followed by the existing "from typing ..." import line.
In `@tensorrt_llm/_torch/models/modeling_qwen2vl.py`:
- Around line 1-32: This file is missing the required NVIDIA Apache-2.0
copyright header; add the standard NVIDIA Apache-2.0 license header (with the
correct start year and updated end year 2026) as the very first lines of
tensorrt_llm/_torch/models/modeling_qwen2vl.py so it precedes all imports and
code (e.g., before the existing imports like "import copy" and symbols such as
Qwen2_5_VisionPatchEmbed, Qwen2VisionTransformerPretrainedModel, Attention,
Linear, RMSNorm); ensure the header exactly matches the project's canonical
Apache-2.0 NVIDIA header format and includes the correct copyright years.
In `@tensorrt_llm/_torch/models/modeling_qwen3vl.py`:
- Around line 1-3: This file (modeling_qwen3vl.py) is missing the required
NVIDIA Apache-2.0 copyright header for 2026; add the standard NVIDIA Apache-2.0
license header (with year 2026 and the Apache License 2.0 boilerplate) at the
very top of the file before any imports (above the existing import block that
begins with "import copy"); ensure the header matches the project's canonical
NVIDIA header format and includes the correct year and license link.
In `@tensorrt_llm/_torch/models/modeling_siglip.py`:
- Around line 1-12: Add the standard NVIDIA Apache-2.0 copyright header at the
very top of tensorrt_llm/_torch/models/modeling_siglip.py (above all imports),
using the correct original start year and updating the end year to 2026; ensure
the header follows the Apache License 2.0 text format used across the repo and
does not modify existing imports or symbols such as SiglipVisionConfig,
SiglipVisionEmbeddings, or use_pinned_memory.
In `@tensorrt_llm/_torch/modules/mamba/mamba2_metadata.py`:
- Line 1: Update the SPDX copyright header in
tensorrt_llm/_torch/modules/mamba/mamba2_metadata.py to include 2026 in the year
range (e.g., change "2022-2024" to "2022-2026") so the file header complies with
the repository's copyright/year-update guideline; locate the top-of-file SPDX
comment and modify only the year range string.
In `@tensorrt_llm/_torch/peft/lora/cuda_graph_lora_params.py`:
- Around line 1-8: Add the standard NVIDIA Apache-2.0 copyright header at the
very top of cuda_graph_lora_params.py (before any imports like "from collections
import namedtuple"), using the Apache‑2.0 template and setting the copyright
years to the original start year through 2026 (e.g., "2019-2026" or the correct
start year for this file); ensure the header text and license notice exactly
match the project's canonical NVIDIA header format.
In `@tensorrt_llm/_torch/pyexecutor/mamba_cache_manager.py`:
- Line 1: Update the SPDX copyright header in
tensorrt_llm/_torch/pyexecutor/mamba_cache_manager.py by extending the year
range to include 2026: modify the existing SPDX-FileCopyrightText line (the file
header string "SPDX-FileCopyrightText") so the copyright year range ends with
2026 instead of 2025.
In `@tensorrt_llm/_torch/pyexecutor/sampling_utils_flashinfer.py`:
- Line 1: Update the file header comment string "# Copyright (c) 2025, NVIDIA
CORPORATION. All rights reserved." to use the current year 2026 so the
top-of-file NVIDIA copyright line reflects the modification year (i.e., change
"2025" to "2026"); ensure the header remains exactly in the same format as the
existing NVIDIA copyright header.
In `@tensorrt_llm/_torch/pyexecutor/sampling_utils.py`:
- Line 1: Update the file header copyright year from 2025 to 2026: modify the
top-of-file copyright line in tensorrt_llm/_torch/pyexecutor/sampling_utils.py
so it reads 2026 (ensuring the NVIDIA copyright header is present and reflects
the current year).
In `@tensorrt_llm/_torch/speculative/mtp.py`:
- Around line 1-10: This source file is missing the required NVIDIA Apache-2.0
copyright header; add/update a standard NVIDIA Apache 2.0 header (with the
latest meaningful modification year and "NVIDIA CORPORATION" copyright line) at
the very top of the file before any imports in the module containing
MambaHybridCacheManager and use_pinned_memory
(tensorrt_llm._torch.speculative.mtp), and apply the same header to any other
modified source files lacking it.
In `@tensorrt_llm/_torch/speculative/spec_tree_manager.py`:
- Around line 1-2: This file is missing the required NVIDIA Apache-2.0 copyright
header for 2026; add the standard NVIDIA Apache License 2.0 header (with year
2026 and appropriate copyright holder) at the very top of
tensorrt_llm._torch.speculative.spec_tree_manager.py before any imports (i.e.,
above the existing import math / from typing import List) so the file complies
with the project's licensing header policy.
In `@tensorrt_llm/runtime/model_runner_cpp.py`:
- Around line 853-858: The comment incorrectly states host memory "MUST be
page-locked" even though maybe_pin_memory(_prepare_embedding_table(...))
conditionally skips pinning when confidential compute is enabled; update the
inline comment near mm_embedding_offloading to say that host memory should be
page-locked for async H2D/D2H transfers unless use_pinned_memory() indicates
confidential compute/synchronous transfers, and reference maybe_pin_memory,
use_pinned_memory, mm_embedding_offloading and _prepare_embedding_table so
readers understand pinning is conditional.
In `@tensorrt_llm/tools/layer_wise_benchmarks/calibrator.py`:
- Line 1: This file (calibrator.py) is missing the required NVIDIA Apache‑2.0
copyright header; add the standard NVIDIA Apache‑2.0 header block at the very
top of the file (before the first import such as import base64), ensuring it
includes the correct copyright owner and updated year for modified files and
matches the project's canonical header text.
In `@triton_backend/all_models/multimodal/multimodal_encoders/1/model.py`:
- Around line 1-25: Replace the existing BSD‑style copyright header at the top
of the file with the NVIDIA Apache License 2.0 header template, update the
copyright year to 2026, include the appropriate SPDX identifier
(SPDX-License-Identifier: Apache-2.0) and the standard Apache 2.0 notice and
URL; ensure this new header fully replaces the current block that begins with
the copyright notice and ends before code begins so the file uses the required
Apache 2.0 format.
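The `model_runner_cpp.py` comment fix above can be illustrated with a hedged call-site sketch. `mm_embedding_offloading` and `_prepare_embedding_table` are the names referenced in the review; `HostBuffer` and the helper bodies below are stand-ins invented for this sketch, not the real implementation.

```python
class HostBuffer:
    """Stand-in for a host torch.Tensor; only pin_memory() matters here."""

    def __init__(self) -> None:
        self.pinned = False

    def pin_memory(self) -> "HostBuffer":
        self.pinned = True
        return self


def use_pinned_memory() -> bool:
    # Sketch: the real helper returns False when confidential compute is on.
    return True


def maybe_pin_memory(buf: HostBuffer) -> HostBuffer:
    return buf.pin_memory() if use_pinned_memory() else buf


def prepare_embedding_table() -> HostBuffer:
    # Placeholder for the real _prepare_embedding_table(...).
    return HostBuffer()


mm_embedding_offloading = True
table = prepare_embedding_table()
if mm_embedding_offloading:
    # Host memory should be page-locked for async H2D/D2H transfers, unless
    # use_pinned_memory() is False (confidential compute), in which case the
    # runtime falls back to synchronous copies on pageable memory.
    table = maybe_pin_memory(table)
```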
---
Nitpick comments:
In `@scripts/check_pinned_memory_usage.py`:
- Line 52: The current fragile check uses
path.as_posix().endswith("tensorrt_llm/_utils.py") to set
allow_direct_pin_memory; replace it with a robust check that compares the last
two path components instead (e.g. verify path.parent.name == "tensorrt_llm" and
path.name == "_utils.py" or compare path.parts[-2:] to
("tensorrt_llm","_utils.py")) so only the exact repository-relative module
triggers allow_direct_pin_memory.
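The suggested fix is easy to demonstrate; the function names `fragile` and `robust` are invented for this sketch. Note how the `endswith` variant also matches an unrelated `other_tensorrt_llm/_utils.py`, while comparing `path.parts[-2:]` does not:

```python
from pathlib import Path


def fragile(path: Path) -> bool:
    # Current check: matches any path whose string merely *ends* with it.
    return path.as_posix().endswith("tensorrt_llm/_utils.py")


def robust(path: Path) -> bool:
    # Suggested check: the last two path components must match exactly.
    return path.parts[-2:] == ("tensorrt_llm", "_utils.py")


impostor = Path("repo/other_tensorrt_llm/_utils.py")
real = Path("repo/tensorrt_llm/_utils.py")
```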
f091fc2 to fd4b519
… an exception Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
…fy intent Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
…rrt_llm/runtime/multimodal_model_runner.py Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com> Signed-off-by: dhansen-nvidia <218031328+dhansen-nvidia@users.noreply.github.com> Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
9bc6c55 to 527a7ea
PR_Github #36539 [ run ] completed with state
chzblych left a comment:

Approved for pre-commit checks.
/bot run --disable-fail-fast

PR_Github #36592 [ run ] triggered by Bot. Commit:
PR_Github #36592 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #36652 [ run ] triggered by Bot. Commit:
PR_Github #36652 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #36692 [ run ] triggered by Bot. Commit:
PR_Github #36692 [ run ] completed with state
Signed-off-by: dhansen-nvidia <218031328+dhansen-nvidia@users.noreply.github.com>
/bot run --disable-fail-fast

PR_Github #36803 [ run ] triggered by Bot. Commit:
PR_Github #36803 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #36812 [ run ] triggered by Bot. Commit:
PR_Github #36812 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #36939 [ run ] triggered by Bot. Commit:
PR_Github #36939 [ run ] completed with state
NVIDIA#11573) Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com> Signed-off-by: dhansen-nvidia <218031328+dhansen-nvidia@users.noreply.github.com> Co-authored-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com> Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
… fix confidential_compute_enabled(), only pin memory when CC=off
Summary by CodeRabbit
New Features
Chores
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug (experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
--disable-reuse-test (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.
--disable-fail-fast (OPTIONAL): Disable fail fast on build/test/infra failures.
--skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
--stage-list "A10-PyTorch-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
--gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
--test-backend "pytorch, cpp" (OPTIONAL): Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
--only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
--disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
--add-multi-gpu-test (OPTIONAL): Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
--post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
--detailed-log (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
--debug (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.