
[https://nvbugs/5911143][fix] add async worker to MTP/Eagle3 sampler,…#11573

Merged
pcastonguay merged 7 commits into NVIDIA:main from dhansen-nvidia:confidential_compute_fixes
Feb 26, 2026
Conversation

@dhansen-nvidia
Collaborator

@dhansen-nvidia dhansen-nvidia commented Feb 18, 2026

… fix confidential_compute_enabled(), only pin memory when CC=off

Summary by CodeRabbit

  • New Features

    • Introduced configurable pinned memory policies for CPU-GPU data transfers, enabling runtime optimization based on system configuration.
    • Added automated enforcement of memory pinning best practices through pre-commit validation.
  • Chores

    • Standardized memory allocation patterns across the codebase to use unified memory pinning utilities.
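The pre-commit validation mentioned above can be sketched as follows. This is a hypothetical approximation — the actual contents of scripts/check_pinned_memory_usage.py are not shown on this page; only the class name PinnedMemoryUsageChecker and the two flagged patterns (`.pin_memory()` calls and `pin_memory=True` keywords) come from the PR summary.

```python
# Illustrative sketch of an AST-based pinned-memory checker.
# Assumption: the real script may use different messages and a CLI wrapper.
import ast


class PinnedMemoryUsageChecker(ast.NodeVisitor):
    """Collect (line, message) violations for direct pinned-memory usage."""

    def __init__(self):
        self.violations = []

    def visit_Call(self, node):
        # Flag direct `.pin_memory()` method calls.
        func = node.func
        if isinstance(func, ast.Attribute) and func.attr == "pin_memory":
            self.violations.append(
                (node.lineno, "use maybe_pin_memory() instead of .pin_memory()"))
        # Flag hard-coded `pin_memory=True` keyword arguments.
        for kw in node.keywords:
            if (kw.arg == "pin_memory"
                    and isinstance(kw.value, ast.Constant)
                    and kw.value.value is True):
                self.violations.append(
                    (node.lineno,
                     "use pin_memory=use_pinned_memory() instead of True"))
        self.generic_visit(node)


def check_source(source: str):
    """Return a list of (line, message) violations for a source string."""
    checker = PinnedMemoryUsageChecker()
    checker.visit(ast.parse(source))
    return checker.violations
```

A pre-commit hook would run such a checker over staged Python files and fail the commit when `check_source` returns a non-empty list.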

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update the tava architecture diagram if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is given. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages that don't match the specified backends. Only [pytorch, cpp, tensorrt, triton] are supported. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

@dhansen-nvidia
Collaborator Author

/bot run --disable-fail-fast

@coderabbitai
Contributor

coderabbitai bot commented Feb 18, 2026

📝 Walkthrough

Walkthrough

Introduces a pinned memory policy enforcement mechanism comprising a pre-commit hook configuration, a static analysis checker script that detects direct pinned memory API usage, new utility functions for conditional memory pinning based on confidential compute state, and systematic replacement of hard-coded pinned memory directives throughout the codebase.

Changes

Cohort / File(s) — Summary

• Pre-commit Configuration & Checker — .pre-commit-config.yaml, scripts/check_pinned_memory_usage.py:
  Adds a pre-commit hook to enforce the pinned memory policy via static analysis. Introduces the PinnedMemoryUsageChecker class and a CLI entry point to detect direct .pin_memory() and pin_memory=True usage, reporting violations with line numbers.

• Memory Pinning Utilities — tensorrt_llm/_utils.py:
  Adds use_pinned_memory() and maybe_pin_memory() functions for conditional memory pinning based on confidential compute state. Enhances confidential_compute_enabled() with defensive pynvml import handling and expanded condition checks using ctypes byref.

• Attention Backend & Interface — tensorrt_llm/_torch/attention_backend/interface.py, tensorrt_llm/_torch/attention_backend/sparse/dsa.py, tensorrt_llm/_torch/attention_backend/sparse/rocket.py, tensorrt_llm/_torch/attention_backend/trtllm.py:
  Replaces hard-coded pin_memory=True with pin_memory=use_pinned_memory(), and direct .pin_memory() calls with maybe_pin_memory(), across the DSA, rocket, and main attention backend implementations for seq_lens and buffer allocations.

• Auto-Deploy & Model Components — tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py, tensorrt_llm/_torch/models/modeling_*.py (CLIP, Qwen2VL, Qwen3VL, RADIO, SigLIP):
  Updates InputBuffer allocations and prepare_attn_metadata methods to use pin_memory=use_pinned_memory() instead of hard-coded True across multiple vision and multimodal models.

• LoRA & Mamba — tensorrt_llm/_torch/modules/mamba/mamba2_metadata.py, tensorrt_llm/_torch/peft/lora/cuda_graph_lora_params.py:
  Replaces pin_memory=True with pin_memory=use_pinned_memory() for state_indices_cpu and host tensor allocations in mamba cache and LoRA parameter initialization.

• PyExecutor Components — tensorrt_llm/_torch/pyexecutor/guided_decoder.py, tensorrt_llm/_torch/pyexecutor/llm_request.py, tensorrt_llm/_torch/pyexecutor/mamba_cache_manager.py, tensorrt_llm/_torch/pyexecutor/model_engine.py, tensorrt_llm/_torch/pyexecutor/resource_manager.py, tensorrt_llm/_torch/pyexecutor/sampler.py, tensorrt_llm/_torch/pyexecutor/sampling_utils*.py:
  Systematically replaces hard-coded pinned memory with the dynamic policy via use_pinned_memory() and maybe_pin_memory() across guided decoding, cache management, sampler, and model engine tensor allocations. Adds async worker logging in py_executor.py.

• Speculative Decoding — tensorrt_llm/_torch/speculative/eagle3.py, tensorrt_llm/_torch/speculative/interface.py, tensorrt_llm/_torch/speculative/model_drafter.py, tensorrt_llm/_torch/speculative/mtp.py, tensorrt_llm/_torch/speculative/spec_tree_manager.py:
  Updates tensor pinning across speculative sampling paths. MTPSampler additionally inherits AsyncWorkerMixin and uses the helper methods _copy_to_host() and _record_sampler_event() for async worker support.

• Input Processing & Runtime — tensorrt_llm/inputs/multimodal.py, tensorrt_llm/runtime/model_runner_cpp.py, tensorrt_llm/runtime/multimodal_model_runner.py, tensorrt_llm/tools/layer_wise_benchmarks/calibrator.py:
  Replaces direct pinning with conditional maybe_pin_memory() and use_pinned_memory() in multimodal input processing, PTuning setup, and calibrator state allocation. Updates license headers to 2022-2026.

• Triton Backend — triton_backend/all_models/multimodal/multimodal_encoders/1/model.py:
  Wraps tensor conversions and assignments with maybe_pin_memory() instead of chained .pin_memory() calls in multimodal encoder paths for the mllama and qwen2_vl models.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes
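The async-worker pattern the walkthrough attributes to MTPSampler can be illustrated with the thread-based sketch below. This is hypothetical: only the names AsyncWorkerMixin, _copy_to_host(), and _record_sampler_event() appear in the summary; real code would drive non-blocking D2H copies and CUDA events on a side stream rather than plain callables.

```python
# Illustrative AsyncWorkerMixin: serialize host-copy work onto one
# background thread so the sampling loop is not blocked. Assumption:
# the real mixin's API and internals in the PR may differ.
import queue
import threading


class AsyncWorkerMixin:
    def _async_worker_start(self):
        self._tasks = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self):
        # Execute queued tasks in FIFO order until the shutdown sentinel.
        while True:
            fn = self._tasks.get()
            if fn is None:
                break
            fn()

    def _copy_to_host(self, copy_fn):
        # Stand-in for launching a non-blocking device-to-host copy.
        self._tasks.put(copy_fn)

    def _record_sampler_event(self, done: threading.Event):
        # Stand-in for recording a CUDA event after the queued copies.
        self._tasks.put(done.set)

    def _async_worker_stop(self):
        self._tasks.put(None)
        self._worker.join()
```

Because the queue is FIFO, waiting on the recorded event guarantees all previously enqueued copies have completed, which is the ordering property a sampler needs before reading host-side results.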

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

• Docstring Coverage — ⚠️ Warning: Docstring coverage is 28.57%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

• Description check — ⚠️ Warning: The pull request description is largely incomplete and does not follow the required template structure. Resolution: complete the Description, Test Coverage, and PR Checklist sections; add a clear explanation of what confidential compute is, why the fix was needed, and which tests validate the changes; and mark all applicable checklist items.

✅ Passed checks (1 passed)

• Title check — ✅ Passed: The title references multiple distinct changes (async worker, confidential_compute_enabled fix, and memory pinning), but the summary clearly identifies these as part of a cohesive fix addressing confidential compute and memory pinning behavior.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (17)
tensorrt_llm/_torch/modules/mamba/mamba2_metadata.py (1)

1-1: ⚠️ Potential issue | 🟡 Minor

Update SPDX copyright year range to include 2026.
Line 1 still ends at 2024.

🔧 Suggested update
-# SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/modules/mamba/mamba2_metadata.py` at line 1, Update the
SPDX copyright header in tensorrt_llm/_torch/modules/mamba/mamba2_metadata.py to
include 2026 in the year range (e.g., change "2022-2024" to "2022-2026") so the
file header complies with the repository's copyright/year-update guideline;
locate the top-of-file SPDX comment and modify only the year range string.
tensorrt_llm/_torch/attention_backend/sparse/rocket.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add the required NVIDIA Apache‑2.0 header.
Line 1 begins with imports; this file is missing the required header for modified source files.

🔧 Suggested header insertion
+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import math

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/attention_backend/sparse/rocket.py` at line 1, Add the
required NVIDIA Apache-2.0 copyright header to the top of
tensorrt_llm/_torch/attention_backend/sparse/rocket.py (before any imports such
as the existing "import math"); if this is a modified file update the copyright
year accordingly and ensure the header matches the project's standard NVIDIA
Apache-2.0 template.
tensorrt_llm/_torch/models/modeling_clip.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add the required NVIDIA Apache‑2.0 header.
Line 1 begins with imports; this file is missing the required header for modified source files.

🔧 Suggested header insertion
+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 from typing import Dict, Optional, Tuple, Union

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/modeling_clip.py` at line 1, Add the required
NVIDIA Apache-2.0 copyright header to the top of the file
tensorrt_llm._torch.models.modeling_clip (modeling_clip.py); insert the standard
NVIDIA header block before any imports and ensure the copyright year is updated
to the current year if this is a modified file so the file begins with the full
license/header followed by the existing "from typing ..." import line.
tensorrt_llm/_torch/pyexecutor/sampling_utils.py (1)

1-1: ⚠️ Potential issue | 🟡 Minor

Update copyright year to 2026.
Line 1 still shows 2025 even though this file is modified in 2026.

🔧 Suggested update
-# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/pyexecutor/sampling_utils.py` at line 1, Update the file
header copyright year from 2025 to 2026: modify the top-of-file copyright line
in tensorrt_llm/_torch/pyexecutor/sampling_utils.py so it reads 2026 (ensuring
the NVIDIA copyright header is present and reflects the current year).
tensorrt_llm/_torch/pyexecutor/mamba_cache_manager.py (1)

1-1: ⚠️ Potential issue | 🟡 Minor

Update SPDX copyright year range to include 2026.
Line 1 still ends at 2025.

🔧 Suggested update
-# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2022-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/pyexecutor/mamba_cache_manager.py` at line 1, Update the
SPDX copyright header in tensorrt_llm/_torch/pyexecutor/mamba_cache_manager.py
by extending the year range to include 2026: modify the existing
SPDX-FileCopyrightText line (the file header string "SPDX-FileCopyrightText") so
the copyright year range ends with 2026 instead of 2025.
tensorrt_llm/tools/layer_wise_benchmarks/calibrator.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add the required NVIDIA Apache‑2.0 header.
Line 1 begins with imports; this file is missing the required header for modified source files.

🔧 Suggested header insertion
+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import base64

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/tools/layer_wise_benchmarks/calibrator.py` at line 1, This file
(calibrator.py) is missing the required NVIDIA Apache‑2.0 copyright header; add
the standard NVIDIA Apache‑2.0 header block at the very top of the file (before
the first import such as import base64), ensuring it includes the correct
copyright owner and updated year for modified files and matches the project's
canonical header text.
tensorrt_llm/_torch/attention_backend/sparse/dsa.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add the required NVIDIA Apache‑2.0 header.
Line 1 begins with imports; this file is missing the required header for modified source files.

🔧 Suggested header insertion
+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import math

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/attention_backend/sparse/dsa.py` at line 1, This file
(module tensorrt_llm._torch.attention_backend.sparse.dsa) is missing the
required NVIDIA Apache-2.0 header; add the standard NVIDIA Apache‑2.0 license
header block at the very top of the file (before any imports), include the
correct copyright line and year (update the year if this is a modification), and
ensure the header text matches the other repository files' exact wording/format.
tensorrt_llm/_torch/pyexecutor/sampling_utils_flashinfer.py (1)

1-1: ⚠️ Potential issue | 🟡 Minor

Update copyright year to 2026.
Line 1 still shows 2025 even though this file is modified in 2026.

🔧 Suggested update
-# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.

As per coding guidelines, "Include NVIDIA copyright header on ALL new files and update year on modified files".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/pyexecutor/sampling_utils_flashinfer.py` at line 1,
Update the file header comment string "# Copyright (c) 2025, NVIDIA CORPORATION.
All rights reserved." to use the current year 2026 so the top-of-file NVIDIA
copyright line reflects the modification year (i.e., change "2025" to "2026");
ensure the header remains exactly in the same format as the existing NVIDIA
copyright header.
tensorrt_llm/_torch/models/modeling_siglip.py (1)

1-12: ⚠️ Potential issue | 🟠 Major

Add the NVIDIA Apache‑2.0 header for this modified Python file.

This file is missing the required NVIDIA copyright header; please add the standard Apache‑2.0 header (with the correct start year and updated end year to 2026) at the top.

📄 Proposed header insertion
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 from typing import Dict, Optional, Tuple

As per coding guidelines, “All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/modeling_siglip.py` around lines 1 - 12, Add the
standard NVIDIA Apache-2.0 copyright header at the very top of
tensorrt_llm/_torch/models/modeling_siglip.py (above all imports), using the
correct original start year and updating the end year to 2026; ensure the header
follows the Apache License 2.0 text format used across the repo and does not
modify existing imports or symbols such as SiglipVisionConfig,
SiglipVisionEmbeddings, or use_pinned_memory.
tensorrt_llm/_torch/models/modeling_qwen2vl.py (1)

1-32: ⚠️ Potential issue | 🟠 Major

Add the NVIDIA Apache‑2.0 header for this modified Python file.

This file is missing the required NVIDIA copyright header; please add the standard Apache‑2.0 header (with the correct start year and updated end year to 2026) at the top.

📄 Proposed header insertion
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import copy

As per coding guidelines, “All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/modeling_qwen2vl.py` around lines 1 - 32, This
file is missing the required NVIDIA Apache-2.0 copyright header; add the
standard NVIDIA Apache-2.0 license header (with the correct start year and
updated end year 2026) as the very first lines of
tensorrt_llm/_torch/models/modeling_qwen2vl.py so it precedes all imports and
code (e.g., before the existing imports like "import copy" and symbols such as
Qwen2_5_VisionPatchEmbed, Qwen2VisionTransformerPretrainedModel, Attention,
Linear, RMSNorm); ensure the header exactly matches the project's canonical
Apache-2.0 NVIDIA header format and includes the correct copyright years.
tensorrt_llm/_torch/peft/lora/cuda_graph_lora_params.py (1)

1-8: ⚠️ Potential issue | 🟠 Major

Add the NVIDIA Apache‑2.0 header for this modified Python file.

This file is missing the required NVIDIA copyright header; please add the standard Apache‑2.0 header (with the correct start year and updated end year to 2026) at the top.

📄 Proposed header insertion
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 from collections import namedtuple

As per coding guidelines, “All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/peft/lora/cuda_graph_lora_params.py` around lines 1 - 8,
Add the standard NVIDIA Apache-2.0 copyright header at the very top of
cuda_graph_lora_params.py (before any imports like "from collections import
namedtuple"), using the Apache‑2.0 template and setting the copyright years to
the original start year through 2026 (e.g., "2019-2026" or the correct start
year for this file); ensure the header text and license notice exactly match the
project's canonical NVIDIA header format.
tensorrt_llm/_torch/speculative/spec_tree_manager.py (1)

1-2: ⚠️ Potential issue | 🟠 Major

Add the NVIDIA Apache-2.0 header for 2026.

This modified file starts directly with imports, so the required NVIDIA copyright header is missing.
As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/speculative/spec_tree_manager.py` around lines 1 - 2,
This file is missing the required NVIDIA Apache-2.0 copyright header for 2026;
add the standard NVIDIA Apache License 2.0 header (with year 2026 and
appropriate copyright holder) at the very top of
tensorrt_llm._torch.speculative.spec_tree_manager.py before any imports (i.e.,
above the existing import math / from typing import List) so the file complies
with the project's licensing header policy.
tensorrt_llm/_torch/models/modeling_qwen3vl.py (1)

1-3: ⚠️ Potential issue | 🟠 Major

Add the NVIDIA Apache-2.0 header for 2026.

This modified file starts directly with imports, so the required NVIDIA copyright header is missing.
As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/models/modeling_qwen3vl.py` around lines 1 - 3, This file
(modeling_qwen3vl.py) is missing the required NVIDIA Apache-2.0 copyright header
for 2026; add the standard NVIDIA Apache-2.0 license header (with year 2026 and
the Apache License 2.0 boilerplate) at the very top of the file before any
imports (above the existing import block that begins with "import copy"); ensure
the header matches the project's canonical NVIDIA header format and includes the
correct year and license link.
tensorrt_llm/_torch/attention_backend/interface.py (1)

1-2: ⚠️ Potential issue | 🟠 Major

Add the NVIDIA Apache-2.0 header for 2026.

This modified file starts directly with imports, so the required NVIDIA copyright header is missing.
As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/attention_backend/interface.py` around lines 1 - 2, Add
the NVIDIA Apache-2.0 copyright header (with year 2026) to the very top of the
file so it precedes the existing imports; insert the standard NVIDIA Apache-2.0
license block before the first lines "import copy" and "import weakref" in
tensorrt_llm/_torch/attention_backend/interface.py, ensuring the header matches
the project's Apache License 2.0 format and includes the correct copyright owner
and year.
triton_backend/all_models/multimodal/multimodal_encoders/1/model.py (1)

1-25: ⚠️ Potential issue | 🟠 Major

Update the file header to Apache 2.0 format.
The current BSD‑style header doesn’t match the required Apache 2.0 template for source files.

As per coding guidelines, "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@triton_backend/all_models/multimodal/multimodal_encoders/1/model.py` around
lines 1 - 25, Replace the existing BSD‑style copyright header at the top of the
file with the NVIDIA Apache License 2.0 header template, update the copyright
year to 2026, include the appropriate SPDX identifier (SPDX-License-Identifier:
Apache-2.0) and the standard Apache 2.0 notice and URL; ensure this new header
fully replaces the current block that begins with the copyright notice and ends
before code begins so the file uses the required Apache 2.0 format.
tensorrt_llm/_torch/speculative/mtp.py (1)

1-10: ⚠️ Potential issue | 🟠 Major

Add/update NVIDIA copyright header in this source file.

This file appears to start immediately with imports (Line 1) and lacks the required NVIDIA Apache 2.0 header with the latest modification year. Please add/update the header here (and any other modified source files that are missing it).

As per coding guidelines: "All source files must contain an NVIDIA copyright header with the year of latest meaningful modification. Use the Apache License 2.0 format."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/speculative/mtp.py` around lines 1 - 10, this source file
is missing the required NVIDIA Apache-2.0 copyright header; add a standard
NVIDIA Apache 2.0 header (with the latest meaningful modification year and the
"NVIDIA CORPORATION" copyright line) at the very top of the file, before any
imports in the module containing MambaHybridCacheManager and use_pinned_memory
(tensorrt_llm._torch.speculative.mtp), and apply the same header to any other
modified source files lacking it.
tensorrt_llm/runtime/model_runner_cpp.py (1)

853-858: ⚠️ Potential issue | 🟡 Minor

Update comment — "MUST be pinned" contradicts conditional pinning when CC is enabled.

The inline comment still says host memory must be page-locked for H2D/D2H transfers, but maybe_pin_memory deliberately skips pinning when confidential compute is active (all transfers become synchronous in that mode, per use_pinned_memory()'s docstring). The word "MUST" is now misleading.

✏️ Suggested comment update
             # CUDA Stream Overlapping Requirements:
             # 1. Both memory copy stream and kernel execution stream must be non-default streams
-            # 2. For host<->device transfers (H2D/D2H), host memory MUST be page-locked (pinned)
+            # 2. For host<->device transfers (H2D/D2H), host memory should be page-locked (pinned)
+            #    when confidential compute is not active; pinning is skipped under CC because all
+            #    H2D/D2H copies become synchronous in that mode regardless.
             prompt_table_data = maybe_pin_memory(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/runtime/model_runner_cpp.py` around lines 853 - 858, the comment
incorrectly states host memory "MUST be page-locked" even though
maybe_pin_memory(_prepare_embedding_table(...)) conditionally skips pinning when
confidential compute is enabled; update the inline comment near
mm_embedding_offloading to say that host memory should be page-locked for async
H2D/D2H transfers unless use_pinned_memory() indicates confidential
compute/synchronous transfers, and reference maybe_pin_memory,
use_pinned_memory, mm_embedding_offloading and _prepare_embedding_table so
readers understand pinning is conditional.
🧹 Nitpick comments (1)
scripts/check_pinned_memory_usage.py (1)

52-52: endswith path check is slightly fragile.

path.as_posix().endswith("tensorrt_llm/_utils.py") would also match a hypothetical other_package/tensorrt_llm/_utils.py. In a pre-commit context the paths are always repository-relative so this is unlikely to matter, but a more explicit check (e.g. comparing the last two path components) would be more robust.

🔧 Optional hardening
-    allow_direct_pin_memory = path.as_posix().endswith("tensorrt_llm/_utils.py")
+    parts = path.parts
+    allow_direct_pin_memory = (
+        len(parts) >= 2
+        and parts[-2] == "tensorrt_llm"
+        and parts[-1] == "_utils.py"
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/check_pinned_memory_usage.py` at line 52, the current fragile check
uses path.as_posix().endswith("tensorrt_llm/_utils.py") to set
allow_direct_pin_memory; replace it with a robust check that compares the last
two path components instead (e.g. verify path.parent.name == "tensorrt_llm" and
path.name == "_utils.py", or compare path.parts[-2:] to
("tensorrt_llm", "_utils.py")) so only the exact module path triggers
allow_direct_pin_memory.
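For reference, the parts-based check respects path-component boundaries, which the endswith() version does not. A standalone sketch (allow_direct_pin_memory is written as a free function here; in the script it is a flag computed from the path):

```python
from pathlib import Path


def allow_direct_pin_memory(path: Path) -> bool:
    """True only when the last two path components are exactly
    tensorrt_llm/_utils.py, respecting component boundaries."""
    return path.parts[-2:] == ("tensorrt_llm", "_utils.py")


# The endswith() variant also matches partial components:
# "my_tensorrt_llm/_utils.py".endswith("tensorrt_llm/_utils.py") is True,
# while the parts-based check correctly rejects that path.
```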
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/check_pinned_memory_usage.py`:
- Around line 15-38: In visit_Call, the exemption flag allow_direct_pin_memory
is only applied to the .pin_memory attribute-call branch, not to the
pin_memory=True keyword check; guard the keyword.arg == "pin_memory" block with
the same self.allow_direct_pin_memory conditional used for the attribute case
(skip self.violations.append when the flag is set) so that an exempted file
ignores both direct `.pin_memory()` calls and `pin_memory=True` keyword usages
symmetrically.
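The symmetric guard described above can be sketched as a minimal AST checker. Class and function names here (PinMemoryVisitor, check_source) are illustrative; only allow_direct_pin_memory, visit_Call, and the two pin_memory checks come from the review comment:

```python
import ast


class PinMemoryVisitor(ast.NodeVisitor):
    """Flags direct .pin_memory() calls and pin_memory=True keywords."""

    def __init__(self, allow_direct_pin_memory: bool = False):
        self.allow_direct_pin_memory = allow_direct_pin_memory
        self.violations: list[int] = []

    def visit_Call(self, node: ast.Call) -> None:
        # Both checks sit behind the same exemption flag, so an allowed
        # file (e.g. tensorrt_llm/_utils.py) is exempt from both.
        if not self.allow_direct_pin_memory:
            # Case 1: direct `tensor.pin_memory()` method call.
            if (isinstance(node.func, ast.Attribute)
                    and node.func.attr == "pin_memory"):
                self.violations.append(node.lineno)
            # Case 2: `pin_memory=True` keyword, e.g. torch.empty(..., pin_memory=True).
            for kw in node.keywords:
                if kw.arg == "pin_memory":
                    self.violations.append(node.lineno)
        self.generic_visit(node)


def check_source(source: str, allow_direct_pin_memory: bool = False) -> list[int]:
    """Return the line numbers of pin-memory violations in `source`."""
    visitor = PinMemoryVisitor(allow_direct_pin_memory)
    visitor.visit(ast.parse(source))
    return visitor.violations
```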

In `@tensorrt_llm/_torch/attention_backend/interface.py`:
- Line 17: The import currently uses "from tensorrt_llm._utils import
maybe_pin_memory"; change it to import the module (e.g., "from tensorrt_llm
import _utils") and update all usages of maybe_pin_memory in this file
(interface.py) to be qualified as _utils.maybe_pin_memory so the _utils
namespace is preserved consistent with the project's import guideline.
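The guideline these prompts keep invoking is simply "import the module, not the symbol." Sketched with a standard-library module standing in for tensorrt_llm._utils:

```python
# Symbol import: the call site no longer shows which module owns the name.
from os.path import join

# Module import (the style the guideline prefers): call sites stay
# qualified, e.g. _utils.maybe_pin_memory(...) rather than a bare
# maybe_pin_memory(...).
import os.path

# Both forms resolve to the same function; only the call-site style differs.
same = join("a", "b") == os.path.join("a", "b")
```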

In `@tensorrt_llm/_torch/attention_backend/trtllm.py`:
- Around line 16-17: This file (tensorrt_llm._torch.attention_backend.trtllm,
near the top where imports like get_sm_version, maybe_pin_memory,
use_pinned_memory are declared) is missing the NVIDIA Apache-2.0 header; add the
standard SPDX/Apache-2.0 header block (including "Copyright (c) 2026 NVIDIA
CORPORATION" and the SPDX-License-Identifier: Apache-2.0) at the very top of the
file before any imports or code, matching the project's canonical header format.

In `@tensorrt_llm/_torch/models/modeling_qwen3vl.py`:
- Line 20: The import added pulls symbols directly from _utils; change it to
import the module (e.g., from tensorrt_llm import _utils) and update all usages
of nvtx_range, nvtx_range_debug, and use_pinned_memory in this file to be
qualified (e.g., _utils.nvtx_range, _utils.nvtx_range_debug,
_utils.use_pinned_memory) so the _utils namespace is preserved per the guideline.

In `@tensorrt_llm/_torch/models/modeling_radio.py`:
- Line 25: Replace the short copyright line at the top of
tensorrt_llm/_torch/models/modeling_radio.py with the standard NVIDIA Apache-2.0
header updated to year 2026; open the file containing the import
use_pinned_memory and ensure the full SPDX/Apache-2.0 header appears before any
imports or code, matching the project's canonical NVIDIA header format and
including the SPDX identifier and Apache-2.0 license text reference.

In `@tensorrt_llm/_torch/pyexecutor/py_executor.py`:
- Line 616: Remove the unnecessary f-string prefix on the logger call in the
async worker startup message: locate the logger.info call that logs "Starting
the async worker for sampler D2H copies" (used in the async worker for sampler
D2H copies) and change the string literal to a plain regular string (remove the
leading f) so it isn't an f-string with no placeholders.
- Around line 613-617: The async-worker start block should be moved inside the
start_worker() initialization guard so the sampler async worker is started only
when the main worker is first started: inside the existing "if not
self.worker_started:" block, after the main worker startup code, check
"isinstance(self.sampler, AsyncWorkerMixin) and
self.sampler.async_worker_enabled()" and call self.sampler.async_worker_start();
remove the current async-worker block that now sits outside the guard to avoid
repeated calls; keep references to sampler, AsyncWorkerMixin,
async_worker_start(), async_worker_enabled(), and worker_started to locate the
change.
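The restructuring described above can be sketched as follows. AsyncWorkerMixin and the executor internals are simplified stand-ins; only the shape of the worker_started guard follows the prompt:

```python
class AsyncWorkerMixin:
    """Simplified stand-in for the real sampler mixin."""

    def __init__(self):
        self.start_calls = 0

    def async_worker_enabled(self) -> bool:
        return True

    def async_worker_start(self) -> None:
        self.start_calls += 1


class Executor:
    """Toy executor showing the single-start guard."""

    def __init__(self, sampler):
        self.sampler = sampler
        self.worker_started = False

    def start_worker(self) -> None:
        if not self.worker_started:
            # ... existing main-worker startup code runs here ...
            # The sampler async worker starts inside the same guard, so
            # repeated start_worker() calls cannot start it twice.
            if (isinstance(self.sampler, AsyncWorkerMixin)
                    and self.sampler.async_worker_enabled()):
                self.sampler.async_worker_start()
            self.worker_started = True
```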

In `@tensorrt_llm/_torch/pyexecutor/sampler.py`:
- Around line 33-39: The current import in sampler.py pulls individual symbols
from tensorrt_llm._utils; change it to import the module itself (import
tensorrt_llm._utils as _utils) and update all usages of maybe_pin_memory,
mpi_disabled, nvtx_range, torch_dtype_to_binding, and use_pinned_memory to be
qualified (e.g., _utils.maybe_pin_memory, _utils.mpi_disabled,
_utils.nvtx_range, _utils.torch_dtype_to_binding, _utils.use_pinned_memory) so
the _utils namespace is preserved per project guidelines.

In `@tensorrt_llm/_torch/speculative/model_drafter.py`:
- Line 8: Add the NVIDIA Apache-2.0 license header to the top of
tensorrt_llm/_torch/speculative/model_drafter.py (above the existing imports),
using the standard Apache-2.0 SPDX header and set the copyright year to 2026;
ensure the header includes the SPDX-License-Identifier: Apache-2.0 line and the
NVIDIA copyright notice, then leave the existing import line "from
tensorrt_llm._utils import nvtx_range, use_pinned_memory" unchanged below the
header.

In `@tensorrt_llm/_torch/speculative/spec_tree_manager.py`:
- Line 6: Replace the direct symbol import with a module import and qualify its
uses: change the import of use_pinned_memory to import the module (e.g., import
tensorrt_llm._utils as _utils or from tensorrt_llm import _utils) and update all
references to use_pinned_memory in spec_tree_manager.py to
_utils.use_pinned_memory so the _utils namespace is preserved (look for
occurrences of use_pinned_memory in this file and adjust them).

In `@tensorrt_llm/_utils.py`:
- Around line 1261-1265: The logger.error call in the ImportError handler uses
an unnecessary f-string prefix which triggers Ruff F541; update the ImportError
block around the import of pynvml so the logger.error call uses a normal string
(remove the leading f in "f\"pynvml not available; assuming CC=off\"") while
preserving the message and the surrounding try/except that returns False.
- Around line 1317-1320: Add a Google-style docstring to the public utility
function maybe_pin_memory explaining its purpose, parameters, return value, and
behavior: describe that it takes a torch.Tensor (parameter name: tensor), pins
and returns the tensor if use_pinned_memory() is True, otherwise returns the
original tensor, and mention the return type torch.Tensor and any side effects
(pinning memory). Place this docstring immediately above the maybe_pin_memory
function definition and follow Google docstring sections (Args, Returns, and
optionally Raises if relevant).
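Taken together, the two _utils.py prompts describe a utility trio along these lines. This is a hedged sketch: the real functions live in tensorrt_llm/_utils.py and query NVML for confidential-compute state, which is elided here.

```python
import logging

logger = logging.getLogger(__name__)


def confidential_compute_enabled() -> bool:
    """Best-effort CC detection; assumes CC=off when pynvml is missing."""
    try:
        import pynvml  # noqa: F401
    except ImportError:
        # Plain string, not an f-string with no placeholders (Ruff F541).
        logger.error("pynvml not available; assuming CC=off")
        return False
    # The real implementation queries NVML here; elided in this sketch.
    return False


def use_pinned_memory() -> bool:
    """Return True when host buffers should be pinned (i.e. CC is off).

    Under confidential compute all H2D/D2H copies are synchronous, so
    pinning buys nothing and is skipped.
    """
    return not confidential_compute_enabled()


def maybe_pin_memory(tensor):
    """Pin a host tensor's memory when pinning is enabled.

    Args:
        tensor: A torch.Tensor residing in pageable host memory.

    Returns:
        torch.Tensor: ``tensor.pin_memory()`` if ``use_pinned_memory()``
        is True, otherwise the original tensor unchanged.
    """
    if use_pinned_memory():
        return tensor.pin_memory()
    return tensor
```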

In `@triton_backend/all_models/multimodal/multimodal_encoders/1/model.py`:
- Line 1: Add a Unix shebang to the top of the executable Python file or remove
its executable bit: open the file multimodal_encoders/1/model.py and either
insert a shebang line like #!/usr/bin/env python3 as the very first line, or
change the file permissions to remove the execute flag (chmod a-x) so it is not
treated as an executable; ensure the change is committed so static analysis
EXE002 is resolved.

---

Outside diff comments:
In `@tensorrt_llm/_torch/attention_backend/interface.py`:
- Around line 1-2: Add the NVIDIA Apache-2.0 copyright header (with year 2026)
to the very top of the file so it precedes the existing imports; insert the
standard NVIDIA Apache-2.0 license block before the first lines "import copy"
and "import weakref" in tensorrt_llm/_torch/attention_backend/interface.py,
ensuring the header matches the project's Apache License 2.0 format and includes
the correct copyright owner and year.

In `@tensorrt_llm/_torch/attention_backend/sparse/dsa.py`:
- Line 1: This file (module tensorrt_llm._torch.attention_backend.sparse.dsa) is
missing the required NVIDIA Apache-2.0 header; add the standard NVIDIA
Apache‑2.0 license header block at the very top of the file (before any
imports), include the correct copyright line and year (update the year if this
is a modification), and ensure the header text matches the other repository
files' exact wording/format.

In `@tensorrt_llm/_torch/attention_backend/sparse/rocket.py`:
- Line 1: Add the required NVIDIA Apache-2.0 copyright header to the top of
tensorrt_llm/_torch/attention_backend/sparse/rocket.py (before any imports such
as the existing "import math"); if this is a modified file update the copyright
year accordingly and ensure the header matches the project's standard NVIDIA
Apache-2.0 template.

In `@tensorrt_llm/_torch/models/modeling_clip.py`:
- Line 1: Add the required NVIDIA Apache-2.0 copyright header to the top of the
file tensorrt_llm._torch.models.modeling_clip (modeling_clip.py); insert the
standard NVIDIA header block before any imports and ensure the copyright year is
updated to the current year if this is a modified file so the file begins with
the full license/header followed by the existing "from typing ..." import line.

In `@tensorrt_llm/_torch/models/modeling_qwen2vl.py`:
- Around line 1-32: This file is missing the required NVIDIA Apache-2.0
copyright header; add the standard NVIDIA Apache-2.0 license header (with the
correct start year and updated end year 2026) as the very first lines of
tensorrt_llm/_torch/models/modeling_qwen2vl.py so it precedes all imports and
code (e.g., before the existing imports like "import copy" and symbols such as
Qwen2_5_VisionPatchEmbed, Qwen2VisionTransformerPretrainedModel, Attention,
Linear, RMSNorm); ensure the header exactly matches the project's canonical
Apache-2.0 NVIDIA header format and includes the correct copyright years.

In `@tensorrt_llm/_torch/models/modeling_qwen3vl.py`:
- Around line 1-3: This file (modeling_qwen3vl.py) is missing the required
NVIDIA Apache-2.0 copyright header for 2026; add the standard NVIDIA Apache-2.0
license header (with year 2026 and the Apache License 2.0 boilerplate) at the
very top of the file before any imports (above the existing import block that
begins with "import copy"); ensure the header matches the project's canonical
NVIDIA header format and includes the correct year and license link.

In `@tensorrt_llm/_torch/models/modeling_siglip.py`:
- Around line 1-12: Add the standard NVIDIA Apache-2.0 copyright header at the
very top of tensorrt_llm/_torch/models/modeling_siglip.py (above all imports),
using the correct original start year and updating the end year to 2026; ensure
the header follows the Apache License 2.0 text format used across the repo and
does not modify existing imports or symbols such as SiglipVisionConfig,
SiglipVisionEmbeddings, or use_pinned_memory.

In `@tensorrt_llm/_torch/modules/mamba/mamba2_metadata.py`:
- Line 1: Update the SPDX copyright header in
tensorrt_llm/_torch/modules/mamba/mamba2_metadata.py to include 2026 in the year
range (e.g., change "2022-2024" to "2022-2026") so the file header complies with
the repository's copyright/year-update guideline; locate the top-of-file SPDX
comment and modify only the year range string.

In `@tensorrt_llm/_torch/peft/lora/cuda_graph_lora_params.py`:
- Around line 1-8: Add the standard NVIDIA Apache-2.0 copyright header at the
very top of cuda_graph_lora_params.py (before any imports like "from collections
import namedtuple"), using the Apache‑2.0 template and setting the copyright
years to the original start year through 2026 (e.g., "2019-2026" or the correct
start year for this file); ensure the header text and license notice exactly
match the project's canonical NVIDIA header format.

In `@tensorrt_llm/_torch/pyexecutor/mamba_cache_manager.py`:
- Line 1: Update the SPDX copyright header in
tensorrt_llm/_torch/pyexecutor/mamba_cache_manager.py by extending the year
range to include 2026: modify the existing SPDX-FileCopyrightText line (the file
header string "SPDX-FileCopyrightText") so the copyright year range ends with
2026 instead of 2025.

In `@tensorrt_llm/_torch/pyexecutor/sampling_utils_flashinfer.py`:
- Line 1: Update the file header comment string "# Copyright (c) 2025, NVIDIA
CORPORATION. All rights reserved." to use the current year 2026 so the
top-of-file NVIDIA copyright line reflects the modification year (i.e., change
"2025" to "2026"); ensure the header remains exactly in the same format as the
existing NVIDIA copyright header.

In `@tensorrt_llm/_torch/pyexecutor/sampling_utils.py`:
- Line 1: Update the file header copyright year from 2025 to 2026: modify the
top-of-file copyright line in tensorrt_llm/_torch/pyexecutor/sampling_utils.py
so it reads 2026 (ensuring the NVIDIA copyright header is present and reflects
the current year).

In `@tensorrt_llm/_torch/speculative/mtp.py`:
- Around line 1-10: This source file is missing the required NVIDIA Apache-2.0
copyright header; add/update a standard NVIDIA Apache 2.0 header (with the
latest meaningful modification year and "NVIDIA CORPORATION" copyright line) at
the very top of the file before any imports in the module containing
MambaHybridCacheManager and use_pinned_memory
(tensorrt_llm._torch.speculative.mtp), and apply the same header to any other
modified source files lacking it.

In `@tensorrt_llm/_torch/speculative/spec_tree_manager.py`:
- Around line 1-2: This file is missing the required NVIDIA Apache-2.0 copyright
header for 2026; add the standard NVIDIA Apache License 2.0 header (with year
2026 and the appropriate copyright holder) at the very top of
tensorrt_llm/_torch/speculative/spec_tree_manager.py before any imports (i.e.,
above the existing import math / from typing import List) so the file complies
with the project's licensing header policy.

In `@tensorrt_llm/runtime/model_runner_cpp.py`:
- Around line 853-858: The comment incorrectly states host memory "MUST be
page-locked" even though maybe_pin_memory(_prepare_embedding_table(...))
conditionally skips pinning when confidential compute is enabled; update the
inline comment near mm_embedding_offloading to say that host memory should be
page-locked for async H2D/D2H transfers unless use_pinned_memory() indicates
confidential compute/synchronous transfers, and reference maybe_pin_memory,
use_pinned_memory, mm_embedding_offloading and _prepare_embedding_table so
readers understand pinning is conditional.

In `@tensorrt_llm/tools/layer_wise_benchmarks/calibrator.py`:
- Line 1: This file (calibrator.py) is missing the required NVIDIA Apache‑2.0
copyright header; add the standard NVIDIA Apache‑2.0 header block at the very
top of the file (before the first import such as import base64), ensuring it
includes the correct copyright owner and updated year for modified files and
matches the project's canonical header text.

In `@triton_backend/all_models/multimodal/multimodal_encoders/1/model.py`:
- Around line 1-25: Replace the existing BSD‑style copyright header at the top
of the file with the NVIDIA Apache License 2.0 header template, update the
copyright year to 2026, include the appropriate SPDX identifier
(SPDX-License-Identifier: Apache-2.0) and the standard Apache 2.0 notice and
URL; ensure this new header fully replaces the current block that begins with
the copyright notice and ends before code begins so the file uses the required
Apache 2.0 format.

---

Nitpick comments:
In `@scripts/check_pinned_memory_usage.py`:
- Line 52: The current fragile check uses
path.as_posix().endswith("tensorrt_llm/_utils.py") to set
allow_direct_pin_memory; replace it with a robust check that compares the last
two path components instead (e.g. verify path.parent.name == "tensorrt_llm" and
path.name == "_utils.py" or compare path.parts[-2:] to
("tensorrt_llm","_utils.py")) so only the exact repository-relative module
triggers allow_direct_pin_memory.

@dhansen-nvidia dhansen-nvidia force-pushed the confidential_compute_fixes branch from f091fc2 to fd4b519 on February 18, 2026 21:19
mojombo and others added 3 commits February 23, 2026 15:26
… an exception

Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
…fy intent

Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
…rrt_llm/runtime/multimodal_model_runner.py

Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: dhansen-nvidia <218031328+dhansen-nvidia@users.noreply.github.com>
Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
@dhansen-nvidia dhansen-nvidia force-pushed the confidential_compute_fixes branch from 9bc6c55 to 527a7ea on February 23, 2026 20:27
@tensorrt-cicd
Collaborator

PR_Github #36539 [ run ] completed with state SUCCESS. Commit: aed6a58
/LLM/main/L0_MergeRequest_PR pipeline #28274 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Collaborator

@chzblych chzblych left a comment


Approved for pre-commit checks.

@dhansen-nvidia
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #36592 [ run ] triggered by Bot. Commit: 527a7ea Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #36592 [ run ] completed with state SUCCESS. Commit: 527a7ea
/LLM/main/L0_MergeRequest_PR pipeline #28318 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@dhansen-nvidia
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #36652 [ run ] triggered by Bot. Commit: 527a7ea Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #36652 [ run ] completed with state SUCCESS. Commit: 527a7ea
/LLM/main/L0_MergeRequest_PR pipeline #28373 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@dhansen-nvidia
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #36692 [ run ] triggered by Bot. Commit: 527a7ea Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #36692 [ run ] completed with state SUCCESS. Commit: 527a7ea
/LLM/main/L0_MergeRequest_PR pipeline #28411 completed with status: 'SUCCESS'

Link to invocation

Signed-off-by: dhansen-nvidia <218031328+dhansen-nvidia@users.noreply.github.com>
@dhansen-nvidia
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #36803 [ run ] triggered by Bot. Commit: 3252c55 Link to invocation

@pcastonguay pcastonguay enabled auto-merge (squash) February 25, 2026 14:36
@tensorrt-cicd
Collaborator

PR_Github #36803 [ run ] completed with state ABORTED. Commit: 3252c55
LLM/main/L0_MergeRequest_PR #28498 (Blue Ocean) completed with status: ABORTED

Link to invocation

@dhansen-nvidia
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #36812 [ run ] triggered by Bot. Commit: a9d9e52 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #36812 [ run ] completed with state SUCCESS. Commit: a9d9e52
/LLM/main/L0_MergeRequest_PR pipeline #28504 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@dhansen-nvidia
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #36939 [ run ] triggered by Bot. Commit: a9d9e52 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #36939 [ run ] completed with state SUCCESS. Commit: a9d9e52
/LLM/main/L0_MergeRequest_PR pipeline #28602 completed with status: 'SUCCESS'

Link to invocation

@pcastonguay pcastonguay merged commit 3fd5faf into NVIDIA:main Feb 26, 2026
5 checks passed
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Mar 9, 2026
NVIDIA#11573)

Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
Signed-off-by: dhansen-nvidia <218031328+dhansen-nvidia@users.noreply.github.com>
Co-authored-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>