
Conversation

Collaborator

@suyoggupta suyoggupta commented Nov 12, 2025

Summary by CodeRabbit

  • New Features

    • Added fused causal convolution with activation optimization for improved kernel performance
    • Introduced dynamic hardware-specific tuning for MoE kernel configurations
    • Extended support for chunked sequence processing across attention and SSM operations
  • Optimizations

    • Improved output memory handling in CUDA graph compilation
    • Enhanced metadata computation for cached SSM operations with additional context information

Contributor

coderabbitai bot commented Nov 12, 2025

📝 Walkthrough

Walkthrough

This PR introduces chunk-based processing support throughout the auto-deploy pipeline, implements dynamic MoE kernel configuration loading from JSON files, adds a causal convolution fusion optimization pass, and extends various backend implementations to handle activation parameters and chunking metadata.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Configuration & Model Properties**<br>tensorrt_llm/_torch/auto_deploy/config/default.yaml, tensorrt_llm/_torch/auto_deploy/models/factory.py, tensorrt_llm/_torch/auto_deploy/models/hf.py | Added the fuse_causal_conv_activation transformation to the compile stage. Introduced a chunk_size property on ModelFactory and AutoModelForCausalLMFactory with fallback retrieval from the model config (see the sketch after this table). |
| **Custom Ops: Attention & Metadata Interfaces**<br>tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py, tensorrt_llm/_torch/auto_deploy/custom_ops/flashinfer_attention.py, tensorrt_llm/_torch/auto_deploy/custom_ops/mla.py, tensorrt_llm/_torch/auto_deploy/custom_ops/torch_backend_attention.py, tensorrt_llm/_torch/auto_deploy/custom_ops/triton_attention.py | Extended all metadata preparation function signatures to accept a chunk_size: int parameter. Updated SequenceInfo to store chunk_size as an optional attribute and changed the slot_idx dtype from int to long. Updated the cached constants tuple to include chunk_size. |
| **Custom Ops: Causal Convolution (CUDA & Torch)**<br>tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/cuda_backend_causal_conv.py, tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/torch_backend_causal_conv.py | Added chunk_size and activation: Optional[str] parameters to the metadata preparation functions. Extended _cuda_cached_causal_conv1d to accept and propagate activation through the prefill and decode paths. Modified output handling to eliminate dtype casting and avoid cloning. Updated get_constants to extract the optional activation from the source node with a fallback to None. |
| **Custom Ops: Mamba SSM (Torch & Triton)**<br>tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/torch_backend_mamba.py, tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/triton_backend_mamba.py | Added a chunk_size parameter to the torch metadata preparation. Introduced a new _triton_ssm_prepare_metadata op returning an 8-tuple with extended metadata (cu_seqlens, chunk_indices, chunk_offsets, batch_info_tensor). Updated TritonBackendSSM to consume and propagate the augmented metadata through the prefill and decode phases. |
| **Custom Ops: MoE Configuration Infrastructure**<br>tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py, tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=128,N=1856,device_name=NVIDIA_H100_80GB_HBM3.json, tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=128,N=1856,device_name=NVIDIA_L40S.json | Added config-loading infrastructure: get_config_file_name(), get_moe_configs() (cached), and _get_kernel_config() to support dynamic kernel tuning. Introduced JSON config files with the Triton version and block-size presets keyed by batch size. Replaced direct _default_kernel_config() calls with a resolver that attempts an optimized config lookup before falling back. |
| **Transforms: Causal Convolution Fusion**<br>tensorrt_llm/_torch/auto_deploy/transform/library/fuse_causal_conv.py | New module introducing the FuseCausalConvActivation transform registered as "fuse_causal_conv_activation". It pattern-matches causal_conv1d followed by an activation (silu), rewrites matched patterns to a fused op call with the activation baked in as an argument, and erases the original nodes. |
| **Runtime & Execution**<br>tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py, tensorrt_llm/_torch/auto_deploy/compile/backends/torch_cudagraph.py | Extended SequenceInfo construction in build_from_config to pass chunk_size. Added debug output of llm_args during ADEngine initialization. Modified the forward output path in torch_cudagraph.py to return a sliced buffer without detach/clone, altering memory sharing and gradient tracking. |
| **Tests**<br>tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/test_cuda_causal_conv_cached_op.py | Updated test calls to the cuda_cached_causal_conv1d op to pass an additional None argument reflecting the updated op signature. |
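
A minimal sketch of the chunk_size fallback described in the first row of the table above; the property name and None default mirror the summary, but the class body is illustrative rather than the actual factory implementation in models/hf.py:

```python
from typing import Optional


class AutoModelForCausalLMFactorySketch:
    """Illustrative stand-in for the factory classes touched by this PR."""

    def __init__(self, model_config):
        # e.g., a transformers PretrainedConfig for a Mamba/hybrid model (assumption)
        self.model_config = model_config

    @property
    def chunk_size(self) -> Optional[int]:
        # Fallback retrieval from the model config: return the configured SSM
        # chunk size if present, otherwise None so callers can skip chunking.
        return getattr(self.model_config, "chunk_size", None)
```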

Sequence Diagram(s)

sequenceDiagram
    participant GraphModule
    participant Matcher as Pattern Matcher
    participant Transform as FuseCausalConvActivation
    participant Backend as CUDA Backend

    GraphModule->>Matcher: Scan for causal_conv1d + activation
    Matcher-->>Transform: Return matched (conv_node, activation_node, op_name)
    
    Transform->>Transform: Extract activation function name
    Note over Transform: Identify silu/activation type
    
    Transform->>GraphModule: Insert fused op call<br/>(cuda_cached_causal_conv1d + activation arg)
    GraphModule->>GraphModule: Replace activation node with fused call
    GraphModule->>GraphModule: Erase original conv & activation nodes
    
    GraphModule->>Backend: Execute fused kernel<br/>(activation baked in)
    Backend-->>GraphModule: Return fused output
sequenceDiagram
    participant User
    participant Factory as ModelFactory
    participant Executor as ADExecutor
    participant Config as get_moe_configs()
    participant Kernel as Triton Kernel

    User->>Factory: Query model chunk_size
    Factory-->>User: Return chunk_size from config
    
    User->>Executor: Initialize with factory
    Executor->>Factory: Fetch chunk_size
    Executor->>Executor: Build SequenceInfo with chunk_size
    
    Kernel->>Config: Request optimized config<br/>(E, N, dtype, batch_size)
    Config->>Config: Load JSON from disk (cached)
    Config->>Config: Find closest batch-size key to M
    alt Config Found
        Config-->>Kernel: Return optimized block sizes
        Kernel->>Kernel: Use tuned BLOCK_SIZE_M/N/K
    else Fallback
        Config-->>Kernel: Return None
        Kernel->>Kernel: Use default_kernel_config(M, E)
    end
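
To make the config-resolution branch in the diagram above concrete, here is a hedged Python sketch of the lookup flow. The real helpers are get_config_file_name(), get_moe_configs(), and _get_kernel_config() in triton_moe.py; the simplified names, signatures, and JSON layout below are assumptions, with only the file-name pattern taken from the config files added in this PR.

```python
import functools
import json
import os
from typing import Dict, Optional

import torch


def config_file_name(num_experts: int, shard_n: int) -> str:
    # Mirrors the naming of the JSON files in this PR, e.g.
    # "E=128,N=1856,device_name=NVIDIA_H100_80GB_HBM3.json".
    device_name = torch.cuda.get_device_name().replace(" ", "_")
    return f"E={num_experts},N={shard_n},device_name={device_name}.json"


@functools.lru_cache(maxsize=None)
def load_moe_configs(num_experts: int, shard_n: int) -> Optional[Dict[int, dict]]:
    # Cached load of the per-device tuning table, keyed by batch size (M).
    path = os.path.join(os.path.dirname(__file__), "triton_fused_moe_configs",
                        config_file_name(num_experts, shard_n))
    if not os.path.exists(path):
        return None  # no tuned table for this device/shape: use the default config
    with open(path) as f:
        return {int(m): cfg for m, cfg in json.load(f).items()}


def get_kernel_config(m: int, num_experts: int, shard_n: int) -> Optional[dict]:
    configs = load_moe_configs(num_experts, shard_n)
    if not configs:
        return None
    # Pick the entry whose batch-size key is closest to the actual M.
    closest_m = min(configs.keys(), key=lambda key: abs(key - m))
    return configs[closest_m]
```

Keying the closest batch-size entry lets a single tuning table serve arbitrary M without retuning, with the default kernel config as a safety net when no table exists.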

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • torch_cudagraph.py (forward output handling): The change from detach/clone to returning a direct slice affects memory ownership and gradient tracking; verify there are no unintended side effects on backprop or the memory lifecycle (see the sketch after this list).
  • triton_backend_mamba.py (augmented metadata outputs): New 8-tuple return value with batch info tensors and chunk metadata; ensure all consumers properly unpack and utilize new fields, and verify prefill/decode logic correctly applies new indexing.
  • triton_moe.py (config loading & resolver): New _get_kernel_config() resolution logic selects closest batch-size key; verify rounding/fallback behavior handles edge cases and cache invalidation is correct.
  • cuda_backend_causal_conv.py & torch_backend_causal_conv.py (activation propagation): Activation parameter extraction and threading through prefill/decode—ensure all code paths handle None activation correctly and no activation is lost.
  • fuse_causal_conv.py (pattern matching & graph rewriting): New transform logic that modifies GraphModule; verify pattern matcher correctly identifies all intended causal_conv1d+activation patterns and rewrites do not break computation.
  • Signature consistency across custom ops: chunk_size added to many metadata functions; check that all callers have been updated and no mismatches exist between function signatures and call sites.
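
A hedged sketch of the torch_cudagraph.py output-handling change flagged in the first bullet above; the class and buffer names are hypothetical, not the actual compiled-graph wrapper:

```python
import torch


class GraphedForwardSketch:
    """Hypothetical CUDA-graph wrapper illustrating the output-handling change."""

    def __init__(self, max_tokens: int, hidden_size: int, device: str = "cpu"):
        # Static output buffer that a captured graph would write into on each replay.
        self._out_buffer = torch.empty(max_tokens, hidden_size, device=device)

    def forward(self, num_tokens: int) -> torch.Tensor:
        # ... graph.replay() would populate self._out_buffer[:num_tokens] here ...
        # Old behavior (private copy): return self._out_buffer[:num_tokens].detach().clone()
        # New behavior (zero-copy view): the caller now aliases the static buffer, so
        # the result must be consumed or copied before the next replay overwrites it,
        # and it is no longer explicitly detached from autograd.
        return self._out_buffer[:num_tokens]
```

Returning the view saves an allocation and a copy per step, at the cost of the aliasing caveat noted in the comments.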

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 48.84%, which is below the required threshold of 80.00%. | Run @coderabbitai generate docstrings to improve docstring coverage. |
| Description check | ⚠️ Warning | The pull request description is entirely empty, missing all required sections from the template, including a description of changes, test coverage details, and PR checklist verification. | Provide a comprehensive PR description including: (1) what changes are made and why; (2) relevant test coverage information; (3) verification of the PR checklist items. Refer to the description template for the required sections. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly describes the main changes: adding Triton configs and optimizing mamba prefill for Autodeploy, which aligns with the file changes and objectives. |
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/auto_deploy/custom_ops/torch_backend_attention.py (1)

358-384: Fix the fake op signature to include chunk_size.

torch_backend_prepare_metadata now receives chunk_size, but the fake registration still exposes the old signature. During fake tensor tracing the dispatcher will pass the extra argument, leading to an immediate TypeError and breaking export. Please update the fake to accept the new parameter as well.

 @torch_backend_prepare_metadata.register_fake
 def torch_backend_prepare_metadata_fake(
-    position_ids, seq_len, input_pos, cache_loc, pages_per_seq, slot_idx, page_size
+    position_ids,
+    seq_len,
+    input_pos,
+    cache_loc,
+    pages_per_seq,
+    slot_idx,
+    page_size,
+    chunk_size,
 ):
     num_seq = SequenceInfo._get_sanitized_num_sequences(position_ids, seq_len)
     return (
🧹 Nitpick comments (1)
tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py (1)

299-301: Drop the unused # noqa.

Ruff flags this # noqa: E501 as unused. Removing the directive (or splitting the string if needed) keeps the file clean.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb6eb95 and 40622e9.

📒 Files selected for processing (19)
  • tensorrt_llm/_torch/auto_deploy/compile/backends/torch_cudagraph.py (1 hunks)
  • tensorrt_llm/_torch/auto_deploy/config/default.yaml (1 hunks)
  • tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py (4 hunks)
  • tensorrt_llm/_torch/auto_deploy/custom_ops/flashinfer_attention.py (2 hunks)
  • tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=128,N=1856,device_name=NVIDIA_H100_80GB_HBM3.json (1 hunks)
  • tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=128,N=1856,device_name=NVIDIA_L40S.json (1 hunks)
  • tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py (4 hunks)
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/cuda_backend_causal_conv.py (6 hunks)
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/torch_backend_causal_conv.py (1 hunks)
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/torch_backend_mamba.py (2 hunks)
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/triton_backend_mamba.py (7 hunks)
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mla.py (2 hunks)
  • tensorrt_llm/_torch/auto_deploy/custom_ops/torch_backend_attention.py (1 hunks)
  • tensorrt_llm/_torch/auto_deploy/custom_ops/triton_attention.py (2 hunks)
  • tensorrt_llm/_torch/auto_deploy/models/factory.py (1 hunks)
  • tensorrt_llm/_torch/auto_deploy/models/hf.py (1 hunks)
  • tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py (2 hunks)
  • tensorrt_llm/_torch/auto_deploy/transform/library/fuse_causal_conv.py (1 hunks)
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/test_cuda_causal_conv_cached_op.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{h,hpp,hh,hxx,cpp,cxx,cc,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Use only spaces, no tabs; indent with 4 spaces.

Files:

  • tensorrt_llm/_torch/auto_deploy/custom_ops/flashinfer_attention.py
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/test_cuda_causal_conv_cached_op.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/torch_backend_causal_conv.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/triton_attention.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mla.py
  • tensorrt_llm/_torch/auto_deploy/models/factory.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/torch_backend_mamba.py
  • tensorrt_llm/_torch/auto_deploy/models/hf.py
  • tensorrt_llm/_torch/auto_deploy/transform/library/fuse_causal_conv.py
  • tensorrt_llm/_torch/auto_deploy/compile/backends/torch_cudagraph.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/triton_backend_mamba.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/torch_backend_attention.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/cuda_backend_causal_conv.py
  • tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Python code must target Python 3.8+.
Indent Python code with 4 spaces; do not use tabs.
Maintain module namespace when importing; prefer 'from package.subpackage import foo' then 'foo.SomeClass()' instead of importing the class directly.
Python filenames should be snake_case (e.g., some_file.py).
Python classes use PascalCase names.
Functions and methods use snake_case names.
Local variables use snake_case; prefix 'k' for variables that start with a number (e.g., k_99th_percentile).
Global variables use upper SNAKE_CASE prefixed with 'G' (e.g., G_MY_GLOBAL).
Constants use upper SNAKE_CASE (e.g., MY_CONSTANT).
Avoid shadowing variables from an outer scope.
Initialize all externally visible members of a class in the constructor.
Prefer docstrings for interfaces that may be used outside a file; comments for in-function or file-local interfaces.
Use Google-style docstrings for classes and functions (Sphinx-parsable).
Document attributes and variables inline so they render under the class/function docstring.
Avoid reflection when a simpler, explicit approach suffices (e.g., avoid dict(**locals()) patterns).
In try/except, catch the most specific exceptions possible.
For duck-typing try/except, keep the try body minimal and use else for the main logic.

Files:

  • tensorrt_llm/_torch/auto_deploy/custom_ops/flashinfer_attention.py
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/test_cuda_causal_conv_cached_op.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/torch_backend_causal_conv.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/triton_attention.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mla.py
  • tensorrt_llm/_torch/auto_deploy/models/factory.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/torch_backend_mamba.py
  • tensorrt_llm/_torch/auto_deploy/models/hf.py
  • tensorrt_llm/_torch/auto_deploy/transform/library/fuse_causal_conv.py
  • tensorrt_llm/_torch/auto_deploy/compile/backends/torch_cudagraph.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/triton_backend_mamba.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/torch_backend_attention.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/cuda_backend_causal_conv.py
  • tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py
**/*.{cpp,cxx,cc,h,hpp,hh,hxx,cu,cuh,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Prepend the NVIDIA Apache-2.0 copyright header with current year to the top of all source files (e.g., .cpp, .h, .cu, .py).

Files:

  • tensorrt_llm/_torch/auto_deploy/custom_ops/flashinfer_attention.py
  • tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/test_cuda_causal_conv_cached_op.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/torch_backend_causal_conv.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/triton_attention.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mla.py
  • tensorrt_llm/_torch/auto_deploy/models/factory.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/torch_backend_mamba.py
  • tensorrt_llm/_torch/auto_deploy/models/hf.py
  • tensorrt_llm/_torch/auto_deploy/transform/library/fuse_causal_conv.py
  • tensorrt_llm/_torch/auto_deploy/compile/backends/torch_cudagraph.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/triton_backend_mamba.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/torch_backend_attention.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/cuda_backend_causal_conv.py
  • tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py
🧠 Learnings (10)
📓 Common learnings
Learnt from: venkywonka
Repo: NVIDIA/TensorRT-LLM PR: 6029
File: .github/pull_request_template.md:45-53
Timestamp: 2025-08-27T17:50:13.264Z
Learning: For PR templates in TensorRT-LLM, avoid suggesting changes that would increase developer overhead, such as converting plain bullets to mandatory checkboxes. The team prefers guidance-style bullets that don't require explicit interaction to reduce friction in the PR creation process.
📚 Learning: 2025-08-14T21:04:50.248Z
Learnt from: thorjohnsen
Repo: NVIDIA/TensorRT-LLM PR: 6910
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:0-0
Timestamp: 2025-08-14T21:04:50.248Z
Learning: In KV cache onboarding logic during prefill in cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, when calculating which blocks fall within the attention window, use getTokensPerBlock() to advance token indices rather than block->getUniqueTokens().size(), because the calculation needs to consider the post-prefill state where blocks will be filled to capacity, not their current token count.

Applied to files:

  • tensorrt_llm/_torch/auto_deploy/custom_ops/triton_attention.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/torch_backend_attention.py
  • tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/cuda_backend_causal_conv.py
📚 Learning: 2025-08-08T04:10:19.038Z
Learnt from: djns99
Repo: NVIDIA/TensorRT-LLM PR: 6728
File: cpp/tensorrt_llm/plugins/mixtureOfExperts/mixtureOfExpertsPlugin.cpp:966-966
Timestamp: 2025-08-08T04:10:19.038Z
Learning: TensorRT plugins currently don't support padding functionality, and TensorRT is not getting new features (in maintenance mode). This means that duplicating parameters like mExpertHiddenSize in function calls, even with TODO comments, can be acceptable as pragmatic solutions within these constraints.

Applied to files:

  • tensorrt_llm/_torch/auto_deploy/custom_ops/triton_attention.py
📚 Learning: 2025-09-29T15:14:28.503Z
Learnt from: amitz-nv
Repo: NVIDIA/TensorRT-LLM PR: 8063
File: tensorrt_llm/lora_manager.py:1080-1112
Timestamp: 2025-09-29T15:14:28.503Z
Learning: In tensorrt_llm/lora_manager.py, when calculating part_sizes for attn_qkv fused LoRA modules, the sizes are correctly multiplied by tp_size because model_config.num_heads and model_config.num_kv_heads are already divided by tp_size (per-TP-rank values), so multiplication is needed to get the original full concatenated dimension size. The interleave_fused_lora_weights_for_tp function provides proper validation with asserts for total size and TP divisibility.

Applied to files:

  • tensorrt_llm/_torch/auto_deploy/custom_ops/triton_attention.py
📚 Learning: 2025-08-21T02:39:12.009Z
Learnt from: djns99
Repo: NVIDIA/TensorRT-LLM PR: 7104
File: cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_kernels.cu:1475-1480
Timestamp: 2025-08-21T02:39:12.009Z
Learning: The min latency mode functionality in TensorRT-LLM MOE kernels (cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_kernels.cu) is deprecated and no longer being maintained/updated, as confirmed by djns99. Bug reports and optimization suggestions for the computeStridesTmaWarpSpecializedLowLatencyKernel and related min latency code paths should be deprioritized.

Applied to files:

  • tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_fused_moe_configs/E=128,N=1856,device_name=NVIDIA_L40S.json
  • tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py
📚 Learning: 2025-08-09T20:57:04.084Z
Learnt from: sklevtsov-nvidia
Repo: NVIDIA/TensorRT-LLM PR: 3294
File: cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_tma_warp_specialized_input.cu:118-127
Timestamp: 2025-08-09T20:57:04.084Z
Learning: In the CUTLASS MoE finalize fusion implementation (cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_tma_warp_specialized_input.cu), when setting `fused_finalize_epilogue.stride_final_output` with shape `(hidden_size, num_output_tokens, 1)`, the `num_rows_in_final_output` should be set to `num_output_tokens` (not `hidden_size`) because of a swap+transpose operation that maps rows of the output tensor to `hidden_size` and columns to `num_output_tokens`.

Applied to files:

  • tensorrt_llm/_torch/auto_deploy/compile/backends/torch_cudagraph.py
📚 Learning: 2025-10-20T16:54:09.824Z
Learnt from: nvchenghaoz
Repo: NVIDIA/TensorRT-LLM PR: 8469
File: tensorrt_llm/_torch/auto_deploy/custom_ops/rms_norm.py:6-6
Timestamp: 2025-10-20T16:54:09.824Z
Learning: In tensorrt_llm/_torch/auto_deploy/custom_ops/rms_norm.py, the import `from ...modules.mamba.layernorm_gated import _layer_norm_fwd` is correct and should not be changed to modules.fla.layernorm_gated. The _layer_norm_fwd function exists in both modules/mamba/layernorm_gated.py and modules/fla/layernorm_gated.py, but the mamba version is the intended implementation for this use case.

Applied to files:

  • tensorrt_llm/_torch/auto_deploy/compile/backends/torch_cudagraph.py
📚 Learning: 2025-10-20T17:09:21.560Z
Learnt from: nvchenghaoz
Repo: NVIDIA/TensorRT-LLM PR: 8469
File: tensorrt_llm/_torch/auto_deploy/transform/library/rms_norm.py:180-182
Timestamp: 2025-10-20T17:09:21.560Z
Learning: In tensorrt_llm/_torch/auto_deploy/transform/library/rms_norm.py, the _gated_rmsnorm_replacement function does not need to cast the output of torch.ops.auto_deploy.torch_rmsnorm_gated back to the input dtype, even though the custom op returns fp32. The dtype handling is managed elsewhere or the fp32 output is acceptable for downstream consumers.

Applied to files:

  • tensorrt_llm/_torch/auto_deploy/compile/backends/torch_cudagraph.py
📚 Learning: 2025-08-20T06:56:02.889Z
Learnt from: eopXD
Repo: NVIDIA/TensorRT-LLM PR: 6768
File: cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp:577-579
Timestamp: 2025-08-20T06:56:02.889Z
Learning: In cpp/tensorrt_llm/batch_manager/kvCacheManager.cpp, maxSequenceLength is now enforced as a non-optional argument in the BlockManager constructor, so concerns about std::nullopt defaulting to 0 are not applicable. When windowSize > maxSequenceLength, a warning should be added instead of handling optional parameter cases.

Applied to files:

  • tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py
📚 Learning: 2025-08-14T23:23:27.449Z
Learnt from: djns99
Repo: NVIDIA/TensorRT-LLM PR: 6915
File: cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_kernels.cu:4010-4012
Timestamp: 2025-08-14T23:23:27.449Z
Learning: For MOE (Mixture of Experts) code reviews in TensorRT-LLM, avoid repeatedly suggesting finalize fusion validation checks and safety assertions. The user djns99 has indicated these suggestions are repetitive and unwanted across multiple MOE-related changes.

Applied to files:

  • tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py
🧬 Code graph analysis (13)
tensorrt_llm/_torch/auto_deploy/custom_ops/flashinfer_attention.py (4)
tensorrt_llm/_torch/auto_deploy/models/hf.py (1)
  • chunk_size (128-131)
tensorrt_llm/_torch/auto_deploy/models/factory.py (1)
  • chunk_size (198-200)
tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py (4)
  • seq_len (296-297)
  • input_pos (300-301)
  • cache_loc (304-305)
  • pages_per_seq (308-309)
tensorrt_llm/_torch/attention_backend/flashinfer.py (1)
  • page_size (197-201)
tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/torch_backend_causal_conv.py (1)
tensorrt_llm/_torch/auto_deploy/utils/node_utils.py (1)
  • extract_op_args (469-506)
tensorrt_llm/_torch/auto_deploy/custom_ops/triton_attention.py (2)
tensorrt_llm/_torch/auto_deploy/models/hf.py (1)
  • chunk_size (128-131)
tensorrt_llm/_torch/auto_deploy/models/factory.py (1)
  • chunk_size (198-200)
tensorrt_llm/_torch/auto_deploy/custom_ops/mla.py (4)
tensorrt_llm/_torch/auto_deploy/models/hf.py (1)
  • chunk_size (128-131)
tensorrt_llm/_torch/auto_deploy/models/factory.py (1)
  • chunk_size (198-200)
tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py (4)
  • seq_len (296-297)
  • input_pos (300-301)
  • cache_loc (304-305)
  • pages_per_seq (308-309)
tensorrt_llm/_torch/attention_backend/flashinfer.py (1)
  • page_size (197-201)
tensorrt_llm/_torch/auto_deploy/models/factory.py (1)
tensorrt_llm/_torch/auto_deploy/models/hf.py (1)
  • chunk_size (128-131)
tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/torch_backend_mamba.py (4)
tensorrt_llm/_torch/auto_deploy/models/hf.py (1)
  • chunk_size (128-131)
tensorrt_llm/_torch/auto_deploy/models/factory.py (1)
  • chunk_size (198-200)
tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py (4)
  • seq_len (296-297)
  • input_pos (300-301)
  • cache_loc (304-305)
  • pages_per_seq (308-309)
tensorrt_llm/_torch/attention_backend/flashinfer.py (1)
  • page_size (197-201)
tensorrt_llm/_torch/auto_deploy/models/hf.py (1)
tensorrt_llm/_torch/auto_deploy/models/factory.py (1)
  • chunk_size (198-200)
tensorrt_llm/_torch/auto_deploy/transform/library/fuse_causal_conv.py (3)
tensorrt_llm/_torch/auto_deploy/shim/interface.py (1)
  • CachedSequenceInterface (11-92)
tensorrt_llm/_torch/auto_deploy/utils/node_utils.py (1)
  • is_op (197-220)
tensorrt_llm/_torch/auto_deploy/transform/interface.py (4)
  • BaseTransform (217-504)
  • SharedConfig (61-66)
  • TransformInfo (121-178)
  • TransformRegistry (507-535)
tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py (2)
tensorrt_llm/_torch/auto_deploy/models/hf.py (1)
  • chunk_size (128-131)
tensorrt_llm/_torch/auto_deploy/models/factory.py (1)
  • chunk_size (198-200)
tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/triton_backend_mamba.py (2)
tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py (4)
  • seq_len (296-297)
  • _get_sanitized_seq_len (388-428)
  • to (465-472)
  • device (190-191)
tensorrt_llm/_torch/modules/mamba/mamba2_metadata.py (1)
  • cu_seqlens_to_chunk_indices_offsets (24-85)
tensorrt_llm/_torch/auto_deploy/custom_ops/torch_backend_attention.py (2)
tensorrt_llm/_torch/auto_deploy/models/hf.py (1)
  • chunk_size (128-131)
tensorrt_llm/_torch/auto_deploy/models/factory.py (1)
  • chunk_size (198-200)
tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/cuda_backend_causal_conv.py (3)
tensorrt_llm/_torch/auto_deploy/models/hf.py (1)
  • chunk_size (128-131)
tensorrt_llm/_torch/auto_deploy/models/factory.py (1)
  • chunk_size (198-200)
tensorrt_llm/_torch/auto_deploy/utils/node_utils.py (1)
  • extract_op_args (469-506)
tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py (3)
tensorrt_llm/_torch/auto_deploy/models/hf.py (1)
  • chunk_size (128-131)
tensorrt_llm/_torch/auto_deploy/models/factory.py (1)
  • chunk_size (198-200)
tensorrt_llm/_torch/auto_deploy/llm.py (1)
  • factory (110-113)
🪛 Ruff (0.14.4)
tensorrt_llm/_torch/auto_deploy/custom_ops/flashinfer_attention.py

165-165: Unused function argument: chunk_size

(ARG001)


217-217: Unused function argument: input_pos

(ARG001)


217-217: Unused function argument: pages_per_seq

(ARG001)


217-217: Unused function argument: slot_idx

(ARG001)


217-217: Unused function argument: page_size

(ARG001)


217-217: Unused function argument: chunk_size

(ARG001)

tensorrt_llm/_torch/auto_deploy/custom_ops/triton_attention.py

294-294: Unused function argument: chunk_size

(ARG001)


312-312: Unused function argument: pages_per_seq

(ARG001)


312-312: Unused function argument: slot_idx

(ARG001)


312-312: Unused function argument: page_size

(ARG001)


312-312: Unused function argument: chunk_size

(ARG001)

tensorrt_llm/_torch/auto_deploy/custom_ops/mla.py

185-185: Unused function argument: chunk_size

(ARG001)


200-200: Unused function argument: position_ids

(ARG001)


200-200: Unused function argument: pages_per_seq

(ARG001)


200-200: Unused function argument: slot_idx

(ARG001)


200-200: Unused function argument: page_size

(ARG001)


200-200: Unused function argument: chunk_size

(ARG001)

tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/torch_backend_mamba.py

123-123: Unused function argument: chunk_size

(ARG001)


147-147: Unused function argument: input_pos

(ARG001)


147-147: Unused function argument: cache_loc

(ARG001)


147-147: Unused function argument: pages_per_seq

(ARG001)


147-147: Unused function argument: page_size

(ARG001)


147-147: Unused function argument: chunk_size

(ARG001)

tensorrt_llm/_torch/auto_deploy/transform/library/fuse_causal_conv.py

43-43: Prefer next(iter(node.users.keys())) over single element slice

Replace with next(iter(node.users.keys()))

(RUF015)


82-82: Unused method argument: cm

(ARG002)


83-83: Unused method argument: factory

(ARG002)


84-84: Unused method argument: shared_config

(ARG002)


99-99: Consider [*list(conv_node.args[:-1]), activation_name] instead of concatenation

Replace with [*list(conv_node.args[:-1]), activation_name]

(RUF005)

tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/triton_backend_mamba.py

32-32: Unused function argument: cache_loc

(ARG001)


33-33: Unused function argument: pages_per_seq

(ARG001)


35-35: Unused function argument: page_size

(ARG001)


98-98: Unused function argument: input_pos

(ARG001)


98-98: Unused function argument: cache_loc

(ARG001)


98-98: Unused function argument: pages_per_seq

(ARG001)


98-98: Unused function argument: page_size

(ARG001)


98-98: Unused function argument: chunk_size

(ARG001)


260-260: Unused function argument: cu_seqlens

(ARG001)


261-261: Unused function argument: chunk_indices

(ARG001)


262-262: Unused function argument: chunk_offsets

(ARG001)


263-263: Unused function argument: batch_info_tensor

(ARG001)

tensorrt_llm/_torch/auto_deploy/custom_ops/torch_backend_attention.py

366-366: Unused function argument: chunk_size

(ARG001)

tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py

300-300: Unused noqa directive (non-enabled: E501)

Remove unused noqa directive

(RUF100)


361-361: Unused function argument: dtype

(ARG001)


361-361: Unused function argument: block_shape

(ARG001)

tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/cuda_backend_causal_conv.py

64-64: Unused function argument: chunk_size

(ARG001)


85-85: Unused function argument: input_pos

(ARG001)


85-85: Unused function argument: cache_loc

(ARG001)


85-85: Unused function argument: pages_per_seq

(ARG001)


85-85: Unused function argument: page_size

(ARG001)


85-85: Unused function argument: chunk_size

(ARG001)


232-232: Unused function argument: activation

(ARG001)

🔇 Additional comments (18)
tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/torch_backend_causal_conv.py (1)

354-363: Approved with a note on error handling.

The implementation correctly handles the optional activation parameter extraction with appropriate fallback to None. The try/except block accounts for cases where the parameter doesn't exist in the source node (as noted in the comment, it may be added by fusion later).

Note: The broad exception catching (RuntimeError, IndexError) follows the pattern from extract_op_args which can raise RuntimeError when a parameter is not found. This is acceptable given the optional nature of the parameter.
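
A minimal sketch of that extraction-with-fallback pattern, assuming extract_op_args(node, name) returns the matching argument values as a sequence and raises RuntimeError when the argument is absent (the node_utils helper referenced above); the wrapper name here is hypothetical:

```python
from typing import Optional

from tensorrt_llm._torch.auto_deploy.utils.node_utils import extract_op_args


def _maybe_get_activation(source_node) -> Optional[str]:
    """Return the activation constant from the source op if present, else None."""
    try:
        (activation,) = extract_op_args(source_node, "activation")
    except (RuntimeError, IndexError):
        # The un-fused op carries no activation yet; the
        # fuse_causal_conv_activation pass may bake one in later.
        activation = None
    return activation
```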

tensorrt_llm/_torch/auto_deploy/config/default.yaml (1)

168-169: LGTM!

The new fuse_causal_conv_activation transform is correctly placed at the compile stage, which is appropriate for fusion optimizations that occur after cache initialization and before model compilation.

tensorrt_llm/_torch/auto_deploy/models/hf.py (1)

127-131: LGTM!

The chunk_size property implementation follows the established pattern used by vocab_size_padded above it, correctly retrieving the value from the model config with an appropriate None fallback.

tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/test_cuda_causal_conv_cached_op.py (1)

85-85: LGTM!

The test is correctly updated to pass None for the new activation parameter, maintaining backward compatibility while adapting to the extended op signature.

tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py (1)

124-124: LGTM!

The chunk_size parameter is correctly passed from the factory to SequenceInfo, enabling chunk-based processing throughout the pipeline.

tensorrt_llm/_torch/auto_deploy/custom_ops/mla.py (1)

185-185: Signature extension for interface consistency.

The chunk_size parameter is added to maintain consistency with other prepare_*_metadata interfaces being updated across the codebase. While currently unused in this implementation, it ensures a uniform signature for future enhancements.

Also applies to: 200-200

tensorrt_llm/_torch/auto_deploy/custom_ops/attention_interface.py (3)

91-91: LGTM!

The chunk_size parameter is correctly added as an optional attribute with appropriate initialization and storage.

Also applies to: 118-118


179-179: LGTM!

Adding "chunk_size" to _cached_constants correctly enables it to be passed as a constant argument to prepare_metadata operations.


169-169: Update tests to match new slot_idx dtype specification.

The change to torch.long is valid and aligns with PyTorch's indexing requirements (operations like index_select() and index_copy_() require torch.long dtype). However, existing tests create slot_idx with dtype=torch.int32. While the implementation currently handles the conversion via .to(torch.long), tests should be updated to match the new interface specification in attention_interface.py:169.

Update the following test files to create slot_idx with dtype=torch.long:

  • test_triton_mamba_cached_op.py (lines 46, 117)
  • test_torch_causal_conv_cached_op.py (lines 47, 111, 178)
  • test_torch_attention_op.py (line 478)
  • test_cuda_causal_conv_cached_op.py (lines 49, 115, 187)
  • test_torch_mamba_cached_op.py (lines 55, 127, 192)
⛔ Skipped due to learnings
Learnt from: ixlmar
Repo: NVIDIA/TensorRT-LLM PR: 7294
File: tensorrt_llm/_torch/pyexecutor/sampler.py:1068-1085
Timestamp: 2025-08-28T10:21:46.652Z
Learning: torch.index_select works with int32 indices in practice despite documentation stating LongTensor requirement. In TensorRT-LLM codebase, int32 indices are used intentionally and work correctly.
tensorrt_llm/_torch/auto_deploy/custom_ops/flashinfer_attention.py (1)

165-165: Signature extension for interface consistency.

The chunk_size parameter is added to maintain a uniform signature across all prepare_*_metadata operations in the codebase. While not currently utilized by FlashInfer's metadata preparation, this ensures interface consistency for future chunked prefill support.

Also applies to: 217-217

tensorrt_llm/_torch/auto_deploy/custom_ops/triton_attention.py (1)

286-295: Interface expansion: chunk_size parameter added but not yet used.

The chunk_size parameter has been added to maintain consistency with other metadata preparation functions across the codebase. While currently unused, this is part of a coordinated API expansion for future chunk-based processing support.

tensorrt_llm/_torch/auto_deploy/custom_ops/mamba/cuda_backend_causal_conv.py (5)

116-116: Well-structured activation parameter propagation.

The optional activation parameter is correctly threaded through the prefill and decode paths, with proper propagation to both causal_conv1d_fn and causal_conv1d_update. The try/except block in get_constants appropriately handles cases where the activation parameter is added later by the fusion transform.

Also applies to: 180-180, 199-199, 232-232, 299-304
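
On the None-handling point, a small hedged fallback sketch (a reference path, not the fused CUDA kernel): when the activation is not fused into the kernel call, it can be applied explicitly afterwards, with None meaning pass-through. The supported-name set is an assumption based on the SiLU fusion in this PR.

```python
from typing import Optional

import torch
import torch.nn.functional as F


def apply_optional_activation(y: torch.Tensor, activation: Optional[str]) -> torch.Tensor:
    """Apply the activation baked in by the fusion pass, or pass through if None."""
    if activation is None:
        return y
    if activation in ("silu", "swish"):
        return F.silu(y)
    raise ValueError(f"Unsupported fused activation: {activation}")
```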


190-192: Improved clarity with slice-based token selection.

The change from index-based mapping to explicit slicing makes the decode path token selection more readable and maintainable.


209-210: Appropriate optimization: removed unnecessary contiguous() call.

Since y is allocated with torch.empty() at line 140, it's already contiguous. The .contiguous() call was redundant. The comment correctly notes that y is not an alias of any input tensor.


185-186: No issues found. The dtype optimization is safe.

The dtype flow is consistent throughout the operation:

  • y is initialized with dtype=input.dtype (line 140)
  • causal_conv1d_fn returns its input tensor unchanged (line 74 of causal_conv1d.py), preserving the input dtype
  • y_varlen inherits input.dtype from the function return
  • y_prefill = y_varlen.transpose(0, 1) preserves dtype through transpose
  • Both y_flat[:total_prefill_tokens] and y_prefill have matching dtype, making the .to(y_flat.dtype) cast redundant

The removal of the explicit cast is a valid optimization.


205-207: Verify dtype compatibility for decode path—manual testing recommended.

The dtype concern is valid but unverifiable from the Python wrapper alone. Line 207's copy_ operation assumes y_dec (returned from causal_conv1d_update) preserves the dtype of its input x_decode. However, the underlying CUDA kernel implementation is not accessible in the codebase, making it impossible to confirm dtype preservation behavior. Test this with different input dtypes (e.g., float16, bfloat16) to ensure copy_ succeeds without unexpected conversions.

tensorrt_llm/_torch/auto_deploy/transform/library/fuse_causal_conv.py (2)

15-58: Pattern matcher correctly identifies fusible activations.

The pattern matching logic properly identifies causal conv nodes with a single activation user. Currently supports SiLU with clear extensibility points for additional activations. The implementation is sound.


95-110: The fusion logic correctly assumes activation is the last parameter—verified against the signature.

The _cuda_cached_causal_conv1d function signature confirms activation is the final parameter, making the code at lines 99–100 correct: list(conv_node.args[:-1]) + [activation_name] properly constructs the new arguments.
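
To make the argument rewrite concrete, here is a hedged torch.fx sketch of the rewrite step. It is simplified in that it reuses the conv node's target, whereas the real transform swaps in the fused cuda_cached_causal_conv1d variant; the helper name is illustrative.

```python
import torch.fx as fx


def fuse_conv_with_activation(gm: fx.GraphModule, conv_node: fx.Node,
                              act_node: fx.Node, activation_name: str) -> None:
    graph = gm.graph
    with graph.inserting_after(act_node):
        # Assumption: the fused op shares the conv op's signature, with the
        # activation name replacing the trailing (previously None) argument.
        fused = graph.call_function(
            conv_node.target,
            args=(*conv_node.args[:-1], activation_name),
            kwargs=conv_node.kwargs,
        )
    act_node.replace_all_uses_with(fused)
    graph.erase_node(act_node)   # erase the activation first (it consumes conv_node)
    graph.erase_node(conv_node)  # conv_node now has no users and can be removed
    gm.recompile()
```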

Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
@suyoggupta
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #24370 [ run ] triggered by Bot. Commit: f2ec3d8

Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
@tensorrt-cicd
Collaborator

PR_Github #24370 [ run ] completed with state SUCCESS. Commit: f2ec3d8
/LLM/main/L0_MergeRequest_PR pipeline #18392 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

…version of the op

Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
@suyoggupta suyoggupta requested a review from atrifex November 13, 2025 06:57
@suyoggupta
Collaborator Author

adding @atrifex to review on behalf of oss-compliance

@suyoggupta
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #24416 [ run ] triggered by Bot. Commit: b15a2f8

@tensorrt-cicd
Collaborator

PR_Github #24416 [ run ] completed with state SUCCESS. Commit: b15a2f8
/LLM/main/L0_MergeRequest_PR pipeline #18422 completed with status: 'FAILURE'

@suyoggupta
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #24484 [ run ] triggered by Bot. Commit: b15a2f8

@tensorrt-cicd
Collaborator

PR_Github #24484 [ run ] completed with state SUCCESS. Commit: b15a2f8
/LLM/main/L0_MergeRequest_PR pipeline #18478 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
@suyoggupta
Collaborator Author

/bot reuse-pipeline

@tensorrt-cicd
Collaborator

PR_Github #24525 [ reuse-pipeline ] triggered by Bot. Commit: 33ef830

@tensorrt-cicd
Collaborator

PR_Github #24525 [ reuse-pipeline ] completed with state SUCCESS. Commit: 33ef830
Reusing PR_Github #24484 for commit 33ef830

@suyoggupta suyoggupta merged commit d12cb94 into NVIDIA:main Nov 14, 2025
5 checks passed
@github-project-automation github-project-automation bot moved this from Backlog to Done in AutoDeploy Board Nov 14, 2025
zheyuf pushed a commit to zheyuf/TensorRT-LLM that referenced this pull request Nov 19, 2025
…NVIDIA#9083)

Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
greg-kwasniewski1 pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Nov 20, 2025
…NVIDIA#9083)

Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>