[#11094][feat] AutoDeploy transform to fuse silu+mul #12497
MrGeva merged 5 commits into NVIDIA:main from
Conversation
/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"
📝 Walkthrough
Adds a new FuseSiluMul transform and a silu_and_mul custom op.
Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Config as Configuration System
    participant Model as Model/Export
    participant Optimizer as InferenceOptimizer
    participant Transform as FuseSiluMul Transform
    participant Graph as FX Graph Module
    participant CustomOp as Custom silu_and_mul Op
    Config->>Optimizer: Load fuse_silu_mul config (enabled)
    Model->>Optimizer: Export model to FX graph
    Optimizer->>Transform: Apply FuseSiluMul transform
    Transform->>Graph: Scan for mul nodes
    Graph-->>Transform: Identify mul(silu(narrow), narrow) pattern
    Transform->>CustomOp: Replace with silu_and_mul custom op
    CustomOp-->>Graph: Fused operation inserted
    Transform->>Graph: eliminate_dead_code() & recompile()
    Graph-->>Optimizer: Optimized FX graph returned
    Optimizer-->>Model: Model ready for inference
```
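The "scan for mul nodes" step in the diagram can be sketched without torch: the dict-based nodes and the `find_silu_mul` helper below are hypothetical stand-ins for `torch.fx.Node` objects and for the transform's real matching code, not the actual implementation.

```python
# Hypothetical, torch-free sketch of the pattern scan from the diagram above.
# Each node is a dict {"op": str, "args": [child nodes]}, standing in for
# torch.fx.Node; the real FuseSiluMul transform walks an FX graph instead.
def find_silu_mul(nodes):
    """Return every mul node of the form mul(silu(narrow(x)), narrow(x))
    where both narrow ops read the same parent tensor x."""
    matches = []
    for node in nodes:
        if node["op"] != "mul":
            continue
        lhs, rhs = node["args"]
        if (
            lhs["op"] == "silu"
            and lhs["args"][0]["op"] == "narrow"
            and rhs["op"] == "narrow"
            and lhs["args"][0]["args"][0] is rhs["args"][0]  # same parent tensor
        ):
            matches.append(node)
    return matches


# Tiny example graph: x = linear(...); a = narrow(x); b = narrow(x);
# out = mul(silu(a), b) -- the exact shape the transform rewrites.
parent = {"op": "linear", "args": []}
a = {"op": "narrow", "args": [parent]}
b = {"op": "narrow", "args": [parent]}
s = {"op": "silu", "args": [a]}
out = {"op": "mul", "args": [s, b]}
print(len(find_silu_mul([parent, a, b, s, out])))  # prints 1
```

In the real transform, each match is then replaced by a call to the `silu_and_mul` custom op, followed by `eliminate_dead_code()` and `recompile()`.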
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes
🚥 Pre-merge checks: ❌ 2 failed (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 2
🧹 Nitpick comments (3)
tensorrt_llm/_torch/auto_deploy/transform/library/fuse_silu_mul.py (1)
120-140: Prefix unused variable with underscore. The `half_size` returned from `_try_fuse_mul` is unpacked but not used.

♻️ Proposed fix

- fused_parent, half_size = result
+ fused_parent, _half_size = result

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `tensorrt_llm/_torch/auto_deploy/transform/library/fuse_silu_mul.py` around lines 120-140, the unpacked variable `half_size` from `_try_fuse_mul` is unused; change the unpacking in the loop where `result` is assigned (currently `fused_parent, half_size = result`) to use a prefixed underscore (e.g., `fused_parent, _half_size = result` or simply `fused_parent, _ = result`) so the no-unused-variable lint issue is resolved while keeping the intent of the calling function's logic.

tensorrt_llm/_torch/auto_deploy/custom_ops/linear/silu_mul.py (1)
45-50: Consider using unpacking for the output shape construction. The static analysis tool suggests a minor improvement for readability.

♻️ Optional refactor

  @silu_and_mul.register_fake
  def _(x: torch.Tensor) -> torch.Tensor:
      """Fake implementation for tracing."""
      half_size = x.shape[-1] // 2
-     output_shape = list(x.shape[:-1]) + [half_size]
+     output_shape = [*x.shape[:-1], half_size]
      return x.new_empty(output_shape, dtype=x.dtype)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `tensorrt_llm/_torch/auto_deploy/custom_ops/linear/silu_mul.py` around lines 45-50, the fake implementation registered with `silu_and_mul` builds `output_shape` by concatenating `list(x.shape[:-1])` and `[half_size]`; change this to use Python unpacking for clarity and brevity (e.g., `[*x.shape[:-1], half_size]`) while keeping `half_size = x.shape[-1] // 2` and returning `x.new_empty(output_shape, dtype=x.dtype)`.

tests/unittest/auto_deploy/singlegpu/transformations/library/test_fuse_silu_mul.py (1)
128-144: Unused variable and potential missing assertion. The variable `x` from `_export_model` is unpacked but never used in `test_fuse_silu_mul_disabled`. Either prefix it with an underscore or remove it. Also, consider adding a correctness check similar to the other tests for completeness.

♻️ Proposed fix

  @pytest.mark.skipif(not torch.cuda.is_available(), reason="requires CUDA")
  def test_fuse_silu_mul_disabled():
      """Test that fusion is skipped when disabled."""
      model = TestModel()
-     gm, x = _export_model(model)
+     gm, _ = _export_model(model)
      transforms = {
          "fuse_gemms_mixed_children": {"stage": "post_load_fusion", "enabled": True},
          "fuse_silu_mul": {"stage": "post_load_fusion", "enabled": False},
      }
      io = InferenceOptimizer(transforms)
      gm = io.run(gm)
      # silu_and_mul should NOT be present when disabled
      assert _count_ops(gm, torch.ops.auto_deploy.silu_and_mul.default) == 0
      # Original ops should still be there
      assert _count_ops(gm, torch.ops.aten.silu.default) >= 1

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `tests/unittest/auto_deploy/singlegpu/transformations/library/test_fuse_silu_mul.py` around lines 128-144, the test unpacks `x` from `_export_model` but never uses it; rename `x` to `_x` (or remove it) to avoid the unused-variable warning, and add a correctness assertion: before calling `InferenceOptimizer.run` save `original_gm = gm`, then after `io.run(gm)` run both `original_gm` and `gm` on the example input and assert their outputs are equal (e.g., with `torch.allclose`); keep the existing op-count assertions using `_count_ops` and the transform names (`fuse_silu_mul`, `fuse_gemms_mixed_children`) intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `tests/unittest/auto_deploy/singlegpu/transformations/library/test_fuse_silu_mul.py`:
- Around lines 1-9: Add the NVIDIA Apache-2.0 license header to the top of this test module (test_fuse_silu_mul.py) so it matches project licensing requirements; insert the standard NVIDIA Apache-2.0 copyright/license block as the first lines of the file before any imports (the file currently imports pytest, torch, Dim, and modules like tensorrt_llm._torch.auto_deploy.custom_ops.linear.silu_mul), ensuring the exact header text used elsewhere in the repo is copied verbatim.
- Line 5: Replace the wildcard import from tensorrt_llm._torch.auto_deploy.custom_ops.linear.silu_mul with an explicit module import that preserves the namespace and still triggers registration (e.g., import tensorrt_llm._torch.auto_deploy.custom_ops.linear.silu_mul as silu_mul); update the test to reference the module if needed and remove the wildcard import to satisfy linting and the coding guideline that modules, not wildcard names, should be imported.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9c74f8e1-89a1-412c-8620-420fade508c6
📒 Files selected for processing (6)
examples/auto_deploy/model_registry/configs/llama3_1_8b.yaml
tensorrt_llm/_torch/auto_deploy/config/default.yaml
tensorrt_llm/_torch/auto_deploy/custom_ops/linear/__init__.py
tensorrt_llm/_torch/auto_deploy/custom_ops/linear/silu_mul.py
tensorrt_llm/_torch/auto_deploy/transform/library/fuse_silu_mul.py
tests/unittest/auto_deploy/singlegpu/transformations/library/test_fuse_silu_mul.py
PR_Github #40107 [ run ] triggered by Bot. Commit:
PR_Github #40107 [ run ] completed with state
/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"
PR_Github #40333 [ run ] triggered by Bot. Commit:
/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"
PR_Github #40349 [ run ] triggered by Bot. Commit:
PR_Github #40333 [ run ] completed with state
PR_Github #40349 [ run ] completed with state
/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"
PR_Github #40705 [ run ] triggered by Bot. Commit:
PR_Github #40705 [ run ] completed with state
/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"
PR_Github #40766 [ run ] triggered by Bot. Commit:
PR_Github #40766 [ run ] completed with state
PR_Github #40949 [ run ] triggered by Bot. Commit:
PR_Github #40940 [ run ] completed with state
PR_Github #40949 [ run ] completed with state
Force-pushed 7e9299b to f16676f
/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast
PR_Github #41877 [ run ] triggered by Bot. Commit:
PR_Github #41877 [ run ] completed with state
/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast
PR_Github #41905 [ run ] triggered by Bot. Commit:
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>

Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>

InferenceOptimizer API changed to require factory and config args. Update tests to apply transforms directly via TransformRegistry, and replace wildcard import with explicit module import.
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>

…8 support
- Custom op: only register when FlashInfer is available (no fallback)
- Transform: use ADPatternMatcherPass for the narrow+silu+mul variant, direct graph walk for the quantized getitem+silu+mul variant
- Move fuse_silu_mul before fuse_fp8_linear in default.yaml and enable by default; remove the redundant override from the llama3_1_8b config
- Test: skip when FlashInfer is unavailable, use TransformRegistry directly
Benchmarked on Llama-3.1-8B-Instruct-FP8 (1k ISL, 2k OSL, 64 reqs): +2.7% token throughput (10669 vs 10383 tok/s)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>

- Add test for Variant 2 (getitem+silu+mul from contiguous-split GEMM fusion)
- Skip run_shape_prop when no torch.narrow ops exist (avoids ~1s overhead on FP8-only configs where only Variant 2 matches)
- Add even-size assertion in register_fake for the silu_and_mul custom op
- Add NVIDIA copyright header to the test file
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
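The even-size assertion added to `register_fake` can be illustrated with a torch-free sketch of the shape rule such a meta function computes; the function name below is hypothetical and stands in for the registered fake implementation.

```python
def silu_and_mul_fake_shape(shape):
    """Hypothetical, torch-free sketch of the shape rule behind the
    register_fake meta function: the input's last dimension must be even
    (it is split in half for the SwiGLU gate), and the output halves it."""
    *lead, last = shape
    assert last % 2 == 0, "silu_and_mul requires an even last dimension"
    return [*lead, last // 2]  # unpacking style suggested in the review


print(silu_and_mul_fake_shape([2, 16, 8192]))  # [2, 16, 4096]
```

The real fake implementation returns `x.new_empty(...)` with this shape so tracing can propagate shapes without running the kernel.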
/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast
PR_Github #41918 [ run ] triggered by Bot. Commit:
PR_Github #41918 [ run ] completed with state
/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast
PR_Github #41939 [ run ] triggered by Bot. Commit:
PR_Github #41939 [ run ] completed with state
The fusion replaces the separate narrow → silu and narrow → mul ops with a single silu_and_mul kernel (using FlashInfer's fused implementation when available).

Llama 8B FP8 1k/2k/64 H100 CW perf: +3.3% output throughput (7,626 → 7,880 tok/s) and -3.2% latency (8,702 → 8,420 ms avg) with the SiLU+Mul fusion enabled across all 32 Llama layers.
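For reference, the fused kernel's semantics can be written as a minimal, torch-free sketch over a flat list (this is not the FlashInfer implementation): the last dimension is split in half, the first half goes through SiLU, and the result gates the second half.

```python
import math


def silu(v: float) -> float:
    # SiLU (a.k.a. swish): v * sigmoid(v) = v / (1 + e^-v)
    return v / (1.0 + math.exp(-v))


def silu_and_mul(x: list) -> list:
    """Reference semantics of the fused op on a 1-D input:
    silu(x[:h]) * x[h:], where h is half the (even) length."""
    h = len(x) // 2
    assert len(x) == 2 * h, "silu_and_mul requires an even size"
    return [silu(a) * b for a, b in zip(x[:h], x[h:])]


# silu(0) = 0, silu(1) ≈ 0.7311, so the gated products are 0.0 and ≈1.4621:
print(silu_and_mul([0.0, 1.0, 3.0, 2.0]))
```

The real op applies the same rule elementwise over the last tensor dimension, which is why the output's last dimension is half the input's.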
Summary by CodeRabbit
New Features
Chores
Tests
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.