
[None][feat] Resubmission of the routing refactor in trtllmgen#13328

Merged
Funatiq merged 10 commits into NVIDIA:main from ChristinaZ:refactor_routing2
May 5, 2026

Conversation

@ChristinaZ
Collaborator

@ChristinaZ ChristinaZ commented Apr 22, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added SigmoidRenorm routing method for MoE operations
    • Expanded support for float32 data types in routing inputs (bias and logits)
    • Enhanced SM90+ hardware optimization with cluster-based execution support
  • Improvements

    • Refactored MoE routing system for improved flexibility and maintainability
    • Better routing method configuration with policy-driven architecture

Description

This PR fixes the issues introduced by #12246.
Previously, I had skipped the C++ unit tests, and some cases were failing.
In this PR, I've addressed those failures and fixed the related bugs.

Test Coverage

cd cpp/build
make -j$(nproc) google-tests
./tests/unit_tests/kernels/routingKernelsTest 

pytest tests/unittest/_torch/modules/moe/test_moe_module.py::test_fp32_routing_bias -v

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@ChristinaZ ChristinaZ requested review from a team as code owners April 22, 2026 08:29
@ChristinaZ ChristinaZ changed the title from Refactor routing2 to [None][feat] Resubmission of the routing refactor in trtllmgen on Apr 22, 2026
@ChristinaZ ChristinaZ requested review from litaotju, longlee0622 and xxi-nv and removed request for hlu1, mikeiovine, symphonylyh and yizhang-nv April 22, 2026 08:30
@ChristinaZ
Collaborator Author

/bot help

@github-actions

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental) --high-priority]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Supports wildcard * for pattern matching (e.g., "*PerfSanity*" matches all stages containing PerfSanity). Examples: "A10-PyTorch-1, xxx", "PerfSanity". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Supports wildcard * for pattern matching. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx", --extra-stage "Post-Merge".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

--high-priority (OPTIONAL) : Run the pipeline with high priority. This option is restricted to authorized users only and will route the job to a high-priority queue.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@ChristinaZ
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #44940 [ run ] triggered by Bot. Commit: 8dd4c75 Link to invocation

@coderabbitai
Contributor

coderabbitai Bot commented Apr 22, 2026

📝 Walkthrough

Walkthrough

This PR refactors the TensorRT-LLM MoE routing kernel infrastructure from macro-based dispatch to a unified policy-driven system. It consolidates multiple routing implementations (renormalize, DeepSeek) into a single custom routing framework with configurable preprocess/postprocess policies. Runtime configuration replaces compile-time template parameters for flexibility. Old dispatch macros and separate routing implementations are removed and replaced with new infrastructure (RoutingCustomPolicy, RoutingDevKernel, RoutingFromTopKIds). Python layers and tests are updated to support new routing methods and dtype handling.
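The policy-driven dispatch described above can be illustrated with a small sketch. This is Python rather than CUDA, and every name in it is illustrative only (the real policies are C++ types such as the preprocess/postprocess policies listed in the Changes table below); it shows the shape of the idea — one routing pipeline parameterized by a preprocess and a postprocess policy instead of one kernel per routing method:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def sigmoid(xs):
    return [1.0 / (1.0 + math.exp(-x)) for x in xs]

# Policy registries; names are illustrative, not the actual C++ identifiers.
PREPROCESS = {
    "NoOp": lambda xs: xs,
    "Softmax": softmax,
    "Sigmoid": sigmoid,
}
POSTPROCESS = {
    "NoOp": lambda ws: ws,
    "SumNormalize": lambda ws: [w / sum(ws) for w in ws],
}

def route(logits, top_k, pre, post):
    # One unified pipeline: preprocess scores, select top-k, postprocess weights.
    scores = PREPROCESS[pre](logits)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    weights = POSTPROCESS[post]([scores[i] for i in top])
    return top, weights

# Renormalize-style routing: softmax the logits, renormalize the top-k slice.
experts, weights = route([2.0, 0.5, 1.0, -1.0], top_k=2,
                         pre="Softmax", post="SumNormalize")
```

Different (preprocess, postprocess) pairs then cover the various routing methods without separate implementations, which is the consolidation the walkthrough describes.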

Changes

Cohort / File(s) Summary
Old Dispatch Infrastructure Removed
cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/DevKernel.h
Removed legacy routing/launch preprocessor macros (LAUNCH_TILEN, LAUNCH_ROUTING_LLAMA4, LAUNCH_ROUTING_WITH_NUM_EXPERTS_FORCE_FLOAT_INPUT, LAUNCH_ROUTING_WITH_NUM_EXPERTS) previously used for MoE kernel dispatch.
Old Routing Implementation Files Deleted
cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingDeepSeek.cu, cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingRenormalize.cu, cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/..., cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/...
Removed entire old DeepSeek and Renormalize routing orchestration and launcher implementations; consolidated into new unified framework. Deleted: RoutingDeepSeekCommon.cuh, RoutingRenormalizeCommon.cuh, launchMainKernel.cu, launchClusterKernel.cu, launchCoopKernel.cu, launchHistogramKernel.cu, launchHistogramScoresKernel.cu, launchInitExpertCounts.cu, launchOffsetsKernel.cu.
New Policy-Driven Routing Infrastructure
cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingCustomPolicy.cuh, cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingDevKernel.h
Added comprehensive policy-based expert routing dispatch system with preprocess (NoOp, Softmax, Sigmoid, SigmoidBias) and postprocess policies (NoOp, Softmax, SumNormalize, ScaledSumNormalize). Implemented tier-based compilation and runtime dispatch with macros (LAUNCH_ROUTING_CUSTOM, LAUNCH_ROUTING_WITH_POLICIES, etc.) that select kernel parameters and manage kernel launches with programmatic launch dependency (PDL) support.
New Unified Routing Implementations
cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingCustom.cu, cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingDeepSeek.cu, cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingFromTopKIds.cu
Added new implementations: routingCustom::run with policy-based control flow supporting dynamic/static/cluster kernels and optional post-TopK histogram pipelines; routingDeepSeek::run with DeepSeek-specific routing biases and grouped TopK; shared runPostTopKPipeline template for unified post-TopK processing across routing methods.
Core Kernel Headers Refactored
cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingKernel.h, cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingKernel.cuh
Removed compile-time boolean template parameters (isPow2_, UsePdl_) from KernelParamsBase; replaced with runtime fields mUsePdl and mIsPow2. Added routing-specific enums (RoutingPreprocessType, RoutingPostprocessType) and refactored routingCustom::KernelParams to use template ExpertSelectPolicy_ type. Added helper kernels (routingIndicesCoopKernel) and device functions (loadScalar, getExpertIdxFromInput). Changed PDL trigger timing and made isPow2 runtime-configurable.
Routing Methods Updated (DeepSeek, Llama4)
cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingLlama4.cu
Converted compile-time template conditionals to runtime checks (params.mUsePdl, params.mIsPow2). Renamed mDtypeExpW to mDtypeOutput. Added support for TopK-packed input path via runPostTopKPipeline. Updated bias handling to support type-erased void const* mPtrRoutingBias with explicit dtype field.
MoE Runner Updates
cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/runner.cu, cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/runner.h
Extended Runner constructor to accept clusterSizeInBatchDim. Updated routing run signature to accept dtypeRoutingLogits and dtypeRoutingBias parameters. Added RoutingMethodType::SigmoidRenorm. Expanded routing control flow to support routingCustom for new methods. Renamed utility function getMaxNumCtasInBatchDim to getMaxNumCgasInBatchDim (CTA → CGA).
Python Torch Ops Updated
cpp/tensorrt_llm/thop/cuteDslMoeUtilsOp.cpp, cpp/tensorrt_llm/thop/fp4BlockScaleMoe.cpp, cpp/tensorrt_llm/thop/fp8BlockScaleMoe.cpp, cpp/tensorrt_llm/thop/fp8PerTensorScaleMoe.cpp, cpp/tensorrt_llm/thop/mxFp4BlockScaleMoe.cpp
Updated routing input validation to accept both Float and Bfloat16 for logits and bias. Added dtype propagation: compute dtypeRoutingLogits and args.mDtypeBias from input tensor dtypes and pass to routing kernel runner. Removed routing-method-specific dtype restrictions.
Python Routing Infrastructure
tensorrt_llm/_torch/modules/fused_moe/routing.py, tensorrt_llm/_torch/modules/fused_moe/__init__.py, tensorrt_llm/_torch/modules/fused_moe/fused_moe_trtllm_gen.py
Added SigmoidRenormMoeRoutingMethod enum value and implementation. Introduced RoutingParams dataclass to bundle routing configuration. Added _extract_routing_params() to centralize routing parameter extraction with support for MiniMaxM2MoeRoutingMethod and new methods. Updated fake output path to recognize MiniMaxM2MoeRoutingMethod for routing bias selection.
Model/Config Updates
tensorrt_llm/_torch/models/modeling_deepseekv3.py, tensorrt_llm/_torch/custom_ops/trtllm_gen_custom_ops.py
Changed DeepseekV3Gate bias dtype from backend-dependent to unconditional float32. Extended custom-ops routing initialization to handle MiniMax2 and SigmoidRenorm by adding optional kwargs (callable_e_score_correction_bias, num_experts).
Test Infrastructure Refactored
cpp/tests/unit_tests/kernels/CMakeLists.txt, cpp/tests/unit_tests/kernels/routing/routingTest.h, cpp/tests/unit_tests/kernels/routing/routingTest.cpp, cpp/tests/unit_tests/kernels/routing/routingCustomTest.cpp, cpp/tests/unit_tests/kernels/routing/routingDeepSeekTest.cpp, cpp/tests/unit_tests/kernels/routing/routingLlama4Test.cpp
Removed old routingRenormalizeTest.cpp. Added comprehensive routingCustomTest.cpp with extensive test coverage for policies, execution paths, and edge cases. Refactored test param construction to use fluent builder API. Updated routingTest.h with new fields (useTopKPackedAsInput, invalidExpertIdValue, preprocessType, postprocessType) and builder pattern. Updated routingDeepSeekTest.cpp and routingLlama4Test.cpp to handle packed TopK input, mixed bias dtypes, and new routing output configurations.
Python Test Updates
tests/unittest/_torch/modules/moe/moe_test_utils.py, tests/unittest/_torch/modules/moe/test_moe_module.py, tests/integration/defs/accuracy/test_llm_api_pytorch.py
Updated routing kernel compatibility check comments to reflect support for SigmoidRenorm and MiniMax2. Removed skip logic for unimplemented routing methods. Added _create_routing_method_with_fp32_bias() and test_fp32_routing_bias() for float32 bias testing. Updated test parameter passing for SigmoidRenormMoeRoutingMethod construction.
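Based only on the names appearing in this summary (SigmoidRenorm, the ScaledSumNormalize postprocess, a sum epsilon), a plausible host-side reference for the new routing method might look like the sketch below — sigmoid the logits, pick top-k, then scale-and-renormalize with an epsilon guard. The actual kernel math may differ; the epsilon and scaling-factor defaults here are assumptions for illustration:

```python
import math

def sigmoid_renorm(logits, top_k, scaling_factor=1.0, eps=1e-20):
    # Preprocess: elementwise sigmoid of the routing logits.
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    # Select the top_k experts by sigmoid score.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Postprocess: scaled sum-normalization; eps guards a zero denominator.
    denom = sum(scores[i] for i in top) + eps
    return top, [scores[i] / denom * scaling_factor for i in top]

ids, w = sigmoid_renorm([3.0, -2.0, 1.0, 0.0], top_k=2, scaling_factor=2.5)
```

A CPU oracle of this shape is also what the review comments below ask the unit tests to compare kernel outputs against, including a non-zero epsilon case.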

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested reviewers

  • yizhang-nv
  • symphonylyh
  • xxi-nv
  • yweng0828
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 20.50%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check — ❓ Inconclusive: The title is partially related to the changeset, referring to a real aspect of the change (resubmission of the routing refactor), but lacks specificity about the bug fixes mentioned in the description. Resolution: consider a more specific title such as '[None][feat] Routing refactor resubmission with C++ unit test fixes' to better capture the actual changes and fixes involved.
✅ Passed checks (3 passed)

  • Description check — ✅ Passed: The description explains the purpose (fixing issues from PR #12246, addressing failing C++ unit tests) and provides test coverage steps, but lacks detail about which specific bugs were fixed and why those changes are necessary.
  • Linked Issues check — ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check — ✅ Passed: Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 13

🧹 Nitpick comments (4)
tensorrt_llm/_torch/custom_ops/trtllm_gen_custom_ops.py (1)

92-103: Reuse one dummy correction bias per tuning session.

callable_e_score_correction_bias generates a new random tensor on every apply(), so autotuning can benchmark different tactics against different expert distributions. Caching one dummy bias here keeps tactic selection reproducible.

One way to make the dummy input stable
     # Get routing method
     routing_cls_kwargs = {}
+    dummy_e_score_correction_bias = None
+
+    if routing_method_type in (RoutingMethodType.DeepSeekV3,
+                               RoutingMethodType.MiniMax2):
+        dummy_e_score_correction_bias = torch.randn(
+            num_experts, dtype=torch.bfloat16, device=hidden_states.device)
+
     if routing_method_type == RoutingMethodType.DeepSeekV3:
         routing_cls_kwargs.update({
             'n_group':
             n_group,
@@
             'routed_scaling_factor':
             routed_scaling_factor,
             'is_fused':
             False,  # fuse_routing_kernel
             'callable_e_score_correction_bias':
-            lambda: torch.randn(
-                num_experts, dtype=torch.bfloat16, device=hidden_states.device)
+            lambda: dummy_e_score_correction_bias
         })
     if routing_method_type == RoutingMethodType.MiniMax2:
         routing_cls_kwargs.update({
             'callable_e_score_correction_bias':
-            lambda: torch.randn(
-                num_experts, dtype=torch.bfloat16, device=hidden_states.device),
+            lambda: dummy_e_score_correction_bias,
             'num_experts':
             num_experts,
         })
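The caching pattern this suggestion describes can be shown without torch: allocate the dummy bias once and have the callable close over it, so every apply() during autotuning sees the same values. The function and variable names below are illustrative, not part of the codebase:

```python
import random

def make_bias_callable(num_experts, seed=0):
    # Allocate the dummy correction bias exactly once...
    rng = random.Random(seed)
    dummy_bias = [rng.gauss(0.0, 1.0) for _ in range(num_experts)]
    # ...and return a callable that always hands back that same object,
    # so repeated calls during tuning see identical expert distributions.
    return lambda: dummy_bias

get_bias = make_bias_callable(8)

# By contrast, the original lambda builds a fresh random bias on each call,
# so successive tactic benchmarks would see different inputs:
unstable = lambda: [random.gauss(0.0, 1.0) for _ in range(8)]
```

The same closure trick carries over directly to the torch version in the diff above: build the tensor once, then return it from the lambda.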
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/custom_ops/trtllm_gen_custom_ops.py` around lines 92 -
103, The current lambda assigned to
routing_cls_kwargs['callable_e_score_correction_bias'] creates a new random
tensor on every apply(), breaking autotuning reproducibility; instead, allocate
one dummy bias tensor once (e.g., dummy_e_score_correction_bias =
torch.randn(num_experts, dtype=torch.bfloat16, device=hidden_states.device)) and
set the callable to return that same tensor (e.g., lambda:
dummy_e_score_correction_bias) so MiniMax2 uses a stable dummy correction bias
across the tuning session; update the block handling RoutingMethodType.MiniMax2
to create and close over this cached tensor and keep the existing keys
('callable_e_score_correction_bias', 'num_experts') in routing_cls_kwargs.
tensorrt_llm/_torch/modules/fused_moe/fused_moe_trtllm_gen.py (1)

1096-1100: Reuse _extract_routing_params() in the fake path too.

This branch re-implements the same routing-method split that run_moe() and forward_impl() just centralized, so the next routing-method addition has to update three places again.

♻️ Suggested simplification
-            is_deepseek_v3_routing = isinstance(self.routing_method,
-                                                DeepSeekV3MoeRoutingMethod)
-            is_minimax_routing = isinstance(self.routing_method,
-                                            MiniMaxM2MoeRoutingMethod)
-            top_k = self.routing_method.routing_impl.top_k if is_deepseek_v3_routing else self.routing_method.top_k
-            routing_bias = self.routing_method.e_score_correction_bias if (
-                is_deepseek_v3_routing or is_minimax_routing) else None
+            routing_params = self._extract_routing_params()
+            top_k = routing_params.top_k
+            routing_bias = routing_params.routing_bias
             return fp4_block_scale_fake_output_without_finalize(
                 x,
                 self.num_experts,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/modules/fused_moe/fused_moe_trtllm_gen.py` around lines
1096 - 1100, The fake/inactive path currently reimplements the routing-method
split (calculating is_minimax_routing, top_k, routing_bias) instead of reusing
the centralized helper—call the existing _extract_routing_params() helper from
the fake path so it returns the same routing parameters used by run_moe() and
forward_impl(); replace the duplicated logic that computes is_minimax_routing,
top_k and routing_bias with a single call to _extract_routing_params() and use
its returned values to drive the fake path behavior.
cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/runner.cu (1)

59-62: clusterSizeInBatchDim is currently a no-op.

The constructor advertises a second tuning parameter, but the value is dropped on the floor here and never participates in workspace sizing or routing launch decisions. Either persist/use it or remove it until the implementation is ready.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/runner.cu` around
lines 59 - 62, The constructor Runner::Runner(int32_t tileTokensDim, int32_t
clusterSizeInBatchDim) currently drops clusterSizeInBatchDim; persist it (e.g.,
add a member mClusterSizeInBatchDim and initialize it in the initializer list
alongside mTileTokensDim) and then use mClusterSizeInBatchDim in workspace
sizing and routing launch decisions (the code paths that compute workspace bytes
or choose kernel launch dimensions), or if the tuning parameter is not yet
supported remove clusterSizeInBatchDim from the signature and all callsites;
ensure references are updated to the chosen approach.
cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingCustomPolicy.cuh (1)

586-609: Drop the commented-out sample policy block.

Keeping a disabled implementation in a block comment here makes it easy for the example to drift away from the real dispatch path. Either remove it or gate it with #if defined(...) if you want an opt-in sample.

As per coding guidelines, "Do not use comments to disable code in C++; use #if / #endif or avoid dead code entirely".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingCustomPolicy.cuh`
around lines 586 - 609, Remove the large commented-out sample policy block (the
FirstKExpertSelect struct and its explicit PolicyTraits specialization
referencing TierList/Tier) from RoutingCustomPolicy.cuh; either delete it
entirely or wrap it with a clear compile-time guard like `#if`
defined(SAMPLE_ROUTING_POLICY) / `#endif` so it is not present as dead code in
comments—ensure references to FirstKExpertSelect and the PolicyTraits<T>
specialization are handled accordingly and that no dangling commented symbols
remain.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingCustom.cu`:
- Around line 954-980: The early-return path that calls runPostTopKPipeline when
precomputed topK is present must still validate sizes: move or duplicate the
bounds checks for data.mTopK (<= MaxSupportedTopExperts), data.mNumExperts (<=
MaxSupportedExperts) and the data.mNumExperts % 4 == 0 check to execute before
the early return that handles data.mPtrTopKIds / data.mPtrTopKPacked; keep the
existing TLLM_CHECK_WITH_INFO validating mPtrTopKWeights when mPtrTopKIds is
provided and then call runPostTopKPipeline only after these validations pass so
oversized precomputed inputs fail fast with the same checks as the
non-precomputed path.
- Around line 648-655: The PDL completion trigger is currently invoked before
Phase 5 writes permutation outputs, so when params.mUsePdl is true we must move
the cudaTriggerProgrammaticLaunchCompletion() call to after Phase 5 finishes
writing mPtrExpandedIdxToPermutedIdx and any other permutation outputs; update
the dyn-block path to place the trigger after the final global writes (same
location/order as the block kernel) so downstream kernels cannot consume
partially written routing results.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingCustomPolicy.cuh`:
- Around line 744-759: The dispatchRoutingPolicy function currently ignores
Data::mPostprocessType for many cases and silently remaps requested (preprocess,
postprocess) pairs; update dispatchRoutingPolicy (the function handling Data,
Fn, and enums RoutingPreprocessType/RoutingPostprocessType) to match on the full
pair instead of only preprocess: for each supported combination explicitly call
fn(...) with the exact (Preprocess, Postprocess) tuple (e.g.,
SigmoidBiasPreprocess + ScaledSumNormalizePostprocess, SigmoidPreprocess +
SumNormalizePostprocess, SoftmaxPreprocess + NoOpPostprocess, SoftmaxPreprocess
+ SumNormalizePostprocess, NoOpPreprocess + SoftmaxPostprocess), and add a final
else branch that fails fast (throw a std::runtime_error or assert/log + exit)
when an unsupported preprocess/postprocess pair is requested so callers cannot
be silently remapped.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingDeepSeek.cu`:
- Around line 190-228: The routing bias is being narrowed early: biasVal is cast
to OutputT after loadScalar which loses fp32 precision (problematic when
mDtypeOutput is Bfloat16). Change the logic around biasVal/loadScalar so you
keep the bias in float precision for selection and comparison (use a float bias
variable from loadScalar(params.mPtrRoutingBias, params.mDtypeBias) and only
cast to OutputT when storing into outputs if needed). Update usages around
biasVal, scoreBias and any selection code that compares expert scores (e.g.,
where expertSelected, scoreIdx, smemScoreSigmoid are used) to use the float bias
variable so top-k decisions use full fp32 bias precision.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingDevKernel.h`:
- Around line 117-167: The BFloat16 branches in both
LAUNCH_ROUTING_WITH_POLICIES and LAUNCH_ROUTING_WITH_EXPERT_SELECT incorrectly
accept any mDtypeInput; change the third branch condition in each macro from
"else if (data.mDtypeOutput == tg::Dtype::Bfloat16)" to "else if
(data.mDtypeOutput == tg::Dtype::Bfloat16 && data.mDtypeInput ==
tg::Dtype::Fp32)" so the bf16→bf16 kernel is only selected when input is fp32;
keep the final else that calls TLLM_LOG_ERROR("Unsupported dtypeOutput") so
unsupported input/output combinations are rejected.

In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingFromTopKIds.cu`:
- Around line 42-136: Add a fail-fast input validation in runPostTopKPipeline:
after computing useStaticBlock, useDynBlock, useSingleCluster and before any
routingCustom::launch* calls, check for the unsupported "packed-only without
weights" combination (data.mPtrTopKPacked != nullptr && data.mPtrTopKWeights ==
nullptr) that can lead to garbage writes to
mPtrPermutedIdxSize/mPtrNumNonExitingCtas and histogram fallback corruption; if
that condition is true and the code will take a
non-static/non-dyn/non-single-cluster path (i.e., !(useStaticBlock ||
useDynBlock || useSingleCluster) or when useCoop is possible), call
TLLM_CHECK_WITH_INFO(false, "clear message...") to abort early. Ensure the check
references runPostTopKPipeline, data.mPtrTopKPacked, data.mPtrTopKWeights,
mPtrPermutedIdxSize/mPtrNumNonExitingCtas and the boolean flags so the
validation is placed before any routingCustom::launch* invocations.

In `@cpp/tensorrt_llm/thop/cuteDslMoeUtilsOp.cpp`:
- Around line 79-81: The code sets dtypeRoutingLogits by mapping any non-Float
routing_logits to btg::Dtype::Bfloat16 which silently accepts unsupported
dtypes; update the logic in cuteDslMoeUtilsOp.cpp where dtypeRoutingLogits is
computed (the routing_logits.has_value() branch) to explicitly accept only
at::ScalarType::Float -> btg::Dtype::Fp32 and at::ScalarType::BFloat16 (or the
exact BF16 enum used by your build) -> btg::Dtype::Bfloat16, and otherwise fail
fast (throw an exception or return an error) when routing_logits->scalar_type()
is any other type (e.g., at::ScalarType::Half) so the kernel won't read invalid
data as bf16.

In `@cpp/tests/unit_tests/kernels/routing/routingCustomTest.cpp`:
- Around line 1262-1279: The test currently only checks kernel launch for
routingCustom::Data (routingData) with mixed bias dtype; instead add numeric
assertions by computing the CPU reference outputs (using the same reference
helper used by other test paths) and compare the device kernel outputs (the
buffer pointed to by routingData.mPtrRoutingBias or the scores output buffer
produced by routingCustom::run) against that CPU reference with an appropriate
tolerance; ensure you exercise the mDtypeBias / loadScalar path by reading back
the device output into host memory via bufferCast and then assert elementwise
equality/near-equality to the CPU reference (use the same tolerance and
comparison helper used elsewhere in these tests) so the mixed-precision behavior
is validated, and keep calls to routingCustom::run(routingData,
this->mStream->get()) and this->mStream->synchronize() before reading back.
- Around line 145-156: The ScaledSumNormalize oracle currently divides by
sumSigmoid without using the test epsilon, so thread routingData.mSumEpsilon
into the ScaledSumNormalize test logic: update the validation in the
RoutingPostprocessType::ScaledSumNormalize branch to divide by (sumSigmoid +
routingData.mSumEpsilon) when computing expected scores (using symbols
sigmoidScores, expIdx, and param.routedScalingFactor), and modify setParams() to
populate routingData.mSumEpsilon with the intended non-zero test values so the
non-zero-epsilon behavior is actually exercised; apply the same change to the
other occurrence mentioned (lines 235-242).

In `@cpp/tests/unit_tests/kernels/routing/routingDeepSeekTest.cpp`:
- Around line 412-459: The test currently only ensures the kernel runs with
mDtypeBias = Fp32; update it to verify correctness by computing a CPU host
reference using the float32 bias (use the same inputs initialized via initData
and float32BiasHost), run moe::dev::routing::routingDeepSeek::run(routingData,
...), copy back the kernel outputs (top-k ids and weights buffers produced by
the test harness) and ASSERT/EXPECT that the device top-k ids/weights match the
host reference within tolerance; locate code around setCommonParams,
routingData, float32BiasHost/float32BiasDevice, routingDeepSeek::run and add the
host-reference computation and comparisons after this->mStream->synchronize() so
the test fails if dtype plumbing is wrong.

In `@cpp/tests/unit_tests/kernels/routing/routingTest.cpp`:
- Around line 301-317: The host-side reference in computePermutation() must not
index expertCountsHostPtr or expertScanCountsHostPtr with out-of-range expert
IDs when hasInvalidTopKInput is true; update computePermutation() (the host
oracle that reads expIdxHostPtr entries) to validate each expertIdx (require
expertIdx >= 0 && expertIdx < param.numExperts) before any access to
expertCountsHostPtr or expertScanCountsHostPtr and skip or set outputs for
invalid entries (e.g., produce -1) so the reference no longer walks past the
buffers for expertIdx >= param.numExperts.

In `@tests/unittest/_torch/modules/moe/test_moe_module.py`:
- Around lines 1225-1229: Replace the TRTLLM-only gate with the suite's full
backend capability check. Call the same helper used elsewhere (the
backend_type.get_quick_skip_reason or backend_type.can_implement pattern),
passing quant_algo, moe_model_config, and routing_method_cls (plus the custom
n_group/topk_group settings), and if it returns a reason, pytest.skip(reason).
Do not use should_skip_trtllm here, so unsupported combos like
TRTLLM+QuantAlgo.FP8 or custom DeepSeek group/topk configurations are correctly
skipped.
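The shape of the centralized gate might look like the sketch below. The helper name, backend/quant strings, and the specific unsupported combinations are all assumptions standing in for the suite's real capability check:

```python
def get_quick_skip_reason(backend, quant_algo, n_group, topk_group):
    """Hypothetical capability check mirroring the suite's shared helper:
    return a human-readable reason when the combo is unsupported, else None."""
    if backend == "TRTLLM" and quant_algo == "FP8":
        return "TRTLLM backend does not implement FP8 for this routing method"
    if backend == "TRTLLM" and (n_group, topk_group) != (1, 1):
        return "custom DeepSeek group/topk configurations unsupported on TRTLLM"
    return None

# In the test body, gate with the shared helper instead of should_skip_trtllm:
reason = get_quick_skip_reason("TRTLLM", "FP8", 1, 1)
if reason:
    print(f"would call pytest.skip({reason!r})")
```

Routing every skip decision through one helper keeps the test matrix consistent: a new unsupported combination is added in one place rather than patched into individual tests.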

---

Nitpick comments:
In
`@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingCustomPolicy.cuh`:
- Around lines 586-609: Remove the large commented-out sample policy block (the
FirstKExpertSelect struct and its explicit PolicyTraits specialization
referencing TierList/Tier) from RoutingCustomPolicy.cuh. Either delete it
entirely or wrap it with a clear compile-time guard like
`#if defined(SAMPLE_ROUTING_POLICY)` / `#endif` so it is not present as dead
code in comments. Ensure references to FirstKExpertSelect and the
PolicyTraits<T> specialization are handled accordingly and that no dangling
commented-out symbols remain.

In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/runner.cu`:
- Around lines 59-62: The constructor Runner::Runner(int32_t tileTokensDim,
int32_t clusterSizeInBatchDim) currently drops clusterSizeInBatchDim. Persist
it (e.g., add a member mClusterSizeInBatchDim and initialize it in the
initializer list alongside mTileTokensDim), then use mClusterSizeInBatchDim in
workspace sizing and routing launch decisions (the code paths that compute
workspace bytes or choose kernel launch dimensions). Alternatively, if the
tuning parameter is not yet supported, remove clusterSizeInBatchDim from the
signature and all call sites. Either way, update all references to match the
chosen approach.

In `@tensorrt_llm/_torch/custom_ops/trtllm_gen_custom_ops.py`:
- Around lines 92-103: The current lambda assigned to
routing_cls_kwargs['callable_e_score_correction_bias'] creates a new random
tensor on every apply(), breaking autotuning reproducibility. Instead, allocate
one dummy bias tensor once (e.g., dummy_e_score_correction_bias =
torch.randn(num_experts, dtype=torch.bfloat16, device=hidden_states.device))
and set the callable to return that same tensor (e.g., lambda:
dummy_e_score_correction_bias) so MiniMax2 uses a stable dummy correction bias
across the tuning session. Update the block handling
RoutingMethodType.MiniMax2 to create and close over this cached tensor, keeping
the existing keys ('callable_e_score_correction_bias', 'num_experts') in
routing_cls_kwargs.
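The fix boils down to hoisting the random allocation out of the callable. A torch-free sketch of the pattern (a plain list stands in for the tensor; the function names are illustrative, not the module's real API):

```python
import random

def make_stable_bias_callable(num_experts):
    """Allocate the dummy correction bias once and close over it, so every
    apply() during autotuning observes the same values."""
    dummy_bias = [random.gauss(0.0, 1.0) for _ in range(num_experts)]
    return lambda: dummy_bias  # returns the same object on every call

bias_fn = make_stable_bias_callable(4)
assert bias_fn() is bias_fn()  # stable across calls, tuning is reproducible

# The buggy version re-randomizes on every call, so two tuning passes over
# the same config see different inputs:
unstable_fn = lambda: [random.gauss(0.0, 1.0) for _ in range(4)]
assert unstable_fn() != unstable_fn()
```

With the real code, the cached `torch.randn` tensor would be created once in the `RoutingMethodType.MiniMax2` branch and captured by the lambda stored under `'callable_e_score_correction_bias'`.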

In `@tensorrt_llm/_torch/modules/fused_moe/fused_moe_trtllm_gen.py`:
- Around lines 1096-1100: The fake/inactive path currently reimplements the
routing-method split (computing is_minimax_routing, top_k, and routing_bias)
instead of reusing the centralized helper. Call the existing
_extract_routing_params() helper from the fake path so it returns the same
routing parameters used by run_moe() and forward_impl(). Replace the duplicated
logic with a single call to _extract_routing_params() and use its returned
values to drive the fake-path behavior.
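The de-duplication the comment asks for can be sketched as follows. The helper signature and dict-based routing description are assumptions; the point is that the fake path calls the same extraction function as the real paths:

```python
def _extract_routing_params(routing_method):
    """Hypothetical centralized helper: the single place that derives
    per-method routing parameters for run_moe(), forward_impl(), and the
    fake path alike."""
    is_minimax = routing_method["type"] == "MiniMax2"
    return {
        "is_minimax_routing": is_minimax,
        "top_k": routing_method["top_k"],
        "routing_bias": routing_method.get("bias"),
    }

def fake_path(routing_method):
    # Reuse the shared helper instead of re-deriving is_minimax_routing etc.
    params = _extract_routing_params(routing_method)
    return params["top_k"], params["is_minimax_routing"]

print(fake_path({"type": "MiniMax2", "top_k": 2}))
```

Any future routing method then needs a change in only one function, and the fake path cannot drift out of sync with the real execution paths.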

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: d0ba2c10-bd7b-4c9e-a975-6b32330215d0

📥 Commits

Reviewing files that changed from the base of the PR and between 36fb5f0 and 8dd4c75.

📒 Files selected for processing (48)
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/DevKernel.h
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingDeepSeek.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingRenormalize.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/IntFastDiv.h
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingCustom.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingCustomPolicy.cuh
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingDeepSeek.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingDevKernel.h
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingFromTopKIds.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingKernel.cuh
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingKernel.h
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingKernelTopK.cuh
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingLlama4.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/RoutingDeepSeekCommon.cuh
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchClusterKernel.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchCoopKernel.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchHistogramKernel.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchInitExpertCounts.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchMainKernel.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchOffsetsKernel.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/RoutingRenormalizeCommon.cuh
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchClusterKernel.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchHistogramKernel.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchHistogramScoresKernel.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchInitExpertCounts.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchOffsetsKernel.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/runner.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/runner.h
  • cpp/tensorrt_llm/thop/cuteDslMoeUtilsOp.cpp
  • cpp/tensorrt_llm/thop/fp4BlockScaleMoe.cpp
  • cpp/tensorrt_llm/thop/fp8BlockScaleMoe.cpp
  • cpp/tensorrt_llm/thop/fp8PerTensorScaleMoe.cpp
  • cpp/tensorrt_llm/thop/mxFp4BlockScaleMoe.cpp
  • cpp/tests/unit_tests/kernels/CMakeLists.txt
  • cpp/tests/unit_tests/kernels/routing/routingCustomTest.cpp
  • cpp/tests/unit_tests/kernels/routing/routingDeepSeekTest.cpp
  • cpp/tests/unit_tests/kernels/routing/routingLlama4Test.cpp
  • cpp/tests/unit_tests/kernels/routing/routingRenormalizeTest.cpp
  • cpp/tests/unit_tests/kernels/routing/routingTest.cpp
  • cpp/tests/unit_tests/kernels/routing/routingTest.h
  • tensorrt_llm/_torch/custom_ops/trtllm_gen_custom_ops.py
  • tensorrt_llm/_torch/models/modeling_deepseekv3.py
  • tensorrt_llm/_torch/modules/fused_moe/__init__.py
  • tensorrt_llm/_torch/modules/fused_moe/fused_moe_trtllm_gen.py
  • tensorrt_llm/_torch/modules/fused_moe/routing.py
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py
  • tests/unittest/_torch/modules/moe/moe_test_utils.py
  • tests/unittest/_torch/modules/moe/test_moe_module.py
💤 Files with no reviewable changes (17)
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingDeepSeek.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchClusterKernel.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/DevKernel.h
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchHistogramKernel.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchOffsetsKernel.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchOffsetsKernel.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchInitExpertCounts.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchHistogramKernel.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/RoutingDeepSeekCommon.cuh
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchInitExpertCounts.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingRenormalize.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchClusterKernel.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchMainKernel.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchHistogramScoresKernel.cu
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchCoopKernel.cu
  • cpp/tests/unit_tests/kernels/routing/routingRenormalizeTest.cpp
  • cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/RoutingRenormalizeCommon.cuh

@ChristinaZ
Collaborator Author

/bot kill

@tensorrt-cicd
Collaborator

PR_Github #44965 [ kill ] triggered by Bot. Commit: 8dd4c75 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44940 [ run ] completed with state ABORTED. Commit: 8dd4c75

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #44965 [ kill ] completed with state SUCCESS. Commit: 8dd4c75
Successfully killed previous jobs for commit 8dd4c75

Link to invocation

@yweng0828 yweng0828 force-pushed the refactor_routing2 branch from b4ca38d to 5d4d8ce Compare May 1, 2026 15:15
@yweng0828
Collaborator

/bot run --disable-fail-fast

@yweng0828
Collaborator

/bot kill

@yweng0828 yweng0828 force-pushed the refactor_routing2 branch from 5d4d8ce to d6d1600 Compare May 1, 2026 15:30
@yweng0828
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #46501 [ run ] triggered by Bot. Commit: d6d1600 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46501 [ run ] completed with state FAILURE. Commit: d6d1600
/LLM/main/L0_MergeRequest_PR pipeline #36562 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@yweng0828
Collaborator

/bot run --disable-fail-fast/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #46545 Bot args parsing error: usage: /bot [-h]
{run,kill,skip,submit,reviewers,reuse-pipeline,reuse-review} ...
/bot: error: unrecognized arguments: run

Link to invocation

@yweng0828
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #46549 [ run ] triggered by Bot. Commit: 6c06f2e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46549 [ run ] completed with state SUCCESS. Commit: 6c06f2e
/LLM/main/L0_MergeRequest_PR pipeline #36606 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

ChristinaZ and others added 10 commits May 4, 2026 09:57
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
@yweng0828 yweng0828 force-pushed the refactor_routing2 branch from 6c06f2e to 8d50c57 Compare May 4, 2026 16:58
@yweng0828
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #46666 [ run ] triggered by Bot. Commit: 8d50c57 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46666 [ run ] completed with state SUCCESS. Commit: 8d50c57
/LLM/main/L0_MergeRequest_PR pipeline #36707 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@yweng0828
Collaborator

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #46725 [ run ] triggered by Bot. Commit: 8d50c57 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #46725 [ run ] completed with state SUCCESS. Commit: 8d50c57
/LLM/main/L0_MergeRequest_PR pipeline #36759 completed with status: 'SUCCESS'

CI Report

Link to invocation

@Funatiq Funatiq merged commit f8a9a29 into NVIDIA:main May 5, 2026
6 checks passed

7 participants