[None][feat] Resubmission of the routing refactor in trtllmgen #13328
Funatiq merged 10 commits into NVIDIA:main from
Conversation
/bot help
GitHub Bot Help
Provide a user-friendly way for developers to interact with a Jenkins server. See details below for each supported subcommand.
run
Launch build/test pipelines. All previously running jobs will be killed.
kill
Kill all running builds associated with pull request.
skip
Skip testing for latest commit on pull request.
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
/bot run --disable-fail-fast
PR_Github #44940 [ run ] triggered by Bot. Commit:
📝 Walkthrough
This PR refactors the TensorRT-LLM MoE routing kernel infrastructure from macro-based dispatch to a unified policy-driven system. It consolidates multiple routing implementations (renormalize, DeepSeek) into a single custom routing framework with configurable preprocess/postprocess policies. Runtime configuration replaces compile-time template parameters for flexibility. Old dispatch macros and separate routing implementations are removed and replaced with new infrastructure (RoutingCustomPolicy, RoutingDevKernel, RoutingFromTopKIds). Python layers and tests are updated to support new routing methods and dtype handling.
Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 13
🧹 Nitpick comments (4)
tensorrt_llm/_torch/custom_ops/trtllm_gen_custom_ops.py (1)
92-103: Reuse one dummy correction bias per tuning session.
callable_e_score_correction_bias generates a new random tensor on every apply(), so autotuning can benchmark different tactics against different expert distributions. Caching one dummy bias here keeps tactic selection reproducible.
One way to make the dummy input stable:
     # Get routing method
     routing_cls_kwargs = {}
+    dummy_e_score_correction_bias = None
+
+    if routing_method_type in (RoutingMethodType.DeepSeekV3,
+                               RoutingMethodType.MiniMax2):
+        dummy_e_score_correction_bias = torch.randn(
+            num_experts, dtype=torch.bfloat16, device=hidden_states.device)
+
     if routing_method_type == RoutingMethodType.DeepSeekV3:
         routing_cls_kwargs.update({
             'n_group': n_group,
@@
             'routed_scaling_factor': routed_scaling_factor,
             'is_fused': False,  # fuse_routing_kernel
             'callable_e_score_correction_bias':
-            lambda: torch.randn(
-                num_experts, dtype=torch.bfloat16, device=hidden_states.device)
+            lambda: dummy_e_score_correction_bias
         })
     if routing_method_type == RoutingMethodType.MiniMax2:
         routing_cls_kwargs.update({
             'callable_e_score_correction_bias':
-            lambda: torch.randn(
-                num_experts, dtype=torch.bfloat16, device=hidden_states.device),
+            lambda: dummy_e_score_correction_bias,
             'num_experts': num_experts,
         })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/custom_ops/trtllm_gen_custom_ops.py` around lines 92 - 103, The current lambda assigned to routing_cls_kwargs['callable_e_score_correction_bias'] creates a new random tensor on every apply(), breaking autotuning reproducibility; instead, allocate one dummy bias tensor once (e.g., dummy_e_score_correction_bias = torch.randn(num_experts, dtype=torch.bfloat16, device=hidden_states.device)) and set the callable to return that same tensor (e.g., lambda: dummy_e_score_correction_bias) so MiniMax2 uses a stable dummy correction bias across the tuning session; update the block handling RoutingMethodType.MiniMax2 to create and close over this cached tensor and keep the existing keys ('callable_e_score_correction_bias', 'num_experts') in routing_cls_kwargs.
tensorrt_llm/_torch/modules/fused_moe/fused_moe_trtllm_gen.py (1)
1096-1100: Reuse _extract_routing_params() in the fake path too.
This branch re-implements the same routing-method split that run_moe() and forward_impl() just centralized, so the next routing-method addition has to update three places again.
♻️ Suggested simplification:
-        is_deepseek_v3_routing = isinstance(self.routing_method,
-                                            DeepSeekV3MoeRoutingMethod)
-        is_minimax_routing = isinstance(self.routing_method,
-                                        MiniMaxM2MoeRoutingMethod)
-        top_k = self.routing_method.routing_impl.top_k if is_deepseek_v3_routing else self.routing_method.top_k
-        routing_bias = self.routing_method.e_score_correction_bias if (
-            is_deepseek_v3_routing or is_minimax_routing) else None
+        routing_params = self._extract_routing_params()
+        top_k = routing_params.top_k
+        routing_bias = routing_params.routing_bias
         return fp4_block_scale_fake_output_without_finalize(
             x, self.num_experts,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/modules/fused_moe/fused_moe_trtllm_gen.py` around lines 1096 - 1100, The fake/inactive path currently reimplements the routing-method split (calculating is_minimax_routing, top_k, routing_bias) instead of reusing the centralized helper—call the existing _extract_routing_params() helper from the fake path so it returns the same routing parameters used by run_moe() and forward_impl(); replace the duplicated logic that computes is_minimax_routing, top_k and routing_bias with a single call to _extract_routing_params() and use its returned values to drive the fake path behavior.
cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/runner.cu (1)
59-62: clusterSizeInBatchDim is currently a no-op.
The constructor advertises a second tuning parameter, but the value is dropped on the floor here and never participates in workspace sizing or routing launch decisions. Either persist/use it or remove it until the implementation is ready.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/runner.cu` around lines 59 - 62, The constructor Runner::Runner(int32_t tileTokensDim, int32_t clusterSizeInBatchDim) currently drops clusterSizeInBatchDim; persist it (e.g., add a member mClusterSizeInBatchDim and initialize it in the initializer list alongside mTileTokensDim) and then use mClusterSizeInBatchDim in workspace sizing and routing launch decisions (the code paths that compute workspace bytes or choose kernel launch dimensions), or if the tuning parameter is not yet supported remove clusterSizeInBatchDim from the signature and all callsites; ensure references are updated to the chosen approach.cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingCustomPolicy.cuh (1)
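For reference, a minimal sketch of the "persist it" option; the member name mClusterSizeInBatchDim and the surrounding class layout are assumptions, not the actual runner.cu code:

#include <cstdint>

// Sketch only: persist the tuning parameter so later sizing/launch code can read it.
class RunnerSketch
{
public:
    RunnerSketch(int32_t tileTokensDim, int32_t clusterSizeInBatchDim)
        : mTileTokensDim(tileTokensDim)
        , mClusterSizeInBatchDim(clusterSizeInBatchDim) // previously dropped on the floor
    {
    }

private:
    int32_t mTileTokensDim;
    int32_t mClusterSizeInBatchDim; // to be consumed by workspace sizing / launch config
};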
586-609: Drop the commented-out sample policy block.
Keeping a disabled implementation in a block comment here makes it easy for the example to drift away from the real dispatch path. Either remove it or gate it with #if defined(...) if you want an opt-in sample.
As per coding guidelines, "Do not use comments to disable code in C++; use #if/#endif or avoid dead code entirely".
Verify each finding against the current code and only fix it if needed. In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingCustomPolicy.cuh` around lines 586 - 609, Remove the large commented-out sample policy block (the FirstKExpertSelect struct and its explicit PolicyTraits specialization referencing TierList/Tier) from RoutingCustomPolicy.cuh; either delete it entirely or wrap it with a clear compile-time guard like `#if` defined(SAMPLE_ROUTING_POLICY) / `#endif` so it is not present as dead code in comments—ensure references to FirstKExpertSelect and the PolicyTraits<T> specialization are handled accordingly and that no dangling commented symbols remain.
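For illustration, a sketch of the opt-in guard pattern suggested above; SAMPLE_ROUTING_POLICY is a hypothetical macro name and the bodies are placeholders rather than the real policy code:

#if defined(SAMPLE_ROUTING_POLICY)
// Sample policy kept compilable but excluded from normal builds.
struct FirstKExpertSelect
{
};

template <typename PolicyT>
struct PolicyTraits; // primary template assumed to exist in the real header

template <>
struct PolicyTraits<FirstKExpertSelect>
{
    static constexpr bool IsSamplePolicy = true; // hypothetical trait member
};
#endif // SAMPLE_ROUTING_POLICY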
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingCustom.cu`:
- Around line 954-980: The early-return path that calls runPostTopKPipeline when
precomputed topK is present must still validate sizes: move or duplicate the
bounds checks for data.mTopK (<= MaxSupportedTopExperts), data.mNumExperts (<=
MaxSupportedExperts) and the data.mNumExperts % 4 == 0 check to execute before
the early return that handles data.mPtrTopKIds / data.mPtrTopKPacked; keep the
existing TLLM_CHECK_WITH_INFO validating mPtrTopKWeights when mPtrTopKIds is
provided and then call runPostTopKPipeline only after these validations pass so
oversized precomputed inputs fail fast with the same checks as the
non-precomputed path.
- Around line 648-655: The PDL completion trigger is currently invoked before
Phase 5 writes permutation outputs, so when params.mUsePdl is true we must move
the cudaTriggerProgrammaticLaunchCompletion() call to after Phase 5 finishes
writing mPtrExpandedIdxToPermutedIdx and any other permutation outputs; update
the dyn-block path to place the trigger after the final global writes (same
location/order as the block kernel) so downstream kernels cannot consume
partially written routing results.
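To make the requested ordering concrete, a CUDA sketch with a simplified, hypothetical params struct; cudaTriggerProgrammaticLaunchCompletion is the real PDL device call (sm_90+), but the phase structure here only mirrors the comment, not the actual kernel:

#include <cstdint>

struct DynBlockParamsSketch // hypothetical; the real params live in the routing headers
{
    bool mUsePdl;
    int32_t* mPtrExpandedIdxToPermutedIdx;
    int32_t mNumExpandedIdx;
};

__global__ void dynBlockPathSketch(DynBlockParamsSketch params)
{
    // Phases 1-4 (histogram, scan, top-k, offsets) elided.

    // Phase 5: the final global writes of the permutation outputs.
    for (int32_t i = threadIdx.x; i < params.mNumExpandedIdx; i += blockDim.x)
    {
        params.mPtrExpandedIdxToPermutedIdx[i] = i; // stand-in for the real mapping
    }

    // Trigger PDL completion only after the last global write, mirroring the
    // block kernel, so dependent kernels never read partially written results.
    if (params.mUsePdl)
    {
        cudaTriggerProgrammaticLaunchCompletion();
    }
}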
In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingCustomPolicy.cuh`:
- Around line 744-759: The dispatchRoutingPolicy function currently ignores
Data::mPostprocessType for many cases and silently remaps requested (preprocess,
postprocess) pairs; update dispatchRoutingPolicy (the function handling Data,
Fn, and enums RoutingPreprocessType/RoutingPostprocessType) to match on the full
pair instead of only preprocess: for each supported combination explicitly call
fn(...) with the exact (Preprocess, Postprocess) tuple (e.g.,
SigmoidBiasPreprocess + ScaledSumNormalizePostprocess, SigmoidPreprocess +
SumNormalizePostprocess, SoftmaxPreprocess + NoOpPostprocess, SoftmaxPreprocess
+ SumNormalizePostprocess, NoOpPreprocess + SoftmaxPostprocess), and add a final
else branch that fails fast (throw a std::runtime_error or assert/log + exit)
when an unsupported preprocess/postprocess pair is requested so callers cannot
be silently remapped.
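A self-contained sketch of pair-wise dispatch with a fail-fast fallback; the enum members and policy struct names are taken from the prompt above, while the Data plumbing is reduced to two enum arguments:

#include <stdexcept>

enum class RoutingPreprocessType { NoOp, Softmax, Sigmoid, SigmoidBias };
enum class RoutingPostprocessType { NoOp, Softmax, SumNormalize, ScaledSumNormalize };

struct NoOpPreprocess {};
struct SoftmaxPreprocess {};
struct SigmoidPreprocess {};
struct SigmoidBiasPreprocess {};
struct NoOpPostprocess {};
struct SoftmaxPostprocess {};
struct SumNormalizePostprocess {};
struct ScaledSumNormalizePostprocess {};

template <typename Fn>
void dispatchRoutingPolicySketch(RoutingPreprocessType pre, RoutingPostprocessType post, Fn&& fn)
{
    using P = RoutingPreprocessType;
    using Q = RoutingPostprocessType;
    // Each supported (preprocess, postprocess) pair is matched explicitly.
    if (pre == P::SigmoidBias && post == Q::ScaledSumNormalize)
        fn(SigmoidBiasPreprocess{}, ScaledSumNormalizePostprocess{});
    else if (pre == P::Sigmoid && post == Q::SumNormalize)
        fn(SigmoidPreprocess{}, SumNormalizePostprocess{});
    else if (pre == P::Softmax && post == Q::NoOp)
        fn(SoftmaxPreprocess{}, NoOpPostprocess{});
    else if (pre == P::Softmax && post == Q::SumNormalize)
        fn(SoftmaxPreprocess{}, SumNormalizePostprocess{});
    else if (pre == P::NoOp && post == Q::Softmax)
        fn(NoOpPreprocess{}, SoftmaxPostprocess{});
    else // no silent remapping: unsupported pairs fail fast
        throw std::runtime_error("dispatchRoutingPolicy: unsupported preprocess/postprocess pair");
}

A caller would pass a generic lambda such as [&](auto pre, auto post) { /* launch the kernel templated on decltype(pre)/decltype(post) */ }, so each supported pair instantiates exactly one kernel variant.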
In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingDeepSeek.cu`:
- Around line 190-228: The routing bias is being narrowed early: biasVal is cast
to OutputT after loadScalar which loses fp32 precision (problematic when
mDtypeOutput is Bfloat16). Change the logic around biasVal/loadScalar so you
keep the bias in float precision for selection and comparison (use a float bias
variable from loadScalar(params.mPtrRoutingBias, params.mDtypeBias) and only
cast to OutputT when storing into outputs if needed). Update usages around
biasVal, scoreBias and any selection code that compares expert scores (e.g.,
where expertSelected, scoreIdx, smemScoreSigmoid are used) to use the float bias
variable so top-k decisions use full fp32 bias precision.
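A minimal sketch of the precision split described above, assuming bf16 outputs; selection always sees fp32, and only the stored value narrows:

#include <cuda_bf16.h>

// Compare/select in fp32 so small bias differences still order experts correctly.
__device__ float biasedScoreFp32(float sigmoidScore, float biasFp32)
{
    return sigmoidScore + biasFp32;
}

// Narrow only at store time; top-k decisions never see bf16 rounding.
__device__ __nv_bfloat16 narrowAtStore(float selectedScore)
{
    return __float2bfloat16(selectedScore);
}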
In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingDevKernel.h`:
- Around line 117-167: The BFloat16 branches in both
LAUNCH_ROUTING_WITH_POLICIES and LAUNCH_ROUTING_WITH_EXPERT_SELECT incorrectly
accept any mDtypeInput; change the third branch condition in each macro from
"else if (data.mDtypeOutput == tg::Dtype::Bfloat16)" to "else if
(data.mDtypeOutput == tg::Dtype::Bfloat16 && data.mDtypeInput ==
tg::Dtype::Fp32)" so the bf16→bf16 kernel is only selected when input is fp32;
keep the final else that calls TLLM_LOG_ERROR("Unsupported dtypeOutput") so
unsupported input/output combinations are rejected.
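A standalone sketch of the tightened guard, with tg::Dtype reduced to the two members the check needs; the real macros would use this condition in their third branch:

namespace tg { enum class Dtype { Fp32, Bfloat16 }; } // reduced for the sketch

bool selectsBf16OutputKernel(tg::Dtype dtypeInput, tg::Dtype dtypeOutput)
{
    // bf16 output is only valid when the input logits are fp32; anything else
    // must fall through to the TLLM_LOG_ERROR branch.
    return dtypeOutput == tg::Dtype::Bfloat16 && dtypeInput == tg::Dtype::Fp32;
}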
In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingFromTopKIds.cu`:
- Around line 42-136: Add a fail-fast input validation in runPostTopKPipeline:
after computing useStaticBlock, useDynBlock, useSingleCluster and before any
routingCustom::launch* calls, check for the unsupported "packed-only without
weights" combination (data.mPtrTopKPacked != nullptr && data.mPtrTopKWeights ==
nullptr) that can lead to garbage writes to
mPtrPermutedIdxSize/mPtrNumNonExitingCtas and histogram fallback corruption; if
that condition is true and the code will take a
non-static/non-dyn/non-single-cluster path (i.e., !(useStaticBlock ||
useDynBlock || useSingleCluster) or when useCoop is possible), call
TLLM_CHECK_WITH_INFO(false, "clear message...") to abort early. Ensure the check
references runPostTopKPipeline, data.mPtrTopKPacked, data.mPtrTopKWeights,
mPtrPermutedIdxSize/mPtrNumNonExitingCtas and the boolean flags so the
validation is placed before any routingCustom::launch* invocations.
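A sketch of that fail-fast check, with the routing Data reduced to the two pointers the validation reads; the real code would use TLLM_CHECK_WITH_INFO rather than throw:

#include <cstdint>
#include <stdexcept>

struct PostTopKInputsSketch // reduced stand-in for the routing Data struct
{
    int32_t const* mPtrTopKPacked;
    float const* mPtrTopKWeights;
};

void validatePostTopKInputs(PostTopKInputsSketch const& data, bool useStaticBlock,
    bool useDynBlock, bool useSingleCluster)
{
    bool const packedWithoutWeights
        = data.mPtrTopKPacked != nullptr && data.mPtrTopKWeights == nullptr;
    bool const takesFallbackPath = !(useStaticBlock || useDynBlock || useSingleCluster);
    if (packedWithoutWeights && takesFallbackPath)
    {
        // Abort before any routingCustom::launch* call can write garbage to
        // mPtrPermutedIdxSize / mPtrNumNonExitingCtas.
        throw std::runtime_error(
            "runPostTopKPipeline: packed top-k ids without weights is unsupported "
            "on the histogram/coop fallback path");
    }
}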
In `@cpp/tensorrt_llm/thop/cuteDslMoeUtilsOp.cpp`:
- Around line 79-81: The code sets dtypeRoutingLogits by mapping any non-Float
routing_logits to btg::Dtype::Bfloat16 which silently accepts unsupported
dtypes; update the logic in cuteDslMoeUtilsOp.cpp where dtypeRoutingLogits is
computed (the routing_logits.has_value() branch) to explicitly accept only
at::ScalarType::Float -> btg::Dtype::Fp32 and at::ScalarType::BFloat16 (or the
exact BF16 enum used by your build) -> btg::Dtype::Bfloat16, and otherwise fail
fast (throw an exception or return an error) when routing_logits->scalar_type()
is any other type (e.g., at::ScalarType::Half) so the kernel won't read invalid
data as bf16.
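A sketch of the explicit mapping, assuming the btg::Dtype enum from the surrounding file; TORCH_CHECK makes any other dtype (e.g., Half) fail fast instead of being read as bf16:

#include <ATen/ATen.h>

btg::Dtype toRoutingLogitsDtypeSketch(at::ScalarType t)
{
    if (t == at::ScalarType::Float)
    {
        return btg::Dtype::Fp32;
    }
    if (t == at::ScalarType::BFloat16)
    {
        return btg::Dtype::Bfloat16;
    }
    TORCH_CHECK(false, "Unsupported routing_logits dtype: ", t);
    return btg::Dtype::Fp32; // unreachable; silences missing-return warnings
}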
In `@cpp/tests/unit_tests/kernels/routing/routingCustomTest.cpp`:
- Around line 1262-1279: The test currently only checks kernel launch for
routingCustom::Data (routingData) with mixed bias dtype; instead add numeric
assertions by computing the CPU reference outputs (using the same reference
helper used by other test paths) and compare the device kernel outputs (the
buffer pointed to by routingData.mPtrRoutingBias or the scores output buffer
produced by routingCustom::run) against that CPU reference with an appropriate
tolerance; ensure you exercise the mDtypeBias / loadScalar path by reading back
the device output into host memory via bufferCast and then assert elementwise
equality/near-equality to the CPU reference (use the same tolerance and
comparison helper used elsewhere in these tests) so the mixed-precision behavior
is validated, and keep calls to routingCustom::run(routingData,
this->mStream->get()) and this->mStream->synchronize() before reading back.
- Around line 145-156: The ScaledSumNormalize oracle currently divides by
sumSigmoid without using the test epsilon, so thread routingData.mSumEpsilon
into the ScaledSumNormalize test logic: update the validation in the
RoutingPostprocessType::ScaledSumNormalize branch to divide by (sumSigmoid +
routingData.mSumEpsilon) when computing expected scores (using symbols
sigmoidScores, expIdx, and param.routedScalingFactor), and modify setParams() to
populate routingData.mSumEpsilon with the intended non-zero test values so the
non-zero-epsilon behavior is actually exercised; apply the same change to the
other occurrence mentioned (lines 235-242).
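A sketch of the corrected oracle as a standalone helper; the parameter names mirror the symbols in the prompt (sumSigmoid, routedScalingFactor, mSumEpsilon):

// Divide by (sum + epsilon) so a non-zero test epsilon actually changes the result.
float expectedScaledSumNormalize(
    float sigmoidScore, float sumSigmoid, float routedScalingFactor, float sumEpsilon)
{
    return routedScalingFactor * sigmoidScore / (sumSigmoid + sumEpsilon);
}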
In `@cpp/tests/unit_tests/kernels/routing/routingDeepSeekTest.cpp`:
- Around line 412-459: The test currently only ensures the kernel runs with
mDtypeBias = Fp32; update it to verify correctness by computing a CPU host
reference using the float32 bias (use the same inputs initialized via initData
and float32BiasHost), run moe::dev::routing::routingDeepSeek::run(routingData,
...), copy back the kernel outputs (top-k ids and weights buffers produced by
the test harness) and ASSERT/EXPECT that the device top-k ids/weights match the
host reference within tolerance; locate code around setCommonParams,
routingData, float32BiasHost/float32BiasDevice, routingDeepSeek::run and add the
host-reference computation and comparisons after this->mStream->synchronize() so
the test fails if dtype plumbing is wrong.
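A sketch of the comparison step, assuming the kernel outputs have already been copied back into host vectors; buffer names and the tolerance are illustrative, not the actual test-harness API:

#include <cstdint>
#include <vector>
#include <gtest/gtest.h>

void expectTopKMatchesHostReference(std::vector<int32_t> const& deviceTopKIds,
    std::vector<float> const& deviceTopKWeights, std::vector<int32_t> const& hostTopKIds,
    std::vector<float> const& hostTopKWeights, float tol = 1e-3f)
{
    ASSERT_EQ(deviceTopKIds.size(), hostTopKIds.size());
    for (size_t i = 0; i < hostTopKIds.size(); ++i)
    {
        EXPECT_EQ(deviceTopKIds[i], hostTopKIds[i]) << "top-k id mismatch at " << i;
        EXPECT_NEAR(deviceTopKWeights[i], hostTopKWeights[i], tol)
            << "top-k weight mismatch at " << i;
    }
}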
In `@cpp/tests/unit_tests/kernels/routing/routingTest.cpp`:
- Around line 301-317: The host-side reference in computePermutation() must not
index expertCountsHostPtr or expertScanCountsHostPtr with out-of-range expert
IDs when hasInvalidTopKInput is true; update computePermutation() (the host
oracle that reads expIdxHostPtr entries) to validate each expertIdx (require
expertIdx >= 0 && expertIdx < param.numExperts) before any access to
expertCountsHostPtr or expertScanCountsHostPtr and skip or set outputs for
invalid entries (e.g., produce -1) so the reference no longer walks past the
buffers for expertIdx >= param.numExperts.
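A sketch of the guarded host oracle; the permuted-index formula is illustrative, the point being that expertIdx is range-checked before any access to the count/scan buffers:

#include <cstdint>
#include <vector>

std::vector<int32_t> computePermutationSketch(std::vector<int32_t> const& expIdxHost,
    std::vector<int32_t> const& expertScanCountsHost, int32_t numExperts)
{
    std::vector<int32_t> localOffset(numExperts, 0);
    std::vector<int32_t> permutedIdx(expIdxHost.size(), -1); // -1 marks invalid entries
    for (size_t i = 0; i < expIdxHost.size(); ++i)
    {
        int32_t const expertIdx = expIdxHost[i];
        if (expertIdx < 0 || expertIdx >= numExperts)
        {
            continue; // invalid top-k input: keep the sentinel, touch no buffers
        }
        permutedIdx[i] = expertScanCountsHost[expertIdx] + localOffset[expertIdx]++;
    }
    return permutedIdx;
}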
In `@tests/unittest/_torch/modules/moe/test_moe_module.py`:
- Around line 1225-1229: Replace the TRTLLM-only gate with the suite's full
backend capability check: call the same helper used elsewhere
(backend_type.get_quick_skip_reason or backend_type.can_implement pattern)
passing quant_algo, moe_model_config and routing_method_cls (and the custom
n_group/topk_group settings) and if it returns a reason, pytest.skip(reason); do
not use should_skip_trtllm here so unsupported combos like TRTLLM+QuantAlgo.FP8
or custom DeepSeek group/topk configurations are correctly skipped.
---
Nitpick comments:
In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingCustomPolicy.cuh`:
- Around line 586-609: Remove the large commented-out sample policy block (the
FirstKExpertSelect struct and its explicit PolicyTraits specialization
referencing TierList/Tier) from RoutingCustomPolicy.cuh; either delete it
entirely or wrap it with a clear compile-time guard like `#if`
defined(SAMPLE_ROUTING_POLICY) / `#endif` so it is not present as dead code in
comments—ensure references to FirstKExpertSelect and the PolicyTraits<T>
specialization are handled accordingly and that no dangling commented symbols
remain.
In `@cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/runner.cu`:
- Around line 59-62: The constructor Runner::Runner(int32_t tileTokensDim,
int32_t clusterSizeInBatchDim) currently drops clusterSizeInBatchDim; persist it
(e.g., add a member mClusterSizeInBatchDim and initialize it in the initializer
list alongside mTileTokensDim) and then use mClusterSizeInBatchDim in workspace
sizing and routing launch decisions (the code paths that compute workspace bytes
or choose kernel launch dimensions), or if the tuning parameter is not yet
supported remove clusterSizeInBatchDim from the signature and all callsites;
ensure references are updated to the chosen approach.
In `@tensorrt_llm/_torch/custom_ops/trtllm_gen_custom_ops.py`:
- Around line 92-103: The current lambda assigned to
routing_cls_kwargs['callable_e_score_correction_bias'] creates a new random
tensor on every apply(), breaking autotuning reproducibility; instead, allocate
one dummy bias tensor once (e.g., dummy_e_score_correction_bias =
torch.randn(num_experts, dtype=torch.bfloat16, device=hidden_states.device)) and
set the callable to return that same tensor (e.g., lambda:
dummy_e_score_correction_bias) so MiniMax2 uses a stable dummy correction bias
across the tuning session; update the block handling RoutingMethodType.MiniMax2
to create and close over this cached tensor and keep the existing keys
('callable_e_score_correction_bias', 'num_experts') in routing_cls_kwargs.
In `@tensorrt_llm/_torch/modules/fused_moe/fused_moe_trtllm_gen.py`:
- Around line 1096-1100: The fake/inactive path currently reimplements the
routing-method split (calculating is_minimax_routing, top_k, routing_bias)
instead of reusing the centralized helper—call the existing
_extract_routing_params() helper from the fake path so it returns the same
routing parameters used by run_moe() and forward_impl(); replace the duplicated
logic that computes is_minimax_routing, top_k and routing_bias with a single
call to _extract_routing_params() and use its returned values to drive the fake
path behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: d0ba2c10-bd7b-4c9e-a975-6b32330215d0
📒 Files selected for processing (48)
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/DevKernel.h
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingDeepSeek.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingRenormalize.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/IntFastDiv.h
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingCustom.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingCustomPolicy.cuh
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingDeepSeek.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingDevKernel.h
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingFromTopKIds.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingKernel.cuh
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingKernel.h
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingKernelTopK.cuh
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routing/RoutingLlama4.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/RoutingDeepSeekCommon.cuh
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchClusterKernel.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchCoopKernel.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchHistogramKernel.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchInitExpertCounts.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchMainKernel.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchOffsetsKernel.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/RoutingRenormalizeCommon.cuh
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchClusterKernel.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchHistogramKernel.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchHistogramScoresKernel.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchInitExpertCounts.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchOffsetsKernel.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/runner.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/runner.h
- cpp/tensorrt_llm/thop/cuteDslMoeUtilsOp.cpp
- cpp/tensorrt_llm/thop/fp4BlockScaleMoe.cpp
- cpp/tensorrt_llm/thop/fp8BlockScaleMoe.cpp
- cpp/tensorrt_llm/thop/fp8PerTensorScaleMoe.cpp
- cpp/tensorrt_llm/thop/mxFp4BlockScaleMoe.cpp
- cpp/tests/unit_tests/kernels/CMakeLists.txt
- cpp/tests/unit_tests/kernels/routing/routingCustomTest.cpp
- cpp/tests/unit_tests/kernels/routing/routingDeepSeekTest.cpp
- cpp/tests/unit_tests/kernels/routing/routingLlama4Test.cpp
- cpp/tests/unit_tests/kernels/routing/routingRenormalizeTest.cpp
- cpp/tests/unit_tests/kernels/routing/routingTest.cpp
- cpp/tests/unit_tests/kernels/routing/routingTest.h
- tensorrt_llm/_torch/custom_ops/trtllm_gen_custom_ops.py
- tensorrt_llm/_torch/models/modeling_deepseekv3.py
- tensorrt_llm/_torch/modules/fused_moe/__init__.py
- tensorrt_llm/_torch/modules/fused_moe/fused_moe_trtllm_gen.py
- tensorrt_llm/_torch/modules/fused_moe/routing.py
- tests/integration/defs/accuracy/test_llm_api_pytorch.py
- tests/unittest/_torch/modules/moe/moe_test_utils.py
- tests/unittest/_torch/modules/moe/test_moe_module.py
💤 Files with no reviewable changes (17)
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingDeepSeek.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchClusterKernel.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/DevKernel.h
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchHistogramKernel.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchOffsetsKernel.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchOffsetsKernel.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchInitExpertCounts.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchHistogramKernel.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/RoutingDeepSeekCommon.cuh
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchInitExpertCounts.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/RoutingRenormalize.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchClusterKernel.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchMainKernel.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/launchHistogramScoresKernel.cu
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingDeepSeek/launchCoopKernel.cu
- cpp/tests/unit_tests/kernels/routing/routingRenormalizeTest.cpp
- cpp/tensorrt_llm/kernels/trtllmGenKernels/blockScaleMoe/routingRenormalize/RoutingRenormalizeCommon.cuh
/bot kill
PR_Github #44965 [ kill ] triggered by Bot. Commit:
PR_Github #44940 [ run ] completed with state
PR_Github #44965 [ kill ] completed with state
Force-pushed from b4ca38d to 5d4d8ce
/bot run --disable-fail-fast
/bot kill
Force-pushed from 5d4d8ce to d6d1600
/bot run --disable-fail-fast
PR_Github #46501 [ run ] triggered by Bot. Commit:
PR_Github #46501 [ run ] completed with state
/bot run --disable-fail-fast/bot run --disable-fail-fast
PR_Github #46545 Bot args parsing error: usage: /bot [-h]
/bot run --disable-fail-fast
PR_Github #46549 [ run ] triggered by Bot. Commit:
PR_Github #46549 [ run ] completed with state
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
Force-pushed from 6c06f2e to 8d50c57
/bot run --disable-fail-fast
PR_Github #46666 [ run ] triggered by Bot. Commit:
PR_Github #46666 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #46725 [ run ] triggered by Bot. Commit:
PR_Github #46725 [ run ] completed with state
Summary by CodeRabbit
Release Notes
New Features
Improvements
Description
This PR fixes the issues introduced by #12246.
Previously, I had skipped the C++ unit tests and there were failing cases.
In this PR, I’ve addressed those failures and fixed the related bugs.
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.