
[None][feat] enable TRTLLM-Gen internal routing #13997

Merged
tcherckez-nvidia merged 2 commits into NVIDIA:main from nv-auto-deploy:route-nvfp4-moe-to-trtllmgen
May 13, 2026

@tcherckez-nvidia
Collaborator

@tcherckez-nvidia tcherckez-nvidia commented May 11, 2026

Route matched noaux NVFP4 MoE patterns to TRTLLM-Gen internal routing by default when possible.

Keep the external-routing path for all-to-all cases, preserve routing_bias dtype for TRTLLM-Gen, and cover direct and EP-masked matcher shapes.

Summary by CodeRabbit

  • New Features

    • Added internal routing optimization for MoE (Mixture of Experts) operations
    • Enabled multi-stream MoE execution for improved performance with independent shared and routed expert branches
  • Tests

    • Added test coverage for MoE internal routing validation

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@tcherckez-nvidia
Collaborator Author

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"

@coderabbitai
Contributor

coderabbitai Bot commented May 11, 2026

Walkthrough

This PR adds internal routing support to TRTLLM-Gen MoE fusion and introduces auxiliary stream scheduling for shared experts. It extends the MoE custom op to accept routing logits and bias as alternatives to external routing tensors, implements graph analysis to extract routing metadata from noaux_tc_op producers, and introduces the MultiStreamMOE transform for performance optimization.
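
For orientation, the routing that a DeepSeekV3-style noaux producer computes can be sketched in plain Python. This is a reference sketch under stated assumptions, not the TRTLLM-Gen kernel: it assumes sigmoid scoring, a per-group score equal to the sum of the two largest biased scores, and final weights taken from the unbiased scores, following the publicly available DeepSeek-V3 modeling code. The function name noaux_route is hypothetical; top_k, n_group, topk_group, and routed_scaling_factor are the parameter names mentioned in this PR. Tie-breaking and numeric precision in the real kernel may differ.

```python
import math


def noaux_route(logits, bias, top_k, n_group, topk_group, routed_scaling_factor):
    """Reference routing for a single token. Returns (expert_ids, weights)."""
    num_experts = len(logits)
    assert num_experts % n_group == 0, "experts must split evenly into groups"
    group_size = num_experts // n_group

    # Sigmoid scores; the bias only influences which experts are selected.
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    biased = [s + b for s, b in zip(scores, bias)]

    # Score each group by the sum of its two largest biased scores.
    group_scores = []
    for g in range(n_group):
        members = sorted(biased[g * group_size:(g + 1) * group_size], reverse=True)
        group_scores.append(sum(members[:2]))

    # Keep the best topk_group groups; experts outside them are not eligible.
    kept = set(
        sorted(range(n_group), key=lambda g: group_scores[g], reverse=True)[:topk_group]
    )
    candidates = [e for e in range(num_experts) if e // group_size in kept]

    # Pick top_k experts by biased score among the eligible experts.
    chosen = sorted(candidates, key=lambda e: biased[e], reverse=True)[:top_k]

    # Weights come from the unbiased scores, normalized and then scaled.
    total = sum(scores[e] for e in chosen)
    weights = [routed_scaling_factor * scores[e] / total for e in chosen]
    return chosen, weights


if __name__ == "__main__":
    logits = [2.0, 1.0, -1.0, -2.0, 0.0, 0.5, 3.0, -3.0]
    ids, w = noaux_route(logits, [0.0] * 8, top_k=2, n_group=4,
                         topk_group=2, routed_scaling_factor=2.5)
    print(ids, w)
```

Note that in the demo input, expert 6 has the highest individual score but is never selected, because its group loses the group-level comparison. With internal routing, this whole computation runs inside the fused MoE op, which then only needs router_logits and routing_bias rather than precomputed expert ids and weights.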

Changes

Internal Routing and Auxiliary Stream MoE Optimization

| Layer | File(s) | Summary |
|---|---|---|
| Configuration Flag | examples/auto_deploy/model_registry/configs/super_v3.yaml | Adds enable_trtllm_gen_internal_routing: true to model transforms configuration. |
| Internal Routing Custom Op | tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/trtllm_moe.py | Extends trtllm_nvfp4_trtllm_gen_moe_fused API to accept optional router_logits, routing_bias, and routing parameters. Implements _trtllm_nvfp4_trtllm_gen_moe_impl with routing mode validation, conditional kernel argument wiring for internal vs. external routing, and all-to-all path handling. |
| Routing Metadata Extraction | tensorrt_llm/_torch/auto_deploy/transform/library/fused_moe.py | Adds helpers to trace routing tensors back to trtllm.noaux_tc_op producers and extract DeepSeekV3-style routing parameters. Implements _node_uses_moe_alltoall to detect when internal routing should be skipped. |
| Transform Orchestration | tensorrt_llm/_torch/auto_deploy/transform/library/fused_moe.py | Updates _stack_nvfp4_trtllm_gen_moe_weights to conditionally build kernel arguments from extracted routing metadata or fall back to external routing. Adds enable_trtllm_gen_internal_routing to FuseNVFP4MoeConfig and threads it through the trtllm_gen backend. |
| Auxiliary Stream MoE Transform | tensorrt_llm/_torch/auto_deploy/transform/library/multi_stream_moe.py | Introduces MultiStreamMOE transform that rewires shared-expert computation to auxiliary CUDA streams. Implements merge-node detection, branch classification, and graph insertion of stream begin/end/wait markers. |
| Test Coverage | tests/unittest/auto_deploy/singlegpu/transformations/library/test_moe_fusion.py | Adds parameterized test for internal routing extraction covering direct and EP-masked routing scenarios. |
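
The "routing mode validation" mentioned for the custom op can be pictured with a small, self-contained sketch. Everything here is illustrative: select_routing_mode, selected_experts, routing_weights, and uses_alltoall are hypothetical names, and the exact checks in _trtllm_nvfp4_trtllm_gen_moe_impl may differ. Only router_logits, routing_bias, and the rule that all-to-all stays on the external path come from this PR.

```python
from typing import Any, Optional


def select_routing_mode(
    selected_experts: Optional[Any],
    routing_weights: Optional[Any],
    router_logits: Optional[Any],
    routing_bias: Optional[Any],
    uses_alltoall: bool,
) -> str:
    """Return "internal" or "external", or raise on inconsistent inputs."""
    has_external = selected_experts is not None and routing_weights is not None
    has_internal = router_logits is not None

    # A bias without logits cannot be consumed by either path.
    if routing_bias is not None and not has_internal:
        raise ValueError("routing_bias requires router_logits")

    # All-to-all dispatch needs routing decisions before the fused op runs,
    # so it always stays on the external path.
    if uses_alltoall:
        if not has_external:
            raise ValueError("all-to-all requires external routing tensors")
        return "external"

    if has_internal and has_external:
        raise ValueError("pass router_logits or external routing tensors, not both")
    if has_internal:
        return "internal"
    if has_external:
        return "external"
    raise ValueError("no routing inputs provided")


if __name__ == "__main__":
    print(select_routing_mode(None, None, [0.1, 0.9], None, uses_alltoall=False))
    print(select_routing_mode([1], [1.0], None, None, uses_alltoall=True))
```

Raising on ambiguous input, rather than silently preferring one path, is a design choice of this sketch; it makes a mis-wired graph rewrite fail loudly instead of producing subtly different routing.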

Sequence Diagram(s)

The custom op routing flow and transform orchestration are visualized in the hidden review stack artifact above.

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description provides a concise explanation of the change objective but lacks detailed description, test coverage documentation, and completed PR checklist items that the template requires. | Add detailed explanation of the change in the Description section, list relevant test cases in Test Coverage, and complete the PR Checklist items to confirm alignment with coding guidelines and test coverage requirements. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 43.75% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title '[None][feat] enable TRTLLM-Gen internal routing' clearly describes the main feature being added and is directly related to the changeset, which implements internal routing support for TRTLLM-Gen MoE operations. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/auto_deploy/transform/library/multi_stream_moe.py (1)

195-223: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Rewire every shared-branch root to the aux stream, not just the first one.

first_shared only covers one direct consumer of fork_point. Shared-expert MLPs commonly fan out immediately into parallel gate/up projections, so sibling roots keep reading fork_point on the main stream. That splits one logical shared branch across streams, and the later end_aux/wait_aux nodes no longer bracket the full shared-expert computation.

Suggested fix
-        shared_nodes.sort(key=lambda n: node_order.get(n, 0))
-        first_shared = shared_nodes[0]
-
-        # Sanity check: the first shared op must directly consume the fork
-        # point so we can wire begin_aux_stream_passthrough into it.
-        if fork_point not in first_shared.all_input_nodes:
+        shared_nodes.sort(key=lambda n: node_order.get(n, 0))
+        root_shared_nodes = [n for n in shared_nodes if fork_point in n.all_input_nodes]
+        if not root_shared_nodes:
             ad_logger.warning(
-                f"First shared-expert op ({first_shared.name}) does not directly "
-                f"consume fork point ({fork_point.name}); skipping."
+                f"No shared-expert root directly consumes fork point ({fork_point.name}); skipping."
             )
             continue
+        first_shared = root_shared_nodes[0]
 
         with graph.inserting_before(first_shared):
             begin_aux_node = graph.call_function(
                 begin_aux_stream_passthrough,
                 args=(fork_point,),
             )
 
-        first_shared.args = tuple(
-            begin_aux_node if arg is fork_point else arg for arg in first_shared.args
-        )
+        for root in root_shared_nodes:
+            root.args = tuple(
+                begin_aux_node if arg is fork_point else arg for arg in root.args
+            )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/auto_deploy/transform/library/multi_stream_moe.py` around
lines 195 - 223, The current code only rewires the single first_shared consumer
to read from the begin_aux_node, leaving sibling shared roots still tied to
fork_point; change the rewrite so that every shared branch root in shared_nodes
that directly consumes fork_point is updated to use begin_aux_node instead of
fork_point. Locate the block where begin_aux_node is created via
begin_aux_stream_passthrough and replace the single reassignment of
first_shared.args with a loop (or comprehension) over shared_nodes that for each
node checks its args (or all_input_nodes membership) and replaces any arg that
is fork_point with begin_aux_node; ensure this uses the same begin_aux_node and
preserves nodes that do not reference fork_point so the entire shared-expert
subtree executes on the aux stream and remains bracketed by the corresponding
end_aux/wait_aux nodes.
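
The effect of the suggested fix can be seen on a toy graph. The sketch below does not use torch.fx; the Node class is a hypothetical stand-in that models only args and all_input_nodes, and rewire_shared_roots is an illustrative helper name. It shows that when a shared-expert MLP fans out into gate and up projections, both roots must be rewired to the aux-stream passthrough node.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # identity-based equality, like graph nodes
class Node:
    name: str
    args: tuple = ()

    @property
    def all_input_nodes(self):
        return [a for a in self.args if isinstance(a, Node)]


def rewire_shared_roots(fork_point, shared_nodes, begin_aux_node):
    """Point every shared node that reads fork_point at begin_aux_node instead."""
    roots = [n for n in shared_nodes if fork_point in n.all_input_nodes]
    for root in roots:
        root.args = tuple(
            begin_aux_node if arg is fork_point else arg for arg in root.args
        )
    return roots


if __name__ == "__main__":
    fork = Node("fork_point")
    gate = Node("gate_proj", (fork,))
    up = Node("up_proj", (fork,))
    act = Node("act", (gate,))
    mul = Node("mul", (act, up))
    begin_aux = Node("begin_aux_stream_passthrough", (fork,))

    roots = rewire_shared_roots(fork, [gate, up, act, mul], begin_aux)
    print([r.name for r in roots])        # both projections are roots
    print(gate.args[0].name, up.args[0].name)
```

Rewiring only the first root would leave up_proj reading fork_point directly, which is the split-across-streams situation the comment describes.
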
🧹 Nitpick comments (2)
examples/auto_deploy/model_registry/configs/super_v3.yaml (1)

54-56: Add perf coverage before enabling this path by default.

This turns on a MoE routing hot path for trtllm_gen, but I don't see a matching perf sanity/test-list update in the diff to catch latency or throughput regressions. Please add a tests/integration/defs/perf/test_perf_sanity.py case plus a tests/integration/test_lists/test-db/l0_perf.yml entry, and the corresponding QA llm_perf_*.yml update if you want scheduled coverage.

As per coding guidelines, "If the PR touches performance-sensitive paths ... check whether a perf test entry is present or updated in: (a) tests/integration/test_lists/test-db/l0_perf.yml ... and (b) tests/integration/test_lists/qa/llm_perf_*.yml ..."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/auto_deploy/model_registry/configs/super_v3.yaml` around lines 54 -
56, The new config enables a MoE routing hot path (fuse_nvfp4_moe with backend:
trtllm_gen and enable_trtllm_gen_internal_routing: true) but no perf coverage
was added; add a performance sanity test and test-list entries: create a
tests/integration/defs/perf/test_perf_sanity.py that exercises the trtllm_gen
MoE routing case (reference the fuse_nvfp4_moe config name in the test), add a
corresponding entry in tests/integration/test_lists/test-db/l0_perf.yml to
include that test, and update the appropriate QA list under
tests/integration/test_lists/qa/ (llm_perf_*.yml) so this path is scheduled for
perf runs.
tests/unittest/auto_deploy/singlegpu/transformations/library/test_moe_fusion.py (1)

1631-1658: ⚡ Quick win

Cover the transform rewrite, not just the helper.

This only proves _extract_noaux_internal_routing() can parse a hand-built graph. The risky behavior in this PR is the fuse_nvfp4_moe rewrite that drops external routing tensors, threads router_logits/routing_bias into trtllm_nvfp4_trtllm_gen_moe_fused, and stays on the external path for all-to-all. A regression there would still pass this test. Please add a transform-level regression for both the internal-routing rewrite and the all-to-all fallback.

As per coding guidelines, "Coverage expectations: Assess whether new/changed tests cover happy path, important edge cases, and failure modes relevant to the feature or fix."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@tests/unittest/auto_deploy/singlegpu/transformations/library/test_moe_fusion.py`
around lines 1631 - 1658, The current unit test only validates the helper
_extract_noaux_internal_routing on a hand-built graph; you must add
transform-level tests that run the actual fuse_nvfp4_moe rewrite and assert its
behavior for both internal-routing fusion and the all-to-all fallback. Create
two tests (or one parametrized) that build an FX graph with real external
routing tensors (router_logits, routing_bias) and the noaux op, apply the
transformation function fuse_nvfp4_moe, and then assert: (1) when
internal-routing is detected the graph contains a single
trtllm_nvfp4_trtllm_gen_moe_fused node and router_logits/routing_bias are
threaded into that node, and (2) when routing is externally required (all-to-all
fallback case) the transform does not drop external routing tensors and keeps
the all-to-all path intact; include the with_ep_mask True/False cases similar to
test_nvfp4_trtllm_gen_internal_routing to cover masking behavior. Ensure
assertions inspect node targets/args in the transformed FX graph to uniquely
identify fuse_nvfp4_moe and trtllm_nvfp4_trtllm_gen_moe_fused behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tensorrt_llm/_torch/auto_deploy/transform/library/fused_moe.py`:
- Around line 1-2: This file is missing the required NVIDIA copyright header; open
fused_moe.py (module fused_moe) and insert the standard NVIDIA source header
(with the current modification year) at the very top of the file before any
imports (before the existing import math / import operator lines); ensure the
header text matches the project's canonical copyright header used in other .py
files and retains any required SPDX or license lines.

In
`@tests/unittest/auto_deploy/singlegpu/transformations/library/test_moe_fusion.py`:
- Around line 1-2: Add the required NVIDIA copyright header block at the very
top of this modified Python file (before the existing import lines such as
"import json" and "import operator") in the same format used by other source
files in the repo and update the modification year to the current year; ensure
the header is a proper multi-line comment/docblock matching the project's
standard and leave the rest of the file (including imports and tests in
test_moe_fusion.py) unchanged below the header.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d26d551d-4218-4461-a57e-0bd2562bbf93

📥 Commits

Reviewing files that changed from the base of the PR and between 9547230 and 39322a9.

📒 Files selected for processing (5)
  • examples/auto_deploy/model_registry/configs/super_v3.yaml
  • tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/trtllm_moe.py
  • tensorrt_llm/_torch/auto_deploy/transform/library/fused_moe.py
  • tensorrt_llm/_torch/auto_deploy/transform/library/multi_stream_moe.py
  • tests/unittest/auto_deploy/singlegpu/transformations/library/test_moe_fusion.py

Comment thread tensorrt_llm/_torch/auto_deploy/transform/library/fused_moe.py
@tensorrt-cicd
Collaborator

PR_Github #47733 [ run ] triggered by Bot. Commit: 39322a9 Link to invocation

Collaborator

@galagam galagam left a comment

LGTM

Comment thread tensorrt_llm/_torch/auto_deploy/transform/library/fused_moe.py Outdated
@tensorrt-cicd
Collaborator

PR_Github #47733 [ run ] completed with state SUCCESS. Commit: 39322a9
/LLM/main/L0_MergeRequest_PR pipeline #37628 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

Route matched noaux NVFP4 MoE patterns to TRTLLM-Gen internal routing by default when possible.

Keep the external-routing path for all-to-all cases, preserve routing_bias dtype for TRTLLM-Gen, and cover direct and EP-masked matcher shapes.

Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
@tcherckez-nvidia tcherckez-nvidia force-pushed the route-nvfp4-moe-to-trtllmgen branch from 39322a9 to 66129fd Compare May 12, 2026 05:54
@tcherckez-nvidia
Collaborator Author

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"

@tensorrt-cicd
Collaborator

PR_Github #47892 [ run ] triggered by Bot. Commit: 66129fd Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #47892 [ run ] completed with state SUCCESS. Commit: 66129fd
/LLM/main/L0_MergeRequest_PR pipeline #37743 completed with status: 'FAILURE'

Comment thread tensorrt_llm/_torch/auto_deploy/transform/library/fused_moe.py Outdated
Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com>
@tcherckez-nvidia
Collaborator Author

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"

@tensorrt-cicd
Collaborator

PR_Github #47961 [ run ] triggered by Bot. Commit: 341c725 Link to invocation

@tcherckez-nvidia tcherckez-nvidia enabled auto-merge (squash) May 12, 2026 14:39
@tensorrt-cicd
Collaborator

PR_Github #47961 [ run ] completed with state SUCCESS. Commit: 341c725
/LLM/main/L0_MergeRequest_PR pipeline #37802 completed with status: 'FAILURE'

@tcherckez-nvidia
Collaborator Author

/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"

@tensorrt-cicd
Collaborator

PR_Github #48099 [ run ] triggered by Bot. Commit: 341c725 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48099 [ run ] completed with state SUCCESS. Commit: 341c725
/LLM/main/L0_MergeRequest_PR pipeline #37929 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

@tcherckez-nvidia
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #48135 [ run ] triggered by Bot. Commit: 341c725 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #48135 [ run ] completed with state SUCCESS. Commit: 341c725
/LLM/main/L0_MergeRequest_PR pipeline #37960 completed with status: 'SUCCESS'

CI Report

Link to invocation

@tcherckez-nvidia tcherckez-nvidia merged commit 2fd65a5 into NVIDIA:main May 13, 2026
7 checks passed
@tcherckez-nvidia tcherckez-nvidia deleted the route-nvfp4-moe-to-trtllmgen branch May 13, 2026 11:21
greg-kwasniewski1 added a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request May 15, 2026
…p4_trtllm_gen_moe_fused

The custom_op-decorated real impl of trtllm_nvfp4_trtllm_gen_moe_fused
lost its batch_info_host parameter when main was merged into this branch
(merge d8f3517 pulled in NVIDIA#13997's TRTLLM-Gen internal-routing rewrite,
which dropped batch_info_host while introducing router_logits/routing_bias/
top_k/n_group/topk_group/routed_scaling_factor). The register_fake impl
kept batch_info_host, so the registered schema and the fake schema went
out of sync.

The MoE call site (added by this PR's runtime-max-tokens feature) passes
batch_info_host uniformly to every MoE op. The dispatcher rejects it on
trtllm_nvfp4_trtllm_gen_moe_fused with:

  RuntimeError: Unknown keyword argument 'batch_info_host' for operator
    'auto_deploy::trtllm_nvfp4_trtllm_gen_moe_fused'.

surfaced by TestNemotronSuperV3::test_accuracy[nvfp4-4-attn_dp_on-trtllm]
on DGX_B200-4_GPUs-AutoDeploy-1.

Re-add batch_info_host to the real-impl signature in the same position
the fake impl already has it, and forward it into the underlying
_trtllm_nvfp4_trtllm_gen_moe_impl helper (which already accepts and
threads it through).

Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
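
The real-impl versus fake-impl drift described in this commit message is the kind of mismatch a small signature check could catch early. The sketch below is an assumption-laden illustration using only the standard library: real_impl and fake_impl are hypothetical stand-ins with shortened parameter lists, not the actual op signatures, and signature_drift is an illustrative helper name.

```python
import inspect


def signature_drift(real_impl, fake_impl):
    """Return (missing_in_real, missing_in_fake) parameter-name lists."""
    real = list(inspect.signature(real_impl).parameters)
    fake = list(inspect.signature(fake_impl).parameters)
    missing_in_real = [p for p in fake if p not in real]
    missing_in_fake = [p for p in real if p not in fake]
    return missing_in_real, missing_in_fake


# Hypothetical stand-ins: the real impl lost batch_info_host in a merge,
# while the fake impl kept it.
def real_impl(x, router_logits=None, routing_bias=None):
    return x


def fake_impl(x, router_logits=None, routing_bias=None, batch_info_host=None):
    return x


if __name__ == "__main__":
    print(signature_drift(real_impl, fake_impl))
```

A unit test asserting that both lists are empty for each registered op would fail at the merge rather than at dispatch time on a multi-GPU CI stage.
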
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request May 19, 2026
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com>
3 participants