Adding support for fused_moe_gmm #3627

Merged
copybara-service[bot] merged 1 commit into main from nicogrande/fused-moe-gmm on Apr 21, 2026

@NicoGrande (Collaborator) commented Apr 9, 2026

Description

This PR adds support for the tpu-inference fused_moe_gmm kernel in the MaxText MoE inference codepath. Initial results with this kernel show up to a ~4x increase in generation throughput on qwen3-30b-a3b.
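The PR does not describe the kernel's interface. As a hedged illustration of the math a grouped matmul ("gmm") performs in an MoE layer, here is a plain NumPy reference sketch: tokens are assumed to be pre-sorted so that all tokens routed to the same expert are contiguous, and each expert multiplies only its own slice. The function name `grouped_matmul` and its argument layout are assumptions for illustration, not the tpu-inference API; a fused TPU kernel would do this without a Python-level loop.

```python
import numpy as np


def grouped_matmul(lhs, rhs, group_sizes):
    """Reference grouped matmul (GMM) over contiguous token groups.

    lhs:         [num_tokens, d_in], rows sorted so that tokens routed to the
                 same expert are contiguous.
    rhs:         [num_experts, d_in, d_out], one weight matrix per expert.
    group_sizes: [num_experts], how many consecutive lhs rows each expert owns.
    """
    group_sizes = np.asarray(group_sizes)
    assert lhs.shape[0] == int(group_sizes.sum()), "groups must cover every row"
    out = np.zeros((lhs.shape[0], rhs.shape[2]), dtype=np.result_type(lhs, rhs))
    start = 0
    for expert, size in enumerate(group_sizes):
        end = start + int(size)
        # Each expert multiplies only its own slice of tokens.
        out[start:end] = lhs[start:end] @ rhs[expert]
        start = end
    return out


# Example: 6 tokens, 3 experts; expert 1 receives no tokens.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((6, 4))
weights = rng.standard_normal((3, 4, 5))
sizes = [2, 0, 4]
y = grouped_matmul(tokens, weights, sizes)
print(y.shape)  # (6, 5)
```

Empty groups are handled naturally: a zero-length slice produces no rows.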

Additionally, this PR introduces a second optimization to MaxText that pre-fuses the MoE weight kernels so they can be passed efficiently into the fused_moe_gmm kernel. The impact of these optimizations on autoregressive generation step time (lower is better) is shown below:

Baseline (MaxText sparse_matmul MoE): 28.353 ms

Fused MoE (prefuse_moe_weights=False): 20.432 ms

Fused MoE (prefuse_moe_weights=True): 6.114 ms
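The PR text does not specify the fused weight layout. As a rough sketch of what pre-fusing can mean for a gated MoE MLP, assume (hypothetically) that each expert has separate gate and up projections named `wi_0` and `wi_1`, and that fusion is a simple concatenation along the output dimension. The two per-expert matmuls then collapse into one matmul over a wider weight, and the concatenation runs once ahead of time instead of on every step. All helper names below are illustrative, not MaxText's or tpu-inference's API.

```python
import numpy as np


def prefuse_gate_up(wi_0, wi_1):
    """Hypothetical pre-fusion step: concatenate each expert's gate (wi_0) and
    up (wi_1) projections along the output dim -> [experts, d_model, 2 * d_ff]."""
    return np.concatenate([wi_0, wi_1], axis=-1)


def silu(x):
    return x / (1.0 + np.exp(-x))


def expert_mlp_fused(x, fused_w, wo, expert):
    """Gated MLP for one expert using a single matmul on the pre-fused weights."""
    h = x @ fused_w[expert]             # [tokens, 2 * d_ff]
    gate, up = np.split(h, 2, axis=-1)  # recover the gate and up halves
    return (silu(gate) * up) @ wo[expert]


def expert_mlp_unfused(x, wi_0, wi_1, wo, expert):
    """Same computation with two separate matmuls, for comparison."""
    return (silu(x @ wi_0[expert]) * (x @ wi_1[expert])) @ wo[expert]


rng = np.random.default_rng(0)
num_experts, d_model, d_ff = 4, 8, 16
wi_0 = rng.standard_normal((num_experts, d_model, d_ff))
wi_1 = rng.standard_normal((num_experts, d_model, d_ff))
wo = rng.standard_normal((num_experts, d_ff, d_model))

fused = prefuse_gate_up(wi_0, wi_1)  # done once, e.g. when the model is loaded
x = rng.standard_normal((3, d_model))
print(np.allclose(expert_mlp_fused(x, fused, wo, 1),
                  expert_mlp_unfused(x, wi_0, wi_1, wo, 1)))  # True
```

In this sketch, each step pays for one wider matmul per expert and never re-lays out the weights; the actual layout expected by fused_moe_gmm may differ.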

Tests

This PR adds new tests to tests/unit/moe_test.py. It was also tested end-to-end with both qwen3-30b-a3b and qwen3-235b-a22b.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@codecov Bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 45.07042% with 39 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/maxtext/layers/moe.py | 23.33% | 20 missing and 3 partials ⚠️
src/maxtext/utils/model_creation_utils.py | 60.97% | 13 missing and 3 partials ⚠️


@NicoGrande force-pushed the nicogrande/fused-moe-gmm branch 2 times, most recently from 4a22680 to 814348f on April 9, 2026 22:32
@RissyRan (Collaborator) left a comment

Overall LGTM. Could you add your tests to showcase the functionality? Thanks!

Comment thread tests/unit/fused_moe_test.py (outdated)
@NicoGrande force-pushed the nicogrande/fused-moe-gmm branch 4 times, most recently from e2a24f7 to b5d61be on April 10, 2026 17:56
Comment thread src/maxtext/configs/inference/vllm.yml
Comment thread src/maxtext/configs/inference/vllm.yml
Comment thread src/maxtext/layers/moe.py
Comment thread src/maxtext/layers/moe.py (outdated)
Comment thread tests/unit/moe_test.py
@gobbleturk (Collaborator) left a comment

Awesome!

@NicoGrande force-pushed the nicogrande/fused-moe-gmm branch from b5d61be to a6cfd99 on April 11, 2026 22:21
Comment thread src/maxtext/configs/inference/vllm.yml
@NicoGrande force-pushed the nicogrande/fused-moe-gmm branch 3 times, most recently from 57d1a60 to be95bcf on April 14, 2026 00:30
Comment thread src/maxtext/layers/moe.py
Comment thread tests/unit/moe_test.py (outdated)
@NuojCheng (Collaborator) left a comment

LGTM

@NicoGrande force-pushed the nicogrande/fused-moe-gmm branch from be95bcf to 611c5bc on April 21, 2026 21:43
@NicoGrande force-pushed the nicogrande/fused-moe-gmm branch from 611c5bc to d0a0744 on April 21, 2026 21:56
@copybara-service Bot merged commit 3fc8a2b into main on Apr 21, 2026
68 of 76 checks passed
@copybara-service Bot deleted the nicogrande/fused-moe-gmm branch on April 21, 2026 23:37
Projects: None yet

4 participants