Adding support for fused_moe_gmm by NicoGrande · Pull Request #3627 · AI-Hypercomputer/maxtext

NicoGrande · 2026-04-09T22:18:56Z

Description

This PR adds support for the tpu-inference fused_moe_gmm kernel in the MaxText MoE inference codepath. Initial results using this kernel show up to a ~4x increase in generation throughput when testing with qwen3-30b-a3b.

Additionally, this PR introduces a second optimization to MaxText which pre-fuses the MoE weight kernels such that they can be efficiently passed into the fused_moe_gmm kernel. We show the impact of these optimizations below with autoregressive generation step times:

Baseline (MaxText sparse_matmul MoE): 28.353 ms

Fused MoE (prefuse_moe_weights=False): 20.432 ms

Fused MoE (prefuse_moe_weights=True): 6.114 ms
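To make the relative gains explicit, the step times above can be turned into speedup ratios. This is only an arithmetic sketch: the dictionary keys and the `speedup` helper are illustrative names, not part of MaxText, and reading a step-time ratio as a throughput ratio assumes the batch size is the same across the three runs.

```python
# Step times (ms) copied from the PR description above.
# The key names are illustrative labels, not MaxText config values.
step_ms = {
    "baseline_sparse_matmul": 28.353,
    "fused_no_prefuse": 20.432,
    "fused_prefuse": 6.114,
}


def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio of step times; a value above 1 means `after_ms` is faster."""
    return before_ms / after_ms


baseline = step_ms["baseline_sparse_matmul"]
for name, ms in step_ms.items():
    print(f"{name}: {ms:.3f} ms, {speedup(baseline, ms):.2f}x vs baseline")

# Gain attributable to pre-fusing alone, with the fused kernel already enabled.
prefuse_gain = speedup(step_ms["fused_no_prefuse"], step_ms["fused_prefuse"])
print(f"prefuse on top of fused kernel: {prefuse_gain:.2f}x")
```

By these numbers the fused kernel alone is about 1.39x faster per step than the baseline, and pre-fusing the weights brings the total to about 4.64x, so most of the gain comes from the pre-fusing step.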

Tests

This PR adds new tests to tests/unit/moe_test.py. Additionally, this PR was tested end to end with both qwen3-30b-a3b and qwen3-235b-a22b.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-04-09T22:25:03Z

Codecov Report

❌ Patch coverage is 45.07042% with 39 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/maxtext/layers/moe.py	23.33%	20 Missing and 3 partials ⚠️
src/maxtext/utils/model_creation_utils.py	60.97%	13 Missing and 3 partials ⚠️

RissyRan

Overall LGTM. Could you add your tests to showcase the functionality? Thanks!

gobbleturk

Awesome!

NuojCheng

LGTM

NicoGrande force-pushed the nicogrande/fused-moe-gmm branch 2 times, most recently from 4a22680 to 814348f · April 9, 2026 22:32

RissyRan reviewed Apr 9, 2026

Comment thread tests/unit/fused_moe_test.py Outdated

NicoGrande force-pushed the nicogrande/fused-moe-gmm branch 4 times, most recently from e2a24f7 to b5d61be · April 10, 2026 17:56

NicoGrande requested review from gpolovets1 and mitalisi as code owners April 10, 2026 17:56

NicoGrande requested review from Lumosis, jrplatin, mailvijayasingh and patemotter as code owners April 10, 2026 17:56