Optimize SM90 MegaMoE split path #1

Merged: AichenF merged 1 commit into megamoe_sm90 from sm90-megamoe-split-optimize on Jun 1, 2026
liz-badada (Collaborator) commented May 31, 2026

Summary

  • Enable optimized SM90 MegaMoE split L1/L2 path.
  • Centralize retained SM90 split heuristics.
  • Add split-vs-one correctness and benchmark support.

Performance

| M | Final (us) | vs DeepEP | vs Original | vs Prev two-kernel |
|---|---|---|---|---|
| 8 | 745.5 | 2.04x | 1.26x | 1.26x |
| 16 | 769.1 | 2.11x | 1.44–1.60x | 1.50x |
| 32 | 760.6 | 2.17x | 1.50x | 1.70x |
| 64 | 791.0 | 2.08x | 1.45–1.51x | 1.53x |
| 128 | 818.5 | 2.04x | 1.28x | 1.50x |
| 256 | 1121.3 | 1.55x | 1.43x | 1.45x |
| 260 | 1222.0 | 1.42x | 1.30x | 1.27x |
| 512 | 1921.7 | 1.32–1.33x | 1.26–1.43x | 1.34x |
| 819 | 2798.4 | 1.20x | 1.38–1.42x | 1.41x |
| 1024 | 3278.0 | 1.29x | 1.34–1.52x | 0.99x |
| 2048 | 6098.0 | 1.20x | 1.34–1.37x | 0.98x |
| 4096 | 11430.0 | 1.16x | 1.37–1.38x | 1.00x |
| 8192 | 22286.0 | 1.14x | 1.39x | 1.02x |

H20

| M | Latency (us) | Achieved TFLOPS | TC SOL | Roofline bound (us) | Roofline SOL | Bound |
|---|---|---|---|---|---|---|
| 8 | 745.5 | 7.6 | 2.6% | 255.5 | 34.3% | HBM/overhead |
| 16 | 769.1 | 14.7 | 5.0% | 289.3 | 37.6% | HBM/overhead |
| 32 | 760.6 | 29.6 | 10.0% | 295.0 | 38.8% | HBM/overhead |
| 64 | 791.0 | 57.0 | 19.3% | 296.5 | 37.5% | HBM/overhead |
| 128 | 818.5 | 110.2 | 37.2% | 304.7 | 37.2% | compute |
| 256 | 1121.3 | 160.9 | 54.3% | 609.4 | 54.3% | compute |
| 260 | 1222.0 | 149.9 | 50.7% | 618.9 | 50.7% | compute |
| 512 | 1921.7 | 187.7 | 63.4% | 1218.8 | 63.4% | compute |
| 819 | 2798.4 | 206.2 | 69.7% | 1949.7 | 69.7% | compute |
| 1024 | 3278.0 | 220.1 | 74.4% | 2437.7 | 74.4% | compute |
| 2048 | 6098.0 | 236.7 | 80.0% | 4875.4 | 80.0% | compute |
| 3072 | 8746.0 | 247.5 | 83.6% | 7313.1 | 83.6% | compute |
| 4096 | 11430.0 | 252.5 | 85.3% | 9750.7 | 85.3% | compute |
| 8192 | 22286.0 | 259.0 | 87.5% | 19501.5 | 87.5% | compute |
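
For readers checking the H20 numbers, the two percentage columns appear to follow from the others: roofline SOL looks like roofline bound divided by measured latency, and TC SOL looks like achieved TFLOPS divided by a fixed tensor-core peak. The sketch below reproduces a few rows under those assumptions. The peak of 296 TFLOPS is not stated in this PR; it is inferred from the table (110.2 TFLOPS at 37.2%), and the function names are illustrative only.

```python
# Rows copied from the H20 table:
# (M, latency_us, achieved_tflops, roofline_bound_us)
ROWS = [
    (8, 745.5, 7.6, 255.5),
    (128, 818.5, 110.2, 304.7),
    (1024, 3278.0, 220.1, 2437.7),
    (8192, 22286.0, 259.0, 19501.5),
]

# Assumption: peak implied by the table, not stated in the PR.
ASSUMED_PEAK_TFLOPS = 296.0


def tc_sol(achieved_tflops, peak_tflops=ASSUMED_PEAK_TFLOPS):
    """Tensor-core speed-of-light: achieved throughput as a % of peak."""
    return 100.0 * achieved_tflops / peak_tflops


def roofline_sol(latency_us, bound_us):
    """Roofline speed-of-light: ideal (bound) time as a % of measured time."""
    return 100.0 * bound_us / latency_us


for m, latency, tflops, bound in ROWS:
    print(
        f"M={m}: TC SOL {tc_sol(tflops):.1f}%, "
        f"roofline SOL {roofline_sol(latency, bound):.1f}%"
    )
```

With these inputs the computed percentages match the table to one decimal place. The two SOL columns coincide once the bound switches to compute (M >= 128), which is consistent with the roofline bound then being the tensor-core time itself.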

Correctness

| Case | Result |
|---|---|
| Main random M8/M16/M32/M64/M128/M256/M260/M512/M819/M1024/M2048/M3072/M4096/M8192 | max_abs=0 |
| Hopper random M512/M1024/M2048/M3072/M4096 | max_abs=0 |
| Main power-law M1024 / masked M1024 / EPLB hash M2048 | max_abs=0 |
| Hopper power-law M2048 | max_abs=0 |
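
As a rough illustration of what `max_abs=0` reports, the sketch below computes the maximum absolute elementwise difference between two outputs, presumably the split path and the single-kernel ("one") path. It is a minimal stand-in using flat Python lists; the actual test in this PR is not shown here and most likely compares GPU tensors. `max_abs_diff` and the sample values are hypothetical.

```python
def max_abs_diff(reference, candidate):
    """Largest absolute elementwise difference between two flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


# Hypothetical stand-ins for the outputs of the two paths.
one_kernel_out = [0.25, -1.5, 3.0, 0.0]
split_out = [0.25, -1.5, 3.0, 0.0]

# A bitwise-identical result gives a difference of exactly zero.
print(f"max_abs={max_abs_diff(one_kernel_out, split_out)}")
```

A result of exactly 0 is a stronger statement than passing a tolerance check: it means every compared element matched exactly.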

@AichenF AichenF merged commit aa901ca into megamoe_sm90 Jun 1, 2026
AichenF added a commit that referenced this pull request Jun 2, 2026