add swiglu a4w4 moe path for gpt-oss model #2972
Merged
Pull request overview
Adds a GPT-OSS-specific SwiGLU (and bias-aware) activation/gating path to support MXFP4 (a4w4) MoE flows, integrating it across the HIP activation kernels, FlyDSL split-K postprocessing, and fused_moe dispatch/config selection.
Changes:
- Introduces HIP kernels and pybind exports for `swiglu_and_mul` plus bias-aware `*_and_mul_bias` variants.
- Extends FlyDSL stage1 post-processing (`silu_and_mul_fq`) to support `act="swiglu"` and optional per-expert fp32 bias using `topk_ids`.
- Updates fused MoE dispatch/heuristics/configs to route GPT-OSS MXFP4 SwiGLU cases through FlyDSL/CK-Tile appropriately and adds tuned GPT-OSS fp4 fmoe configs.
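For readers unfamiliar with the bias-aware `*_and_mul_bias` semantics, here is a minimal pure-Python reference sketch. It is not the HIP or FlyDSL kernel. It assumes each row is laid out as `[gate..., up...]` (split halves), that `topk_ids` holds one expert id per row, and that the SwiGLU variant follows the commonly cited GPT-OSS formulation with `alpha=1.702` and `limit=7.0`. The helper names `swiglu_gptoss` and `act_and_mul_bias` are hypothetical and exist only for illustration.

```python
import math


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def silu(x: float) -> float:
    return x * _sigmoid(x)


def swiglu_gptoss(gate: float, up: float, alpha: float = 1.702, limit: float = 7.0) -> float:
    # Assumed GPT-OSS-style SwiGLU: clamp, scaled-sigmoid gate, (up + 1) linear term.
    gate = min(gate, limit)                # gate is clamped from above only
    up = max(-limit, min(up, limit))       # up is clamped on both sides
    return gate * _sigmoid(alpha * gate) * (up + 1.0)


def act_and_mul_bias(rows, topk_ids, bias, act="silu"):
    """Reference for a bias-aware activation-and-multiply.

    rows:     list of rows, each of length 2*d, assumed layout [gate..., up...]
    topk_ids: expert id for each row
    bias:     per-expert bias, bias[e] has length 2*d
    """
    out = []
    for row, expert in zip(rows, topk_ids):
        d = len(row) // 2
        # Add the selected expert's bias before the activation.
        biased = [v + b for v, b in zip(row, bias[expert])]
        gate, up = biased[:d], biased[d:]
        if act == "silu":
            out.append([silu(g) * u for g, u in zip(gate, up)])
        elif act == "swiglu":
            out.append([swiglu_gptoss(g, u) for g, u in zip(gate, up)])
        else:
            raise ValueError(f"unsupported act: {act!r}")
    return out
```

The real kernels operate on bf16 inputs with fp32 bias and may use a different gate/up layout; this sketch only pins down the order of operations (bias lookup by expert id, then activation, then multiply).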
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| csrc/kernels/activation_kernels.cu | Adds SwiGLU and per-expert-bias activation kernels plus launch/dispatch plumbing. |
| csrc/include/rocm_ops.hpp | Exposes new activation entry points to Python via pybind. |
| csrc/include/activation.h | Declares new activation APIs for linkage/exports. |
| csrc/ck_tile_gemm_moe_2stages/moe_cktile2stages.cu | Adds dtype validation for optional fp32 expert bias (stage1/stage2) and output dtype checks. |
| csrc/ck_gemm_moe_2stages_codegen/gemm_moe_tune.py | Skips CK2stages codegen for unsupported SwiGLU MXFP4 cases (defers to FlyDSL/CK-Tile). |
| aiter/ops/flydsl/moe_kernels.py | Threads activation/bias options through stage1 split-K postprocessing and adds topk_ids plumbing. |
| aiter/ops/flydsl/kernels/silu_and_mul_fq.py | Generalizes fused postprocess kernel to act in {silu, swiglu} and optional bias/topk_ids. |
| aiter/ops/flydsl/kernels/mixed_moe_gemm_2stage.py | Refactors bias loads and aligns fp4 scale rounding with fp4_utils behavior. |
| aiter/ops/activation.py | Adds Python compile_ops stubs for the new activation APIs. |
| aiter/fused_moe.py | Adds GPT-OSS SwiGLU MXFP4 dispatch heuristics, bias normalization, and topk_ids propagation for split-K bias. |
| aiter/configs/model_configs/gptoss_fp4_tuned_fmoe.csv | Adds tuned FlyDSL configs for GPT-OSS fp4 SwiGLU MoE. |
coderfeli
reviewed
May 6, 2026
Co-authored-by: Cursor <cursoragent@cursor.com>
coderfeli
approved these changes
May 8, 2026
Motivation
Add GPT-OSS SwiGLU MXFP4 MoE support in AITER.
GPT-OSS uses a SwiGLU MoE path with MXFP4 activations/weights and fp32 expert bias. The existing dispatch could fall back to unsupported CK2stages SwiGLU codegen for untuned shapes, and some paths did not correctly handle GPT-OSS gate/up layout or bias semantics.
Technical Details
- `swiglu_and_mul`
- `silu_and_mul_bias`
- `swiglu_and_mul_bias`
- `act` parameter.
- `fp4_utils` behavior.
- `fused_moe.py` dispatch for GPT-OSS MXFP4 SwiGLU:
- `GPTOSS_USE_GENERIC_SWIGLU_MXFP4_LAYOUT=0` behavior.
- `256` through `32768`.
Test Plan
- `python3 -m py_compile aiter/fused_moe.py aiter/ops/flydsl/moe_kernels.py aiter/ops/flydsl/kernels/mixed_moe_gemm_2stage.py`
- `python3 -m py_compile csrc/ck_gemm_moe_2stages_codegen/gemm_moe_tune.py`
- `swiglu_and_mul_bias` with bf16 input + fp32 bias
- `silu_and_mul_bias` with bf16 input + fp32 bias
Test Result
Notes:
Current GPT-OSS tuned FlyDSL configs use ksplit=0, so bias is fused in the FlyDSL GEMM path for these tuned shapes. Split-K FlyDSL/CK-Tile paths instead apply bias in the post-processing activation step, because bias must be applied exactly once, after the K reduction.
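The "only once, after K reduction" constraint can be illustrated with a small pure-Python sketch. It is a toy matrix-vector product, not the FlyDSL or CK-Tile code; the function name `gemv_splitk` and the `bias_in_each_split` flag are hypothetical, and here `ksplit` simply counts the K chunks (so `ksplit=1` means no split).

```python
def gemv_splitk(x, w, bias, ksplit, bias_in_each_split=False):
    """Compute y[n] = sum_k x[k] * w[k][n] + bias[n] using ksplit chunks along K.

    bias_in_each_split=True models the incorrect variant in which every
    K-split adds the bias to its partial sum before the reduction.
    """
    K, N = len(x), len(bias)
    chunk = (K + ksplit - 1) // ksplit
    y = [0.0] * N
    for s in range(ksplit):
        lo, hi = s * chunk, min((s + 1) * chunk, K)
        # Partial dot products over this split's slice of K.
        partial = [sum(x[k] * w[k][n] for k in range(lo, hi)) for n in range(N)]
        if bias_in_each_split:
            partial = [p + b for p, b in zip(partial, bias)]  # bias counted per split
        # Reduction across splits.
        y = [a + p for a, p in zip(y, partial)]
    if not bias_in_each_split:
        # Correct: bias applied once, after the K reduction.
        y = [a + b for a, b in zip(y, bias)]
    return y


x = [1.0, 2.0, 3.0, 4.0]
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
bias = [0.5, -1.0]

print(gemv_splitk(x, w, bias, ksplit=2))                           # [12.5, 4.0]
print(gemv_splitk(x, w, bias, ksplit=2, bias_in_each_split=True))  # [13.0, 3.0]
```

With two splits, adding bias inside each split counts it twice, which is why the split-K paths defer bias to the post-processing step while the unsplit path can fuse it into the GEMM.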