silu_mul_fused kernel#2578

Merged
vgokhale merged 9 commits into main from silu_mul_fusion on May 15, 2026
@Chi-Chu319
Contributor

Adds a silu_mul_fused kernel for GLM-47-FP8 TP4 and Kimi-K25 TP4.
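
For readers unfamiliar with the operation, here is a minimal plain-Python sketch of what a SiLU-and-multiply step conventionally computes in a gated MLP or MoE layer: the input row is split into a gate half and an up half, and the output is silu(gate) * up. This is not the Triton kernel from this PR. The names `silu` and `silu_mul_reference` and the [gate | up] half-split layout are assumptions for illustration; the actual kernel may lay out its input differently.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation, x * sigmoid(x), written to avoid overflow."""
    if x >= 0.0:
        return x / (1.0 + math.exp(-x))
    # For negative x, exp(x) stays in (0, 1), so this form cannot overflow.
    e = math.exp(x)
    return x * e / (1.0 + e)


def silu_mul_reference(row: list[float]) -> list[float]:
    """Unfused reference: treat row as [gate | up] and return silu(gate) * up.

    The half-split layout is an assumption made for this sketch.
    """
    if len(row) % 2 != 0:
        raise ValueError("row length must be even")
    d = len(row) // 2
    gate, up = row[:d], row[d:]
    return [silu(g) * u for g, u in zip(gate, up)]
```

A fused kernel would typically perform the activation and the multiply in one pass over the data instead of materializing the intermediate silu(gate) tensor, which is presumably the motivation for fusing here.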

Benchmark command:

python op_tests/op_benchmarks/triton/bench_moe.py -bench_silu_mul --model glm47fp8_tp4,kimik25_tp4 -no_bench_stage2 -dtype bf16

Benchmark result: (image, not reproduced here)

@Chi-Chu319 requested a review from juuso-oskari on April 1, 2026, 11:47
@Chi-Chu319 marked this pull request as ready for review on April 1, 2026, 11:47
@Chi-Chu319 requested a review from a team on April 1, 2026, 11:47
@github-actions
Contributor

github-actions Bot commented Apr 1, 2026

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

Label            Tests
ci:triton-355    Run Triton tests on MI355 in addition to MI325
ci:sglang        SGLang integration tests
ci:atom          ATOM benchmark (DeepSeek-R1 + GPT-OSS)
ci:vllm          vLLM benchmark
ci:all           All of the above

Add labels via the sidebar or gh pr edit 2578 --add-label <label>

@Chi-Chu319
Contributor Author

Closed because of #2592

@Chi-Chu319 reopened this on May 15, 2026
@github-actions
Contributor

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

Label            Tests
ci:triton-300x   Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X
ci:sglang        SGLang integration tests: DeepSeek-R1-MXFP4 accuracy, Qwen 3.5 accuracy
ci:atom          ATOM benchmark: DeepSeek-R1-0528, GPT-OSS-120B
ci:atom_full     ATOM accuracy suite for PR and main models from ATOM models_accuracy.json
ci:vllm          vLLM benchmark: GPT-OSS-120B, DeepSeek-R1-0528, Kimi-K2.5
ci:all           All standard extended tests (excludes ci:atom_full)

Only add ci:atom_full for FlyDSL or Triton upgrades.
Add labels via the sidebar or gh pr edit 2578 --add-label <label>

Chi-Chu319 and others added 2 commits May 15, 2026 05:45
…revious implementation. Update activation module to include new fused_silu_mul function. Adjust benchmarks and tests to utilize the new function. Remove deprecated fused_silu_mul files.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@vgokhale merged commit 683b6e8 into main on May 15, 2026
33 of 34 checks passed
@vgokhale deleted the silu_mul_fusion branch on May 15, 2026, 21:17
2 participants