feat(blog): add MI355X vs B200 GLM-5 FP8 SGLang post by functionstackx · Pull Request #378 · SemiAnalysisAI/InferenceX-app

functionstackx · 2026-05-25T23:08:21Z

Summary

New blog post: AMD MI355X SGLang FP8 undercuts NVIDIA B200 SGLang FP8 per million tokens on GLM-5, 14 weeks after the model's 2026-02-11 release
Peak gap: 1.41x at 18 tok/s/user with MTP (40% cheaper) and 1.36x at 10 tok/s/user without MTP on the 8k/1k workload, single-node
Walks through sgl-project/sglang#21511 (HaiShaw): FP8 KV cache + FP8 attention via TileLang, reusing fused_qk_rope_cat_and_cache_mla for both Q and KV quant on MI355
Covers GLM-5 architecture (744B/40B active, 256 experts top-8, glm_moe_dsa, DSA + MLA, 200K ctx)
All tables sourced from InferenceX 2026-05-20 run (g_runid=26187777287); chart preset linked from both DashboardCTA blocks
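The 14-week figure in the first bullet can be checked from the two dates this PR gives: the 2026-02-11 release and the 2026-05-20 InferenceX run. A minimal sketch using only the Python standard library (the variable names are illustrative, not taken from the post):

```python
from datetime import date

release = date(2026, 2, 11)   # GLM-5 release date stated in the summary
data_run = date(2026, 5, 20)  # InferenceX run date stated in the summary

# Subtracting two dates yields a timedelta; .days is the whole-day gap.
elapsed_days = (data_run - release).days
weeks, extra_days = divmod(elapsed_days, 7)

print(f"{elapsed_days} days = {weeks} weeks + {extra_days} days")
# prints "98 days = 14 weeks + 0 days"
```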

Test plan

Visual check on local dev server (pnpm dev → /blog/mi355x-glm5-fp8-sglang-40-cheaper-than-b200)
Verify chart preset link resolves to the correct GLM-5 FP8 view with i_metric=y_costh and the four series active
Confirm OG image renders and RSS feed picks up the post
Sanity-check the GLM-5 parameter counts (744B/40B, 256 experts) against the official ZAI announcement
Confirm the 2026-05-20 chart numbers in the iso-interactivity table still match once a newer dump publishes

🤖 Generated with Claude Code

Note

Low Risk
Content-only additions (documentation skill and static MDX); no application logic, auth, or data pipeline changes.

Overview
Adds a Claude skill (.claude/skills/write-inferencex-blog/SKILL.md) that documents how to draft InferenceX benchmark posts—source-of-truth priority (CSV vs chart), TCO/cost formulas, slug/frontmatter, MDX sections (DashboardCTA, Figure, FAQ JsonLd), and commit/PR workflow—and points at this post as the AMD-vs-NVIDIA single-node cost template.

Publishes a new MDX article at packages/app/content/blog/mi355x-glm5-fp8-sglang-40-cheaper-than-b200.mdx claiming MI355X SGLang FP8 on GLM-5 8k/1k is up to 40% cheaper per million tokens than B200 (peak 1.41x with MTP at 18 tok/s/user), tied to SGLang PR #21511 and InferenceX PR #1440, with per-concurrency tables, iso-interactivity comparisons (including where B200 wins above ~90 tok/s/user), preset dashboard links, and five FAQ JSON-LD entries.
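For readers unfamiliar with per-million-token cost comparisons, the sketch below shows one common way to derive them from an hourly GPU price and per-GPU throughput. It is an assumption, not the formula from SKILL.md or the post (neither is reproduced in this PR), and the prices and throughputs are hypothetical placeholders rather than InferenceX data.

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tok_per_s_per_gpu: float) -> float:
    """USD per one million tokens for one GPU at steady throughput (assumed formula)."""
    if tok_per_s_per_gpu <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = tok_per_s_per_gpu * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000


def percent_saving(ratio: float) -> float:
    """Saving of the cheaper option in percent, given ratio = expensive / cheap."""
    return (1 - 1 / ratio) * 100


# Hypothetical inputs for illustration only; not InferenceX data.
gpu_a = cost_per_million_tokens(gpu_hourly_usd=3.00, tok_per_s_per_gpu=1000)
gpu_b = cost_per_million_tokens(gpu_hourly_usd=2.00, tok_per_s_per_gpu=1000)
ratio = gpu_a / gpu_b  # how many times more expensive A is than B

print(f"A: ${gpu_a:.3f}/Mtok, B: ${gpu_b:.3f}/Mtok, "
      f"ratio {ratio:.2f}x, B is {percent_saving(ratio):.1f}% cheaper")
```

Under these definitions a cost ratio and a percentage saving are different numbers: a 1.5x ratio is a 33.3% saving, and a 1.41x ratio means the pricier side costs 41% more while the cheaper side costs about 29% less. Which convention the post's headline figure follows is not spelled out in this PR.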

Reviewed by Cursor Bugbot for commit c2f98a5.

14 weeks after GLM-5's release, MI355X SGLang FP8 undercuts B200 SGLang FP8 per million tokens across the single-node Pareto on 8k/1k — peak 1.41x with MTP at 18 tok/s/user, 1.36x non-MTP at 10 tok/s/user. Walks through SGLang PR #21511 (HaiShaw) fusing QK rope cat + MLA cache + FP8 quant on MI355 via TileLang. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-05-25T23:08:26Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
inferencemax-app	Ready	Preview, Comment	May 25, 2026 11:22pm

Removes the redundant kernel-fusion recap (already covered in the "What Shipped to Make This Happen" section) and lifts the MI355X capability sentence into its own paragraph for clearer pacing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…kill Removes the stray blank line between the MTP iso-interactivity table header and its data rows that was preventing markdown from parsing them as a table (rendering all rows as a single pipe-delimited paragraph instead). Also adds .claude/skills/write-inferencex-blog/SKILL.md, codifying the structure, numeric-verification workflow, frontmatter, MDX components, dashboard-link conventions, and FAQ JSON-LD pattern that this PR's post follows — so future InferenceX blog posts can be authored against a consistent template. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>



There is no MI355X GLM-5 disagg or wide-EP recipe yet. Updates both the What's Next bullet and the matching FAQ answer to state the gap directly rather than implying a recipe exists but underperforms. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…llout Replaces "playbook exists" framing with the direct statement that AMD has still not shipped disagg for GLM-5. Applied to both the bullet and the matching FAQ answer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).


Reviewed by Cursor Bugbot for commit e8a9524.

1. Soften "across the entire Pareto" claim in lede and subtitle to "across most of the Pareto" with the ~10-77 tok/s/user range called out explicitly. The MTP table already shows B200 noses ahead above ~90 tok/s/user.
2. Correct "TP=4 dominates across the whole range" in the iso-interactivity intro: TP=4 dominates up to ~77 tok/s/user; TP=8 conc 4 takes over at ~90 tok/s/user where TP=4 can't reach.
3. Fix FAQ overstatement: MTP "roughly doubles" -> "lifts ~1.34x" on the cited concurrency 32 data point (1,274 -> 1,707 tok/s/GPU).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
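The corrected MTP figure in item 3 follows directly from the two throughput numbers the commit message cites. A quick check, assuming 1,274 tok/s/GPU is the non-MTP value and 1,707 tok/s/GPU is the MTP value at concurrency 32 (the variable names are illustrative):

```python
non_mtp_tok_s_per_gpu = 1274  # concurrency 32 without MTP, per the commit message
mtp_tok_s_per_gpu = 1707      # concurrency 32 with MTP, per the commit message

# Throughput lift from enabling MTP at this data point.
lift = mtp_tok_s_per_gpu / non_mtp_tok_s_per_gpu

print(f"MTP lift: {lift:.2f}x")  # prints "MTP lift: 1.34x"
```

A lift of about 1.34x is well short of 2x, which is why "roughly doubles" was an overstatement for this data point.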

vercel Bot deployed to Preview May 25, 2026 23:08 View deployment

cursor Bot reviewed May 25, 2026


Comment thread packages/app/content/blog/mi355x-glm5-fp8-sglang-40-cheaper-than-b200.mdx Outdated

vercel Bot deployed to Preview May 25, 2026 23:11 View deployment

vercel Bot deployed to Preview May 25, 2026 23:12 View deployment

edit(blog): drop trailing summary sentence after What's Next bullets

efbda14

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel Bot deployed to Preview May 25, 2026 23:13 View deployment

cursor Bot reviewed May 25, 2026


Comment thread packages/app/content/blog/mi355x-glm5-fp8-sglang-40-cheaper-than-b200.mdx Outdated

Comment thread packages/app/content/blog/mi355x-glm5-fp8-sglang-40-cheaper-than-b200.mdx Outdated

edit(blog): name SGLang v0.12 in the lede and methods note

f9701dd

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel Bot deployed to Preview May 25, 2026 23:15 View deployment

edit(blog): list exact SGLang v0.5.12 container image tags used

f8357b0

B200 ran on lmsysorg/sglang:v0.5.12-cu130; MI355X ran on lmsysorg/sglang-rocm:v0.5.12-rocm720-mi35x-20260517. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel Bot deployed to Preview May 25, 2026 23:16 View deployment

functionstackx and others added 3 commits May 25, 2026 19:16

edit(blog): set publish date to 2026-05-25

e8a9524

Data run date (2026-05-20) stays as-is in the body since that's when the InferenceX measurement happened. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel Bot deployed to Preview May 25, 2026 23:18 View deployment

cursor Bot reviewed May 25, 2026


Comment thread packages/app/content/blog/mi355x-glm5-fp8-sglang-40-cheaper-than-b200.mdx

functionstackx enabled auto-merge (squash) May 25, 2026 23:19

functionstackx merged commit 09dc863 into master May 25, 2026
14 of 15 checks passed

functionstackx deleted the glm5-mi355-vs-b200 branch May 25, 2026 23:21

vercel Bot deployed to Preview May 25, 2026 23:22 View deployment

functionstackx merged 10 commits into master from glm5-mi355-vs-b200
