[None][doc] Refactor blog18 by bobboli · Pull Request #13956 · NVIDIA/TensorRT-LLM

bobboli · 2026-05-10T14:48:34Z

Summary

Restructures the Performance Benchmark section of blog18 into focused subsections (Methodology / Scaling With EP Size / Post-Quant Dispatch / Latency Floor / Reproduction) and adds new MXFP8 and NVFP4 results so the post-quant story is no longer hypothetical.

- Methodology now spells out `bytes_per_token` per recipe (BF16 / MXFP8 / NVFP4) and clarifies that the reported bandwidth is logical: it includes the local-rank fraction of traffic, matching the convention used by other MoE comm libraries.
- Scaling With EP Size retains the BF16 ep ∈ {8, 16, 32, 64} sweep with corrected GB/s numbers (the previous tables conflated FP8 and BF16 byte counts on the dispatch column; both phases ship BF16 since blockwise FP8 has no post-quant dispatch path today).
- Post-Quant Dispatch (new): MXFP8 reaches a 1.81× speedup over BF16 at ep=8 / bsz=2048 and NVFP4 reaches 3.06×, both close to their byte-ratio asymptotes. Includes a new `tech_blog18_post_quant_dispatch.png` chart.
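
One way to read "byte-ratio asymptote" is as the ratio of bytes moved per element in BF16 to bytes moved per element in the quantized recipe. The sketch below assumes the standard block layouts (MXFP8: one 8-bit scale per 32 one-byte elements; NVFP4: one 8-bit scale per 16 four-bit elements) and ignores any padding or metadata that the blog's `bytes_per_token` table may include, so the bounds are illustrative rather than the blog's own figures. `BYTES_PER_ELEMENT` and `byte_ratio_asymptote` are hypothetical names; only the measured speedups come from this PR description.

```python
from fractions import Fraction

# Bytes per element including per-block scale factors.
# ASSUMPTION: standard block layouts; the blog's bytes_per_token may differ.
BYTES_PER_ELEMENT = {
    "BF16": Fraction(2),                         # 16-bit values, no scales
    "MXFP8": Fraction(1) + Fraction(1, 32),      # 8-bit values + 1 scale byte per 32 elements
    "NVFP4": Fraction(1, 2) + Fraction(1, 16),   # 4-bit values + 1 scale byte per 16 elements
}

# Speedups quoted in the PR description (ep=8, bsz=2048).
MEASURED_SPEEDUP = {"MXFP8": 1.81, "NVFP4": 3.06}


def byte_ratio_asymptote(recipe: str) -> float:
    """Upper bound on dispatch speedup vs BF16 if latency scaled purely with bytes."""
    return float(BYTES_PER_ELEMENT["BF16"] / BYTES_PER_ELEMENT[recipe])


for recipe, measured in MEASURED_SPEEDUP.items():
    bound = byte_ratio_asymptote(recipe)
    print(f"{recipe}: measured {measured:.2f}x, bound {bound:.2f}x, "
          f"{measured / bound:.0%} of bound")
```

Under these assumptions the bounds come out to roughly 1.94× for MXFP8 and 3.56× for NVFP4, which is the scale against which the measured 1.81× and 3.06× can be compared.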

Bandwidth chart re-rendered as a landscape side-by-side panel using BF16 byte counts throughout. Adds reference figures for quant formats and the dispatch-MoE-combine R0 detail; re-renders the rank-major vs expert-major figure.

Test plan

- Markdown structure verified (no broken anchors / TOC consistent).
- Numbers cross-checked against `tests/microbenchmarks/bench_moe_comm.py` JSON output for ep=8 BF16 / MXFP8 / NVFP4 runs.
- Reviewer to spot-check the chart against the table values.

Summary by CodeRabbit

Documentation
- Updated blog article on MoE communication optimization with refined terminology and improved framework descriptions.
- Enhanced performance benchmarking section with updated bandwidth measurements and comprehensive methodology details.
- Expanded discussion of dispatch optimization techniques with updated performance metrics.
- Restructured sections for improved clarity and navigation.

coderabbitai · 2026-05-10T14:51:56Z

Walkthrough

This PR updates a technical blog post documenting NVIDIA's NVLink one-sided AlltoAll optimization for MoE communication. Changes include reframing dispatch as a push and combine as a pull (the "Dispatch Put and Combine Get" section becomes "Dispatch Push and Combine Pull"), expanding the explanation of the raw-token data layout, rewriting the performance benchmarking section with a detailed methodology and updated metrics, adding a post-quantization dispatch analysis, and restructuring the future-work guidance.

Changes

MoE Communication Blog Article Update

| Layer / File(s) | Summary |
|---|---|
| Navigation and Terminology Foundations `docs/source/blogs/tech_blog/blog18_Optimizing_MoE_Communication_with_One_Sided_AlltoAll_Over_NVLink.md` (lines 14–28, 43) | Table of contents restructured to add "Raw Token Data Layout," "Quantization-Agnostic Communication," methodology, scaling, post-quant dispatch, latency floor, and reproduction sections. Design overview updated to reference "raw-token data layout" rather than "token-major." |
| Core Communication Concepts `docs/source/blogs/tech_blog/blog18_Optimizing_MoE_Communication_with_One_Sided_AlltoAll_Over_NVLink.md` (lines 61–81) | One-sided communication section reframed with push (dispatch) and pull (combine) semantics. Raw token data layout expanded with explanation of token delivery, deduplication behavior for multiple experts on same rank, and smaller recv buffer requirements. |
| Interface and Mechanism Details `docs/source/blogs/tech_blog/blog18_Optimizing_MoE_Communication_with_One_Sided_AlltoAll_Over_NVLink.md` (lines 113–140) | Interface description updated to reference raw-token layout for recv buffer allocation. "Dispatch Put and Combine Get" section renamed to "Dispatch Push and Combine Pull" with expanded description of atomic-based slot assignment, deduplication, combine's reuse of routing for weighted reduction, and zero-copy path where MoE output writes directly to symmetric workspace. |
| Performance Methodology and Analysis `docs/source/blogs/tech_blog/blog18_Optimizing_MoE_Communication_with_One_Sided_AlltoAll_Over_NVLink.md` (lines 169–211, 272–320) | Performance benchmarking completely rewritten with detailed methodology including `bytes_per_token` table for BF16, MXFP8, and NVFP4; clarified bandwidth calculation and timing scope; refreshed dispatch/combine latency and bandwidth tables for ep_size (8). Added post-quantization dispatch section with new recipe comparison table, speedup and GB/s observations. Updated latency floor narrative with quantified statement that synchronization accounts for ~40% of dispatch time at batch size 1, decreasing to ~7% at batch size 2048. |
| Reproduction and Future Work `docs/source/blogs/tech_blog/blog18_Optimizing_MoE_Communication_with_One_Sided_AlltoAll_Over_NVLink.md` (lines 321–341) | Reproduction section updated and reformatted. Future work and conclusion sections restructured with explicit future-work bullet points and updated description of NVLinkOneSided AlltoAll's role as default communication strategy within single NVLink domain in TensorRT-LLM. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The PR title '[None][doc] Refactor blog18' is vague and generic. While it indicates a documentation change to blog18, it does not convey the specific nature of the changes (performance section restructuring, post-quant dispatch results, benchmark updates). | Consider a more descriptive title like '[doc] Restructure blog18 perf section + add post-quant dispatch results' to better reflect the main changes and help reviewers understand the scope of the update. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description clearly explains the changes, objectives, and test plan for the blog restructuring and benchmark updates. |


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@docs/source/blogs/tech_blog/blog18_Optimizing_MoE_Communication_with_One_Sided_AlltoAll_Over_NVLink.md`:
- Around line 175-177: The fenced code block containing the formula "bandwidth =
batch_size × min(ep_size, top_k) × bytes_per_token / latency" lacks a language
identifier; update the block delimiter from ``` to include a language (e.g.,
```text or ```python) so Markdown lint (MD040) and syntax highlighting work
correctly for the formula, ensuring the line with the variables bandwidth,
batch_size, ep_size, top_k, bytes_per_token, and latency remains unchanged.
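
The formula quoted in this comment can be sketched as a small helper. This is a minimal illustration, not the benchmark's implementation: the function name `logical_bandwidth_gbps`, the microsecond latency unit, the 1 GB = 10^9 bytes convention, and every input value in the usage line are assumptions chosen for the example rather than measurements from the blog.

```python
def logical_bandwidth_gbps(
    batch_size: int,
    ep_size: int,
    top_k: int,
    bytes_per_token: int,
    latency_us: float,
) -> float:
    """bandwidth = batch_size * min(ep_size, top_k) * bytes_per_token / latency.

    min(ep_size, top_k) caps the number of copies sent per token: a token is
    routed to top_k experts but lands on at most ep_size distinct ranks.
    ASSUMPTIONS: latency in microseconds, result in GB/s with 1 GB = 1e9 bytes.
    """
    if latency_us <= 0:
        raise ValueError("latency_us must be positive")
    total_bytes = batch_size * min(ep_size, top_k) * bytes_per_token
    return total_bytes / (latency_us * 1e-6) / 1e9


# Placeholder inputs (NOT measured values): a hypothetical hidden size of 7168
# in BF16 gives 7168 * 2 = 14336 bytes per token.
example = logical_bandwidth_gbps(
    batch_size=2048, ep_size=8, top_k=8, bytes_per_token=14336, latency_us=1000.0
)
print(f"{example:.1f} GB/s")
```

Because the byte count is logical, it also counts tokens whose destination is the sending rank itself, consistent with the convention described in the PR summary.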


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7acf4322-7919-4306-abf6-3e7ba1d94713

📥 Commits

Reviewing files that changed from the base of the PR and between afe1a31 and fab1e30.

⛔ Files ignored due to path filters (9)

docs/source/blogs/media/tech_blog18_bandwidth.png is excluded by !**/*.png
docs/source/blogs/media/tech_blog18_dispatch_moe_combine.png is excluded by !**/*.png
docs/source/blogs/media/tech_blog18_dispatch_moe_combine_R0.png is excluded by !**/*.png
docs/source/blogs/media/tech_blog18_one_sided_vs_two_sided.png is excluded by !**/*.png
docs/source/blogs/media/tech_blog18_post_quant_dispatch.png is excluded by !**/*.png
docs/source/blogs/media/tech_blog18_quant_formats.png is excluded by !**/*.png
docs/source/blogs/media/tech_blog18_rank_major_vs_expert_major.png is excluded by !**/*.png
docs/source/blogs/media/tech_blog18_raw_tokens_vs_permuted_tokens.png is excluded by !**/*.png
docs/source/blogs/media/tech_blog18_token_major_vs_expert_major.png is excluded by !**/*.png

📒 Files selected for processing (1)

docs/source/blogs/tech_blog/blog18_Optimizing_MoE_Communication_with_One_Sided_AlltoAll_Over_NVLink.md

bobboli · 2026-05-10T18:02:19Z

/bot run

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

tensorrt-cicd · 2026-05-10T18:07:58Z

PR_Github #47610 [ run ] triggered by Bot. Commit: 59590cc Link to invocation

tensorrt-cicd · 2026-05-10T18:49:56Z

PR_Github #47610 [ run ] completed with state SUCCESS. Commit: 59590cc
/LLM/main/L0_MergeRequest_PR pipeline #37516 completed with status: 'SUCCESS'

CI Report

Link to invocation

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

bobboli requested a review from a team as a code owner May 10, 2026 14:48

bobboli requested review from QiJune and arysef May 10, 2026 14:48

github-actions Bot assigned bobboli May 10, 2026

coderabbitai Bot reviewed May 10, 2026

Comment thread ...e/blogs/tech_blog/blog18_Optimizing_MoE_Communication_with_One_Sided_AlltoAll_Over_NVLink.md Outdated

nv-guomingz approved these changes May 10, 2026

bobboli changed the title ~~[None][doc] Restructure blog18 perf section + add post-quant dispatch results~~ [None][doc] Refactor blog18 May 10, 2026

bobboli force-pushed the update_blog18_alltoall branch from 6e62b8c to ea43673 Compare May 10, 2026 18:01

[None][doc] refactor blog18

59590cc

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

bobboli force-pushed the update_blog18_alltoall branch from ea43673 to 59590cc Compare May 10, 2026 18:03

bobboli merged commit 944b7eb into NVIDIA:main May 11, 2026
7 of 10 checks passed

yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request May 19, 2026

[None][doc] Refactor blog18 (NVIDIA#13956)

5f9c0f2

Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

bobboli merged 1 commit into NVIDIA:main from bobboli:update_blog18_alltoall


3 participants
