[Others]【Hackathon 10th Spring No.49】【RFC V2.0】Speculative decoding: ngram GPU parallel kernel design #1295
Open
cloudforge1 wants to merge 3 commits into PaddlePaddle:master from
V2.0 inline override of the existing RFC:
- §3 expanded from 12 lines into a complete two-phase parallel architecture
- Added the Phase 1 / Phase 2 kernel design, atomicMin64, and CUB BlockScan
- Added template specialization (PR #7136), early exit, and a scratch cache
- Added §4 with CI benchmark data for 27 configurations (1.27x to 1700x speedup)
- Updated metadata: V2.0, dual authorship (NKNaN V1.0 + cloudforge1 V2.0)
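- Remove the "Implementation PR" (实现PR) row (no merged RFC includes it)
- Remove the "Required PaddlePaddle version" (依赖飞桨版本) row (not present in merged RFCs)
- Remove the hackathon tag from the "Task name" (任务名称) field (matches NKNaN V1.0 and fuzhenxin)
- Section structure: 1 Overview → 2 Current state → 3 Design → 4 Testing → 5 Schedule → 6 Impact + References, following the merged fuzhenxin and deepseek RFCs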
- IP notice: 21→19 µs, 722→1,885× speedup
- §5.2: Replaced the old scaling numbers with the latest #7136 CI data
  - Production (bsz=32): CPU 939→276 µs, GPU 661→19 µs, speedup 1.42→14.17×
  - Extreme (bsz=256, 131K): CPU 275→284 ms, GPU 162→151 µs, speedup 1,700→1,885×
  - Added high concurrency (bsz=512): CPU 72,640 µs, GPU 71 µs, 1,030×
- Benchmark configurations increased from 27 to 33, across 7 dimensions
- §5.3: Added the reviewer's expanded criteria (issue #7200)
- §7: Updated the impact section with specific production and extreme-case numbers
- §2: Condensed raw C++ into a pseudocode summary
- Added the issue #7200 reference and a V2.0 changelog
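The changelog names an `atomicMin64` primitive without showing it. A common way to build a 64-bit atomic minimum is a compare-and-swap retry loop, so that many threads can each report a candidate and only the smallest survives. The host-side C++ sketch below illustrates that pattern with `std::atomic`; the name `atomic_min64`, the use of a CAS loop, and "keep the smallest candidate" as its purpose are assumptions for illustration, not the RFC's actual kernel code.

```cpp
#include <atomic>
#include <cstdint>
#include <limits>

// Hypothetical helper (not from the RFC): atomically lowers `target` to
// `value` when `value` is smaller. Returns the value observed before the update.
inline int64_t atomic_min64(std::atomic<int64_t>& target, int64_t value) {
    int64_t observed = target.load(std::memory_order_relaxed);
    // Retry until the stored value is already <= value, or our CAS wins.
    // On failure, compare_exchange_weak reloads `observed` with the current value.
    while (value < observed &&
           !target.compare_exchange_weak(observed, value,
                                         std::memory_order_relaxed,
                                         std::memory_order_relaxed)) {
    }
    return observed;
}

// Sentinel meaning "no candidate reported yet" (assumed convention).
inline constexpr int64_t kNoCandidate = std::numeric_limits<int64_t>::max();
```

Because the update is order-independent, the final value is the minimum of all reported candidates no matter how threads interleave; starting from `kNoCandidate` lets any real candidate replace the sentinel.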
Summary
V2.0 is an inline update to the already-merged RFC (NKNaN V1.0, PR #1213) that adds a complete GPU parallel kernel design.
Changes
Inline override of 20260207_refine_speculate_decoding_ngram_for_fastdeploy.md:
Related PR