[Optim] The gemm of w4afp8 adopts an adaptive N #5853

yangjianfengo1 · 2026-01-04T01:55:46Z

Motivation

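Previously, the w4afp8 gemm launched the N=256 gemm whenever the token count was below 256, which wasted a large amount of compute. We now provide three gemms, with N=64, 128, and 256. When the token count is below 64, the N=64 gemm is called; when it is above 64 and below 128, the N=128 gemm is called; otherwise the N=256 gemm is called.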
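
The selection rule above can be sketched as follows. This is a minimal illustration, not the code in this PR: the names `TILE_N_CANDIDATES`, `select_tile_n`, and `padded_rows` are hypothetical, the boundaries at exactly 64 and 128 tokens are assumed to be inclusive (the description does not specify them), and the waste estimate assumes the token dimension is rounded up to a multiple of N.

```python
# Hypothetical sketch of the adaptive-N choice; not FastDeploy's actual API.
TILE_N_CANDIDATES = (64, 128, 256)


def select_tile_n(num_tokens: int) -> int:
    """Return the smallest candidate N that covers num_tokens, capped at 256."""
    for n in TILE_N_CANDIDATES:
        # Assumption: a token count equal to the candidate still uses it.
        if num_tokens <= n:
            return n
    return TILE_N_CANDIDATES[-1]


def padded_rows(num_tokens: int, tile_n: int) -> int:
    """Rows computed but unused if tokens are rounded up to a multiple of tile_n."""
    return -num_tokens % tile_n


if __name__ == "__main__":
    for tokens in (8, 40, 100, 200, 1000):
        n = select_tile_n(tokens)
        print(tokens, n, padded_rows(tokens, 256), padded_rows(tokens, n))
```

Under these assumptions, a batch of 40 tokens pads 216 rows with N=256 but only 24 rows with N=64, which is the kind of waste the change targets.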

Modifications

Usage or Command

The interface is unchanged; callers do not need to modify anything.

Accuracy Tests

Benchmark evaluation results are on par.

Checklist

- Add at least one tag to the PR title.
  - Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  - You can add new tags based on the PR content, but the semantics must be clear.
- Format your code and run pre-commit before committing.
- Add unit tests. If there are no unit tests, state the reason in this PR.
- Provide accuracy results.
- If the current PR targets the release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-01-04T01:55:51Z

Thanks for your contribution!

codecov-commenter · 2026-01-04T03:23:40Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@2785b82). Learn more about missing BASE report.

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #5853   +/-   ##
==========================================
  Coverage           ?   66.70%           
==========================================
  Files              ?      347           
  Lines              ?    44420           
  Branches           ?     6822           
==========================================
  Hits               ?    29632           
  Misses             ?    12604           
  Partials           ?     2184

Flag	Coverage Δ
GPU	`66.70% <ø> (?)`


gongshaotian

LGTM

opt w4afp8

7586683

yangjianfengo1 temporarily deployed to Metax_ci January 4, 2026 01:55 — with GitHub Actions Inactive

yangjianfengo1 changed the title ~~opt w4afp8~~ [Optim] w4afp8 Jan 4, 2026

yangjianfengo1 changed the title ~~[Optim] w4afp8~~ [Optim] When the token is small, use a gemm with a smaller N Jan 4, 2026

Merge branch 'develop' into 0104

01aba60

yangjianfengo1 had a problem deploying to Metax_ci January 4, 2026 05:45 — with GitHub Actions Failure

Merge branch 'develop' into 0104

36c2869

yangjianfengo1 temporarily deployed to Metax_ci January 5, 2026 12:09 — with GitHub Actions Inactive

gongshaotian approved these changes Jan 7, 2026

View reviewed changes

Sunny-bot1 approved these changes Jan 7, 2026

View reviewed changes

yangjianfengo1 changed the title ~~[Optim] When the token is small, use a gemm with a smaller N~~ [Optim] The gemm of w4afp8 adopts an adaptive N Jan 7, 2026

gongshaotian merged commit 59523b2 into PaddlePaddle:develop Jan 7, 2026
18 of 20 checks passed

yangjianfengo1 added a commit to yangjianfengo1/FastDeploy that referenced this pull request Jan 8, 2026

opt w4afp8 (PaddlePaddle#5853)

8f69fd7

Jiang-Jia-Jun pushed a commit that referenced this pull request Jan 8, 2026

opt w4afp8 (#5853) (#5938)

803c985
