
yangjianfengo1 (Contributor) commented Jan 4, 2026

Motivation

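When the token count is below 256, the w4afp8 GEMM still launches the N=256 GEMM, which wastes a large amount of compute. We now provide three GEMMs, with N=64, 128, and 256: when the token count is less than 64, the N=64 GEMM is called; when it is greater than 64 and less than 128, the N=128 GEMM is called; otherwise, the N=256 GEMM is called.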
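The dispatch rule described above can be sketched in Python. This is a minimal illustration, not the FastDeploy kernel code: `select_gemm_n` and `tile_utilization` are hypothetical helpers, N is assumed to be the tile width along the token dimension (as the description implies), and because the description does not say which variant handles exactly 64 or 128 tokens, the sketch assumes inclusive upper bounds.

```python
# Hypothetical sketch of the adaptive-N dispatch; not the actual FastDeploy code.
# Assumption: N is the GEMM tile width along the token dimension.
GEMM_N_CANDIDATES = (64, 128, 256)


def select_gemm_n(token_num: int) -> int:
    """Pick the smallest N variant that covers token_num.

    Assumes inclusive bounds (64 tokens -> N=64, 128 tokens -> N=128);
    the PR description leaves the exact boundary cases unspecified.
    """
    if token_num <= 0:
        raise ValueError("token_num must be positive")
    for n in GEMM_N_CANDIDATES:
        if token_num <= n:
            return n
    # Larger batches fall back to the widest variant (N=256).
    return GEMM_N_CANDIDATES[-1]


def tile_utilization(token_num: int, n: int) -> float:
    """Fraction of the launched N-wide tiles that hold real tokens."""
    tiles = -(-token_num // n)  # ceiling division: number of N-wide tiles needed
    return token_num / (tiles * n)


if __name__ == "__main__":
    for tokens in (10, 100, 300):
        n = select_gemm_n(tokens)
        print(tokens, n, tile_utilization(tokens, n), tile_utilization(tokens, 256))
```

For example, with 10 tokens an N=256 tile is only 10/256 ≈ 3.9% utilized, while an N=64 tile is 10/64 ≈ 15.6% utilized, which is the kind of waste the smaller variants are meant to reduce.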

Modifications

Usage or Command

The interface is unchanged; callers do not need any modification.

Accuracy Tests

Benchmark evaluation results are on par with those before the change.

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. If no unit tests are added, please explain why in this PR.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot bot commented Jan 4, 2026

Thanks for your contribution!

@yangjianfengo1 yangjianfengo1 changed the title opt w4afp8 [Optim] w4afp8 Jan 4, 2026
@yangjianfengo1 yangjianfengo1 changed the title [Optim] w4afp8 [Optim] When the token is small, use a gemm with a smaller N Jan 4, 2026
codecov-commenter commented Jan 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@2785b82).

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #5853   +/-   ##
==========================================
  Coverage           ?   66.70%           
==========================================
  Files              ?      347           
  Lines              ?    44420           
  Branches           ?     6822           
==========================================
  Hits               ?    29632           
  Misses             ?    12604           
  Partials           ?     2184           
Flag Coverage Δ
GPU 66.70% <ø> (?)

Flags with carried forward coverage won't be shown.

gongshaotian (Collaborator) left a comment

LGTM

@yangjianfengo1 yangjianfengo1 changed the title [Optim] When the token is small, use a gemm with a smaller N [Optim] The gemm of w4afp8 adopts an adaptive N Jan 7, 2026
@gongshaotian gongshaotian merged commit 59523b2 into PaddlePaddle:develop Jan 7, 2026
18 of 20 checks passed
yangjianfengo1 added a commit to yangjianfengo1/FastDeploy that referenced this pull request Jan 8, 2026
Jiang-Jia-Jun pushed a commit that referenced this pull request Jan 8, 2026

Labels

None yet

Projects

None yet

4 participants