
[cpu]llama avx model inference supports #8634

Merged
1 commit merged into PaddlePaddle:develop on Jun 27, 2024

Conversation

@bukejiyu (Contributor) commented on Jun 20, 2024

PR types

PR changes

Description

Integrates the xft CPU kernels into Paddle inference_mode.
Machine: 8463B
Input/output length: 128/15, batch size = 1
Static-graph LLaMA benchmark: next_tokens 100+ ms
Dynamic-graph LLaMA benchmark with 48 threads: next_tokens 70+ ms
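
For orientation, this is roughly how the benchmarked dynamic-graph CPU run could be reproduced at the Python level. The snippet is a hedged sketch only: the model identifier, thread setting, and generation arguments are assumptions for illustration, not taken from this PR (the fused CPU/AVX path itself lives under paddlenlp/experimental/transformers and is selected by the inference scripts).

```python
# Hedged sketch: LLaMA text generation on CPU with PaddleNLP (dynamic graph).
# Model name, thread count, and generation arguments are illustrative only.
import os

os.environ["OMP_NUM_THREADS"] = "48"  # the benchmark above used 48 threads

import paddle
from paddlenlp.transformers import AutoModelForCausalLM, AutoTokenizer

paddle.set_device("cpu")

model_name = "meta-llama/Llama-2-7b"  # hypothetical model identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, dtype="float32")

# A short prompt; the benchmark above used a 128-token input and 15 output tokens.
inputs = tokenizer("Hello, how are you?", return_tensors="pd")
outputs = model.generate(**inputs, max_length=15)

# generate() returns the generated ids (plus scores in some PaddleNLP versions).
ids = outputs[0] if isinstance(outputs, tuple) else outputs
print(tokenizer.batch_decode(ids, skip_special_tokens=True))
```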

@paddle-bot (bot) commented on Jun 20, 2024

Thanks for your contribution!

@CLAassistant commented:

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

395822456@qq.com does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
Already signed the CLA but the status is still pending? Let us recheck it.

@codecov (bot) commented on Jun 20, 2024

Codecov Report

Attention: Patch coverage is 0% with 282 lines in your changes missing coverage. Please review.

Project coverage is 55.63%. Comparing base (65e721e) to head (b791375).
Report is 11 commits behind head on develop.

| Files | Patch % | Lines |
|---|---|---|
| ...dlenlp/experimental/transformers/llama/modeling.py | 0.00% | 133 Missing ⚠️ |
| ...enlp/experimental/transformers/generation_utils.py | 0.00% | 101 Missing ⚠️ |
| ...erimental/transformers/fused_transformer_layers.py | 0.00% | 48 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8634      +/-   ##
===========================================
- Coverage    55.80%   55.63%   -0.18%     
===========================================
  Files          620      620              
  Lines        96642    96940     +298     
===========================================
  Hits         53928    53928              
- Misses       42714    43012     +298     


@bukejiyu force-pushed the tmp_cpu branch 2 times, most recently from 426aa9d to c3c3d49 on June 25, 2024 04:20
@bukejiyu changed the title from "tmp support cpu" to "[cpu]llama avx model inference supports" on Jun 25, 2024
Review comments (outdated, resolved) on: csrc/cpu/src/setup.py, csrc/cpu/README.md, csrc/cpu/setup.sh (×2) — a rough sketch of a Paddle custom-op build script follows below.
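
The review items above concern the CPU custom-op build scripts. For orientation only: Paddle custom operators are usually built with paddle.utils.cpp_extension, and the sketch below is a generic example under that assumption; the package name, source files, and compile flags are hypothetical, not this PR's actual csrc/cpu/src/setup.py.

```python
# Hedged sketch of a Paddle custom-op build script (not the PR's actual setup.py).
# paddle.utils.cpp_extension compiles the C++ sources and registers the ops
# so they can be imported from Python after installation.
from paddle.utils.cpp_extension import CppExtension, setup

setup(
    name="paddlenlp_cpu_ops",  # hypothetical package name
    ext_modules=CppExtension(
        sources=["src/fused_ops.cc"],  # hypothetical C++ source with the AVX kernels
        extra_compile_args=["-mavx512f", "-fopenmp"],  # illustrative AVX/OpenMP flags
    ),
)
```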
@bukejiyu force-pushed the tmp_cpu branch 5 times, most recently from 9225301 to ac28f8f on June 27, 2024 02:52
@DesmonDay (Contributor) left a comment:


LGTM

@ZeyuChen merged commit 69e04c0 into PaddlePaddle:develop on Jun 27, 2024
8 of 10 checks passed
4 participants