[LLM] Support fuse attention q, k, v weights #8202

DrownFish19 · 2024-03-28T03:02:45Z

PR types

New features

PR changes

APIs and Models

Description

Support fusing attention q, k, v weights:

fuse_attention_qkv

fuse_attention_ffn
normal fuse weights
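
The idea behind fusing can be sketched as follows. This is a minimal illustration, not the code added in this PR. It assumes weights are stored as 2-D `[in_features, out_features]` matrices and that fusing means plain concatenation along the output axis. The helper names `fuse_weights` and `split_weights` are hypothetical. A real implementation may order the fused columns differently (for example, grouped per attention head), so treat the layout here as an assumption.

```python
# Hypothetical helpers for illustration only; not the functions from this PR.
# Weights are plain lists of rows, shaped [in_features, out_features].


def fuse_weights(weights):
    """Concatenate 2-D weights along the output (column) axis."""
    num_rows = len(weights[0])
    if any(len(w) != num_rows for w in weights):
        raise ValueError("all weights must share the same input dimension")
    # For each row index, join that row from every weight into one long row.
    return [sum((w[r] for w in weights), []) for r in range(num_rows)]


def split_weights(fused, sizes):
    """Split a fused weight back into pieces with the given column widths."""
    if sum(sizes) != len(fused[0]):
        raise ValueError("sizes must add up to the fused output dimension")
    pieces, start = [], 0
    for size in sizes:
        pieces.append([row[start:start + size] for row in fused])
        start += size
    return pieces


# Toy 2x2 matrices standing in for the q, k, v projection weights.
q = [[1, 2], [3, 4]]
k = [[5, 6], [7, 8]]
v = [[9, 10], [11, 12]]

qkv = fuse_weights([q, k, v])               # fused shape: [2, 6]
q2, k2, v2 = split_weights(qkv, [2, 2, 2])  # round trip back to q, k, v
```

Splitting with the original column widths recovers the separate weights, which is what lets a checkpoint saved in one layout be loaded into a model that expects the other.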

Performance test (llama-7b)

Distributed strategy	Original load time	Fused load time
DP	20	23
PP4	5	7
TP2	10	23

1.1. modify 1., code order 2. switch to name_mapping 3. solve tp branch 3.2 follow hui, handle qkv separately 3.3 handle pdparams 3.4 from torch 3.5 abandon low_cpu_mem_usage 3.6 solve shard branch

paddle-bot · 2024-03-28T03:02:49Z

Thanks for your contribution!

codecov · 2024-03-28T03:28:50Z

Codecov Report

Attention: Patch coverage is 94.34783%, with 13 lines in your changes missing coverage. Please review.

Project coverage is 55.45%. Comparing base (beb433a) to head (f6f3b0e).
Report is 2 commits behind head on develop.

Files	Patch %	Lines
paddlenlp/transformers/conversion_utils.py	94.82%	6 Missing ⚠️
paddlenlp/transformers/opt/modeling.py	81.81%	4 Missing ⚠️
paddlenlp/transformers/model_utils.py	91.89%	3 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #8202      +/-   ##
===========================================
+ Coverage    55.33%   55.45%   +0.12%     
===========================================
  Files          614      614              
  Lines        95341    95570     +229     
===========================================
+ Hits         52753    52999     +246     
+ Misses       42588    42571      -17

ZHUI · 2024-04-17T03:45:13Z

Run a simple test of the llama 7b loading speed.

ZHUI · 2024-04-17T03:48:16Z

LoRA adaptation

ZHUI

LGTM

ziangqin-baidu and others added 4 commits March 18, 2024 08:06

1. add use-interface & fuse action

d859826

1.1. modify 1., code order 2. switch to name_mapping 3. solve tp branch 3.2 follow hui, handle qkv separately 3.3 handle pdparams 3.4 from torch 3.5 abandon low_cpu_mem_usage 3.6 solve shard branch

3.6.1 solve shard branch after rebase develop

f7b6973

code clean

05507a7

Merge remote-tracking branch 'qinziang/fuse' into dev-fuse-qkv

b8b828b

DrownFish19 added 2 commits March 29, 2024 12:20

remove debug comment

28ed30f

Redefine fuse and split functions

1a1591a

DrownFish19 force-pushed the dev-fuse-qkv branch 5 times, most recently from 0fd4d1f to 753c980 April 1, 2024 11:25

Redefine fuse and split functions

ef1fb18

DrownFish19 force-pushed the dev-fuse-qkv branch from 753c980 to ef1fb18 April 1, 2024 11:26

DrownFish19 added 3 commits April 2, 2024 07:09

Merge branch 'dev-fuse-qkv' of github.com:DrownFish19/PaddleNLP into …

f56e602

…dev-fuse-qkv

comment and fix

19faa26

update method

4a92dde

DrownFish19 marked this pull request as ready for review April 2, 2024 12:04

DrownFish19 added 11 commits April 9, 2024 10:18

update QKV fuse and split

c0a71d2

support fuse weights in multi-files

d1e7d17

add precision compare

58a2c23

simplify function call

7e1ab08

support use_fast_ffn

0458f4e

clean modeling and configuration

920155b

add test for gpt and opt

a90fcfc

fix tp_actions get

774bb6f

Merge branch 'PaddlePaddle:develop' into dev-fuse-qkv

057198c

Merge branch 'PaddlePaddle:develop' into dev-fuse-qkv

db78c6a

Merge branch 'PaddlePaddle:develop' into dev-fuse-qkv

a1aa078

DrownFish19 added 5 commits April 12, 2024 10:43

add fast_ffn test

f0d19f6

Merge branch 'dev-fuse-qkv' of github.com:DrownFish19/PaddleNLP into …

110983d

…dev-fuse-qkv

add Qwen2Moe

113b883

Revert "add Qwen2Moe"

1fb0a7d

This reverts commit 113b883.

Merge branch 'PaddlePaddle:develop' into dev-fuse-qkv

a118f1b

ZHUI reviewed Apr 17, 2024

paddlenlp/transformers/conversion_utils.py (outdated, resolved)

paddlenlp/transformers/conversion_utils.py (outdated, resolved)

paddlenlp/transformers/gpt/modeling.py (outdated, resolved)

DrownFish19 added 3 commits April 17, 2024 09:21

add test for split

11d0793

update doc

0c98a45

update filter_dict_keys

f6f3b0e

ZHUI approved these changes Apr 25, 2024

ZHUI merged commit f29a7b9 into PaddlePaddle:develop Apr 25, 2024
8 of 10 checks passed

DrownFish19 deleted the dev-fuse-qkv branch April 29, 2024 08:45
