supports llama-dybatch-V1 #6676

carryyu · 2023-08-10T03:38:39Z

PR types

New features

PR changes

Models

Description

supports llama-dybatch-V1

paddle-bot · 2023-08-10T03:38:44Z

Thanks for your contribution!

wj-Mcat

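I think most of the work here is excellent; there are a few points I would like to discuss with you.

Also, when you have time later, please add the relevant unit tests. Code merged into paddlenlp is generally expected to come with them.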

paddlenlp/ops/generation/encode_rotary_qk.cu

paddlenlp/ops/generation/setup_cuda.py

paddlenlp/transformers/fused_multi_transformer_fine_grained.py

paddlenlp/transformers/llama/modeling.py

paddlenlp/transformers/fused_multi_transformer_fine_grained.py

qingqing01

Usage documentation needs to be added.

llm/llama/dybatch/export_generation_model.py

paddlenlp/transformers/fused_multi_transformer_fine_grained.py

carryyu · 2023-08-14T14:28:54Z

Usage documentation needs to be added.

Added.

csrc/encode_rotary_qk.cu

wawltor · 2023-08-16T03:28:44Z

llm/llama/dybatch/README.md

@@ -0,0 +1,21 @@
+# LLaMA DyBatch


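Usage of the models under the LLM directory has now been largely unified: fine-tuning, prediction, and quantization all share one set of scripts.

Can the dynamic-insertion scripts be unified as well?

This part may be hard to unify. More quantization methods will follow, and putting everything together would be less clear. How about adding a jump link in the main README instead?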

paddlenlp/transformers/fused_transformer_layers.py

llm/llama/dybatch/split_weight.py

heavengate · 2023-08-17T02:58:30Z

llm/llama/dybatch/utils.py

@@ -0,0 +1,247 @@
+# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.


I suggest moving this into the paddlenlp/transformers directory. @wj-Mcat, could you take a look at how it should be organized?

heavengate · 2023-08-18T06:11:54Z

llm/llama/dybatch/export_model.py

@@ -0,0 +1,147 @@
+# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.


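Please check whether the files in this directory can be removed: add an --enable_dybatch argument to the files under llm/llama and maintain the two paths through a branch.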
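The suggestion above can be sketched as a single entry point that branches on one flag. This is only an illustration of the idea: `--enable_dybatch` is the flag name proposed in the comment, while `build_parser`, `choose_export_path`, and the returned labels are hypothetical and do not come from the PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser for a shared export script under llm/llama.
    parser = argparse.ArgumentParser(description="Illustrative export entry point")
    parser.add_argument(
        "--enable_dybatch",
        action="store_true",
        help="Use the dynamic-batch code path instead of the default one.",
    )
    return parser


def choose_export_path(argv: list[str]) -> str:
    # One script, two branches: the flag decides which path runs.
    args = build_parser().parse_args(argv)
    return "dybatch" if args.enable_dybatch else "default"
```

Calling `choose_export_path(["--enable_dybatch"])` selects the dynamic-batch branch; omitting the flag keeps the existing behavior, which is what lets the separate directory be dropped.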

heavengate

Please confirm whether the unit tests are to be added under the tests/transformer/llama directory.

CLAassistant · 2023-08-18T11:39:43Z

All committers have signed the CLA.

codecov · 2023-08-18T12:20:31Z

Codecov Report

Merging #6676 (22df0d1) into develop (e0a9f4e) will decrease coverage by 0.35%.
Report is 1 commits behind head on develop.
The diff coverage is 0.44%.

❗ Current head 22df0d1 differs from pull request most recent head e276db4. Consider uploading reports for the commit e276db4 to get more accurate results

@@             Coverage Diff             @@
##           develop    #6676      +/-   ##
===========================================
- Coverage    60.85%   60.50%   -0.35%     
===========================================
  Files          534      539       +5     
  Lines        78870    79322     +452     
===========================================
+ Hits         47995    47996       +1     
- Misses       30875    31326     +451

Files Changed	Coverage Δ
paddlenlp/experimental/transformers/__init__.py	`0.00% <0.00%> (ø)`
...erimental/transformers/fused_transformer_layers.py	`0.00% <0.00%> (ø)`
...enlp/experimental/transformers/generation_utils.py	`0.00% <0.00%> (ø)`
...dlenlp/experimental/transformers/llama/__init__.py	`0.00% <0.00%> (ø)`
...dlenlp/experimental/transformers/llama/modeling.py	`0.00% <0.00%> (ø)`
paddlenlp/utils/import_utils.py	`85.71% <50.00%> (-0.96%)`	⬇️
paddlenlp/transformers/llama/modeling.py	`69.95% <100.00%> (ø)`

... and 1 file with indirect coverage changes

heavengate · 2023-08-21T06:19:27Z

experimental/inference/llama/README.md

@@ -0,0 +1,19 @@
+# LLaMA Inference


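Following the organization of the llm directory, please delete the .sh files here. Give the documentation as Python commands and distinguish single-GPU from multi-GPU usage.

@wj-Mcat, please take a look at a unified script for splitting weights across multiple GPUs.

Following the organization of the llm directory, please delete the .sh files here. Give the documentation as Python commands and distinguish single-GPU from multi-GPU usage.

I have already removed them in the latest commit.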

heavengate · 2023-08-21T06:20:58Z

experimental/inference/llama/run.sh

+export FLAGS_new_executor_serial_run=1
+export FLAGS_allocator_strategy=naive_best_fit
+export FLAGS_fraction_of_gpu_memory_to_use=0.95
+export FLAGS_use_cutlass_fmha=1


Please delete non-essential flags, such as the logging-related ones.

Briefly explain what the remaining flags do in the README.
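Since the review asks for Python commands rather than a .sh file, the four exports from the hunk above could be applied from Python instead. The following is a minimal sketch under assumptions: `RUN_SH_FLAGS` and `apply_flags` are hypothetical names, the values are copied from the hunk, and it is assumed that the flags must be set before `import paddle` for Paddle to pick them up.

```python
import os

# Flag names and values copied from the run.sh hunk; nothing here is new.
RUN_SH_FLAGS = {
    "FLAGS_new_executor_serial_run": "1",
    "FLAGS_allocator_strategy": "naive_best_fit",
    "FLAGS_fraction_of_gpu_memory_to_use": "0.95",
    "FLAGS_use_cutlass_fmha": "1",
}


def apply_flags(flags, environ=os.environ):
    # setdefault keeps any value the user already exported in the shell.
    for name, value in flags.items():
        environ.setdefault(name, value)
    return environ
```

Using `setdefault` rather than plain assignment is a design choice for this sketch: a value exported by the user before launching the script takes precedence over the defaults.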

wj-Mcat · 2023-08-21T12:59:37Z

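So far, the following InferenceModel work has been completed:

Single-GPU verification of dynamic graph, dynamic-to-static, and static graph
Multi-GPU verification of dynamic graph, dynamic-to-static, and static graph
README documentation adjustments
inferenceModel adjustments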

wj-Mcat · 2023-08-21T13:13:10Z

paddlenlp/transformers/llama/modeling.py

+    if paddle.in_dynamic_mode():
+        y_is_distributed = y.is_distributed
+    else:
+        y_is_distributed = tensor_parallel_degree > 1


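In dynamic graph mode, y.is_distributed holds the real value, but in static graph mode y.is_distributed is always False. That changes the dimensions of the final logits and therefore affects decoding accuracy.

An adaptation for static graph mode was made here.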
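The branch in the hunk can be modeled without Paddle as a small pure-Python function. This is only a sketch of the decision logic described in the comment; `resolve_y_is_distributed` and its parameter names are hypothetical and are not part of PaddleNLP.

```python
def resolve_y_is_distributed(
    in_dynamic_mode: bool,
    tensor_flag: bool,
    tensor_parallel_degree: int,
) -> bool:
    """Decide whether the weight should be treated as distributed.

    in_dynamic_mode stands in for paddle.in_dynamic_mode(), and
    tensor_flag stands in for y.is_distributed.
    """
    if in_dynamic_mode:
        # Dynamic graph: the tensor attribute is trustworthy.
        return tensor_flag
    # Static graph: the attribute is reported as always False, so fall back
    # to the configured tensor-parallel degree.
    return tensor_parallel_degree > 1
```

In static graph mode with a tensor-parallel degree of 2, the function returns True even though the tensor attribute reads False, which is the case the fix targets.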

wawltor · 2023-08-22T01:36:49Z

llm/predictor.py

+            return None
+
+
+class DygraphInferencePredictor(BasePredictor):


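The naming here can be revised later. My earlier understanding was that dygraph means dynamic graph and inference means static-graph inference.

Make a note of it as a TODO for yourself.

There is currently no particularly elegant or fitting name for this:

DygraphInferencePredictor (middle of the road)

DygraphinferenceModelPredictor (too long)

DIPredictor (an abbreviation; what on earth is that)

If anyone has a suitable name, feel free to join the discussion.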

wawltor · 2023-08-22T01:37:16Z

llm/predictor.py

@@ -242,53 +250,296 @@ def _infer(self, inputs: dict[str, np.ndarray]):
        return decoded_ids


-def create_predictor(predictor_args: PredictorArgument, model_args: ModelArgument):
+class StaticInferencePredictor(BasePredictor):


This also distinguishes dynamic batch from non-dynamic batch here.

There are two flags here:

mode: dygraph, static

inference_model: bool

Together, these two flags control the four cases.

dygraph -> dynamic
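The two-flag scheme described in this thread amounts to a four-entry lookup. The sketch below is illustrative only: `select_predictor_name` is a hypothetical helper, `DygraphInferencePredictor` and `StaticInferencePredictor` are the class names visible in the diff, and the other two names are assumed for the sake of the example. The mode values follow the comment as written, before the suggested rename.

```python
def select_predictor_name(mode: str, inference_model: bool) -> str:
    # (mode, inference_model) -> predictor class name.
    table = {
        ("dygraph", False): "DygraphPredictor",          # assumed name
        ("dygraph", True): "DygraphInferencePredictor",  # shown in the diff
        ("static", False): "StaticGraphPredictor",       # assumed name
        ("static", True): "StaticInferencePredictor",    # shown in the diff
    }
    try:
        return table[(mode, inference_model)]
    except KeyError:
        raise ValueError(f"unsupported mode: {mode!r}") from None
```

Keeping the mapping in one table makes a later rename of a mode value or a class a single-line change.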

wawltor

LGTM

carryyu force-pushed the dybatchV1-llama branch from 0b35ec5 to 839692e Compare August 11, 2023 11:52

wj-Mcat requested changes Aug 14, 2023

qingqing01 reviewed Aug 14, 2023

paddlenlp/transformers/fused_multi_transformer_fine_grained.py

qingqing01 reviewed Aug 14, 2023

llm/llama/dybatch/export_generation_model.py

xiaoxiaohehe001 reviewed Aug 14, 2023

paddlenlp/transformers/fused_multi_transformer_fine_grained.py

carryyu force-pushed the dybatchV1-llama branch from d43357a to 59467c6 Compare August 14, 2023 15:23

wawltor reviewed Aug 16, 2023

xiaoxiaohehe001 reviewed Aug 16, 2023

paddlenlp/transformers/fused_transformer_layers.py

heavengate reviewed Aug 17, 2023

llm/llama/dybatch/split_weight.py

heavengate reviewed Aug 17, 2023

xiaoxiaohehe001 mentioned this pull request Aug 17, 2023

Supports chatglm dybatch V1. #6757

Closed

heavengate reviewed Aug 18, 2023

carryyu force-pushed the dybatchV1-llama branch from 309fb6d to 5897009 Compare August 18, 2023 11:39

carryyu force-pushed the dybatchV1-llama branch from 5897009 to 5e31739 Compare August 18, 2023 11:41

supports DyBatch-V1

dcb4041

carryyu force-pushed the dybatchV1-llama branch from f2bb78d to dcb4041 Compare August 18, 2023 11:54

Merge branch 'develop' of github.com:PaddlePaddle/PaddleNLP into dyba…

a112305

…tchV1-llama

heavengate reviewed Aug 21, 2023

wj-Mcat added 5 commits August 21, 2023 07:18

update dybatch llama

22df0d1

complete predictor

76251ca

complete multi-gpus checking

d2926eb

complete predictor

73913b7

remove experiment fils

a9f7edf

update parallel_matmul y_distributed

e276db4

wj-Mcat reviewed Aug 21, 2023

wj-Mcat approved these changes Aug 21, 2023

wawltor reviewed Aug 22, 2023

wawltor approved these changes Aug 22, 2023

wawltor merged commit b3b650c into PaddlePaddle:develop Aug 22, 2023
6 of 7 checks passed
