
Support generation code for GPT-3 in static graph. #2188

Merged: 12 commits merged into PaddlePaddle:develop on May 19, 2022

Conversation

qingqing01
Contributor

PR types

New features

PR changes

Models

Description

Support Sampling, TopKSampling, and TopPSampling in static graph mode via the While op. The generation results are verified against the gpt2-medium-en model.
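
For background on the sampling strategies listed above, here is a minimal sketch of a top-k sampling step written in dynamic-graph style; it is illustrative only, not the PR's static-graph While-op implementation, and the helper name and default k are assumptions.

    import paddle
    import paddle.nn.functional as F

    def sample_top_k(logits, k=5):
        # Restrict sampling to the k most likely tokens, then sample from the
        # renormalized distribution over that subset.
        probs = F.softmax(logits, axis=-1)
        topk_probs, topk_ids = paddle.topk(probs, k=k, axis=-1)
        topk_probs = topk_probs / paddle.sum(topk_probs, axis=-1, keepdim=True)
        sampled = paddle.multinomial(topk_probs, num_samples=1)  # position within the top-k set
        return paddle.index_sample(topk_ids, sampled)            # map back to vocabulary ids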

wangxicoding previously approved these changes on May 17, 2022

@wangxicoding (Contributor) left a comment:

LGTM

@@ -744,10 +751,23 @@ def forward(self,
embedding_output = self.embeddings(
input_ids=input_ids, position_ids=position_ids)

causal_mask = paddle.tensor.triu(
@ZHUI (Collaborator) commented on May 17, 2022:

@wangxicoding Should we watch out for a training performance regression here? Or, similar to the code above, add an if self.training check.

qingqing01 (Contributor, Author) replied:

Done, refined the code.
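
For illustration, a sketch of the kind of training-path guard the review suggests; the helper name, signature, and mask value are assumptions, not the PR's actual code.

    import paddle

    def get_causal_mask(seq_len, training, cached_mask=None):
        # Reuse a precomputed mask on the training path so the extra triu op
        # only runs during generation, avoiding the training slowdown noted above.
        if training and cached_mask is not None:
            return cached_mask
        return paddle.tensor.triu(
            paddle.full([seq_len, seq_len], -1e4, dtype="float32"), diagonal=1)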

if parallel_output:
return logits

paddle.distributed.init_parallel_env()
@ZHUI (Collaborator) commented:

Suggested change:
paddle.distributed.init_parallel_env()

Is there a particular reason to call init_parallel_env here? Could it be done at model startup instead?
Could init_parallel_env end up being called multiple times?

qingqing01 (Contributor, Author) replied:

Removed; it was added while debugging. collective._c_concat does not currently support static graph, so the bug still needs to be fixed inside the framework.
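
For context, one common way to make this a one-time, startup-level call, as the review suggests; this is a hedged sketch with an assumed module-level flag and helper name, not the code that ended up in the PR.

    import paddle.distributed as dist

    _parallel_env_ready = False  # assumed module-level guard flag

    def ensure_parallel_env():
        # Initialize the collective environment once, at model startup, rather
        # than inside forward(), so init_parallel_env is never called twice.
        global _parallel_env_ready
        if not _parallel_env_ready and dist.get_world_size() > 1:
            dist.init_parallel_env()
            _parallel_env_ready = True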

Comment on lines +908 to +942
sorted_probs, sorted_idx = layers.argsort(probs, descending=True)
cum_sorted_probs = layers.cumsum(sorted_probs, axis=1, exclusive=True)
@ZHUI (Collaborator) commented:

If these layers API calls can be replaced with the corresponding paddle.xxx APIs, please replace them.

qingqing01 (Contributor, Author) replied:

1. Reason argsort was not replaced: paddle.argsort only returns the sorted indices, not the sorted values. The values are computed internally but the Python API does not expose them. Reference: https://github.com/PaddlePaddle/Paddle/blob/af79273d97b678c2eefd55b48e3ef3352c15a921/python/paddle/tensor/search.py#L114
   Replacing it would require an additional gather call to recover the values, which adds an extra computation step.

2. Reason layers.cumsum was not replaced: paddle.cumsum has no exclusive parameter and behaves like exclusive=False. Some demos in PaddleNLP do use paddle.cumsum directly, but for the sake of the generation quality I have left it unchanged for now.
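
For illustration, a small sketch (assumed, not the PR's code) of what the paddle.xxx replacement would involve: the extra gather to recover the sorted values, and emulating the exclusive cumulative sum.

    import paddle

    # Toy probabilities; in the PR these come from the softmax over the vocabulary.
    probs = paddle.to_tensor([[0.1, 0.5, 0.2, 0.2]])

    # paddle.argsort returns only indices, so recovering the sorted values
    # needs the extra gather (index_sample) mentioned above.
    sorted_idx = paddle.argsort(probs, axis=-1, descending=True)
    sorted_probs = paddle.index_sample(probs, sorted_idx)

    # paddle.cumsum has no exclusive argument; an exclusive cumulative sum can
    # be emulated by subtracting each element from the inclusive cumsum.
    cum_sorted_probs = paddle.cumsum(sorted_probs, axis=-1) - sorted_probs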

@@ -12,7 +12,7 @@ rm -rf main_sharding*
task_name="gpt-mp-sharding"
rm -rf output/$task_name/log

-python -u -m paddle.distributed.fleet.launch \
+python3.7 -u -m paddle.distributed.fleet.launch \
@ZHUI (Collaborator) commented:

Suggested change:
-python3.7 -u -m paddle.distributed.fleet.launch \
+python -u -m paddle.distributed.fleet.launch \

qingqing01 (Contributor, Author) replied:

done

@ZHUI (Collaborator) commented on May 17, 2022:

#2190: a GPT-3 PR was just merged, watch out for merge conflicts.

@ZHUI (Collaborator) left a comment:

LGTM

@ZHUI merged commit 7780497 into PaddlePaddle:develop on May 19, 2022