Support pure fp16 for gpt static #1353
Conversation
…nto dev/support_gpt3_static_fp16
@@ -329,7 +329,8 @@ def build_dataset(index, name, num_samples):
         sample_ids=sample_ids,
         sample_lens=sample_lens,
         eos_id=eos_id,
-        seed=args.seed)
+        seed=args.seed,
Just copy a version into gpt-3/static directly; otherwise it's incompatible with gpt/run_pretrain.py.
Done.
-        if args.grad_clip > 0:
-            clip = paddle.fluid.clip.GradientClipByGlobalNorm(
+        if args.grad_clip > 0:
+            clip = paddle.fluid.clip.GradientClipByNorm(
Same as before; leave this unchanged.
Done.
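For readers following the clip discussion above: the two Paddle clip strategies differ in whether the norm is computed per tensor or across all gradients. A minimal numpy sketch of the semantics (an illustration of the two definitions, not Paddle's implementation):

```python
import numpy as np

def clip_by_norm(grad, clip_norm):
    # GradientClipByNorm: each gradient tensor is clipped by its OWN L2 norm
    norm = np.linalg.norm(grad)
    return grad if norm <= clip_norm else grad * (clip_norm / norm)

def clip_by_global_norm(grads, clip_norm):
    # GradientClipByGlobalNorm: one scale factor from the norm over ALL
    # gradient tensors, preserving their relative magnitudes
    global_norm = np.sqrt(sum(np.sum(g * g) for g in grads))
    scale = min(1.0, clip_norm / global_norm)
    return [g * scale for g in grads]
```

Global-norm clipping keeps the direction of the full gradient vector intact, which is why the reviewer asks to keep `GradientClipByGlobalNorm` here.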
@@ -20,6 +20,7 @@
 import random
 import time
 import sys
+from paddle.fluid import core
Move this next to the other paddle imports?
core doesn't appear to be used here.
Done.
@@ -357,6 +351,11 @@ def do_train(args):
     exe = paddle.static.Executor(place)
     exe.run(startup_program)
     test_program = main_program.clone(for_test=True)
+
+    if args.use_amp and args.use_fp16:
The logic here needs to be made clearer. This works for now, but let's discuss it and settle on a common convention going forward:
- use_fp16 [True, False]
- fp16_level ["amp", "pure_fp16"]
We should survey how comparable products define this, and ideally align with mainstream usage.
One issue: Paddle's static graph distinguishes these as "amp" vs "pure_fp16", and its mixed-precision interface is also defined in terms of pure_fp16.
Changed to:
- use_amp [True, False]
- amp_level ["O1", "O2"]
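For reference, the agreed flag pair might be wired up as follows (a minimal sketch; the parser setup and `str2bool` helper are illustrative, not the PR's actual argument code):

```python
import argparse

def str2bool(v):
    # accept common truthy spellings from the command line
    return str(v).lower() in ("true", "1", "yes")

parser = argparse.ArgumentParser()
parser.add_argument("--use_amp", type=str2bool, default=False,
                    help="Enable mixed-precision training.")
parser.add_argument("--amp_level", choices=["O1", "O2"], default="O1",
                    help="O1: auto mixed precision; O2: pure fp16.")

args = parser.parse_args(["--use_amp", "true", "--amp_level", "O2"])

# O2 corresponds to Paddle static graph's pure-fp16 mode
use_pure_fp16 = args.use_amp and args.amp_level == "O2"
```

Keeping a boolean switch plus a level string mirrors the O1/O2 naming used by other mixed-precision frameworks, which is the alignment discussed above.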
Should we unify the argument style for the gpt-3 dynamic graph here as well?
TODO: if this changes, the benchmark scripts need to be updated:
https://github.com/PaddlePaddle/PaddleNLP/blob/develop/tests/benchmark/run_benchmark.sh#L59
https://github.com/PaddlePaddle/benchmark/blob/master/dynamic_graph/gpt/paddle/run_benchmark.sh#L61
I can take care of the PaddlePaddle/benchmark change; the PaddleNLP tests/benchmark/run_benchmark.sh one can be updated in passing here.
@@ -227,11 +228,15 @@ def forward(self,
         # scale dot product attention
         product = layers.matmul(
             x=q, y=k, transpose_y=True, alpha=self.head_dim**-0.5)
+
+        fuse = True
It would be better to make this configurable.
The dynamic graph uses softmax_mask_fuse_upper_triangle directly, so the static graph adopts the same strategy here.
If you have before/after speed comparison results, please include them in the PR description.
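For readers unfamiliar with the fused op: assuming the standard definition (apply a causal upper-triangular mask, then softmax over the last axis), its semantics can be sketched in numpy as:

```python
import numpy as np

def softmax_mask_upper_triangle(scores):
    # Reference semantics of the fused kernel: positions above the diagonal
    # (future tokens) are masked out before the softmax.
    seq_len = scores.shape[-1]
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    masked = np.where(mask, -1e4, scores)          # large negative ≈ -inf
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

out = softmax_mask_upper_triangle(np.zeros((4, 4), dtype=np.float32))
```

The fused kernel performs the mask and softmax in one pass instead of materializing the masked scores, which is where the speedup comes from.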
@@ -23,6 +23,7 @@
+from paddle.fluid import layers
I can't leave comments on the dataset.py file, but this should go up two directory levels; otherwise the path to the data_tools directory is wrong:
# Used to load data_tools path.
sys.path.insert(0, "../")
->
# Used to load data_tools path.
sys.path.insert(0, "../../")
Done.
+        if args.use_amp and args.amp_level=="O2":
+            optimizer.amp_init(place)
Could you briefly explain how this is used?
The way amp_init has to be called here feels a bit odd.
Pure fp16 requires converting the network parameters from fp32 to fp16; amp_init is the API that performs this parameter dtype conversion:
https://github.com/PaddlePaddle/Paddle/blob/ed7a21dea0ddcffb6f7f33ce21c5c368f5c7866b/python/paddle/fluid/contrib/mixed_precision/decorator.py#L207
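Conceptually, the cast amp_init performs can be sketched with numpy (a simplified illustration of the idea, not Paddle's implementation): the fp32 values produced by the startup program are cast to fp16 for the compute kernels, while the optimizer keeps fp32 master weights for accurate updates.

```python
import numpy as np

# fp32 parameter as produced by the startup program
param_fp32 = np.array([0.12345678, 3.14159265], dtype=np.float32)

# master weights: the optimizer keeps the fp32 copy for accurate updates
master_weight = param_fp32.copy()

# amp_init-style cast: the fp16 copy actually used by compute kernels
param_fp16 = param_fp32.astype(np.float16)
```

Because the cast must happen after `exe.run(startup_program)` initializes the fp32 values but before the first training step, forgetting the `amp_init(place)` call leaves fp16 kernels reading uninitialized or wrongly-typed parameters, which is the usability concern raised below.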
cc: @zhiqiu Shall we discuss the usage of this API a bit more? It seems quite likely that users will forget to call it.
This API is currently required; we'll look into streamlining it later. One possible route is to also pass the startup program into decorate and convert it to fp16 there.
OK, let's leave a TODO here then. @zhangbo9674
@@ -329,7 +329,8 @@ def build_dataset(index, name, num_samples):
         sample_ids=sample_ids,
         sample_lens=sample_lens,
         eos_id=eos_id,
Restore the original code here.
Done.
…nto dev/support_gpt3_static_fp16
LGTM
…nto dev/support_gpt3_static_fp16
…ngbo9674/PaddleNLP into dev/support_gpt3_static_fp16
LGTM
PR types
Performance optimization
PR changes
Models
Description
Support pure fp16 for gpt static.
Speed Test:
![image](https://user-images.githubusercontent.com/82555433/143164127-60aa27ea-ee73-4c2d-af40-44304527cca7.png)
![image](https://user-images.githubusercontent.com/82555433/143164227-fd772778-febc-4af9-a337-a5dd5e1a2da7.png)
Environment: V100-32G ×6, clip is ClipByNorm, global_batch_size=8, use_recompute=False.
amp - O1:
amp - O2: