[NVIDIA] TE Integration #7229
base: develop
Conversation
Force-pushed from 33de6ce to 90eeef7
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##           develop    #7229      +/-   ##
===========================================
+ Coverage    56.67%   58.28%    +1.60%
===========================================
  Files          588      580        -8
  Lines        89243    85655     -3588
===========================================
- Hits         50580    49922      -658
+ Misses       38663    35733     -2930
```

☔ View full report in Codecov by Sentry.
Force-pushed from 9d4ec48 to 17bfad2
paddlenlp/trainer/trainer.py (Outdated)

```
@@ -1823,7 +1839,8 @@ def training_step(self, model: nn.Layer, inputs: Dict[str, Union[paddle.Tensor,
inputs = self._prepare_inputs(inputs)

with self.autocast_smart_context_manager():
loss = self.compute_loss(model, inputs)
with TransformerEngineHelper.fp8_autocast(enabled=self.use_fp8):
```
How about moving `TransformerEngineHelper.fp8_autocast` into `autocast_smart_context_manager`?
Yes, I did consider that design. But while recently testing pipeline parallel + fp8 + gradient_accumulation, I found that fp8_autocast cannot wrap the outside of compute_loss; it raises an error, see PR#93. Pipeline parallel contains a for loop of this kind internally, and fp8_autocast has to go inside that for loop, not outside it. So I placed it here: paddlenlp/transformers/gpt/modeling.py#L656
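For reference, a minimal sketch of the placement constraint described above, using a toy micro-batch loop. The fp8_autocast placeholder, forward_step, and micro_batches below are illustrative stand-ins, not the actual Paddle pipeline-parallel internals or the TransformerEngineHelper API:

```python
# Sketch only: why the FP8 context should sit inside the micro-batch loop
# rather than around the whole pipeline schedule. Names are illustrative.
from contextlib import contextmanager

@contextmanager
def fp8_autocast(enabled=True):
    # Placeholder for TransformerEngineHelper.fp8_autocast: the real context
    # manager maintains amax/scale state that is updated per forward pass.
    yield

def forward_step(micro_batch):
    # Stand-in for one pipeline-parallel forward on a single micro-batch.
    return sum(micro_batch)

micro_batches = [[1.0, 2.0], [3.0, 4.0]]

# Working pattern: enter the FP8 context once per micro-batch, inside the loop.
losses = []
for micro_batch in micro_batches:
    with fp8_autocast(enabled=True):
        losses.append(forward_step(micro_batch))

# Problematic pattern (as reported above): a single FP8 context wrapping the
# whole schedule breaks under pipeline parallel + gradient accumulation.
# with fp8_autocast(enabled=True):
#     for micro_batch in micro_batches:
#         forward_step(micro_batch)
```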
paddlenlp/trainer/trainer.py (Outdated)

```
@@ -1886,7 +1903,8 @@ def training_pipeline_step(self, model: nn.Layer, inputs: Dict[str, Union[paddle
model.lr_scheduler = None

with self.autocast_smart_context_manager():
loss = model.forward_backward_pipeline(inputs, self.scaler if self.do_grad_scaling else None)
with TransformerEngineHelper.fp8_autocast(enabled=self.use_fp8, fp8_group=self.dp_group):
```
The same as above.
paddlenlp/te_utils/te_modeling.py (Outdated)

```
from .te_helper import TransformerEngineHelper


class GPTDecoderLayerWithNVTEBackend(nn.Layer):
```
The GPT model code should live under transformers/gpt.
Moved it into paddlenlp/transformers/gpt/modeling.py.
@DrownFish19 please take a look and review as well.
Force-pushed from 8205e73 to 9c7ef55
Force-pushed from c5f71a8 to 6fbe37a
Force-pushed from 842ab05 to b03295e
This Pull Request is stale because it has been open for 60 days with no activity.
Force-pushed from 6e1bc5e to cd3b8f7
Could this PR be split up? Merge the code changes first, and then decide later how to manage the docs and run scripts.
```
@@ -0,0 +1,533 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
```
There is a lot of duplicated code. I suggest adding the TE support directly to the existing run_pretrain.py; see how Megatron-LM supports it: https://github.com/NVIDIA/Megatron-LM/blob/2b92e61dac1c5ff84629239198659b447da118e1/megatron/training/arguments.py#L579
This file is no longer needed; both GPT and LLaMA now run through the existing run_pretrain.py. I split that part out into this PR: #8228
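As a rough illustration of the suggestion above, here is a minimal sketch of exposing TE as pretraining arguments, loosely following the Megatron-LM flag pattern linked in the comment. The field names (transformer_impl, use_fp8) are hypothetical, not the PR's actual arguments, and the real run_pretrain.py wires such fields through PaddleNLP's own argument dataclasses:

```python
# Sketch only: hypothetical argument surface for a TE backend switch.
from dataclasses import dataclass, field

@dataclass
class ModelArguments:
    transformer_impl: str = field(
        default="local",
        metadata={"help": "Transformer implementation to use: 'local' or 'transformer_engine'."},
    )
    use_fp8: bool = field(
        default=False,
        metadata={"help": "Enable FP8 execution via Transformer Engine (requires transformer_impl='transformer_engine')."},
    )

# Toy usage: what a TE + FP8 run would select.
args = ModelArguments(transformer_impl="transformer_engine", use_fp8=True)
print(args)
```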
```
@@ -0,0 +1,203 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
```
For the checkpoint conversion, could the parts common to GPT and LLaMA be factored out and implemented under the paddlenlp/utils directory?
The checkpoint file naming rules and model parameter naming rules in paddlenlp have changed (a version I wrote earlier no longer works), and the parameter layout depends on many settings, such as TP parallelism, whether weights are fused, and, for LLaMA, the additional splitting of GQA parameters. So for now I only implemented working versions for GPT and LLaMA separately.
A more general version should be possible, but these details need to be sorted out first; that can be done later.
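A minimal sketch of what a shared converter core could look like, assuming a plain rename-plus-TP-shard pass. convert_state_dict, name_map, and col_parallel are hypothetical names, and a real implementation would also need to cover fused weights and LLaMA GQA splitting as noted above:

```python
# Sketch only: generic rename + tensor-parallel shard pass that a shared
# converter under paddlenlp/utils could build on. Names are illustrative.
import numpy as np

def convert_state_dict(src_state, name_map, tp_rank=0, tp_degree=1, col_parallel=()):
    """Rename tensors and keep this rank's shard for column-parallel weights."""
    dst_state = {}
    for src_name, tensor in src_state.items():
        dst_name = name_map.get(src_name, src_name)
        if dst_name in col_parallel and tp_degree > 1:
            # Column-parallel layers are sharded along the last axis per TP rank.
            tensor = np.split(tensor, tp_degree, axis=-1)[tp_rank]
        dst_state[dst_name] = tensor
    return dst_state

# Toy usage: rename one attention weight and shard it across 2 TP ranks.
src = {"decoder.layers.0.attn.qkv.weight": np.arange(32, dtype=np.float32).reshape(4, 8)}
name_map = {"decoder.layers.0.attn.qkv.weight": "gpt.decoder.layers.0.self_attn.qkv_proj.weight"}
shard0 = convert_state_dict(
    src, name_map, tp_rank=0, tp_degree=2,
    col_parallel={"gpt.decoder.layers.0.self_attn.qkv_proj.weight"},
)
print(shard0["gpt.decoder.layers.0.self_attn.qkv_proj.weight"].shape)  # (4, 4)
```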
```
--use_fused_rope 1 \
--fuse_attention_ffn 1 \
--bf16 \
--fp16_opt_level "O2" \
```
BF16 training does not enable master_grad (enabled with `--amp_master_grad true`)?
TE does not support this feature yet; we will add this argument once it is supported.
```
$recompute_flag \
$init_weight_flag \
$sp_flag \
--device "gpu"
```
Some of the framework-side overlap optimizations do not seem to be enabled. develop supports sharding gradient-communication overlap and P2P communication overlap (P2P communication overlap is not yet supported in 2.6). They can be enabled as follows:
--sharding_parallel_config "split_param enable_stage1_overlap" \
--pipeline_parallel_config "enable_sharding_comm_overlap enable_overlap_p2p_comm" \
Are optimizations such as the MP backward AllReduce overlap and gradient-accumulation fusion all baked into TE?
> develop supports sharding gradient-communication overlap and P2P communication overlap

OK, I will add these two optimization flags.

> Are optimizations such as the MP backward AllReduce overlap and gradient-accumulation fusion all baked into TE?

What does the MP backward AllReduce overlap refer to?
Gradient-accumulation fusion presumably depends on main_grad?
This Pull Request is stale because it has been open for 60 days with no activity.
PR types
New features
PR changes
Others
Description
Integrate NVIDIA Transformer Engine (TE) into PaddleNLP.