Set sequence_parallel strategy to true by default. #846

GhostScreaming · 2022-10-25T04:02:49Z

With sequence_parallel enabled, GPT performance improves by 8.76% to 20.37%.

GPT with sequence_parallel has the same training loss curve as its model-parallel and pipeline-parallel counterparts.
The sequence_parallel strategy requires GPT mp_degree > 1. If mp_degree <= 1, it is set to false automatically in hybrid_model.py.
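
For reviewers who want the fallback rule spelled out, here is a minimal sketch of the behavior described above. It is not the code in hybrid_model.py: the function name `resolve_sequence_parallel` and its signature are hypothetical and only illustrate the rule that sequence_parallel defaults to true and is forced to false when mp_degree <= 1.

```python
def resolve_sequence_parallel(mp_degree: int, sequence_parallel: bool = True) -> bool:
    """Return the effective sequence_parallel setting.

    Hypothetical helper, not part of FleetX. It mirrors the rule stated in
    this PR: the strategy is on by default, but it only applies when
    model parallelism is in use (mp_degree > 1).
    """
    if mp_degree <= 1:
        # No model-parallel group to split the sequence across,
        # so the strategy is turned off regardless of the requested value.
        return False
    # With mp_degree > 1, honor whatever the user configured.
    return sequence_parallel


if __name__ == "__main__":
    for degree in (1, 2, 8):
        print(f"mp_degree={degree}: sequence_parallel={resolve_sequence_parallel(degree)}")
```

An explicit `sequence_parallel=False` is still respected when mp_degree > 1, so the default only changes behavior for configurations that did not set the option.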


sneaxiy

LGTM

ForFishes

LGTM

GhostScreaming added 6 commits September 28, 2022 08:18

Add introduction for sequence_parallel in README.

cfa48d9

Add introduction for sequence_parallel in README.

3a15387

Merge branch 'develop' of https://github.com/PaddlePaddle/FleetX into develop

b83e8fe

Merge branch 'develop' of https://github.com/PaddlePaddle/FleetX into develop

1ce6488

Add pretrain_gpt_1.3B_mp8.yaml. Sequence parallel strategy is True by default. When mp_degree == 1, it will be turned off automatically.

4c6fed8

Polish code.

19215ec

sneaxiy approved these changes Oct 25, 2022


Merge branch 'develop' into default_sequence_parallel

e46ef2f

ForFishes approved these changes Oct 25, 2022


ForFishes merged commit b6ed5ed into PaddlePaddle:develop Oct 25, 2022
