
[Schedule] Support sequence parallelism#6

Merged
comaniac merged 14 commits into awslabs:main from comaniac:seq_para
Jan 23, 2023

Conversation

@comaniac
Contributor

@comaniac comaniac commented Jan 19, 2023

Description

  • Support sequence parallelism. Specifically, we can now schedule as follows:
# output would be partial
sch["attention.out_proj"].shard("weight", axis=1)
# this indicates that we sync the output in forward pass, but defer the gather
# (at output axis=1) until resid_dropout. in other words, the schedule becomes:
# linear -> reduce_scatter -> dropout -> all_gather.
sch["attention.out_proj"].sync(
    mode="forward_defer_gather",
    gather_at=(sch["attention.resid_dropout"], 1)
)

Note that dist.reduce_scatter always scatters along the first dimension, so we implicitly transpose input and output tensors when needed.
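The transpose trick can be illustrated with a minimal plain-Python sketch (not the PR's code; it simulates the collective on nested lists): since the reduce-scatter primitive only splits along dim 0, scattering along axis=1 is emulated by transposing, scattering along dim 0, and transposing each shard back.

```python
# Hypothetical sketch: emulate reduce_scatter on nested lists to show why
# scattering along axis=1 requires a transpose. Not the PR's implementation.

def transpose2d(m):
    return [list(row) for row in zip(*m)]

def elementwise_sum(tensors):
    # Reduce step: sum the per-rank partial results element-wise.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*tensors)]

def reduce_scatter(tensors, world_size, axis=0):
    if axis == 1:
        # Scatter dim 1 via: transpose -> dim-0 scatter -> transpose back.
        shards = reduce_scatter([transpose2d(t) for t in tensors], world_size)
        return [transpose2d(s) for s in shards]
    total = elementwise_sum(tensors)
    n = len(total) // world_size
    return [total[r * n:(r + 1) * n] for r in range(world_size)]

# Two ranks hold partial outputs; scattering along axis=1 gives each rank
# one column block of the summed result.
partials = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
shards = reduce_scatter(partials, world_size=2, axis=1)
# shards[0] == [[6], [10]], shards[1] == [[8], [12]]
```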


UPDATE
Per offline discussion, the programming model is changed:

sch["attention.out_proj"].sync(mode="fwd_post", sync_op_or_fn="reduce_scatter", axis=1)
sch["attention.resid_dropout"].sync(mode="fwd_post", sync_op_or_fn="all_gather", axis=1)

Other 1: For readability and flexibility, .sync now requires users to always specify the op.
Other 2: The .hook primitive is integrated into .sync, which now also accepts a custom hook function.
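The name sync_op_or_fn suggests a dispatch between a registered op name and a user-supplied hook. A minimal sketch of that dispatch is below; the names SYNC_OPS and resolve_sync_fn are assumptions for illustration, not the actual scheduler API.

```python
# Hypothetical sketch of the "op name or custom hook" dispatch implied by
# sync_op_or_fn. The registry contents here are placeholders.

SYNC_OPS = {
    "all_gather": lambda tensor, axis: ("all_gather", axis),
    "reduce_scatter": lambda tensor, axis: ("reduce_scatter", axis),
}

def resolve_sync_fn(sync_op_or_fn):
    """Return the hook for a named op, or pass a custom callable through."""
    if callable(sync_op_or_fn):
        return sync_op_or_fn
    try:
        return SYNC_OPS[sync_op_or_fn]
    except KeyError:
        raise ValueError(f"Unknown sync op: {sync_op_or_fn!r}")
```

With this shape, sync(mode="fwd_post", sync_op_or_fn="all_gather", axis=1) and sync(mode="fwd_post", sync_op_or_fn=my_hook) share one code path.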


  • Accordingly, a unit test was added to verify the correctness for both forward and backward.

  • Use dist.all_gather_into_tensor when available. This API replaces dist._all_gather_base, which will be deprecated soon. Note that dist.all_gather_into_tensor always concatenates along the first dimension, so we implicitly transpose the input and output tensors when needed. Even with the transpose overhead, it still appears faster than dist.all_gather + torch.cat in my micro-benchmarks.
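The "when available" check can be done with a simple attribute lookup, so the same code runs on older torch versions that only ship dist._all_gather_base. A hedged sketch (the helper name pick_all_gather is an assumption, not the PR's code):

```python
# Hypothetical sketch: prefer dist.all_gather_into_tensor when the installed
# torch.distributed provides it, falling back to the older private API.
# Written against a generic module-like object so the logic is self-contained.

def pick_all_gather(dist_module):
    fn = getattr(dist_module, "all_gather_into_tensor", None)
    if fn is None:
        # Older torch: only the soon-to-be-deprecated API exists.
        fn = getattr(dist_module, "_all_gather_base")
    return fn
```

In real use one would call pick_all_gather(torch.distributed) once at import time and reuse the result.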

  • Move the sharding logic to a separate location and establish a registration mechanism for shardable modules.
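A registration mechanism for shardable modules is commonly a decorator that maps a module class name to its sharding function. A minimal sketch under that assumption (SHARDABLE_REGISTRY and register_shardable are illustrative names, not the PR's actual code):

```python
# Hypothetical sketch of a shardable-module registry: each module class name
# maps to a function describing how its parameters are sharded.

SHARDABLE_REGISTRY = {}

def register_shardable(module_cls_name):
    """Decorator that registers a sharding function for a module class."""
    def decorator(shard_fn):
        SHARDABLE_REGISTRY[module_cls_name] = shard_fn
        return shard_fn
    return decorator

@register_shardable("Linear")
def shard_linear(weight_shape, axis, world_size):
    # Each rank keeps 1/world_size of the weight along the sharded axis.
    shape = list(weight_shape)
    shape[axis] //= world_size
    return tuple(shape)
```

The scheduler can then look up SHARDABLE_REGISTRY[type(mod).__name__] when .shard is called, and new module types become shardable by registering a function rather than editing the core logic.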

  • [Misc] Refactor task_lint.sh so that we can run it locally to verify linting without installing transformers every time.

  • [Misc] Add conftest.py to enforce the test order; otherwise the distributed tests may get stuck when the distributed devices are not running the same test at the same time.
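One standard way to enforce a deterministic order in conftest.py is pytest's pytest_collection_modifyitems hook, which sorts the collected items in place so every distributed worker runs the tests in the same sequence. A hedged sketch (sorting by node id is an assumption about the chosen ordering):

```python
# Hypothetical conftest.py sketch: impose a deterministic test order so all
# distributed workers reach the same test at the same time.

def pytest_collection_modifyitems(session, config, items):
    # Sort in place by test node id; pytest runs the reordered list.
    items.sort(key=lambda item: item.nodeid)
```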

  • [Misc] Remove DeepSpeed from the Docker image for now, so that we can make it public for CI.

cc @szhengac @chhzh123

Checklist

  • PR's title starts with a category (e.g. [Bugfix], [Model], [Tutorial], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

@comaniac comaniac changed the title Seq para [Schedule] Support sequence parallelism Jan 19, 2023
@comaniac comaniac merged commit 54af148 into awslabs:main Jan 23, 2023
@comaniac
Contributor Author

Thanks @szhengac

@comaniac comaniac deleted the seq_para branch January 23, 2023 18:09