
Pipeline Model Parallel #1202

Merged: 122 commits into NVIDIA:master from the pipeline-model-parallelism branch on Oct 27, 2021

Conversation

@crcrpar (Collaborator) commented on Oct 25, 2021

This PR introduces pipeline model parallelism. The implementation is based on https://github.com/nvidia/megatron-lm.

Changes

  • Introduces pipeline model parallel functionality (see the usage sketch after the Caveats list).
  • Some functions/classes in apex.transformer.tensor_parallel are moved to the apex.transformer namespace to avoid cyclic imports.
  • Most import statements are canonicalized to absolute imports, which accounts for a large share of the changed files.

Caveats

  • The interleaved schedule is private: _forward_backward_pipelining_with_interleaving.
  • Compatibility with PyTorch native AMP has not been verified.
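
A rough usage sketch follows. The module paths (apex.transformer.parallel_state, apex.transformer.pipeline_parallel) and the build_model and forward-backward schedule names are taken from this PR, but the exact function signatures, keyword arguments, and the model_provider_func / forward_step_func contracts shown here are assumptions for illustration, not the verified API. The sketch also assumes a torch.distributed launch (one process per GPU) with the world size equal to the pipeline size.

```
import torch
from apex.transformer import parallel_state
from apex.transformer import pipeline_parallel

SEQ_LEN, MICRO_BATCH, HIDDEN = 32, 4, 64


def model_provider_func(*args, **kwargs):
    # Hypothetical single-stage chunk; real usage would build one transformer stage.
    return torch.nn.Linear(HIDDEN, HIDDEN)


def forward_step_func(batch, model_chunk):
    # Assumed Megatron-style contract: return the output tensor plus a
    # function that reduces it to a loss dict.
    output = model_chunk(batch)
    return output, lambda out: {"loss": out.float().mean()}


torch.distributed.init_process_group(backend="nccl")
torch.cuda.set_device(torch.distributed.get_rank() % torch.cuda.device_count())

# Assumed initialization call; tensor and pipeline sizes must multiply to the world size.
parallel_state.initialize_model_parallel(
    tensor_model_parallel_size_=1,
    pipeline_model_parallel_size_=torch.distributed.get_world_size(),
)

# With pipeline parallelism the "model" is a list of module chunks on this rank;
# build_model is assumed to place each chunk on the current device.
model = pipeline_parallel.build_model(model_provider_func, wrap_with_ddp=False)

batch = torch.randn(SEQ_LEN, MICRO_BATCH, HIDDEN).cuda()

# Non-interleaved schedule; the interleaved variant is private and marked unstable.
losses = pipeline_parallel.forward_backward_pipelining_without_interleaving(
    forward_step_func,
    batch,
    model,
    forward_only=True,
    tensor_shape=(SEQ_LEN, MICRO_BATCH, HIDDEN),
)
```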

Reference commit:

```
commit 5ab646376d67831601d5552c193241d017f1b35c (HEAD -> main, internal/main)
Merge: 14f2c684 7b293d9b
Author: Mohammad Shoeybi <mshoeybi@nvidia.com>
Date:   Wed Sep 22 22:57:54 2021 -0700

    Merge branch 'add_BOS' into 'main'

    Add Beginning of Sentence token option and adding semaphore while multi-threading to prevent crashes and hangs due to connection keep-alives

    See merge request ADLR/megatron-lm!328
```
@ptrblck merged commit 63d5dd6 into NVIDIA:master on Oct 27, 2021.
@crcrpar deleted the pipeline-model-parallelism branch on October 27, 2021, 06:05.
ptrblck pushed a commit that referenced this pull request on Oct 27, 2021:
* Init apex.ppu (pipeline model parallel utility)

Reference commit:

```
commit 5ab646376d67831601d5552c193241d017f1b35c (HEAD -> main, internal/main)
Merge: 14f2c684 7b293d9b
Author: Mohammad Shoeybi <mshoeybi@nvidia.com>
Date:   Wed Sep 22 22:57:54 2021 -0700

    Merge branch 'add_BOS' into 'main'

    Add Beginning of Sentence token option and adding semaphore while multi-threading to prevent crashes and hangs due to connection keep-alives

    See merge request ADLR/megatron-lm!328
```

* removing get_args and replace import - phase 1

* removing get_args and replace import - phase 2

* move ppu to apex.transformer.pipeline_parallel

* update two __init__.py

* update READMEs

* mpu -> parallel_state & tensor_parallel

* fix

* remove non-pipeline files

* separate schedules.py - phase 1

* dissect schedules.py

* data_iterators -> batch

* remove optimizer from forward_backward_step funcs

* init test

* Apply 2 suggestion(s) to 2 file(s)

* fix cyclic import

* fix syntax of Callable

* fix - 1

* move directory as testing used for pp test as well

* add some functions for num microbatches calculator

* model is a list in pipeline parallel

* skip build num microbatch calculator

* fix test

* assert -> raise

* skip args printing

* specify tensor shape everywhere even if None - phase 1

* private timers

* passing tensor shape & dtype around

* update dtype handling by introducing helper func

* write helper func to reduce cyclomatic complexity

* remove duplicate

* update

* move split_tensor_into_1d_equal_chunks to avoid cyclic import

* tmp

* cosmetic

* move gather_split_1d_tensor to avoid cyclic imports

* remove debug print

* add outer loop

* early return if possible

* cosmetic

* passing around tensor shape

* refactor test

* add script to learn batch sampler behavior

* update

* minibatch splitter

* add minibatch splitter

* split minibatch into microbatches

* minor changes

* uncomment split batch for test sake

* set as attribute

* study the behavior of no pipelining

* debug 1

* reflect test util namespace change

* update readme

* cosmetic in test

* add model build helper func for interleaving sched

* adding model builder from megatron

* can be cyclic import

* fix

* enable interleaving test, but failing even if forward only

* fix batch preparation

* add explanation

* print data parallel size

* fix typo

* Add Megatron style GPT model by Rishi

Co-authored-by: Rishi Puri <riship@nvidia.com>

* update

* type hint for jit

* fix forward_backward_no_pipelining test

* pipeline forward backward seem to hang if not forward only

* fix typo

* debug

* add p2p test

* simplify

* fix

* tentative

* set both tmp and pmp to 1

* init

* fix typo

* fix

* fix path of divide

* set seed for tmp

* update upon Eddie comment

* fix typo

* adding failing data loader test

* fix

* megatron still failing

* check in

* with the nested loop of new order, interleaving seems fine

* cosmetic change

* make `forward_backward_pipelining_with_interleaving` private

* warn users that interleaving sched is unstable

* move noop handler to no pipelining

* comment out rank_print

* make `build_model` more flexible

* skip megatron test tentatively

* correctly comment out rank_print

* correctly comment out rank_print

* correctly comment out rank_print

* skip appropriately

* remove wip p2p comm test

* update type hint of model_provider_func

* disable tf32 in each test script

* skip interleaving w/ backward

* rename as mpu is the old name

* remove broken case

* expose build_model func

* delete `dist.ring_exchange` func call and `use_ring_exchange` argument

* nit fixes

* check in

* remove unused file

* update the list

* update tensor shape

* remove mixed dtype case

* use torch.distributed.run

* 2020 -> 2021

* another 2020 -> 2021

* docstring & type hint

* fix teardown

* update

* change to experimental

* check if warned

Co-authored-by: Rishi Puri <riship@nvidia.com>
Co-authored-by: Eddie Yan <eddiey@nvidia.com>
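
For background on the "num microbatches calculator" and minibatch-splitter commits listed above, the underlying arithmetic is roughly the following; these helper names are hypothetical illustrations, not part of the apex API.

```
from typing import List
import torch


def get_num_microbatches(global_batch_size: int,
                         micro_batch_size: int,
                         data_parallel_size: int) -> int:
    """Each data-parallel rank sees global_batch_size / data_parallel_size
    samples per step, consumed micro_batch_size at a time."""
    samples_per_rank = global_batch_size // data_parallel_size
    assert samples_per_rank % micro_batch_size == 0, \
        "global batch must split evenly into microbatches"
    return samples_per_rank // micro_batch_size


def split_minibatch(minibatch: torch.Tensor, micro_batch_size: int) -> List[torch.Tensor]:
    """Split one rank's minibatch along dim 0 into microbatches that are
    pushed through the pipeline one after another."""
    return list(torch.split(minibatch, micro_batch_size, dim=0))


# Example: global batch 64, micro batch 4, 2 data-parallel replicas
# -> 8 microbatches of 4 samples on each replica.
num_micro = get_num_microbatches(64, 4, 2)            # 8
microbatches = split_minibatch(torch.randn(32, 128), 4)
assert num_micro == len(microbatches)
```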
mattkellough added a commit to neon-wild/apex that referenced this pull request on Nov 12, 2021.