Add function `to_sequential` to PipelineModule #1014

sdtblck wants to merge 8 commits into deepspeedai:master

Conversation
In https://github.com/EleutherAI/gpt-neox we were previously maintaining two separate models: one if the user wanted to use pipeline parallelism, and one if they didn't. The more straightforward solution was to add a `to_sequential` function that exports the PipelineModule as an nn.Sequential model, so we can train with DeepSpeed features that aren't compatible with pipeline parallelism (e.g. ZeRO stage 2+). I figure this might be a useful addition to the base module, too. I'm not 100% sure the support for tied layers here is as flexible as it could/should be, since their capabilities aren't very well documented, but it works at least for our purposes (with tied embeddings as the output layer).
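The core idea can be sketched roughly like this. Note this is a toy stand-in, not the actual PipelineModule internals: the `forward_funcs` attribute and the filtering logic here are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class TinyPipeline:
    """Stand-in for a PipelineModule holding its layers in order.
    `forward_funcs` is an assumed attribute name for this sketch."""
    def __init__(self):
        self.forward_funcs = [nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2)]

    def to_sequential(self):
        # Keep the nn.Module layers in pipeline order; a real
        # implementation also has to handle plain callables and tied layers.
        return nn.Sequential(*[f for f in self.forward_funcs
                               if isinstance(f, nn.Module)])

seq = TinyPipeline().to_sequential()
out = seq(torch.randn(3, 4))  # the exported model runs like any nn.Sequential
```

Once exported, the model can be passed to the regular (non-pipeline) training engine like any other nn.Module.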
ShadenSmith left a comment:
This is a great idea, thanks @sdtblck !
One caveat is that we lose the activation checkpointing that the PipelineModule's forward can be configured to use. But users can instead use torch's checkpoint_sequential() if they want checkpointing. Or we could wrap the layers in a similar way as Lambda if we really want to mirror functionality. What are your thoughts?
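For reference, using torch's `checkpoint_sequential()` on the exported model might look like the sketch below (the model here is just a placeholder for whatever `to_sequential` returns):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Placeholder for the model returned by to_sequential()
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8), nn.ReLU())

x = torch.randn(2, 8, requires_grad=True)
# Split the sequence into 2 segments; activations inside each segment
# are recomputed during backward instead of being stored.
out = checkpoint_sequential(model, 2, x)
out.sum().backward()
```

This recovers the memory savings of activation checkpointing without the PipelineModule's forward being involved.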
deepspeed/runtime/pipe/module.py (outdated)

    else:
        # check that it's a lambda function
        LAMBDA = lambda: 0
        if isinstance(spec, type(LAMBDA)) and spec.__name__ == LAMBDA.__name__:
PipelineModule should work with any callable object, and I think the Lambda module above will too. Maybe the filtering condition could be `hasattr(spec, '__call__')` to support things like named methods?
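To illustrate the difference between the two checks (toy specs, not taken from the PR):

```python
# The lambda-only check accepts only functions literally named '<lambda>':
LAMBDA = lambda: 0

def named_builder():          # a named factory function
    return "layer"

class CallableSpec:           # a callable object
    def __call__(self):
        return "layer"

specs = [LAMBDA, named_builder, CallableSpec()]

lambda_only = [isinstance(s, type(LAMBDA)) and s.__name__ == LAMBDA.__name__
               for s in specs]                            # [True, False, False]
callable_check = [hasattr(s, '__call__') for s in specs]  # [True, True, True]
```

The `hasattr(spec, '__call__')` test accepts lambdas, named functions, and callable objects alike, which is why it is the broader filtering condition.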
Good point, yes! I'll make that change too.
In addition to […], if we short-circuit this condition and use the regular training engine, I think that […]
Hi @ShadenSmith, I actually tried this as well, and it seems this way of doing things drops any tied modules (since the pipe engine handles them specially). For example, if we used this with a model with tied embeddings, the `to_logits` function that uses the word embedding weights would just get silently dropped.
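The weight-tying pattern at issue looks roughly like this toy model (the class and sizes are hypothetical; `to_logits` follows the naming used in the discussion):

```python
import torch
import torch.nn as nn

class TiedLM(nn.Module):
    """Toy model with input/output embeddings tied, as in the discussion."""
    def __init__(self, vocab=10, dim=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.body = nn.Linear(dim, dim)

    def to_logits(self, hidden):
        # Reuses the embedding matrix as the output projection; if a
        # naive export drops this tied step, the model silently breaks.
        return hidden @ self.embed.weight.t()

    def forward(self, tokens):
        return self.to_logits(self.body(self.embed(tokens)))

model = TiedLM()
logits = model(torch.tensor([[1, 2, 3]]))  # shape: (1, 3, vocab)
```

Because the output projection is not a standalone layer but a reuse of `embed.weight`, an export that only collects the pipeline's layer list would lose it without any error.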
Hm. Yeah, this is a good point that I had overlooked. I'll spend some time looking into the best way to get this working today.
Used to convert a DeepSpeed PipelineModule to an nn.Sequential-like model while retaining activation checkpointing.
Hi @ShadenSmith, I think the two latest commits should fix both of the above requirements. There is maybe some repeated code between […]
Can one of the admins verify this patch?
@sdtblck - just fixed some formatting issues that were preventing this - if the tests pass, would this be good to merge now?