[Pipeline] Tie weight analysis #8

Merged
comaniac merged 5 commits into awslabs:main from comaniac:tie_weights on Jan 20, 2023

@comaniac comaniac commented Jan 20, 2023

Description

This PR analyzes whether parameters are tied together by comparing their nn.Parameter object IDs. Accordingly, this analysis has to run before consolidation (if needed); otherwise every nn.Parameter object is replaced and the object IDs no longer match.
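A minimal sketch of the identity-based check described above. This is not the code from this PR: `FakeParam` and `find_tied_groups` are hypothetical names, and plain Python objects stand in for nn.Parameter because only object identity matters here.

```python
from collections import defaultdict


class FakeParam:
    """Hypothetical stand-in for nn.Parameter; only its identity is used."""


def find_tied_groups(named_params):
    """Group parameter names that refer to the same Python object.

    `named_params` is an iterable of (name, param) pairs. Names whose
    params share the same id() are reported together as one tied group.
    """
    groups = defaultdict(list)
    for name, param in named_params:
        groups[id(param)].append(name)
    # Only groups with more than one name are actually tied.
    return [names for names in groups.values() if len(names) > 1]


shared = FakeParam()
named = [
    ("embed.weight", shared),
    ("block.weight", FakeParam()),
    ("lm_head.weight", shared),  # same object as embed.weight
]
tied = find_tied_groups(named)
```

With a real model, the pairs would have to include every name of a shared parameter. `named_parameters()` skips duplicates by default, so it cannot be passed in unchanged; recent PyTorch versions accept `remove_duplicate=False` for this purpose.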

Note that the above analysis works only when the tied parameters point to the same Python object. In other words, for distributed training that requires weight consolidation, users have to explicitly call tie_weights one more time when initializing the model with empty weights. For example:

with slapo.init_empty_weights():
    model = get_model() # tie_weights() is called implicitly when constructing this model.

sch = slapo.create_schedule(model, ...)
# sch.build does the following in order:
# 1. Partition pipeline stages.
# 2. Tie weight analysis.
# 3. Consolidation (this wipes out the previously tied weights in the model).
# 4. Liveness analysis.
# 5. Pipeline module construction (this makes use of the tie weight and liveness analysis results).
sch.build(...)
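The ordering matters because replacing parameter objects destroys the shared identity that the analysis relies on. The sketch below illustrates this with plain Python stand-ins; `FakeParam`, `consolidate`, and `retie` are hypothetical and do not reproduce Slapo's consolidation logic.

```python
class FakeParam:
    """Hypothetical stand-in for nn.Parameter."""

    def __init__(self, value):
        self.value = value


def consolidate(params):
    """Hypothetical consolidation: every entry gets a fresh object."""
    return {name: FakeParam(p.value) for name, p in params.items()}


def retie(params, tied_groups):
    """Point every name in a tied group at the group's first object."""
    for names in tied_groups:
        first = params[names[0]]
        for name in names[1:]:
            params[name] = first
    return params


shared = FakeParam(1.0)
params = {"embed.weight": shared, "lm_head.weight": shared}

# Recorded by the tie weight analysis *before* consolidation.
tied_groups = [["embed.weight", "lm_head.weight"]]

consolidated = consolidate(params)
# The two names now refer to different objects, so the tie is lost.
lost = consolidated["embed.weight"] is not consolidated["lm_head.weight"]

# The recorded groups are enough to restore the tie afterwards.
restored = retie(consolidated, tied_groups)
tied_again = restored["embed.weight"] is restored["lm_head.weight"]
```

Running the analysis after `consolidate` would find no tied groups at all, which is why the analysis has to come first.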

In addition, this PR refactors the pipeline logic in schedule.build. The ultimate goal is to ensure that schedule.build doesn't contain any framework-specific logic (e.g., if target == "deepspeed":).
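One common way to keep framework checks out of a build function is a registry keyed by target name. The sketch below shows only that general pattern; the PR description does not say how the refactor is implemented, and `PIPELINE_BUILDERS`, `register_pipeline_builder`, and `build_pipeline` are hypothetical names.

```python
# Hypothetical registry mapping a target name to its builder function.
PIPELINE_BUILDERS = {}


def register_pipeline_builder(target):
    """Decorator that registers a framework-specific builder."""

    def decorator(fn):
        PIPELINE_BUILDERS[target] = fn
        return fn

    return decorator


@register_pipeline_builder("deepspeed")
def build_deepspeed(stage_modules, tied_groups):
    # Placeholder result; a real builder would construct the framework's
    # pipeline module from the stages and the analysis results.
    return {
        "target": "deepspeed",
        "num_stages": len(stage_modules),
        "tied_groups": tied_groups,
    }


def build_pipeline(target, stage_modules, tied_groups):
    """Framework-agnostic entry point: no `if target == ...` branches."""
    try:
        builder = PIPELINE_BUILDERS[target]
    except KeyError:
        raise ValueError(f"Unsupported target: {target!r}") from None
    return builder(stage_modules, tied_groups)


result = build_pipeline("deepspeed", ["stage0", "stage1"], [["a.weight", "b.weight"]])
```

Under this layout, supporting another framework means registering one more builder rather than editing the build function.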

cc @szhengac @zarzen

Checklist

  • PR's title starts with a category (e.g. [Bugfix], [Model], [Tutorial], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

@szhengac

LGTM.

@szhengac

Let me know if it is ready for merge

@comaniac comaniac merged commit 574349a into awslabs:main Jan 20, 2023
@comaniac

Thanks @szhengac

@comaniac comaniac deleted the tie_weights branch January 20, 2023 19:46
from deepspeed import pipe

model = pipe.PipelineModule(
    stage_modules,

Is stage_modules a list of layers? Or is it partitioned by the Slapo schedule?

chhzh123 pushed a commit to chhzh123/slapo that referenced this pull request Jan 22, 2023