Skip to content

[Schedule][replace] Transfer hooks when replacing modules#27

Merged
szhengac merged 2 commits intoawslabs:mainfrom
comaniac:transfer_hook
Jan 30, 2023
Merged

[Schedule][replace] Transfer hooks when replacing modules#27
szhengac merged 2 commits intoawslabs:mainfrom
comaniac:transfer_hook

Conversation

@comaniac
Copy link
Contributor

@comaniac comaniac commented Jan 30, 2023

Description

The current .trace did the following two steps:

  1. trace the target module to be a GraphModule.
  2. replace the original module with the new GraphModule.

However, the traced GraphModule won't include any hooks in the original module, so the .sync primitives specified before tracing, including .trace and .trace_for_pipeline are dropped. This PR fixes this issue by transferring all hooks from the original module to the new one. For example, a hook registered to broadcast inputs will be transferred 2 times during scheduling:

  1. After tracing the top module, the hook is transferred from the original module to the traced module.
  2. After wrapping pipeline stage to partitioned modules, the (pre-forward) hook is transferred from the original module to the first pipeline stage module (wrapped by Slapo pipeline module).

Checklist

  • PR's title starts with a category (e.g. [Bugfix], [Model], [Tutorial], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

cc @szhengac @chhzh123

@comaniac comaniac mentioned this pull request Jan 30, 2023
4 tasks
@comaniac comaniac linked an issue Jan 30, 2023 that may be closed by this pull request
HOOK_TYPE_TO_ATTR = {
"fwd_pre": ("_forward_pre_hooks", "register_forward_pre_hook"),
"fwd_post": ("_forward_hooks", "register_forward_hook"),
"bwd_post": ("_backward_hooks", "register_backward_hook"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't we also include bwd_pre?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's a backward pre-hook in PyTorch?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@szhengac szhengac merged commit 2b61159 into awslabs:main Jan 30, 2023
@comaniac comaniac deleted the transfer_hook branch January 31, 2023 00:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Original hooks are discarded in pipeline module

2 participants