[FEATURE] Support overlapping pipeline communication and computation #773

Merged: 20 commits, Dec 1, 2022

@ZYHowell ZYHowell commented Nov 12, 2022

This PR:

  • Refactor the implementation of XLANcclGroup. More of the logic moves to the XLA side to make the code cleaner.
  • Support overlapping cross-mesh communication with SPMD computation. We add GpuEvents during pipeline execution and use them for event-level synchronization. Overlapping is only supported with the XLA implementation, because the CuPy implementation incurs a CPU-GPU synchronization when converting XLA buffers to CuPy buffers.
  • Fix bugs in apply gradient. The previous version could not handle the following case: a and b are the outputs of layers i and j, respectively, and c = a + b is used in layer k.
  • Add an overlap-friendly pipeline schedule. For more information, please refer to this paper. The broadcast optimization was already merged in earlier PRs.
  • Add a U-Net Transformer model implementation and benchmark script. The model is adapted from the Hugging Face diffusers repo, with pipeline markers inserted at the appropriate places.
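To illustrate why event-level synchronization matters for overlap, here is a minimal toy model of two GPU streams. It is only a sketch, not code from this PR: the function name `last_send_finish` and the timing model (all compute tasks enqueued up front, one send per compute task) are assumptions made for illustration.

```python
from typing import List


def last_send_finish(compute: List[float], send: List[float], event_sync: bool) -> float:
    """Toy two-stream timeline: compute task i produces the buffer that send i transfers.

    Assumes all compute tasks are enqueued on the compute stream before any
    send is issued. Returns the time at which the last send completes.
    """
    # Compute tasks run back to back on the compute stream.
    compute_done = []
    t = 0.0
    for c in compute:
        t += c
        compute_done.append(t)
    stream_drained = t  # time at which the whole compute stream is idle

    send_free = 0.0  # time at which the communication stream becomes free
    for i, s in enumerate(send):
        # Event-level: wait only for the event recorded after compute task i.
        # Stream-level: wait for everything enqueued on the compute stream.
        ready = compute_done[i] if event_sync else stream_drained
        send_free = max(send_free, ready) + s
    return send_free
```

With three compute tasks of 2 time units each and three sends of 1 unit each, this model finishes at time 7 with event-level waits and at time 9 when every send waits for the whole compute stream, since in the latter case no send can start before time 6.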

TODO for the next PR:

  • Support HloInstruction rescheduling to overlap backward communication with computation.
    The current blocker is that some code appears to synchronize the CPU thread with the GPUs. We need to find out where this happens.
  • Either run the dynamic slice on a separate stream, or avoid generating shards that make the communicated buffers non-contiguous.
    In the slow path, a dynamic slice is enqueued at the end of the computation stream, so communication has to wait until all computation is done, which leaves no overlap.
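As a rough sketch of what "non-contiguous" means for the second item, the check below decides whether a shard of a row-major buffer occupies a single contiguous memory range. The helper `is_contiguous_shard` is hypothetical and not part of this PR. It assumes a dense row-major layout and non-empty half-open ranges.

```python
from typing import Sequence, Tuple


def is_contiguous_shard(shape: Sequence[int], shard: Sequence[Tuple[int, int]]) -> bool:
    """Return True if the shard covers one contiguous range of a row-major buffer.

    `shard` holds one half-open (start, stop) range per dimension;
    every range is assumed to be non-empty.
    """
    seen_partial = False
    # Walk from the innermost (fastest-varying) dimension outwards.
    for dim_size, (start, stop) in zip(reversed(shape), reversed(shard)):
        length = stop - start
        if seen_partial:
            # Outside a partially covered dimension, only a single index is allowed.
            if length != 1:
                return False
        elif length != dim_size:
            seen_partial = True
    return True
```

Under these assumptions, sharding a (4, 8) buffer along its first dimension, e.g. `[(0, 2), (0, 8)]`, gives a contiguous shard, while sharding along the last dimension, e.g. `[(0, 4), (0, 4)]`, does not. A shard of the second kind would need a slice or copy step before it could be sent as one buffer.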

@ZYHowell ZYHowell changed the title [FEATURE] Support overlapping pipeline communication and computation [WIP][FEATURE] Support overlapping pipeline communication and computation Nov 13, 2022
@ZYHowell ZYHowell changed the title [WIP][FEATURE] Support overlapping pipeline communication and computation [FEATURE] Support overlapping pipeline communication and computation Nov 23, 2022
@ZYHowell ZYHowell merged commit 21da68e into main Dec 1, 2022
@ZYHowell ZYHowell deleted the overlapping branch December 1, 2022 21:49