
Make a more general between-stage communication class/abstraction #39

Closed
zhisbug opened this issue May 26, 2021 · 2 comments

zhisbug commented May 26, 2021

In our current design of 3D parallelism, there are cases where, for example:

  • stage 0 runs on a mesh with 1 node × 2 GPUs
  • stage 1 runs on a mesh with 2 nodes × 2 GPUs

and on each mesh the model might be sharded differently.

The communication between stage 0 and stage 1 is therefore not necessarily a single device-to-device send/recv, because it may involve transfers from multiple source devices to multiple destination devices.

We should build an abstraction in the pipeline runtime to handle this communication pattern (a sketch is given below).
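
For illustration only, here is a minimal Python sketch of what such an abstraction could compute: given how a logical activation tensor is sharded across the sender mesh and across the receiver mesh, it derives per-device transfer tasks as the pairwise intersections of the shards. All names (`Shard`, `CommTask`, `build_comm_plan`, the device strings) are hypothetical and not from the codebase:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# One dimension of a shard: the [start, stop) range it owns.
Slice = Tuple[int, int]

@dataclass(frozen=True)
class Shard:
    """The slice of the logical tensor held by one device."""
    device: str                 # hypothetical device id, e.g. "m0:gpu0"
    index: Tuple[Slice, ...]    # one (start, stop) per tensor dimension

@dataclass(frozen=True)
class CommTask:
    """One point-to-point transfer in the cross-mesh plan."""
    src: str
    dst: str
    region: Tuple[Slice, ...]   # region of the logical tensor to move

def _overlap(a: Slice, b: Slice) -> Optional[Slice]:
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None

def build_comm_plan(src_shards: List[Shard],
                    dst_shards: List[Shard]) -> List[CommTask]:
    """Emit one transfer for every (sender, receiver) pair whose shards
    intersect in the logical tensor, covering exactly the intersection."""
    tasks = []
    for s in src_shards:
        for d in dst_shards:
            regions = [_overlap(a, b) for a, b in zip(s.index, d.index)]
            if all(r is not None for r in regions):
                tasks.append(CommTask(s.device, d.device, tuple(regions)))
    return tasks

# The scenario above: a (4, 4) activation leaving stage 0
# (1 node x 2 GPUs, row-sharded) and entering stage 1
# (2 nodes x 2 GPUs, column-sharded).
src = [Shard("m0:gpu0", ((0, 2), (0, 4))),
       Shard("m0:gpu1", ((2, 4), (0, 4)))]
dst = [Shard(f"m1:gpu{i}", ((0, 4), (i, i + 1))) for i in range(4)]

for t in build_comm_plan(src, dst):
    print(t)   # 8 tasks: every sender must talk to every receiver
```

The runtime would then execute such a plan with whatever transport fits each pair (NCCL within a mesh, network send/recv across meshes), which is exactly the decision a generic send/recv cannot make on its own.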

@zhisbug zhisbug self-assigned this May 26, 2021
zhisbug commented Jun 1, 2021

This issue will tackle #25

zhisbug commented Jul 28, 2021

See #47

@zhisbug zhisbug closed this as completed Jul 28, 2021