In our current design of 3D parallelism, it is possible that:
stage 0 works on a mesh of 1 node with 2 GPUs,
stage 1 works on a mesh of 2 nodes with 2 GPUs each,
and on each mesh the model may be sharded differently.
The communication between stage 0 and stage 1 is therefore not necessarily a device-to-device send/recv, because it may involve communication from multiple devices to multiple devices.
We should build an abstraction in the pipeline runtime to handle this communication pattern.
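To make the pattern concrete, here is a minimal sketch of the planning step such an abstraction would need: given a tensor dimension sharded evenly across the sender mesh and (differently) across the receiver mesh, compute which (source, destination) device pairs must exchange which index ranges. All names (`shard_ranges`, `cross_mesh_plan`, the device labels) are illustrative assumptions, not part of any existing runtime API; actual transfers would then be issued per entry of the plan via the communication backend.

```python
def shard_ranges(dim_size, num_shards):
    """Evenly split [0, dim_size) into contiguous per-device index ranges."""
    base, rem = divmod(dim_size, num_shards)
    ranges, start = [], 0
    for i in range(num_shards):
        end = start + base + (1 if i < rem else 0)
        ranges.append((start, end))
        start = end
    return ranges

def cross_mesh_plan(dim_size, src_devices, dst_devices):
    """Return (src_dev, dst_dev, lo, hi) tuples; each is one point-to-point transfer.

    When the two meshes shard the dimension differently, a single source
    device generally overlaps several destination shards (and vice versa),
    which is why a plain device-to-device send/recv is not enough.
    """
    src = shard_ranges(dim_size, len(src_devices))
    dst = shard_ranges(dim_size, len(dst_devices))
    plan = []
    for s_dev, (s_lo, s_hi) in zip(src_devices, src):
        for d_dev, (d_lo, d_hi) in zip(dst_devices, dst):
            lo, hi = max(s_lo, d_lo), min(s_hi, d_hi)
            if lo < hi:  # overlapping slice must be transferred
                plan.append((s_dev, d_dev, lo, hi))
    return plan

# Example: stage 0 shards an activation dim of 8 over 2 GPUs,
# stage 1 shards the same dim over 4 GPUs.
for entry in cross_mesh_plan(8, ["s0_g0", "s0_g1"],
                             ["s1_g0", "s1_g1", "s1_g2", "s1_g3"]):
    print(entry)
```

Here each stage-0 GPU ends up sending two slices to two different stage-1 GPUs, so the runtime abstraction must schedule a set of transfers rather than a single send/recv per stage boundary.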