Is your feature request related to a problem? Please describe.
When using DeepSpeed ZeRO-3, it appears that each module is expected to be invoked the same number of times across all ranks.
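For example, suppose I have an LLM module A and a task-specific head B. Depending on the output of A, module B may be executed a different number of times on different ranks. For instance, B may be called once on rank 0 but three times on rank 1. In this case, rank 1 hangs on its second invocation of B and waits indefinitely for rank 0. My understanding is that this happens because ZeRO-3 partitions B's parameters across ranks and gathers them with a collective operation before each forward call, so every rank has to take part in every invocation.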
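To make the failure mode concrete, here is a minimal simulation of the scenario above. It is only a sketch and does not use DeepSpeed: each rank is a thread, and a `threading.Barrier` stands in for the collective that I assume ZeRO-3 runs before each call to B. The name `run_ranks` and the timeout are hypothetical and exist only for this illustration. In a real job there is no timeout, so the rank would wait forever.

```python
import threading


def run_ranks(calls_per_rank, timeout=1.0):
    """Simulate ranks that invoke module B a rank-specific number of times.

    Each invocation waits on a barrier, which is a stand-in (an assumption
    of this sketch) for the parameter gather that must be joined by every
    rank. Returns, per rank, the index of the invocation that stalled,
    or None if all of that rank's invocations completed.
    """
    world_size = len(calls_per_rank)
    barrier = threading.Barrier(world_size)
    stalled_at = [None] * world_size

    def worker(rank):
        for call_idx in range(calls_per_rank[rank]):
            try:
                # Every rank must arrive here for anyone to proceed.
                barrier.wait(timeout=timeout)
            except threading.BrokenBarrierError:
                # No peer arrived in time: this is the "hang".
                stalled_at[rank] = call_idx
                return

    threads = [
        threading.Thread(target=worker, args=(rank,))
        for rank in range(world_size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stalled_at


# Rank 0 calls B once, rank 1 calls B three times.
print(run_ranks([1, 3]))  # [None, 1]: rank 1 stalls on its second call
```

With equal call counts, for example `run_ranks([2, 2])`, no rank stalls, which matches the behavior I see when all ranks follow the same execution path.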
This makes it difficult to support dynamic control flow where different ranks may follow different execution paths.
Describe the solution you'd like
It would be helpful if ZeRO-3 could support different data/control flows across ranks, allowing modules to be executed a different number of times depending on the input or model behavior.