Closed
Labels
bug, good first issue
Description
DeepSpeed's data loader will use DistributedSampler by default unless another is provided:
If DeepSpeed is configured with model parallelism, or is called from a library that uses only a sub-group of the world processes, the default behavior of DistributedSampler is incorrect: it queries the global world size and rank rather than those of the data-parallel group. We should pass num_replicas and rank explicitly when creating the sampler.
If mpu is provided to deepspeed.initialize(), we should query mpu.get_data_parallel_world_size() and mpu.get_data_parallel_rank() and forward that information to the sampler.
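A minimal sketch of the proposed fix, not DeepSpeed's actual code. FakeMPU, RangeDataset, and build_sampler are hypothetical names used only for illustration; the only real API calls are DistributedSampler and the two mpu methods named above. It assumes the mpu object exposes get_data_parallel_world_size() and get_data_parallel_rank() as described.

```python
from torch.utils.data import Dataset
from torch.utils.data.distributed import DistributedSampler


class FakeMPU:
    """Hypothetical stand-in for the mpu object passed to deepspeed.initialize()."""

    def __init__(self, dp_world_size, dp_rank):
        self._dp_world_size = dp_world_size
        self._dp_rank = dp_rank

    def get_data_parallel_world_size(self):
        return self._dp_world_size

    def get_data_parallel_rank(self):
        return self._dp_rank


class RangeDataset(Dataset):
    """Toy dataset whose i-th sample is the integer i."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return idx


def build_sampler(dataset, mpu=None, shuffle=False):
    """Hypothetical helper: shard by data-parallel rank when an mpu is given."""
    if mpu is None:
        # Falls back to the global world size and rank, which requires an
        # initialized torch.distributed process group.
        return DistributedSampler(dataset, shuffle=shuffle)
    # Explicit num_replicas/rank: no global world size or rank is queried.
    return DistributedSampler(
        dataset,
        num_replicas=mpu.get_data_parallel_world_size(),
        rank=mpu.get_data_parallel_rank(),
        shuffle=shuffle,
    )


if __name__ == "__main__":
    # Data-parallel group of size 2; this process is data-parallel rank 1.
    sampler = build_sampler(RangeDataset(8), FakeMPU(dp_world_size=2, dp_rank=1))
    print(list(sampler))  # [1, 3, 5, 7]
```

Because num_replicas and rank are passed explicitly, the sampler shards the dataset across the data-parallel group only, so all processes that share a data-parallel rank draw the same indices.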