Closed
Labels: bug (Something isn't working), help wanted (Open to be worked on)
🐛 Bug
When running with `DDPPlugin`, the DataLoaders are not configured with a `DistributedSampler`.
To Reproduce
Run the script below in an `LSFEnvironment`. Regardless of the number of GPUs specified (i.e. the first argument), the number of batches is always 3750. After some digging, it looks like `auto_add_sampler` is not setting the sampler correctly.
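The original snippet was not preserved here, so the following is a reconstruction rather than the reporter's script: a minimal sketch assuming a 60,000-sample random dataset with batch size 16 (60000 / 16 = 3750, matching the reported batch count), the Lightning 1.2-era `Trainer`/`DDPPlugin` API, and the GPU count passed as the first command-line argument. The `LSFEnvironment` cluster plugin mentioned above is the reporter's own and is omitted.

```python
import sys

import torch
import pytorch_lightning as pl
from pytorch_lightning.plugins import DDPPlugin  # import path varies across PL versions
from torch.utils.data import DataLoader, TensorDataset


class BoringModel(pl.LightningModule):
    """Hypothetical stand-in for the reporter's model."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        (x,) = batch
        return self.layer(x).sum()

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":
    # 60,000 samples at batch size 16 -> 3750 batches per epoch when the
    # dataset is NOT sharded across ranks by a DistributedSampler.
    dataset = TensorDataset(torch.randn(60000, 32))
    loader = DataLoader(dataset, batch_size=16)

    trainer = pl.Trainer(
        gpus=int(sys.argv[1]),  # first argument: number of GPUs
        accelerator="ddp",
        plugins=[DDPPlugin()],
        max_epochs=1,
    )
    trainer.fit(BoringModel(), loader)

    # Expected: 3750 / num_gpus once the sampler is injected.
    # Observed: always 3750, on 1 or 6 GPUs alike.
    print(trainer.num_training_batches)
```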
Expected behavior
When running with DDP training, the sampler for the DataLoaders should be set to a `DistributedSampler`.
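For reference, the manual equivalent of that expected wrapping is sketched below; the helper name is hypothetical and this is not Lightning's internal `auto_add_sampler` code, just the effect it should have on each training DataLoader.

```python
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.distributed import DistributedSampler


def wrap_with_distributed_sampler(dataset: Dataset, batch_size: int,
                                  world_size: int, global_rank: int) -> DataLoader:
    """Rebuild a DataLoader so each rank sees len(dataset) / world_size samples."""
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=global_rank)
    # shuffle must stay unset when an explicit sampler is given;
    # DistributedSampler shuffles internally (re-seeded via set_epoch).
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```

With 6 GPUs and the 60,000-sample loader from the sketch above, each rank would then see 625 batches per epoch (60000 / 6 / 16) instead of 3750.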
Environment
- PyTorch Version (e.g., 1.0): 1.7
- OS (e.g., Linux): RedHat
- How you installed PyTorch (`conda`, `pip`, source): source
- Build command you used (if compiling from source): `python setup.py develop`
- Python version: 3.8
- CUDA/cuDNN version:
- GPU models and configuration: 1 or 6 GPUs
- Any other relevant information:
Additional context