
DistributedSampler not used when using DDPPlugin #7813

@ajtritt

Description


🐛 Bug

When running with DDPPlugin, the DataLoaders are not configured with a DistributedSampler.

To Reproduce

Run this code in an LSFEnvironment (see the sketch below). Regardless of the number of GPUs specified (i.e., the first argument), the number of batches is always 3750. After some digging, it looks like auto_add_sampler is not setting the sampler correctly.
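
The original script was not included, so the following is only a minimal sketch of what a reproduction along these lines might look like. The BoringModel, the 60,000-sample random dataset, and the batch size of 16 (60,000 / 16 = 3750 batches) are assumptions, as is the import path for LSFEnvironment (it ships with newer Lightning releases); ~1.3-era Trainer arguments are used:

```python
import sys

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

import pytorch_lightning as pl
from pytorch_lightning.plugins import DDPPlugin
from pytorch_lightning.plugins.environments import LSFEnvironment


class BoringModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)


if __name__ == "__main__":
    num_gpus = int(sys.argv[1])  # first argument: number of GPUs

    # Assumed sizes: 60,000 samples / batch size 16 = 3750 batches per epoch
    dataset = TensorDataset(torch.randn(60000, 32), torch.randn(60000, 1))
    loader = DataLoader(dataset, batch_size=16)

    trainer = pl.Trainer(
        gpus=num_gpus,
        accelerator="ddp",
        plugins=[DDPPlugin(), LSFEnvironment()],
        max_epochs=1,
    )
    trainer.fit(BoringModel(), loader)
    # Expected: each rank sees 3750 / num_gpus batches.
    # Observed: every rank reports 3750 batches, i.e. the default sampler
    # is still in place and no DistributedSampler was attached.
```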

Expected behavior

When training with DDP, the sampler for the DataLoaders should be set to a DistributedSampler.
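
For reference, here is a sketch of what auto_add_sampler is expected to do under DDP, in effect. The helper name add_distributed_sampler is hypothetical, and world_size/rank stand in for values the trainer would derive from the cluster environment:

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler


def add_distributed_sampler(loader: DataLoader, world_size: int, rank: int) -> DataLoader:
    # Shard the dataset across ranks: this replaces the loader's default
    # SequentialSampler/RandomSampler with a DistributedSampler.
    sampler = DistributedSampler(loader.dataset, num_replicas=world_size, rank=rank)
    return DataLoader(
        loader.dataset,
        batch_size=loader.batch_size,
        sampler=sampler,
        num_workers=loader.num_workers,
    )
```

With a sampler like this in place, each rank iterates over roughly len(dataset) / world_size samples, so the per-rank batch count should drop from 3750 to 3750 / world_size instead of staying at 3750.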

Environment

  • PyTorch Version: 1.7
  • OS: RedHat
  • How you installed PyTorch: source
  • Build command you used: python setup.py develop
  • Python version: 3.8
  • GPU models and configuration: 1 or 6 GPUs



Labels

bug (Something isn't working), help wanted (Open to be worked on)
