The PyTorch team recently released an official SyncBatchNorm implementation. It requires a specific setup: use `torch.nn.parallel.DistributedDataParallel(...)` instead of `nn.DataParallel(...)` and launch a separate process for each GPU. I wrote a small step-by-step guide here: https://github.com/dougsouza/pytorch-sync-batchnorm-example.
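For reference, a minimal sketch of that setup might look like the following (assumptions: the NCCL backend, one process per GPU launched with `torchrun`, and a toy `nn.Sequential` model standing in for the real network; see the linked repo for the full walkthrough):

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets LOCAL_RANK for each spawned process (one process per GPU)
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # Toy model with a BatchNorm layer, standing in for your real network
    model = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.BatchNorm2d(8),
        nn.ReLU(),
    ).cuda(local_rank)

    # Replace every BatchNorm layer with SyncBatchNorm so batch statistics
    # are synchronized across all processes/GPUs
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

    # Wrap in DistributedDataParallel: one process drives one GPU
    model = DDP(model, device_ids=[local_rank])

    # ... build a DataLoader with a DistributedSampler and train as usual ...


if __name__ == "__main__":
    main()
```

You would then launch one process per GPU, e.g. `torchrun --nproc_per_node=NUM_GPUS train.py` (older PyTorch versions use `python -m torch.distributed.launch` instead, which by default passes the local rank as a `--local_rank` argument rather than the `LOCAL_RANK` environment variable).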
In my experiments SyncBatchNorm worked well. Using `torch.nn.parallel.DistributedDataParallel(...)` with one process per GPU also provides a huge speed-up in training: the gain from adding more GPUs is almost linear, and it runs a lot faster than `nn.DataParallel(...)`. I believe you could reduce training time drastically by switching to `torch.nn.parallel.DistributedDataParallel(...)`.

BTW, thanks for this implementation!