
Sync Batchnorm #24

Closed
dougsouza opened this issue May 16, 2019 · 1 comment
Comments
@dougsouza

The PyTorch team recently released an official SyncBatchNorm implementation (torch.nn.SyncBatchNorm). It requires a specific setup: we use torch.nn.parallel.DistributedDataParallel(...) instead of nn.DataParallel(...) and launch a separate process for each GPU.

I wrote a small step-by-step here: https://github.com/dougsouza/pytorch-sync-batchnorm-example.

In my experiments SyncBatchNorm worked well. Also, using torch.nn.parallel.DistributedDataParallel(...) with one process per GPU provides a huge speedup in training: the gain from adding more GPUs is almost linear, and it runs a lot faster than nn.DataParallel(...). I believe you could reduce training time drastically by switching to torch.nn.parallel.DistributedDataParallel(...).
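The one-process-per-GPU setup described above can be sketched roughly as follows. This is a minimal illustration, not the linked step-by-step guide: it assumes a single node launched with `torchrun --nproc_per_node=NUM_GPUS train.py`, the NCCL backend, and a placeholder `MyModel`; the real calls used are `torch.nn.SyncBatchNorm.convert_sync_batchnorm` and `torch.nn.parallel.DistributedDataParallel`.

```python
# Sketch only: assumes launch via `torchrun --nproc_per_node=NUM_GPUS train.py`
# on a single multi-GPU node. `MyModel` is a hypothetical stand-in for the
# actual network (e.g. BigGAN's generator/discriminator).
import os

import torch
import torch.distributed as dist
import torch.nn as nn


def main():
    # torchrun sets LOCAL_RANK for each spawned process (one per GPU).
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    model = MyModel().cuda(local_rank)

    # Replace every BatchNorm*d module with SyncBatchNorm, so batch
    # statistics are reduced across all processes on each forward pass.
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

    # Wrap in DDP; gradients are all-reduced across processes in backward().
    model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])

    # Each process must see a distinct shard of the data, e.g. via
    # torch.utils.data.distributed.DistributedSampler(dataset).
    # ... training loop ...

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Because each process owns exactly one GPU, there is no Python-side scatter/gather bottleneck as with nn.DataParallel, which is where the near-linear scaling comes from.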

BTW, thanks for this implementation!

@ajbrock
Owner

ajbrock commented May 16, 2019

If you're interested in testing and adding it to this codebase, please feel free to make a PR.

@ajbrock ajbrock closed this as completed May 16, 2019