
Torch Multi-GPU support #480

Merged: 1 commit into NVIDIA:master on Jan 5, 2016

Conversation

@gheinrich (Contributor)

Close #138

This commit enables data parallelism, i.e. train
batches are evenly spread across selected GPUs.

Strong scaling is applied by default, i.e. batch
size is unchanged and each GPU receives its share
during training.

For better performance, installation of the NCCL
module is recommended.

On a 2xTitanX machine (with NCCL):

- GoogleNet: 51m4s (1 GPU) -> 34m3s (2 GPUs)
- Alexnet: 12m31s (1 GPU) -> 10m38s (2 GPUs)
- LeNet*: 53s (1 GPU) -> 1m30s (2 GPUs)

* the overhead of inter-GPU communication seems to
  outweigh the compute gain on LeNet
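
For illustration, here is a minimal Lua sketch of this kind of data-parallel setup in Torch using `nn.DataParallelTable` from cunn. The GPU list, the stand-in two-layer network, and the `nccl` module check are assumptions for the sketch, not the exact code in this PR:

```lua
require 'cunn'

-- Sketch only: split each training batch along dimension 1 (the batch
-- dimension) across the selected GPUs, so the overall batch size stays
-- unchanged and each GPU processes its share (strong scaling).
local gpus = {1, 2}                      -- selected GPU IDs (assumed: a 2-GPU machine)
local useNCCL = pcall(require, 'nccl')   -- use the NCCL bindings if installed (assumed module name)

cutorch.setDevice(gpus[1])

-- Stand-in network; DIGITS would build the user-selected model here.
local net = nn.Sequential()
   :add(nn.Linear(784, 10))
   :add(nn.LogSoftMax())
   :cuda()

-- DataParallelTable(dim, flattenParams, useNCCL)
local model = nn.DataParallelTable(1, true, useNCCL)
model:add(net, gpus)                     -- replicate the network on each listed GPU

-- 'model' is then trained like a single-GPU module: forward/backward scatter
-- the batch across the replicas and reduce gradients (via NCCL when enabled).
local input = torch.CudaTensor(128, 784):normal()
local output = model:forward(input)      -- 64 samples go to each GPU
```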

@lukeyeager (Member)

Nice work! Everything seems to be working for me. I'm trying out the NCCL integration now...

One thing I've noticed is that we should update the Torch warning box for creating new classification/generic models.

On the documentation diff:

```
% make CUDA_HOME=/usr/local/cuda test
```

> NOTE: if the above command fails due to missing libraries you may explicitely point the makefile to the location of your NVidia driver. For example:

@lukeyeager (Member)

Typo - should be "explicitly"

@gheinrich (Contributor, Author)

Thanks, I always make this spelling mistake!

@lukeyeager (Member)

This is working for me both with and without NCCL. Looks good except for the little doc nitpicks above.

@lukeyeager added a commit that referenced this pull request on Jan 5, 2016
@lukeyeager merged commit 8d4ee4b into NVIDIA:master on Jan 5, 2016
@gheinrich deleted the dev/torch-multi-gpu branch on April 14, 2016 at 13:25