Torch error when testing a model trained with multiple GPUs #736

Closed
lukeyeager opened this issue May 13, 2016 · 3 comments

@lukeyeager
Member

lukeyeager commented May 13, 2016

digits.inference.errors.InferenceError:
  torch classify one task failed with error message -
  ...6-05-12/install/share/lua/5.1/cunn/DataParallelTable.lua:374:
  Model was serialized on 2 nGPUs, but you are running on 1 please set 
  DataParallelTable.deserializeNGPUs to ignore  serialized tower-GPU assignments

@gheinrich Would #732 fix this?

gheinrich added a commit to gheinrich/DIGITS that referenced this issue May 17, 2016
Datapoints:

MNIST+LeNet (30 epochs)
1 GPU: 56s
2 GPUs: 2m51s
(not unexpected due to communication overhead)

Upscaled CIFAR + Alexnet (10 epochs):
1 GPU: 13m11s
2 GPUs: 13m7s

Upscaled CIFAR + Googlenet (2 epochs):
1 GPU: 16m20s
2 GPUs: 11m13s

Fix NVIDIA#736
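
For reference, the timings in the commit message above can be turned into speedup ratios. This is a minimal Python sketch, not part of DIGITS: the helper names `to_seconds` and `speedup` are made up for illustration, and the only data used are the durations quoted above.

```python
def to_seconds(duration: str) -> int:
    """Convert a duration such as '2m51s' or '56s' to whole seconds."""
    # rpartition returns ('', '', duration) when there is no minutes part.
    minutes, _, rest = duration.rpartition("m")
    seconds = rest.rstrip("s")
    return int(minutes or 0) * 60 + int(seconds or 0)


def speedup(one_gpu: str, two_gpus: str) -> float:
    """Ratio of 1-GPU time to 2-GPU time; values below 1.0 mean a slowdown."""
    return to_seconds(one_gpu) / to_seconds(two_gpus)


# (1 GPU, 2 GPUs) timings as quoted in the commit message.
timings = {
    "MNIST+LeNet (30 epochs)": ("56s", "2m51s"),
    "Upscaled CIFAR + Alexnet (10 epochs)": ("13m11s", "13m7s"),
    "Upscaled CIFAR + Googlenet (2 epochs)": ("16m20s", "11m13s"),
}

for name, (one, two) in timings.items():
    print(f"{name}: {speedup(one, two):.2f}x")
# MNIST+LeNet (30 epochs): 0.33x
# Upscaled CIFAR + Alexnet (10 epochs): 1.01x
# Upscaled CIFAR + Googlenet (2 epochs): 1.46x
```

Read this way, the small LeNet model runs about three times slower on two GPUs, Alexnet roughly breaks even, and only Googlenet shows a clear gain, which matches the note about communication overhead.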
@gheinrich
Contributor

Thanks for the bug report. I have updated the commit on #734 to fix this: with the new programming model, we also need to set the number of GPUs when deserializing a model for inference or fine-tuning.

@lukeyeager
Member Author

^ I think you meant #732?

@gheinrich
Contributor

> I think you meant #732?

Whoops. Indeed!

SlipknotTN pushed a commit to cynnyx/DIGITS that referenced this issue Mar 30, 2017