Torch performance issues #339
cc @soumith as he expressed interest in this. I have pushed a couple of changes to https://github.com/gheinrich/DIGITS/commits/dev/torch-speed which speed up Torch training (figures below are for 20 epochs of training LeNet on MNIST, with 45k training samples, 15k validation samples, and one validation pass per epoch):
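For context, the 45k/15k split mentioned above corresponds to a 75/25 partition of MNIST's 60,000 training images; a minimal sketch of that arithmetic (variable names are illustrative, not from DIGITS):

```python
# Partition MNIST's 60,000 training images into train/validation sets.
# The 45k/15k figures above correspond to a 75/25 split.
TOTAL = 60_000
VAL_FRACTION = 0.25

num_val = int(TOTAL * VAL_FRACTION)   # 15000 validation samples
num_train = TOTAL - num_val           # 45000 training samples

print(num_train, num_val)
```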
Nice. Could you point me to instructions on how to create the ImageNet LMDB? I'm trying to get things up and running, but that's a blocker for me at the moment. Should I use Caffe's ImageNet instructions? https://github.com/BVLC/caffe/tree/master/examples/imagenet
Thanks Soumith! I have submitted PR #64 with decently complete cudnn v4 bindings; please take a look. I will merge it later today if there are no objections. Best, Boris
Correction: I had to resubmit the PR against the right branch (R4).
You need to create the LMDB using DIGITS if you want to use DIGITS to train a model. This page explains how to structure your image folders in a way that DIGITS can understand. There is a section on that page explaining how to create a subset of ImageNet (it takes a while to create the full ImageNet LMDB). This page then explains how to create the LMDB using DIGITS. Let us know if you need any more information. Thanks!
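As a rough illustration of the image-folder layout described above (one sub-folder per class, containing that class's images), here is a hedged sketch; the class names and file names are made up, and this only creates empty placeholder files:

```python
import os
import tempfile

# Hypothetical layout: one directory per label, images inside.
# DIGITS infers the class list from these sub-folder names.
root = tempfile.mkdtemp()
for label in ["cat", "dog"]:
    os.makedirs(os.path.join(root, label), exist_ok=True)
    # In practice each folder would hold real JPEG/PNG files, e.g.:
    open(os.path.join(root, label, "example_0.jpg"), "wb").close()

print(sorted(os.listdir(root)))  # the inferred class names
```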
This is now merged into cudnn.torch/R4 master. Thanks Soumith! Best, Boris
Original (20 epochs LeNet on MNIST): 187s Now: 142s Helps with bug NVIDIA#339
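From the commit's timings, the change cuts the 20-epoch run from 187s to 142s, roughly a 24% reduction in wall time (about a 1.32x speedup):

```python
# Speedup implied by the commit message's timings.
before_s = 187.0
after_s = 142.0

speedup = before_s / after_s                     # ~1.32x
reduction_pct = 100 * (1 - after_s / before_s)   # ~24.1%

print(f"{speedup:.2f}x faster, {reduction_pct:.1f}% less wall time")
```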
I don't think there is a general performance issue with the integration of Torch into DIGITS anymore. Some models train faster with Torch, other models train faster with Caffe. Some numbers below:
(training is slower with multiple GPUs presumably due to the communication overhead)
(Torch slowness mostly due to extra Batch Normalization layers)
Interesting data points, thanks for sharing. Just a note on BatchNorm: the latest nn/cunn have a super-optimized batchnorm (faster than cuDNN R4).
Breaking discussion from #324 (comment) out into a separate issue.