How to speed up? I'm trying to run GoogLeNet with Batch normalization, but it is not fast enough #312
Comments
Your best bet is to use the Theano profiler first and see if there's anything that's taking much longer than it should. Just run your script with `THEANO_FLAGS=profile=True`. EDIT: it's also worthwhile to define a helper function that creates an inception module. It reduces code duplication a lot. Also, with the Titan Z, which is a dual-GPU card, Theano will only ever use one of the two (at least for now). I don't know what the multi-GPU situation is with Caffe right now, but maybe that could explain part of the performance difference.
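The inception-module helper suggested above could look something like the sketch below. The `conv`, `pool`, and `concat` functions are hypothetical stand-ins (a real implementation would use Lasagne's layer classes); the point is only to show how one function removes the per-module duplication.

```python
# Hypothetical stand-ins for a real layer API (e.g. Lasagne's Conv2DLayer,
# MaxPool2DLayer, ConcatLayer); here they just record the graph structure.
def conv(incoming, num_filters, filter_size):
    return ("conv", incoming, num_filters, filter_size)

def pool(incoming, pool_size):
    return ("pool", incoming, pool_size)

def concat(branches):
    return ("concat", branches)

def inception_module(incoming, n1x1, n3x3r, n3x3, n5x5r, n5x5, npool):
    """Build one GoogLeNet inception module from a single parameter list."""
    b1 = conv(incoming, n1x1, 1)                      # 1x1 branch
    b2 = conv(conv(incoming, n3x3r, 1), n3x3, 3)      # 1x1 reduce -> 3x3
    b3 = conv(conv(incoming, n5x5r, 1), n5x5, 5)      # 1x1 reduce -> 5x5
    b4 = conv(pool(incoming, 3), npool, 1)            # pool -> 1x1 projection
    return concat([b1, b2, b3, b4])

# The nine inception modules of GoogLeNet then become nine one-line calls:
net = inception_module("input", 64, 96, 128, 16, 32, 32)
```

Each module differs only in its filter counts, so the whole network body reduces to a short table of parameter tuples.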
For GPU profiling, you'll also need `CUDA_LAUNCH_BLOCKING=1`, since kernel launches are asynchronous otherwise and the timings get attributed to the wrong ops.
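Both settings have to be in the environment before Theano is imported, since the flags are read once at import time. A minimal sketch (assuming the CUDA backend is in use):

```python
import os

# Theano reads THEANO_FLAGS once, at import time, so the profiler must be
# enabled *before* `import theano`. profile=True prints a per-op timing
# breakdown when the process exits; profile_memory=True adds memory stats.
os.environ["THEANO_FLAGS"] = "profile=True,profile_memory=True"

# CUDA kernel launches are asynchronous by default; making them blocking
# ensures each kernel's time is attributed to the right op in the profile.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import theano  # import only after the flags above are set
```

Alternatively, set the same variables on the command line: `THEANO_FLAGS=profile=True CUDA_LAUNCH_BLOCKING=1 python train.py`.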
Have you tried just using [...]? In any case, it would be cool to have GoogLeNet in Lasagne/Recipes, so let's try to make it work!
Thank you for the comments!! First of all, I'm sorry, I made a big mistake when measuring the processing time. Currently, GoogLeNet with batch normalisation on my system takes 0.4–0.5 sec per iteration (Intel i7-4820K, Titan Z, 24 GB RAM, Ubuntu 14.04.2 running on an SSD, cuDNN v2 installed). That is slightly faster than GoogLeNet with batch normalisation running on caffe-dev with cuDNN v1. Cool!
@benanne I will try it asap!
The version of Caffe I'm using now is caffe-dev, and it works with only one GPU at a time anyway.
@f0k you're right! I fixed it. Your implementation looks cooler than mine :) I will also try it. I will implement read/write functions for ImageNet and learning rate scheduling, and I will report the results of training and upload the trained weights :)
Can you still do the Theano profile? As you compare with Caffe without the [...]
@nouiz
I've tried to run GoogLeNet on Caffe with cuDNN v2, but there was an issue I couldn't solve: BVLC/caffe#2508
Sorry! This error should not happen if the network architecture is properly designed; it was raised because I had changed a kernel size for my personal experiments. Anyway, GoogLeNet with batch normalisation on caffe-dev (with cuDNN v2) took 0.525 sec per iteration (about 21 sec per 40 iterations). I'm using an Intel i7-4820K, Titan Z, 24 GB RAM, Ubuntu 14.04.2 running on an SSD, and a 1 TB HDD for the ImageNet data.
Thanks for the profile. It confirms that Theano runs relatively OK. It shows that one elemwise op dominates the runtime: `31.0% 31.0% 174.307s 2.03e-04s C 860720 3074`
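For readers unfamiliar with Theano's op profile, the columns in that line are (as far as I can tell) percent of total time, cumulative percent, total seconds, seconds per call, implementation (`C`), number of calls, and number of apply nodes. A quick arithmetic check confirms the columns are self-consistent:

```python
# Sanity-check the quoted profiler line: total time divided by the number
# of calls should reproduce the reported per-call time.
total_seconds = 174.307
num_calls = 860720
per_call = total_seconds / num_calls
assert abs(per_call - 2.03e-04) < 1e-06  # matches the 2.03e-04s column
```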
Hi, I'm sorry, I made a mistake (again) and gave you wrong information about the processing time.
So my implementation is about 3.5 times slower than caffe-dev with cuDNN v2. I updated my repository to include an ImageNet batch iterator, which can read lmdb data, crop images, and get a shuffled batch at each iteration (https://github.com/lim0606/lasagne-googlenet). I'm now running my system, and it seems to be working fine. If the training finishes (properly), I will update the results, including the weights. Thank you for all your assistance.
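Assuming the ~1.8 sec per iteration from the opening post still applied at this point, the reported slowdown checks out against the Caffe timing above:

```python
# Ratio of the reported per-iteration times: ~1.8 s (Theano/Lasagne, from
# the opening post) versus 0.525 s (caffe-dev with cuDNN v2, above).
theano_sec = 1.8
caffe_sec = 0.525
ratio = theano_sec / caffe_sec  # ~3.43, consistent with "about 3.5 times slower"
```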
Both Caffe and Theano use the same primitives (cuBLAS, cuDNN), so if the Theano version really is 3.5x slower we need to get to the bottom of this. If it is a Theano issue we should find a way to fix it in Theano itself, but if it is an issue in your code we should figure out how to ensure that people do not run into it when they write their own code. Either way, we should definitely investigate what's causing this slowness. Where did the 0.4–0.5 sec figure come from then? What was the source of the confusion?
I think the first step is to find where the time is spent. Can you check how long the mini-batch creation takes? This would tell us where more detailed investigation is needed. Also, I suppose you use the Theano dev version; if not, update, as there are speed improvements. I'll not be available this week and very busy the next one, so I can't help much until then.
Thank you for the comments!
@benanne I made a mistake to put [...]
@nouiz The mini-batch creation takes about 0.01 sec per mini-batch. I uploaded an example Python script that tests mini-batch creation for the ImageNet dataset (https://github.com/lim0606/lasagne-googlenet/blob/master/tools/example_batchiterator.py), as well as a log file (https://github.com/lim0606/lasagne-googlenet/blob/master/example_batchiterator_20150705_014348.log). I ran the example with lines 150, 182, 183, 282, 327, and 328 uncommented in https://github.com/lim0606/lasagne-googlenet/blob/master/utils/batchiterator.py. This batch iterator reads an lmdb dataset converted via Caffe.
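For reference, the core of such an iterator (shuffled order, random crops) can be sketched in a few lines of NumPy. This assumes the images are already decoded into a single array; the linked implementation reads them from an lmdb database instead, and the function and parameter names here are illustrative only.

```python
import numpy as np

def iterate_minibatches(images, labels, batch_size, crop=224, rng=None):
    """Yield shuffled mini-batches of randomly cropped images."""
    if rng is None:
        rng = np.random.default_rng(0)
    order = rng.permutation(len(images))          # reshuffle every epoch
    h, w = images.shape[1:3]
    for start in range(0, len(images) - batch_size + 1, batch_size):
        idx = order[start:start + batch_size]
        batch = np.empty((batch_size, crop, crop, 3), dtype=images.dtype)
        for i, j in enumerate(idx):
            top = rng.integers(0, h - crop + 1)   # random crop offsets
            left = rng.integers(0, w - crop + 1)
            batch[i] = images[j, top:top + crop, left:left + crop]
        yield batch, labels[idx]

# Usage on toy data:
imgs = np.zeros((10, 256, 256, 3), dtype=np.uint8)
lbls = np.arange(10)
xb, yb = next(iterate_minibatches(imgs, lbls, batch_size=4))
```

At ~0.01 sec per mini-batch, the iterator clearly isn't the bottleneck here; running it in a background thread would hide that cost entirely.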
I just checked my version, and it is 0.7.0.dev-e6a4c073e97cadde0a8398b9a672dd91120882f8.
Hi everyone, I've trained GoogLeNet with batch normalisation, but with Caffe. I got 71.7% top-1 accuracy and 90.68% top-5 accuracy on the ImageNet validation dataset. The model was trained [...]
The result was evaluated with only a single random crop. I'd like to share the results, including the learning rate scheduling, after I confirm that they are reproducible; I'm re-running the training from the beginning. Anyway, I strongly believe that it is worth training this model with Lasagne (and Theano), but my Lasagne implementation is still not fast enough. Best regards, Jaehyun
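The learning rate scheduling mentioned above isn't spelled out in the thread; a common choice for this kind of training is a step decay, sketched below. The concrete numbers are illustrative only, not the schedule actually used.

```python
# Generic step-decay learning rate schedule: multiply the base rate by
# `gamma` once every `step` iterations. Numbers below are illustrative.
def step_decay(base_lr, gamma, step, iteration):
    return base_lr * gamma ** (iteration // step)

lr0 = step_decay(0.01, 0.96, 320000, 0)        # before the first decay
lr1 = step_decay(0.01, 0.96, 320000, 320000)   # after one decay step
```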
Has anybody succeeded in training a state-of-the-art computer vision model in Theano (Lasagne or otherwise), such as GoogLeNet on ImageNet? I find that Theano scales very poorly to complex models and large datasets. I'd be happy to see examples to the contrary!
Theano's lack of multi-GPU support (for now) is the main thing holding it back in this respect. Also, people who work on convnets for computer vision seem to prefer other libraries (e.g. Caffe); Theano seems to be more popular in other deep learning subfields. As a result, there is indeed very little work on training these large-scale image classification models in Theano directly, but I don't think it's impossible.

The fact is that these models are a lot less demanding in terms of the types of neural network building blocks they consist of, and all the more demanding in terms of raw training performance, so Theano might not be the best fit in that case. That said, it definitely isn't impossible, and nothing stops you from implementing custom optimized ops for certain operations yourself. I believe this is currently being done for batch normalization: Theano/Theano#3410. This type of work is what many other libraries expect you to do all the time anyway; Theano's symbolic paradigm just makes rapid prototyping of new ideas a bit easier.

Besides GoogLeNet, though, I don't really know of any 'complex' models that Theano is less suitable for (OxfordNet shouldn't be a problem). Do you have any more examples?
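The batch normalization transform that the op referenced above (Theano/Theano#3410) accelerates is simple to state; a NumPy sketch of the forward pass, as defined by Ioffe & Szegedy (2015), looks like this. It is only a reference for the math, not the op's implementation.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift.

    x: (batch, features); gamma, beta: learned per-feature parameters.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta

x = np.random.default_rng(0).normal(size=(64, 8))
y = batch_norm_forward(x, gamma=np.ones(8), beta=np.zeros(8))
```

The custom op's value is in fusing these elementwise steps (and the backward pass) into few GPU kernels, rather than the many small kernels a naive symbolic graph produces.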
Closing because it's not really a Lasagne issue. |
Hi, everyone.
I'm trying to run GoogLeNet with batch normalization, but it is not fast enough.
My implementation is here (I still have to write scripts to read/write images and for learning rate scheduling):
https://github.com/lim0606/lasagne-googlenet
I'm testing the running time, and it takes about 1.8 sec per iteration.
(I'm using an Intel i7-4820K, Titan Z, 24 GB RAM, and the OS running on an SSD.)
I used to use Caffe, and I have already tested GoogLeNet with batch normalization on Caffe; it took 0.6 sec per iteration.
Is there anyone who can help me speed up this model? :)
Thank you,
Jaehyun