Update standard networks for Caffe #733
Conversation
- Still use Power layer for deploy phase
- Give train/val data layers different names (better visualizations)
- Use stage instead of phase to differentiate (consistent with other layers)
I ran some experiments on the new networks to verify that the batch size changes didn't break anything. OS: Ubuntu 16.04
- Runtime: most are slightly improved 👍
- Memory utilization: nothing runs out of memory 👍

Full data:
That looks good to me. I suppose I'll need to update batch sizes for Torch too.
Hi Luke, you mention a slight performance improvement for change #2. Is that due to power scaling in the data layer being faster than a scale layer? In any case, new users may find it a little confusing to see three power scaling operations rather than just one.
Also, what does the comment "# 1/(standard deviation)" actually mean?
Yes, that's the reason. It's handled in the multi-threaded data loader.
The standard deviation for the MNIST dataset is ~80 per pixel (over a range of [0, 255] per pixel).
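For concreteness, a per-pixel standard deviation of about 80 corresponds to a factor of 1/80 = 0.0125. As an illustrative sketch (the exact value and placement are assumptions, not an excerpt from the PR), the comment would annotate a scale entry of that kind inside the Data layer's transform_param:

```
transform_param {
  # 1/(standard deviation): MNIST pixels span [0, 255] with a std of ~80,
  # so scaling by 1/80 = 0.0125 brings inputs to roughly unit variance.
  scale: 0.0125
}
```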
Thanks for the feedback, Luke. Seeing that training on the MNIST dataset is not very computationally expensive, I would still suggest the simpler network definition with a single scale layer, which is much clearer for new users. Would it be helpful to change the comment?
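A minimal sketch of the simpler definition being suggested here, assuming a single Power layer with no include rule so that it applies in every phase (the layer names and scale value are illustrative):

```
# One scaling layer shared by train, val and deploy: with no include rule,
# the layer is instantiated in every network variant.
layer {
  name: "scale"
  type: "Power"
  bottom: "data"
  top: "scaled"
  power_param {
    scale: 0.0125  # 1/(standard deviation)
  }
}
```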
I'm comfortable with those changes, yes. Would you like to make a PR for it?
That's great, thank you. Just submitted #976.
Update standard networks for Caffe
This pull request makes 3 updates to the Caffe standard networks:

1. Give the train/val Data layers in each network different names, and use stage for include rules instead of phase
2. Use the Data layer for input scaling during train/val, but still use a Power layer during deploy
3. Update the batch sizes

The first change is purely cosmetic. The second may yield a slight but negligible performance improvement. I made the third change because cuDNN typically prefers batch sizes that are even powers of two.
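A minimal sketch of what changes 1 and 2 could look like in a network definition; the layer names, stage names, database paths, batch sizes, and scale value are illustrative assumptions rather than text copied from the PR:

```
# Change 1: separately named Data layers, selected with stage instead of phase.
# Change 2: input scaling done by the Data layer's transform_param during train/val.
layer {
  name: "train-data"
  type: "Data"
  top: "data"
  top: "label"
  transform_param { scale: 0.0125 }   # 1/(standard deviation)
  data_param { source: "train_db" batch_size: 64 backend: LMDB }
  include { stage: "train" }
}
layer {
  name: "val-data"
  type: "Data"
  top: "data"
  top: "label"
  transform_param { scale: 0.0125 }   # 1/(standard deviation)
  data_param { source: "val_db" batch_size: 32 backend: LMDB }
  include { stage: "val" }
}
# Change 2 (deploy): there is no Data layer at deploy time, so the same scaling
# is applied with a Power layer instead.
layer {
  name: "scale"
  type: "Power"
  bottom: "data"
  top: "scaled"
  power_param { scale: 0.0125 }
  include { stage: "deploy" }
}
```

At deploy time the network receives its input directly rather than through a Data layer, which is why a Power layer is still needed to apply the same scaling there.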