
Update standard networks for Caffe #733

Merged
lukeyeager merged 3 commits into NVIDIA:master from update-standard-networks on May 16, 2016

Conversation

lukeyeager
Member

This pull request makes 3 updates to the Caffe standard networks:

  1. Give unique names to the train/val Data layers in each network, and use stage for include rules instead of phase
  2. Change LeNet to use the Data layer for input scaling during train/val, but still use a Power layer during deploy.
  3. Update the batch sizes
    • All are now powers of two
    • AlexNet and GoogLeNet use the training batch sizes that were used in their original papers

The first change is purely cosmetic. The second may give a slight, though negligible, performance improvement. I made the third change because cuDNN typically prefers batch sizes that are powers of two.
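For reference, here is roughly what the first two changes look like for one of LeNet's data layers (a sketch; the names, batch size, and scale value are the ones from the snippet quoted later in this conversation, and the deploy-time Power layer stays as before):

layer {
  # Change 1: a unique name per data layer, selected by stage instead of phase
  name: "train-data"
  type: "Data"
  top: "data"
  top: "label"
  # Change 2: input scaling moves into the data layer for train/val
  transform_param { scale: 0.0125 }  # 1/(standard deviation)
  include { stage: "train" }
  data_param { batch_size: 64 }
}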

Commits:
  1. Still use Power layer for deploy phase
  2. Give train/val data layers different names
    • Better visualizations
  3. Use stage instead of phase to differentiate
    • Consistent with other layers
@lukeyeager
Member Author

I ran some experiments on the new networks to verify that the batch size changes didn't break anything.

OS: Ubuntu 16.04
NVcaffe: 0.14.5
cuDNN: 4.0.7
GPU0: GTX980 - Maxwell, 4GB
GPU1: K40 - Kepler, 12GB

Runtime (most are slightly improved - 👍)

| Network (train/val batch) | 980+cuDNN | 980 | K40+cuDNN | K40 |
| --- | --- | --- | --- | --- |
| Old AlexNet (100/100) | 1m06s | 1m56s | 1m13s | 2m32s |
| New AlexNet (128/32) | 1m10s | 1m51s | 1m12s | 2m30s |
| Old GoogLeNet (24/24) | 2m32s | 6m53s | 4m57s | 8m37s |
| New GoogLeNet (32/16) | 2m24s | 6m50s | 4m25s | 8m35s |

Memory utilization in MB (nothing runs out of memory - 👍)

| Network (train/val batch) | 980+cuDNN | 980 | K40+cuDNN | K40 |
| --- | --- | --- | --- | --- |
| Old AlexNet (100/100) | 3572 | 3219 | 3809 | 3214 |
| New AlexNet (128/32) | 3559 | 2991 | 3699 | 2985 |
| Old GoogLeNet (24/24) | 2974 | 3056 | 3097 | 3050 |
| New GoogLeNet (32/16) | 3542 | 3387 | 4682 | 3381 |

Full data:
https://gist.github.com/lukeyeager/4ddd9e1388f8bd70d337b2c80dd0a035

@gheinrich
Contributor

That looks good to me. I suppose I'll need to update batch sizes for Torch too.

@lukeyeager lukeyeager merged commit 9d9ff58 into NVIDIA:master May 16, 2016
@lukeyeager lukeyeager deleted the update-standard-networks branch May 16, 2016 17:13
@mpbrigham
Contributor

Hi Luke,

You mention a slight performance improvement for change #2. Is that due to power scaling in the data layer being faster than in the scale layer? In any case, new users may find it a little confusing to see three power scaling operations rather than just one:

layer {
  name: "train-data"
  type: "Data"
  top: "data"
  top: "label"
  transform_param { scale: 0.0125 }  # 1/(standard deviation)
  include { stage: "train" }
  data_param { batch_size: 64 }
}
layer {
  name: "val-data"
  type: "Data"
  top: "data"
  top: "label"
  transform_param { scale: 0.0125 }  # 1/(standard deviation)
  include { stage: "val" }
  data_param { batch_size: 64 }
}
layer {
  name: "scale"
  type: "Power"
  bottom: "data"
  top: "scaled"
  power_param { scale: 0.0125 }
}

Also, what does the comment "# 1/(standard deviation)" actually mean?

@lukeyeager
Member Author

Is that due to power scaling in the data layer being faster than in the scale layer?

Yes, that's the reason. It's handled in the multi-threaded data loader.

Also, what does the comment "# 1/(standard deviation)" actually mean?

The standard deviation for the MNIST dataset is ~80 per pixel (out of a [0-255] range per pixel), so the scale factor is 1/80 ≈ 0.0125.

@mpbrigham
Contributor

Thanks for the feedback, Luke. Since training on the MNIST dataset is not very computationally expensive, I would still suggest the simpler network definition with a single scale layer; it is much clearer for new users.

Would it be helpful to change the comment # 1/(standard deviation) to # 1/(standard deviation on MNIST dataset)?
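Concretely, the simpler definition I have in mind would drop the transform_param scaling from the two Data layers and keep a single Power layer as the only scaling operation, something like this sketch:

layer {
  name: "scale"
  type: "Power"
  bottom: "data"
  top: "scaled"
  power_param { scale: 0.0125 }  # 1/(standard deviation on MNIST dataset)
}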

@lukeyeager
Member Author

I'm comfortable with those changes, yes. Would you like to make a PR for it?

@mpbrigham
Contributor

That's great, thank you. Just submitted #976 .

SlipknotTN pushed a commit to cynnyx/DIGITS that referenced this pull request Mar 30, 2017