Update standard networks for Caffe #733
Conversation
- Still use Power layer for deploy phase
- Give train/val data layers different names (better visualizations)
- Use stage instead of phase to differentiate (consistent with other layers)
I ran some experiments on the new networks to verify that the batch size changes didn't break anything. OS: Ubuntu 16.04
- Runtime: most are slightly improved 👍
- Memory utilization: nothing runs out of memory 👍

Full data:
That looks good to me. I suppose I'll need to update batch sizes for Torch too.
Hi Luke, you mention a slight performance improvement for change #2. Is that due to power scaling in the data layer being faster than a scale layer? In any case, new users may find it a little confusing to see three power scaling operations rather than just one.
Also, what does the comment "# 1/(standard deviation)" actually mean?
Yes, that's the reason. It's handled in the multi-threaded data loader.
The standard deviation for the MNIST dataset is ~80 per pixel (over a range of [0, 255] per pixel).
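For concreteness, a per-pixel standard deviation of about 80 corresponds to a factor of 1/80 = 0.0125. As an illustrative sketch (the exact value and placement are assumptions, not an excerpt from the PR), the comment would annotate a scale entry of that kind inside the Data layer's transform_param:

```
transform_param {
  # 1/(standard deviation): MNIST pixels span [0, 255] with a std of ~80,
  # so scaling by 1/80 = 0.0125 brings inputs to roughly unit variance.
  scale: 0.0125
}
```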
Thanks for the feedback, Luke. Seeing that training on the MNIST dataset is not very computationally expensive, I would still suggest the simpler network definition with a single scale layer, which is much clearer for new users. Would it be helpful to change the comment?
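A minimal sketch of the simpler definition being suggested here, assuming a single Power layer with no include rule so that it applies in every phase (the layer names and scale value are illustrative):

```
# One scaling layer shared by train, val and deploy: with no include rule,
# the layer is instantiated in every network variant.
layer {
  name: "scale"
  type: "Power"
  bottom: "data"
  top: "scaled"
  power_param {
    scale: 0.0125  # 1/(standard deviation)
  }
}
```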
I'm comfortable with those changes, yes. Would you like to make a PR for it?
That's great, thank you. Just submitted #976.
Update standard networks for Caffe
This pull request makes 3 updates to the Caffe standard networks:

1. Give the train/val Data layers in each network different names, and use stage for include rules instead of phase
2. Use the Data layer for input scaling during train/val, but still use a Power layer during deploy
3. Update the batch sizes

The first change is purely cosmetic. The second may yield a slight but negligible performance improvement. I made the third change because cuDNN typically prefers batch sizes that are even powers of two.
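A minimal sketch of what changes 1 and 2 could look like in a network definition; the layer names, stage names, database paths, batch sizes, and scale value are illustrative assumptions rather than text copied from the PR:

```
# Change 1: separately named Data layers, selected with stage instead of phase.
# Change 2: input scaling done by the Data layer's transform_param during train/val.
layer {
  name: "train-data"
  type: "Data"
  top: "data"
  top: "label"
  transform_param { scale: 0.0125 }   # 1/(standard deviation)
  data_param { source: "train_db" batch_size: 64 backend: LMDB }
  include { stage: "train" }
}
layer {
  name: "val-data"
  type: "Data"
  top: "data"
  top: "label"
  transform_param { scale: 0.0125 }   # 1/(standard deviation)
  data_param { source: "val_db" batch_size: 32 backend: LMDB }
  include { stage: "val" }
}
# Change 2 (deploy): there is no Data layer at deploy time, so the same scaling
# is applied with a Power layer instead.
layer {
  name: "scale"
  type: "Power"
  bottom: "data"
  top: "scaled"
  power_param { scale: 0.0125 }
  include { stage: "deploy" }
}
```

At deploy time the network receives its input directly rather than through a Data layer, which is why a Power layer is still needed to apply the same scaling there.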