Scale and Bias Layers #3591
@jeffdonahue LGTM.

Thanks @ducha-aiki and @jeffdonahue for the scale + bias layers!
shelhamer added a commit that referenced this pull request on Jan 27, 2016: dc831aa

shelhamer merged commit dc831aa into BVLC:master on Jan 27, 2016
1 check passed
siddharthm83 commented on Jan 27, 2016:

cool!
jeffdonahue referenced this pull request on Jan 27, 2016: Added layer for learnable eltwise y=kx+b #2996 (Closed)
jeffdonahue deleted the jeffdonahue:scale-bias-layer branch on Jan 27, 2016

@shelhamer nice!
lfrdm commented on Jan 29, 2016:

Hi guys, thanks a lot for the great work on the batch normalization layer. To make sure I implement it in my train_val.prototxt as the paper describes: should one first compute the normalized batch with a layer of type BatchNorm (after my ReLUs) and then use a Scale layer with scale_param { bias_term: true } together with a Bias layer to learn the scale and bias of the normalized batch?
jeffdonahue replied:

Batch norm is, in the original paper and in typical use, placed before the activation (ReLU or otherwise), not after. You should use ScaleLayer with scale_param { bias_term: true }; with bias_term set, ScaleLayer learns both the scale and the bias itself, so a separate BiasLayer is unnecessary.
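As a concrete sketch of that ordering, a typical BatchNorm + Scale + ReLU stack in a train_val.prototxt looks like the following (layer and blob names here are illustrative, not taken from the PR):

```
layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "conv1/scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  # bias_term folds the learnable bias into this layer,
  # so no separate Bias layer is needed.
  scale_param { bias_term: true }
}
layer {
  name: "conv1/relu"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
```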
lfrdm commented on Jan 29, 2016:

Thanks @jeffdonahue for the answer and explanation. I'm curious, though: how is the normalization handled at test time? The paper's introduction refers to using the normalization only on the training batches. Do I have to set use_global_stats differently for the training and testing phases, or is that handled internally by batch_norm_layer?
cuihenggang commented on Apr 1, 2016:

Does anyone have the new train_val file for the Inception-BN network with the Scale and Bias layers added (for the ILSVRC12 dataset)?
jeffdonahue commented on Jan 23, 2016:

This PR combines @ducha-aiki's ChannelwiseAffineLayer (#2996) with my Scalar (#3021) and Bias (#3550) for appropriate credit, and should have all the advantages of each. (After some discussion we decided to name the scaling part Scale for simplicity.) ScaleLayer alone can now replace ChannelwiseAffineLayer by setting scale_param { bias_term: true }, with a combined GPU kernel that both scales and adds to the input; this should give the performance advantage that @ducha-aiki measured as part of the discussion in #3229, while still allowing for the modularity of separating the two when desired. Both ScaleLayer and BiasLayer can take a single bottom to learn the scale/bias as a parameter, or two bottoms so the scale/bias can be taken as an input*. The dimensions of the scale/bias blob may be any subsequence of the dimensions of the first bottom. The operation can be thought of as a (virtual) reshaping + tiling to the shape of the first blob, followed by element-wise multiplication/addition. The same operations could alternatively be performed by composing Reshape, Tile, and Eltwise layers, but in any case except where EltwiseLayer alone suffices, that would be less efficient in terms of memory and performance, often substantially so.

@ducha-aiki, hopefully this is the best of both worlds in terms of performance, generality, and modularity -- let me know if you have any feedback though. Otherwise we will try to get this reviewed and merged soon.

*I'm happy to see this excess logic simplified/removed if/when @longjon's param_bottom and ParameterLayer work is merged, but for now this is the best way I could think of to address many different use cases for the layers.
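The reshape-plus-tile semantics described above can be sketched in NumPy (a hand-rolled illustration of the broadcasting behavior, not the Caffe implementation; the function name, the axis parameter, and the shapes are assumptions for the example):

```python
import numpy as np

def scale_bias(bottom, scale, bias, axis=1):
    """Apply y = scale * x + bias, where the dims of `scale`/`bias`
    are a contiguous subsequence of `bottom`'s dims starting at `axis`.
    The scale/bias are (virtually) reshaped and tiled to `bottom`'s shape."""
    # Pad the scale/bias shape with singleton dims so NumPy broadcasting
    # tiles them across the leading and trailing axes of `bottom`.
    shape = (1,) * axis + scale.shape + (1,) * (bottom.ndim - axis - scale.ndim)
    return scale.reshape(shape) * bottom + bias.reshape(shape)

# Per-channel scale and bias (y = kx + b) on an N x C x H x W blob:
x = np.ones((2, 3, 4, 4))
k = np.array([1.0, 2.0, 3.0])   # one scale per channel
b = np.array([0.5, 0.0, -0.5])  # one bias per channel
y = scale_bias(x, k, b, axis=1)
```

This is exactly the Reshape + Tile + Eltwise composition mentioned in the description, collapsed into a single broadcasted expression.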