MSRA weight filler #1946


This PR adds MSRAFiller, which implements an Xavier-like filler designed for use with ReLUs instead of tanh, based on the paper: He et al, "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," 2015.

It also adds a VarianceNorm option to FillerParameters which allows one to normalize by fan_in, fan_out or their average. VarianceNorm applies to the MSRAFiller and the XavierFiller (default behavior unchanged). It also adds tests for MSRAFiller and XavierFiller.
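The scheme described above can be sketched in a few lines. This is a minimal Python illustration of the math (not Caffe's C++ implementation): for a blob of shape (num, a, b, c), fan_in = a * b * c and fan_out = num * b * c, the normalizer n is chosen by the variance_norm option, and weights are drawn from a Gaussian with std = sqrt(2 / n) per He et al.

```python
import math
import random

def msra_fill(shape, variance_norm="FAN_IN"):
    """Sketch of MSRA ("He") initialization for a 4-D blob
    shaped (num, a, b, c). Returns (values, std)."""
    num, a, b, c = shape
    fan_in = a * b * c        # inputs feeding each output unit
    fan_out = num * b * c     # outputs fed by each input unit
    n = {"FAN_IN": fan_in,
         "FAN_OUT": fan_out,
         "AVERAGE": (fan_in + fan_out) / 2.0}[variance_norm]
    std = math.sqrt(2.0 / n)  # keeps activation variance stable through ReLUs
    values = [random.gauss(0.0, std) for _ in range(num * a * b * c)]
    return values, std
```

For a typical 3x3 conv layer with 3 input channels, fan_in = 27 and the FAN_IN std is sqrt(2/27).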

Replaces #1883 (updates based on that discussion and rebased against master).

Like the XavierFiller, the fan_in and fan_out dimensions are not correct for inner product layers (as pointed out by @seanbell in #1883). However, I did update the documentation to note this.

Nick Carlevaris-Bianco committed 1aac6b8:

Added MSRAFiller, which implements an Xavier-like filler designed for use
with ReLUs instead of tanh. Based on paper: He et al, "Delving Deep into
Rectifiers: Surpassing Human-Level Performance on ImageNet Classification,"
2015. Added VarianceNorm option to FillerParameters which allows one to
normalize by fan_in, fan_out or their average. Updated XavierFiller to use the
VarianceNorm option (default behavior unchanged). Added tests for MSRAFiller and
XavierFiller.
Contributor

seanbell commented Feb 28, 2015

Note that #1970 should fix the fan_in and fan_out calculations for InnerProductLayer since the weights will now be 2D with shape output x input.

@shelhamer shelhamer commented on the diff Mar 7, 2015

include/caffe/filler.hpp
*
- * It fills the incoming matrix by randomly sampling uniform data from
- * [-scale, scale] where scale = sqrt(3 / fan_in) where fan_in is the number
- * of input nodes. You should make sure the input blob has shape (num, a, b, c)
- * where a * b * c = fan_in.
+ * It fills the incoming matrix by randomly sampling uniform data from [-scale,
+ * scale] where scale = sqrt(3 / n) where n is the fan_in, fan_out, or their
+ * average, depending on the variance_norm option. You should make sure the
+ * input blob has shape (num, a, b, c) where a * b * c = fan_in and num * b * c
+ * = fan_out. Note that this is currently not the case for inner product layers.
Owner

shelhamer commented Mar 7, 2015

#1970 is in so this filler is now right for InnerProduct layers too.
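The uniform sampling rule documented in the diff above can be checked numerically: a draw from [-s, s] with s = sqrt(3 / n) has variance s^2 / 3 = 1 / n, which is exactly the Xavier target. A small sketch of the scale computation (illustrative only, not Caffe's code):

```python
import math

def xavier_scale(fan_in, fan_out, variance_norm="FAN_IN"):
    """Half-width s of the uniform range [-s, s] used by the
    Xavier filler: s = sqrt(3 / n), so Var = s^2 / 3 = 1 / n."""
    n = {"FAN_IN": fan_in,
         "FAN_OUT": fan_out,
         "AVERAGE": (fan_in + fan_out) / 2.0}[variance_norm]
    return math.sqrt(3.0 / n)
```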

Owner

shelhamer commented Mar 7, 2015

@nickcarlevaris thanks -- this looks good. The only potential issue is naming and attribution. I am not certain, but if I understand correctly, the same sqrt(2) gain may have been suggested by Andrew Saxe et al. through derivations in http://arxiv.org/abs/1312.6120v3. Although I suggested "MSRA" earlier, I think a citation to both and a functional name is perhaps best.

@nickcarlevaris you suggested "ReLU" since this is intended for use with the so-named nonlinearity. It could be this is the right choice.

@longjon ?

futurely commented Apr 9, 2015

#1940 has been merged for a month. Can these two work together to reproduce the paper's results?

omgteam commented May 21, 2015

This issue has been open for a long time. Hope it gets merged quickly.

omgteam commented May 23, 2015

Why hasn't this been merged into master? Anything wrong?

@shelhamer shelhamer added a commit that referenced this pull request May 27, 2015

@shelhamer shelhamer Merge pull request #1946 from nickcarlevaris/msra_init
  Add MSRAFiller, an Xavier-like filler designed for use with ReLUs
c255709
Owner

shelhamer commented May 27, 2015

Merged to master in c255709. Thanks @nickcarlevaris!

I did a manual merge to re-format the commit message and add my own commit to note potentially related work. Closing since my edit threw off the GitHub merge.

shelhamer closed this May 27, 2015

Why is there no parameter to specify the α defined in Equation 15?
Since PReLU layer has been added to Caffe, I think we should also introduce this parameter into the filler.
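For context on the α question: Equation 15 of He et al. derives the condition (1/2)(1 + a^2) n Var[w] = 1 for a PReLU with negative slope a, which would generalize the filler's standard deviation to sqrt(2 / ((1 + a^2) n)). A hypothetical helper illustrating that extension (not part of this PR or of Caffe):

```python
import math

def he_std(n, negative_slope=0.0):
    """std from He et al. Eq. 15, generalized for PReLU slope a:
    sqrt(2 / ((1 + a^2) * n)). a = 0 recovers the ReLU case."""
    a = negative_slope
    return math.sqrt(2.0 / ((1.0 + a * a) * n))
```

With a = 0 this reduces to the ReLU gain sqrt(2 / n); with a = 1 (a linear unit) it reduces to sqrt(1 / n), the Xavier-style variance.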

vchuravy referenced this pull request in apache/incubator-mxnet Nov 18, 2015

Merged

Update Xavier #610
