MSRA weight filler #1946
This was referenced Feb 23, 2015

Note that #1970 should fix the
shelhamer added the enhancement label Mar 7, 2015

shelhamer commented on the diff Mar 7, 2015
- * It fills the incoming matrix by randomly sampling uniform data from
- * [-scale, scale] where scale = sqrt(3 / fan_in) where fan_in is the number
- * of input nodes. You should make sure the input blob has shape (num, a, b, c)
- * where a * b * c = fan_in.
+ * It fills the incoming matrix by randomly sampling uniform data from [-scale,
+ * scale] where scale = sqrt(3 / n) where n is the fan_in, fan_out, or their
+ * average, depending on the variance_norm option. You should make sure the
+ * input blob has shape (num, a, b, c) where a * b * c = fan_in and num * b * c
+ * = fan_out. Note that this is currently not the case for inner product layers.
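As an illustration of the uniform rule described in the updated comment, here is a NumPy sketch (not Caffe's actual C++ implementation; the function name `xavier_fill` and its `rng` parameter are hypothetical):

```python
import numpy as np

def xavier_fill(shape, variance_norm="fan_in", rng=None):
    """Sketch of the documented uniform filling rule.
    For a blob of shape (num, a, b, c): fan_in = a*b*c, fan_out = num*b*c.
    Samples uniformly from [-scale, scale] with scale = sqrt(3 / n)."""
    rng = rng or np.random.default_rng()
    num, a, b, c = shape
    fan_in = a * b * c
    fan_out = num * b * c
    n = {"fan_in": fan_in,
         "fan_out": fan_out,
         "average": (fan_in + fan_out) / 2.0}[variance_norm]
    scale = np.sqrt(3.0 / n)
    return rng.uniform(-scale, scale, size=shape)

# e.g. a conv-like blob with 64 outputs and 3x5x5 inputs:
w = xavier_fill((64, 3, 5, 5))
```

With `variance_norm="fan_in"` (the default behavior), this matches the pre-existing XavierFiller scaling; the other two options are what this PR adds.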
@nickcarlevaris thanks -- this looks good. The only potential issue is naming and attribution. I am not certain, but if I understand correctly you suggested "ReLU" since this filler is intended for use with the so-named nonlinearity. It could be this is the right choice. @longjon ?
futurely commented Apr 9, 2015

#1940 has been merged for a month. Can these two work together to reproduce the paper's results?
omgteam commented May 21, 2015

This issue has been open for a long time. Hope it gets merged quickly.
omgteam commented May 23, 2015

Why hasn't this been merged into master? Is anything wrong?
shelhamer added a commit that referenced this pull request May 27, 2015: c255709
Merged to master in c255709. Thanks @nickcarlevaris! I did a manual merge to re-format the commit message and add my own commit to note potentially related work. Closing since my edit threw off the GitHub merge.

shelhamer closed this May 27, 2015
happynear commented Jun 10, 2015

Why is there no parameter to specify the \alpha defined in Equation 15?
nickcarlevaris commented Feb 23, 2015
This PR adds MSRAFiller, which implements an Xavier-like filler designed for use with ReLUs instead of tanh, based on the paper: He et al., "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," 2015.
It also adds a VarianceNorm option to FillerParameters which allows one to normalize by fan_in, fan_out or their average. VarianceNorm applies to the MSRAFiller and the XavierFiller (default behavior unchanged). It also adds tests for MSRAFiller and XavierFiller.
Replaces #1883 (updates based on that discussion and rebased against master).
Like the XavierFiller, the fan_in and fan_out dimensions are not correct for inner product layers (as pointed out by @seanbell in #1883). However, I did update the documentation to note this.
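For illustration, an MSRA-style fill following He et al. 2015 draws from a zero-mean Gaussian with std = sqrt(2 / n), where n is fan_in, fan_out, or their average per the VarianceNorm option. The sketch below is a hypothetical NumPy rendering of that rule, not the actual Caffe MSRAFiller code:

```python
import numpy as np

def msra_fill(shape, variance_norm="fan_in", rng=None):
    """Hypothetical sketch of an MSRA-style filler (He et al. 2015):
    zero-mean Gaussian with std = sqrt(2 / n). As noted above, for a
    blob of shape (num, a, b, c): fan_in = a*b*c, fan_out = num*b*c."""
    rng = rng or np.random.default_rng()
    num, a, b, c = shape
    fan_in = a * b * c
    fan_out = num * b * c
    n = {"fan_in": fan_in,
         "fan_out": fan_out,
         "average": (fan_in + fan_out) / 2.0}[variance_norm]
    std = np.sqrt(2.0 / n)
    return rng.normal(0.0, std, size=shape)
```

The factor of 2 (versus Xavier's uniform bound of sqrt(3 / n)) accounts for ReLU zeroing out half of its inputs in expectation, which halves the variance passed forward.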