
Update the epsilon in batch_norm_layer to a variable in v2. #5692

Merged
merged 6 commits into develop from add_bn_eq on Nov 22, 2017

Conversation

peterzhang2029
Contributor

Resolves #5548

@@ -94,6 +94,8 @@ class BatchNormBaseLayer : public Layer {
bool useGlobalStats_;
// use to compute moving mean and variance.
real movingAvgFraction_;
// Epsilon value used in the batch normalization formula.
real EPS;
Contributor

  • `EPS` is no longer a static constant; member variables should be lowercase with a trailing underscore: `epsilon_`.
  • Please also update the comment, e.g.: Epsilon is a small random noise used in batch normalization for stability.

Contributor Author

Done

@@ -2482,6 +2483,8 @@ def __init__(self,
self.config.use_global_stats = use_global_stats
if moving_average_fraction is not None:
self.config.moving_average_fraction = moving_average_fraction
if epsilon is not None:
Contributor

@lcy-seso lcy-seso Nov 17, 2017

Isn't this logic wrong? The default value of epsilon has already been set to 1e-5. Are we letting the user set epsilon to None, only to fall back to the default again? That is strange.

Contributor Author

This was originally assigned the same way as moving_average_fraction; it has been fixed.

@@ -3123,6 +3126,9 @@ def batch_norm_layer(input,
assert (batch_norm_type is None) or (batch_norm_type == "batch_norm") or \
(batch_norm_type == "mkldnn_batch_norm") or \
(batch_norm_type == "cudnn_batch_norm")

assert epsilon >= 1e-5, "Parameter epsilon must be no less than 1e-5."
Contributor

@lcy-seso lcy-seso Nov 17, 2017

  • `epsilon` here is not a parameter; remove "Parameter".
  • The logic is inconsistent: the config already guarantees that epsilon cannot be set below 1e-5, yet the C++ implementation still compares epsilon against 1e-5 and takes the smaller of the two. That C++ branch can never be reached, so is it still needed? What exactly is the intended behavior here?

Contributor Author

@peterzhang2029 peterzhang2029 Nov 17, 2017

  • "Parameter" has been removed.
  • The assert in layers.py constrains the epsilon used by BatchNormalizationLayer.cpp and MKLDNNBatchNormLayer.cpp; those two C++ sources impose no further limit on epsilon.
  • In CudnnBatchNormLayer.cpp, the cuDNN interface requires epsilon to be a double, so the input epsilon is converted to eps_. Testing showed that when the input epsilon is 1e-5, cuDNN's batch-norm interface reports a CUDNN_BN_MIN_EPSILON error. Looking at the cuDNN API, the likely cause is that eps_ after the type conversion does not satisfy the minimum-value requirement, so this code additionally takes the larger of the two values. This guarantees that eps_ is legal when the default input epsilon of 1e-5 is used with GPU training.
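The conversion issue described above is easy to reproduce: the nearest 32-bit float to 1e-5 is slightly *below* 1e-5, so widening a `real` epsilon back to `double` can land just under cuDNN's minimum. A small Python sketch, where the `struct` round-trip stands in for the C++ `real`-to-`double` conversion and `CUDNN_BN_MIN_EPSILON` is assumed to equal 1e-5:

```python
import struct

CUDNN_BN_MIN_EPSILON = 1e-5  # cuDNN's documented lower bound for epsilon

def float32_roundtrip(x):
    """Store a Python float as a 32-bit float, then widen it back to a
    64-bit double, mimicking a real (float) epsilon_ cast to double."""
    return struct.unpack('f', struct.pack('f', x))[0]

eps_single = float32_roundtrip(1e-5)
# The nearest float32 to 1e-5 is slightly below 1e-5, so the widened
# value fails cuDNN's minimum check even though the config allowed it.
assert eps_single < CUDNN_BN_MIN_EPSILON

# The fix discussed above: clamp to the cuDNN minimum before the call.
eps_ = max(eps_single, CUDNN_BN_MIN_EPSILON)
assert eps_ >= CUDNN_BN_MIN_EPSILON
```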

@@ -3106,6 +3107,8 @@ def batch_norm_layer(input,
will use the mean and variance of the current batch
of test data.
:type use_global_stats: bool | None.
:param epsilon: Small constant added to the variance to avoid numerical problems.
Contributor

  • Small --> A small
  • to avoid numerical problems
    • This makes me think what are the problems?
    • to improve numeric stability.

Contributor Author

Done


// for batch normalization layer
// small constant added to the variance to avoid numerical problems.
optional double epsilon = 60 [ default = 0.00001 ];
Contributor

  • small --> A / The . Please add the article.
  • This comment makes me think what are the problems?

Contributor Author

Done

@@ -2483,6 +2484,8 @@ def __init__(self,
if moving_average_fraction is not None:
self.config.moving_average_fraction = moving_average_fraction

self.config.epsilon = epsilon
Contributor

  • Please write this the same way as moving_average_fraction.
  • Why is epsilon's validity not checked here? What happens if the old interface is called directly to create a batch norm layer with an illegal epsilon?
  • Please add a check at line 2476: when type is cudnn_batch_norm, validate directly and give a message. Static settings can be checked at network-configuration time; there is no need to do it at runtime.
    cuDNN does not allow an eps value less than 1e-5

Contributor Author

To keep the old interface consistent when using the bn layer, the epsilon check has been unified in config_parser.py: every type of batch norm layer must guarantee that the input epsilon is no less than 1e-5, matching the cuDNN constraint.
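The unified check described here can be sketched as a small guard at configuration time. The helper name below is hypothetical; in the PR the check lives in config_parser.py and applies to all batch-norm variants:

```python
def check_batch_norm_epsilon(epsilon, batch_norm_type=None):
    """Reject epsilon below 1e-5 for every batch norm variant, so the
    old and new interfaces behave the same and cuDNN's minimum holds."""
    if epsilon < 1e-5:
        raise ValueError(
            "epsilon must be no less than 1e-5, got %g (layer type: %s)"
            % (epsilon, batch_norm_type or "batch_norm"))
    return epsilon

check_batch_norm_epsilon(1e-5)                      # boundary value is legal
check_batch_norm_epsilon(1e-3, "cudnn_batch_norm")  # larger values pass
```

Doing this once in the config layer means the runtime C++ layers never see an illegal epsilon, regardless of which Python entry point built the network.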

@@ -3123,6 +3126,9 @@ def batch_norm_layer(input,
assert (batch_norm_type is None) or (batch_norm_type == "batch_norm") or \
(batch_norm_type == "mkldnn_batch_norm") or \
(batch_norm_type == "cudnn_batch_norm")

assert epsilon >= 1e-5, "epsilon must be no less than 1e-5."
Contributor

  • Does this epsilon restriction apply only to cudnn, or must PaddlePaddle's own batch norm implementation satisfy it too?
  • Check the cuDNN restriction at static-configuration time; if the value is below 1e-5, give the user a warning telling them that PaddlePaddle will truncate it to 1e-5.

Contributor Author

All batch norm layer types must satisfy the restriction uniformly; the check has been moved into config_parse.py and is applied in one place.

@@ -21,7 +21,7 @@ namespace paddle {

REGISTER_LAYER(cudnn_batch_norm, CudnnBatchNormLayer);

const double CudnnBatchNormLayer::EPS = 1E-5;
const double CudnnBatchNormLayer::MIN_EPS = 1E-5;
Contributor

Don't use this magic number; use CUDNN_BN_MIN_EPSILON directly.

Contributor Author

Done

* static_cast<double>(epsilon_), The CUDNN_STATUS_BAD_PARAM error
* will occur due to eps_ value is less than
* CUDNN_BN_MIN_EPSILON.
* The following code is to ensure that the eps_ meets requirement.
Contributor

The comment is too verbose; don't explain the logic of this assignment, it is not useful. Suggested wording:

cuDNN does not allow an epsilon value less than CUDNN_BN_MIN_EPSILON.

Contributor Author

Done

* static_cast<double>(epsilon_), The CUDNN_STATUS_BAD_PARAM error
* will occur due to eps_ value is less than
* CUDNN_BN_MIN_EPSILON.
* The following code is to ensure that the eps_ meets requirement.
Contributor

The comment is too verbose. Suggested wording:

cuDNN does not allow an epsilon value less than CUDNN_BN_MIN_EPSILON.

Contributor Author

Done

Contributor

@lcy-seso lcy-seso left a comment

LGTM

@lcy-seso lcy-seso merged commit 6577760 into PaddlePaddle:develop Nov 22, 2017
@peterzhang2029 peterzhang2029 deleted the add_bn_eq branch November 22, 2017 10:30

Successfully merging this pull request may close these issues.

Should we change the epsilon in batch_norm_layer to a variable instead of a fixed value 1e-5.
2 participants