-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update the epsilon in batch_norm_layer to a variable in v2. #5692
Conversation
7a32f0b
to
8a49f7f
Compare
d492421
to
1882471
Compare
@@ -94,6 +94,8 @@ class BatchNormBaseLayer : public Layer { | |||
bool useGlobalStats_; | |||
// use to compute moving mean and variance. | |||
real movingAvgFraction_; | |||
// Epsilon value used in the batch normalization formula. | |||
real EPS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- EPS 已经不再是静态常量了,成员变量用小写加下划线, epsilon_ 。
- 注释修改一下,Epsilon is a small random noise used in batch normalization for stability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -2482,6 +2483,8 @@ def __init__(self, | |||
self.config.use_global_stats = use_global_stats | |||
if moving_average_fraction is not None: | |||
self.config.moving_average_fraction = moving_average_fraction | |||
if epsilon is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个逻辑有问题吧?默认值 epsilon 已经被设置成 1e-5。让用户把 epsilon 设置成 None,然后又是默认值吗?这个很奇怪。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里原本是按照moving_average_fraction的方式赋值的,已修改。
@@ -3123,6 +3126,9 @@ def batch_norm_layer(input, | |||
assert (batch_norm_type is None) or (batch_norm_type == "batch_norm") or \ | |||
(batch_norm_type == "mkldnn_batch_norm") or \ | |||
(batch_norm_type == "cudnn_batch_norm") | |||
|
|||
assert epsilon >= 1e-5, "Parameter epsilon must be no less than 1e-5." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- 这里 epsilon 不是 parameter 。去掉 Parameter。
- 代码的逻辑不统一,配置里面已经限制了 epsilon 不可能设置到比 1e-5 小,C++ 实现里面又会比较 epsilon 和 1e-5,取其中的较小值。C++ 里面的逻辑相当于永远都不会走到,就不再需要了?这里想要的行为到底是什么呢?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Parameter已修改。
- layers.py中的assert是为了限制BatchNormalizationLayer.cpp、MKLDNNBatchNormLayer.cpp中epsilon的大小,这两部分的c++源码部分并没有对epsilon再限制。
- 在CudnnBatchNormLayer.cpp中Cudnn提供的接口要求epsilon是double类型,所以对输入的epsilon进行类型转换为eps_,测试发现当输入是epsilon是1e-5时,Cudnn中bn的接口会报
CUDNN_BN_MIN_EPSILON
错误。查看了Cudnn中的API,原因可能是类型转化后的eps_并没有满足最小值的要求,所以在这个部分又做了一次取两者的较大值。当epsilon默认输入是1e-5并使用GPU训练时,确保eps_的输入合法。
1882471
to
9580c45
Compare
@@ -3106,6 +3107,8 @@ def batch_norm_layer(input, | |||
will use the mean and variance of the current batch | |||
of test data. | |||
:type use_global_stats: bool | None. | |||
:param epsilon: Small constant added to the variance to avoid numerical problems. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Small --> A small
- to avoid numerical problems
- This makes me think what are the problems?
- to improve numeric stability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
proto/ModelConfig.proto
Outdated
|
||
// for batch normalization layer | ||
// small constant added to the variance to avoid numerical problems. | ||
optional double epsilon = 60 [ default = 0.00001 ]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- small --> A / The . Please add the article.
- This comment makes me think what are the problems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -2483,6 +2484,8 @@ def __init__(self, | |||
if moving_average_fraction is not None: | |||
self.config.moving_average_fraction = moving_average_fraction | |||
|
|||
self.config.epsilon = epsilon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- 请像
moving_average_fraction
一样来写。 - 这里为什么不检查 epsilon 的合法性呢?如果直接调用老的接口来创建 Batch Norm 层,传入非法的
epsilon 呢? - 请在 2476 行,当 tyep= cudnn_batch_norm 时,直接检查并且给提示信息 。静态参数在配置网络时候就可以检查出来,可以不在运行时做。
cuDNN does not allow an eps value less than 1e-5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
为了确保老接口使用bn层一致,这里统一将epsilon大小的判断放在了config_parser.py中,所有类型的batch norm layer都需要保证输入的epsilon大小必须大于或等于1e-5,和cuDNN的约束一致。
@@ -3123,6 +3126,9 @@ def batch_norm_layer(input, | |||
assert (batch_norm_type is None) or (batch_norm_type == "batch_norm") or \ | |||
(batch_norm_type == "mkldnn_batch_norm") or \ | |||
(batch_norm_type == "cudnn_batch_norm") | |||
|
|||
assert epsilon >= 1e-5, "epsilon must be no less than 1e-5." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- 这个 epsilon 的限制是针对 cudnn 还是对 PaddlePaddle 自己实现的batch norm 也要满足限制?
- cudnn 的限制在静态配置的时候检查一下,如果小于 1e-5 给用户一个warning,告诉用户PaddlePaddle
会截断到 1e-5。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
所有类型的bn层都需要统一满足限制,已改为在config_parse.py中统一判断。
@@ -21,7 +21,7 @@ namespace paddle { | |||
|
|||
REGISTER_LAYER(cudnn_batch_norm, CudnnBatchNormLayer); | |||
|
|||
const double CudnnBatchNormLayer::EPS = 1E-5; | |||
const double CudnnBatchNormLayer::MIN_EPS = 1E-5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
不要用这个 magic number ,直接用 CUDNN_BN_MIN_EPSILON。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
* static_cast<double>(epsilon_), The CUDNN_STATUS_BAD_PARAM error | ||
* will occur due to eps_ value is less than | ||
* CUDNN_BN_MIN_EPSILON. | ||
* The following code is to ensure that the eps_ meets requirement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment 信息太冗余,不要解释这个赋值的逻辑,没有啥用处。
cuDNN does not allow an epsilon value less than CUDNN_BN_MIN_EPSILON.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
* static_cast<double>(epsilon_), The CUDNN_STATUS_BAD_PARAM error | ||
* will occur due to eps_ value is less than | ||
* CUDNN_BN_MIN_EPSILON. | ||
* The following code is to ensure that the eps_ meets requirement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment 信息太冗余。
cuDNN does not allow an epsilon value less than CUDNN_BN_MIN_EPSILON.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
resolve #5548