
[Fluid Clean] move BatchNorm from fluid.dygraph.nn to paddle.nn.layer.norm #48734

Merged
28 commits merged on Dec 12, 2022

Changes from 22 commits

Commits (28)
07bcdcc
move BatchNorm from fluid.dygraph.nn to paddle.nn.layer.norm
risemeup1 Dec 5, 2022
fd9fa46
modify conflict
risemeup1 Dec 5, 2022
fc5a81e
modify conflict
risemeup1 Dec 5, 2022
b867666
Merge branch 'develop' into batch_norm
risemeup1 Dec 5, 2022
91f1a4d
modify pre-commit error
risemeup1 Dec 5, 2022
c5393eb
Merge branch 'batch_norm' of github.com:risemeup1/Paddle into batch_norm
risemeup1 Dec 5, 2022
f383b37
modify static-check ci error
risemeup1 Dec 5, 2022
c0f88d3
fix failed tests
risemeup1 Dec 6, 2022
01b0f6c
Merge branch 'develop' into batch_norm
risemeup1 Dec 6, 2022
2141dac
modify conflict
risemeup1 Dec 6, 2022
f4e95c6
Merge branch 'develop' into batch_norm
risemeup1 Dec 6, 2022
ffb4a44
modify conflict
risemeup1 Dec 6, 2022
59ceebe
Merge branch 'batch_norm' of github.com:risemeup1/Paddle into batch_norm
risemeup1 Dec 6, 2022
04824b5
delete import of module GRUUnit
risemeup1 Dec 6, 2022
7064e9f
fix failed test
risemeup1 Dec 6, 2022
61d48ff
fix failed tests
risemeup1 Dec 6, 2022
e52ee55
fix failed tests
risemeup1 Dec 6, 2022
57bfe4c
fix failed tests
risemeup1 Dec 6, 2022
b358a3c
fix failed test
risemeup1 Dec 7, 2022
8e06fa4
fix error in test_fused_resenet_basic_block_op_xpu.py
risemeup1 Dec 7, 2022
a4847e1
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
risemeup1 Dec 7, 2022
405410a
fix conflict
risemeup1 Dec 7, 2022
7bc2aa2
fix conflict
risemeup1 Dec 8, 2022
73470e0
fix conflict
risemeup1 Dec 8, 2022
9fb71b4
modify after xiaoguang reviewed
risemeup1 Dec 9, 2022
e9609cd
Merge branch 'develop' into batch_norm
risemeup1 Dec 11, 2022
2274529
fix conflict
risemeup1 Dec 11, 2022
ba4b998
fix conflict
risemeup1 Dec 11, 2022
@@ -25,7 +25,8 @@
from paddle.fluid.contrib.slim.quantization import ImperativeQuantAware
from paddle.fluid.dygraph.io import INFER_MODEL_SUFFIX, INFER_PARAMS_SUFFIX
from paddle.nn.layer import ReLU, LeakyReLU, Sigmoid, Softmax, ReLU6
from paddle.nn import Linear, Conv2D, Softmax, BatchNorm
from paddle.nn.layer.norm import BatchNorm
from paddle.nn import Linear, Conv2D, Softmax
from paddle.fluid.log_helper import get_logger

from imperative_test_utils import (
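The pattern of this change, repeated across the test files below, is simply an import move; the constructor and call signature are unchanged. A minimal usage sketch (assuming dygraph mode, the default in Paddle 2.x, with an arbitrary example input shape) looks like this:

# Old location (removed by this PR):
#   from paddle.fluid.dygraph.nn import BatchNorm
# New location:
from paddle.nn.layer.norm import BatchNorm

import paddle

bn = BatchNorm(10)                      # 10 input channels, NCHW layout by default
y = bn(paddle.rand([4, 10, 8, 8]))      # same call signature as before the move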
306 changes: 0 additions & 306 deletions python/paddle/fluid/dygraph/nn.py
@@ -51,7 +51,6 @@
__all__ = [
'Conv3D',
'Linear',
'BatchNorm',
'Embedding',
'Conv3DTranspose',
'GroupNorm',
@@ -639,311 +638,6 @@ def forward(self, input):
return self._helper.append_activation(pre_activation, act=self._act)


class BatchNorm(layers.Layer):
r"""

This interface is used to construct a callable object of the ``BatchNorm`` class.
For more details, refer to code examples.
It implements the function of the Batch Normalization Layer and can be used
as a normalizer function for conv2d and fully connected operations.
The data is normalized per channel, using the mean and variance computed from the current mini-batch.
Refer to `Batch Normalization: Accelerating Deep Network Training by Reducing
Internal Covariate Shift <https://arxiv.org/pdf/1502.03167.pdf>`_
for more details.

When use_global_stats = False, the :math:`\mu_{\beta}`
and :math:`\sigma_{\beta}^{2}` are the statistics of one mini-batch.
They are calculated as follows:

.. math::

\mu_{\beta} &\gets \frac{1}{m} \sum_{i=1}^{m} x_i \qquad &
//\ mini-batch\ mean \\
\sigma_{\beta}^{2} &\gets \frac{1}{m} \sum_{i=1}^{m}(x_i - \mu_{\beta})^2 \qquad &
//\ mini-batch\ variance \\

- :math:`x` : mini-batch data
- :math:`m` : the size of the mini-batch data

When use_global_stats = True, the :math:`\mu_{\beta}`
and :math:`\sigma_{\beta}^{2}` are not the statistics of one mini-batch.
They are the global or running statistics (moving_mean and moving_variance),
usually obtained from a pre-trained model. They are calculated as follows:

.. math::
moving\_mean = moving\_mean * momentum + \mu_{\beta} * (1. - momentum) \quad &// global mean \\
moving\_variance = moving\_variance * momentum + \sigma_{\beta}^{2} * (1. - momentum) \quad &// global variance \\

The normalization function formula is as follows:

.. math::

\hat{x_i} &\gets \frac{x_i - \mu_\beta} {\sqrt{\
\sigma_{\beta}^{2} + \epsilon}} \qquad &//\ normalize \\
y_i &\gets \gamma \hat{x_i} + \beta \qquad &//\ scale\ and\ shift


- :math:`\epsilon` : a small value added to the variance to prevent division by zero
- :math:`\gamma` : trainable scale parameter
- :math:`\beta` : trainable shift parameter

Parameters:
num_channels(int): Indicate the number of channels of the input ``Tensor``.
act(str, optional): Activation to be applied to the output of batch normalization. Default: None.
is_test (bool, optional): A flag indicating whether it is in the test phase.
This flag only takes effect in static graph mode. For dygraph mode, please use ``eval()``.
Default: False.
momentum(float, optional): The value used for the moving_mean and moving_var computation. Default: 0.9.
epsilon(float, optional): The small value added to the variance to prevent division by zero. Default: 1e-5.
param_attr(ParamAttr, optional): The parameter attribute for the Parameter `scale`
of batch_norm. If it is set to None or one attribute of ParamAttr, batch_norm
will create a ParamAttr as param_attr. If the Initializer of the param_attr
is not set, the parameter is initialized with Xavier. Default: None.
bias_attr(ParamAttr, optional): The parameter attribute for the bias of batch_norm.
If it is set to None or one attribute of ParamAttr, batch_norm
will create a ParamAttr as bias_attr. If the Initializer of the bias_attr
is not set, the bias is initialized to zero. Default: None.
dtype(str, optional): Indicates the data type of the input ``Tensor``,
which can be float32 or float64. Default: float32.
data_layout(str, optional): Specify the input data format, the data format can be "NCHW" or "NHWC". Default: NCHW.
in_place(bool, optional): Make the input and output of batch norm reuse memory. Default: False.
moving_mean_name(str, optional): The name of moving_mean, which stores the global mean. Default: None.
moving_variance_name(str, optional): The name of moving_variance, which stores the global variance. Default: None.
do_model_average_for_mean_and_var(bool, optional): Whether the mean and variance should be included in
model averaging when model averaging is enabled. Default: True.
use_global_stats(bool, optional): Whether to use the global (running) mean and
variance. In inference or test mode, setting either use_global_stats or
is_test to True is equivalent.
In train mode, when use_global_stats is True, the global mean
and variance are also used during training. Default: False.
trainable_statistics(bool, optional): Whether to calculate mean and variance in eval mode. When
trainable_statistics is True, mean and variance are calculated from the current batch even in eval mode.
Default: False.

Returns:
None

Examples:
.. code-block:: python

import paddle.fluid as fluid
from paddle.fluid.dygraph.base import to_variable
import numpy as np

x = np.random.random(size=(3, 10, 3, 7)).astype('float32')
with fluid.dygraph.guard():
x = to_variable(x)
batch_norm = fluid.BatchNorm(10)
hidden1 = batch_norm(x)
"""

def __init__(
self,
num_channels,
act=None,
is_test=False,
momentum=0.9,
epsilon=1e-05,
param_attr=None,
bias_attr=None,
dtype='float32',
data_layout='NCHW',
in_place=False,
moving_mean_name=None,
moving_variance_name=None,
do_model_average_for_mean_and_var=True,
use_global_stats=False,
trainable_statistics=False,
):
super().__init__()
self._param_attr = param_attr
self._bias_attr = bias_attr
self._act = act
self._use_mkldnn = _global_flags()["FLAGS_use_mkldnn"]

assert (
bias_attr is not False
), "bias_attr should not be False in batch_norm."

if dtype == "float16":
self._dtype = "float32"
else:
self._dtype = dtype

param_shape = [num_channels]

# create parameter
self.weight = self.create_parameter(
attr=self._param_attr,
shape=param_shape,
dtype=self._dtype,
default_initializer=Constant(1.0),
)
self.weight.stop_gradient = (
use_global_stats and self._param_attr.learning_rate == 0.0
)

self.bias = self.create_parameter(
attr=self._bias_attr,
shape=param_shape,
dtype=self._dtype,
is_bias=True,
)
self.bias.stop_gradient = (
use_global_stats and self._param_attr.learning_rate == 0.0
)

self._mean = self.create_parameter(
attr=ParamAttr(
name=moving_mean_name,
initializer=Constant(0.0),
trainable=False,
do_model_average=do_model_average_for_mean_and_var,
),
shape=param_shape,
dtype=self._dtype,
)
self._mean.stop_gradient = True

self._variance = self.create_parameter(
attr=ParamAttr(
name=moving_variance_name,
initializer=Constant(1.0),
trainable=False,
do_model_average=do_model_average_for_mean_and_var,
),
shape=param_shape,
dtype=self._dtype,
)
self._variance.stop_gradient = True

self._in_place = in_place
self._data_layout = data_layout
self._momentum = momentum
self._epsilon = epsilon
self._is_test = is_test
self._fuse_with_relu = False
self._use_global_stats = use_global_stats
self._trainable_statistics = trainable_statistics

def forward(self, input):
# create output
# mean and mean_out share the same memory
mean_out = self._mean
# variance and variance out share the same memory
variance_out = self._variance

if _non_static_mode():
if in_dygraph_mode():
batch_norm_out, t1, t2, t3, t4, _ = _C_ops.batch_norm(
input,
self._mean,
self._variance,
self.weight,
self.bias,
not self.training,
self._momentum,
self._epsilon,
self._data_layout,
self._use_global_stats,
self._trainable_statistics,
)
return dygraph_utils._append_activation_in_dygraph(
batch_norm_out, act=self._act, use_mkldnn=self._use_mkldnn
)

elif _in_legacy_dygraph():
attrs = (
"momentum",
self._momentum,
"epsilon",
self._epsilon,
"is_test",
not self.training,
"data_layout",
self._data_layout,
"use_mkldnn",
self._use_mkldnn,
"fuse_with_relu",
self._fuse_with_relu,
"use_global_stats",
self._use_global_stats,
'trainable_statistics',
self._trainable_statistics,
)
batch_norm_out, _, _, _, _, _ = _legacy_C_ops.batch_norm(
input,
self.weight,
self.bias,
self._mean,
self._variance,
None,
mean_out,
variance_out,
*attrs
)

return dygraph_utils._append_activation_in_dygraph(
batch_norm_out, act=self._act, use_mkldnn=self._use_mkldnn
)

check_variable_and_dtype(
input, 'input', ['float16', 'float32', 'float64'], 'BatchNorm'
)

attrs = {
"momentum": self._momentum,
"epsilon": self._epsilon,
"is_test": self._is_test,
"data_layout": self._data_layout,
"use_mkldnn": False,
"fuse_with_relu": self._fuse_with_relu,
"use_global_stats": self._use_global_stats,
"trainable_statistics": self._trainable_statistics,
}

inputs = {
"X": [input],
"Scale": [self.weight],
"Bias": [self.bias],
"Mean": [self._mean],
"Variance": [self._variance],
}

saved_mean = self._helper.create_variable_for_type_inference(
dtype=self._dtype, stop_gradient=True
)
saved_variance = self._helper.create_variable_for_type_inference(
dtype=self._dtype, stop_gradient=True
)
reserve_space = self._helper.create_variable_for_type_inference(
dtype=self._helper.input_dtype(input), stop_gradient=True
)

batch_norm_out = (
input
if self._in_place
else self._helper.create_variable_for_type_inference(self._dtype)
)

outputs = {
"Y": [batch_norm_out],
"MeanOut": [mean_out],
"VarianceOut": [variance_out],
"SavedMean": [saved_mean],
"SavedVariance": [saved_variance],
}
if reserve_space is not None:
outputs["ReserveSpace"] = [reserve_space]

self._helper.append_op(
type="batch_norm", inputs=inputs, outputs=outputs, attrs=attrs
)

# Currently, we don't support inplace in dygraph mode
return self._helper.append_activation(batch_norm_out, self._act)


class Embedding(layers.Layer):
r"""
:alias_main: paddle.nn.Embedding
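For readers following the math in the removed docstring above, the per-channel computation with use_global_stats=False can be spelled out in a few lines of NumPy. This is an illustrative sketch with a hypothetical helper name, not code from this PR:

import numpy as np

def ref_batch_norm_nchw(x, gamma, beta, eps=1e-5):
    # Mini-batch mean and variance, computed per channel over (N, H, W),
    # matching the mu_beta / sigma_beta^2 definitions in the docstring.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalize
    # scale and shift
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

x = np.random.rand(3, 10, 3, 7).astype('float32')
gamma = np.ones(10, dtype='float32')   # weight defaults to Constant(1.0)
beta = np.zeros(10, dtype='float32')   # bias defaults to zero
out = ref_batch_norm_nchw(x, gamma, beta)

With use_global_stats=True, the moving_mean and moving_variance buffers would be used in place of the per-batch mean and variance, as described in the docstring.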
@@ -708,7 +708,7 @@ def test_skip_BatchNorm_Layer_norm(self):
for param in model.parameters():
self.assertEqual((param.dtype == paddle.float32), True)

model = paddle.nn.BatchNorm(1)
model = paddle.nn.layer.norm.BatchNorm(1)
model = paddle.amp.decorate(models=model, level='O2')
for param in model.parameters():
self.assertEqual((param.dtype == paddle.float32), True)
@@ -707,7 +707,7 @@ def test_skip_BatchNorm_Layer_norm(self):
for param in model.parameters():
self.assertEqual((param.dtype == paddle.float32), True)

model = paddle.nn.BatchNorm(1)
model = paddle.nn.layer.norm.BatchNorm(1)
model = paddle.amp.decorate(models=model, level='O2')
for param in model.parameters():
self.assertEqual((param.dtype == paddle.float32), True)
@@ -14,9 +14,9 @@

import paddle
import paddle.fluid as fluid
from paddle.fluid.dygraph.nn import BatchNorm
from paddle.fluid.param_attr import ParamAttr
from paddle.fluid.regularizer import L2Decay
from paddle.nn.layer.norm import BatchNorm


class ConvBNLayer(fluid.dygraph.Layer):
@@ -31,15 +31,16 @@
import numpy as np
from PIL import Image, ImageOps

import paddle.fluid as fluid

# Use GPU:0 to eliminate the influence of other tasks.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import paddle
import paddle.fluid as fluid
from paddle.fluid.dygraph import to_variable
from paddle.fluid.dygraph.nn import BatchNorm
from paddle.jit import ProgramTranslator
from paddle.jit.api import declarative
from paddle.nn.layer.norm import BatchNorm

# Note: Set True to eliminate randomness.
# 1. For one operation, cuDNN has several algorithms,
@@ -23,12 +23,12 @@
import paddle
import paddle.fluid as fluid
from paddle.fluid.dygraph.io import INFER_MODEL_SUFFIX, INFER_PARAMS_SUFFIX
from paddle.fluid.dygraph.nn import BatchNorm
from paddle.fluid.initializer import MSRA
from paddle.fluid.param_attr import ParamAttr
from paddle.jit import ProgramTranslator
from paddle.jit.api import declarative
from paddle.nn import Linear
from paddle.nn.layer.norm import BatchNorm

# Note: Set True to eliminate randomness.
# 1. For one operation, cuDNN has several algorithms,