
Add layer normalization operator #7789

Merged: 10 commits merged into PaddlePaddle:develop from feature/layer_norm on Jan 31, 2018

Conversation

@chengduoZH (Contributor) commented on Jan 23, 2018

fix #7174
Writing this op turned out to be a bit complicated and I ran into some rare problems. I am writing them down here in the hope that others can learn something from them.
Layer normalization is a new op. Unlike batch normalization, it normalizes over the features of a single sample rather than across the mini-batch, so I wrote a new op instead of reusing batch normalization. This paper gives an introduction to it.
The problem appeared while computing the gradient. When dy is a constant, the formula says dx should be zero or nearly zero (about 1e-10), but the numerical gradient does not agree: its dx is much larger (about 1e-3). So I think the numerical gradient's dx is wrong.

[screenshot showing the gradient values described above]

Because of the above, I assign random data to dy, compute dx with a Python reference implementation, and compare that result against the dx produced by the C++ kernel. The comparison criterion is the same one used by op_test.
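
To make the issue concrete, here is a minimal NumPy sketch written for this write-up (not taken from the PR) of the forward pass and the analytic dx. With a constant grad_y and the scale fixed to 1, the analytic dx collapses to numerically zero, which is the behaviour described above.

```python
import numpy as np

def layer_norm_forward(x, eps=1e-5):
    # x: [N, D]; each row (one sample) is normalized over its D features.
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return x_hat, var

def layer_norm_grad_x(x, grad_y, eps=1e-5):
    # Analytic gradient w.r.t. x of the normalized output (scale == 1).
    x_hat, var = layer_norm_forward(x, eps)
    inv_std = 1.0 / np.sqrt(var + eps)
    g = grad_y
    return inv_std * (g
                      - g.mean(axis=1, keepdims=True)
                      - x_hat * (g * x_hat).mean(axis=1, keepdims=True))

np.random.seed(0)
x = np.random.rand(4, 8)

# Constant grad_y: every term cancels, so the analytic dx is ~0 up to
# round-off, whereas the finite-difference gradient in op_test reportedly
# produced values around 1e-3.
dx_const = layer_norm_grad_x(x, np.ones((4, 8)))
print(np.abs(dx_const).max())   # ~1e-16

# Random grad_y, as used for the actual check: compare this reference dx
# against the dx produced by the C++ kernel, using the op_test tolerance.
dx_ref = layer_norm_grad_x(x, np.random.rand(4, 8))
```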

@chengduoZH force-pushed the feature/layer_norm branch 5 times, most recently from 0bbc2c7 to 2b9ac13 on January 24, 2018 05:33
@chengduoZH force-pushed the feature/layer_norm branch 5 times, most recently from f35cae2 to 681a95a on January 25, 2018 16:53
@chengduoZH changed the title from "[WIP] Add layer normalization operator" to "Add layer normalization operator" on Jan 25, 2018
@chengduoZH force-pushed the feature/layer_norm branch 3 times, most recently from d6c2df6 to 2ad0642 on January 29, 2018 14:59

auto input_map = ConstEigenMatrixMapRowMajor<T>(x->data<T>(), left, right);
auto scale_map = ConstEigenMatrixMapRowMajor<T>(scale->data<T>(), left, 1);
auto bias_map = ConstEigenMatrixMapRowMajor<T>(bias->data<T>(), left, 1);
Contributor:

Should the size of scale and bias be `right` rather than `left`? If it is `left`, the size of scale would be coupled with the batch size.

Contributor (Author):

You are right, thanks!
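
For context, a minimal NumPy illustration of the shape issue discussed above; the names and sizes are made up for illustration and are not the PR's C++ code.

```python
import numpy as np

# Hypothetical sizes: `left` is the product of the batch-like dims,
# `right` is the product of the normalized dims (the feature size H).
left, right = 32, 128
x = np.random.rand(left, right)

# Scale and bias need `right` elements, one per normalized feature,
# so they stay independent of the batch size.
scale = np.random.rand(right)
bias = np.random.rand(right)

mean = x.mean(axis=1, keepdims=True)
var = x.var(axis=1, keepdims=True)
y = scale * (x - mean) / np.sqrt(var + 1e-5) + bias  # broadcasts over rows

# A scale of shape [left] would instead tie the parameter count to the
# batch-like dimension, which is the coupling pointed out above.
```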

@lcy-seso (Contributor) left a comment:

Thank you for this work.

PADDLE_ENFORCE(ctx->HasInput("X"), "");
PADDLE_ENFORCE(ctx->HasInput("Scale"), "");
PADDLE_ENFORCE(ctx->HasInput("Bias"), "");
PADDLE_ENFORCE(ctx->HasOutput("Y"), "");
Contributor:

Lines 36 ~ 39: would you please complete the comments before merging?

Contributor (Author):

Done

AddInput("X", "The input tensor");
AddInput("Scale",
"Scale is a 1-dimensional tensor of size H "
"that is applied to the output");
Contributor:

Add a full stop at the end of the comment. The same applies to X, Bias, and Y.

Contributor (Author):

Done

});
AddAttr<int>("begin_norm_axis",
"(int default:1), the "
"axis of `begin_norm_axis ... Rank(X) - 1` will be normalized")
Contributor:

Add a full stop at the end of the comment.

Contributor (Author):

Done
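
A small NumPy sketch of what `begin_norm_axis` means here, assuming the usual flattening of X into a 2-D [left, right] matrix as in the kernel code above; the helper name is made up for illustration.

```python
import numpy as np

def flatten_for_layer_norm(x, begin_norm_axis=1):
    # Hypothetical helper: dims [0, begin_norm_axis) form the batch-like
    # "left" part, dims [begin_norm_axis, rank) are normalized together
    # as the "right" part.
    left = int(np.prod(x.shape[:begin_norm_axis]))
    right = int(np.prod(x.shape[begin_norm_axis:]))
    return x.reshape(left, right)

x = np.random.rand(8, 16, 32)
print(flatten_for_layer_norm(x, begin_norm_axis=1).shape)  # (8, 512)
print(flatten_for_layer_norm(x, begin_norm_axis=2).shape)  # (128, 32)
# begin_norm_axis must be > 0: with 0 the batch dimension itself would be
# folded into the normalization, which is what the checker above rejects.
```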

.AddCustomChecker([](const int &begin_norm_axis) {
PADDLE_ENFORCE_GT(begin_norm_axis, 0,
"'begin_norm_axis' should be greater than zero.");
});
Contributor:

We may need two attributes here to let the user decide whether to apply the scale and bias to the normalized output.

Contributor (Author):

I don't think that is necessary; we can mark Bias and Scale with AsDispensable.
If the inputs don't include `Bias` or `Scale`, the program will ignore them.

Contributor:

It's OK. I just mean marking Bias and Scale as internal variables of the implementation, because most users do not care about them. Admittedly, this is not very important.

Contributor (Author):

> because for most users they do not care about these.

You are right. The Python interface of layer_norm can take two arguments that let the user decide whether to apply the scale or the bias to the normalized output.
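
A hypothetical sketch of such a Python interface, written in NumPy for illustration only; the function name, signature, and defaults are assumptions, not the actual fluid API.

```python
import numpy as np

def layer_norm(x, scale=True, shift=True, begin_norm_axis=1, epsilon=1e-5):
    # Two boolean arguments let the user decide whether the learned scale
    # and bias are applied to the normalized output at all.
    shape = x.shape
    left = int(np.prod(shape[:begin_norm_axis]))
    right = int(np.prod(shape[begin_norm_axis:]))
    x2 = x.reshape(left, right)
    mean = x2.mean(axis=1, keepdims=True)
    var = x2.var(axis=1, keepdims=True)
    y = (x2 - mean) / np.sqrt(var + epsilon)
    if scale:
        g = np.ones(right)    # stands in for the learned Scale parameter
        y = y * g
    if shift:
        b = np.zeros(right)   # stands in for the learned Bias parameter
        y = y + b
    return y.reshape(shape)

y = layer_norm(np.random.rand(4, 16, 8), begin_norm_axis=2)
```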

PADDLE_ENFORCE(ctx->HasInput("Scale"), "");
PADDLE_ENFORCE(ctx->HasInput("Mean"), "");
PADDLE_ENFORCE(ctx->HasInput("Variance"), "");
PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Y")), "");
Contributor:

Would you please complete the comment before merging?

Contributor (Author):

Done

AddInput("Bias",
"Bias is a 1-dimensional tensor of size H "
"that is applied to the output");
AddOutput("Y", "result after normalization");
Contributor:

Add a full stop at the end of the comment. The same applies to X, Bias, and Y.

Contributor (Author):

Done

"that is applied to the output");
AddOutput("Y", "result after normalization");
AddOutput("Mean", "Mean of the current mini batch.");
AddOutput("Variance", "Variance of the current mini batch.");
Contributor:

Mark Mean and Variance with .AsIntermediate(); users will not use them directly in layer norm.

Contributor (Author):

Done

d_mean_0 = np.sum(-np.sqrt(1.0 / var) * grad_y, axis=1).reshape([N, 1])
# d_mean_1 = np.sum(-1.0 / var * (x - mean) * grad_y, axis=1).reshape(
# [N, 1]) * (-1.0 / D * np.sqrt(1.0 / var) *
# np.sum(x - mean, axis=1).reshape([N, 1])).reshape([N, 1])
Contributor:

Are lines 67 ~ 69 unused? If so, they should be removed.

Contributor (Author):

Done

lcy-seso previously approved these changes on Jan 30, 2018
@lcy-seso (Contributor) left a comment:

LGTM. We can merge this implementation first.

@lcy-seso (Contributor) left a comment:

LGTM

@lcy-seso merged commit e261c79 into PaddlePaddle:develop on Jan 31, 2018

Successfully merging this pull request may close these issues: Implement the layer normalization operator.
3 participants