[Semi-Auto] Add layer_norm infer_backward rule #56505
Conversation
Your PR was submitted successfully. Thank you for your contribution to the open-source project!
Sorry to inform you that 17ba34b's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
Force-pushed from 17ba34b to 7fa7b30
        << "dst_dims_mapping: ["
        << str_join(output_dist_attrs[i].dims_mapping()) << "]";
  }
  VLOG(4) << "*********";
remove meaningless log
Done
// begin_norm_axis=2, x=ij, y=kl)
// ijk,k,k->ijk,z,z (x,scale,bias->out,mean,variance, begin_norm_axis=2, z=ij)
// ijkl,y(kl),y(kl)->ijkl,z(ij),z(ij) (x,scale,bias->out,mean,variance,
// begin_norm_axis=2, z=ij, y=kl)
Below, in line 101: mean_axes = "j"; should the notation be "x"?
Otherwise "j" may be confused with the broadcast axis before begin_norm_axis.
Line 101 is the case when begin_norm_axis <= 1. When begin_norm_axis <= 1, the first axis can be propagated to mean and variance, so mean_axes and var_axes are set to the same as the input's first axis, which is "j".
You are right.
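The case being discussed can be reduced to a small sketch. This is a toy illustration with a hypothetical helper name (`ChooseMeanAxes`), not the actual Paddle implementation: mean/variance keep the input's first axis label only when begin_norm_axis <= 1; otherwise the batch axes are represented by a fresh label "z".

```cpp
#include <string>

// Toy sketch (hypothetical helper, not Paddle's actual code) of the rule
// discussed above. x_axes is the einsum-style axis string of the input,
// e.g. "jkl" or "ijkl".
std::string ChooseMeanAxes(const std::string& x_axes, int begin_norm_axis) {
  if (begin_norm_axis <= 1) {
    // the first input axis can still be propagated to mean and variance
    return x_axes.substr(0, 1);
  }
  // all axes before begin_norm_axis flatten into one new axis label
  return "z";
}
```

With x_axes = "jkl" and begin_norm_axis = 1 this yields "j", matching the reply above; with begin_norm_axis = 2 it yields the fresh label "z".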
@@ -127,8 +126,8 @@ LayerNormSPMDRule::InferForward(const std::vector<DistTensorSpec>& input_specs,
   for (size_t i = 0; i < out_axes.size(); ++i) {
     if (i < static_cast<size_t>(begin_norm_axis)) {
       out_dims_mapping.push_back(x_dims_mapping[i]);
-      // if ijk,k,k->ijk,x,x (x,scale,bias->out,mean,variance,
-      // begin_norm_axis=2, x=ij), and the dims_mapping of input is (0,1,-1),
+      // if ijk,k,k->ijk,z,z (x,scale,bias->out,mean,variance,
The current rule for LN is problematic:
- not all axes before begin_norm_axis can be sharded.
- the first axis after begin_norm_axis can be sharded in the current implementation.
- it would be better to refactor the LN rule using the TransDim algorithm: the axes mapping is like axes-flatten in TransDim.
but it is ok for now, since most usage of LN is DP.
Done. Now only the first axis of the input can be sharded; all other axes will be set to replicated.
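The behavior described in this reply can be sketched as follows. This is a hedged toy version (hypothetical function name, not the actual Paddle code): keep the mesh dimension of the input's first axis and force every other axis to replicated, encoded as -1 in the dims mapping.

```cpp
#include <cstdint>
#include <vector>

// Sketch (not the actual Paddle implementation) of the restriction
// described above: only the first axis keeps its sharding; every other
// axis of the dims mapping is set to -1 (replicated).
std::vector<int64_t> ShardFirstAxisOnly(
    const std::vector<int64_t>& x_dims_mapping) {
  std::vector<int64_t> out(x_dims_mapping.size(), -1);  // replicate all axes
  if (!out.empty()) {
    out[0] = x_dims_mapping[0];  // first axis keeps its mesh dimension
  }
  return out;
}
```

For example, an input dims mapping of (0, 1, -1) would become (0, -1, -1), which is consistent with the DP-dominant usage noted in the review.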
Force-pushed from 7fa7b30 to 117c259
Force-pushed from 117c259 to 5f4c449
LGTM
PR types
Function optimization
PR changes
Others
Description
Pcard-70448
Add infer_backward rule for layer_norm to infer inputs' dims mappings from outputs'.