
[AutoParallel] Adapt static spmd rules for dynamic graph #56367

Merged
merged 31 commits into from Aug 31, 2023

Conversation

chenwhql
Contributor

@chenwhql chenwhql commented Aug 16, 2023

PR types

New features

PR changes

Others

Description

Pcard-73145

[AutoParallel] Adapt static infer spmd

This PR adapts the existing SPMD (sharding propagation) inference rules for dynamic-graph semi-auto parallelism. This requires some local changes to the current design, explained below.

The core functions of the existing SPMD rule base class are:

class SPMDRuleBase {
 public:
  virtual ~SPMDRuleBase() {}

  virtual std::pair<std::vector<TensorDistAttr>, std::vector<TensorDistAttr>>
  InferForward(const std::vector<DistTensorSpec>& input_specs,
               const paddle::framework::AttributeMap& attrs);

  virtual std::pair<std::vector<TensorDistAttr>, std::vector<TensorDistAttr>>
  InferBackward(const std::vector<DistTensorSpec>& output_specs,
                const paddle::framework::AttributeMap& attrs);
  ...
};
  1. Since dynamic graphs have strict requirements on dispatch performance at execution time, and the overall architecture currently generates specialized interfaces per operator, the SPMD inference functions themselves need to be implemented in variadic form, similar to InferMeta and phi Kernels, for example:
using SpmdInfo =
    std::pair<std::vector<TensorDistAttr>, std::vector<TensorDistAttr>>;

SpmdInfo MatmulSpmdInferForward(const DistMetaTensor& x,
                                const DistMetaTensor& y,
                                bool trans_x,
                                bool trans_y);
  2. Since unified dynamic/static execution must also be supported, the variadic SPMD inference functions need to be normalized into one uniform function form that static-graph semi-auto parallelism can dispatch, so template metaprogramming macros implement this normalization, for example:
using InferSpmdFn = SpmdInfo (*)(const InferSpmdContext&);

#define PD_INFER_SPMD(...)                                    \
  ::phi::distributed::InferSpmdFnImpl<decltype(&__VA_ARGS__), \
                                      &__VA_ARGS__>::Call

// PD_INFER_SPMD(MatmulInferSpmd) converts the function above into the InferSpmdFn form; the exact normalized form is still to be settled
  3. Since the core sharding inference and conversion logic of dynamic-graph semi-auto parallelism must be callable from phi, and phi cannot depend on fluid, the SPMD rules and their core data structures need to migrate into phi. This PR migrates them as follows (the exact layout is open for discussion):
  • core data structures move to phi/core/distributed/auto_parallel
  • per-operator SPMD inference moves to phi/infermeta/spmd_rules; operator-specific implementations should in principle not live under the core directory, and since spmd is a kind of tensor meta information, placing it under infermeta is reasonable
  4. The return value of Spmd inference functions stays std::pair<std::vector<TensorDistAttr>, std::vector<TensorDistAttr>> for now, consistent with the original design. Given the dynamic graph's dispatch-performance requirements this is probably not the final form: constructing and destructing STL containers noticeably affects API dispatch performance, so the result may eventually need to be set directly on DistTensor's dist_attr_ member, to be decided after measuring the scheduling overhead.

  5. The input parameters of Spmd inference functions are normalized through InferSpmdContext, for the following reasons:

  • For now, const std::vector<DistTensorSpec>& input_specs plus const paddle::framework::AttributeMap& attrs would meet the need, but if inputs later mix Tensor and vector<Tensor> parameters, ranges may have to be introduced to tell them apart; normalizing through a context absorbs such future changes without rewriting every function signature
  • The context can swap in more efficient container implementations as needed, without affecting the inference function signatures
  6. The input tensors of a Spmd inference function need an extra container to hold them; small_vector is used instead of std::vector to save some heap construction/destruction overhead.

  7. The input attributes of a Spmd inference function must use a vector rather than a map, because attributes carry no names when passed in the dynamic-graph execution flow; a vector can also accommodate the static graph's map-style inputs, and if necessary the many arg-mapping functions phi already provides can be reused here.

  8. Following the naming conventions in CodeStyle, the naming uses Spmd rather than SPMD.

  9. The original SPMD rules are migrated rather than copied, keeping only one copy of the code; the utils functions are copied for now since they are used in many places, and the original implementations can be removed once migration completes, so this PR does not do a global replacement.

  10. The original Python-side unit tests keep their form as much as possible, so the pybind layer adapts the different argument forms through parameter handling; this can be revisited according to the needs of static-graph semi-auto parallelism.

@paddle-bot

paddle-bot bot commented Aug 16, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@chenwhql chenwhql changed the title [AutoParallel] Adapt static infer spmd [AutoParallel] Adapt static infer spmd demo Aug 22, 2023
@chenwhql chenwhql changed the title [AutoParallel] Adapt static infer spmd demo [AutoParallel] Adapt static spmd rules for dynamic graph Aug 25, 2023
})
    .def("infer_backward",
         [](const phi::distributed::SpmdRule &self,
            const std::vector<DistTensorSpec> &input_specs,
Contributor

infer_backward needs the info of both input tensors and output tensors for inference; please refer to the new API:
https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/distributed/auto_parallel/spmd_rules/common.h#L62

Contributor Author

Done, changed the pybind infer_backward API to this format.

@@ -340,6 +343,44 @@ void BindAutoParallel(py::module *m) {
.def("infer_forward", &SPMDRuleBase::InferForward)
.def("infer_backward", &SPMDRuleBase::InferBackward);

py::class_<phi::distributed::SpmdRule>(*m, "SpmdRule")
    .def("infer_forward",
         [](const phi::distributed::SpmdRule &self,
Contributor

DistTensorSpec seems redundant now. Would it be better to expose the InferSpmdContext and MetaTensor APIs to Python, and have static mode build the input ctx directly?

Contributor Author

Yes, this can be decided according to the needs of static-graph semi-auto parallelism. This PR tries to change the original test framework as little as possible.

y_dist_tensor_spec.set_dims_mapping({-1, 0});
infered_dist_attrs = matmul_rule->InferForward(
    {x_dist_tensor_spec, y_dist_tensor_spec}, attrs);
x_dist_attr.set_dims_mapping({-1, -1});
Contributor

Would it be better to provide an API that builds a MetaTensor from "shape" and "dist_attr" directly, or that builds an InferSpmdContext from "shape", "dist_attr", and the attributes directly?

Contributor Author
@chenwhql chenwhql Aug 28, 2023

Not recommended: MetaTensor is a thin encapsulation and does not own the object's lifetime. If such a constructor is required, I would rather derive a DistMetaTensor to do it.

}

// TODO(chenweihang): support other attr type later by needed
PD_SPECIALIZE_InferSpmdFnCallHelper_FOR_ATTRIBUTE(bool);
Contributor

Will this method cover complex attribute types like std::vector<int64_t> and other std::vector specializations?

Contributor Author

Yes, we can support these types later; matmul does not cover these types for now.

static SpmdInfo Call(const InferSpmdContext& ctx, PreviousArgs&... pargs) {
  static_assert(attr_idx == 0,
                "InferSpmd's Input should appear before Attributes.");
  const DistMetaTensor& arg = ctx.InputAt(in_idx);
Contributor

Should ctx maintain input_tensor_list and output_tensor_list separately?

In the case of variadic input/output ops, it may be a problem:

  • variadic input and single output op (concat, add_n): could be adapted by assuming the last tensor is the output
  • single input and variadic output op (split, unstack): could be adapted by assuming the first tensor is the input
  • variadic input and variadic output op (none yet, but in the future?): could not be adapted

Contributor Author

There is no need to distinguish input and output lists here.

For vector inputs and outputs, the rule function's signature will faithfully reflect the vector type, so no extra merging is needed.

For example:

  • concat op
SpmdInfo ConcatSpmdInferForward(const std::vector<const DistMetaTensor*>& x,
                                const DistMetaTensor& out,
                                const Scalar& axis_scalar);
  • split op
SpmdInfo SplitSpmdInferBackward(const DistMetaTensor& x,
                                const std::vector<const DistMetaTensor*>& out,
                                const IntArray& sections,
                                const Scalar& axis);

auto out_shape = output_specs[0].shape();
SpmdInfo MatmulSpmdInferBackward(const DistMetaTensor& x,
                                 const DistMetaTensor& y,
                                 const DistMetaTensor& out,
Contributor

For variadic ops like split and concat, should we use a vector for the variadic slot?

Phi api for concat:
PADDLE_API Tensor concat(const std::vector<Tensor>& x, const Scalar& axis)

spmd for concat:
SpmdInfo ConcatSpmdInferBackward(const std::vector<DistMetaTensor>& x, const DistMetaTensor& out, const Scalar& axis)

And for the ReplicatedSpmd rule, which is the fallback rule for all ops that have no specific rule:
SpmdInfo ReplicatedSpmdInferBackward(const std::vector<DistMetaTensor>& x, const std::vector<DistMetaTensor>& out)

Contributor Author
@chenwhql chenwhql Aug 29, 2023

Same as above.

For the ReplicatedSpmd rule, we can use the general format:

SpmdInfo ReplicatedSpmdInferBackward(
    const std::vector<const DistMetaTensor*>& x,
    const std::vector<const DistMetaTensor*>& out,
    const std::vector<phi::Attribute>& attrs)

We can also unify its format into SpmdInfo (*)(const InferSpmdContext&).

Comment on lines +28 to +30
# After replaced all spmd rules by phi impl, we can recover the
# api name to `get_spmd_rule`
self.rule = core.get_phi_spmd_rule("matmul")
Contributor

This comment could be added to the pybind interface.

Contributor Author

Thanks, I will adjust this in the next PR.

Contributor
@JZ-LIANG JZ-LIANG left a comment

LGTM

Contributor
@LiYuRio LiYuRio left a comment

LGTM

Contributor
@XieYunshen XieYunshen left a comment

LGTM

@chenwhql chenwhql merged commit 54fcd9a into PaddlePaddle:develop Aug 31, 2023
25 of 26 checks passed
BeingGod pushed a commit to BeingGod/Paddle that referenced this pull request Sep 9, 2023
…e#56367)

* move matmul spmd rules into phi

* add basic infer spmd utils

* addspmd factory

* fix compile error

* add unittest

* refine infer spmd test and utils

* debug infer spmd test

* adapt python test

* poish details

* change to vector attr arg

* revert needless change

* update matmul spmd rule test

* remove original rule

* polish details

* fix marco error

* add comment

* pass backward test

* fix compile error

* add cmake rule for spmd_rules_test

* add dist meta tensor

* update pybind impl

* add marco for rules