【Complex op】add complex support for index_select and index_sample #56457

ScottWong98 · 2023-08-18T17:32:58Z

PR types

New features

PR changes

OPs

Description

Add complex support for the index_select and index_sample ops.

paddle-bot · 2023-08-18T17:33:02Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

GGBond8488 · 2023-08-21T12:16:10Z

paddle/phi/kernels/cpu/index_sample_grad_kernel.cc

+                   int64_t,
+                   phi::dtype::complex<float>,
+                   phi::dtype::complex<double>) {
+  kernel->InputAt(1).SetDataType(phi::DataType::INT64);


Making the change here is not really appropriate. You could try removing the complex64 and complex128 checks from NeedTransformDataType in data_transform.cc instead.

GGBond8488 · 2023-08-21T12:16:29Z

paddle/fluid/eager/grad_node_info.cc

@@ -554,6 +554,7 @@ void GradNodeBase::HandleComplexGradToRealGrad(
  for (size_t slot_id = 0; slot_id < out_grads->size(); slot_id++) {
    const std::vector<paddle::Tensor>& slot_out_grads = (*out_grads)[slot_id];
    for (size_t rank_id = 0; rank_id < slot_out_grads.size(); rank_id++) {
+      if (bwd_out_meta_[slot_id].size() == 0) continue;


What is the reason for adding this check here?

ScottWong98 · 2023-08-22

paddle/fluid/eager/api/generated/eager_generated/backwards/nodes.h declares the generated IndexSelectGradNode class (which inherits from egr::GradNodeBase).

In paddle/fluid/eager/api/generated/eager_generated/forwards/dygraph_functions.cc, the index_select_ad_func function initializes an IndexSelectGradNode:
grad_node = std::shared_ptr<IndexSelectGradNode>(new IndexSelectGradNode(1, 2));
The 2 here is determined by the types of the args in the index_select entry of ops.yaml: it is the number of arguments whose type is Tensor.

At the end of IndexSelectGradNode::operator() in paddle/fluid/eager/api/generated/eager_generated/backwards/nodes.cc there is:
if(NeedComplexToRealConversion()) HandleComplexGradToRealGrad(&returns);

In GradNodeBase::HandleComplexGradToRealGrad in paddle/fluid/eager/grad_node_info.cc, the line const GradSlotMeta& slot_meta = bwd_out_meta_[slot_id][rank_id]; fails.
The reason is that bwd_out_meta_ is initialized with size 2, but the vector held in bwd_out_meta_[1] has size 0, so evaluating bwd_out_meta_[1][0] here causes a segmentation fault.

That is why I added the special case here.

GGBond8488 · 2023-08-25

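Then isn't this something other than the root problem? Normally, if there are out_grads here, the corresponding out_grad_meta should have been recorded as well. We should look into why out_grad_meta has no value here.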

paddle-ci-bot · 2023-08-27T03:08:05Z

Sorry to inform you that 8a00648's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

ScottWong98 · 2023-08-28T17:36:46Z

Q: Why is `index` converted to a complex type when the input x has a complex dtype?

In api.cc, input_index is prepared from the input index and the current kernel:

auto input_index = PrepareData(index, GetKernelInputArgDef(kernel.InputAt(1), kernel_backend), {}, kernel_result.is_stride_kernel);

In the PrepareData method in data_transform.cc, the transform_flag passed in is {}, so execution proceeds into TransformData. Inside TransformData, whether a dtype conversion is needed is decided by NeedTransformDataType:

inline bool NeedTransformDataType(const DataType& input,
                                  const DataType& target,
                                  const TransformFlag& transform_flag) {
  return input != target &&
         (transform_flag.need_trans_data_type() ||
          target == DataType::COMPLEX64 || target == DataType::COMPLEX128);
}

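As a result, kernels with a non-complex dtype generally do no dtype conversion, whereas kernels with a complex dtype convert their inputs to the kernel dtype, i.e. complex. This is why the index argument of the index_select and index_sample APIs is also converted to a complex type when the input x is complex.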
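
The effect of this predicate can be reproduced with a small Python model. This is only a sketch: the string dtype names and the need_transform_data_type helper are stand-ins for the C++ code quoted above, not Paddle APIs.

```python
# Python model of NeedTransformDataType from data_transform.cc.
# Dtypes are plain strings here; this is an illustration, not Paddle code.
COMPLEX_DTYPES = {"complex64", "complex128"}


def need_transform_data_type(input_dtype, target_dtype, need_trans_data_type=False):
    """Mirror the C++ predicate: cast when the dtypes differ and either the
    flag asks for it or the kernel (target) dtype is complex."""
    return input_dtype != target_dtype and (
        need_trans_data_type or target_dtype in COMPLEX_DTYPES
    )


# float32 kernel: an int64 index is left alone.
print(need_transform_data_type("int64", "float32"))    # False
# complex64 kernel: the same int64 index would be cast to complex64.
print(need_transform_data_type("int64", "complex64"))  # True
```

With the default (empty) flag, the only thing that changes between the two calls is the kernel dtype, which matches the behaviour described above.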

ScottWong98 · 2023-08-28T17:58:13Z

Solutions for the `index` dtype conversion

Approach 1

inline bool NeedTransformDataType(const DataType& input,
                                  const DataType& target,
                                  const TransformFlag& transform_flag) {
  return input != target &&
         (transform_flag.need_trans_data_type() ||
          target == DataType::COMPLEX64 || target == DataType::COMPLEX128);
}

Removing the complex check from NeedTransformDataType in data_transform.cc would affect other operators. For example, elementwise_add_op takes two inputs x and y; when x is complex and y is float, y has to be converted to complex.

So this approach is not suitable.
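
A rough way to see the side effect is to model the predicate with and without the complex clause. The need_transform helper and its promote_to_complex switch are hypothetical names used only for this sketch.

```python
COMPLEX_DTYPES = {"complex64", "complex128"}


def need_transform(input_dtype, target_dtype, flag=False, promote_to_complex=True):
    """promote_to_complex=True models the existing predicate;
    promote_to_complex=False models approach 1 (complex clause removed)."""
    promote = promote_to_complex and target_dtype in COMPLEX_DTYPES
    return input_dtype != target_dtype and (flag or promote)


# Elementwise add with x: complex64 and y: float32, complex64 kernel selected.
print(need_transform("float32", "complex64"))                            # True: y is cast
print(need_transform("float32", "complex64", promote_to_complex=False))  # False: y stays float32
```

In the second call the float input would reach a complex kernel unconverted, which is the regression the comment above is concerned about.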

Approach 2

PrepareData accepts a TransformFlag, and TransformFlag defines a stop_transform_ member that defaults to false:

private:
  // This is the highest priority in flags,
  // and can be setted by api[data_transform->skip_transform] in the yaml file.
  bool stop_transform_ = false;

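According to the comment, setting skip_transform in the yaml file prevents dtype conversion for a given input argument.

In ops.yaml, ops that already use this mechanism include linear_interp and bilinear_interp, in the form: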

- op : bilinear_interp
  args : (Tensor x, Tensor out_size, Tensor[] size_tensor, Tensor scale_tensor, str data_layout="NCHW", ...)
  ...
  data_transform :
    skip_transform : out_size, size_tensor, scale_tensor

The transform_flag of the corresponding PrepareData calls in api.cc accordingly becomes {true} (i.e. stop_transform_=true), while the PrepareData call for the input x does not receive the flag:

auto input_x = PrepareData(x, GetKernelInputArgDef(kernel.InputAt(0), kernel_backend), {}, kernel_result.is_stride_kernel);
auto input_out_size = PrepareData(out_size, GetKernelInputArgDef(kernel.InputAt(1), kernel_backend), {true}, kernel_result.is_stride_kernel);

However, after making a similar change for index_select and index_sample in ops.yaml and backward.yaml:

- op : index_select
  args : (Tensor x, Tensor index, int axis = 0)
  output : Tensor(out)
  ...
  data_transform :
    skip_transform : index

the generated api.cc also passed a {true} transform_flag to PrepareData for the input x, i.e. stop_transform_ was true for x as well.

After analyzing the script that generates api.cc (api_gen.py), I found that the problem lies in the gene_trans_flag method in api_base.py:

def gene_trans_flag(self, input_name):
    trans_flag = "{}"
    if input_name in self.data_transform['skip_transform']:
        trans_flag = "{true}"
    elif input_name in self.data_transform['support_trans_dtype']:
        trans_flag = "{false, true}"
    return trans_flag

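Here self.data_transform['skip_transform'] and self.data_transform['support_trans_dtype'] are both strings, of the form xxx,xxx,xxx or xxx. Because `in` on a string is a substring test and "x" is a substring of "index", trans_flag also becomes "{true}" when input_name is "x".

So the problem is in how skip_transform is parsed. Further analysis showed that the parse_data_transform method in api_base.py does not parse the string into a list. I used the parse_plain_list method from paddle/fluid/operators/generator/parse_utils.py to do that parsing.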
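
The substring behaviour is easy to reproduce in isolation. In the sketch below, gene_trans_flag follows the branching quoted above but takes the data_transform entries as parameters, and split_names is a hypothetical stand-in for the list parsing (it is not the real parse_plain_list).

```python
def gene_trans_flag(input_name, skip_transform, support_trans_dtype=()):
    """Same branching as gene_trans_flag in api_base.py, with the
    data_transform entries passed in explicitly."""
    if input_name in skip_transform:
        return "{true}"
    if input_name in support_trans_dtype:
        return "{false, true}"
    return "{}"


def split_names(raw):
    """Hypothetical stand-in for the list parsing: 'a, b' -> ['a', 'b']."""
    return [name.strip() for name in raw.split(",") if name.strip()]


raw = "index"  # value of skip_transform as written in the yaml entry

# Unparsed string: `in` is a substring test, and "x" occurs inside "index".
print(gene_trans_flag("x", raw))                   # {true}  (wrong)
# Parsed list: `in` is an exact membership test.
print(gene_trans_flag("x", split_names(raw)))      # {}
print(gene_trans_flag("index", split_names(raw)))  # {true}
```

The existing interp ops presumably did not hit this because "x" is not a substring of "out_size, size_tensor, scale_tensor".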

In the end, the generated api.cc sets the stop_transform_=true flag when fetching the index argument, which guarantees that index is not converted to a complex type.

ScottWong98 · 2023-08-28T18:27:30Z

Backward-pass issue

index_select and index_sample each take two Tensor inputs (x and index), but their grad ops produce only one output (x_grad).

The generated index_select_ad_func method (dygraph_functions.cc) creates an IndexSelectGradNode (grad_node), and passes bwd_out_slot_num = 2 when initializing it.

This is because, in eager_gen.py, bwd_out_slot_num is determined by the number of Tensors in the forward op's argument list (in the yaml file), and both x and index of index_select are declared as Tensor.

With bwd_out_slot_num equal to 2, bwd_out_meta_ in IndexSelectGradNode has size 2 (each of its elements is itself a vector).
However, index_select_ad_func (dygraph_functions.cc) only initializes bwd_out_meta_[0], resizing it to 1, while the size of bwd_out_meta_[1] stays 0.

// Set grad_node after API call
if(require_any_grad) {

  egr::EagerUtils::PassStopGradient(false,out_autograd_meta);

  // SetGradOutMeta & SetEdges
  grad_node->SetGradOutMeta(x, 0);
  // SetOutRank & SetHistory & SetGradInMeta
  if (out_autograd_meta) {
    egr::EagerUtils::SetOutRankWithSlot(out_autograd_meta, 0);
  }
  if (out_autograd_meta) {
    egr::EagerUtils::SetHistory(out_autograd_meta, grad_node);
  }
  grad_node->SetGradInMeta(out, 0);
  // Set TensorWrappers for Forward Outputs if needed

}

The reason is that the index_select_grad entry declares only one output (x_grad), so when eager_gen.py generates dygraph_functions.cc it does not emit code such as grad_node->SetGradOutMeta(index, 1);.
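
The mismatch between the slot count and the recorded metadata can be sketched as follows. This is a simplified model, not the actual eager_gen.py logic: tensor_arg_names is a hypothetical helper, and matching slots to grad outputs by the `<name>_grad` convention is an assumption made for the illustration.

```python
import re


def tensor_arg_names(args):
    """Return the names of Tensor-typed arguments in a yaml `args` string.
    Simplified: handles only `Tensor name` and `Tensor[] name`."""
    inner = args.strip().strip("()")
    names = []
    for arg in inner.split(","):
        match = re.match(r"\s*Tensor(\[\])?\s+(\w+)", arg)
        if match:
            names.append(match.group(2))
    return names


forward_args = "(Tensor x, Tensor index, int axis = 0)"
grad_outputs = ["x_grad"]  # index_select_grad produces only x_grad

slots = tensor_arg_names(forward_args)
# A slot gets metadata only if a matching grad output exists (assumed naming).
bwd_out_meta_sizes = [1 if f"{name}_grad" in grad_outputs else 0 for name in slots]

print(len(slots))          # 2
print(bwd_out_meta_sizes)  # [1, 0]
```

Two slots are allocated because there are two Tensor arguments, but only the slot for x ever receives metadata.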

At the end of IndexSelectGradNode::operator() (nodes.cc) there is:

if(NeedComplexToRealConversion()) HandleComplexGradToRealGrad(&returns);

returns is defined as:

const auto& out_metas = OutputMeta();
paddle::small_vector<std::vector<paddle::Tensor>, egr::kSlotSmallVectorSize> returns(2);
for (int i = 0; i < 2; ++i) {
  out_metas[i].size() == 0 ? returns[i].resize(1) : returns[i].resize(out_metas[i].size());
}

out_metas is bwd_out_meta_. At this point out_metas[0].size() is 1 and out_metas[1].size() is 0, so both elements of returns end up with size 1.

HandleComplexGradToRealGrad (grad_node_info.cc) contains:
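
The sizing rule in that loop can be modelled in a few lines of Python. size_returns is a hypothetical name for this sketch; it only tracks vector sizes, not tensors.

```python
def size_returns(out_meta_sizes):
    """Model of the generated sizing loop: an empty slot still gets one entry."""
    return [1 if n == 0 else n for n in out_meta_sizes]


out_meta_sizes = [1, 0]  # bwd_out_meta_ sizes for (x, index)
returns_sizes = size_returns(out_meta_sizes)
print(returns_sizes)  # [1, 1]

# Slots where returns has more entries than there is metadata to index into.
mismatched = [
    slot for slot, (r, m) in enumerate(zip(returns_sizes, out_meta_sizes)) if r > m
]
print(mismatched)  # [1]
```

Slot 1 is the one where returns has an entry but no metadata exists for it.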

void GradNodeBase::HandleComplexGradToRealGrad(
    paddle::small_vector<std::vector<paddle::Tensor>, kSlotSmallVectorSize>*
        out_grads) {
  for (size_t slot_id = 0; slot_id < out_grads->size(); slot_id++) {
    const std::vector<paddle::Tensor>& slot_out_grads = (*out_grads)[slot_id];
    for (size_t rank_id = 0; rank_id < slot_out_grads.size(); rank_id++) {
      const GradSlotMeta& slot_meta = bwd_out_meta_[slot_id][rank_id];

Here out_grads is returns.

returns has size 2 and returns[1].size() = 1, but bwd_out_meta_[1].size() = 0.

So when slot_id=1 and rank_id=0, accessing bwd_out_meta_[1][0] fails.

Non-complex data never enters HandleComplexGradToRealGrad, so it is not affected.
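
The loop and the guard discussed in the review can be modelled as below. This is a sketch with hypothetical names (visit_slots, skip_empty_meta). Note that Python raises IndexError on the bad access, whereas the out-of-range operator[] in C++ is undefined behaviour and showed up as a segmentation fault.

```python
def visit_slots(out_grads, bwd_out_meta, skip_empty_meta):
    """Walk (slot_id, rank_id) pairs like HandleComplexGradToRealGrad and
    return the pairs whose metadata could be read."""
    visited = []
    for slot_id, slot_out_grads in enumerate(out_grads):
        for rank_id in range(len(slot_out_grads)):
            if skip_empty_meta and len(bwd_out_meta[slot_id]) == 0:
                continue  # the guard: no metadata recorded for this slot
            _ = bwd_out_meta[slot_id][rank_id]  # IndexError here models the crash
            visited.append((slot_id, rank_id))
    return visited


out_grads = [["x_grad"], [None]]     # returns: both slots sized to 1
bwd_out_meta = [["meta_for_x"], []]  # slot 1 (index) has no metadata

print(visit_slots(out_grads, bwd_out_meta, skip_empty_meta=True))  # [(0, 0)]

try:
    visit_slots(out_grads, bwd_out_meta, skip_empty_meta=False)
except IndexError:
    print("slot 1, rank 0: metadata missing")
```

With the guard, only the x slot is processed; without it, the access for the index slot goes out of range.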

GGBond8488

LGTM

GGBond8488 · 2023-08-31T09:45:02Z

The explanation of the cause and effect is very detailed 👍👍👍

ScottWong98 · 2023-08-31T09:55:17Z

@luotao1 Regarding the plain sample code style issue (#55629) flagged by PR-CI-Static-Check: should I fix it directly in this PR, or open a new PR for it after this one is merged?

luotao1 · 2023-08-31T11:34:16Z

Open a new PR for it after this PR is merged.

ScottWong98 · 2023-09-01T02:54:00Z

Could @heavyrain-lzy please help review this? :)

heavyrain-lzy

LGTM for YAML file

ScottWong98 · 2023-09-01T06:43:12Z

@luotao1 All the Required CIs have passed now.

…ddlePaddle#56457) * support index_select op * index_sample in cpu * support index_sample in gpu * change data_transform * fix api gen and use skip_transform in yaml

support index_select op

9504a36

paddle-bot bot added contributor External developers status: proposed labels Aug 18, 2023

ScottWong98 added 3 commits August 19, 2023 03:46

index_sample in cpu

3254d58

Merge branch 'add_complex_support_for_index_sample' into add_complex_…

b91204d

…support_for_index_select

support index_sample in gpu

8a00648

ScottWong98 changed the title ~~【Complex op】add complex support for index_select~~ 【Complex op】add complex support for index_select and index_sample Aug 19, 2023

luotao1 assigned luotao1 and GGBond8488 Aug 21, 2023

luotao1 added the HappyOpenSource 快乐开源活动issue与PR label Aug 21, 2023

paddle-bot bot removed the status: proposed label Aug 21, 2023

luotao1 removed the HappyOpenSource 快乐开源活动issue与PR label Aug 21, 2023

GGBond8488 reviewed Aug 21, 2023

ScottWong98 added 3 commits August 27, 2023 15:35

Merge branch 'develop' into add_complex_support_for_index_select

fb0ebe0

change data_transform

dd48d7c

fix api gen and use skip_transform in yaml

3b891f8

GGBond8488 approved these changes Aug 31, 2023

luotao1 approved these changes Aug 31, 2023

heavyrain-lzy approved these changes Sep 1, 2023

luotao1 merged commit 0b60839 into PaddlePaddle:develop Sep 1, 2023
25 of 26 checks passed

luotao1 added HappyOpenSource Pro 进阶版快乐开源活动，更具挑战性的任务 and removed HappyOpenSource Pro 进阶版快乐开源活动，更具挑战性的任务 labels Sep 6, 2023

Difers mentioned this pull request Oct 12, 2023

Enhance dtype support for the get_value operation. #57272

Merged
