
[Dygraph API] Fix merged_momentum, provide actual inplace operations … #59161

Merged · 1 commit · Nov 22, 2023

Conversation

@RuohengMa (Contributor) commented Nov 20, 2023

PR types

Bug fixes

PR changes

APIs

Description

While preserving the functionality of #58204, this PR fixes a remaining issue: after a fallback to CPU, the inplace input and output tensors inside the merged_momentum kernel must share the same address on CPU.

Taking merged_momentum as an example:
Output section:
Before the fix:

  std::tuple<std::vector<Tensor>&, std::vector<Tensor>&, paddle::optional<std::vector<Tensor>>&> api_output{param, velocity, master_param};
  auto kernel_out_0 = SetInplaceVectorKernelOutput(param.size(), &std::get<0>(api_output));
  if (kernel_result.has_fallback_cpu) {
    TransDataBackend(kernel_out_0, actual_kernel_backend, kernel_out_0);
  }
  auto kernel_out_1 = SetInplaceVectorKernelOutput(param.size(), &std::get<1>(api_output));
  if (kernel_result.has_fallback_cpu) {
    TransDataBackend(kernel_out_1, actual_kernel_backend, kernel_out_1);
  }
  auto kernel_out_2 = SetInplaceOptionalVectorKernelOutput(param.size(), std::get<2>(api_output));
  if (kernel_result.has_fallback_cpu) {
    TransDataBackend(kernel_out_2, actual_kernel_backend, kernel_out_2);
  }

After the fix:

  std::tuple<std::vector<Tensor>&, std::vector<Tensor>&, paddle::optional<std::vector<Tensor>>&> api_output{param, velocity, master_param};
  auto kernel_out_0 = SetInplaceVectorKernelOutput(param.size(), &std::get<0>(api_output));
  if (kernel_result.has_fallback_cpu) {
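    // Redirect each output slot to the CPU copy of the corresponding
    // input, so the inplace kernel reads and writes the same CPU tensor.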
    for (size_t i = 0; i < kernel_out_0.size(); ++i) {
      kernel_out_0[i] = const_cast<phi::DenseTensor*>(input_param[i]);
    }
  }
  auto kernel_out_1 = SetInplaceVectorKernelOutput(param.size(), &std::get<1>(api_output));
  if (kernel_result.has_fallback_cpu) {
    for (size_t i = 0; i < kernel_out_1.size(); ++i) {
      kernel_out_1[i] = const_cast<phi::DenseTensor*>(input_velocity[i]);
    }
  }
  auto kernel_out_2 = SetInplaceOptionalVectorKernelOutput(param.size(), std::get<2>(api_output));
  if (kernel_result.has_fallback_cpu) {
    for (size_t i = 0; i < kernel_out_2.size(); ++i) {
      kernel_out_2[i] = const_cast<phi::DenseTensor*>(input_master_param->at(i));
    }
  }

Copy-back section before the return:
Before the fix:

  if (kernel_result.has_fallback_cpu) {

    TransDataBackend(kernel_out_0, kernel_backend, kernel_out_0);
    TransDataBackend(kernel_out_1, kernel_backend, kernel_out_1);
    TransDataBackend(kernel_out_2, kernel_backend, kernel_out_2);

  }

After the fix:

  if (kernel_result.has_fallback_cpu) {
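    // The kernel ran on CPU; copy each result back into the caller's
    // original tensors on the source device (e.g. XPU).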

    TransDataBackend(kernel_out_0, kernel_backend, kernel_out_0);
    for (size_t i = 0; i < param.size(); ++i) {
      auto target_ptr = static_cast<phi::DenseTensor*>(param.at(i).impl().get());
      *target_ptr = *kernel_out_0.at(i);
    }
    TransDataBackend(kernel_out_1, kernel_backend, kernel_out_1);
    for (size_t i = 0; i < velocity.size(); ++i) {
      auto target_ptr = static_cast<phi::DenseTensor*>(velocity.at(i).impl().get());
      *target_ptr = *kernel_out_1.at(i);
    }
    TransDataBackend(kernel_out_2, kernel_backend, kernel_out_2);
    if (master_param) {
      for (size_t i = 0; i < master_param->size(); ++i) {
        auto target_ptr = static_cast<phi::DenseTensor*>(master_param->at(i).impl().get());
        *target_ptr = *kernel_out_2.at(i);
      }
    }

  }
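For intuition, here is a minimal standalone sketch of the whole flow (plain C++, not Paddle code: Tensor, CopyToHost, and CopyToDevice are simulated stand-ins for device buffers and TransDataBackend). It shows why an inplace op that falls back to the host needs both halves of the fix: the output must first alias the host copy of the input, and the host result must then be written back to the caller's device buffer.

  #include <cassert>
  #include <vector>

  using Tensor = std::vector<float>;  // stand-in for phi::DenseTensor

  // Simulated cross-device copies (TransDataBackend in the real code).
  Tensor CopyToHost(const Tensor& device) { return device; }
  void CopyToDevice(const Tensor& host, Tensor* device) { *device = host; }

  // An inplace "kernel": it reads and writes the SAME buffer.
  void InplaceSgdKernel(Tensor* param, const Tensor& grad, float lr) {
    for (size_t i = 0; i < param->size(); ++i) (*param)[i] -= lr * grad[i];
  }

  int main() {
    Tensor param_on_device = {1.f, 2.f, 3.f};  // the caller's "XPU" tensor
    Tensor grad = {1.f, 1.f, 1.f};

    bool has_fallback_cpu = true;
    if (has_fallback_cpu) {
      // 1) The input is copied from the device to the host.
      Tensor param_on_host = CopyToHost(param_on_device);
      // 2) Because the op is inplace, the output pointer must alias the
      //    HOST copy of the input, not the original device buffer.
      Tensor* kernel_out = &param_on_host;
      InplaceSgdKernel(kernel_out, grad, 0.1f);
      // 3) Copy the host result back so the caller's tensor observes
      //    the inplace update.
      CopyToDevice(*kernel_out, &param_on_device);
    }
    assert(param_on_device[0] == 1.f - 0.1f);  // update visible on "XPU"
    return 0;
  }

In the generated code above, step 2 corresponds to the const_cast loops that repoint kernel_out_0/1/2 at the CPU copies of the inputs, and step 3 to the extra copy-back loops before the return.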

paddle-bot commented Nov 20, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

@CLAassistant commented Nov 20, 2023

CLA assistant check
All committers have signed the CLA.

paddle-bot added the contributor (External developers) label Nov 20, 2023
paddle-bot commented Nov 20, 2023

❌ The PR is not created using the PR template. You can refer to this Demo.
Please use the PR template; it helps save maintainers' time so that more developers can be helped.

@qili93 (Contributor) left a comment

LGTM for const_cast

Comment on lines +176 to +177
{code_indent} kernel_out_{output_idx}[i] = const_cast<phi::DenseTensor*>({PREFIX_TENSOR_NAME}{self.inplace_map[self.outputs['names'][output_idx]]}->at(i));
{code_indent} }}
Contributor

This is already inplace; why assign the input pointers to the outputs again here?

Contributor Author

Because when a fallback to CPU happens, the input tensors are copied from XPU to CPU. Since the op is inplace, the input and output pointers should be the same, so kernel_out has to point to the data copied onto CPU rather than to the data still on XPU.

Contributor

Aren't the tensors on both sides of the assignment the same tensor here?

Comment on lines +1275 to +1276
{code_indent} auto target_ptr = static_cast<phi::DenseTensor*>({target_input}->at(i).impl().get());
{code_indent} *target_ptr = *{kernel_out}.at(i);
Contributor

Isn't this inplace? Why is the output written back to the input here?

Contributor Author

Because after the fallback to CPU, out lives on CPU, so its values need to be written back into the input tensors on XPU.

@cqulilujia (Contributor) left a comment

LGTM

@houj04 merged commit b50313c into PaddlePaddle:develop Nov 22, 2023
28 checks passed
SecretXV pushed a commit to SecretXV/Paddle that referenced this pull request Nov 28, 2023