[AutoParallel] Dygraph basic impl for semi auto parallel #55698

Merged (29 commits into PaddlePaddle:develop, Aug 16, 2023)

Conversation

@chenwhql (Contributor) commented Jul 25, 2023

PR types

Breaking changes

PR changes

Others

Description

Pcard-73145

[AutoParallel] Dygraph basic impl for semi auto parallel

Open up the basic flow for executing DistTensor in dygraph mode:

  • Create a DistTensor on the Python side -> Eager forward API -> PHI forward API -> forward result -> Eager backward -> PHI backward API (a minimal Python sketch of this flow follows below)
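
A minimal Python sketch of this flow (illustrative only: the mesh and DistAttr creation follow the test snippet quoted later in this thread, while the construction call dist.shard_tensor(...) is an assumption, since the exact test code is not reproduced here):

import paddle
import paddle.distributed as dist
import paddle.nn.functional as F

mesh = dist.ProcessMesh([0, 1], dim_names=["x"])
# Replicated placement; per the review comments below, non-replicate
# sharding_specs have no effect at this stage.
dist_attr = dist.DistAttr(mesh=mesh, sharding_specs=[None, None])

# NOTE: assumed construction call, used only to illustrate "create a DistTensor
# on the Python side".
x = dist.shard_tensor(paddle.rand([2, 2]), dist_attr=dist_attr)
x.stop_gradient = False

out = F.relu(x)   # Eager forward API -> PHI forward API (dist branch, see the generated code below)
out.backward()    # Eager backward -> PHI backward API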

Main work in this PR:

  1. Add basic DistTensor handling logic to the dygraph forward and backward execution paths
  2. Add auto code generation logic for the dygraph semi-auto-parallel branches in the PHI forward and backward APIs

Because the semi-auto-parallel API execution logic requires intrusive changes to PHI's original execution logic, and to keep the code auto-generation from becoming even harder to maintain, the generation logic for the semi-auto-parallel API branches has been reorganized in newly added files and is maintained separately. This makes it easier to integrate InferSPMD, Reshard, and other features later.

Example of a generated semi-auto-parallel code branch:

  // Auto Parallel condition
  if (AllInputsAreDistTensor(x)) {
    // 1. Create API Output & Prepare Dist and Dense Output
    Tensor api_output;

    auto dist_out = SetKernelDistOutput(&api_output);
    auto dense_out = dist_out->mutable_value();

    // 2. InferSPMD (Infer Global Shape and DistAttr of Inputs&Outputs)
    phi::MetaTensor meta_dist_out(dist_out);
    phi::UnchangedInferMeta(MakeMetaTensor(*x.impl()), &meta_dist_out);

    // 3. Select Kernel
    VLOG(6) << "relu API dist branch: kernel key: [" << kernel_backend << ", " << kernel_layout << ", "<< kernel_data_type << "]";
    auto kernel_result = phi::KernelFactory::Instance().SelectKernelOrThrowError(
        "relu", {kernel_backend, kernel_layout, kernel_data_type});
    const auto& kernel = kernel_result.kernel;
    VLOG(6) << "relu kernel: " << kernel;
    auto* dev_ctx = GetDeviceContextByBackend(kernel_result.has_fallback_cpu ? Backend::CPU : kernel_backend);

    // 4. Reshard Input

    // 5. PrepareData (DataTransform & Prepare Dist and Dense Input)
    auto dist_input_x = PrepareDataForDistTensor(x, GetKernelInputArgDef(kernel.InputAt(0), kernel_backend), {}, kernel_result.is_stride_kernel);
    auto input_x = dist_input_x->mutable_value();

    // 6. Infer Local DenseTensor Meta
    phi::MetaTensor meta_dense_out(dense_out);
    phi::UnchangedInferMeta(MakeMetaTensor(*input_x), &meta_dense_out);

    // 7. DenseTensor Kernel Call
    using kernel_signature = void(*)(const phi::DeviceContext&, const phi::DenseTensor&, phi::DenseTensor*);
    auto* kernel_fn = kernel.GetVariadicKernelFn<kernel_signature>();
    (*kernel_fn)(*dev_ctx, *input_x, dense_out);

    // 8. Reshard Output

    // 9. Return
    return api_output;
  }

This PR only opens up the basic execution flow; many issues still need to be handled later:

  1. Auto-parallel execution branches are currently generated only for APIs whose inputs and outputs are all plain Tensor; other input types (optional Tensor, vector of Tensor, optional vector of Tensor) and other output types (vector of Tensor) are not yet supported (see the short example after this list)
  2. InferSPMD still needs to be designed. If only the shape needs to be inferred, DistTensor may not need a DenseTensorMeta member but only a DDim member; otherwise storing two copies of layout and dtype may become inconsistent. How to reuse InferMeta, and how to combine it with DistAttr inference, is also still unclear and needs design
  3. The boundary of WITH_DISTRIBUTE is currently blurry in the code: some places are guarded by the macro and some are not (e.g. communication-related utils and kernels), which easily causes build errors. This needs to be cleaned up later; it could be handled uniformly at the Python-to-C++ boundary so users see a correct error message, while the C++ side avoids special-casing as much as possible, otherwise both the development and reading experience suffer
  4. The profiling logic in the original generated APIs is messy and has no clear boundary; it needs to be modularized for maintainability and will be added back later
  5. Many remaining TODOs are recorded directly in the code and will be addressed one by one later
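
A short illustration of item 1 (the ops below are used only as examples of the signature categories; dist.shard_tensor is again an assumed construction call, with mesh and dist_attr built as in the sketch above):

x_dist = dist.shard_tensor(paddle.rand([2, 2]), dist_attr=dist_attr)  # assumed API
y_dist = dist.shard_tensor(paddle.rand([2, 2]), dist_attr=dist_attr)  # assumed API

out = F.relu(x_dist)                   # Tensor in / Tensor out: a dist branch is generated
out = paddle.add(x_dist, y_dist)       # still all plain Tensors: covered by this PR
out = paddle.concat([x_dist, y_dist])  # vector-of-Tensor input: not yet covered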

Comment on lines 65 to 66
mesh = dist.ProcessMesh([0, 1], dim_names=["x"])
dist_attr = dist.DistAttr(mesh=mesh, sharding_specs=["x", None])
Contributor

I think the tests can just create a tensor in replicate state directly for now. The DistTensor constructor does not yet separate local_meta from dist_meta, and there is no logic yet for creating the local_tensor from dist_attr, so by default everything is created in replicate state; specifying a non-replicate sharding_specs has no effect for the moment.

Contributor Author

done, thx

: meta_(meta), dist_attr_(dist_attr) {
value_ = std::make_unique<DenseTensor>(*dense_tensor);
}

Contributor

This constructor and the one above both take a DenseTensor; in principle, wouldn't keeping just one of them be enough?

Contributor Author

done, thx

Comment on lines 21 to 22
#include "paddle/phi/core/distributed/auto_parallel/dist_attr.h"
#include "paddle/phi/core/distributed/auto_parallel/dist_tensor.h"
Contributor

Shouldn't these includes also be wrapped with WITH_DISTRIBUTE? Although right now dist_attr gets compiled whether WITH_DISTRIBUTE is on or not.

Contributor Author

done, thx

Comment on lines 243 to 245
#ifdef PADDLE_WITH_DISTRIBUTE
|| is_dist_tensor
#endif
Contributor

If you set is_dist_tensor = false up front, one of the #ifdefs here can be dropped.

Contributor Author

done, thx

"The kernel of ({}) for input tensors is unimplemented, please check the type of input tensors."));
"""

# TODO(chenweihang): add profle function code later
Contributor

profile?

Contributor Author

done, thx

Comment on lines +636 to +642
if len(self.kernel['func']) > 1:
kernel_dispatch_code = ''
for kernel_name in self.kernel['func']:
kernel_dispatch_code += self.gene_dispatch_code(
kernel_name, inplace_flag
)
return API_IMPL_TEMPLATE.format(
Contributor

Is this the handling for the case where multiple kernels are configured and dist is not supported?

Contributor Author

Yes, for now. DistTensor only supports DenseTensor and reuses the DenseTensor kernels.

@@ -178,6 +199,10 @@ void GradTensorHolder::add(size_t slot_id,
&buffer_values);
}
}
#ifdef PADDLE_WITH_DISTRIBUTE
} else if (t.is_dist_tensor()) {
buffer_tensor = add_ad_func(t, buffer_tensor);
Member

Can add_ad_func support DistTensor?

Contributor Author

It should. add_ad_func calls paddle::experimental::add, and this PR generates a dist branch for paddle::experimental::add.

@@ -248,6 +248,27 @@ static PyObject* tensor_method_numpy(TensorObject* self,
place,
dense_tensor->Holder()->ptr(),
dense_tensor->Holder()->size());
#ifdef PADDLE_WITH_DISTRIBUTE
} else if (self->tensor.is_dist_tensor()) {
// TODO(chenweihang): deal with DistTensor as local DenseTensor now,
Member

A DistTensor constructed directly from numpy should be a global tensor.

Contributor Author

Yes, but this is the tensor.numpy() method; for now it just takes the local tensor's values for printing. If a user calls numpy() on an arbitrary tensor inside the network, the result may be sharded or partial. We can decide later whether to gather the full values back first and then convert to numpy.
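
A short sketch of the caveat described above (the construction call is again an assumption; only the numpy() semantics are the point):

import paddle
import paddle.distributed as dist

mesh = dist.ProcessMesh([0, 1], dim_names=["x"])
dist_attr = dist.DistAttr(mesh=mesh, sharding_specs=[None, None])
x = dist.shard_tensor(paddle.ones([2, 2]), dist_attr=dist_attr)  # assumed API

# Under this PR, numpy() returns the values of the local DenseTensor on the
# current rank; for a sharded or partial intermediate tensor this is only the
# local piece, not the reassembled global value.
local_vals = x.numpy()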

@ForFishes (Member) left a comment

LGTM

@@ -93,6 +93,7 @@ std::shared_ptr<DistTensor> RToSReshardFunction::Eval(

return std::make_shared<DistTensor>(
std::make_shared<DenseTensor>(out_physical_tensor_cur_rank),
out_physical_tensor_cur_rank.meta(),
Contributor

Should the second argument of the DistTensor constructor here be the distributed meta? It feels like the input's (in) meta should be used here; taking the post-split meta makes the distributed (global) shape wrong.

Contributor Author

The current unit tests check this; using the input's (in) meta fails, so the post-split meta is used for now.

@LiYuRio (Contributor) left a comment

LGTM

@raindrops2sea (Collaborator) left a comment

LGTM

@chenwhql chenwhql merged commit 7039bef into PaddlePaddle:develop Aug 16, 2023
26 checks passed