
Support logical functors and refine tensor packing function #33089

Merged
18 commits merged into PaddlePaddle:develop on Jun 4, 2021

Conversation

@JamesLim-sy (Contributor) commented on May 24, 2021:

PR types

Performance optimization

PR changes

OPs

Describe

  1. Feature:
  • Refine the wrapper function that packs the input and output tensors into their respective vectors.
  • Support one unary logical op and three binary logical ops (see the functor sketch after this list):
    Not
    And
    Or
    Xor
  2. Performance optimization:
  • The performance comparison is shown below:
    [image: performance comparison table]
  3. Conclusion:
  • As can be seen in the table, the time cost of most test cases shows a large improvement in the logical ops after the optimization of the Elementwise and Broadcast ops.
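For illustration, here is a minimal sketch of what elementwise logical functors of this kind can look like. The functor names and the HOSTDEVICE macro follow common Paddle conventions, but these exact definitions are illustrative and not taken from the PR:

template <typename T>
struct LogicalNotFunctor {
  // Unary op: logical negation of a single input.
  HOSTDEVICE bool operator()(const T &a) const { return !static_cast<bool>(a); }
};

template <typename T>
struct LogicalAndFunctor {
  // Binary op: treat non-zero inputs as true, then combine with logical AND.
  HOSTDEVICE bool operator()(const T &a, const T &b) const {
    return static_cast<bool>(a) && static_cast<bool>(b);
  }
};

The Or and Xor functors would follow the same pattern, using || and != on the bool-cast operands.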

@paddle-bot-old commented:

Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

*/
template <typename OutT>
int PackTensorsIntoVector(const framework::ExecutionContext &ctx,
                          std::vector<const framework::Tensor *> *ins,
                          std::vector<framework::Tensor *> *outs,
                          framework::Tensor *x_ptr = nullptr) {
Contributor:

  • The parameter name x_ptr is confusing. A name like x_for_selectedrows would be more intuitive. Also explain in the function comment that the input x can be a LoDTensor or a SelectedRows; when x is a SelectedRows, it must first be converted to a LoDTensor before taking part in the computation, which requires defining an extra temporary tensor in the op kernel and passing it in via the x_for_selectedrows parameter.
  • In elementwise_mul_op.h, update ElementwiseMulKernel accordingly.

Contributor (Author):

OK, this will be fixed in the next commit.
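For reference, here is a sketch of what the renamed signature and the requested function comment could look like; the comment wording is illustrative, not the PR's final text:

/*
 * Packs the input and output tensors of the op into *ins and *outs.
 * The input X can be a LoDTensor or a SelectedRows. When X is a
 * SelectedRows, it must first be converted to a LoDTensor before the
 * computation; the op kernel defines a temporary tensor for that purpose
 * and passes it in through x_for_selectedrows.
 */
template <typename OutT>
int PackTensorsIntoVector(const framework::ExecutionContext &ctx,
                          std::vector<const framework::Tensor *> *ins,
                          std::vector<framework::Tensor *> *outs,
                          framework::Tensor *x_for_selectedrows = nullptr);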

auto *y = ctx.Input<framework::LoDTensor>("Y");
auto *z = ctx.Output<framework::LoDTensor>("Out");

if (x_ptr == nullptr || x_var->IsType<framework::LoDTensor>()) {
Contributor:

The condition here only needs to be if (x_var->IsType<framework::LoDTensor>()); even if x_ptr is passed in, it is not used in this branch.

Contributor (Author):

Indeed, the extra check was redundant; it will be removed in the next commit.

z = ctx.Output<framework::LoDTensor>("Out");
ins->emplace_back(x);
x_dims_size = x->dims().size();

Contributor:

I don't think the blank line added here looks good...

Contributor (Author):

It will be removed in the next commit.

platform::errors::InvalidArgument(
    "For elementwise_op, if X is Sparse, Y must be "
    "scalar. But received the size of Y = %d.",
    y->dims().size()));
Contributor:

A PADDLE_ENFORCE check is needed here to verify that x_ptr is not null.

Contributor (Author):

Fixed as suggested.
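A minimal sketch of the requested check, using Paddle's PADDLE_ENFORCE_NOT_NULL macro; the exact error message here is an assumption:

PADDLE_ENFORCE_NOT_NULL(
    x_ptr,
    platform::errors::InvalidArgument(
        "x_ptr must not be null when X is of type SelectedRows."));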

z = ctx.Output<framework::SelectedRows>("Out")->mutable_value();
ins->emplace_back(x_ptr);
x_dims_size = x_ptr->dims().size();

Contributor:

Same as above: don't add a blank line here.

Contributor (Author):

It will be removed in the next commit.


if (y != nullptr) {
  ins->emplace_back(y);
  axis = ctx.HasAttr("axis") ? ctx.Attr<int>("axis") : -1;
  axis = axis == -1 ? std::abs(y->dims().size() - x_dims_size) : axis;
Contributor:

Actually, my original review suggestion was just to move this axis == -1 conversion into LaunchElementwiseCudaKernel, only this one line. If axis == -1 applies only to the binary, broadcast case, it could go into LaunchBroadcastElementwiseCudaKernel instead; that might simplify the code a bit?

@JamesLim-sy (Contributor, Author) commented on Jun 3, 2021:

I had misunderstood the intent all along... I previously thought the suggestion was to move the whole axis computation into LaunchBroadcastElementwiseCudaKernel. Moving it there is indeed more appropriate, and it makes the code more general. Following the idea in this comment, the final code could look like the following, or some similar form.

void LaunchBroadcastElementwiseCudaKernel(cuda_ctx, ins, outs, axis, func) {
  std::vector<int> dims_size;
  bool no_broadcast_flag = true;
  for (auto *in : ins) {
    // Accumulate over all inputs; a single dims mismatch means broadcast.
    no_broadcast_flag &= ins[0]->dims() == in->dims();
    dims_size.emplace_back(in->dims().size());
  }

  if (no_broadcast_flag) {
    LaunchSameDimsElementwiseCudaKernel<ET, InT, OutT>(cuda_ctx, ins, outs,
                                                       func);
  } else {
    // When the caller passes -1, infer axis from the rank difference.
    axis = axis == -1
               ? *std::max_element(dims_size.begin(), dims_size.end()) -
                     *std::min_element(dims_size.begin(), dims_size.end())
               : axis;
    LaunchBroadcastElementwiseCudaKernel<ET, InT, OutT>(cuda_ctx, ins, outs,
                                                        axis, func);
  }
}
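A note on the design choice sketched above: with the inference moved inside, callers can simply pass axis = -1 and let the broadcast path derive it from the difference between the largest and smallest input ranks, so the individual op kernels no longer duplicate this computation.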

@Xreki (Contributor) left a review:

LGTM

@Xreki merged commit 941308c into PaddlePaddle:develop on Jun 4, 2021.
@JamesLim-sy changed the title from "Support logical functors" to "Support logical functors and refine tensor packing function" on Jun 15, 2021.