
Add Bilinear Tensor Product operator. #5014

Merged
merged 11 commits into PaddlePaddle:develop from bi_tensor_prod_op on Nov 14, 2017

Conversation

peterzhang2029
Contributor

resolve #4789

auto weight_dims = ctx->GetInputDim("Weight");

PADDLE_ENFORCE_EQ(x_dims.size(), 1, "The input X must be a vector.");
PADDLE_ENFORCE_EQ(y_dims.size(), 1, "The input Y must be a vector.");
Contributor

@lcy-seso lcy-seso Oct 27, 2017
  • Why X and Y are vectors? A mini-batch is a 2-D tensor. The 1st dimension is batch size, and the 2nd dimension is the hidden size of X and Y.
  • If a variable is a vector, it should also be a 2-D tensor. It is necessary to indicate it is a row vector [1 x N] or a column vector [N x 1].

Contributor Author

Done

Contributor

@lcy-seso lcy-seso left a comment

The operator must implement the batch computation.
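The batch computation the reviewer asks for can be sketched in plain Python (a hypothetical reference implementation, not the operator's actual kernel): for a mini-batch, out[b][k] = x[b]^T W[k] y[b] + bias[k], where the first dimension of X and Y is the batch size.

```python
def bilinear_tensor_product(x, w, y, bias):
    """Batched bilinear tensor product (illustrative sketch).

    x: [B, M] mini-batch, y: [B, N] mini-batch,
    w: [K, M, N] learnable weight, bias: [K]  ->  out: [B, K]
    """
    B, K = len(x), len(w)
    out = [[bias[k] for k in range(K)] for _ in range(B)]
    for b in range(B):          # loop over the batch dimension
        for k in range(K):      # one [M, N] weight slice per output unit
            for i in range(len(x[b])):
                for j in range(len(y[b])):
                    out[b][k] += x[b][i] * w[k][i][j] * y[b][j]
    return out
```

For example, with x = [[1, 0]], y = [[0, 1]], one weight slice [[1, 2], [3, 4]] and bias 0.5, the output is x^T W y + bias = W[0][0][1] + 0.5 = 2.5.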

template <typename Place, typename T>
class BilinearTensorProductCUDAKernel : public framework::OpKernel<T> {
public:
void Compute(const framework::ExecutionContext& ctx) const override {
Contributor

Why does the GPU path copy the inputs and outputs from the CPU here? Both Eigen and the matrix multiplication have GPU implementations.

Contributor Author

Fixed; the copy from the CPU is no longer needed.

auto y_dims = ctx->GetInputDim("Y");
auto weight_dims = ctx->GetInputDim("Weight");

PADDLE_ENFORCE_EQ(x_dims.size(), 2, "The input X must be a 2D Tensor.");
Contributor

Use 2UL; the same applies to lines 39~45 and below.

Contributor Author

Done

"The second dimension of X must be equal with the second "
"dimension of the Weight.");
PADDLE_ENFORCE_EQ(y_dims[1], weight_dims[2],
"The second dimension of Y must be equal with the third "
Contributor

be equal to

Contributor Author

Done

PADDLE_ENFORCE_EQ(x_dims[0], y_dims[0],
"The first dimension(batch_size) of X must be "
"equal with the first dimension of the Y.");
PADDLE_ENFORCE_EQ(x_dims[1], weight_dims[1],
Contributor

be equal to

Contributor Author

Done

"The input Weight must be a 3D tensor.");
PADDLE_ENFORCE_GT(weight_dims[0], 0,
"The first dimension of Weight must be larger than 0.");
PADDLE_ENFORCE_GT(weight_dims[1], 0,
Contributor

Use PADDLE_ENFORCE.

Contributor Author

Done

@@ -0,0 +1,99 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Contributor

@lcy-seso lcy-seso Nov 8, 2017

The indentation of the license header is wrong; follow accuracy_op.h.

Contributor Author

Done

"The first dimension(batch_size) of Out@GRAD must be equal with "
"the first dimension of the X.");
PADDLE_ENFORCE_EQ(weight_dims[0], out_dims[1],
"The second dimension of Out@GRAD must be equal with "
Contributor

be equal to

Contributor Author

Done

"the first dimension of the X.");
PADDLE_ENFORCE_EQ(weight_dims[0], out_dims[1],
"The second dimension of Out@GRAD must be equal with "
"the third dimension of the Weight.");
Contributor

Weight --> Input(Weight)

Contributor Author

Done

if (ctx->HasInput("Bias")) {
auto bias_dims = ctx->GetInputDim("Bias");
PADDLE_ENFORCE_EQ(bias_dims[1], out_dims[1],
"The second dimension of Bias must be equal with "
Contributor

be equal to

Contributor Author

Done

namespace paddle {
namespace operators {

using Tensor = framework::Tensor;
Contributor

using framework::LoDTensor;

Contributor Author

Since this op does not use LoDTensor, I changed it to using framework::Tensor;.

ctx.GetPlace());
auto left_mul_mat = EigenMatrix<T>::From(left_mul);
Tensor output_col;
output_col.mutable_data<T>(framework::make_ddim({weight_dims[0]}),
Contributor

Is this temporary variable really necessary? Why not slice a piece out of the output Tensor, construct it as an EigenMatrix, and assign the result of the computation on line 69 directly to that slice of the output Tensor?

Contributor Author

Done

auto y_dims = ctx->GetInputDim("Y");
auto weight_dims = ctx->GetInputDim("Weight");

PADDLE_ENFORCE_EQ(x_dims.size(), 2UL, "The input X must be a 2D Tensor.");
Contributor

It is much better if the naming of inputs and outputs in all the comments follows the same style. In lines 28 and 32, the input and output are denoted as Input(X) and Output(out), which I think is clear. So could you please keep a consistent style in all of the comments below?

Contributor Author

Done

PADDLE_ENFORCE(weight_dims[1],
"The second dimension of Weight must be larger than 0.");
PADDLE_ENFORCE(weight_dims[2],
"The third dimension of Weight must be larger than 0.");
Contributor

Remove lines 41~45. The three dimensions of the learnable parameter Weight are determined by the dimension of X, the dimension of Y, and the user-customized size of this operator. The dimensions of X and Y are both customized by the user, so that the dimensions are larger than 0 can be guaranteed when defining the network topology. This check is not necessary during the execution of this op.

Contributor Author

Done

PADDLE_ENFORCE_EQ(bias_dims.size(), 2UL,
"The input Bias must have 2 dimensions.");
PADDLE_ENFORCE_EQ(bias_dims[0], 1UL,
"The first dimention of input Bias must be 1.");
Contributor

Merge lines 59~61:

 PADDLE_ENFORCE(bias_dims.size() == 2UL &&  bias_dims[1] == 1UL, 
               "The Input(bias) should be a 2-D tensor with the 2nd "
               "dimensions fixed to 1 (a row vector).")

Contributor Author

Done

BilinearTensorProductOpMaker(framework::OpProto* proto,
framework::OpAttrChecker* op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The first input of BilinearTensorProduct op.");
Contributor

I prefer to use the name bilinear_tensor_product operator, because "BilinearTensorProductOp" is a name for the developer (in C++ code), while "bilinear_tensor_product operator" is the name for the user (exposed by the user interface).

Contributor Author

Done

: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("X", "The first input of BilinearTensorProduct op.");
AddInput("Y", "The second input of BilinearTensorProduct op.");
AddInput("Weight", "The input weight of BilinearTensorProduct op.");
Contributor

The learnable parameters of ...

Contributor Author

Done

AddInput("X", "The first input of BilinearTensorProduct op.");
AddInput("Y", "The second input of BilinearTensorProduct op.");
AddInput("Weight", "The input weight of BilinearTensorProduct op.");
AddInput("Bias", "The input bias of BilinearTensorProduct op.")
Contributor

The learnable bias for the bilinear_tensor_product operator. Do not use an abbreviation in the comments unless it is necessary (widely accepted, or the name is too long).

Contributor Author

Done

ops::BilinearTensorProductOpGrad);
REGISTER_OP_CPU_KERNEL(
bilinear_tensor_product,
ops::BilinearTensorProductKernel<paddle::platform::CPUPlace, float>);
Contributor

Register a kernel that supports the double type.

Contributor Author

Done

if (d_x) {
d_x->mutable_data<T>(ctx.GetPlace());
set_zero(ctx.device_context(), d_x, static_cast<T>(0));
}
Contributor

 if (d_x)  d_x->mutable_data<T>(ctx.GetPlace());

Setting zero is not necessary here.

Contributor Author

There is an additive operation for d_x:

d_x = d_x + y_scale * weight_i

For this reason, the elements of d_x must be initialized to 0. Otherwise this op will produce an erroneous result.
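The accumulation described above can be illustrated with a small pure-Python sketch (hypothetical names, not the kernel's actual code): every weight slice W[k] adds a contribution scaled by d_out[b][k], so d_x must start from zero.

```python
def grad_x(d_out, w, y):
    """Sketch of the d_x accumulation: d_x[b] += d_out[b][k] * (W[k] @ y[b]).

    d_out: [B, K], w: [K, M, N], y: [B, N]  ->  d_x: [B, M]
    """
    B, K = len(d_out), len(w)
    M, N = len(w[0]), len(w[0][0])
    d_x = [[0.0] * M for _ in range(B)]  # zero-init is required: we only ever add
    for b in range(B):
        for k in range(K):
            scale = d_out[b][k]          # per-slice scaling coefficient
            for i in range(M):
                for j in range(N):
                    d_x[b][i] += scale * w[k][i][j] * y[b][j]
    return d_x
```

Because each of the K slices contributes a term, skipping the zero initialization would accumulate onto whatever garbage the buffer held.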

if (d_y) {
d_y->mutable_data<T>(ctx.GetPlace());
set_zero(ctx.device_context(), d_y, static_cast<T>(0));
}
Contributor

if (d_y)  d_y->mutable_data<T>(ctx.GetPlace());

Contributor Author

The same with d_x

Contributor

the same to ...
I see.


def test_check_grad_normal(self):
self.check_grad(['X', 'Y', 'Weight', 'Bias'], 'Out')
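What check_grad verifies can be sketched with a tiny pure-Python finite-difference check (hypothetical helper names, small fixed dimensions): the analytic gradient of a summed bilinear output with respect to X is compared against central differences.

```python
def forward(x, w, y, bias):
    # out[b][k] = sum_ij x[b][i] * w[k][i][j] * y[b][j] + bias[k]
    return [[bias[k] + sum(x[b][i] * w[k][i][j] * y[b][j]
                           for i in range(len(x[b]))
                           for j in range(len(y[b])))
             for k in range(len(w))] for b in range(len(x))]

def loss(x, w, y, bias):
    # Scalar loss: the sum of all output entries.
    return sum(sum(row) for row in forward(x, w, y, bias))

# Tiny fixed inputs (B=1, M=2, N=2, K=1).
x = [[0.3, -0.7]]
y = [[0.5, 0.2]]
w = [[[1.0, 2.0], [3.0, 4.0]]]
bias = [0.1]

# Analytic gradient of the loss w.r.t. x:
# d_x[b][i] = sum_k sum_j w[k][i][j] * y[b][j]
analytic = [[sum(w[k][i][j] * y[0][j] for k in range(1) for j in range(2))
             for i in range(2)]]

# Numeric gradient by central differences.
eps = 1e-6
numeric = [[0.0, 0.0]]
for i in range(2):
    x[0][i] += eps
    up = loss(x, w, y, bias)
    x[0][i] -= 2 * eps
    down = loss(x, w, y, bias)
    x[0][i] += eps                     # restore the perturbed entry
    numeric[0][i] = (up - down) / (2 * eps)

for i in range(2):
    assert abs(analytic[0][i] - numeric[0][i]) < 1e-5
```

The loss is linear in x, so the central difference agrees with the analytic gradient up to floating-point rounding.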

Contributor

Why do we need TestBilinearTensorProductOp2 and TestBilinearTensorProductOp3? I do not think these tests are necessary; they are not necessary boundary cases for BilinearTensorProductOp.

Contributor Author

Removed

template <typename T, int MajorType = Eigen::RowMajor,
typename IndexType = Eigen::DenseIndex>
using EigenVector = framework::EigenVector<T, MajorType, IndexType>;

Contributor

Remove lines 30~32; EigenVector is not used.

Contributor Author

Done

auto weight_dims = weight->dims();
auto place = ctx.GetEigenDevice<Place>();

// Create the intermediate variables.
Contributor

  • Please complete the comments. Otherwise, I will wonder: create the intermediate variables for what?
  • You can just add the formula to the comment.
  • It is variable not variables.

Contributor Author

Done


math::SetConstant<Place, T> set_zero;

// Set X@Grad be zero at first.
Contributor

remove "at first".

Contributor Author

Done

auto d_out_mat = EigenMatrix<T>::From(*d_out);
auto place = ctx.GetEigenDevice<Place>();

// Create the intermediate variables for gradient.
Contributor

Please complete the comment. There are three gradients that need to be computed in the backward pass; the intermediate variables are created for which gradients?

Contributor Author

Done

}
}

// Calculate the gradient of Weight.
Contributor

Weight --> Input(Weight) to keep a consistent naming style in comments.

Contributor Author

Done

}
}

// Calculate the gradient of Bias.
Contributor

Bias --> Input(Bias)

Contributor Author

Done

set_zero(ctx.device_context(), d_y, static_cast<T>(0));
}

// Calculate the X@Grad and Y@Grad.
Contributor

Output(X@Grad) and Output(Y@Grad)

Contributor Author

Done

if (d_y) {
d_y->mutable_data<T>(ctx.GetPlace());
set_zero(ctx.device_context(), d_y, static_cast<T>(0));
}
Contributor

the same to ...
I see.

Tensor weight_i = weight->Slice(i, i + 1).Resize(
framework::make_ddim({weight_dims[1], weight_dims[2]}));
auto output_vec = d_out_mat.chip(i, 1);
if (d_x) {
Contributor

@lcy-seso lcy-seso Nov 13, 2017

Take dx as an example (dy is the same):
$dx = \frac{\partial{\mathcal{L}}}{\partial{Z}} W Y^T$, where multiplying by $\frac{\partial{\mathcal{L}}}{\partial{Z}}$ is a broadcast "scaling" operation.

Why can't this "scaling" be applied after lines 135~138? That way the two intermediate variables x_scale and y_scale could be removed directly (which also avoids the memory allocation).

I am not sure whether this is feasible, but from the computational logic this "scaling" operation can be done "in place".

Contributor Author

Here the broadcast expands along the batch direction, with TMP = scaled(X) W, and each row of scaled(X) is multiplied by a different scaling coefficient, so the scaling cannot be done after the matrix multiplication; that is, scaled(X) W != scaled(X W).

@luotao1
Contributor

luotao1 commented Nov 13, 2017

There is one warning in the teamcity log, please fix it.

[07:08:48]W:	 [Step 1/1] In file included from /paddle/paddle/operators/bilinear_tensor_product_op.cc:15:0:
[07:08:48]W:	 [Step 1/1] /paddle/paddle/operators/bilinear_tensor_product_op.h: In instantiation of 'void paddle::operators::BilinearTensorProductKernel<Place, T>::Compute(const paddle::framework::ExecutionContext&) const [with Place = paddle::platform::CPUPlace; T = double]':
[07:08:48]W:	 [Step 1/1] /paddle/paddle/operators/bilinear_tensor_product_op.cc:159:78:   required from here
[07:08:48]W:	 [Step 1/1] /paddle/paddle/operators/bilinear_tensor_product_op.h:56:26: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
[07:08:48]W:	 [Step 1/1]      for (size_t i = 0; i < weight_dims[0]; ++i) {
[07:08:48]W:	 [Step 1/1]                           ^
[07:08:48]W:	 [Step 1/1] /paddle/paddle/operators/bilinear_tensor_product_op.h: In instantiation of 'void paddle::operators::BilinearTensorProductKernel<Place, T>::Compute(const paddle::framework::ExecutionContext&) const [with Place = paddle::platform::CPUPlace; T = float]':
[07:08:48]W:	 [Step 1/1] /paddle/paddle/operators/bilinear_tensor_product_op.cc:159:78:   required from here
[07:08:48]W:	 [Step 1/1] /paddle/paddle/operators/bilinear_tensor_product_op.h:56:26: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

@peterzhang2029
Contributor Author

Warning fixed @luotao1 @lcy-seso

lcy-seso
lcy-seso previously approved these changes Nov 13, 2017
Contributor

@lcy-seso lcy-seso left a comment

LGTM

@@ -43,24 +43,26 @@ class BilinearTensorProductKernel : public framework::OpKernel<T> {

auto batch_size = x->dims()[0];
auto weight_dims = weight->dims();
int Out_dim = weight_dims[0];
Contributor

Out_dim --> out_dim
X_dim --> x_dim
Y_dim --> y_dim

Do not capitalize the first letter.

Contributor Author

Done

Contributor

@lcy-seso lcy-seso left a comment

LGTM

@lcy-seso lcy-seso merged commit e9695f4 into PaddlePaddle:develop Nov 14, 2017
@peterzhang2029 peterzhang2029 deleted the bi_tensor_prod_op branch November 14, 2017 01:51
Successfully merging this pull request may close these issues.

2-D TensorProduct Operator
4 participants