
Rewrite Matmul, make code cleaner #10449

Merged: 6 commits into PaddlePaddle:develop on May 10, 2018

Conversation

@reyoung (Collaborator) commented May 7, 2018:

No description provided.

@reyoung requested a review from @chengduoZH on May 7, 2018 11:25
@wangkuiyi (Collaborator) left a comment:

In addition to the comments, I also feel that this PR needs accompanying C++ unit tests.

I tried to make some changes in this PR: reyoung#8. Not sure how much it helps, but feel free to reject or merge. Thanks!

@@ -90,6 +101,28 @@ class Blas {
int K, T alpha, const T* A, const T* B, T beta, T* C,
int batchCount, int64_t strideA, int64_t strideB) const;

template <typename T>
Collaborator:

Other methods of the class template Blas are all in blas_impl.h; why not move the body of this new MatMul into blas_impl.h as well?

Collaborator (Author):

Done.

@@ -46,6 +46,17 @@ namespace paddle {
namespace operators {
namespace math {

struct MatDim {
Collaborator:

Given that MatDim is only used in Blas::MatMul, it seems that we should move MatDim into Blas as a nested type (Blas::MatDim)?

Also, the motivation for introducing MatDim is pretty confusing -- why can't we use DDim to represent the shape of a matrix?

reyoung (Collaborator, Author) commented May 8, 2018:

I changed the name to MatDescriptor since it describes more than the dimensions.

It describes not only the dimensions but also the layout and stride of a memory buffer. It is clearer to have an independent structure that describes (transpose, stride, batch_size, dimensions) together than to reuse DDim.
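For readers skimming the thread, here is a minimal sketch of the renamed descriptor, with field meanings inferred from this discussion (the comments are my reading, not the PR's own documentation):

```cpp
#include <cstdint>

// Sketch of MatDescriptor based on the fields shown in the diffs below.
struct MatDescriptor {
  int64_t height_;         // rows of each matrix (M)
  int64_t width_;          // columns of each matrix (N)
  int64_t stride_{0};      // element offset between consecutive matrices in a
                           // batched buffer; 0 when there is no batch dimension
  int64_t batch_size_{0};  // number of stacked matrices; 0 for a plain matrix
  bool trans_;             // treat the matrix as transposed when fed to GEMM
};
```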

int64_t width_;
int64_t stride_{0};
int64_t batch_size_{0};
bool trans_;
Collaborator:

Need a comment explaining what trans_ means.

Collaborator (Author):

Done.

int64_t height_;
int64_t width_;
int64_t stride_{0};
int64_t batch_size_{0};
Collaborator:

What is batch_size? A matrix is just a 2D data brick; what is its relationship with batch?

chengduoZH (Contributor) commented May 8, 2018:

Some hardware (GPU and CPU) supports batched matrix computation to further optimize GEMM performance, so I think MatDim is a better choice for representing either a single matrix or a batch of matrices.
Some introductory material from Intel and a paper:
https://software.intel.com/en-us/articles/introducing-batch-gemm-operations
http://www.netlib.org/utk/people/JackDongarra/PAPERS/batched-matrix-comp.pdf
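For context, a strided batched GEMM computes C_i = alpha * A_i * B_i + beta * C_i for each i in [0, batchCount), where A_i = A + i * strideA and B_i = B + i * strideB. Below is a naive reference loop matching the Blas signature quoted in this PR (illustrative semantics only; the row-major layout and the output stride of M * N are my simplifications, and real backends dispatch to optimized vendor kernels):

```cpp
#include <cstdint>

// Naive reference semantics for a strided batched GEMM (row-major, no
// transposes). Each batch element is an ordinary M x K by K x N product.
template <typename T>
void StridedBatchedGemmRef(int M, int N, int K, T alpha, const T* A,
                           const T* B, T beta, T* C, int batchCount,
                           int64_t strideA, int64_t strideB) {
  for (int b = 0; b < batchCount; ++b) {
    const T* Ab = A + b * strideA;  // b-th matrix inside the A buffer
    const T* Bb = B + b * strideB;  // b-th matrix inside the B buffer
    T* Cb = C + static_cast<int64_t>(b) * M * N;
    for (int m = 0; m < M; ++m) {
      for (int n = 0; n < N; ++n) {
        T acc = T(0);
        for (int k = 0; k < K; ++k) {
          acc += Ab[m * K + k] * Bb[k * N + n];
        }
        Cb[m * N + n] = alpha * acc + beta * Cb[m * N + n];
      }
    }
  }
}
```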

@@ -90,6 +101,28 @@ class Blas {
int K, T alpha, const T* A, const T* B, T beta, T* C,
int batchCount, int64_t strideA, int64_t strideB) const;

template <typename T>
void MatMul(const framework::Tensor& mat_a, const MatDim& dim_a,
const framework::Tensor& mat_b, const MatDim& dim_b, T alpha,
Collaborator:

How are users supposed to retrieve dim_a and dim_b?

If it is something like dim_a = GetMatDim(mat_a, ...), could MatMul call GetMatDim itself instead of letting users make the call?

Collaborator (Author):

GetMatDescriptor can be used in shape inference, in data validation, and to determine whether batch_size should be broadcast. It is not used only by MatMul.
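A hypothetical call sequence illustrating this (the helper name is taken from the comment above; the exact signature is my assumption, not the PR's API):

```cpp
// Hypothetical usage sketch: the descriptor is computed once and can be
// reused for shape inference and validation before the GEMM dispatch.
auto dim_a = GetMatDescriptor(x.dims(), trans_a);  // assumed signature
auto dim_b = GetMatDescriptor(y.dims(), trans_b);
// ... shape inference / validation may inspect dim_a and dim_b here ...
blas.MatMul(x, dim_a, y, dim_b, static_cast<T>(1), &out, static_cast<T>(0));
```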

using DDim = framework::DDim;
using framework::make_ddim;
using framework::vectorize;
inline framework::DDim GetYDim(const framework::DDim& y_dim) {
Collaborator:

GetYDim => ColumnMatrixShapeFromVector

Collaborator (Author):

Done.

Tensor CombineBatchAndM(const Tensor& input) {
Tensor output;
output.ShareDataWith(input);
inline framework::Tensor CombineBatchAndM(const framework::Tensor& input) {
Collaborator:

CombineBatchAndM => UnfoldFirstTwoDims

Collaborator (Author):

This function is actually a fold, i.e., it combines the first two dimensions.

I changed the name to FoldInitDims. init and last are common concepts in functional programming languages, where init means all elements except the last one.
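In shape terms (a sketch of my reading of this thread; the diff notes this is an identity op for tensors that are not rank 3, and the ShareDataWith call above means no data is copied):

```cpp
#include <cstdint>
#include <vector>

// Shape-level sketch of FoldInitDims: fold the first two dimensions of a
// rank-3 shape into one, e.g. [b, M, N] -> [b * M, N]. Because the buffer is
// contiguous, this is a pure reinterpretation -- no data movement needed.
inline std::vector<int64_t> FoldInitDims(const std::vector<int64_t>& dims) {
  if (dims.size() != 3) return dims;  // identity if not rank 3
  return {dims[0] * dims[1], dims[2]};
}
```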

@@ -72,23 +73,57 @@ Tensor CombineBatchAndM(const Tensor& input) {
// (Warning: This requires transposing data and writes into new memory.)
// Identity op if the tensor is not of rank 3.
template <typename DeviceContext, typename T>
Tensor CombineBatchAndN(const DeviceContext& context, const Tensor& input) {
Tensor output;
inline framework::Tensor CombineBatchAndN(const DeviceContext& context,
Collaborator:

CombineBatchAndN => UnfoldLastTwoDims

reyoung (Collaborator, Author) commented May 8, 2018:

Renamed it to FoldHeadAndLastDims. It combines the first and last dimensions.
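Unlike FoldInitDims, this fold cannot be a pure reshape, which is why the diff warns that it transposes data and writes into new memory: combining the first and last dimensions of a contiguous [b, M, N] tensor requires permuting it to [M, b, N] first. A shape-level sketch of my reading:

```cpp
#include <cstdint>
#include <vector>

// Shape-level sketch of FoldHeadAndLastDims: transpose [b, M, N] with axes
// (1, 0, 2) to [M, b, N] (this step copies data), then view as [M, b * N].
// Identity if the shape is not rank 3.
inline std::vector<int64_t> FoldHeadAndLastDims(const std::vector<int64_t>& d) {
  if (d.size() != 3) return d;
  return {d[1], d[0] * d[2]};
}
```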

return output;
}

inline void NormalizeTensorShape(framework::Tensor* x,
Collaborator:

NormalizeTensorShape => ReshapeTensorIntoMatrixSequence

Collaborator (Author):

Done.

}
}

inline void NormalizeXYOutTensorShape(framework::Tensor* x,
Collaborator:

I can hardly invent a name that describes the very complicated operation done by this function...

I would highly recommend hiding the definition of this function in a nested namespace or in the class template MatMulGradKernel -- just so it does not pollute the paddle::operators namespace -- and we could revisit its name later when we have a better idea.

Collaborator (Author):

Renamed to ReshapeXYOutIntoMatrixSequence.

BTW, I found that the header matmul_op.h is not necessary, so I removed it and moved the implementations into matmul_op.cc. The definitions are now hidden in the source file and no symbols are exported.
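One standard way to get that effect in C++ is internal linkage via an unnamed namespace in the .cc file (a generic sketch; I have not checked which mechanism the PR actually uses):

```cpp
// matmul_op.cc -- generic sketch of keeping helper definitions private to
// one translation unit so that no symbols are exported.
namespace paddle {
namespace operators {
namespace {  // unnamed namespace: everything here has internal linkage

void ReshapeXYOutIntoMatrixSequence(/* Tensor* x, Tensor* y, Tensor* out,
                                       bool trans_x, bool trans_y */) {
  // ... helper visible only inside this .cc file ...
}

}  // namespace
}  // namespace operators
}  // namespace paddle
```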

Commit: Rename MatDim to MatDescriptor
for (size_t i = 0; i < dim_vec.size() - 2; ++i) {
  retv.batch_size_ *= dim_vec[i];
  retv.height_ = dim_vec[dim_vec.size() - 2];
  retv.width_ = dim_vec[dim_vec.size() - 1];
Contributor:

retv.height_ = dim_vec[dim_vec.size() - 2]; and retv.width_ = dim_vec[dim_vec.size() - 1]; should not be in the loop.
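The fix is to hoist the loop-invariant assignments out of the loop. A sketch of the suggested correction (the explicit batch_size_ = 1 initialization is my addition: the field's default of 0 would otherwise keep the running product at 0):

```cpp
retv.batch_size_ = 1;  // the default {0} would zero out the product below
for (size_t i = 0; i < dim_vec.size() - 2; ++i) {
  retv.batch_size_ *= dim_vec[i];
}
// Loop-invariant: the last two dimensions are the matrix height and width.
retv.height_ = dim_vec[dim_vec.size() - 2];
retv.width_ = dim_vec[dim_vec.size() - 1];
```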


chengduoZH (Contributor) left a comment:

Excellent!

@reyoung merged commit 705e734 into PaddlePaddle:develop on May 10, 2018