
Rewrite Matmul, make code cleaner #10449

Merged: 6 commits into PaddlePaddle:develop on May 10, 2018

Conversation

@reyoung (Collaborator) commented May 7, 2018:

No description provided.

@reyoung requested a review from @chengduoZH on May 7, 2018 11:25
@wangkuiyi (Collaborator) left a comment:

In addition to the comments, I also feel that this PR needs accompanying C++ unit tests.

I tried to make some changes in this PR: reyoung#8. Not sure how much it helps, but feel free to reject or merge. Thanks!

@@ -90,6 +101,28 @@ class Blas {
int K, T alpha, const T* A, const T* B, T beta, T* C,
int batchCount, int64_t strideA, int64_t strideB) const;

template <typename T>
Collaborator:

Other methods of the class template Blas are all in blas_impl.h; why not move the body of this new MatMul into blas_impl.h as well?

Collaborator (Author):

Done.

@@ -46,6 +46,17 @@ namespace paddle {
namespace operators {
namespace math {

struct MatDim {
Collaborator:

Given that MatDim is only used in Blas::MatMul, it seems that we should move MatDim into Blas as a nested type (Blas::MatDim)?

Also, the motivation for introducing MatDim is pretty confusing -- why can't we use DDim to represent the shape of a matrix?

reyoung (Collaborator, Author) commented May 8, 2018:

I changed the name to MatDescriptor since it describes more than the dimensions.

It describes not only the dimensions but also the layout and stride of a memory buffer. It is clearer to have an independent structure that describes (transpose, stride, batch_size, dimensions) together than to reuse DDim.
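For readers skimming the thread, here is a minimal sketch of the renamed descriptor, with field meanings inferred from this discussion (the comments are my reading, not the PR's own documentation):

```cpp
#include <cstdint>

// Sketch of MatDescriptor based on the fields shown in the diffs below.
struct MatDescriptor {
  int64_t height_;         // rows of each matrix (M)
  int64_t width_;          // columns of each matrix (N)
  int64_t stride_{0};      // element offset between consecutive matrices in a
                           // batched buffer; 0 when there is no batch dimension
  int64_t batch_size_{0};  // number of stacked matrices; 0 for a plain matrix
  bool trans_;             // treat the matrix as transposed when fed to GEMM
};
```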

int64_t width_;
int64_t stride_{0};
int64_t batch_size_{0};
bool trans_;
Collaborator:

Need a comment explaining what trans_ means.

Collaborator (Author):

Done.

int64_t height_;
int64_t width_;
int64_t stride_{0};
int64_t batch_size_{0};
Collaborator:

What is batch_size? A matrix is just a 2D data brick; what is its relationship with batch?

chengduoZH (Contributor) commented May 8, 2018:

Some hardware (GPU and CPU) supports batched matrix computation to further optimize GEMM performance, so I think MatDim is a better choice for representing either a single matrix or a batch of matrices.
Some introductory material from Intel and a paper:
https://software.intel.com/en-us/articles/introducing-batch-gemm-operations
http://www.netlib.org/utk/people/JackDongarra/PAPERS/batched-matrix-comp.pdf
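For context, a strided batched GEMM computes C_i = alpha * A_i * B_i + beta * C_i for each i in [0, batchCount), where A_i = A + i * strideA and B_i = B + i * strideB. Below is a naive reference loop matching the Blas signature quoted in this PR (illustrative semantics only; the row-major layout and the output stride of M * N are my simplifications, and real backends dispatch to optimized vendor kernels):

```cpp
#include <cstdint>

// Naive reference semantics for a strided batched GEMM (row-major, no
// transposes). Each batch element is an ordinary M x K by K x N product.
template <typename T>
void StridedBatchedGemmRef(int M, int N, int K, T alpha, const T* A,
                           const T* B, T beta, T* C, int batchCount,
                           int64_t strideA, int64_t strideB) {
  for (int b = 0; b < batchCount; ++b) {
    const T* Ab = A + b * strideA;  // b-th matrix inside the A buffer
    const T* Bb = B + b * strideB;  // b-th matrix inside the B buffer
    T* Cb = C + static_cast<int64_t>(b) * M * N;
    for (int m = 0; m < M; ++m) {
      for (int n = 0; n < N; ++n) {
        T acc = T(0);
        for (int k = 0; k < K; ++k) {
          acc += Ab[m * K + k] * Bb[k * N + n];
        }
        Cb[m * N + n] = alpha * acc + beta * Cb[m * N + n];
      }
    }
  }
}
```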

@@ -90,6 +101,28 @@ class Blas {
int K, T alpha, const T* A, const T* B, T beta, T* C,
int batchCount, int64_t strideA, int64_t strideB) const;

template <typename T>
void MatMul(const framework::Tensor& mat_a, const MatDim& dim_a,
const framework::Tensor& mat_b, const MatDim& dim_b, T alpha,
Collaborator:

How are users supposed to retrieve dim_a and dim_b?

If it is something like dim_a = GetMatDim(mat_a, ...), could MatMul call GetMatDim itself instead of letting users make the call?

Collaborator (Author):

GetMatDescriptor can be used in shape inference, in data validation, and to determine whether batch_size should be broadcast. It is not used only by MatMul.
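A hypothetical call sequence illustrating this (the helper name is taken from the comment above; the exact signature is my assumption, not the PR's API):

```cpp
// Hypothetical usage sketch: the descriptor is computed once and can be
// reused for shape inference and validation before the GEMM dispatch.
auto dim_a = GetMatDescriptor(x.dims(), trans_a);  // assumed signature
auto dim_b = GetMatDescriptor(y.dims(), trans_b);
// ... shape inference / validation may inspect dim_a and dim_b here ...
blas.MatMul(x, dim_a, y, dim_b, static_cast<T>(1), &out, static_cast<T>(0));
```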

using DDim = framework::DDim;
using framework::make_ddim;
using framework::vectorize;
inline framework::DDim GetYDim(const framework::DDim& y_dim) {
Collaborator:

GetYDim => ColumnMatrixShapeFromVector

Collaborator (Author):

Done.

Tensor CombineBatchAndM(const Tensor& input) {
Tensor output;
output.ShareDataWith(input);
inline framework::Tensor CombineBatchAndM(const framework::Tensor& input) {
Collaborator:

CombineBatchAndM => UnfoldFirstTwoDims

Collaborator (Author):

This function is actually a fold, i.e., it combines the first two dimensions.

I changed the name to FoldInitDims. init and last are common concepts in functional programming languages, where init means all elements except the last one.
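In shape terms (a sketch of my reading of this thread; the diff notes this is an identity op for tensors that are not rank 3, and the ShareDataWith call above means no data is copied):

```cpp
#include <cstdint>
#include <vector>

// Shape-level sketch of FoldInitDims: fold the first two dimensions of a
// rank-3 shape into one, e.g. [b, M, N] -> [b * M, N]. Because the buffer is
// contiguous, this is a pure reinterpretation -- no data movement needed.
inline std::vector<int64_t> FoldInitDims(const std::vector<int64_t>& dims) {
  if (dims.size() != 3) return dims;  // identity if not rank 3
  return {dims[0] * dims[1], dims[2]};
}
```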

@@ -72,23 +73,57 @@ Tensor CombineBatchAndM(const Tensor& input) {
// (Warning: This requires transposing data and writes into new memory.)
// Identity op if the tensor is not of rank 3.
template <typename DeviceContext, typename T>
Tensor CombineBatchAndN(const DeviceContext& context, const Tensor& input) {
Tensor output;
inline framework::Tensor CombineBatchAndN(const DeviceContext& context,
Collaborator:

CombineBatchAndN => UnfoldLastTwoDims

reyoung (Collaborator, Author) commented May 8, 2018:

Renamed it to FoldHeadAndLastDims. It combines the first and last dimensions.
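Unlike FoldInitDims, this fold cannot be a pure reshape, which is why the diff warns that it transposes data and writes into new memory: combining the first and last dimensions of a contiguous [b, M, N] tensor requires permuting it to [M, b, N] first. A shape-level sketch of my reading:

```cpp
#include <cstdint>
#include <vector>

// Shape-level sketch of FoldHeadAndLastDims: transpose [b, M, N] with axes
// (1, 0, 2) to [M, b, N] (this step copies data), then view as [M, b * N].
// Identity if the shape is not rank 3.
inline std::vector<int64_t> FoldHeadAndLastDims(const std::vector<int64_t>& d) {
  if (d.size() != 3) return d;
  return {d[1], d[0] * d[2]};
}
```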

return output;
}

inline void NormalizeTensorShape(framework::Tensor* x,
Collaborator:

NormalizeTensorShape => ReshapeTensorIntoMatrixSequence

Collaborator (Author):

Done.

}
}

inline void NormalizeXYOutTensorShape(framework::Tensor* x,
Collaborator:

I can hardly invent a name that describes the very complicated operation done by this function...

I would highly recommend hiding the definition of this function in a nested namespace or in the class template MatMulGradKernel -- just so it does not pollute the paddle::operators namespace -- and we could revisit its name later when we have a better idea.

Collaborator (Author):

Renamed to ReshapeXYOutIntoMatrixSequence.

BTW, I found that the header matmul_op.h is not necessary, so I removed it and moved the implementations into matmul_op.cc. The definitions are now hidden in the source file and no symbols are exported.
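One standard way to get that effect in C++ is internal linkage via an unnamed namespace in the .cc file (a generic sketch; I have not checked which mechanism the PR actually uses):

```cpp
// matmul_op.cc -- generic sketch of keeping helper definitions private to
// one translation unit so that no symbols are exported.
namespace paddle {
namespace operators {
namespace {  // unnamed namespace: everything here has internal linkage

void ReshapeXYOutIntoMatrixSequence(/* Tensor* x, Tensor* y, Tensor* out,
                                       bool trans_x, bool trans_y */) {
  // ... helper visible only inside this .cc file ...
}

}  // namespace
}  // namespace operators
}  // namespace paddle
```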

Commit: Rename MatDim to MatDescriptor
for (size_t i = 0; i < dim_vec.size() - 2; ++i) {
  retv.batch_size_ *= dim_vec[i];
  retv.height_ = dim_vec[dim_vec.size() - 2];
  retv.width_ = dim_vec[dim_vec.size() - 1];
Contributor:

retv.height_ = dim_vec[dim_vec.size() - 2]; and retv.width_ = dim_vec[dim_vec.size() - 1]; should not be in the loop.
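The fix is to hoist the loop-invariant assignments out of the loop. A sketch of the suggested correction (the explicit batch_size_ = 1 initialization is my addition: the field's default of 0 would otherwise keep the running product at 0):

```cpp
retv.batch_size_ = 1;  // the default {0} would zero out the product below
for (size_t i = 0; i < dim_vec.size() - 2; ++i) {
  retv.batch_size_ *= dim_vec[i];
}
// Loop-invariant: the last two dimensions are the matrix height and width.
retv.height_ = dim_vec[dim_vec.size() - 2];
retv.width_ = dim_vec[dim_vec.size() - 1];
```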


chengduoZH (Contributor) left a comment:

Excellent!

@reyoung merged commit 705e734 into PaddlePaddle:develop on May 10, 2018