Rewrite Matmul, make code cleaner #10449
Conversation
c4b90a4 to c6a6d87 Compare … feature/clean_matmul
In addition to the comments, I also feel that this PR needs accompanying C++ unit tests.
I tried to make some changes in this PR: reyoung#8. Not sure how much it helps, but feel free to reject or merge. Thanks!
@@ -90,6 +101,28 @@ class Blas {
    int K, T alpha, const T* A, const T* B, T beta, T* C,
    int batchCount, int64_t strideA, int64_t strideB) const;

template <typename T>
Other methods in the class template Blas are all in blas_impl.h, so why not move the body of this new MatMul into blas_impl.h as well?
Done.
paddle/fluid/operators/math/blas.h
Outdated
@@ -46,6 +46,17 @@ namespace paddle {
namespace operators {
namespace math {

struct MatDim {
Given that MatDim is only used in Blas::MatMul, it seems that we should move MatDim into Blas::MatDim?
Also, the motivation for introducing MatDim is pretty confusing -- why can't we use DDim to represent the shape of a matrix?
I changed the name to MatDescriptor, since it does not only describe the dimensions. It describes not only the dimensions but also the layout and stride of a memory buffer. It is clearer to have an independent structure that describes (transpose, stride, batch_size, dimension) together instead of reusing DDim.
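As a reader's aid, here is a hypothetical sketch of such a descriptor. The field names come from the diff snippets in this review; the comments are one reading of the discussion, not the actual Paddle implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the descriptor discussed above; field names follow
// the snippets in this review, everything else is illustrative.
struct MatDescriptor {
  int64_t height_;         // rows of one matrix
  int64_t width_;          // columns of one matrix
  int64_t stride_{0};      // element offset between consecutive matrices in a batch
  int64_t batch_size_{0};  // 0 means a plain 2-D matrix with no batch dimension
  bool trans_;             // whether the matrix is transposed before the GEMM
};
```

Bundling transpose and batch information with the shape means a single argument can fully describe each operand of a (possibly batched) GEMM.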
int64_t width_;
int64_t stride_{0};
int64_t batch_size_{0};
bool trans_;
Need a comment explaining what trans_ means.
Done.
paddle/fluid/operators/math/blas.h
Outdated
int64_t height_;
int64_t width_;
int64_t stride_{0};
int64_t batch_size_{0};
What is batch_size? A matrix is just a 2-D data brick; what is its relationship with a batch?
To further optimize the performance of GEMM, some hardware (GPU and CPU) supports batched matrix computation. I think MatDim is a better choice for representing either a plain matrix or a batch of matrices.
Some introductions from Intel and a paper:
https://software.intel.com/en-us/articles/introducing-batch-gemm-operations
http://www.netlib.org/utk/people/JackDongarra/PAPERS/batched-matrix-comp.pdf
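To make the batched-GEMM idea concrete, here is a hedged, naive reference for what a strided batched GEMM computes (row-major, no transpose). The parameter names mirror the API in the diff above (batchCount, strideA, strideB); the implementation is only an illustration, not the vendor kernel.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Naive reference for a strided batched GEMM: batch entry b multiplies the
// M x K matrix at A + b*strideA with the K x N matrix at B + b*strideB and
// writes the M x N result at C + b*strideC. All matrices are row-major.
void BatchedMatMulRef(int M, int N, int K, const float* A, const float* B,
                      float* C, int batchCount, int64_t strideA,
                      int64_t strideB, int64_t strideC) {
  for (int b = 0; b < batchCount; ++b) {
    const float* a = A + b * strideA;
    const float* bb = B + b * strideB;
    float* c = C + b * strideC;
    for (int i = 0; i < M; ++i) {
      for (int j = 0; j < N; ++j) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k) acc += a[i * K + k] * bb[k * N + j];
        c[i * N + j] = acc;
      }
    }
  }
}
```

Batched libraries fuse this outer loop into one call, which is why the descriptor carries stride and batch_size alongside the 2-D shape.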
paddle/fluid/operators/math/blas.h
Outdated
@@ -90,6 +101,28 @@ class Blas {
    int K, T alpha, const T* A, const T* B, T beta, T* C,
    int batchCount, int64_t strideA, int64_t strideB) const;

template <typename T>
void MatMul(const framework::Tensor& mat_a, const MatDim& dim_a,
            const framework::Tensor& mat_b, const MatDim& dim_b, T alpha,
How are users supposed to retrieve dim_a and dim_b? If it is something like dim_a = GetMatDim(mat_a, ...), could MatMul call GetMatDim itself instead of letting users make the call?
GetMatDescriptor can be used in shape inference, data validation, and to determine whether to broadcast batch_size or not. It is not used only by MatMul directly.
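A small illustration of this point: once the operand shape lives in a descriptor, code other than MatMul (shape inference, validation) can consume it. NeedBroadcastBatch below is an invented name for the sake of the sketch, not a function from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Minimal stand-in for the descriptor discussed in this review.
struct MatDescriptor {
  int64_t height_;
  int64_t width_;
  int64_t stride_{0};
  int64_t batch_size_{0};
  bool trans_;
};

// A batch_size_ of 0 marks a plain 2-D matrix; mixing a plain matrix with a
// batched one is exactly the case where the batch dimension must be broadcast.
bool NeedBroadcastBatch(const MatDescriptor& a, const MatDescriptor& b) {
  return (a.batch_size_ == 0) != (b.batch_size_ == 0);
}
```

Such checks can run at shape-inference time, before any tensor data exists, which is hard to express if the descriptor is computed privately inside MatMul.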
paddle/fluid/operators/matmul_op.h
Outdated
using DDim = framework::DDim;
using framework::make_ddim;
using framework::vectorize;
inline framework::DDim GetYDim(const framework::DDim& y_dim) {
GetYDim => ColumnMatrixShapeFromVector
Done.
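For readers following along, here is a sketch of the renamed helper's assumed behavior: a rank-1 shape [N] is promoted to a column matrix [N, 1] so it can take part in a matrix product, while higher-rank shapes pass through unchanged. std::vector stands in for framework::DDim, and the exact promotion rule is an assumption, not taken from the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical shape-level sketch: interpret a vector shape as a column
// matrix, leave matrix (or higher-rank) shapes alone.
std::vector<int64_t> ColumnMatrixShapeFromVector(std::vector<int64_t> dims) {
  if (dims.size() == 1) return {dims[0], 1};  // [N] -> [N, 1]
  return dims;
}
```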
paddle/fluid/operators/matmul_op.h
Outdated
Tensor CombineBatchAndM(const Tensor& input) {
  Tensor output;
  output.ShareDataWith(input);
inline framework::Tensor CombineBatchAndM(const framework::Tensor& input) {
CombineBatchAndM => UnfoldFirstTwoDims
This function is actually a fold function, i.e., it combines the first two dimensions together. I changed the name to FoldInitDims. Init and last are common concepts in functional programming languages, where init means all the elements except the last one.
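A shape-level sketch of the fold described above, assuming the rank-3 case from the original CombineBatchAndM: the first two dimensions collapse into one, and because the row-major data layout does not change, no copy is needed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed semantics of FoldInitDims at the shape level: [B, M, N] becomes
// [B * M, N]; anything that is not rank-3 passes through unchanged.
std::vector<int64_t> FoldInitDims(const std::vector<int64_t>& dims) {
  if (dims.size() != 3) return dims;    // identity for non-rank-3 tensors
  return {dims[0] * dims[1], dims[2]};  // [B, M, N] -> [B * M, N]
}
```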
paddle/fluid/operators/matmul_op.h
Outdated
@@ -72,23 +73,57 @@ Tensor CombineBatchAndM(const Tensor& input) {
// (Warning: This requires transposing data and writes into new memory.)
// Identity op if the tensor is not of rank 3.
template <typename DeviceContext, typename T>
Tensor CombineBatchAndN(const DeviceContext& context, const Tensor& input) {
  Tensor output;
inline framework::Tensor CombineBatchAndN(const DeviceContext& context,
CombineBatchAndN => UnfoldLastTwoDims
Renamed it to FoldHeadAndLastDims. It will combine the first and last dimensions together.
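A sketch of the assumed semantics on a flat row-major buffer: permute [B, M, N] to [M, B, N] and view the result as [M, B * N]. Unlike the init-dims fold, this one has to move data, matching the warning in the original comment that it writes into new memory. The exact permutation is an assumption for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed behavior of FoldHeadAndLastDims: transpose [B, M, N] to [M, B, N]
// in a new buffer, then treat the result as an [M, B * N] matrix.
std::vector<float> FoldHeadAndLastDims(const std::vector<float>& in, int64_t B,
                                       int64_t M, int64_t N) {
  std::vector<float> out(in.size());
  for (int64_t b = 0; b < B; ++b)
    for (int64_t m = 0; m < M; ++m)
      for (int64_t n = 0; n < N; ++n)
        out[(m * B + b) * N + n] = in[(b * M + m) * N + n];
  return out;  // logical shape [M, B * N]
}
```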
paddle/fluid/operators/matmul_op.h
Outdated
  return output;
}

inline void NormalizeTensorShape(framework::Tensor* x,
NormalizeTensorShape => ReshapeTensorIntoMatrixSequence
Done.
paddle/fluid/operators/matmul_op.h
Outdated
  }
}

inline void NormalizeXYOutTensorShape(framework::Tensor* x,
I can hardly invent a name to describe the very complicated operation done by this function ...
I would highly recommend hiding the definition of this function in a nested namespace or in the class template MatMulGradKernel, just so it does not pollute the namespace paddle.fluid.operator; we could revisit its name later when we have a better idea.
Renamed to ReshapeXYOutIntoMatrixSequence.
BTW, I found that the header matmul_op.h is not necessary, so I removed it and moved the implementations into matmul_op.cc. The definitions are hidden in the source file and no symbol is exported.
Rename MatDim to MatDescriptor
paddle/fluid/operators/math/blas.cc
Outdated
for (size_t i = 0; i < dim_vec.size() - 2; ++i) {
  retv.batch_size_ *= dim_vec[i];
  retv.height_ = dim_vec[dim_vec.size() - 2];
  retv.width_ = dim_vec[dim_vec.size() - 1];
retv.height_ = dim_vec[dim_vec.size() - 2]; and retv.width_ = dim_vec[dim_vec.size() - 1]; should not be in the loop.
df0d9b5 to fcd31d6 Compare … feature/clean_matmul
Excellent!