Extend Matmul to support matrix multiplication with multiple heads #18570

czhu15 · 2019-07-10T02:24:31Z

With multi-head support, the multiplication of two large matrices is split into several (head_number) multiplications of smaller matrices. For example, if Mat A is [3, 24] and Mat B is [24, 4], multiplying A and B with head_number set to 4 splits Mat A into 4 matrices of [3, 6] and Mat B into 4 matrices of [6, 4]. The result is 4 matrices of [3, 4], which together form a final matrix of [3, 16].

With this extension, we can avoid the head split/merge steps in Transformer.

first part of #16342
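
The shape arithmetic in the description can be checked with a short NumPy sketch. This is only an illustration, not the operator's actual implementation: the helper name `matmul_with_heads` is hypothetical, and it assumes that the heads are contiguous column blocks of A and contiguous row blocks of B, with the per-head results concatenated along the column axis.

```python
import numpy as np


def matmul_with_heads(a, b, head_number):
    """Hypothetical reference: multiply a [M, K] by b [K, N] head by head.

    Assumes head h uses columns [h*K/head, (h+1)*K/head) of `a` and the
    matching rows of `b`; results are concatenated to [M, N * head_number].
    """
    m, k = a.shape
    k_b, n = b.shape
    if k != k_b or k % head_number != 0:
        raise ValueError("K must match and be divisible by head_number")
    sub_k = k // head_number
    outs = []
    for h in range(head_number):
        a_h = a[:, h * sub_k:(h + 1) * sub_k]  # [M, K / head_number]
        b_h = b[h * sub_k:(h + 1) * sub_k, :]  # [K / head_number, N]
        outs.append(a_h @ b_h)                 # [M, N]
    return np.concatenate(outs, axis=1)


a = np.arange(3 * 24, dtype=np.float64).reshape(3, 24)
b = np.arange(24 * 4, dtype=np.float64).reshape(24, 4)
out = matmul_with_heads(a, b, head_number=4)
print(out.shape)  # (3, 16)
```

Under these assumptions, summing the four [3, 4] blocks of the output recovers the ordinary product `a @ b`, which gives a quick sanity check for the split.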

bingyanghuang · 2019-07-16T03:20:04Z

@yihuaxu Please help review this PR. @wojtuss could you help assign someone in your team to help review this PR?

yihuaxu · 2019-07-17T03:39:20Z

> @yihuaxu Please help review this PR. @wojtuss could you help assign someone in your team to help review this PR?

LGTM

wojtuss

LGTM

luotao1

LGTM

luotao1 · 2019-07-24T09:52:35Z

python/paddle/fluid/tests/unittests/test_matmul_op_with_head.py

+        if transpose_X:
+            shape_X = [M]
+        else:
+            shape_X = [K]


These 4 lines can be reduced to 1 line:
shape_X = [M] if transpose_X else [K]
You can update this in your next PR.

czhu15 added 2 commits July 10, 2019 10:07

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

c8164c4

… matmul_with_multiple_head

luotao1 added the Intel label Jul 10, 2019

czhu15 added 6 commits July 11, 2019 11:23

limit the matmul with head feature to MKL only

4f7721e

To support matmul with multiple heads, it requires MKL stride feature of matrix multiplication. test=develop

limit the matmul with head feature to MKL only

50e0e83

To support matmul with multiple heads, it requires MKL stride feature of matrix multiplication. test=develop

Merge branch 'matmul_with_multiple_head' of https://github.com/czhu15…

a00b8b8

…/Paddle into matmul_with_multiple_head

Merge branch 'matmul_with_multiple_head' of https://github.com/czhu15…

61fe8f7

…/Paddle into matmul_with_multiple_head

Merge branch 'matmul_with_multiple_head' of https://github.com/czhu15…

508f5c7

…/Paddle into matmul_with_multiple_head

limit the unit test of matmul with head to MKL and NO GPU

20b6ed8

test=develop

wojtuss reviewed Jul 22, 2019

czhu15 added 2 commits July 23, 2019 11:35

limit the unit test of matmul with head to MKL and NO GPU

882bcf2

test=develop

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

8e7224d

… matmul_with_multiple_head

czhu15 force-pushed the matmul_with_multiple_head branch from 5ce5a01 to a8a3040 July 23, 2019 03:44

czhu15 added 2 commits July 23, 2019 11:49

Merge branch 'matmul_with_multiple_head' of https://github.com/czhu15…

a8a3040

…/Paddle into matmul_with_multiple_head test=develop

improve the 1-dim array transposing handling (PaddlePaddle#18570)

0ed4ac7

numpy can handle it correctly without user's manual handling. test=develop

wojtuss reviewed Jul 23, 2019

paddle/fluid/operators/math/blas_impl.h (resolved)

paddle/fluid/operators/matmul_op.cc (resolved)

paddle/fluid/operators/math/blas.h (resolved)

wojtuss approved these changes Jul 24, 2019

paddle/fluid/operators/math/blas.h (resolved)

paddle/fluid/operators/math/blas_impl.h (resolved)

paddle/fluid/operators/matmul_op.cc (resolved)

luotao1 approved these changes Jul 24, 2019

luotao1 merged commit 220eef6 into PaddlePaddle:develop Jul 24, 2019

czhu15 deleted the matmul_with_multiple_head branch July 25, 2019 01:26

Xreki mentioned this pull request Aug 14, 2019

Optimize inference performance of ERNIE on P40 GPU PaddlePaddle/benchmark#165

Open

bingyanghuang mentioned this pull request Sep 2, 2019

Remove reshape transpose in attention module for bert optimization #19585

Closed