Add fc padding to improve mkl GEMM's performance when N and K are multiples of 128. #20972
Conversation
test=develop
The PR description should state the performance improvement results.
LGTM for PADDLE_ENFORCE
LGTM
paddle/fluid/operators/math/fc.cc
Outdated
auto blas = math::GetBlas<platform::CPUDeviceContext, T>(context);
blas.MatMul(M, N, K, X, W, Y);
framework::Tensor X1, Y1;
- Don't define multiple variables on a single line.
- fc_fuse_pass only checks whether the weight has been padded; the kernel still has to check whether x needs padding. The padding of w and x is independent, so the implementation may need to handle several cases:
  - w is padded; check whether x needs padding
  - w is not padded; check whether x needs padding
  - Not every branch necessarily has to be supported, but whether x needs padding surely has to be checked, right?
  - Allocate the temporary Tensor only when padding is actually needed.
- The weight's shape is K*N; padding is done only when both are divisible by 128. The padding of the weight and X should be synchronized, so once the padding is done in the pass, X is not checked again here.
- The temporary Tensor Y1 is used in the computation below, so it has to stay outside the condition. The temporary variable X1 can be moved inside the condition.
@@ -78,6 +78,9 @@ def setUp(self):
            'Out': fc_refer(self.matrix, self.with_bias, self.with_relu)
        }

    def padding(self):
        self.attrs = {'padding_weights': False}
This will not take effect, because nothing ever calls the padding function.
Since the FC padding strategy has been revised, this can be used to test correctness when (N % 128 == 0 && K % 128 == 0).
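As a hedged illustration of such a test case (the reference implementation below is a hypothetical stand-in for the test file's `fc_refer`, and the concrete shapes are an assumption), shapes with N % 128 == 0 and K % 128 == 0 would exercise the padding path:

```python
import numpy as np

def fc_refer(x, w, b, with_relu=False):
    # Hypothetical reference FC (Y = X*W + b, optional relu) used to
    # check the padded kernel against a plain, unpadded computation.
    out = x @ w + b
    return np.maximum(out, 0) if with_relu else out

# Shapes chosen so that N % 128 == 0 and K % 128 == 0,
# which is exactly the condition that triggers padding.
M, K, N = 4, 128, 256
x = np.random.rand(M, K).astype(np.float32)
w = np.random.rand(K, N).astype(np.float32)
b = np.random.rand(N).astype(np.float32)
out = fc_refer(x, w, b)
assert out.shape == (M, N)
```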
paddle/fluid/operators/math/fc.cc
Outdated
framework::Tensor Y1;
Y1.Resize({M * (N + 4)});
T* Y1_data = Y1.mutable_data<T>(platform::CPUPlace());
if (padding_weights) {
If the weights are not padded in advance, do x and w need to be padded here?
The padding strategy in FC has been revised.
When N % 128 == 0 && K % 128 == 0, both x and w are padded.
If the weights have already been padded in advance, the padding of w is skipped in fc.cc.
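A minimal sketch of the decision just described (the function name and return convention are hypothetical; the real logic lives in fc.cc and fc_fuse_pass):

```python
def fc_padding_plan(N, K, padding_weights):
    """Which tensors the FC kernel still has to pad at run time.

    padding_weights=True means fc_fuse_pass already padded W offline,
    so the kernel only pads X; otherwise it pads both X and W.
    """
    if N % 128 != 0 or K % 128 != 0:
        return []  # padding only pays off when both dims are multiples of 128
    return ['X'] if padding_weights else ['X', 'W']

# Examples of the three situations discussed in the review:
assert fc_padding_plan(128, 256, True) == ['X']        # W padded by the pass
assert fc_padding_plan(128, 256, False) == ['X', 'W']  # kernel pads both
assert fc_padding_plan(127, 128, False) == []          # no padding needed
```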
Please also provide, in the PR description, the performance data for padding only w without x, and for padding only x without w.
A few minor suggestions; they can be addressed in the next PR.
…tiple of 128. (PaddlePaddle#20972)
* Add fc padding to solve mkl performance test=develop
* fix gpu pass and error information test=develop
* fix fc_fuse_pass_test test=develop
* fix error information test=develop
* fix error information test=develop
* fix name and add fc op padding test test=develop
* fix attributes test=develop
* optimize fc padding test=develop
* fix test test=develop
…22198)
* Optimize the kernel implementation of layernorm with openmp (#20895)
* Add ernie c++ inference test (#21015)
* Add ernie unit test test=develop
* Add ernie unit test test=develop
* Add ernie unit test test=develop
* remove ngraph
* optimize gpu test test=develop
* optimize codes test=develop
* fix cmake fails on inference_download_and_uncompress (#21185)
* solve cmake fails on inference_download_and_uncompress test=develop
* solve cmake fails on inference_download_and_uncompress test=develop
* Add fc padding to improve mkl GEMM's performance when N and K are multiple of 128. (#20972)
* Add fc padding to solve mkl performance test=develop
* fix gpu pass and error information test=develop
* fix fc_fuse_pass_test test=develop
* fix error information test=develop
* fix error information test=develop
* fix name and add fc op padding test test=develop
* fix attributes test=develop
* optimize fc padding test=develop
* fix test test=develop
* Polish the codes of fc when needs padding (#21378) test=develop
* Add ernie large c++ inference test (#21365)
* add ernie-large test test=develop
* add ernie large c++ inference test test=develop
* Modify padding strategy: remove weight copy in fc padding (#21650) test=develop
* optimize fc jit (#21878) test=develop
Co-authored-by: Yihua Xu <yihuaxu@hotmail.com>
When the dimensions of a matrix in an MKL computation are multiples of 128, memory access time increases sharply.
The optimization: when a dimension is a multiple of 128, pad it by 4, which reduces the memory access time.
Intel MKL memory access analysis
Performance comparison of the ERNIE inference model before and after the padding optimization:
Testing showed that both W and X in the FC computation have to be padded to get a meaningful performance gain.
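The trick above can be sketched as follows (a NumPy illustration of the idea, not the actual Paddle kernel; the function name is made up): when K and N are both multiples of 128, zero-pad W from K×N to (K+4)×(N+4) and X from M×K to M×(K+4), run the GEMM on the padded buffers, and keep only the first N output columns. The zero rows/columns contribute nothing, so the result is unchanged.

```python
import numpy as np

def fc_with_padding(X, W):
    """Sketch of the FC padding trick for MKL-friendly dimensions.

    If both K and N are multiples of 128, zero-pad W to (K+4) x (N+4)
    and X to M x (K+4) before the GEMM; the extra zeros do not change
    the first N output columns, but the leading dimensions are no
    longer multiples of 128, which avoids the slow memory-access case.
    """
    M, K = X.shape
    K2, N = W.shape
    assert K == K2
    if K % 128 == 0 and N % 128 == 0:
        Xp = np.zeros((M, K + 4), dtype=X.dtype)
        Xp[:, :K] = X
        Wp = np.zeros((K + 4, N + 4), dtype=W.dtype)
        Wp[:K, :N] = W
        Y = Xp @ Wp       # GEMM on the padded buffers
        return Y[:, :N]   # strip the 4 padded output columns
    return X @ W

# The padded path matches the plain GEMM on a 128-multiple shape.
X = np.random.rand(2, 128).astype(np.float32)
W = np.random.rand(128, 128).astype(np.float32)
assert np.allclose(fc_with_padding(X, W), X @ W, atol=1e-3)
```

Note that padding W and X together is exactly the conclusion stated above: padding only one operand leaves the other GEMM leading dimension at a multiple of 128.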