Blas optimized elementwise_add forward and backward passes #10913

tpatejko · 2018-05-24T13:18:12Z

This PR optimizes the elementwise_add forward and backward passes.
For the forward pass it includes:

MKL VML-based optimization with v?Add when MKL/MKLDNN are used
BLAS-based optimization with VCopy and SAXPY operations when MKL is disabled
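
To illustrate the non-MKL path described above, here is a minimal sketch of an elementwise add built from a vector copy followed by an AXPY with alpha = 1. It is not the code from this PR: `VCopySketch`, `AxpySketch` and `ElementwiseAddSketch` are hypothetical names, the loops are plain C++ stand-ins for the BLAS level 1 routines, and both inputs are assumed to have the same length (no broadcasting).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a BLAS level 1 copy: y <- x.
template <typename T>
void VCopySketch(std::size_t n, const T* x, T* y) {
  for (std::size_t i = 0; i < n; ++i) y[i] = x[i];
}

// Hypothetical stand-in for a BLAS level 1 axpy: y <- alpha * x + y.
template <typename T>
void AxpySketch(std::size_t n, T alpha, const T* x, T* y) {
  for (std::size_t i = 0; i < n; ++i) y[i] += alpha * x[i];
}

// z = x + y, expressed as "copy x into z, then z <- 1 * y + z".
// Assumes x and y have identical sizes.
template <typename T>
std::vector<T> ElementwiseAddSketch(const std::vector<T>& x,
                                    const std::vector<T>& y) {
  assert(x.size() == y.size());
  std::vector<T> z(x.size());
  VCopySketch(x.size(), x.data(), z.data());
  AxpySketch(y.size(), T(1), y.data(), z.data());
  return z;
}
```

Since addition is commutative, copying y first and accumulating x would give the same result. With MKL available, a single vectorized add call would presumably replace both steps.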

For backward pass:

BLAS level 1 VCopy is used to copy the dx and dy gradient vectors.
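
A rough sketch of why a copy is enough for the backward pass: for z = x + y with same-shaped inputs, the gradient with respect to each input equals the incoming output gradient. The function name `ElementwiseAddGradSketch` and the use of `std::vector<float>` are assumptions made for illustration. The null checks mirror the commit note about copying grad inputs only when they are not null.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Backward of z = x + y for same-shaped inputs: dx = dout and dy = dout.
// Either output pointer may be null when that gradient is not requested.
void ElementwiseAddGradSketch(const std::vector<float>& dout,
                              std::vector<float>* dx,
                              std::vector<float>* dy) {
  if (dx != nullptr) {
    dx->assign(dout.begin(), dout.end());  // plays the role of a vector copy
  }
  if (dy != nullptr) {
    dy->assign(dout.begin(), dout.end());
  }
}
```

Calling `ElementwiseAddGradSketch(dout, &dx, nullptr)` fills only `dx` and leaves the `dy` side untouched.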

When integral or float16 types, or a GPU device, are used, the implementation falls back to the default (generic) elementwise_add operation.
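
The fallback rule can be pictured as a compile-time type check combined with a device check. The sketch below is only an assumption about the shape of that decision, not the dispatch code in this PR: `Float16Sketch`, `DeviceSketch` and `UsesBlasPathSketch` are hypothetical names, and the rule "float or double on CPU" is inferred from the description above.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical 16-bit float placeholder; only its type identity matters here.
struct Float16Sketch {
  std::uint16_t bits;
};

enum class DeviceSketch { kCPU, kGPU };

// Returns true when the optimized (VML/BLAS) path would apply under the
// assumed rule: float or double on CPU. Everything else falls back to the
// generic elementwise_add implementation.
template <typename T>
constexpr bool UsesBlasPathSketch(DeviceSketch device) {
  return device == DeviceSketch::kCPU &&
         (std::is_same<T, float>::value || std::is_same<T, double>::value);
}
```

For example, `UsesBlasPathSketch<int>(DeviceSketch::kCPU)` and `UsesBlasPathSketch<float>(DeviceSketch::kGPU)` both evaluate to false, so those cases would take the generic path.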


tpatejko · 2018-05-24T21:23:01Z

This PR addresses issue #10786.

luotao1

LGTM! Thanks for this speedup. I tested it on the OCR CRNN_CTC model: the total elapsed time of the elementwise_add op (over 100 repetitions of the model) dropped from 467 ms to 428 ms.

tpatejko · 2018-05-25T07:42:25Z

@luotao1 Thanks for this information. Does the model converge?

luotao1 · 2018-05-25T07:50:08Z

@tpatejko I only tested the inference speed, but our unit tests in https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/tests/book check whether the models converge.

Tomasz Patejko added 7 commits May 24, 2018 15:16

MKL elementwise add: elementwise_add uses vAdd VML function when MKL is used

e43c8f3

MKL elementwise_add: BLAS version compiles with integral types

6f93248

MKL elementwise add: default implementation used for integral types, float16 and/or GPU

01fb2be

MKL elementwise add backward: Initial implementation with vector copy

5a622c2

MKL optimized elementwise add backward: coding style fixes

996d12f

MKL elementwise add backward: grad inputs copied when they are not null

fde47aa

MKL elementwise add backward: backward works for integral types with fall back to default impl

9241011

tpatejko requested review from luotao1 and tensor-tang May 24, 2018 13:18

tpatejko added the Intel label May 24, 2018

MKL optimized elementwise add: fix style check

3e876b3

luotao1 approved these changes May 25, 2018


luotao1 merged commit bab1196 into PaddlePaddle:develop May 25, 2018

luotao1 added this to Done in Intel Optimization on Fluid May 25, 2018
