Optimize the CUDA implementation of dropout #19136

wangchaochaohu · 2019-08-12T06:56:04Z

The dropout op uses the Thrust API to generate random numbers, which is too slow; generating them with cuRAND speeds it up.
PaddlePaddle/benchmark#148
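
For readers unfamiliar with the cuRAND device API, here is a minimal sketch of what generating a dropout mask with per-thread cuRAND states can look like. It is not the code from this PR: the kernel name `DropoutSketch`, the helper `KeptFraction`, the choice of the Philox generator, the fixed launch configuration, and the absence of any output rescaling are all assumptions made for illustration. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical kernel for illustration; not the code merged in this PR.
// Each thread owns one Philox state and walks the input in a grid-stride loop.
__global__ void DropoutSketch(const float* x, float* y, unsigned char* mask,
                              size_t n, float dropout_prob,
                              unsigned long long seed) {
  size_t idx = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  size_t stride = static_cast<size_t>(gridDim.x) * blockDim.x;

  // Same seed, different subsequence per thread, so streams do not overlap.
  curandStatePhilox4_32_10_t state;
  curand_init(seed, idx, 0, &state);

  for (size_t i = idx; i < n; i += stride) {
    // curand_uniform returns a float in (0, 1].
    bool keep = curand_uniform(&state) > dropout_prob;
    mask[i] = keep ? 1 : 0;
    y[i] = keep ? x[i] : 0.0f;  // this sketch does not rescale kept values
  }
}

// Hypothetical host helper: runs the kernel on an all-ones input and
// returns the fraction of elements that were kept.
float KeptFraction(size_t n, float dropout_prob, unsigned long long seed) {
  std::vector<float> h_x(n, 1.0f);
  float* d_x = nullptr;
  float* d_y = nullptr;
  unsigned char* d_mask = nullptr;
  cudaMalloc(&d_x, n * sizeof(float));
  cudaMalloc(&d_y, n * sizeof(float));
  cudaMalloc(&d_mask, n);
  cudaMemcpy(d_x, h_x.data(), n * sizeof(float), cudaMemcpyHostToDevice);

  const int threads = 256;
  const int blocks = 64;  // small fixed grid; the stride loop covers the rest
  DropoutSketch<<<blocks, threads>>>(d_x, d_y, d_mask, n, dropout_prob, seed);

  // A blocking copy on the default stream; it completes after the kernel.
  std::vector<unsigned char> h_mask(n);
  cudaMemcpy(h_mask.data(), d_mask, n, cudaMemcpyDeviceToHost);
  cudaFree(d_x);
  cudaFree(d_y);
  cudaFree(d_mask);

  size_t kept = 0;
  for (unsigned char m : h_mask) kept += m;
  return static_cast<float>(kept) / static_cast<float>(n);
}

int main() {
  float frac = KeptFraction(1 << 20, 0.3f, 1234ULL);
  std::printf("kept fraction: %f (should be close to 0.7)\n", frac);
  return 0;
}
```

Built with `nvcc`, the program prints the fraction of kept elements, which should be close to `1 - dropout_prob`. Initializing the generator state inside the kernel keeps everything in a single launch; whether that matches the structure of the merged change is not something this sketch claims.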

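Average time of the dropout op in transformer-big (enable_ce), measured with PaddlePaddle's profiler tool:
Average time: 1.16155 → 0.344537 (about 3.4x faster)

Effect on the transformer-big model as a whole: performance improves by about 10%.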

luotao1

Was .dropout_op.cu.swm uploaded by mistake?

wangchaochaohu · 2019-08-12T07:32:12Z

Was .dropout_op.cu.swm uploaded by mistake?

Yes, it was uploaded by mistake and has been removed.

chengduoZH

LGTM

wangchaochaohu added 5 commits August 10, 2019 07:08

cuda optimize for dropout

71f4855

remove tmp swp file

37097fc

fix compile error test=develop

3944b1b

test=develop optimize the cuda realization of dropout op

ffedacc

remove unused code test=develop

57fb54e

luotao1 reviewed Aug 12, 2019

remove tmp file test=develop

3068c19

wangchaochaohu requested review from guoshengCS, Xreki and chengduoZH August 12, 2019 09:04

wangchaochaohu mentioned this pull request Aug 13, 2019

Optimize the performance of Transformer-Big on 1 V100 GPU PaddlePaddle/benchmark#148

Open

chengduoZH approved these changes Aug 20, 2019

wangchaochaohu merged commit 6e326ca into PaddlePaddle:develop Aug 20, 2019
