[Speed]"merge softmax kernel" #9357

dzhwinter · 2018-03-25T14:15:47Z

Further work on #8594.
In our current implementation, the LoD describes the Tensor as several blocks (sequences), and the CUDA kernel is launched once for every range in the LoD.
However, this implementation is not well optimized, because the CUDA kernel launch overhead is far larger than the kernel's own execution time. So I merged these operations into a single CUDA kernel to speed up the sequence_softmax kernel.
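To make the difference between the two launch schemes concrete, here is a minimal host-side Python model. It is not the CUDA code from this PR. It assumes the LoD is a list of offsets such as `[0, 3, 5, 6]`, where sequence `i` occupies `[lod[i], lod[i+1])`. The function names and the `grid_dim` parameter are hypothetical, and a "launch" is only counted, not timed. The merged version passes the whole offset array to one "kernel" and lets each block pick its sequences with a grid-stride loop, which is one common way to write such a kernel.

```python
import math


def softmax_range(x, out, start, end):
    """Numerically stable softmax over x[start:end], written into out."""
    if start == end:  # empty sequence: nothing to do
        return
    m = max(x[start:end])  # subtract the max to avoid overflow in exp
    exps = [math.exp(v - m) for v in x[start:end]]
    total = sum(exps)
    for k, e in enumerate(exps):
        out[start + k] = e / total


def sequence_softmax_per_range(x, lod):
    """Old scheme (model): the host loops over LoD ranges, one 'launch' per range."""
    out = [0.0] * len(x)
    launches = 0
    for i in range(len(lod) - 1):
        launches += 1  # every range pays its own launch overhead
        softmax_range(x, out, lod[i], lod[i + 1])
    return out, launches


def sequence_softmax_merged(x, lod, grid_dim=2):
    """Merged scheme (model): a single 'launch' that receives the whole offset array.

    The outer loop stands in for the blocks of one grid (blockIdx.x); each
    block walks the sequences with a grid-stride loop, so any grid_dim >= 1
    covers all sequences. grid_dim is a hypothetical tuning parameter.
    """
    out = [0.0] * len(x)
    num_seqs = len(lod) - 1
    for block_idx in range(grid_dim):
        for i in range(block_idx, num_seqs, grid_dim):
            softmax_range(x, out, lod[i], lod[i + 1])
    return out, 1  # one launch regardless of the number of sequences


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 0.5, 0.5, 4.0]
    lod = [0, 3, 5, 6]  # three sequences: x[0:3], x[3:5], x[5:6]
    old, old_launches = sequence_softmax_per_range(x, lod)
    new, new_launches = sequence_softmax_merged(x, lod)
    print(old_launches, new_launches)
    print(old == new)
```

Both schemes compute the same values. Only the number of launches changes: it grows with the number of LoD ranges in the old scheme and stays at one in the merged scheme. This is where the saving comes from when the per-range work is small compared to the launch overhead.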


paddle-bot-old · 2020-05-22T06:41:07Z

Since you haven't replied for a long time, we have closed this issue/pr.
If the problem is not solved or there is a follow-up one, please reopen it at any time and we will continue to follow up.

dzhwinter added 3 commits March 25, 2018 21:56

"add detail of merge kernel"

8dd1c28

"fix ci"

4b7d5b6

small fix

d42b651

dzhwinter changed the title ~~"add detail of merge softmax kernel"~~ [Speed]"merge softmax kernel" Mar 27, 2018

dzhwinter added 3 commits March 28, 2018 05:29

Merge remote-tracking branch 'origin/develop' into speed/sequence_softmax

536a3df

remove local change

09312c8

"fix ci"

b75b8ae

paddle-bot-old bot closed this May 22, 2020
