
[Speed]"merge softmax kernel" #9357

Closed

Conversation

dzhwinter
Contributor

Follow-up work on #8594.
In our current implementation, a LoD describes the Tensor as several segments, and each range in the LoD launches a CUDA kernel once.
This implementation is not well optimized, because the CUDA kernel launch time far exceeds the kernel execution time. So I merged these per-segment launches into one CUDA kernel to accelerate the sequence_softmax kernel.
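For reference, here is a minimal Python sketch of the sequence_softmax semantics that a merged kernel must reproduce: softmax applied independently to each LoD segment of a flat tensor. The function name and LoD offset format are illustrative, not the actual Paddle API; a fused CUDA kernel would typically assign one thread block per segment instead of launching a kernel per segment.

```python
import math

def sequence_softmax(data, lod):
    """Apply softmax independently to each LoD segment.

    `data` is a flat list of floats; `lod` is a list of segment offsets,
    e.g. lod = [0, 2, 4] means two segments: data[0:2] and data[2:4].
    (Illustrative sketch only, not the Paddle implementation.)
    """
    out = []
    for start, end in zip(lod[:-1], lod[1:]):
        seg = data[start:end]
        m = max(seg)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in seg]
        total = sum(exps)
        out.extend(e / total for e in exps)
    return out
```

The old scheme corresponds to calling a kernel once per `(start, end)` pair in this loop; the merged kernel performs all segments in a single launch, amortizing the launch latency.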

@dzhwinter dzhwinter changed the title "add detail of merge softmax kernel" [Speed]"merge softmax kernel" Mar 27, 2018
@paddle-bot-old paddle-bot-old bot closed this May 22, 2020
@paddle-bot-old

Since you haven't replied for a long time, we have closed this issue/PR.
If the problem is not solved or there is a follow-up, please reopen it at any time and we will continue to follow up.
