Implement the GPU kernel of fc operator #19687

Merged: 10 commits, Sep 11, 2019

@Xreki Xreki commented Sep 6, 2019

This PR implements the GPU kernel for fc_op.
A follow-up PR will refine fc_fuse_pass to enable fusing relu.

@Xreki Xreki changed the title from "Enable the fused computation of fc operator" to "Implement the GPU kernel of fc operator" Sep 10, 2019
@tensor-tang tensor-tang left a comment

LGTM

@zhaoyuchen2018 zhaoyuchen2018 left a comment

LGTM


const int kThreadsPerBlock = 1024;
int max_threads = context.GetMaxPhysicalThreadCount();
int num_threads = std::min(kThreadsPerBlock, (((N + 31) >> 5) << 5));

Contributor:

Cases where N is small need special handling. The block then contains few threads, so we should check whether that causes a performance problem. This can be looked at later.

Contributor Author:

OK, I will look specifically at the performance of this kind of computation later.
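
For reference, the block-size expression quoted in this review thread, `std::min(kThreadsPerBlock, (((N + 31) >> 5) << 5))`, rounds N up to a multiple of 32 (presumably the warp size) and caps the result at 1024. The host-only sketch below reproduces that arithmetic so the small-N case is easy to see. The names `kWarpSize`, `RoundUpToWarp` and `NumThreadsForN` are hypothetical and are not taken from the PR. It assumes a non-negative N, for which integer division by 32 matches the shifts in the original line.

```cpp
// Host-only sketch of the thread-count arithmetic; not the PR's kernel code.
constexpr int kThreadsPerBlock = 1024;  // cap used in the quoted snippet
constexpr int kWarpSize = 32;           // assumption: NVIDIA warp size

// Round n up to the next multiple of kWarpSize.
// For n >= 0 this equals ((n + 31) >> 5) << 5.
constexpr int RoundUpToWarp(int n) {
  return ((n + kWarpSize - 1) / kWarpSize) * kWarpSize;
}

// Threads per block for a given N: the rounded value, capped at kThreadsPerBlock.
constexpr int NumThreadsForN(int n) {
  return RoundUpToWarp(n) < kThreadsPerBlock ? RoundUpToWarp(n)
                                             : kThreadsPerBlock;
}

// Small N yields a single-warp block; large N hits the cap.
static_assert(NumThreadsForN(8) == 32, "N = 8 rounds up to one warp");
static_assert(NumThreadsForN(33) == 64, "N = 33 rounds up to two warps");
static_assert(NumThreadsForN(5000) == 1024, "large N is capped at 1024");
```

With N = 8, for example, the block holds only 32 threads, which is the situation the comment above asks to have profiled.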

@Xreki Xreki merged commit a65c728 into PaddlePaddle:develop Sep 11, 2019
@Xreki Xreki deleted the pass_fc_fuse branch October 29, 2019 00:38
seiriosPlus pushed a commit to seiriosPlus/Paddle that referenced this pull request Dec 9, 2019
* Refine the codes related to fc op.

* Add GPU implementation for fc functor.

* Apply fc_fuse_pass in GPU inference.
test=develop

* Change the cmake for fc op.

* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.

* Add an attribute to set the activation type in fc_op.

* Enhance the unittest of fc_op.
test=develop

* Remove the declaration of FCOpGrad back to the header file.
test=develop

* Set default value for newly added arguments in test_fc_op.
test=develop

3 participants