Cleanup csrmv sparse-dense matmul OpenCL kernel wrapper code #3010
I was going to look into kernel improvements for the NN and TN ops now, but that is deferred for later.
a4e4f8b to 059fdd6
I was unable to reproduce the 6x performance difference with this code reorg on the latest master with the latest versions of dependencies. It seems like some sort of driver or upstream fix addressed the problem of optimizing out the unused global counter being passed for Nevertheless, the cleanup of the wrapper code is still valid. We don't need to pass the global counter when it is not used, and we most definitely don't need to compile an additional kernel when we are not using it at all. I am going to reword the PR title to appropriately reflect the change.
Changes to Users
None.
Checklist
[ ] Functions added to unified API
[ ] Functions documented