Cleanup csrmv sparse-dense matmul OpenCL kernel wrapper code #3010

Merged
9prady9 merged 2 commits into arrayfire:master from issue2937 on May 10, 2021



@9prady9 9prady9 commented Sep 8, 2020

Changes to Users

None.

Checklist

  • [x] Rebased on latest master
  • [x] Code compiles
  • [x] Tests pass
  • [ ] Functions added to unified API
  • [ ] Functions documented

@9prady9 9prady9 added this to the 3.8.1 milestone Sep 8, 2020
@9prady9 9prady9 requested a review from umar456 September 8, 2020 09:34

9prady9 commented Sep 9, 2020

  • Note that this is not a kernel change. It reorganizes and/or removes some unnecessary code, which brought about a 6x speed improvement in the OpenCL backend's sparse-matrix dense-vector matmulNN operation.
  • There has been no change in speed for the matmulTN operation, though.

I was going to look into kernel improvements for the NN and TN ops now, but that is deferred for later.
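For readers unfamiliar with the two operations named above, here is a minimal CPU reference sketch of what a CSR sparse-matrix dense-vector product computes in the NN case (y = A x) and the TN case (y = Aᵀ x). The `CsrMatrix` struct and the `csrmvNN` / `csrmvTN` function names are hypothetical and chosen for illustration only; they are not ArrayFire's internal types or its OpenCL kernels.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSR container for illustration; not ArrayFire's internal type.
struct CsrMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> values;  // non-zero values, stored row by row
    std::vector<int> rowIdx;    // rows + 1 offsets into values/colIdx
    std::vector<int> colIdx;    // column index of each non-zero
};

// NN case: y = A * x. Each output element depends on one row only,
// so rows can be processed independently.
std::vector<float> csrmvNN(const CsrMatrix& A, const std::vector<float>& x) {
    std::vector<float> y(A.rows, 0.0f);
    for (std::size_t r = 0; r < A.rows; ++r) {
        float acc = 0.0f;
        for (int k = A.rowIdx[r]; k < A.rowIdx[r + 1]; ++k) {
            acc += A.values[k] * x[A.colIdx[k]];
        }
        y[r] = acc;
    }
    return y;
}

// TN case: y = A^T * x. Each non-zero scatters into the output at its
// column index, so several rows may update the same output element.
std::vector<float> csrmvTN(const CsrMatrix& A, const std::vector<float>& x) {
    std::vector<float> y(A.cols, 0.0f);
    for (std::size_t r = 0; r < A.rows; ++r) {
        for (int k = A.rowIdx[r]; k < A.rowIdx[r + 1]; ++k) {
            y[A.colIdx[k]] += A.values[k] * x[r];
        }
    }
    return y;
}
```

The NN case gathers per row while the TN case scatters per non-zero, which is one plausible reason the two operations can behave differently in performance measurements.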


9prady9 commented Feb 25, 2021

> * Note that this is not a kernel change. It reorganizes and/or removes some unnecessary code,
>    which brought about a 6x speed improvement in the OpenCL backend's sparse-matrix dense-vector
>    matmulNN operation.
>
> * There has been no change in speed for the matmulTN operation, though.
>
> I was going to look into kernel improvements for the NN and TN ops now, but that is deferred for later.

@umar456

I was unable to reproduce the 6x performance difference with this code reorganization on the latest master with the latest versions of the dependencies. It seems that a driver or other upstream fix resolved the original problem, so the unused global counter passed for the use_greedy code path is now optimized out.

Nevertheless, the cleanup of the wrapper code is still valid. We don't need to pass the global counter when it is not used, and we most definitely don't need to compile an additional kernel when it is not used at all. I am going to reword the PR title to reflect the change appropriately.
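As a rough illustration of the cleanup idea described above, the sketch below shows host-side wrapper logic that adds the counter to the kernel's argument list only on the greedy path. It makes no OpenCL calls, and everything in it apart from the `use_greedy` notion is an assumption made up for illustration: `LaunchPlan`, `planCsrmv`, the `USE_GREEDY` define, and the argument names are hypothetical and do not reflect ArrayFire's actual wrapper code.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one kernel launch; not ArrayFire's API.
struct LaunchPlan {
    std::vector<std::string> defines;  // compile-time options for the kernel
    std::vector<std::string> args;     // kernel arguments, in order
    bool needsCounterBuffer;           // whether a global counter is allocated
};

// Build the launch plan for a csrmv call. The counter is allocated and
// passed only when the greedy code path will actually read it.
LaunchPlan planCsrmv(bool useGreedy) {
    LaunchPlan plan;
    plan.defines = {useGreedy ? "USE_GREEDY=1" : "USE_GREEDY=0"};
    plan.args = {"output", "values", "rowIdx", "colIdx", "rhs"};
    plan.needsCounterBuffer = useGreedy;
    if (useGreedy) {
        plan.args.push_back("counter");  // only the greedy path uses it
    }
    return plan;
}
```

With this shape, the non-greedy path never allocates, resets, or passes a buffer that the kernel would ignore, so correctness does not depend on a driver optimizing the unused argument away.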

@9prady9 9prady9 changed the title from "Improve csrmv sparse-dense matmul performance slightly" to "Cleanup csrmv sparse-dense matmul OpenCL kernel wrapper code" Feb 25, 2021
@9prady9 9prady9 merged commit 3f080ba into arrayfire:master May 10, 2021
@9prady9 9prady9 deleted the issue2937 branch May 10, 2021 03:34