fix: better block/grid size determination #54
Codecov Report
Patch coverage:
Additional details and impacted files
@@            Coverage Diff            @@
##              main       #54   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files            8         8
  Lines          552       567   +15
  Branches        86        88    +2
=========================================
+ Hits           552       567   +15
Thanks @steven-murray. Looks fine to me. A more proficient eye than mine on the GPU code probably wouldn't hurt.
This fixes the setting of block and grid sizes. The total block size (the product of the block dimensions) has to be less than or equal to nthreads (usually 1024). This was failing when there were too many ants. I've now refactored the calculation into a function which does it properly (for the case where the first axis is fastest).
It may be worth checking whether CUDA itself has any preference for which axis should be fastest.
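For reviewers less familiar with the constraint, here is a minimal sketch of the kind of calculation described above. It is not the code from this PR: the function name `block_and_grid`, its parameters, and the 2D layout are assumptions for illustration only. It fills the first (fastest) axis of the block first, gives the second axis whatever thread budget is left, and then sizes the grid so every element is still covered.

```python
def block_and_grid(nx, ny, max_threads=1024):
    """Hypothetical helper: pick CUDA-style block and grid shapes for an
    (nx, ny) problem so that block_x * block_y <= max_threads.

    Assumes the first axis is the fastest one, so it is filled first.
    """
    if nx < 1 or ny < 1 or max_threads < 1:
        raise ValueError("nx, ny and max_threads must all be positive")

    # Give the fastest axis as many threads as it can use.
    block_x = min(nx, max_threads)
    # The second axis gets the remaining budget (at least one thread).
    block_y = min(ny, max(1, max_threads // block_x))

    # Ceiling division: enough blocks to cover every element on each axis.
    grid_x = (nx + block_x - 1) // block_x
    grid_y = (ny + block_y - 1) // block_y

    return (block_x, block_y, 1), (grid_x, grid_y, 1)


if __name__ == "__main__":
    # A naive (350, 350, 1) block would need 122500 threads, far over 1024.
    block, grid = block_and_grid(350, 350)
    print(block, grid)  # (350, 2, 1) (1, 175, 1)
```

Because the grid is rounded up, a kernel launched with these shapes would still need its own bounds check for threads that fall past `nx` or `ny`. On the fastest-axis question: CUDA numbers the threads in a block with `threadIdx.x` varying fastest, and warps are formed from consecutive thread numbers, which is why the sketch treats the first axis as the fast one.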