Add transposition kernel optimised for small arrays #478

athas opened this issue Feb 19, 2018 · 0 comments


commented Feb 19, 2018

An expression like

map transpose xsss

will run very slowly if the elements of xsss (the individual two-dimensional arrays being transposed) are small. The reason is that when we generate OpenCL code for map-transpose, we assign one workgroup per array, which leads to poor occupancy if the arrays are small (even if there are many of them). The solution is to add a specialised version of transposition for small arrays, like what @RasmusWL did for very skinny and very wide arrays.
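
To make the occupancy problem concrete, here is a rough Python model. It is only a sketch under stated assumptions, not the code Futhark generates and not necessarily what the eventual fix does: the function names, the workgroup size of 256, and the idea of giving each work-item one element of a flat, back-to-back batch are all hypothetical choices for illustration.

```python
# Hypothetical model for illustration; not Futhark's generated OpenCL code.

def lane_utilisation_one_group_per_array(rows, cols, group_size):
    """Fraction of work-items in a workgroup that have an element to move
    when each workgroup handles exactly one rows x cols array."""
    elems = rows * cols
    # Arrays larger than a workgroup can keep every work-item busy, so cap at 1.
    return min(elems, group_size) / group_size


def batched_transpose_flat(xs, num_arrays, rows, cols):
    """Transpose num_arrays row-major rows x cols arrays stored back to back in xs.

    Each loop iteration plays the role of one work-item with global id gid,
    so work-items are shared across arrays instead of one workgroup per array.
    """
    elems = rows * cols
    out = [None] * (num_arrays * elems)
    for gid in range(num_arrays * elems):
        arr = gid // elems             # which small array this work-item is in
        r, c = divmod(gid % elems, cols)  # position inside that array
        # Element (r, c) of the input becomes element (c, r) of the output,
        # which has shape cols x rows.
        out[arr * elems + c * rows + r] = xs[gid]
    return out


# A 4x4 array uses 16 of 256 work-items under the one-workgroup-per-array scheme.
utilisation = lane_utilisation_one_group_per_array(4, 4, 256)

# Two 2x3 arrays, [[0,1,2],[3,4,5]] and [[6,7,8],[9,10,11]], stored flat.
transposed = batched_transpose_flat(list(range(12)), 2, 2, 3)
```

In this model, packing several small arrays into each workgroup keeps nearly all work-items busy regardless of how small the individual arrays are, as long as there are enough of them.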

This is a moderately urgent issue, as it affects one of the benchmarks for the ICFP paper.

@athas athas closed this in 203a6ee Feb 27, 2018
athas added a commit that referenced this issue Feb 28, 2018