Launch configuration: use ZE_extension_kernel_max_group_size_properties #430

Closed
maleadt opened this issue Apr 19, 2024 · 3 comments
Labels
kernels Things about kernels and how they are compiled.

Comments

@maleadt
Member

maleadt commented Apr 19, 2024

With prime-sized inputs the suggested group size always consists of only a single thread:

julia> k = @oneapi launch=false identity(nothing)

julia> oneL0.suggest_groupsize(k.fun, 521)
oneAPI.oneL0.ZeDim3(1, 1, 1)

julia> oneL0.suggest_groupsize(k.fun, 7877)
oneAPI.oneL0.ZeDim3(1, 1, 1)

julia> oneL0.suggest_groupsize(k.fun, 7919)
oneAPI.oneL0.ZeDim3(1, 1, 1)

But even with non-prime-sized inputs, the configuration looks highly suboptimal:

julia> oneL0.suggest_groupsize(k.fun, 8000)
oneAPI.oneL0.ZeDim3(64, 1, 1)

(this kernel can launch groups of up to 512 threads on this system)
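These results are consistent with a suggestion that must tile the input exactly. An exact cover requires the group size to divide the input size, and a prime has no divisors besides 1 and itself (all three primes above exceed 512). The plain-Julia sketch below makes no oneAPI.jl calls; `exact_groupsizes` is a hypothetical helper, and 512 is the limit quoted above. It lists the group sizes that would satisfy that constraint.

```julia
# Hypothetical helper: every group size that tiles `n` work-items exactly,
# i.e. each divisor of `n` that does not exceed the kernel's limit `maxsize`.
exact_groupsizes(n::Integer, maxsize::Integer) =
    [d for d in 1:min(n, maxsize) if n % d == 0]

for n in (521, 7877, 7919, 8000)
    println(n, " => ", exact_groupsizes(n, 512))
end
```

For the three primes the only candidate is 1, matching the output above. For 8000, 64 is a valid candidate but not the largest one (500 also divides 8000), so the driver presumably applies further heuristics of its own when choosing among them.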

Maybe I'm misinterpreting the use of this API? I thought it was a counterpart of the CUDA occupancy API (cuOccupancyMaxPotentialBlockSize), suggesting a group size that achieves reasonable occupancy.

@maleadt
Member Author

maleadt commented Apr 19, 2024

Filed upstream: intel/compute-runtime#725

maleadt added the libraries (Things about libraries and how we use them.) and upstream (Out of our hands.) labels Apr 19, 2024
@maleadt
Member Author

maleadt commented Apr 22, 2024

As upstream noted, this is expected: the suggested launch configuration exactly covers the input space. Since we perform bounds checks at run time and therefore don't need an exact cover, we can use more relaxed launch configurations. A workaround is implemented in #431, but once there's a new driver release we should use the Level Zero extension (ZE_extension_kernel_max_group_size_properties) to query the maximum group size for a given kernel.
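The relaxed alternative can be sketched in plain Julia. The helpers `relaxed_config` and `emulate_launch!` below are hypothetical; they are neither the code from #431 nor oneAPI.jl API. The approach is to pick the largest permitted group size, round the number of groups up, and let each work-item skip indices past the end of the input. The loop emulates a launch on the CPU purely to show that every element is visited exactly once.

```julia
# Hypothetical helper, not oneAPI.jl API: use the largest permitted group size
# and round the number of groups up, so the launch may overshoot `n` by up to
# `groupsize - 1` work-items.
function relaxed_config(n::Integer, maxsize::Integer)
    groupsize = max(1, min(n, maxsize))  # at least 1 to avoid dividing by zero
    ngroups = cld(n, groupsize)          # ceiling division
    return groupsize, ngroups
end

# CPU emulation of a bounds-checked kernel: each work-item computes its 1-based
# global index and does nothing if that index lies past the end of the input.
function emulate_launch!(out::AbstractVector{<:Integer}, groupsize::Integer, ngroups::Integer)
    for group in 0:ngroups-1, local_id in 1:groupsize
        i = group * groupsize + local_id
        i <= length(out) || continue     # the run-time bounds check
        out[i] += 1                      # count how often each element is visited
    end
    return out
end

gs, ng = relaxed_config(7919, 512)       # (512, 16): 8192 work-items for 7919 elements
visits = emulate_launch!(zeros(Int, 7919), gs, ng)
println(all(==(1), visits))              # every element visited exactly once
```

The cost of this design is a handful of idle work-items in the last group (8192 - 7919 = 273 here), in exchange for full-size groups regardless of whether the input size is prime.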

maleadt changed the title from "Confusing suggest_groupsize results" to "Launch configuration: use ZE_extension_kernel_max_group_size_properties" Apr 22, 2024
maleadt added the kernels (Things about kernels and how they are compiled.) label and removed the upstream (Out of our hands.) and libraries (Things about libraries and how we use them.) labels Apr 22, 2024
@maleadt
Member Author

maleadt commented May 16, 2024

Fixed by #431

maleadt closed this as completed May 16, 2024