Support for 'delayed kernels' #569

Merged: 4 commits, Nov 25, 2020

maleadt (Member) commented Nov 24, 2020

When you need to introspect the compiled kernel, e.g. to determine a launch configuration, you either have to do the whole cudaconvert and cufunction dance manually, or use the hacky config=callback argument to @cuda. Both are pretty cumbersome, so here I introduce an alternative: @cuda delayed=true kernel(args...), returning a callable object you can then just introspect and finally call using kernel_object(args...; threads=..., blocks=..., shmem=...).
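
For reference, the manual route mentioned above might look roughly like the sketch below. It assumes a CUDA-capable GPU; the `vadd` kernel and the array sizes are hypothetical and not part of this PR. The `cudaconvert` / `cufunction` sequence follows the expansion described in the `@cuda` docstring.

```julia
using CUDA

# Hypothetical element-wise addition kernel, used only for illustration.
function vadd(a, b, c)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return nothing
end

a = CUDA.rand(Float32, 4096)
b = CUDA.rand(Float32, 4096)
c = similar(a)

args = (a, b, c)
GC.@preserve args begin
    # Convert host-side objects (CuArray) to their device-side counterparts.
    kernel_args = cudaconvert.(args)
    # cufunction compiles for the *types* of the converted arguments.
    kernel_tt = Tuple{Core.Typeof.(kernel_args)...}
    kernel = cufunction(vadd, kernel_tt)

    # Introspect the compiled kernel to pick a launch configuration.
    config = launch_configuration(kernel.fun)
    threads = min(length(c), config.threads)
    blocks = cld(length(c), threads)

    # Launch with the converted arguments.
    kernel(kernel_args...; threads=threads, blocks=blocks)
end
CUDA.synchronize()
```

Every step here has to be spelled out by hand, which is the overhead this PR aims to remove.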

cc @vchuravy, as KA probably uses the lower-level interface.

maleadt added the enhancement label Nov 24, 2020

maleadt (Member, Author) commented Nov 24, 2020

Also, I don't particularly like the delayed name, so bikeshed away!

maleadt mentioned this pull request Nov 24, 2020

maleadt (Member, Author) commented Nov 24, 2020

Or maybe it should be @cufunction.

codecov bot commented Nov 24, 2020

Codecov Report

Merging #569 (d7cd82e) into master (3c9e3dd) will decrease coverage by 0.06%.
The diff coverage is 96.07%.

@@            Coverage Diff             @@
##           master     #569      +/-   ##
==========================================
- Coverage   80.34%   80.28%   -0.07%     
==========================================
  Files         116      116              
  Lines        6889     6883       -6     
==========================================
- Hits         5535     5526       -9     
- Misses       1354     1357       +3     
Impacted Files              Coverage Δ
src/compiler/execution.jl   89.92% <81.81%> (-1.62%) ⬇️
examples/pairwise.jl        78.00% <100.00%> (-0.85%) ⬇️
examples/peakflops.jl       100.00% <100.00%> (ø)
src/accumulate.jl           97.14% <100.00%> (-0.23%) ⬇️
src/gpuarrays.jl            40.90% <100.00%> (-4.93%) ⬇️
src/indexing.jl             100.00% <100.00%> (ø)
src/mapreduce.jl            97.72% <100.00%> (-0.15%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3c9e3dd...c8a8573.

vchuravy (Member) commented

> KA probably uses the lower-level interface.

GPUifyLoops used to, but for KA I switched back to the simpler one. But yes, this is great; @cufunction is great.

maleadt (Member, Author) commented Nov 25, 2020

Changed the option from delayed=true to launch=false. Every @cuda invocation now returns the kernel object, too. I decided against naming it @cufunction because the macro takes values, whereas cufunction takes types.
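
With the final naming, the workflow from the PR description could be sketched as follows. This assumes a CUDA-capable GPU; the `vadd` kernel and the array sizes are hypothetical examples, not code from this PR.

```julia
using CUDA

# Hypothetical element-wise addition kernel, used only for illustration.
function vadd(a, b, c)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return nothing
end

a = CUDA.rand(Float32, 4096)
b = CUDA.rand(Float32, 4096)
c = similar(a)

# Compile the kernel for these argument values, but do not launch it yet.
kernel = @cuda launch=false vadd(a, b, c)

# Introspect the compiled kernel to determine a launch configuration.
config = launch_configuration(kernel.fun)
threads = min(length(c), config.threads)
blocks = cld(length(c), threads)

# Launch by calling the kernel object with the original (host-side) arguments.
kernel(a, b, c; threads=threads, blocks=blocks)
CUDA.synchronize()
```

Because the macro receives the argument values, no manual `cudaconvert` call or type tuple is needed before the launch.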

maleadt added the cuda kernels label Nov 25, 2020
maleadt merged commit 4655350 into master Nov 25, 2020
maleadt deleted the tb/delayed branch November 25, 2020 17:00
maleadt added a commit that referenced this pull request Jan 5, 2021: Support for 'delayed kernels'