Use KernelAbstractions to accelerate MultilayerQG.streamfunctionfrompv!
#112
Comments
The first step is to write a kernel, which will look something like

```julia
@kernel function invert_column!(ψh, qh, S⁻¹)
    i, j = @index(Global, NTuple)
    # Note: the broadcast below only works if ψh[i, j] and qh[i, j] are themselves columns (vectors)
    @inbounds ψh[i, j] .= S⁻¹[i, j] * qh[i, j]
end
```

The next step is to create a work layout over which the kernel is launched. If we restrict attention to models that always have more than 32 grid points, we can use something like

```julia
# Larger workgroups are generally more efficient. For more generality, we could add an if statement
# that behaves differently when either nkr or nl is less than 16
workgroup = 16, 16

# The size determines how many times the kernel is run
worksize = grid.nkr, grid.nl

# This (and its usage below) will ensure the kernel is not run _before_ the data in qh is available
barrier = Event(dev)

# Creates a loop over the specified worksize, using workgroup to organize the computation
loop_invert_column! = invert_column!(dev, workgroup, worksize)

# Launch the kernel
event = loop_invert_column!(ψh, qh, params.invS, dependencies=barrier)

# This will ensure that no other operations occur until the kernel has finished
wait(dev, event)
```
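The comment about handling grids smaller than 16 points could be sketched as follows. This is only a plain-Julia illustration, not code from GeophysicalFlows.jl or KernelAbstractions.jl: the helper names `choose_workgroup` and `number_of_workgroups` are hypothetical, and the tuple `(33, 64)` stands in for `(grid.nkr, grid.nl)`. It also assumes that the launcher covers a worksize that is not a multiple of the workgroup by rounding the number of workgroups up.

```julia
# Hypothetical helper (not part of any library): cap each workgroup dimension
# at the corresponding grid dimension so that small grids still get a valid layout.
choose_workgroup(worksize::NTuple{2,Int}; maxsize::Int=16) = min.(maxsize, worksize)

# Hypothetical helper: number of workgroups needed to cover the worksize
# along each dimension, using ceiling division.
number_of_workgroups(worksize, workgroup) = cld.(worksize, workgroup)

worksize  = (33, 64)                      # stand-in for (grid.nkr, grid.nl)
workgroup = choose_workgroup(worksize)    # (16, 16)
ngroups   = number_of_workgroups(worksize, workgroup)  # (3, 4)

small_worksize  = (5, 8)                  # a grid with fewer than 16 points per dimension
small_workgroup = choose_workgroup(small_worksize)     # (5, 8)
```

Using `min.` avoids an explicit `if` branch; whether a smaller workgroup is actually efficient on a given device would still need to be measured.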
One thing I am not totally sure about is whether |
By the way, I think this optimization also requires the columns of |
With this last suggestion would x, y FFTs work nicely? |
Oof, good point. Hmm, maybe we need to hand-write the matrix-matrix multiply then. Not sure. |
yes it's been coming to haunt us either way... |
Something like

```julia
@kernel function invert_column!(ψh, qh, S⁻¹)
    i, j = @index(Global, NTuple)
    # Views of the vertical column at horizontal index (i, j)
    ψh_column = view(ψh, i, j, :)
    qh_column = view(qh, i, j, :)
    @inbounds ψh_column .= S⁻¹[i, j] * qh_column
end
```

might work.
Otherwise, a kernel along the lines of

```julia
using KernelAbstractions.Extras.LoopInfo: @unroll

@kernel function invert_column!(ψh, qh, S⁻¹, nz)
    i, j = @index(Global, NTuple)
    # Hand-written matrix-vector product over the vertical index
    @unroll for k = 1:nz
        @inbounds ψh[i, j, k] = 0
        @unroll for m = 1:nz
            @inbounds ψh[i, j, k] += S⁻¹[i, j][k, m] * qh[i, j, m]
        end
    end
end
```

might work as an alternative. Or maybe my indices are screwed up; whichever is correct. Nothing here is too difficult, it's just a matter of trying it out.
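One way to check the indices in the hand-written loop is to compare it against a matrix-vector product on the CPU. The sketch below is plain Julia with no GPU packages. It assumes that `ψh` and `qh` have size `(nkr, nl, nz)` and that `S⁻¹` is an `nkr × nl` array of `nz × nz` matrices, which is how the kernels above index them. The function name `invert_columns_reference!` and the test data are made up for illustration.

```julia
using LinearAlgebra

# Hypothetical CPU reference: the same k, m loop as the kernel above,
# wrapped in explicit loops over the horizontal indices i, j.
function invert_columns_reference!(ψh, qh, S⁻¹)
    nkr, nl, nz = size(qh)
    for j in 1:nl, i in 1:nkr
        for k in 1:nz
            acc = zero(eltype(ψh))
            for m in 1:nz
                acc += S⁻¹[i, j][k, m] * qh[i, j, m]
            end
            ψh[i, j, k] = acc
        end
    end
    return ψh
end

# Small made-up problem: 3 × 4 horizontal points, 2 layers.
nkr, nl, nz = 3, 4, 2
qh  = [complex(float(i + j), float(k)) for i in 1:nkr, j in 1:nl, k in 1:nz]
S⁻¹ = [[1.0 i; j 2.0] for i in 1:nkr, j in 1:nl]
ψh  = similar(qh)

invert_columns_reference!(ψh, qh, S⁻¹)

# Each column should equal the matrix-vector product S⁻¹[i, j] * qh[i, j, :].
indices_ok = all(ψh[i, j, :] ≈ S⁻¹[i, j] * qh[i, j, :] for i in 1:nkr, j in 1:nl)
```

If `indices_ok` is `true`, the `[k, m]` ordering in the inner loop matches the matrix-vector product, and the same loop body can be moved into the kernel.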
I should resurrect this.. |
What about https://github.com/mcabbott/Tullio.jl to the rescue? (just a random thought) |
There's probably a lot of solutions! I think I gave two, but there might be more. |
KernelAbstractions.jl can be used to accelerate the function in GeophysicalFlows.jl/src/multilayerqg.jl, lines 299 to 302 in 47a2b51.

A simple example showing how to use KernelAbstractions is the "Naive Transpose": https://juliagpu.gitlab.io/KernelAbstractions.jl/examples/naive_transpose/