Replace `copy_stack_field_down!` with broadcasts of reshaped MPIStateArrays #1301
Conversation
@jkozdon if this works on the GPU, should I just remove

bors try

try

Build failed:

@leios can you check why the ocean tests failed on the GPU on the slurm CI? this is a practical use case and it seems to not work
No real opinion.
If you cannot get it working with reshapes of the underlying MPIStateArray, you can also just do it with the data directly.
Yeah, so looking at this further, the types for the broadcast that are failing are:

PR #1071 allows for broadcasting of reshaped MPIStateArrays, but this might be more closely related to an issue with Adapt itself.

Right. Just saw that. This has nothing to do with #1071, since it broadcasts the data based on the vars views of the data.
This seems likely.
```julia
boxy_w = reshape(A.w, Nq^2, Nqk, 1, nelemv, nelemh)
flat_w = @view boxy_w[:, end, :, end, :]
flat_wz0 = reshape(flat_w, Nq^2, 1, 1, 1, nelemh)
boxy_wz0 = reshape(A.wz0, Nq^2, Nqk, 1, nelemv, nelemh)
boxy_wz0 .= flat_wz0
```
Would adding `.data` after `A.w` and `A.wz0` solve the issue? The split-explicit code we developed with @jm-c does this quite a lot.
It's worth a shot. If that doesn't work, we can export the MPIStateArray functions we created to find `.data` of adapted MPIStateArrays.
Nothing to do with CLIMA or MPIStateArrays:

```julia
using CUDA;
CUDA.allowscalar(false);
Nq = 5;
Nqk = 5;
nelemv = 10;
nelemh = 100;
nelemh_ghost = 10;
nstate = 5;
Q = CuArray{Float64, 3}(undef, nstate, Nq * Nq * Nqk, nelemv * nelemh);
A = CuArray{Float64, 3}(undef, nstate, Nq * Nq * Nqk, nelemv * nelemh);
w = view(Q, 3, :, :);
wz0 = view(A, 1, :, :);
boxy_w = reshape(w, Nq^2, Nqk, 1, nelemv, nelemh);
flat_w = @view boxy_w[:, end, :, end, :];
flat_wz0 = reshape(flat_w, Nq^2, 1, 1, 1, nelemh);
boxy_wz0 = reshape(wz0, Nq^2, Nqk, 1, nelemv, nelemh);
boxy_wz0 .= flat_wz0;
```

will produce:
ERROR: scalar getindex is disallowed
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] assertscalar(::String) at /home/jekozdon/.julia/packages/GPUArrays/X4SqE/src/host/indexing.jl:41
[3] getindex at /home/jekozdon/.julia/packages/GPUArrays/X4SqE/src/host/indexing.jl:96 [inlined]
[4] getindex at ./subarray.jl:245 [inlined]
[5] _unsafe_getindex_rs at ./reshapedarray.jl:249 [inlined]
[6] _unsafe_getindex at ./reshapedarray.jl:246 [inlined]
[7] getindex at ./reshapedarray.jl:234 [inlined]
[8] getindex at ./subarray.jl:236 [inlined]
[9] _unsafe_getindex_rs(::SubArray{Float64,3,Base.ReshapedArray{Float64,5,SubArray{Float64,2,CuArray{Float64,3,Nothing},Tuple{Int64,Base.Slice{Base.OneTo{Int64}},Base.Slice{Base.OneTo{Int64}}},true},Tuple{}},Tuple{Base.Slice{Base.OneTo{Int64}},Int64,Base.Slice{Base.OneTo{Int64}},Int64,Base.Slice{Base.OneTo{Int64}}},false}, ::Tuple{Int64,Int64,Int64}) at ./reshapedarray.jl:249
[10] _unsafe_getindex(::Base.ReshapedArray{Float64,5,SubArray{Float64,3,Base.ReshapedArray{Float64,5,SubArray{Float64,2,CuArray{Float64,3,Nothing},Tuple{Int64,Base.Slice{Base.OneTo{Int64}},Base.Slice{Base.OneTo{Int64}}},true},Tuple{}},Tuple{Base.Slice{Base.OneTo{Int64}},Int64,Base.Slice{Base.OneTo{Int64}},Int64,Base.Slice{Base.OneTo{Int64}}},false},Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}, ::Int64, ::Int64, ::Int64, ::Int64, ::Int64) at ./reshapedarray.jl:246
[11] getindex at ./reshapedarray.jl:234 [inlined]
[12] _getindex at ./abstractarray.jl:1020 [inlined]
[13] getindex at ./abstractarray.jl:980 [inlined]
[14] _broadcast_getindex at ./broadcast.jl:597 [inlined]
[15] _getindex at ./broadcast.jl:628 [inlined]
[16] _broadcast_getindex at ./broadcast.jl:603 [inlined]
[17] getindex at ./broadcast.jl:564 [inlined]
[18] macro expansion at ./broadcast.jl:910 [inlined]
[19] macro expansion at ./simdloop.jl:77 [inlined]
[20] copyto! at ./broadcast.jl:909 [inlined]
[21] copyto! at ./broadcast.jl:864 [inlined]
[22] materialize!(::Base.ReshapedArray{Float64,5,SubArray{Float64,2,CuArray{Float64,3,Nothing},Tuple{Int64,Base.Slice{Base.OneTo{Int64}},Base.Slice{Base.OneTo{Int64}}},true},Tuple{}}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{5},Nothing,typeof(identity),Tuple{Base.ReshapedArray{Float64,5,SubArray{Float64,3,Base.ReshapedArray{Float64,5,SubArray{Float64,2,CuArray{Float64,3,Nothing},Tuple{Int64,Base.Slice{Base.OneTo{Int64}},Base.Slice{Base.OneTo{Int64}}},true},Tuple{}},Tuple{Base.Slice{Base.OneTo{Int64}},Int64,Base.Slice{Base.OneTo{Int64}},Int64,Base.Slice{Base.OneTo{Int64}}},false},Tuple{Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64},Base.MultiplicativeInverses.SignedMultiplicativeInverse{Int64}}}}}) at ./broadcast.jl:823
[23] top-level scope at REPL[69]:1
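For reference, the wrapper nesting that trips up the GPU broadcast can be reproduced on the CPU alone. This is a sketch with the same made-up sizes as above, using a plain `Array` in place of the `CuArray`: reshaping a view of a reshaped view yields nested `Base.ReshapedArray`/`SubArray` wrappers rather than a device array, which is what forces CUDA into scalar indexing.

```julia
# CPU-only sketch (hypothetical sizes): the same reshape/view/reshape chain
# as the MWE above, applied to a plain Array.
Nq, Nqk, nelemv, nelemh, nstate = 5, 5, 10, 100, 5
Q = zeros(nstate, Nq * Nq * Nqk, nelemv * nelemh)
w = view(Q, 3, :, :)                        # non-contiguous view of state 3
boxy_w = reshape(w, Nq^2, Nqk, 1, nelemv, nelemh)
flat_w = @view boxy_w[:, end, :, end, :]
flat_wz0 = reshape(flat_w, Nq^2, 1, 1, 1, nelemh)
# The result is a ReshapedArray wrapping a SubArray wrapping a ReshapedArray;
# none of these wrappers is recognized as a device array by the broadcast.
@assert flat_wz0 isa Base.ReshapedArray
@assert parent(flat_wz0) isa SubArray
```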
Yeah, then we have to mess around with Adapt to get this to work... unless there is a way to simplify the types (specifically
If you reshape before the views this would work:

```julia
using CUDA;
CUDA.allowscalar(false);
Nq = 5;
Nqk = 5;
nelemv = 10;
nelemh = 100;
nelemh_ghost = 10;
nstate = 5;
A = CuArray{Float64, 3}(undef, nstate, Nq * Nq * Nqk, nelemv * nelemh);
B = reshape(A, nstate, Nq^2, Nqk, nelemv, nelemh);
w = view(B, 3, :, :, :, :);
wz0 = view(B, 1, :, 1:1, 1:1, :);
w .= wz0;
```

To do this you need to access the data, then reshape, and pull out the indices you want.
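The reshape-then-view ordering is easy to check on the CPU (a sketch with the same made-up sizes, using a plain `Array` in place of the `CuArray`): both views alias the parent array, so the broadcast writes land in the original storage.

```julia
# CPU sketch: reshape first, then view. Both views alias the parent array,
# so the broadcast writes back into A itself.
Nq, Nqk, nelemv, nelemh, nstate = 5, 5, 10, 100, 5
A = zeros(nstate, Nq * Nq * Nqk, nelemv * nelemh)
B = reshape(A, nstate, Nq^2, Nqk, nelemv, nelemh)
w   = view(B, 3, :, :, :, :)       # full state-3 field
wz0 = view(B, 1, :, 1:1, 1:1, :)   # surface slab of state 1
fill!(wz0, 42.0)                   # stand-in for the computed w(z=0)
w .= wz0                           # broadcast the surface value down the stack
@assert all(==(42.0), A[3, :, :])  # the writes reached the original array
```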
```julia
# project w(z=0) down the stack
boxy_w = reshape(A.data[:, index_w, :], Nq^2, Nqk, 1, nelemv, nelemh)
flat_w = @view boxy_w[:, end, :, end, :]
flat_wz0 = reshape(flat_w, Nq^2, 1, 1, 1, nelemh)
boxy_wz0 = reshape(A.data[:, index_wz0, :], Nq^2, Nqk, 1, nelemv, nelemh)
```
I don't think this will work: you are creating an array slice, not a view, so you are modifying a newly created array rather than the original one.
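A minimal standalone illustration of the copy-vs-view distinction (hypothetical sizes):

```julia
# Indexing with [] materializes a copy; writes to it never reach the original.
data = zeros(4, 3, 2)
slice = data[:, 2, :]             # copy of the second state
slice .= 1.0
@assert all(==(0.0), data)        # original untouched
v = @view data[:, 2, :]           # view aliases the original storage
v .= 1.0
@assert all(==(1.0), data[:, 2, :])
```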
Also, you haven't accounted for the ghost elements, so I don't think it will work in parallel.
Possibly it would work in parallel once you fix the views with:

```julia
Nq, Nqk, _, _, nelemv, nelemh, _, _ = basic_grid_info(dg)
```

above, though you will be copying down the ghost elements too.
Ah, I think you are right. This worked on my GPU, but we need to think of a more general solution.
I think you need to reshape, then do the views (like I did here: #1301 (comment))
The reshape returns a new CuArray (pointing to the same data) which is why you miss the double indirection problem.
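A quick CPU check of that sharing behavior (a sketch with a plain `Array` standing in for the `CuArray`): `reshape` of an `Array` returns another `Array` over the same memory, while `reshape` of a view produces a `Base.ReshapedArray` wrapper, which is where the double indirection comes from.

```julia
A = zeros(4, 6)
B = reshape(A, 2, 12)        # same memory, new shape
B[1, 1] = 7.0
@assert A[1, 1] == 7.0       # write through the reshape is visible in A
@assert B isa Array          # reshape of an Array is still an Array
v = view(A, 1, :)            # non-contiguous row view
@assert reshape(v, 2, 3) isa Base.ReshapedArray  # reshape of a view wraps it
```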
Looks good to me (lurking) - is there an extra `,1,` after `num_aux_state`, or is that needed for some more magic?

Ah, it is an extra `,1,`! That's what my parenthetical was about.
o.o I think @jkozdon is editing it, the indices just jumped
This (original code) doesn't work, not even on a CPU, so we need to make some kind of further change for sure.
Couple of typos for sure. Updated my suggestion.
@jkozdon can you refresh me on the `MPIStateArray.data` layout? I thought it was `["GL points in element", "states", "elements"]`, but that doesn't match up with your views (or even the first `reshape`; shouldn't the `1` be a `:`?).
data ordering is [dof, state, elements]
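With that `[dof, state, elements]` ordering, a single state field is a view along the second dimension. A sketch with made-up sizes (`index_w` and the dimensions are hypothetical):

```julia
# Stand-in for MPIStateArray.data with the [dof, state, element] layout.
ndof, nstate, nelem = 125, 5, 1000
data = rand(ndof, nstate, nelem)
index_w = 3
w = @view data[:, index_w, :]      # 2-D view of one state, no copy
@assert size(w) == (ndof, nelem)
@assert w[1, 1] == data[1, index_w, 1]
```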
just pushed a commit based on that fix, worked on the CPU, about to try it on the GPU
Hmm, didn't succeed on GPU due to the reshape dimensions not matching the array's `.data`: https://gist.github.com/climabot/4a0358efe092e8b8015870c489a0e2df#file-out_10192381

Any ideas @jkozdon? It works fine on CPU.
bors try

try

Build failed:
1fbfd6e to ed1c682
bors try

try

Build failed:
Does this mean slurm/CI succeeded or that it wasn't tried at all?

I think it is still trying those.
b6468b6 to bc15c1c
bors try

Fine by me to remove it. I believe that you are the only ones using this functionality anyway.
- modifying HBM to no longer use vars
- beautification
- make basic_grid_info return NamedTuple
- use views of `.data` instead of reshapes of slices
- account for ghost elements
- clean up and doc fix

Co-authored-by: leios <jrs.schloss@gmail.com>
Co-authored-by: Jeremy E Kozdon <jekozdon@nps.edu>
try

Build succeeded:
bc15c1c to 59f4828
Okay, I went ahead and removed them in a separate commit.

bors r+

Two commits. Love it!
haha I know that both commits will pass CI :) But we had a conversation at MIT earlier today about this and sometimes multiple commits are justified for clarity. Also, I think we said we could have the auto-bug checker thing run only on bors commits, right?
1301: Replace `copy_stack_field_down!` with broadcasts of reshaped MPIStateArrays r=blallen a=blallen

# Description

After #1071 is merged, we no longer need a dedicated kernel to copy the vertical velocity at the top of the ocean (`wz0`) down the entire stack for use in the source term for `η`. We can also use this PR to discuss how we want the `basic_grid_info` function to work.

Co-authored-by: Brandon Allen <ballen@mit.edu>
> haha I know that both commits will pass CI :) But we had a conversation at MIT earlier today about this and sometimes multiple commits are justified for clarity. Also, I think we said we could have the auto-bug checker thing run only on bors commits, right?

Yeah, there is a way to get git-bisect to run on just bors commits.
Build failed:

bors r+
Description

After #1071 is merged, we no longer need a dedicated kernel to copy the vertical velocity at the top of the ocean (`wz0`) down the entire stack for use in the source term for `η`. We can also use this PR to discuss how we want the `basic_grid_info` function to work.

I have:

- `tests/runtests.jl`
- `julia .dev/climaformat.jl .`

For review by CLIMA developers:

- `julia .dev/format.jl` has been run in a separate commit.