RMA improvements #812

jeffhammond · 2024-02-01T12:45:53Z

Adds:

win_allocate
win_lock_all
win_unlock_all
win_flush_local

Does not add:

win_flush_local_all
win_flush_all
rput, raccumulate, rget, rget_accumulate

I don't know how to compile or test this so I am relying on your CI to do that for me.

Signed-off-by: Jeff Hammond <jeff.science@gmail.com>

jeffhammond · 2024-02-01T13:38:29Z

I am not going to finish this until MPI.jl supports ARM. I do not get paid to write code for Intel anymore.

#404

giordano · 2024-02-01T14:06:22Z

@jeffhammond I'm sorry to read you closed the PR for this reason, especially since a few of us, me included, are enthusiastic about ARM chips. @vchuravy and I published a paper about using Julia (and MPI.jl) on A64FX in the Embracing Arm for HPC Workshop, and more recently I've been playing with Julia on NVIDIA Grace Hopper, so we really want to support ARM as good as possible!

MPI.jl generally works well on ARM (as any other architectures where Julia runs), the issue you linked to refers only to custom MPI reductions, which make use of LLVM trampolines which are just not supported upstream in LLVM! This issue also affected PowerPC, but then someone fixed the issue for that architecture in LLVM and then we automatically benefited from it. If your company has LLVM engineers to spare, getting someone to fix this issue would benefit all compiler frontends based on LLVM targeting ARM architectures, not just Julia and MPI.jl!

jeffhammond · 2024-02-01T14:12:12Z

I use custom MPI reductions quite a bit in my work, so they aren't something I can ignore. They are an MPI 1.0 feature AFAIK, so it doesn't make sense for me to work on RMA things if they aren't available (for the systems I'm using).

I will ask if anyone is willing to fix this, but I'm still disappointed that there is no workaround besides fixing LLVM.

giordano · 2024-02-01T14:22:32Z

I will ask if anyone is willing to fix this

Thanks! If that's of any help, this should be the upstream ticket: llvm/llvm-project#4100

but I'm still disappointed that there is no workaround besides fixing LLVM.

Julia uses LLVM quite a lot, so that's not totally unsurprising.

vchuravy · 2024-02-01T15:31:32Z

To be precise there are two ways to do C callable function in Julia.

function custom_op(x, y)
    x + y
end
cptr = @cfunction(custom_op, Float32, (Float32, Float32))

and the second way:

function create_cfunc(op)
     @cfunction($op, Float32, (Float32, Float32))
end
create_cfunc(custom_op)

The former works on all platforms that Julia support, the latter does not since it require compiler support that sofar we have struggled to figure out how to support it on AArch64. As far as I can tell it's equivalent to https://gcc.gnu.org/onlinedocs/gcc/Nested-Functions.html

I haven't dug into this for a couple of years, but I think C moved to function descriptors for this? AFAIK trampolines are not supported by the ABI on aarch64.

It is this second way that is used in

MPI.jl/src/operators.jl

Line 101 in 953088b

    
           fptr = @cfunction($w, Cvoid, (Ptr{Cvoid}, Ptr{Cvoid}, Ptr{Cint}, Ptr{MPI_Datatype}))

simonbyrne · 2024-02-02T17:05:54Z

I am not going to finish this until MPI.jl supports ARM. I do not get paid to write code for Intel anymore.

Unfortunately, no one else is paid to maintain or develop this package either, beyond what they immediately need for their own work. We're happy to discuss paths to fix the issue, but this sort of comment isn't really helpful.

jeffhammond · 2024-02-02T19:03:57Z

I'm sorry. I was exasperated because I could not build test my changes on my laptop (Apple M2), which makes development harder. I can try to contribute to this on the weekend.

I know this is an LLVM issue and I'm trying to help with that.

simonbyrne · 2024-02-03T22:05:18Z

I understand: the test suite is kind of clunky, and could also do with some more improvement, but I thought we disable those tests on non-supported platforms. Feel free to ask on Slack for any questions/frustrations.

Regarding custom operators, one particular challenge that is specific to MPI is that the API doesn't allow you to pass through a "thunk" pointer to the reduction function, which is how we usually avoid this problem for callbacks in other libraries. Do you think there would be any appetite adding an additional API that would support this to the MPI standard? Or have custom operators gone out of style since they're not supported for RDMA operations?

jeffhammond · 2024-02-04T07:43:34Z

Custom reductions aren't optimized but the use cases tolerate that. RDMA doesn't do most MPI ops anyways, certainly not in bulk.

I don't know what thunk means here but in Mukautuva, I have a callback trampoline that uses datatype attributes to carry state from the caller to the callee, so I can invoke a different function. Does that sound relevant?

jeffhammond added 4 commits February 1, 2024 14:34

win_allocate

458cdc5

Signed-off-by: Jeff Hammond <jeff.science@gmail.com>

win_(un)lock_all

effda8b

Signed-off-by: Jeff Hammond <jeff.science@gmail.com>

flush_local

b50be2a

Signed-off-by: Jeff Hammond <jeff.science@gmail.com>

docs

9a1fb89

Signed-off-by: Jeff Hammond <jeff.science@gmail.com>

jeffhammond closed this Feb 1, 2024

jeffhammond deleted the rma-updates branch February 1, 2024 13:42

jeffhammond restored the rma-updates branch February 1, 2024 13:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RMA improvements #812

RMA improvements #812

jeffhammond commented Feb 1, 2024

jeffhammond commented Feb 1, 2024

giordano commented Feb 1, 2024

jeffhammond commented Feb 1, 2024

giordano commented Feb 1, 2024

vchuravy commented Feb 1, 2024 •

edited

Loading

simonbyrne commented Feb 2, 2024

jeffhammond commented Feb 2, 2024

simonbyrne commented Feb 3, 2024

jeffhammond commented Feb 4, 2024

RMA improvements #812

RMA improvements #812

Conversation

jeffhammond commented Feb 1, 2024

jeffhammond commented Feb 1, 2024

giordano commented Feb 1, 2024

jeffhammond commented Feb 1, 2024

giordano commented Feb 1, 2024

vchuravy commented Feb 1, 2024 • edited Loading

simonbyrne commented Feb 2, 2024

jeffhammond commented Feb 2, 2024

simonbyrne commented Feb 3, 2024

jeffhammond commented Feb 4, 2024

vchuravy commented Feb 1, 2024 •

edited

Loading