
Conversation

@s-fuerst (Contributor) commented Nov 3, 2022

I ran into out-of-memory problems, first when I used MPI.Get (which I worked around by always reusing the same Buffer) and later also in Alltoall calls. This happens when derived datatypes are used in those calls and the MPI.Datatype constructor is invoked implicitly, as in this example:

using MPI

struct Derived
    val::Int64
end

MPI.Init()

comm = MPI.COMM_WORLD
size = MPI.Comm_size(comm)
rank = MPI.Comm_rank(comm)

send_counts = Vector{Cint}(1:size)
recv_counts = fill(Cint(rank+1), size)

send_vals = collect(Iterators.flatten([1:i for i = 1:size])) 
send_vals = map(Derived, send_vals)

A = Array{Derived}(send_vals)
C = Array{Derived}(undef, sum(recv_counts))

while true  # loop forever: without caching, memory use grows with each call
    MPI.Alltoallv!(VBuffer(A,send_counts), VBuffer(C,recv_counts), comm)
end

MPI.Finalize()

This PR fixes the problem by caching the created Datatypes, so that repeated Alltoallv! calls do not construct and commit the Datatype again (including an API.MPI_Type_create_struct call) on every invocation.

Co-authored-by: Valentin Churavy <vchuravy@users.noreply.github.com>
@simonbyrne (Member)

Would you be able to add a test?

@s-fuerst (Contributor, Author) commented Nov 7, 2022

I'm not sure what I should test. The changed Datatype constructor is already covered by many tests. Should I write something that checks that this fix really solves the memory leak?

@simonbyrne simonbyrne merged commit e28a639 into JuliaParallel:master Nov 7, 2022
@simonbyrne (Member)

Thanks!
