
nvgetcommunicator returning pointer to MPI_Comm leads to complicated designs. #275

Closed
bangerth opened this issue Apr 18, 2023 · 6 comments

Comments

@bangerth

For the deal.II wrappers of the N_Vector interface, we've had several lengthy discussions about how to deal with the ops.nvgetcommunicator callback. The basic problem is that this interface requires one to write a function that returns a pointer to a communicator -- which in turn requires that the communicator has a memory location at which it can be accessed.

But that's not always the case. Imagine, for example, that we're trying to wrap PETSc vectors. Then one might be tempted to write a function of this sort:

void *GetCommunicator(N_Vector v)
{
   Vec petsc_vec = /* unpack 'v' to get at the PETSc vector */;
   MPI_Comm comm = PetscObjectComm(reinterpret_cast<PetscObject>(petsc_vec));
   return &comm;
}

...
object->ops->nvgetcommunicator = &GetCommunicator;

Except, of course, this doesn't work: We're now returning a pointer to a local variable. In other words, short of finding out where exactly PETSc happens to store the communicator, we don't quite know what address to return.

There needs to be a better design than this. MPI_Comm objects are designed to be passed around by value, and asking for a void * to a specific memory location cannot be satisfied without an unnecessary amount of extra bookkeeping.
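To make the consequence concrete, here is a minimal sketch (not deal.II's actual code, and with a purely hypothetical content layout) of the kind of workaround the current interface forces on a wrapper: one has to keep a redundant copy of the communicator at a stable address inside the vector's content, solely so that the callback has something to point to.

#include <mpi.h>
#include <petscvec.h>                    // Vec, PetscObjectComm()
#include <sundials/sundials_nvector.h>   // generic N_Vector with its 'content' pointer

// Hypothetical content layout of a wrapper vector; deal.II's real layout differs.
struct WrapperContent
{
  Vec      petsc_vector;   // the wrapped PETSc vector
  MPI_Comm stored_comm;    // redundant copy, kept only so it has a stable address
};

void *GetCommunicator(N_Vector v)
{
  WrapperContent *content = static_cast<WrapperContent *>(v->content);
  content->stored_comm =
    PetscObjectComm(reinterpret_cast<PetscObject>(content->petsc_vector));
  return &content->stored_comm;   // stays valid for as long as the vector lives
}

Nothing in this sketch is conceptually necessary; the extra copy exists only because the callback must return a pointer rather than a value.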

@sebproell @stefanozampini FYI.
A pointer to one of the discussions we're having: dealii/dealii#15086 (comment)

@balos1
Member

balos1 commented Apr 20, 2023

Hi @bangerth. Thanks for raising the issue. We are having some internal discussions about what we can do here, and then I'll follow up.

@bangerth
Author

Thanks, @balos1!

I think I understand where the issue comes from, namely from wanting to support non-MPI installations: you don't want to presume that MPI_Comm is a known data type, and so you don't want to expose it in header files. But then you can't return objects of type MPI_Comm; you have to assume that these objects live somewhere in memory and return type-erased pointers (void*) to them instead.

The deal.II and PETSc libraries (and probably others as well) have worked around this by simply declaring a dummy type for non-MPI builds, to make sure that there is a type one can reference (and return). Internally, one then has to make sure that the functions returning these kinds of objects are never called when compiling without MPI support. But at least the interface is always well defined, without having to resort to void *.
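For illustration, a minimal sketch of that dummy-type pattern, assuming a hypothetical configuration macro MY_LIB_WITH_MPI (the actual deal.II and PETSc headers differ in the details), might look like this:

#ifdef MY_LIB_WITH_MPI
#  include <mpi.h>               // the real MPI_Comm
#else
// Dummy stand-ins so that MPI_Comm is a valid type in non-MPI builds as well.
typedef int MPI_Comm;
const MPI_Comm MPI_COMM_SELF = 0;
#endif

// This declaration is now well defined in every configuration and can return
// the communicator by value; the non-MPI variant simply must never be called
// in a context that needs a real communicator.
MPI_Comm get_communicator();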

@balos1
Member

balos1 commented May 8, 2023

@bangerth We are now planning to change this to a dummy type for MPI_Comm in non-MPI builds and MPI_Comm for MPI builds. However, this will necessitate a major release so it will be a few months before it is fixed in a released version of SUNDIALS.
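A rough sketch of what this plan could mean for the callback, with purely illustrative names (BUILT_WITH_MPI and SUNComm stand in for whatever identifiers the actual SUNDIALS release defines), would be:

#include <sundials/sundials_nvector.h>   // N_Vector

#ifdef BUILT_WITH_MPI
#  include <mpi.h>
typedef MPI_Comm SUNComm;   // the real communicator type in MPI builds
#else
typedef int SUNComm;        // dummy type in non-MPI builds
#endif

// The callback can then return the communicator by value instead of a void*:
SUNComm GetCommunicator(N_Vector v);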

gardner48 added a commit that referenced this issue Nov 28, 2023
Addresses #275. Requires MPI to be linked to the SUNDIALS core library when MPI is enabled; as a consequence, all-in-one MPI and non-MPI builds are no longer supported. Instead, users will have to build and install separately with and without MPI.

---------

Co-authored-by: Daniel R. Reynolds <reynolds@smu.edu>
Co-authored-by: Steven Roberts <roberts115@llnl.gov>
Co-authored-by: David J. Gardner <gardner48@llnl.gov>
gardner48 added a commit that referenced this issue Dec 18, 2023
Addresses #275. Requires MPI to be linked to the SUNDIALS core library when MPI is enabled; as a consequence, all-in-one MPI and non-MPI builds are no longer supported. Instead, users will have to build and install separately with and without MPI.

---------

Co-authored-by: Daniel R. Reynolds <reynolds@smu.edu>
Co-authored-by: Steven Roberts <roberts115@llnl.gov>
Co-authored-by: David J. Gardner <gardner48@llnl.gov>
@balos1
Member

balos1 commented Feb 8, 2024

@bangerth This issue has been addressed in our v7.0.0 release candidate. If you or someone on the deal.II team is able to test the release candidate in deal.II (actually, it may be better to point to the develop branch, which is the release candidate plus a few more fixes), especially the new way we handle the MPI_Comm, it would be greatly appreciated. We are planning to release the finalized v7.0.0 in a few weeks, and any feedback is welcome.

@balos1
Member

balos1 commented Mar 29, 2024

Closed by v7.0.0

@bangerth
Author

Thanks, @balos1! I meant to figure out how we need to adjust this in deal.II, but my colleague @marcfehling beat me to it yesterday. I will have to run the test suite and see what happens, but will assume for now that it all works fine!
