Detect serial build of ESMF? #2553

Closed
tclune opened this issue Jan 24, 2024 · 10 comments · Fixed by #2556

@tclune
Collaborator

tclune commented Jan 24, 2024

A recent GCHP issue might have been resolved more quickly if MAPL did a check to verify that ESMF has been built with MPI. Please check to see if there is some query we could do (and then throw an error).

@tclune tclune added the ❓ Question Further information is requested label Jan 24, 2024
@bena-nasa
Collaborator

bena-nasa commented Jan 24, 2024

@tclune I'm not seeing anything in the API, either in ESMF_VMGet or in the ESMF_Initialize call, unless perhaps the MPI communicator returned from ESMF_VMGet is always null when ESMF is built without MPI?
I see this cryptic remark in several places in the user manual:
"Not supported in ESMF_COMM=mpiuni mode."
But there's no context on how one can detect this.

@mathomp4 do we have a no-MPI ESMF build of Baselibs we could play with?
Also, how can MAPL even function without MPI, it is so ubiquitous?

@tclune
Collaborator Author

tclune commented Jan 24, 2024

> Also, how can MAPL even function without MPI, it is so ubiquitous?

Well, you can't. But the error that you get early on is not very informative, and it happens while doing the analysis of cores-per-node or some such. One possibility is simply to change that message to indicate that MPIUNI is a likely culprit.

@mathomp4
Member

Let's consider two ways: build-time detection and run-time detection.

At build-time, from my staring at FindESMF.cmake, I don't think ESMF_COMM is exposed mainly because I don't think it's exposed in esmf.mk. It's in a comment, but commented entries are skipped. Now, I could probably hack FindESMF.cmake to also skip all commented lines but then also look for the one that is # ESMF_COMM:...
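A minimal sketch of that build-time idea, written in Python rather than CMake so it can be tried standalone. It assumes esmf.mk records the setting as a comment line of the form `# ESMF_COMM: <value>`; the exact spacing, the helper names `find_esmf_comm` and `is_serial_build`, and the sample file contents are all hypothetical and not taken from FindESMF.cmake or a real ESMF install.

```python
import re
from typing import Optional

# Assumption: esmf.mk records build options as comment lines such as
# "# ESMF_COMM: openmpi". The exact spacing in a real file may differ.
_COMM_PATTERN = re.compile(r"^#\s*ESMF_COMM\s*:\s*(\S+)\s*$")


def find_esmf_comm(esmf_mk_text: str) -> Optional[str]:
    """Return the ESMF_COMM value found in a comment line, or None."""
    for line in esmf_mk_text.splitlines():
        match = _COMM_PATTERN.match(line.strip())
        if match:
            return match.group(1)
    return None


def is_serial_build(esmf_mk_text: str) -> bool:
    """True only when the recorded ESMF_COMM is mpiuni (the single-process MPI stub)."""
    comm = find_esmf_comm(esmf_mk_text)
    return comm is not None and comm.lower() == "mpiuni"


# Hypothetical esmf.mk excerpt, for illustration only.
SAMPLE_ESMF_MK = """\
ESMF_F90COMPILER=mpif90
# ESMF_COMM: mpiuni
# ESMF_BOPT: O
"""

if __name__ == "__main__":
    if is_serial_build(SAMPLE_ESMF_MK):
        print("ESMF appears to have been built with ESMF_COMM=mpiuni")
```

If the comment line is absent, `find_esmf_comm` returns `None` and `is_serial_build` returns `False`, so a missing entry is treated as "unknown" rather than as a serial build.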

So then perhaps at run-time? But for that I probably need to invoke the names of @theurich and @oehmke and see if they have thoughts? Maybe there is some ESMF_Initialize-like call that acts differently under mpiuni?

Or maybe we add a CMake style try_compile where if the ESMF folks know of a call that fails under mpiuni we could say "Hey, this simple ESMF code failed, your ESMF must have been built as mpiuni" or the like.

@tclune
Collaborator Author

tclune commented Jan 25, 2024

We can query the VM and get numPets. I'm not sure that works with MPIUNI, but given the interface we would expect a result of 1. The question then is what happens if we run, say, doubly-periodic on 1 core for testing purposes, since that would also report 1 ....

Hopefully @theurich has a better solution. We could ask for a patch to ESMF for this, but that seems heavy for one problem hit by one student using a non-GMAO application ...

@oehmke

oehmke commented Jan 25, 2024 via email

@mathomp4
Member

Ooh. Thanks @oehmke! I'll test it out with an mpiuni Baselibs I have around.

@oehmke

oehmke commented Jan 26, 2024 via email

You’re welcome! Let me know if it doesn’t do what you want.

@mathomp4 mathomp4 linked a pull request Jan 26, 2024 that will close this issue
@mathomp4
Member

@oehmke I have a PR that seems to do it: #2556

But new question: how much do I need to belt-and-suspender this? For example, do I need to first call ESMF_VMIsCreated? Or is that just overkill in that a VM is always created after a successful ESMF_Initialize call (which is right above this)?

Also, is there a preference between using ESMF_VMGetCurrent vs ESMF_VMGetGlobal? Or perhaps a better call to get the VM should be used?

I ask only because I rarely do ESMF coding and I figure I should do it "right" when I do. 😄

@theurich

@mathomp4 - you are guaranteed a valid global VM after ESMF_Initialize() returns successfully. Right under ESMF_Initialize(), the current VM is identical to the global VM, so either of those calls would work.
Even simpler: just add the vm argument to your ESMF_Initialize() call, and it will return the global VM to you without any extra work. Then access esmfComm via ESMF_VMGet() as suggested by @oehmke, and you should be all set.

@mathomp4
Member

@theurich Oooh. Nice. One less call is always good! Thanks!
