Adds a new ELM domain decomposition algorithm #5690
Initial check-in of a 'simple' domain decomposition for ELM in which the number of active land grid cells is divided by the number of clumps.
I tested a case
|
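The 'simple' decomposition described in the PR summary can be illustrated with a minimal Python sketch. This is not the ELM Fortran implementation: the function name `simple_decomp`, the contiguous ordering of cells, and the way the remainder is spread over the first clumps are all assumptions made for illustration.

```python
def simple_decomp(active_cells, nclumps):
    """Split active land grid cells into nclumps near-equal chunks.

    Hypothetical sketch: cells are assigned in contiguous blocks and any
    remainder is given one cell at a time to the first clumps.
    """
    if nclumps < 1:
        raise ValueError("nclumps must be at least 1")

    base, extra = divmod(len(active_cells), nclumps)
    clumps = []
    start = 0
    for c in range(nclumps):
        # The first `extra` clumps receive one additional cell.
        size = base + (1 if c < extra else 0)
        clumps.append(list(active_cells[start:start + size]))
        start += size
    return clumps


if __name__ == "__main__":
    # 10 active cells over 4 clumps gives clump sizes 3, 3, 2, 2.
    for i, clump in enumerate(simple_decomp(range(10), 4)):
        print(i, clump)
```

With this layout no clump holds more than one cell beyond any other, which is the load-balance property that dividing the active cell count by the clump count suggests.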
@bishtgautam To count the number of MPI_Bcast calls made inside SCORPIO, you can use the following feature branch for testing:
This branch prints the number of MPI_Bcast calls in e3sm.log: |
For the I1850ELM case outlined above, the MPI_Bcast counter remains unchanged:
|
@bishtgautam look for this line in the lnd.log file: And let us know what it says for each decomp type. |
|
That might help with the coupler performance. But I think I'm wrong about decomp affecting the number of bcast calls. When to broadcast is driven more by the logic of the routine: e.g. was something computed or read on one processor? That won't change with decomp. |
Yes, this makes sense. |
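The reasoning in this exchange can be illustrated with a toy model. It does not use MPI and is not SCORPIO code; `BcastCounter`, `write_history`, and the one-broadcast-per-variable rule are hypothetical stand-ins for "the logic of the routine".

```python
class BcastCounter:
    """Toy stand-in for a broadcast wrapper that counts its calls."""

    def __init__(self):
        self.calls = 0

    def bcast(self, value):
        # A real broadcast would send `value` from a root rank to all ranks.
        self.calls += 1
        return value


def write_history(var_names, decomp, counter):
    """Hypothetical output routine.

    `decomp` maps rank -> owned cell indices. It decides who owns which
    data, but the broadcast below is triggered once per variable, so the
    count does not depend on it.
    """
    for name in var_names:
        counter.bcast(name)  # assumed: root shares per-variable metadata
    return counter.calls


if __name__ == "__main__":
    variables = ["VAR_A", "VAR_B", "VAR_C"]  # placeholder names
    block = {0: [0, 1, 2], 1: [3, 4, 5]}
    round_robin = {0: [0, 2, 4], 1: [1, 3, 5]}
    print(write_history(variables, block, BcastCounter()))
    print(write_history(variables, round_robin, BcastCounter()))
```

In this model the count changes only when the list of variables changes, not when the cells are redistributed between ranks.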
I ran simulations on Perlmutter in which I increased the number of variables to be outputted in
These simulations used round-robin domain decomposition and the following
|
How is the above different from the case that had 87854 broadcasts? |
@dqwu does your counter count all mpi-bcast calls in the executable? Not just SCORPIO? |
Only those calls inside SCORPIO. There is no way for SCORPIO to count other calls directly made inside E3SM Fortran code. |
@rljacob The case with 87854 MPI_Bcasts was writing all (~270) ELM variables in |
Initial check-in of a 'simple' domain decomposition for ELM in which the number of active land grid cells is divided by the number of clumps. [BFB]
Merged to next |
@bishtgautam It seems that there are inconsistencies with this PR: DIFFs are showing, but my final tests had everything passing. I tested this merged to next 3 different times because the first time with Not sure what could give different behavior, but I do see that the cases that DIFF have an EXEROOT set to the new simple-decomp test root: But my tests that PASS have a different EXEROOT ( So changing the new test to not be in the shared executable group may fix this, but just looking at the code changes I didn't see any obvious reason for that. |
re-merge to fix unexpected DIFFs
re-merged to next |
@@ -41,6 +41,7 @@ | |||
"time" : "0:45:00", | |||
"tests" : ( | |||
"ERS.f09_g16.IELMBC", | |||
"ERS.f09_g16.IELMBC.elm-simple_decomp", |
Can this test go into e3sm_land_exenoshare? Because the test-mod is enabling multi-threading:
cat E3SM/components/elm/cime_config/testdefs/testmods_dirs/elm/simple_decomp/shell_commands
./xmlchange NTHRDS=2
Default PEs on most machines are MPI-only (without OpenMP flags), and executables cannot be shared among threaded and non-threaded runs (non-BFB results otherwise).
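A minimal sketch of the suggestion above, under stated assumptions: the suite name `shared_exe_suite`, the `"share"` key, and the helper functions are hypothetical and only mirror the dictionary style visible in the diff. The `NTHRDS=2` command is the one shown for the `simple_decomp` test-mod.

```python
import re

# Shell commands per test-mod; the single entry is taken from the review comment.
TESTMOD_SHELL_COMMANDS = {
    "elm-simple_decomp": "./xmlchange NTHRDS=2",
}

# Hypothetical layout after moving the threaded test out of the shared group.
suites = {
    "shared_exe_suite": {  # assumed name for the shared-executable group
        "share": True,     # assumed flag marking a shared executable
        "tests": ("ERS.f09_g16.IELMBC",),
    },
    "e3sm_land_exenoshare": {
        "tests": ("ERS.f09_g16.IELMBC.elm-simple_decomp",),
    },
}


def is_threaded(test_name):
    """Return True if any test-mod in the test name sets NTHRDS above 1."""
    for part in test_name.split("."):
        commands = TESTMOD_SHELL_COMMANDS.get(part, "")
        match = re.search(r"NTHRDS=(\d+)", commands)
        if match and int(match.group(1)) > 1:
            return True
    return False


def threaded_tests_in_shared_suites(suite_defs):
    """List (suite, test) pairs where a threaded test shares an executable."""
    return [
        (name, test)
        for name, suite in suite_defs.items()
        if suite.get("share")
        for test in suite["tests"]
        if is_threaded(test)
    ]


if __name__ == "__main__":
    print(threaded_tests_in_shared_suites(suites))
```

A check along these lines returns an empty list for the layout above and would flag the new test if it were listed in the shared group instead.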