Adds a new ELM domain decomposition algorithm #5690

Merged
peterdschwartz merged 5 commits into master from bishtgautam/lnd/domain-decomp
Jun 19, 2023

Conversation

bishtgautam
Contributor

@bishtgautam bishtgautam commented May 16, 2023

Initial check-in of a 'simple' domain decomposition for ELM in which the number of active
land grid cells is divided by the number of clumps.

[BFB]
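
The PR's Fortran implementation is not shown in this conversation, so the following is only a minimal Python sketch of what dividing the active land cells among clumps could look like, next to a round-robin assignment for contrast. The function names and the sample cell indices are hypothetical, and the exact behaviour of ELM's existing round-robin scheme is an assumption.

```python
def round_robin_decomp(active_cells, nclumps):
    """Deal active cells out one at a time, cycling through the clumps."""
    clumps = [[] for _ in range(nclumps)]
    for i, cell in enumerate(active_cells):
        clumps[i % nclumps].append(cell)
    return clumps


def simple_decomp(active_cells, nclumps):
    """Give each clump one contiguous block of the active-cell list.

    Block sizes differ by at most one: the first `extra` clumps get one more cell.
    """
    base, extra = divmod(len(active_cells), nclumps)
    clumps, start = [], 0
    for c in range(nclumps):
        size = base + (1 if c < extra else 0)
        clumps.append(active_cells[start:start + size])
        start += size
    return clumps


# Hypothetical global indices of active land cells (non-land cells are skipped).
active = [2, 3, 4, 7, 8, 9, 10, 13, 14, 15]
print(round_robin_decomp(active, 3))  # [[2, 7, 10, 15], [3, 8, 13], [4, 9, 14]]
print(simple_decomp(active, 3))       # [[2, 3, 4, 7], [8, 9, 10], [13, 14, 15]]
```

In this sketch both schemes give each clump nearly the same number of cells; the difference is that the block version keeps neighbouring global indices together.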

@bishtgautam
Contributor Author

I tested the following case:

./create_newcase -case I1850ELM.f09_f09.2023-05-16 -compset I1850ELM -res f09_f09 -mach pm-cpu -compile gnu

cd I1850ELM.f09_f09.2023-05-16

cat >> user_nl_elm << EOF
domain_decomp_type = 'simple'
EOF

# Currently tested only with a single thread
./xmlchange NTHRDS=1

./case.setup
./case.build

@bishtgautam
Contributor Author

@rljacob @dqwu @ndkeen @whannah1, I don't know whether this PR reduces the number of MPI_Bcast calls. How can one determine the number of MPI_Bcast calls with and without this PR?

@dqwu
Contributor

dqwu commented May 16, 2023

@rljacob @dqwu @ndkeen @whannah1, I don't know whether this PR reduces the number of MPI_Bcast calls. How can one determine the number of MPI_Bcast calls with and without this PR?

@bishtgautam To count the number of MPI_Bcast calls made inside SCORPIO, you can use the following feature branch for testing:

cd externals/scorpio
git fetch origin
git checkout dqwu/MPI_Bcast_counter

This branch prints the number of MPI_Bcast calls in e3sm.log:
printf("MPI_Bcast is called, counter = %ld\n", counter);
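
To turn those log lines into a single number, one option is a small script that scans the log text for the message printed above. This is only a sketch: it assumes the counter increases with each call, so the largest printed value is the total, and it does not try to separate output from different MPI ranks. The function name and the sample log text are hypothetical.

```python
import re

# Matches the message printed by the counter branch, capturing the count.
BCAST_LINE = re.compile(r"MPI_Bcast is called, counter = (\d+)")


def last_bcast_count(log_text):
    """Return the largest counter value found in the log text, or None if absent."""
    counts = [int(m.group(1)) for m in BCAST_LINE.finditer(log_text)]
    return max(counts) if counts else None


# Hypothetical excerpt standing in for the contents of e3sm.log.
sample = """\
MPI_Bcast is called, counter = 1
some other log line
MPI_Bcast is called, counter = 2
MPI_Bcast is called, counter = 3
"""
print(last_bcast_count(sample))  # 3
```

For a real run, the log file would be read into a string first (for example with `open(path).read()`) and passed to the same function.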

@bishtgautam
Contributor Author

For the I1850ELM case outlined above, the MPI_Bcast counter remains unchanged:

MPI_Bcast is called, counter = 87854

@rljacob
Member

rljacob commented May 17, 2023

@bishtgautam look for this line in the lnd.log file:
lnd gsmap glo num of segs =

And let us know what it says for each decomp type.

@bishtgautam
Contributor Author

  • For round-robin: lnd gsmap glo num of segs = 3274
  • For simple: lnd gsmap glo num of segs = 924
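
Why a block-style decomposition would produce fewer segments can be illustrated with a toy grid. The sketch below assumes a segment is a run of consecutive global indices owned by the same task, which is how the reported count is interpreted here; the owner arrays are made up and far smaller than the f09 grid.

```python
def count_segments(owner):
    """Count runs of consecutive global indices owned by the same task.

    owner[i] is the task that owns global cell i, or None for an inactive cell.
    """
    segments = 0
    previous = None
    for task in owner:
        # A new segment starts whenever an owned cell follows a different owner
        # (or follows an inactive cell).
        if task is not None and task != previous:
            segments += 1
        previous = task
    return segments


# Hypothetical 12-cell grid on 3 tasks; None marks a non-land cell.
round_robin = [0, 1, 2, None, 0, 1, 2, 0, None, 1, 2, 0]
blocked = [0, 0, 0, None, 0, 1, 1, 1, None, 2, 2, 2]

print(count_segments(round_robin))  # 10
print(count_segments(blocked))      # 4
```

In the round-robin layout almost every active cell starts its own segment, while the blocked layout only breaks segments at task boundaries and at gaps left by inactive cells.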

@rljacob
Member

rljacob commented May 17, 2023

That might help with the coupler performance.

But I think I'm wrong about decomp affecting the number of bcast calls. When to broadcast is driven more by the logic of the routine: e.g. was something computed or read on one processor? That won't change with decomp.

@bishtgautam
Contributor Author

But I think I'm wrong about decomp affecting the number of bcast calls. When to broadcast is driven more by the logic of the routine: e.g. was something computed or read on one processor? That won't change with decomp.

Yes, this makes sense.

@bishtgautam
Contributor Author

I ran simulations on Perlmutter in which I increased the number of variables written to elm.h0:

| No. of MPI_Bcast | No. of vars in elm.h0 | Comments |
| --- | --- | --- |
| 45226 | 0 | elm.h0 disabled |
| 46968 | 27 | 26 vars out of 27 are default variables |
| 47055 | 28 | 87 additional MPI_Bcast |
| 47142 | 29 | 87 additional MPI_Bcast |
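
The per-variable cost implied by these numbers can be checked with a few lines of arithmetic. The sketch below only uses the four (variables, MPI_Bcast) pairs reported above; the function name is hypothetical.

```python
# (number of variables in elm.h0, MPI_Bcast count) as reported in the table.
rows = [(0, 45226), (27, 46968), (28, 47055), (29, 47142)]


def bcast_per_added_var(rows):
    """Return (vars_before, vars_after, extra MPI_Bcast per added variable)."""
    result = []
    for (v0, b0), (v1, b1) in zip(rows, rows[1:]):
        result.append((v0, v1, (b1 - b0) / (v1 - v0)))
    return result


for v0, v1, rate in bcast_per_added_var(rows):
    print(f"{v0:>2} -> {v1:>2} vars: {rate:.1f} MPI_Bcast per variable")
```

The last two steps each come out at 87 calls per added variable, matching the Comments column, while the first 27 variables average about 64.5 calls each.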

These simulations used round-robin domain decomposition and the following NTASKS:

./xmlquery NTASKS
	NTASKS: ['CPL:128', 'ATM:128', 'LND:128', 'ICE:128', 'OCN:128', 'ROF:128', 'GLC:128', 'WAV:128', 'IAC:1', 'ESP:1']

@rljacob
Member

rljacob commented May 17, 2023

How is the above different from the case that had 87854 broadcasts?

@rljacob
Member

rljacob commented May 17, 2023

@dqwu does your counter count all MPI_Bcast calls in the executable, not just those in SCORPIO?

@dqwu
Contributor

dqwu commented May 17, 2023

@dqwu does your counter count all MPI_Bcast calls in the executable, not just those in SCORPIO?

Only the calls made inside SCORPIO. SCORPIO has no way to count calls made directly from E3SM Fortran code.

@rljacob rljacob added the Land label May 17, 2023
@bishtgautam
Contributor Author

How is the above different from the case that had 87854 broadcasts?

@rljacob The case with 87854 MPI_Bcast calls was writing all (~270) ELM variables to elm.h0.

@bishtgautam bishtgautam changed the title from "[WIP] Adds a new ELM domain decomposition algorithm" to "Adds a new ELM domain decomposition algorithm" May 18, 2023
@bishtgautam bishtgautam added the BFB (PR leaves answers BFB) label May 18, 2023
peterdschwartz added a commit that referenced this pull request Jun 15, 2023
Initial check-in of a 'simple' domain decomposition for ELM in which the number of active
land grid cells is divided by the number of clumps.

[BFB]
@peterdschwartz
Contributor

Merged to next

@peterdschwartz
Contributor

@bishtgautam There seem to be inconsistencies with this PR: DIFFs are showing, but my final tests had everything passing.

I tested this merged to next three separate times because the first run, with the e3sm_developer test suite, gave DIFFs. The second run, with e3sm_land_developer, PASSed everything, and the third run, with the full e3sm_developer suite, PASSed everything as well (the directory is on chrysalis: /lcrc/group/e3sm/ac.schwartzpd/merge).

I'm not sure what could cause the different behavior, but the cases that DIFF have EXEROOT set to the new simple-decomp test root: EXEROOT: /pscratch/sd/e/e3smtest/e3sm_scratch/pm-cpu/ERS.f09_g16.IELM.pm-cpu_intel.elm-simple_decomp.C.JNextIntegration20230615_205137/bld

My tests that PASS have a different EXEROOT (/lcrc/group/e3sm/ac.schwartzpd/merge/output/ERS.f09_g16.I1850GSWCNPRDCTCBC.chrysalis_intel.elm-vstrd.C.20230614_174426_ovugom/bld)

So moving the new test out of the shared-executable group may fix this, although from the code changes alone I didn't see an obvious reason why.

peterdschwartz added a commit that referenced this pull request Jun 16, 2023
@peterdschwartz
Contributor

re-merged to next

@peterdschwartz peterdschwartz merged commit d3ba9f9 into master Jun 19, 2023
@peterdschwartz peterdschwartz deleted the bishtgautam/lnd/domain-decomp branch June 19, 2023 17:49
@@ -41,6 +41,7 @@
"time" : "0:45:00",
"tests" : (
"ERS.f09_g16.IELMBC",
"ERS.f09_g16.IELMBC.elm-simple_decomp",
Member

Can this test go into e3sm_land_exenoshare? Because the test-mod is enabling multi-threading:

cat E3SM/components/elm/cime_config/testdefs/testmods_dirs/elm/simple_decomp/shell_commands 
./xmlchange NTHRDS=2

Default PEs on most machines are MPI-only (without OpenMP flags), and executables cannot be shared among threaded and non-threaded runs (non-BFB results otherwise).
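
One way to spot test-mods that should stay out of a shared-executable group is to scan their shell_commands for thread settings. The sketch below is a hypothetical helper, not part of CIME: it only recognizes the `NTHRDS=<n>` form used above (and the per-component `NTHRDS_<COMP>=<n>` variant) and would miss other ways of passing the value to xmlchange.

```python
import re

# Matches e.g. "./xmlchange NTHRDS=2" or "./xmlchange NTHRDS_LND=2".
NTHRDS_RE = re.compile(r"xmlchange\s+.*\bNTHRDS(?:_\w+)?=(\d+)")


def enables_threading(shell_commands_text):
    """True if any xmlchange line sets a thread count greater than 1."""
    return any(int(m.group(1)) > 1 for m in NTHRDS_RE.finditer(shell_commands_text))


print(enables_threading("./xmlchange NTHRDS=2\n"))  # True
print(enables_threading("./xmlchange NTHRDS=1\n"))  # False
```

A test whose test-mod returns True here would be a candidate for a non-shared-executable suite such as the e3sm_land_exenoshare group mentioned above.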

Labels: BFB (PR leaves answers BFB), Land