
Updates and fixes for ensemble runs#311

Open
alperaltuntas wants to merge 4 commits into ESCOMP:main from
alperaltuntas:inst_suffix_fixes


@alperaltuntas
Member

@alperaltuntas alperaltuntas commented Feb 27, 2026

  • Allow MOM_input template to have INST_SUFFIX as an expandable variable
  • Fix multi-instance ERI tests by adding inst_suffix to initial conditions (restart) files
  • Also, add a new test suite called fast_mom that runs only the 3- and 10-degree versions of the tx2_3v2 grid configuration via testmods. This new suite finishes in 1 hour (including setup and build times) and uses 1 node per test case, so it goes through the job queue quickly.
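As a rough illustration of the first two items, the sketch below expands an INST_SUFFIX placeholder in one template line and appends the same suffix to a file name. The zero-padded `_0001` suffix format, the `${INST_SUFFIX}` placeholder syntax, the position of the suffix in the file name, and all parameter and file names are assumptions made for illustration; none of them are taken from this PR's buildnml.

```python
from string import Template


def inst_suffix(inst_index: int, ninst: int) -> str:
    """Return a per-instance suffix, or "" for single-instance runs.

    Hypothetical helper; assumes a zero-padded "_0001" style suffix.
    """
    return f"_{inst_index:04d}" if ninst > 1 else ""


def expand_line(line: str, suffix: str) -> str:
    """Expand ${INST_SUFFIX} in one template line (placeholder syntax assumed)."""
    return Template(line).safe_substitute(INST_SUFFIX=suffix)


def with_suffix(filename: str, suffix: str) -> str:
    """Insert the suffix before the file extension (assumed position)."""
    stem, dot, ext = filename.rpartition(".")
    return f"{stem}{suffix}{dot}{ext}" if dot else f"{filename}{suffix}"


if __name__ == "__main__":
    sfx = inst_suffix(2, ninst=3)
    # Hypothetical template line and file name.
    print(expand_line("SOME_FILE = 'forcing${INST_SUFFIX}.nc'", sfx))
    print(with_suffix("restart_ic.nc", sfx))
```

With a single instance the suffix is empty, so the same template line and file name work unchanged for non-ensemble runs.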

This PR is part of a series of PRs to fix various interrelated restart/multi-instance/hybrid run issues, alongside ESCOMP/FMS#8, NCAR/MOM6#409, and ESMCI/cime#4944.

Also, simplify buildnml file path handling by using the pathlib library
instead of os.path.join
…dding inst_suffix

Also rename RESTARTFILE_APPENDIX_PREFIX as ENSEMBLE_APPENDIX_PREFIX
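The pathlib simplification mentioned in the commit notes above can be pictured with a minimal sketch like the following. The directory and file names are hypothetical and are not taken from buildnml; the point is only that the `/` operator on `Path` objects builds the same path as nested `os.path.join` calls.

```python
import os.path
from pathlib import Path

# Hypothetical directory and file names, for illustration only.
rundir = "/path/to/run"
filename = "MOM_input"

# os.path.join style: strings in, string out.
joined = os.path.join(rundir, "inputs", filename)

# pathlib style: the "/" operator builds the same path as a Path object.
path = Path(rundir) / "inputs" / filename

# Both spellings refer to the same location.
assert Path(joined) == path
print(path.name)  # MOM_input
```

A `Path` object also carries helpers such as `.parent`, `.exists()`, and `.read_text()`, which is typically where the simplification comes from.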
@kdraeder
Collaborator

@alperaltuntas Thanks for all of these fixes and upgrades! I'm looking forward to using them as soon as they're ready, or sooner if you need testing in more contexts. I'm currently developing multi-instance HIST tests, especially for "allactive" (= "mostly active") and have run into the input file NINST problem.

It would be a big help to use the low resolution versions in this testing, along with low resolution in CAM and CLM. Have those been combined into a grid yet?

@alperaltuntas
Member Author

@kdraeder It would be a big help if you could test these changes. I added you as a reviewer to this PR, but feel free to add your feedback to any of these PRs.

@kdraeder
Collaborator

> @kdraeder It would be a big help if you can test these changes. I added you as a reviewer to this PR but feel free to add your feedbacks to any of these PRs.

I think that in order to do the tests I have in mind I'll need to have these MOM changes
in a CESM tag that is consistent with them. Is there such a thing yet?

If not, should I import this version of MOM into a recent CESM tag and see how far it can go?
I haven't been very successful doing this, but I'm willing to try it.

@alperaltuntas
Member Author

alperaltuntas commented Feb 27, 2026

@kdraeder You may manually merge all four of these PRs into their respective repositories within cesm3_0_alpha08d. Or more conveniently, you may use/copy the following CESM sandbox on derecho, where all of these changes are already present: /glade/work/altuntas/cesm.sandboxes/cesm3_0_alpha08d_feb27

@kdraeder
Collaborator

@alperaltuntas I'll run the low-res tests you've already defined first (ERI and MCC).
Then I'd like to try to run a test with an active atmosphere.
It looks like I'll need to define a new grid because all of the existing low-res atm grids
({3,5,16}ne_mg37) say "only for compsets that are not _MOM".
Should I be able to use one of those ne grids with the t232 (10deg) at this point?
Do you have thoughts about which resolution?

@alperaltuntas
Member Author

alperaltuntas commented Feb 27, 2026

@kdraeder You don't need to define a new resolution because CESM still thinks MOM6 is run on tx2_3v2. (The grid is coarsened via xmlchanges and user_nl changes.) That said, I am not sure whether you'd run into any complications with an active atmosphere. If so, let me know and I'll look into it.

@kdraeder
Collaborator

It appears that my git clone of your _feb27 /glade version does not have the tx10deg and tx3deg
in the mom testmods. I see them in your version, and I see that you're on branch inst_suffix_fixes,
but I don't see that branch in my copy. Any suggestions?

@alperaltuntas
Member Author

> It appears that my git clone of your _feb27 /glade version does not have the tx10deg and tx3deg in the mom testmods. I see them in your version, and I see that you're on branch inst_suffix_fixes, but I don't see that branch in my copy. Any suggestions?

Can you copy the directory (cp -r) rather than using git clone? These branches aren't tagged yet and I haven't updated my CESM .gitmodules.

@kdraeder
Collaborator

The cp -r enabled the build of test
ERI_D_Ld8.TL319_t232.G_JRA.derecho_intel.mom-debug--mom-tx10deg,
and it started running, but stopped with this, from the cesm.log file:

6: h-point: mean=   7.1877374035932866E+04 min=   0.0000000000000000E+00 
             max=   2.1800534728205935E+05 Post extract_sfc ocean_salt
6: h-point: c=     18690 Post extract_sfc ocean_salt
3: forrtl: error (65): floating invalid
3: Image       PC                Routine            Line        Source                    
3: libc.so.6   000014C4CC442900  Unknown               Unknown  Unknown
3: cesm.exe    0000000000A956EF  ice_grid_mp_grid_        4408  ice_grid.F90
3: cesm.exe    0000000000A4FB5B  ice_grid_mp_grid_        3707  ice_grid.F90
3: cesm.exe    0000000000A4D970  ice_grid_mp_grid_        3587  ice_grid.F90
3: cesm.exe    000000000146D57F  ice_dyn_evp_mp_ev         471  ice_dyn_evp.F90
3: cesm.exe    0000000001E4A3FA  ice_step_mod_mp_s        1022  ice_step_mod.F90
3: cesm.exe    0000000001067F4C  cice_runmod_mp_ic         270  CICE_RunMod.F90

The Ld8 is not standard, but I've used it before with ERI.
And I got this same error with the default Ld22.
The cases are in
/glade/work/raeder/Exp/CESM+DART_testing/ERIx1.mom10deg.G_JRA
and the rundirs in
/glade/derecho/scratch/raeder/ERIx1.mom10deg.G_JRA

Do you recognize it, or have any suggestions about where I should look for the source error?
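For readers less familiar with the test name quoted above, the sketch below splits it into its parts. It assumes the common CIME layout `TYPE[_MODIFIERS].GRID.COMPSET.MACHINE_COMPILER[.TESTMODS]`, that multiple testmods are joined with `--`, and that the first hyphen in each testmod stands for a directory separator. `parse_test_name` and `ParsedName` are hypothetical helpers, not CIME functions, and the parsing is deliberately simplified (for example, it does not handle names whose fields contain extra dots).

```python
from typing import List, NamedTuple


class ParsedName(NamedTuple):
    test_type: str
    modifiers: List[str]
    grid: str
    compset: str
    machine_compiler: str
    testmods: List[str]


def parse_test_name(name: str) -> ParsedName:
    """Split a test name into fields (simplified, assumed layout)."""
    parts = name.split(".")
    if len(parts) not in (4, 5):
        raise ValueError(f"unexpected number of fields in {name!r}")
    # "ERI_D_Ld8" -> type "ERI" with modifiers ["D", "Ld8"]
    test_type, *modifiers = parts[0].split("_")
    # "mom-debug--mom-tx10deg" -> ["mom/debug", "mom/tx10deg"]
    raw_mods = parts[4].split("--") if len(parts) == 5 else []
    testmods = [m.replace("-", "/", 1) for m in raw_mods]
    return ParsedName(test_type, modifiers, parts[1], parts[2], parts[3], testmods)


if __name__ == "__main__":
    parsed = parse_test_name(
        "ERI_D_Ld8.TL319_t232.G_JRA.derecho_intel.mom-debug--mom-tx10deg"
    )
    print(parsed)
```

Here the `D` modifier requests a debug build and `Ld8` sets an 8-day run length, which is why dropping `_D` or switching compilers changes whether the floating-point trap in the traceback fires.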

@alperaltuntas
Member Author

@kdraeder I should have mentioned in the PR description that I also ran into this same error when running the G case with Intel in debug mode. As a result, there aren’t any Intel-debug G tests in the fast_mom test suite.

I haven’t looked into it in detail yet since it seems to be coming from the CICE grid module, though my guess is that it’s related to tripolar grid complications with these relatively coarse grids and asymmetric masking around the stitch. For now, I’d recommend running the test in non-debug mode. If you need debug mode, using the GNU compiler appears to work around the issue.
