fates parameter file auto-build for all tests #2336

Open: wants to merge 9 commits into base: master


@rgknox (Collaborator) commented Jan 25, 2024

Description of changes

This enables automatic building of the FATES parameter file binary for all tests. It calls ncgen from the shell_commands script in the Fates testdef folder to convert the version-controlled FATES default parameter file (CDL text) into a netCDF binary.
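For illustration, here is a minimal Python sketch of that ncgen step. The PR itself does this in the shell_commands script; the helper names `build_ncgen_command` and `generate_param_binary` and the file names are hypothetical. `ncgen -o <output> <input>` writes a netCDF binary from a CDL text file (giving `-o` implies binary output).

```python
import shutil
import subprocess
from pathlib import Path


def build_ncgen_command(cdl_file, nc_file):
    """Return the argv that converts a CDL text file into a netCDF binary."""
    # `-o` names the output file; without an -l option it implies -b (binary).
    return ["ncgen", "-o", str(nc_file), str(cdl_file)]


def generate_param_binary(cdl_file, nc_file):
    """Run ncgen and return the path of the generated binary.

    Raises FileNotFoundError if ncgen is not on PATH, and
    subprocess.CalledProcessError if ncgen exits with a nonzero status.
    """
    if shutil.which("ncgen") is None:
        raise FileNotFoundError("ncgen not found on PATH")
    subprocess.run(build_ncgen_command(cdl_file, nc_file),
                   check=True, capture_output=True, text=True)
    return Path(nc_file)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_ncgen_command("default_params.cdl", "params.nc"))
```

Keeping command construction separate from execution makes the argv easy to inspect or log without needing ncgen installed.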

Specific notes

This implementation is incomplete. To get it working for all tests, I had to place the newly built binaries in a new folder in the FATES source tree. The reason is that some of the tests are multi-phase (PEM, ERP, etc.), and each phase needs access to either the same parameter file or an exact copy of it. However, as far as my tests show, the shell_commands script is only called the first time, so both phases need access to the same file. Unfortunately, the XML files of the two phases do not seem to provide any file path on the scratch partition that is common to both (I looked pretty thoroughly, but may have missed something). For instance, the phases have different case names, which makes it hard for the second phase to locate the parameter file if its case differs from the first. I also tried CIME_OUTPUT_ROOT and the sharedlib build location.

env_test.xml has an entry, TEST_ARGV, that holds the root folder of the current test environment and the id of the specific test being run. With these two pieces of information we could place a binary file where all phases of a test can access it. However, this information does not seem to be available via an xml query at the time the shell_commands script runs.

Another location that might be better than the source tree, at least for the time being, is CIME_OUTPUT_ROOT, which is usually just the scratch folder where all cases and tests go. Each parameter file name could include the test name and hash, to keep each file unique. The downside is that the root scratch folder starts to fill up.
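If every phase of a test could learn the test name and test id (which, per the TEST_ARGV note above, is exactly what is not yet available when shell_commands runs), each phase could compute the same path on its own. A minimal sketch under that assumption: the function name is hypothetical, and the naming mirrors the `<test name>.<test id>-params.nc` files listed in the next comment, with the test id standing in for the hash.

```python
from pathlib import Path


def shared_param_path(output_root, test_name, test_id):
    """Return a parameter-file path that every phase of one test can rebuild.

    The phases have different case names but (by assumption) share the test
    name and test id, so a path built only from those two values is common to
    all phases, while the test id keeps separate test runs from overwriting
    each other's files.
    """
    return Path(output_root) / f"{test_name}.{test_id}-params.nc"


if __name__ == "__main__":
    print(shared_param_path(
        "/scratch/user",  # hypothetical CIME_OUTPUT_ROOT
        "ERS_D_Ld3.f19_g17.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesCold",
        "0124-200432de_int",
    ))
```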

Contributors other than yourself, if any:

@ekluzek @glemieux @adrifoster

Are answers expected to change (and if so in what way)?

Any User Interface Changes (namelist or namelist defaults changes)?

Testing performed, if any:

@rgknox (Collaborator, Author) commented Jan 25, 2024

Here is the list of parameter file binaries it generates; one FATES test run creates 5.9M of data:

~/ctsm/src/fates/parameter_files> ls binaries/
ERP_D_Ld3.f19_g17.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesCold.0124-200432de_int-params.nc
ERP_D_Ld3.f19_g17.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesCold.20240124_191528_ohyuz4-params.nc.text
ERP_D_Ld3.f19_g17.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesCold.20240124_193525_2d6yfg-params.nc
ERP_D_P128x2_Ld3.f19_g17.I2000Clm50FatesCru.derecho_intel.clm-FatesCold.0124-200432de_int-params.nc
ERP_Ld3.f09_g17.I2000Clm50FatesRs.derecho_intel.clm-FatesCold.0124-200432de_int-params.nc
ERP_P256x2_Ld30.f45_f45_mg37.I2000Clm51FatesRs.derecho_intel.clm-mimicsFatesCold.0124-200432de_int-params.nc
ERS_D_Ld15.5x5_amazon.I2000Clm50FatesRs.derecho_gnu.clm-FatesColdSeedDisp.0124-200432de_gnu-params.nc
ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.derecho_gnu.clm-FatesColdTwoStreamNoCompFixedBioGeo.0124-200432de_gnu-params.nc
ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.derecho_intel.clm-FatesColdTreeDamage.0124-200432de_int-params.nc
ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.derecho_intel.clm-FatesColdTwoStream.0124-200432de_int-params.nc
ERS_D_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdLandUse.0124-200432de_int-params.nc
ERS_D_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdLUH2.0124-200432de_int-params.nc
ERS_D_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdPRT2.0124-200432de_int-params.nc
ERS_D_Ld3.f19_g17.I2000Clm50FatesCruRsGs.derecho_gnu.clm-FatesCold.0124-200432de_gnu-params.nc
ERS_D_Ld3.f19_g17.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesCold.0124-200432de_int-params.nc
ERS_D_Ld5.1x1_brazil.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdHydro.0124-200432de_int-params.nc
ERS_D_Ld5.f10_f10_mg37.I2000Clm50Fates.derecho_intel.clm-FatesCold.0124-200432de_int-params.nc
ERS_D_Mmpi-serial_Ld5.1x1_brazil.I2000Clm50FatesCruRsGs.derecho_gnu.clm-FatesCold.0124-200432de_gnu-params.nc
ERS_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdFixedBiogeo.0124-200432de_int-params.nc
ERS_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdNoComp.0124-200432de_int-params.nc
ERS_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdNoCompFixedBioGeo.0124-200432de_int-params.nc
ERS_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdSizeAgeMort.0124-200432de_int-params.nc
ERS_Ld5.f19_g17.I2000Clm45Fates.derecho_intel.clm-FatesCold.0124-200432de_int-params.nc
ERS_Ld60.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-Fates.0124-200432de_int-params.nc
ERS_Ld60.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdLogging.0124-200432de_int-params.nc
ERS_Ld60.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdNoFire.0124-200432de_int-params.nc
ERS_Ld60.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdPPhys.0124-200432de_int-params.nc
ERS_Ld60.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdST3.0124-200432de_int-params.nc
ERS_Ld9.f10_f10_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdCH4Off.0124-200432de_int-params.nc
ERS_Lm13.f10_f10_mg37.I2000Clm50Fates.derecho_gnu.clm-FatesCold.0124-200432de_gnu-params.nc
ERS_Lm13.f45_f45_mg37.I2000Clm50Fates.derecho_intel.clm-FatesColdNoComp.0124-200432de_int-params.nc
ERS_P128x1_Lm25.f10_f10_mg37.I2000Clm51Fates.derecho_intel.clm-FatesColdNoComp.0124-200432de_int-params.nc
PEM_D_Ld15.5x5_amazon.I2000Clm50FatesRs.derecho_gnu.clm-FatesColdSeedDisp.0124-200432de_gnu-params.nc
SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm51Fates.derecho_gnu.clm-FatesPRISM--clm-NEON-FATES-YELL.0124-200432de_gnu-params.nc
SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.derecho_gnu.clm-FatesCold.0124-200432de_gnu-params.nc
SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesCold.0124-200432de_int-params.nc
SMS_Lm3_D_Mmpi-serial.1x1_brazil.I2000Clm50FatesCruRsGs.derecho_gnu.clm-FatesColdHydro.0124-200432de_gnu-params.nc
SMS_Lm6.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-Fates.0124-200432de_int-params.nc
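To keep an eye on how much space these files take (the concern about filling up scratch raised above), here is a minimal sketch that totals the generated binaries in a directory. It assumes the `*-params.nc` suffix seen in the listing, so the stray `.nc.text` file is not counted. The helper names are hypothetical, and the sizes are apparent file sizes, which can differ slightly from what `du` reports.

```python
import sys
from pathlib import Path


def param_binaries_usage(directory, pattern="*-params.nc"):
    """Return (file count, total bytes) for generated parameter binaries."""
    files = [p for p in Path(directory).glob(pattern) if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


def human_size(nbytes):
    """Format a byte count with 1024-based units, roughly like `du -h`."""
    size = float(nbytes)
    for unit in ("B", "K", "M", "G"):
        if size < 1024 or unit == "G":
            return f"{size:.1f}{unit}"
        size /= 1024


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    count, total = param_binaries_usage(target)
    print(f"{count} files, {human_size(total)}")
```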

@samsrabin (Collaborator)

This should probably happen in a SystemTest, not in shell_commands. See #2335.

@samsrabin (Collaborator)

See also the discussion from the CTSM SE meeting here.

@samsrabin (Collaborator)

I'm unable to get this to work on Izumi. It might be something wrong with my environment. Could someone else try this and see?

./run_sys_tests --skip-compare --skip-generate -t ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream

@ekluzek (Collaborator) left a comment


@rgknox this is a nice second step to help us head towards #2126.

I added some questions as comments on the code changes. I'm not sure they should hold anything up, but I marked this as "request changes" until we decide on them.

@ekluzek added FATES A change needed for FATES that doesn't require a FATES API update. testing additions or changes to tests labels Feb 15, 2024
@samsrabin (Collaborator) left a comment


Just a few suggestions in the code. Also:

  • Ensure this works for various users on Izumi.

@glemieux (Collaborator) commented Feb 15, 2024

> I'm unable to get this to work on Izumi. It might be something wrong with my environment. Could someone else try this and see?
>
> ./run_sys_tests --skip-compare --skip-generate -t ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream

I'm seeing a failure trying this as well with the following:

RUN: /scratch/cluster/glemieux/tests_0215-110541iz/ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream.0215-110541iz/shell_commands
FROM: /scratch/cluster/glemieux/tests_0215-110541iz/ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream.0215-110541iz
  stat: 1

  errput: ncgen: No such file or directory
        (../../ncgen/genbin.c:58)
Traceback (most recent call last):
  File "/home/glemieux/ctsm/src/fates/tools/modify_fates_paramfile.py", line 355, in <module>
    main()
  File "/home/glemieux/ctsm/src/fates/tools/modify_fates_paramfile.py", line 95, in main
    shutil.copyfile(args.inputfname, tempfilename)
  File "/cluster/anaconda-23.11.0/lib/python3.11/shutil.py", line 256, in copyfile
    with open(src, 'rb') as fsrc:
         ^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/home/glemieux/ctsm/src/fates/parameter_files/binaries/ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream.0215-110541iz-params.nc'
Leaving broken case dir /scratch/cluster/glemieux/tests_0215-110541iz/ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream.0215-110541iz

Looking at the FATES directory structure, the binaries directory isn't getting built; it looks like ncgen isn't being found. @samsrabin is this the same error you were seeing?

@samsrabin (Collaborator)

That is indeed the same error I was seeing, but I don't think it's ncgen not being found. I think there might be a problem with the ncgen installation.

@samsrabin (Collaborator)

I'm actually getting a similar error on Derecho, too. From SMS_D_Ld3.f10_f10_mg37.I2000Clm50FatesRs.derecho_intel:

Adding user mods directory /glade/u/home/samrabin/ctsm_fates-auto-params/cime_config/testdefs/testmods_dirs/clm/Fates
RUN: /glade/derecho/scratch/samrabin/tests_0215-114155de/SMS_D_Ld3.f10_f10_mg37.I2000Clm50FatesRs.derecho_intel.clm-Fates.0215-114155de/shell_commands
FROM: /glade/derecho/scratch/samrabin/tests_0215-114155de/SMS_D_Ld3.f10_f10_mg37.I2000Clm50FatesRs.derecho_intel.clm-Fates.0215-114155de
  stat: 1

  errput: ncgen: No such file or directory
	(/home/conda/feedstock_root/build_artifacts/libnetcdf_1650908392318/work/ncgen/genbin.c:genbin_netcdf:63)

Tests in /glade/derecho/scratch/samrabin/tests_0215-114155de/.

@samsrabin (Collaborator)

The issue is that ${SRCROOT}/src/fates/parameter_files/binaries/ doesn't exist. Will fix in my PR.

@ekluzek (Collaborator) commented Feb 15, 2024

@samsrabin thanks for the work on this. Note that fates/parameter_files is under the FATES external, so adding a binaries subdirectory would require a PR to FATES. Also, I think git needs at least one file (e.g., a README) in the directory for it to show up when you check it out...

@samsrabin (Collaborator) commented Feb 15, 2024

@ekluzek Good point; adding this directory will make the FATES checkout unclean. @rgknox I think you need to make a new FATES tag that has an empty parameter_files/binaries/ directory, then update Externals.cfg here to point to that. (Retracted: no, see below.)

@samsrabin (Collaborator)

Wait, @ekluzek, even if the new directory is canonically in FATES, won't the checkout be unclean once the parameter file is generated?

@samsrabin (Collaborator)

Actually… the checkout looks clean. manage_externals/checkout_externals -S gives no warning, and git status in src/fates is clean, even with the new directory and parameter files generated. This might be because .nc files are ignored by src/fates/.gitignore.

@ekluzek (Collaborator) commented Feb 15, 2024

@samsrabin yes, exactly. But it's good that you showed that's the case; it's good to confirm.

@rgknox (Collaborator, Author) commented Feb 15, 2024

That directory needs to be added in FATES; sorry about that. I'll get it added in the next FATES PR.
UPDATE: Sam added a mkdir -p call to the scripting, so we no longer need this directory added.
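As the error logs above show, ncgen reports "No such file or directory" when its output directory is missing, which is easy to misread as ncgen itself being missing. A Python equivalent of the `mkdir -p` fix, as a minimal sketch: the helper name is hypothetical, this is not the script Sam changed, and the `binaries` layout follows the `${SRCROOT}/src/fates/parameter_files/binaries/` path from the thread.

```python
from pathlib import Path


def ensure_binaries_dir(param_dir):
    """Create <param_dir>/binaries if it is missing (like `mkdir -p`).

    Safe to call repeatedly: exist_ok=True means an existing directory is
    not an error, and parents=True creates any missing parent directories.
    """
    out_dir = Path(param_dir) / "binaries"
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

Calling this before ncgen runs makes the output location exist in every checkout, without adding a placeholder file to the FATES repository.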

@samsrabin (Collaborator)

That merge commit I just did was to resolve conflicts introduced in my PR. They were only related to run_sys_tests.py and its testing. They're now resolved, and make all in python/ is still clean.

@samsrabin previously approved these changes Feb 22, 2024
@rgknox (Collaborator, Author) commented Feb 24, 2024

clm_aux on derecho: OK, with these exceptions:

FAIL SMS.f10_f10_mg37.I2000Clm50BgcCrop.derecho_nvhpc.clm-crop MODEL_BUILD time=212 (THIS TEST PASSES AFTER RESUBMITTING)

FAIL DAE_C2_D_Lh12.f10_f10_mg37.I2000Clm50BgcCrop.derecho_intel.clm-DA_multidrv RUN time=304

ERROR: ERROR: Unrecognized line ('/bin/bash: module: line 1: syntax error: unexpected end of file

@samsrabin (Collaborator)

That dang DAE test! Try resubmitting it.

@ekluzek previously approved these changes Feb 26, 2024
@ekluzek (Collaborator) commented Feb 26, 2024

I tried the Izumi test that was failing before, and it works for me, so I checked that item off. That puts this in a ready-to-merge state.

@rgknox (Collaborator, Author) commented Feb 27, 2024

This test fails create case: ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream

RUN: /scratch/cluster/rgknox/tests_0226-121631iz/ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream.GC.0226-121631iz_nag/shell_commands
FROM: /scratch/cluster/rgknox/tests_0226-121631iz/ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream.GC.0226-121631iz_nag
  stat: 1

  errput: Traceback (most recent call last):
  File "/home/rgknox/ctsm/src/fates/tools/modify_fates_paramfile.py", line 35, in <module>
    from scipy.io import netcdf as nc
ImportError: No module named scipy.io
Leaving broken case dir /scratch/cluster/rgknox/tests_0226-121631iz/ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream.GC.0226-121631iz_nag
ERROR: Command: '/scratch/cluster/rgknox/tests_0226-121631iz/ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream.GC.0226-121631iz_nag/shell_commands' failed with error 'Traceback (most recent call last):
  File "/home/rgknox/ctsm/src/fates/tools/modify_fates_paramfile.py", line 35, in <module>
    from scipy.io import netcdf as nc
ImportError: No module named scipy.io' from dir '/scratch/cluster/rgknox/tests_0226-121631iz/ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream.GC.0226-121631iz_nag'

 ---------------------------------------------------
2024-02-26 13:08:10: CREATE_NEWCASE FAILED for test 'ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream'.

However, I'm able to import scipy manually when I run python. Also, this test passed when I ran it stand-alone, i.e., this test passed:
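One way to narrow this down is to print, from inside the failing environment, which interpreter is running and whether scipy is importable there. The error text itself is a hint: `ImportError: No module named scipy.io` without quotes around the module name is the Python 2 message format (Python 3 prints `ModuleNotFoundError: No module named '...'`), which suggests shell_commands may be picking up a different python than the interactive session. A minimal diagnostic sketch, with a hypothetical function name; it requires Python 3, so if it fails to run at all under the shell_commands python, that already answers the question.

```python
import importlib.util
import shutil
import sys


def python_env_report(module="scipy"):
    """Describe the running interpreter and whether `module` can be imported."""
    return {
        "executable": sys.executable,               # interpreter actually running
        "version": sys.version.split()[0],          # e.g. "3.11.5"
        "python_on_path": shutil.which("python"),   # what a bare `python` resolves to
        "module_found": importlib.util.find_spec(module) is not None,
    }


if __name__ == "__main__":
    for key, value in python_env_report().items():
        print(f"{key}: {value}")
```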

./create_test ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.izumi_nag.clm-FatesColdTwoStream

@rgknox (Collaborator, Author) commented Feb 27, 2024

DAE_C2_D_Lh12.f10_f10_mg37.I2000Clm50BgcCrop.derecho_intel.clm-DA_multidrv also still fails after re-submitting

@ekluzek (Collaborator) commented Feb 28, 2024

Since this isn't critical to come in now, we plan to delay it until the conda env issue on Izumi is fixed (I think #2385 will fix this). @samsrabin also has some analysis showing a race condition in the DAE test that sometimes results in a file being gzipped before something else needs it. I don't think the DAE issue should hold this one up, but that fix is another good thing to have come in.

@samsrabin dismissed stale reviews from ekluzek and themself June 20, 2024 16:57

Undoing approval until it's actually ready

@glemieux (Collaborator)

Getting this prioritized came up in discussion around NGEET/fates#1236. @ckoven suggested that we may want to also move towards building from the xml patch file.

@samsrabin added the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Aug 26, 2024
@wwieder (Contributor) commented Oct 31, 2024

@ekluzek will start a document on this and share it with interested parties by our Nov 6 meeting.

@wwieder removed the next this should get some attention in the next week or two. Normally each Thursday SE meeting. label Oct 31, 2024
Labels: FATES (A change needed for FATES that doesn't require a FATES API update.), testing (additions or changes to tests)
Projects: Status: Stalled (needs review, blocked etc.)

5 participants