Spack-stack METplus fails with PosixPath errors #2091

Closed
DavidHuber-NOAA opened this issue Nov 28, 2023 · 15 comments
Labels
bug Something isn't working

Comments

@DavidHuber-NOAA
Contributor

What is wrong?

When running metplus/3.1.1 built with spack-stack within the global-workflow, multiple Python crashes are reported by PosixPath. For instance:

11/16 18:07:42.025 metplus (met_util.py:109) INFO: Log file: /scratch1/NCEPDEV/stmp2/David.Huber/RUNDIRS/ss_151/metpg2g1.267696/grid2grid_step1/metplus_output/logs/ss_151/master_metplus_grid2grid_step1_pres_gatherbyVSDB_for20221109_runon20231116180742.log
11/16 18:07:42.054 metplus.StatAnalysis (met_util.py:215) ERROR: Fatal error occurred
Traceback (most recent call last):
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/unified-env/install/intel/2021.5.0/python-3.10.8-oqvn6sa/lib/python3.10/pkgutil.py", line 417, in get_importer
    importer = sys.path_importer_cache[path_item]
KeyError: PosixPath('/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/gsi-addon/install/intel/2021.5.0/metplus-3.1.1-uu37v6c/metplus/wrappers')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/gsi-addon/install/intel/2021.5.0/metplus-3.1.1-uu37v6c/metplus/util/met_util.py", line 162, in run_metplus
    module = import_module(package_name)
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/unified-env/install/intel/2021.5.0/python-3.10.8-oqvn6sa/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/gsi-addon/install/intel/2021.5.0/metplus-3.1.1-uu37v6c/metplus/wrappers/__init__.py", line 30, in <module>
    for (_, module_name, _) in iter_modules([package_dir]):
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/unified-env/install/intel/2021.5.0/python-3.10.8-oqvn6sa/lib/python3.10/pkgutil.py", line 129, in iter_modules
    for i in importers:
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/unified-env/install/intel/2021.5.0/python-3.10.8-oqvn6sa/lib/python3.10/pkgutil.py", line 421, in get_importer
    importer = path_hook(path_item)
  File "<frozen importlib._bootstrap_external>", line 1632, in path_hook_for_FileFinder
  File "<frozen importlib._bootstrap_external>", line 1504, in __init__
  File "<frozen importlib._bootstrap_external>", line 182, in _path_isabs
AttributeError: 'PosixPath' object has no attribute 'startswith'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/gsi-addon/install/intel/2021.5.0/metplus-3.1.1-uu37v6c/metplus/util/met_util.py", line 171, in run_metplus
    raise NameError("There was a problem loading %s wrapper." % item)
NameError: There was a problem loading StatAnalysis wrapper.
11/16 18:07:42.054 metplus.StatAnalysis (met_util.py:215) ERROR: Fatal error occurred
Traceback (most recent call last):
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/unified-env/install/intel/2021.5.0/python-3.10.8-oqvn6sa/lib/python3.10/pkgutil.py", line 417, in get_importer
    importer = sys.path_importer_cache[path_item]
KeyError: PosixPath('/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/gsi-addon/install/intel/2021.5.0/metplus-3.1.1-uu37v6c/metplus/wrappers')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/gsi-addon/install/intel/2021.5.0/metplus-3.1.1-uu37v6c/metplus/util/met_util.py", line 162, in run_metplus
    module = import_module(package_name)
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/unified-env/install/intel/2021.5.0/python-3.10.8-oqvn6sa/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/gsi-addon/install/intel/2021.5.0/metplus-3.1.1-uu37v6c/metplus/wrappers/__init__.py", line 30, in <module>
    for (_, module_name, _) in iter_modules([package_dir]):
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/unified-env/install/intel/2021.5.0/python-3.10.8-oqvn6sa/lib/python3.10/pkgutil.py", line 129, in iter_modules
    for i in importers:
  File "/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/unified-env/install/intel/2021.5.0/python-3.10.8-oqvn6sa/lib/python3.10/pkgutil.py", line 421, in get_importer
    importer = path_hook(path_item)
  File "<frozen importlib._bootstrap_external>", line 1632, in path_hook_for_FileFinder
  File "<frozen importlib._bootstrap_external>", line 1504, in __init__
  File "<frozen importlib._bootstrap_external>", line 182, in _path_isabs
AttributeError: 'PosixPath' object has no attribute 'startswith'
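
A minimal probe, assuming only the standard library, for checking whether a given interpreter shows the same behaviour as the traceback: it passes a `pathlib.Path` to `pkgutil.iter_modules` in the same call shape as `metplus/wrappers/__init__.py` line 30. The function name `iter_modules_accepts_path` is hypothetical and is not part of METplus.

```python
import tempfile
from pathlib import Path
from pkgutil import iter_modules


def iter_modules_accepts_path(directory):
    """Return True if pkgutil.iter_modules tolerates a pathlib.Path entry."""
    try:
        # Same call shape as in the traceback: a list holding one Path object.
        list(iter_modules([Path(directory)]))
    except (AttributeError, TypeError):
        # Affected interpreters fail while building the FileFinder importer.
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print("Path entries accepted:", iter_modules_accepts_path(tmp))
```

Running this with the spack-stack Python module loaded should show whether that particular build is affected, without needing a full METplus job.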

What should have happened?

The METplus jobs should be able to run without error.

What machines are impacted?

Hera

Steps to reproduce

  1. git clone git@github.com:DavidHuber-NOAA/global-workflow -b spack-stack
  2. Setup a cycled experiment to run with DO_METP=YES
  3. Run through the first full cycle until gfsmetpg2g, etc. kick off, then check the logs.

Additional information

The Python version used by spack-stack/1.5.1 is 3.10.8, which is much newer than the version used on WCOSS2 (3.8.6). This was only tested on Hera, but it is suspected to be an issue on all spack-stack systems.

Do you have a proposed solution?

No response

@WalterKolczynski-NOAA
Contributor

I did a test of METplus (outside of EMC_verif-global) with /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/gsi-addon/install/intel/2021.5.0/metplus-3.1.1-uu37v6c vs. /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/unified-env/install/intel/2021.5.0/metplus-5.1.0-n3vysib.

/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/gsi-addon/install/intel/2021.5.0/metplus-3.1.1-uu37v6c threw similar errors as described above.

/scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.5.1/envs/unified-env/install/intel/2021.5.0/metplus-5.1.0-n3vysib worked though.

Can we try using metplus 5.1.0 (and whichever met version should go with it) within GW and see if that fixes things?

@DavidHuber-NOAA
Contributor Author

Sure, I can give that a whirl.

@WalterKolczynski-NOAA WalterKolczynski-NOAA removed the triage Issues that are triage label Nov 28, 2023
@DavidHuber-NOAA
Contributor Author

I ran a test Grid2Grid job on Orion with met/11.1.0 and metplus/5.1.0. I first had to make a couple of adjustments to run_verif_global_in_global_workflow.sh, which I have captured in this branch.

This resulted in a number of errors when parsing MET configuration files. Possibly related, copious warnings about unused configuration settings were reported as well, though I seem to recall some of these were present in past, successful runs. I am guessing based on the log output that the format of MET configuration files has changed since version 9.1.3.

The log file can be found here: /work/noaa/global/para/com/ss_151/logs/2021111000/gfsmetpg2g1.log. To isolate a single error, the following grid_stat command

/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/envs/unified-env/install/intel/2022.0.2/met-11.1.0-bgfpwqe/bin/grid_stat -v 2 /work/noaa/global/dhuber/para/stmp/RUNDIRS/ss_151/metpg2g1.106507/grid2grid_step1/data/ss_151/f000.2021111000 /work/noaa/global/dhuber/para/stmp/RUNDIRS/ss_151/metpg2g1.106507/grid2grid_step1/data/ss_151/anom.truth.2021111000 /work/noaa/global/dhuber/GW/gw_spack/sorc/verif-global.fd/parm/metplus_config/metplus_use_cases/METplusV3.1/grid2grid/met_config/METV9.1/GridStatConfig_anom -outdir /work/noaa/global/dhuber/para/stmp/RUNDIRS/ss_151/metpg2g1.106507/grid2grid_step1/metplus_output/make_met_data_by_VALID/grid_stat/anom/ss_151

failed to parse the file /work/noaa/global/dhuber/para/stmp/RUNDIRS/ss_151/metpg2g1.106507/grid2grid_step1/metplus_output/tmp/met_config_110969_0 with an error on line 105 (below), failing on the character [ on "column 5", perhaps suggesting that the extra brackets are no longer needed:

   poly = [ ["/work/noaa/global/dhuber/GW/gw_spack/fix/fix_verif/vx_mask_files/grid2grid/NHX.nc", "/work/noaa/global/dhuber/GW/gw_spack/fix/fix_verif/vx_mask_files/grid2grid/SHX.nc", "/work/noaa/global/dhuber/GW/gw_spack/fix/fix_verif/vx_mask_files/grid2grid/TRO.nc", "/work/noaa/global/dhuber/GW/gw_spack/fix/fix_verif/vx_mask_files/grid2grid/PNA.nc"] ];

@malloryprow Would you mind taking a look at the log files to see if I am interpreting this correctly?

@malloryprow
Contributor

Yeah, I was playing around with this earlier. The change from METplus v3 to METplus v4 is a big one. What worked in METplus v3 won't work in METplus v4. All the METplus config files will need to be updated accordingly. METplus v4 is much more similar to METplus v5.

@malloryprow
Contributor

While it is preferred to get this working with MET v9.1.3 and METplus v3.1.1, if we need to move forward with using MET v11.1.0 and METplus v5.1.0 I can get everything updated. It would probably take me a week or two to get everything changed and tested.

Please let me know ASAP what route we want to take.

@DavidHuber-NOAA
Contributor Author

@malloryprow It would be nice to have all of the libraries/repositories moved over to spack-stack, and it looks to me like that would mean we would need to upgrade to METplus 5.1. I would like to try playing around with METplus 3.1.1 and Python 3.10.8 a little more, which I will do tomorrow, just to check if the PosixPath errors can be circumvented without a METplus upgrade.

If we do need to do an upgrade, I am happy to run tests for you from the global workflow.

@malloryprow
Contributor

Thanks for giving the older versions another go. Keep me updated!

@malloryprow
Contributor

I've been thinking about this more and the upgrades would be okay for stats, but the plotting capabilities that EMC_verif-global uses from METplus v3.1.1 are no longer supported in more recent versions of METplus.

@DavidHuber-NOAA
Contributor Author

I tracked this issue down to some specific versions of Python starting in mid-2021; see python/cpython#88227. There is a fix for it in newer versions of Python, so we can probably ask the spack-stack team to get a newer version of Python into spack-stack/1.6.0. Further down in the comments of that issue, it looks like version 3.10.11 (or maybe 3.11?) still had this problem, so we may need something newer than that, maybe 3.10.12. FYI @AlexanderRichert-NOAA @climbfuji.
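
For reference, a sketch of a caller-side workaround, assuming one is free to patch the calling code locally: convert the directory to a plain string before handing it to `pkgutil.iter_modules`, so the importer machinery never receives a `PosixPath`. The helper `list_module_names` and the file names in the demo are hypothetical stand-ins, not METplus code, and this is not the fix adopted in this thread.

```python
import os
import tempfile
from pathlib import Path
from pkgutil import iter_modules


def list_module_names(package_dir):
    """List module names found directly inside package_dir."""
    # os.fspath turns a Path into a plain str and leaves a str unchanged.
    return sorted(name for _, name, _ in iter_modules([os.fspath(package_dir)]))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names standing in for the real wrapper modules.
        for filename in ("grid_stat_wrapper.py", "stat_analysis_wrapper.py"):
            (Path(tmp) / filename).write_text("")
        print(list_module_names(Path(tmp)))
```

Passing a string works regardless of whether the interpreter accepts `Path` entries, which is why the conversion is done at the call site rather than relying on a particular Python patch release.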

@DavidHuber-NOAA
Contributor Author

I'll open a spack-stack issue.

@malloryprow
Contributor

Thank you @DavidHuber-NOAA!

@WalterKolczynski-NOAA
Contributor

@aerorahul is going to set up a meeting next week with the relevant players to discuss a way forward should we not be able to make this work for METplus 3.1.1 (and maybe even if we are, for medium-term plans).

@DavidHuber-NOAA
Contributor Author

This will be resolved with the upgrade to spack-stack 1.6.0 (#2195).

@malloryprow
Contributor

This can be closed with #2239 merged.

@aerorahul
Contributor

closed via #2195
