
Migrate to Rocky8 spack-stack installations on Jet #2377

Closed
3 of 14 tasks
DavidHuber-NOAA opened this issue Mar 6, 2024 · 23 comments · Fixed by #2458
Labels
feature New feature or request

Comments

@DavidHuber-NOAA
Contributor

DavidHuber-NOAA commented Mar 6, 2024

What new functionality do you need?

Jet is upgrading to the Rocky8 Linux OS, which requires a new spack-stack installation on the platform. The global workflow, subcomponents, and external dependencies will need to be recompiled and tested on the platform before full transition in mid-April.

What are the requirements for the new functionality?

  • Spack-stack v1.6.0 installation
    • /lfs4/HFIP/hfv3gfs/role.epic/spack-stack/spack-stack-1.6.0/envs/gsi-addon-dev-rocky8
    • /lfs4/HFIP/hfv3gfs/role.epic/spack-stack/spack-stack-1.5.1/envs/unified-env-rocky8
  • All submodules transitioned to Rocky8
    • UFS
    • GSI
    • GDAS
    • GSI-Utils
    • GSI-Monitor
    • EMC_verif-global
    • UFS_Utils
    • GFS-Utils
  • External dependencies need to be rebuilt
    • Obsproc
    • Fit2Obs
    • TC_Tracker

Acceptance Criteria

The global workflow is able to run in both cycled and forecast-only modes at resolutions up to C384 with all subcomponents and external dependencies running successfully.

Suggest a solution (optional)

No response

@DavidHuber-NOAA added the feature (New feature or request) and triage labels Mar 6, 2024
@DavidHuber-NOAA
Contributor Author

FYI @souopgui

@InnocentSouopgui-NOAA
Contributor

There is a kJet maintenance downtime planned for 03/26; after the maintenance, all Jets and all available front ends will be Rocky8.
@DavidHuber-NOAA, how do we adjust the acceptance criteria for this issue?

@InnocentSouopgui-NOAA
Contributor

InnocentSouopgui-NOAA commented Mar 26, 2024

All UFS model tests pass with the Rocky8 install of spack-stack on Jet; log file attached.
RegressionTests_jet.log

@DavidHuber-NOAA
Contributor Author

@InnocentSouopgui-NOAA Since it won't be possible to validate against CentOS on Jet, I have adjusted the acceptance criteria to being able to run cycled and forecast-only experiments.

@InnocentSouopgui-NOAA
Contributor

I have everything running smoothly on the xjet and kjet partitions. The changes I made include:

  • global workflow version and module files for Jet
  • gfs_utils module files for Jet
  • gsi_enkf module files for Jet
  • gsi_monitor module files for Jet
  • gsi_utils module files for Jet
  • upp module files for Jet

@DavidHuber-NOAA, do you want to have a look at those runs before I submit the pull requests for those changes?

@InnocentSouopgui-NOAA
Contributor

InnocentSouopgui-NOAA commented Apr 8, 2024

Cycled experiments (48+ hours) at resolutions:

  • 96/48 on xjet and kjet
  • 192/96 on kjet
  • 384/192 on kjet

Forecast-only experiments (48+ hours) at resolutions:

  • 48
  • 96
  • 192
  • 384

aerorahul pushed a commit to NOAA-EMC/gfs-utils that referenced this issue Apr 8, 2024
- Update Jet module file to use Rocky8 installation of spack-stack;
- Jet has been upgraded to the Rocky8 Linux OS and the present module file
no longer works
  Resolves #60
  Refs NOAA-EMC/global-workflow#2377
@InnocentSouopgui-NOAA
Contributor

@DavidHuber-NOAA How do I run a forecast-only experiment on Jet?
I am getting the following message from setup_expt.py:

forecast-only mode treats ICs differently and cannot be staged here

@DavidHuber-NOAA
Contributor Author

@InnocentSouopgui-NOAA If you run an S2SW forecast-only experiment, they should populate automatically from files stored on-site in /mnt/lfs4/HFIP/hfv3gfs/glopara/data/ICSDIR/prototype_ICs. However, the files on Jet may need to be updated. If you opt to run an atm-only experiment, then ICs will need to be pulled from HPSS. A guide is available here.

@KateFriedman-NOAA @WalterKolczynski-NOAA Would one of you be able to run an rsync to check/update the P8 coupled IC files on Jet?

@KateFriedman-NOAA
Member

KateFriedman-NOAA commented Apr 12, 2024

I synced some of the prototype_ICs to Jet this morning (the workflow_C*_refactored ones). The other prototype_ICs contain symlinks to files hosted elsewhere, so we need to sort out how we want to sync those to Jet.

Based on the settings in config.stage_ic you can use the synced workflow_C48_refactored ICs to run the C48 S2SW case on Jet (see ci/cases/pr/C48_S2SW.yaml). @InnocentSouopgui-NOAA Let me know if that doesn't work.

@erinaj16

I have been trying to recompile and test on Rocky8 the global workflow that I had compiled on Jet’s CentOS7. The workflow was previously compiled using intel/18.0.5.274 and impi/2018.4.274, and this version is compiled using intel/2022.1.2 and impi/2022.1.2. I have recompiled the workflow, including the global model component and the same versions of all previously utilized libraries. When I test the global model using the same initial conditions as previously run on the CentOS7 workflow, however, I get noticeable differences between the two versions. At 0 h, I am getting a difference of up to 1.5% in the post-processed 250 hPa horizontal wind speed. By 72 h, this difference grows to up to 15.5%.

To my knowledge, I am using the same model version, the same post-processor version (which when tested alone does not seem to have any issues in reproducibility), the same initial conditions, the same namelist file, the same version of the dependency libraries, the same fix files, and stochastic physics turned off in both versions. The only differences that I can tell are the compiler and MPI versions. Are differences this large, starting from 0h, expected from only changing the versions of compiler and/or MPI? Have others been facing a similar issue?

Thank you for any help you can provide or if you can direct me to the best place to ask this question!

Attached difference plots (250 hPa UV wind, init 2017090106, forecast hours 000 and 072):
latlon_diff_rmse_test_fourbinRocky8_vFDL_fourbin_UV_250_2017090106_000
latlon_diff_rmse_test_fourbinRocky8_vFDL_fourbin_UV_250_2017090106_072

@KateFriedman-NOAA
Member

@erinaj16 I would expect some differences, even using the same code and other libraries (assuming nothing else changed beyond intel and OS). This is a big jump in intel version and a new OS for the version you're trying to continue using. If we saw large differences between the version of the system immediately preceding the Rocky8 upgrade (CentOS7 spack-stack) and the system ported to Rocky8 spack-stack then I would be concerned. We can't support older versions (e.g. intel 2018) so I'm not sure how to help from the workflow side of things.

From your information it seems like it's just the forecast model showing differences? If so, you could show the above to the ufs-weather-model folks and see if they have some thoughts on the differences. They likely can't support older versions either though, just a caution. :-/

@KateFriedman-NOAA
Member

@InnocentSouopgui-NOAA I merged the Fit2Obs PR, cut a new tag (wflow.1.1.1) and installed it on Jet: /lfs4/HFIP/hfv3gfs/glopara/git/Fit2Obs/v1.1.1

I will install this new tag on all supported platforms. You can update fit2obs_ver to 1.1.1 in versions/run.spack.ver: https://github.com/NOAA-EMC/global-workflow/blob/develop/versions/run.spack.ver#L35

@KateFriedman-NOAA
Member

I have installed the Fit2Obs tag on Hera and both WCOSS2s.

@WalterKolczynski-NOAA please install this new Fit2Obs tag on Orion, thanks!

git clone -b wflow.1.1.1 https://github.com/NOAA-EMC/Fit2Obs.git ./v1.1.1
INSTALL_PREFIX=/PATH/TO/FIT2OBS/v1.1.1 ./ush/build.sh

@KateFriedman-NOAA
Member

@InnocentSouopgui-NOAA I have installed the updated prepobs dev/gfsv17 branch here on Jet:
/lfs4/HFIP/hfv3gfs/glopara/git/prepobs/dev-gfsv17

Please update this line in the Jet workflow modulefile: https://github.com/NOAA-EMC/global-workflow/blob/develop/modulefiles/module_base.jet.lua#L46. Change "feature-GFSv17_com_reorg_log_update" to "dev/gfsv17".
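The requested change is a one-line string swap, so it can be applied mechanically. A hedged sketch (the file created below is a stand-in for the real modulefiles/module_base.jet.lua, and the prepobs path in it is illustrative):

```shell
# Stand-in for the Jet modulefile line that points at the prepobs install.
printf 'prepend_path("MODULEPATH", "/path/to/prepobs/feature-GFSv17_com_reorg_log_update/modulefiles")\n' > module_base.jet.lua

# Swap the branch string as requested above ('|' delimiter avoids escaping
# the '/' in dev/gfsv17; GNU sed shown, BSD sed needs -i '').
sed -i 's|feature-GFSv17_com_reorg_log_update|dev/gfsv17|' module_base.jet.lua
```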

@InnocentSouopgui-NOAA
Contributor

> I have installed the Fit2Obs tag on Hera and both WCOSS2s.
>
> @WalterKolczynski-NOAA please install this new Fit2Obs tag on Orion, thanks!
>
> git clone -b wflow.1.1.1 https://github.com/NOAA-EMC/Fit2Obs.git ./v1.1.1
> INSTALL_PREFIX=/PATH/TO/FIT2OBS/v1.1.1 ./ush/build.sh

@KateFriedman-NOAA, @DavidHuber-NOAA,
Should we leave the Fit2Obs version upgrade to a separate pull request since it affects all systems?

@KateFriedman-NOAA
Member

KateFriedman-NOAA commented Apr 24, 2024

@InnocentSouopgui-NOAA I have approved your TC_tracker PR but will wait and give the TC_tracker CM a chance to review, comment, and/or approve before I merge and update our installs.

@KateFriedman-NOAA
Member

> Should we leave the Fit2Obs version upgrade to a separate pull request since it affects all systems?

@InnocentSouopgui-NOAA Since the update within Fit2Obs is only for building on Jet, the other systems shouldn't be impacted by going to this new version. We can run CI tests for the PR branch on the other platforms to check for any impacts.

@InnocentSouopgui-NOAA
Contributor

> @InnocentSouopgui-NOAA Since the update within Fit2Obs is only for building on Jet, the other systems shouldn't be impacted by going to this new version. We can run CI tests for the PR branch on the other platforms to check for any impacts.

It looks like the runtime version for Fit2Obs is pulled from <GW>/versions/run.spack.ver with the variable fit2obs_ver, though we can override it for Jet only in <GW>/versions/run.jet.ver.

@KateFriedman-NOAA
Member

> It looks like the runtime version for Fit2Obs is pulled from <GW>/versions/run.spack.ver with the variable fit2obs_ver, though we can override it for Jet only in <GW>/versions/run.jet.ver.

Yep! If we hit issues with Fit2Obs in the CI testing before merge then we can change it just for Jet in run.jet.ver and make the fit2obs_ver global update after the PR for this goes in.
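The override mechanism discussed above can be sketched as follows. The file contents are illustrative stand-ins, not the actual run.spack.ver/run.jet.ver from the repo, and the real workflow's sourcing order may differ; the point is simply that a machine-specific file setting fit2obs_ver after the common file takes precedence.

```shell
# Stand-in common version file.
cat > run.spack.ver <<'EOF'
export fit2obs_ver=1.1.0
EOF

# Stand-in machine file: common versions first, then the Jet-only override.
cat > run.jet.ver <<'EOF'
. ./run.spack.ver
export fit2obs_ver=1.1.1
EOF

. ./run.jet.ver
echo "fit2obs_ver=${fit2obs_ver}"
```

With this layout, updating run.spack.ver later (after CI passes everywhere) makes the global bump, and the Jet override can then be dropped.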

@InnocentSouopgui-NOAA
Contributor

InnocentSouopgui-NOAA commented Apr 24, 2024

> @InnocentSouopgui-NOAA I merged the Fit2Obs PR, cut a new tag (wflow.1.1.1) and installed it on Jet: /lfs4/HFIP/hfv3gfs/glopara/git/Fit2Obs/v1.1.1
>
> I will install this new tag on all supported platforms. You can update fit2obs_ver to 1.1.1 in versions/run.spack.ver: https://github.com/NOAA-EMC/global-workflow/blob/develop/versions/run.spack.ver#L35

@KateFriedman-NOAA,
can you please update the .lua filename from 1.1.0.lua to 1.1.1.lua in
/lfs4/HFIP/hfv3gfs/glopara/git/Fit2Obs/v1.1.1/modulefiles/fit2obs/1.1.0.lua?

@KateFriedman-NOAA
Member

> can you please update the .lua filename from 1.1.0.lua to 1.1.1.lua in
> /lfs4/HFIP/hfv3gfs/glopara/git/Fit2Obs/v1.1.1/modulefiles/fit2obs/1.1.0.lua?

My bad, sorry, forgot we needed to update that within the VERSION file in Fit2Obs. I have updated the version, recut the tag, and reinstalled on Jet. Please try again, thanks!

@InnocentSouopgui-NOAA
Contributor

InnocentSouopgui-NOAA commented Apr 25, 2024

@DavidHuber-NOAA, we are missing two things for the migration to Rocky8 on Jet.

  • the approve/merge of the pull request for GSI;
  • the approve/merge of the pull request for TC_tracker.

Shall I open the pull request for the global-workflow migration for review while waiting for those two, or wait a little?

@DavidHuber-NOAA
Contributor Author

DavidHuber-NOAA commented Apr 25, 2024

@InnocentSouopgui-NOAA Yes, I think it's fine if you do that. I will ping Russ and Jiayi on the GSI and TC_tracker PRs, respectively, to see if we can get them merged.

EDIT:
I see that the TC_tracker is being tested presently, so I will hold off on that ping.

WalterKolczynski-NOAA pushed a commit that referenced this issue May 10, 2024
# Description
Migrates Global Workflow to Rocky8 spack-stack installations on Jet.
Jet has moved from CentOS7 to Rocky8.

Resolves #2377
Refs NOAA-EMC/UPP#919
Refs NOAA-EMC/gfs-utils#60
Refs NOAA-EMC/GSI#732
Refs NOAA-EMC/GSI-Monitor#130
Refs NOAA-EMC/GSI-utils#33