Remove implicit symlink names #2527

Merged

Conversation

WalterKolczynski-NOAA
Contributor

Description

Lustre has a defect under Rocky 9 that sometimes causes symlink creation to fail when the link name is not given explicitly. This PR updates all link creation to use explicit link names.
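
As an illustration of the difference between the two forms, here is a minimal sketch. The paths and file name are hypothetical, and `NLN="ln -sf"` is an assumption meant to mirror the utility variable the description says is defined in config.base; the actual scripts touched by this PR may look different.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directories, for illustration only.
workdir="$(mktemp -d)"
trap 'rm -rf "${workdir}"' EXIT
mkdir -p "${workdir}/fix" "${workdir}/run"
echo "data" > "${workdir}/fix/input.nc"

# Assumption: mirrors the NLN utility variable set in config.base.
NLN="ln -sf"

cd "${workdir}/run"

# Implicit name: ln derives the link name from the target's basename.
# This is the form the PR moves away from.
#   ${NLN} "${workdir}/fix/input.nc" .

# Explicit name: the full link path is spelled out.
# NLN is intentionally unquoted so that "ln -sf" splits into command and flag.
${NLN} "${workdir}/fix/input.nc" "${workdir}/run/input.nc"

# Show where the new link points.
readlink "${workdir}/run/input.nc"
```

Both forms request the same link; the explicit form only removes the step where `ln` infers the link name from the destination directory.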

config.base is updated to turn off two monitor jobs on Hercules because their executables are not yet built there. Combined with the previous change, this should make the workflow available for use on Hercules.

Also removes the redundant definitions of utility names (NCP, NLN, etc.) in the gdas scripts, since these are already defined in config.base.

Resolves #2131
Resolves #2522

Type of change

  • Bug fix (fixes something broken)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO

How has this been tested?

  • Cycled atm-only test on Hercules
  • Forecast-only S2SWA test on Hercules
  • Full product test on WCOSS2

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • I have made corresponding changes to the documentation if necessary

@WalterKolczynski-NOAA WalterKolczynski-NOAA self-assigned this Apr 24, 2024
@WalterKolczynski-NOAA WalterKolczynski-NOAA added the CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera label Apr 24, 2024
@WalterKolczynski-NOAA
Contributor Author

Not going to do it now, but after MSU comes back from maintenance, will also run this through CI on Hercules.

@WalterKolczynski-NOAA WalterKolczynski-NOAA removed the CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera label Apr 24, 2024
@WalterKolczynski-NOAA WalterKolczynski-NOAA added the CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera label Apr 24, 2024
@emcbot emcbot added CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera and removed CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera labels Apr 24, 2024
@emcbot emcbot added CI-Hera-Running **Bot use only** CI testing on Hera for this PR is in-progress and removed CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera labels Apr 24, 2024
@emcbot

emcbot commented Apr 24, 2024

Experiment C96_atmaerosnowDA FAILED on Hera with error logs:

/scratch1/NCEPDEV/global/CI/2527/RUNTESTS/COMROOT/C96_atmaerosnowDA_d0051a7a/logs/2021122012/gdasfcst.log

Follow link here to view the contents of the above file(s): (link)

@emcbot emcbot added CI-Hera-Failed **Bot use only** CI testing on Hera for this PR has failed and removed CI-Hera-Running **Bot use only** CI testing on Hera for this PR is in-progress labels Apr 24, 2024
@emcbot

emcbot commented Apr 24, 2024

Experiment C96_atmaerosnowDA FAILED on Hera
in /scratch1/NCEPDEV/global/CI/2527/RUNTESTS/C96_atmaerosnowDA_d0051a7a

parm/config/gfs/config.base (outdated, resolved)
ush/wave_grid_interp_sbs.sh (outdated, resolved)
@DavidHuber-NOAA
Contributor

It looks like the snowDA test failed as it tried to copy 21Z restart files. I don't think these should exist for 3dvar experiments. It seems like forecast_postdet.sh should be keying on iaufhrs instead of restart_interval.

@DavidHuber-NOAA
Contributor

It looks like the snowDA test failed as it tried to copy 21Z restart files. I don't think these should exist for 3dvar experiments. It seems like forecast_postdet.sh should be keying on iaufhrs instead of restart_interval.

Actually, I think the proper solution is to modify the restart_interval based on DOHYBVAR (3 if DOHYBVAR, else 6) in config.fcst.
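
The suggestion above could be sketched roughly as follows. This is not the change that was made to config.fcst; the helper name `select_restart_interval` is hypothetical, and the sketch assumes DOHYBVAR takes the values "YES" or "NO".

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: pick the restart interval (hours) from DOHYBVAR.
# Assumption: DOHYBVAR is "YES" for hybrid runs and "NO" otherwise.
select_restart_interval() {
  local dohybvar="${1:-NO}"
  if [[ "${dohybvar}" == "YES" ]]; then
    echo 3   # hybrid: 3-hourly restarts
  else
    echo 6   # non-hybrid (e.g. 3DVar): 6-hourly restarts
  fi
}

# Default to "NO" when DOHYBVAR is not set in the environment.
restart_interval="$(select_restart_interval "${DOHYBVAR:-NO}")"
export restart_interval
echo "restart_interval=${restart_interval}"
```

With a 6-hour interval, a 3DVar experiment would not write the intermediate restart files, so there would be nothing for the copy step to look for at the 3-hour mark.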

Contributor

@aerorahul aerorahul left a comment

looks good. did not test.

Contributor

@DavidHuber-NOAA DavidHuber-NOAA left a comment

Looks good, thanks!

@WalterKolczynski-NOAA WalterKolczynski-NOAA removed the CI-Hera-Failed **Bot use only** CI testing on Hera for this PR has failed label Apr 26, 2024
@WalterKolczynski-NOAA
Contributor Author

gfswavepostpnt is hanging during the MPMD on Hercules. Works on other machines. @DavidHuber-NOAA is taking a look since he didn't see it during the original Hercules port.

@JessicaMeixner-NOAA
Contributor

@WalterKolczynski-NOAA @DavidHuber-NOAA - Let me know if you need a waves person to take a look.

@DavidHuber-NOAA
Contributor

After some digging, I found the following error in two members of an MPMD job:

slurmstepd: error: *** STEP 1084715.0 STEPD TERMINATED ON hercules-03-46 AT 2024-04-25T18:17:33 DUE TO JOB NOT ENDING WITH SIGNALS ***

Based on log output, all of the subtasks completed successfully, but some of the members seem to not be returning proper statuses to srun, which causes the job to stall. It's not clear what is causing this to happen. I've forwarded this information to the Hercules helpdesk.

@WalterKolczynski-NOAA WalterKolczynski-NOAA added CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera CI-Wcoss2-Ready **CM use only** PR is ready for CI testing on WCOSS labels Apr 30, 2024
@emcbot emcbot added CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS and removed CI-Wcoss2-Ready **CM use only** PR is ready for CI testing on WCOSS labels Apr 30, 2024
@emcbot

emcbot commented Apr 30, 2024

Experiment C48_S2SW FAILED on Hera
in /scratch1/NCEPDEV/global/CI/2527/RUNTESTS/C48_S2SW_06f69666

@WalterKolczynski-NOAA WalterKolczynski-NOAA added CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera and removed CI-Hera-Failed **Bot use only** CI testing on Hera for this PR has failed labels Apr 30, 2024
@emcbot emcbot added CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera CI-Hera-Running **Bot use only** CI testing on Hera for this PR is in-progress CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress and removed CI-Hera-Ready **CM use only** PR is ready for CI testing on Hera CI-Hera-Building **Bot use only** CI testing is cloning/building on Hera CI-Wcoss2-Building **Bot use only** CI testing is cloning/building on WCOSS labels Apr 30, 2024
@emcbot

emcbot commented Apr 30, 2024

Automated global-workflow Testing Results:

Machine: Wcoss2
Start: Tue Apr 30 16:10:26 UTC 2024 on clogin05
---------------------------------------------------
Build: Completed at 04/30/24 04:21:51 PM
Case setup: Completed for experiment C48_ATM_64dd7fa0
Case setup: Skipped for experiment C48mx500_3DVarAOWCDA_64dd7fa0
Case setup: Skipped for experiment C48_S2SWA_gefs_64dd7fa0
Case setup: Completed for experiment C48_S2SW_64dd7fa0
Case setup: Completed for experiment C96_atm3DVar_64dd7fa0
Case setup: Skipped for experiment C96_atmaerosnowDA_64dd7fa0
Case setup: Completed for experiment C96C48_hybatmDA_64dd7fa0
Case setup: Skipped for experiment C96C48_ufs_hybatmDA_64dd7fa0

@emcbot emcbot added CI-Orion-Running **Bot use only** CI testing on Orion for this PR is in-progress and removed CI-Orion-Building **Bot use only** CI testing is cloning/building on Orion labels Apr 30, 2024
@emcbot

emcbot commented Apr 30, 2024

Experiment C48_ATM_64dd7fa0 SUCCESS on Wcoss2 at 04/30/24 05:36:16 PM

@emcbot

emcbot commented Apr 30, 2024

Experiment C96C48_hybatmDA_64dd7fa0 SUCCESS on Wcoss2 at 04/30/24 06:42:26 PM

@emcbot

emcbot commented Apr 30, 2024

Experiment C96_atm3DVar_64dd7fa0 SUCCESS on Wcoss2 at 04/30/24 06:48:17 PM

@emcbot

emcbot commented Apr 30, 2024

Experiment C48_S2SW_64dd7fa0 SUCCESS on Wcoss2 at 04/30/24 06:51:10 PM

@emcbot emcbot added CI-Wcoss2-Passed **Bot use only** CI testing on WCOSS for this PR has completed successfully and removed CI-Wcoss2-Running **Bot use only** CI testing on WCOSS for this PR is in-progress labels Apr 30, 2024
@emcbot

emcbot commented Apr 30, 2024

All CI Test Cases Passed on Wcoss2:

Experiment C48_ATM_64dd7fa0 *** SUCCESS *** at 04/30/24 05:36:16 PM
Experiment C96C48_hybatmDA_64dd7fa0 *** SUCCESS *** at 04/30/24 06:42:26 PM
Experiment C96_atm3DVar_64dd7fa0 *** SUCCESS *** at 04/30/24 06:48:17 PM
Experiment C48_S2SW_64dd7fa0 *** SUCCESS *** at 04/30/24 06:51:10 PM

@emcbot emcbot added CI-Hera-Passed **Bot use only** CI testing on Hera for this PR has completed successfully and removed CI-Hera-Running **Bot use only** CI testing on Hera for this PR is in-progress labels Apr 30, 2024
@emcbot

emcbot commented Apr 30, 2024

CI Passed Hera at
Built and ran in directory /scratch1/NCEPDEV/global/CI/2527

@emcbot emcbot added CI-Orion-Passed **Bot use only** CI testing on Orion for this PR has completed successfully and removed CI-Orion-Running **Bot use only** CI testing on Orion for this PR is in-progress labels Apr 30, 2024
@emcbot

emcbot commented Apr 30, 2024

CI Passed Orion at
Built and ran in directory /work2/noaa/stmp/CI/ORION/2527

@WalterKolczynski-NOAA WalterKolczynski-NOAA merged commit 53b6764 into NOAA-EMC:develop Apr 30, 2024
7 of 10 checks passed
danholdaway added a commit to danholdaway/global-workflow that referenced this pull request May 2, 2024
* upstream/develop:
  Update gfs_utils for Gaea (NOAA-EMC#2556)
  Updated GEMPAK version and APRUN launcher. (NOAA-EMC#2555)
  Utilize scale-dependent localization for atmospheric analysis (NOAA-EMC#2542)
  Remove implicit symlink names (NOAA-EMC#2527)
  Fixes sea ice archiving (NOAA-EMC#2541)
  Link ensemble analysis increment files to COMROOT for warm_start (NOAA-EMC#2553)
  Launch Multiple Platforms to Jenkins with polling (NOAA-EMC#2548)
  Turn C48mx500_3DVarAOWCDA back on (NOAA-EMC#2543)
  Add option to link different orog/ugwd fix files for global nest (NOAA-EMC#2532)
  Retire AWIPS GRIB1 products (NOAA-EMC#2547)
  Add CADS use flexibility (NOAA-EMC#2540)
  Hot fix for bash CI on WCOSS2 (NOAA-EMC#2536)
  Fix comment indentation (NOAA-EMC#2526)
danholdaway added a commit to danholdaway/global-workflow that referenced this pull request May 3, 2024
…di_exe

* upstream/develop:
  Add nest capability (NOAA-EMC#2545)
  Update gfs_utils for Gaea (NOAA-EMC#2556)
  Updated GEMPAK version and APRUN launcher. (NOAA-EMC#2555)
  Utilize scale-dependent localization for atmospheric analysis (NOAA-EMC#2542)
  Remove implicit symlink names (NOAA-EMC#2527)
  Fixes sea ice archiving (NOAA-EMC#2541)
  Link ensemble analysis increment files to COMROOT for warm_start (NOAA-EMC#2553)
  Launch Multiple Platforms to Jenkins with polling (NOAA-EMC#2548)
Labels
CI-Hera-Passed **Bot use only** CI testing on Hera for this PR has completed successfully CI-Orion-Passed **Bot use only** CI testing on Orion for this PR has completed successfully CI-Wcoss2-Passed **Bot use only** CI testing on WCOSS for this PR has completed successfully
Projects
None yet
Development

Successfully merging this pull request may close these issues.

  • Intermittent symlink failures on Rocky 9
  • Add CI for Hercules
5 participants