Feature/refs initial #528


BinbinZhou-NOAA
Contributor

Note to developers: You must use this PR template!

Description of Changes

REFS verification is a new component for EVS/v2.
(1) This is the initial version of the REFS component for EVS v2; its contents are very similar to those of the HREF component.
(2) MET v12 and METplus v6 are used.
(3) Restart capability is added to both the stats and plots jobs.
(4) REFS is still in the development stage and is routinely run by Matt Pyle.

Please include a summary of the changes and the related GitHub issue(s). Please also include relevant motivation and context.
N/A

Developer Questions and Checklist

  • Is this a high priority PR? If so, why and is there a date it needs to be merged by?
    No.
  • Do you have any planned upcoming annual leave/PTO?
    No PTO planned within the next 2 months.
  • Are there any changes needed for when the jobs are supposed to run?
    N/A
  • [y] The code changes follow NCO's EE2 Standards.
  • [y] Developer's name is removed throughout the code and ${USER} is used where necessary throughout the code.
  • [y] References to the feature branch for HOMEevs are removed from the code.
  • [y] J-Job environment variables, COMIN and COMOUT directories, and output follow what has been defined for EVS.
  • [y] Jobs over 15 minutes in runtime have restart capability.
  • [N] If applicable, changes in the dev/drivers/scripts or dev/modulefiles have been made in the corresponding ecf/scripts and ecf/defs/evs-nco.def?
    Since REFS is still in the development stage, the REFS component has not been added to the ecf/defs/evs-nco.def file.
  • [y] Jobs contain the appropriate file checking and don't run METplus for any missing data.
  • [y] Code is using METplus wrappers structure and not calling MET executables directly.
  • [y] Log is free of any ERRORs or WARNINGs.
    Some WARNINGs remain unresolved because they need to be fixed by DTC.

Testing Instructions

Since REFS is still run by Matt Pyle, COMINrefs in all of the stats driver scripts should be set to the REFS output directory for testing:
export COMINrefs=/lfs/h2/emc/ptmp/emc.lam/para/com/refs/v1.0

Note: if testing on a personal account, add the following line to the spcoutlook stats driver script:
export EVSINspcotlk=/lfs/h2/emc/vpppg/noscrub/emc.vpppg/evs/v2.0/prep/cam
or
export EVSINspcotlk=/lfs/h1/ops/prod/com/evs/v1.0/prep/cam
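A quick preflight check that the overridden input directories actually exist can catch a typo before a job is submitted. The sketch below is a hypothetical helper (`check_input_dirs` is not part of EVS); it takes variable names rather than paths, so the same call works in any driver script:

```shell
#!/bin/bash
# Hypothetical helper (not part of EVS): verify that each named variable is set
# and points at an existing directory. Returns non-zero if any check fails.
check_input_dirs() {
  local name status=0
  for name in "$@"; do
    # ${!name} expands to the value of the variable whose name is stored in $name
    if [ -z "${!name}" ]; then
      echo "MISSING: $name is not set"
      status=1
    elif [ ! -d "${!name}" ]; then
      echo "MISSING: $name=${!name} does not exist"
      status=1
    else
      echo "OK: $name=${!name}"
    fi
  done
  return $status
}
```

For example, placing `check_input_dirs COMINrefs EVSINspcotlk || exit 1` right after the exports stops the driver if either directory is missing.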

Test procedures:

Part 1. For the stats generation jobs

There are 3 stats jobs:
jevs_cam_refs_grid2obs_stats.sh
jevs_cam_refs_precip_stats.sh
jevs_cam_refs_spcoutlook_stats.sh

For each job, the following scenarios should be tested:

For jevs_cam_refs_grid2obs_stats.sh:
Scenario 1: first run, without any interruption
In this case, all 3 output stat files (final stat files) should be generated and copied to the final stat directory
$COMOUT/cam/refs.$VDATE
Meanwhile, a restart directory named "restart" should be created in the small stat directory
$COMOUT/cam/atmos.$VDATE/refs/grid2obs, where all small stat files are saved (as in the old version)
for gather processing or for a restart run.
The restart directory contains 4 sub-directories:
prepare, product, profile, and system

In the prepare sub-directory, there are 2 *.completed files, gfs_prepbufr.completed and rap_prepbufr.completed, and a
sub-directory prepbufr.20240714 in which the prepared prepbufr netCDF files are stored for restart.
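After a scenario 1 run, the restart layout described above can be checked with a few lines of shell. This is a minimal sketch; `check_grid2obs_restart` is a hypothetical helper, and the caller is assumed to pass `$COMOUT/cam/atmos.$VDATE/refs/grid2obs/restart`:

```shell
#!/bin/bash
# Hypothetical helper (not part of EVS): confirm the 4 restart sub-directories
# and the 2 prepare markers exist. Returns non-zero if anything is missing.
check_grid2obs_restart() {
  local restart_dir=$1 sub marker status=0
  for sub in prepare product profile system; do
    if [ ! -d "$restart_dir/$sub" ]; then
      echo "missing sub-directory: $restart_dir/$sub"
      status=1
    fi
  done
  for marker in gfs_prepbufr.completed rap_prepbufr.completed; do
    if [ ! -f "$restart_dir/prepare/$marker" ]; then
      echo "missing marker: $restart_dir/prepare/$marker"
      status=1
    fi
  done
  return $status
}
```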

In each of the product, profile, and system sub-directories, there are several *.completed files indicating which sub-tasks are completed.
If the grid2obs job completes without interruption, the completed files for all of the sub-tasks should be present in
each of these 3 sub-directories.
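The *.completed markers can drive a restart in a simple way: a sub-task whose marker exists is skipped, and a marker is written only after its sub-task succeeds, so a killed run leaves no marker for the interrupted sub-task. A minimal sketch of that pattern follows; `run_subtask` and any task names passed to it are hypothetical, and this is not the EVS implementation:

```shell
#!/bin/bash
# Hypothetical sketch of the *.completed restart pattern.
# Arguments: restart sub-directory, task name, then the command that does the work.
run_subtask() {
  local restart_sub=$1 task=$2
  shift 2
  if [ -f "$restart_sub/$task.completed" ]; then
    echo "skip $task (already completed)"
    return 0
  fi
  # "$@" stands in for the real METplus call; the marker is written only on success.
  if "$@"; then
    touch "$restart_sub/$task.completed"
    echo "done $task"
  else
    echo "FAILED $task"
    return 1
  fi
}
```

Because the marker is touched last, a job killed in the middle of a sub-task simply reruns that sub-task when it is resubmitted.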

Scenario 2: the prepare process and only part of the stats generation are completed, but the rest of the stats generation is not.
Suppose all processes in profile and system are completed but the processes in product are not.
To simulate this scenario:
Step 1: delete all of the output from scenario 1, including the final stat files in $COMOUT/cam/refs.$VDATE,
the small stat files and restart sub-directory in the $COMOUT/cam/atmos.$VDATE/refs/grid2obs

Step 2: submit the driver script jevs_cam_refs_grid2obs_stats.sh

Step 3: wait about 1 hour, then kill the grid2obs job

Step 4: re-submit the driver script jevs_cam_refs_grid2obs_stats.sh

Step 5: after the job completes, check the output final stat files; they should be the same as the corresponding files from scenario 1.
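Step 1 deletes the scenario 1 output, so copies of the scenario 1 final stat files need to be saved elsewhere first for the Step 5 comparison. Assuming that was done and that the final files end in `.stat`, the comparison can be sketched as follows (`compare_stat_dirs` is a hypothetical helper):

```shell
#!/bin/bash
# Hypothetical helper: compare each saved scenario-1 .stat file (ref_dir) with the
# file of the same name produced by the scenario-2 restart run (new_dir).
compare_stat_dirs() {
  local ref_dir=$1 new_dir=$2 f name status=0
  for f in "$ref_dir"/*.stat; do
    [ -e "$f" ] || { echo "no .stat files in $ref_dir"; return 1; }
    name=$(basename "$f")
    if [ ! -f "$new_dir/$name" ]; then
      echo "MISSING: $name"
      status=1
    elif ! cmp -s "$f" "$new_dir/$name"; then
      echo "DIFFERENT: $name"
      status=1
    else
      echo "SAME: $name"
    fi
  done
  return $status
}
```

`cmp -s` requires byte-identical files. If the order of stat lines can legitimately differ between the two runs, comparing sorted copies (for example `diff <(sort old.stat) <(sort new.stat)`) is a looser check.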

Repeat the above procedures for the precip and spcoutlook jobs.

Part 2. For the plot generation jobs

Note: since REFS is a new component for EVS, there are no REFS stat files in the vpppg directory
/lfs/h2/emc/vpppg/noscrub/emc.vpppg/evs/v2.0/stats/cam,
so please set COMIN in the plot driver scripts to /lfs/h2/emc/vpppg/noscrub/binbin.zhou/evs/v2.0/stats/cam

There are 15 jobs:
7 jobs for 31-day score plots, 7 jobs for 90-day score plots, and 1 job for the precip spatial map.

All 15 jobs are fairly fast (most take less than 15 min), but all of them except the spatial map job still have restart capability.

For each job, a restart sub-directory is created in the $COMOUT directory:
$COMOUT/atmos.$VDATE/restart, which contains 2 additional sub-directories, 31 and 90 (for 31-day and 90-day plots, respectively).
Each of the 31 and 90 sub-directories contains 7 sub-directories, one for each job:
refs_cape_plots
refs_ctc_plots
refs_ecnt_plots
refs_precip_plots
refs_profile_plots
refs_snowfall_plots
refs_spcoutlook_plots
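Before a restart run, it can help to see how far the previous attempt got. A minimal sketch, assuming the caller passes `$COMOUT/atmos.$VDATE/restart` and that both the 31 and 90 sub-directories use the 7 job names listed above (`report_plot_restart` is a hypothetical helper):

```shell
#!/bin/bash
# Hypothetical helper: report how many *.completed markers each plot restart
# sub-directory holds (assumes the same 7 job names under both 31 and 90).
report_plot_restart() {
  local restart_root=$1 window job dir n
  for window in 31 90; do
    for job in refs_cape_plots refs_ctc_plots refs_ecnt_plots refs_precip_plots \
               refs_profile_plots refs_snowfall_plots refs_spcoutlook_plots; do
      dir=$restart_root/$window/$job
      if [ -d "$dir" ]; then
        # tr strips the padding some wc implementations add
        n=$(find "$dir" -maxdepth 1 -name '*.completed' | wc -l | tr -d ' ')
        echo "$window/$job: $n completed"
      else
        echo "$window/$job: no restart directory"
      fi
    done
  done
}
```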

Each of these directories stores all of the completed png files along with their indicator files (*.completed).
In a restart run, each *.completed file is checked: if it exists, the corresponding png file is copied to the
working directory; otherwise, the png file is generated by the job in the working directory.
After all of the png files have been either generated by the job or copied from the restart directory, they are combined into one large
tar file and saved in the $COMOUT/atmos.$VDATE directory.
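The copy-or-generate logic above, followed by the tar step, can be sketched as below. `restart_plots` and `make_png` are hypothetical names (the real plotting call is not shown in this PR description), and the marker for `X.png` is assumed to be named `X.completed`:

```shell
#!/bin/bash
# Placeholder for the real plotting call (hypothetical): here it only creates the file.
make_png() { : > "$1"; }

# Hypothetical sketch of the plot restart logic.
# Arguments: restart dir, working dir, output tar file, then the png file names.
restart_plots() {
  local restart_dir=$1 work_dir=$2 tar_file=$3
  shift 3
  local png
  mkdir -p "$restart_dir" "$work_dir"
  for png in "$@"; do
    if [ -f "$restart_dir/${png%.png}.completed" ] && [ -f "$restart_dir/$png" ]; then
      # Finished in an earlier attempt: reuse the saved png.
      cp "$restart_dir/$png" "$work_dir/$png"
    else
      # Not finished yet: generate it, then save the png and its marker for a later restart.
      make_png "$work_dir/$png" || return 1
      cp "$work_dir/$png" "$restart_dir/$png"
      touch "$restart_dir/${png%.png}.completed"
    fi
  done
  # Combine all pngs into one tar file (the real job saves it under $COMOUT/atmos.$VDATE).
  tar -cf "$tar_file" -C "$work_dir" "$@"
}
```

The png is copied into the restart directory before its marker is touched, so a marker never points at a png that was not saved.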

The testing procedures for these 15 jobs depend on their walltime:

(1) The following jobs take less than 1 min:
jevs_cam_refs_grid2obs_cape_past31days_plots.sh
jevs_cam_refs_grid2obs_cape_past90days_plots.sh
jevs_cam_refs_grid2obs_ctc_past31days_plots.sh
jevs_cam_refs_grid2obs_ctc_past90days_plots.sh
jevs_cam_refs_precip_past31days_plots.sh
jevs_cam_refs_precip_past90days_plots.sh

Restart testing can be skipped for these jobs; testing the normal runs is sufficient.
If you do test the restart capability, the procedure is:
Step 1: launch the driver script, such as jevs_cam_refs_grid2obs_cape_past31days_plots.sh.
Step 2: wait about 30 or 40 seconds, kill the job, and re-run it.
Step 3: check the final tar file to confirm that it is correct.

(2) The other jobs have walltimes between 2 and 15 min:
jevs_cam_refs_profile_past31days_plots.sh
jevs_cam_refs_profile_past90days_plots.sh
jevs_cam_refs_grid2obs_ecnt_past31days_plots.sh
jevs_cam_refs_grid2obs_ecnt_past90days_plots.sh
jevs_cam_refs_snowfall_past31days_plots.sh
jevs_cam_refs_snowfall_past90days_plots.sh
jevs_cam_refs_spcoutlook_past31days_plots.sh
jevs_cam_refs_spcoutlook_past90days_plots.sh

The testing procedure is similar to the above, but kill the job
after waiting 1 or 2 minutes.
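The kill-and-rerun procedure can be scripted. This sketch assumes the driver scripts are submitted to PBS with `qsub` (which prints the job id) and killed with `qdel`; `kill_and_rerun` is a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical helper: submit a driver script, kill it after wait_sec seconds,
# then resubmit it so the restart logic can be exercised.
kill_and_rerun() {
  local script=$1 wait_sec=$2 jobid
  jobid=$(qsub "$script") || return 1
  echo "submitted $script as $jobid; killing it after ${wait_sec}s"
  sleep "$wait_sec"
  qdel "$jobid"
  # The new run should pick up the *.completed markers left by the killed run.
  qsub "$script"
}
```

For example, `kill_and_rerun jevs_cam_refs_grid2obs_cape_past31days_plots.sh 40` covers a job in group (1), and a wait of 60 to 120 seconds covers group (2). The helper resubmits right after `qdel`; if the killed job could still be exiting, adding a pause or checking `qstat` before the second `qsub` avoids two runs touching the same restart directory.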

(3) jevs_cam_refs_precip_spatial_plots.sh
This job has no restart capability since it takes less than 1 min.

@@ -107,7 +110,7 @@ echo $COMPATH
# Execute the script.
#######################################################################

-if [ $MODELNAME = firewxnest ]; then
+if [ $MODELNAME = nam_firewxnest ]; then
Contributor

Marcel's PR changed this to if [ $MODELNAME = firewxnest ]; then, please restore.

@@ -92,8 +96,7 @@ else
else
export COMOUTplots=${COMOUTplots:-$COMOUT/$RUN.$VDATE}
fi
export EVSINnam=${EVSINnam:-$COMIN/stats/$COMPONENT/nam_$MODELNAME}
Contributor

Marcel's PR added

export EVSINnam=${EVSINnam:-$COMIN/stats/$COMPONENT/nam_$MODELNAME}
export EVSINrrfs=${EVSINrrfs:-$COMIN/stats/$COMPONENT/rrfs_$MODELNAME}

Please restore.

@@ -140,7 +140,7 @@ fi
####################################
if [ $VERIF_CASE = radar ] || [ $VERIF_CASE = severe ]; then
$HOMEevs/scripts/${STEP}/${COMPONENT}/exevs_${COMPONENT}_${VERIF_CASE}_${STEP}.sh
-elif [ $MODELNAME = nam_firewxnest ] || [ $MODELNAME = rrfs_firewxnest ]; then
+elif [ $MODELNAME = nam_firewxnest ]; then
Contributor

Marcel's PR added elif [ $MODELNAME = nam_firewxnest ] || [ $MODELNAME = rrfs_firewxnest ]; then, please restore.

@malloryprow
Contributor

@BinbinZhou-NOAA I started reviewing the code changes for the dev/, ecf/, and jobs/ and left comments for things to be changed. Please make the changes and I'll continue the review.

@BinbinZhou-NOAA
Contributor Author

BinbinZhou-NOAA commented Aug 27, 2024 via email

@malloryprow
Contributor

Thanks! Can you also remove all instances of LOOP_ORDER in the METplus config files?
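A sketch for finding and deleting every `LOOP_ORDER` line in a directory of METplus config files, assuming GNU grep and GNU sed and that the files end in `.conf`; `remove_loop_order` is a hypothetical helper, and the directory to pass in is not named in this thread:

```shell
#!/bin/bash
# Hypothetical helper: list, then delete in place, every LOOP_ORDER setting
# in the .conf files under conf_dir (requires GNU grep and GNU sed).
remove_loop_order() {
  local conf_dir=$1
  local pattern='^[[:space:]]*LOOP_ORDER[[:space:]]*='
  # Show what will be removed as file:line:text
  grep -rn --include='*.conf' -E "$pattern" "$conf_dir"
  # Delete those lines from each matching file
  grep -rl --include='*.conf' -E "$pattern" "$conf_dir" |
    while IFS= read -r f; do
      sed -i -E "/$pattern/d" "$f"
    done
}
```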

@BinbinZhou-NOAA
Contributor Author

BinbinZhou-NOAA commented Aug 27, 2024 via email

@malloryprow
Contributor

Something has gone awry with this PR again. There are changes to NFCENS and RTOFS related files.

@BinbinZhou-NOAA
Contributor Author

BinbinZhou-NOAA commented Aug 27, 2024 via email

@malloryprow
Contributor

You merged develop into feature/refs_initial. You should not be doing that. You should be merging feature/rrfs_refs_v1 into your branch feature/refs_initial.
16394a0

@BinbinZhou-NOAA
Contributor Author

BinbinZhou-NOAA commented Aug 27, 2024 via email

@malloryprow
Contributor

Please try this while in the location of the feature branch feature/refs_initial on WCOSS2

  1. git fetch
  2. git pull origin feature/refs_initial
  3. git reset --hard HEAD~1
  4. git push origin feature/refs_initial

@BinbinZhou-NOAA
Contributor Author

BinbinZhou-NOAA commented Aug 27, 2024 via email

@malloryprow
Contributor

I'm still seeing the NFCENS and RTOFS files from your merge of develop in this PR. Your merge of develop into feature/refs_initial is still the last commit of the branch.

@malloryprow
Contributor

Please try this

  1. git fetch
  2. git pull origin feature/refs_initial
  3. git revert -m 1 16394a0e13f440021f30abd4a1dfccb82fb947f5
  4. git push origin feature/refs_initial

@BinbinZhou-NOAA
Contributor Author

BinbinZhou-NOAA commented Aug 27, 2024 via email

@malloryprow
Contributor

Make sure the new branch for the PR is off of feature/rrfs_refs_v1. In the future, do not merge changes from develop into your refs branch.

@BinbinZhou-NOAA
Contributor Author

BinbinZhou-NOAA commented Aug 27, 2024 via email

@BinbinZhou-NOAA BinbinZhou-NOAA deleted the feature/refs_initial branch August 27, 2024 18:50
@BinbinZhou-NOAA
Contributor Author

BinbinZhou-NOAA commented Aug 27, 2024 via email

@malloryprow
Contributor

Yes, all logging is controlled by machine.conf, and logging settings should not be used in the component METplus conf files.
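To check a component's METplus conf files for logging settings that belong in machine.conf, a grep over `LOG_*` keys (for example `LOG_LEVEL` or `LOG_MET_VERBOSITY`) is a reasonable first pass. A minimal sketch, assuming GNU grep; `find_log_settings` is a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical helper: print any LOG_* settings found in component .conf files
# under conf_dir, skipping machine.conf itself (requires GNU grep).
# Exit status is non-zero when no LOG_* settings are found.
find_log_settings() {
  local conf_dir=$1
  grep -rn --include='*.conf' --exclude='machine.conf' \
    -E '^[[:space:]]*LOG_[A-Z_]*[[:space:]]*=' "$conf_dir"
}
```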
