Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add observation preparation job for aerosols DA to workflow #2624

Merged
merged 32 commits into from
Jun 14, 2024

Conversation

ypwang19
Copy link
Contributor

@ypwang19 ypwang19 commented May 23, 2024

Description

This PR added a prepaeroobs job to prepare aerosol obs files for DA. This job does quality control of the VIIRS aerosol raw observations and convert them to ioda format.

Resolves #2623

Type of change

  • New feature (adds functionality)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO

How has this been tested?

  • Cycled test on Hera

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • I have made corresponding changes to the documentation if necessary

@ypwang19 ypwang19 marked this pull request as draft May 23, 2024 20:23
ypwang19 and others added 11 commits May 24, 2024 13:49
…AA-EMC#2614)

- Updating STMP and PTMP settings in host file for Orion and Hercules
because they are cross mounted.
- Also took the opportunity to finally update **SLURM_ACCOUNT** to
**HPC_ACCOUT** in CI over rides.
- Added a refactor of the `rocotostat.py` tool that is more pythonic and
as a execute retry feature because the `rocotostat` utility on Orion has
been failing sometimes.
* origin:
  Change GRIB2 parameter names and vertical levels for ocean/ice post (NOAA-EMC#2611)
  Add atmensanlfv3inc job (NOAA-EMC#2592)
  Global-workflow (AR) Generic updates for Gaea C5 (NOAA-EMC#2515)
  Update STMP and PTMP settings in host file for Orion and Hercules  (NOAA-EMC#2614)
env/HERCULES.env Fixed Show fixed Hide fixed
@ypwang19 ypwang19 marked this pull request as ready for review May 30, 2024 21:13
@ypwang19
Copy link
Contributor Author

@CoryMartin-NOAA @andytangborn This PR is ready for review. Any comments and suggestions are welcome.

env/HERA.env Outdated Show resolved Hide resolved
jobs/JGLOBAL_PREP_AERO_OBS Outdated Show resolved Hide resolved
jobs/JGLOBAL_PREP_AERO_OBS Outdated Show resolved Hide resolved
parm/config/gfs/config.base Outdated Show resolved Hide resolved
##############################################

# Generate COM variables from templates
YMD=${PDY} HH=${cyc} declare_from_tmpl -rx COM_OBS COM_CHEM_ANALYSIS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COM_OBS should be considered read-only. In development we copy obs from the dump archive to a location within our directories, but in operations we use them directly from obsproc. We can make new prep COM variables in config.com for any components that need them.

@CoryMartin-NOAA @guillaumevernieres I think there are other jobs recently that have the same issue that we just let through that will need to be updated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to have a bigger discussion on this. There are obs for the non-atmospheric components that are not being processed through the traditional obsproc channels (in particular the aerosol and SOCA datasets). I'll send an email to follow up and schedule a meeting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My solution would be to create a COM_<component>_PREP_TMPL (or similar; waves already has a prep) to store pre-processed stuff.

Copy link
Contributor

@CoryMartin-NOAA CoryMartin-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments


export OBSPROCYAML="${PARMgfs}/gdas/aero/obs/lists/gdas_aero_obsproc.yaml.j2"
export OBSPROCEXE="${EXECgfs}/gdas_obsprovider2ioda.x"
export VIIRS_DATA_DIR="/scratch2/NCEPDEV/stmp3/Yaping.Wang/VIIRS/AWS/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll have to change this to a more "official" location for at least the CI test period before the PR can be successfully merged in.

parm/config/gfs/config.prepaeroobs Outdated Show resolved Hide resolved
jobs/JGLOBAL_PREP_AERO_OBS Outdated Show resolved Hide resolved
List needed raw obs files.
Copy the raw obs files to $DATA/obs.
Link over the needed executable.
Generate corrosponding YMAL file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Generate corrosponding YMAL file.
Generate corresponding YMAL file.

"""
self.task_config.COM_OBS_CHEM = os.path.join(self.task_config.COM_OBS, 'chem')
if os.path.exists(self.task_config.COM_OBS_CHEM):
rmdir(self.task_config.COM_OBS_CHEM)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we positive we want to remove the directory if it exists?

ush/python/pygfs/task/aero_prepobs.py Outdated Show resolved Hide resolved
HH = fshort[0][26:28]
MM = fshort[0][28:30]
fstart = to_datetime(yyyy + '-' + mm + '-' + dd + 'T' + HH + ':' + MM + 'Z')
if sensor in file:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might need to be more robust, for example, what if the directory path is:
/work2/noaa/cmartin/exp/use-n20/data/n21.nc, it will be true for both n20 and n21.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still think this comment stands, but perhaps I'm worrying too much.
Can you do if sensor in os.path.basename(file) or something like that?

dd = fshort[0][24:26]
HH = fshort[0][26:28]
MM = fshort[0][28:30]
fstart = to_datetime(yyyy + '-' + mm + '-' + dd + 'T' + HH + ':' + MM + 'Z')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fstart = to_datetime(yyyy + '-' + mm + '-' + dd + 'T' + HH + ':' + MM + 'Z')
fstart = to_datetime(f'{yyyy}-{mm}-{dd}T{HH}:{MM}Z')

jobs/JGLOBAL_PREP_OBS_AERO Fixed Show fixed Hide fixed
jobs/JGLOBAL_PREP_OBS_AERO Fixed Show fixed Hide fixed
jobs/JGLOBAL_PREP_OBS_AERO Fixed Show fixed Hide fixed
@CoryMartin-NOAA
Copy link
Contributor

@ypwang19 can you merge in develop and resolve the conflicts? I'll review what is here in the meantime

@DavidHuber-NOAA DavidHuber-NOAA self-requested a review June 14, 2024 12:52
export npe_prepobsaero=1
export nth_prepobsaero=1
export npe_node_prepobsaero=1
export memory_prepobsaero="96GB"
Copy link
Contributor

@DavidHuber-NOAA DavidHuber-NOAA Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this prep job really need this much memory?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not, but it is somewhat intensive, but this is probably an order of magnitude too large, I can test next week with less memory

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be addressed later.

@CoryMartin-NOAA
Copy link
Contributor

FYI @ypwang19 is now on maternity leave, so I'll have to pick up this PR to get it finished, so there may be some delay in responses to reviews.

env/WCOSS2.env Outdated Show resolved Hide resolved
Copy link
Contributor

@DavidHuber-NOAA DavidHuber-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good overall, just one fix for WCOSS2 and a few suggestions.

CoryMartin-NOAA and others added 5 commits June 14, 2024 14:34
Co-authored-by: David Huber <69919478+DavidHuber-NOAA@users.noreply.github.com>
Co-authored-by: David Huber <69919478+DavidHuber-NOAA@users.noreply.github.com>
Co-authored-by: David Huber <69919478+DavidHuber-NOAA@users.noreply.github.com>
Co-authored-by: David Huber <69919478+DavidHuber-NOAA@users.noreply.github.com>
@CoryMartin-NOAA
Copy link
Contributor

@aerorahul @DavidHuber-NOAA @WalterKolczynski-NOAA Yaping is now on maternity leave. What do you suggest in terms of getting this PR over the finish line? My suggestions in no particular order are:

  • Merge as is (it's possible everything is fine, may need to stage some obs)
  • Merge this except the task definition stuff so that nothing breaks
  • Close the PR and I open a new one in my fork so I can make necessary edits
  • Other options?

@@ -178,6 +178,7 @@ export DO_WAVE="NO"
export DO_OCN="NO"
export DO_ICE="NO"
export DO_AERO="NO"
export DO_PREP_OBS_AERO="YES"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CoryMartin-NOAA How about setting this to NO for now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think that is all that is needed to ensure it doesn't break anything

Copy link
Contributor

@DavidHuber-NOAA DavidHuber-NOAA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple of remaining issues, but those can be addressed later IMO. Once an issue is created, I'm good with merging this.

# Set variables used in the script
##############################################

export COMIN_OBS="${DATA}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will likely need to be updated, but that can wait since the new job is now off by default. Can you create an issue for this @CoryMartin-NOAA?


export OBSPROCYAML="${PARMgfs}/gdas/aero/obs/lists/gdas_aero_obsproc.yaml.j2"
export OBSPROCEXE="${EXECgfs}/gdas_obsprovider2ioda.x"
export VIIRS_DATA_DIR="/scratch2/NCEPDEV/stmp3/Yaping.Wang/VIIRS/AWS/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this could be wrapped into the new issue.

export npe_prepobsaero=1
export nth_prepobsaero=1
export npe_node_prepobsaero=1
export memory_prepobsaero="96GB"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be addressed later.

try:
for file in allfiles:
basename = os.path.basename(file)
pattern = r"s(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{3})"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pattern could be defined outside the for loop.

logger.info("Found %d matching files.", len(matching_files))
except FileNotFoundError:
logger.error("The specified file/directory does not exist.")
raise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should raise FileNotFoundError(message)

@DavidHuber-NOAA DavidHuber-NOAA merged commit 6c93b45 into NOAA-EMC:develop Jun 14, 2024
4 checks passed
KateFriedman-NOAA added a commit to KateFriedman-NOAA/global-workflow that referenced this pull request Jun 14, 2024
* origin/develop:
  Add observation preparation job for aerosols DA to workflow (NOAA-EMC#2624)
  Remove ocean daily files (NOAA-EMC#2689)
  Update Jenkinsfile
  Add Hercules-EMC to the Jenkins configurable parameter list (NOAA-EMC#2685)
  Update gdas.cd and gsi_utils hashes (NOAA-EMC#2641)
  Add ability to use GEFS replay ICs (NOAA-EMC#2559)
  Replace `sleep` with `wait_for_file` (NOAA-EMC#2586)
  Add COM template for JEDI obs (NOAA-EMC#2678)
  Link both global-nest fix files and non-nest ones at the same time (NOAA-EMC#2632)
  Update ufs-weather-model  (NOAA-EMC#2663)
  Add ability to process ocean/ice products specific to GEFS (NOAA-EMC#2561)
  Update cleanup job to use COMIN/COMOUT (NOAA-EMC#2649)
  Add overwrite to creat experiment in BASH CI (NOAA-EMC#2676)
  Add handling to select CRTM cloud optical table based on cloud scheme and update calcanal_gfs.py  (NOAA-EMC#2645)

Refs NOAA-EMC#2475
RussTreadon-NOAA pushed a commit that referenced this pull request Jun 24, 2024
Add a prepaeroobs job to prepare aerosol obs files for DA.

This job does quality control of the VIIRS aerosol raw observations and
convert them to ioda format.

Resolves #2623 
---------

Co-authored-by: ypwang19 <yaping.wang@users.noreply.github.com>
Co-authored-by: TerrenceMcGuinness-NOAA <terrence.mcguinness@cox.net>
Co-authored-by: Cory Martin <cory.r.martin@noaa.gov>
Co-authored-by: David Huber <69919478+DavidHuber-NOAA@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add observation file preparation job for aerosols to workflow
6 participants