Define the Bukovsky masking regions for use in MET. #1940

JohnHalleyGotway · 2021-10-12T15:52:35Z

Describe the New Feature

On 9/21/2021, NOAA/EMC decided to start computing verification statistics using the Bukovsky regions. In particular, there were discussions about new regions being added (a southern region if I'm not mistaken), as well as defining CONUS with the East/West/Central/South Bukovsky definitions. This coincides with the EVSv1 and was already approved by the UFS V&V group.

Some of these areas needed to be cleaned up, specifically including the Eastern region near Maine, since part of Canada is included.

Here's the website: https://www.narccap.ucar.edu/contrib/bukovsky/

The regions are defined on a pretty coarse 1/2 degree grid, which is appropriate for climate simulations. However, applying them to high resolution regional domains will produce jagged results. Recommend coordinating with Dr. Bukovsky to define these regions as a set of lat/lon points or shapefiles. That would enable a nice application of them to model domains of any resolution.

The tasks for this issue are:

Coordinate with Dr. Bukovsky to develop lat/lon or shapefile versions of these regions, if possible.
Coordinate with NOAA/EMC to ensure that we include CONUS-only versions of these regions.
Include the resulting lat/lon or shapefiles in the MET repository.
Coordinate with NOAA/EMC to test and validate their use.

There are 27 individual regions defined along with 13 region groups. However, those 27 regions do not consider international borders. NOAA/EMC is specifically interested in 4 region groups: West, Central, East, and South. But they want those intersected with CONUS. Ideally, we'd provide definitions for each of the Bukovsky regions, along with their CONUS intersections. This may or may not be possible.

Acceptance Testing

List input data types and sources.
Describe tests required for new functionality.

Time Estimate

Estimate the amount of work required here.
Issues should represent approximately 1 to 3 days of work.

Sub-Issues

Consider breaking the new feature down into sub-issues.

Add a checkbox for each sub-issue here.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

2793541

Define the Metadata

Assignee

Select engineer(s) or no engineer required: @JohnHalleyGotway
Select scientist(s) or no scientist required: @j-opatz and @LoganDawson-NOAA

Labels

Select component(s)
Select priority
Select requestor(s)

Projects and Milestone

Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

METplus, MET, METdatadb, METviewer, METexpress, METcalcpy, METplotpy
No known issues needed at this time. But consider adding one for METplus wrappers and/or METplus-Training to demonstrate their use.

New Feature Checklist

See the METplus Workflow for details.


j-opatz · 2021-11-04T18:27:28Z

After I reached out to Melissa Bukovsky about obtaining finer resolution files, she provided some good news: she did, in fact, have 1km resolution files on hand. However, the finer resolution files suffer from the same ragged steps around bodies of water and land boundaries that were applied at the 0.5 degree resolution.

While looking into some of the files, it does appear there are some irregularities around the water bodies. This applies to large bodies (oceans, Great Lakes, etc.) as well as small inland sources (lakes, smaller rivers, etc.). I went through and made plot_data_plane images of the three areas that were described as a concern either by Melissa or noted in the original creation of the summary 4 regions by Logan: the northeast, the southeast, and the Great Lakes.

It's important to note that these files already had landmasking applied, so the 0's off the west coast of FL (for example) are the result of Melissa trimming off the west shift that happened from the original 1km source file. So it'd probably operate as desired when fed to MET, but the other pictures provided show there's some clean-up to do. Not to mention the possible issues with inland lakes being masked as water.
We do have access to the original 1km files (the ones that suffer from shifts) and we could apply our own landmasking in MET that better fits what EMC needs. We also need to answer if these regions should be individual files (size intensive but logically convenient) or one file with each region being accessed with a different value (e.g. the southeast being 17), which is size friendly but complicates access slightly.
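
To make the single-file option concrete, here is a minimal sketch of pulling one region out of a composite field by value. The tiny grid and the `extract_mask` helper are illustrative assumptions, not MET code; the value 17 for the southeast simply follows the example above.

```python
# Hypothetical composite field: each grid point holds a region index
# (0 = no region). The values are made up for illustration.
composite = [
    [0, 17, 17, 0],
    [5, 17, 17, 0],
    [5,  5,  0, 0],
]

def extract_mask(field, region_value):
    """Return a 0/1 mask that is 1 wherever field equals region_value."""
    return [[1 if v == region_value else 0 for v in row] for row in field]

southeast = extract_mask(composite, 17)
for row in southeast:
    print(row)
```

The trade-off is the one described above: one file on disk, at the cost of an extra selection step for every region that is accessed.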

j-opatz · 2022-02-10T19:48:40Z

Feedback from Logan:

We think it will be critical to have the Bukovsky regions available in the next release, as it sounds like METplus-4.1.0 and MET-10.1.0 will be the versions we will have installed into ops for EVS development (and potentially implementation). If it's not feasible to get the regions into the next coordinated METplus release, we would probably store them separately in our 'fix' directory within EVS.
Looking back at the latest status of the GitHub issue, I think we can likely live with some of the coastline shifts and lake masking that exists. That's less problematic than the grid cells with overlapping regions or the like.

After some feedback from the team, it sounds like these region files are best stored in the MET repository, along with the other region files.

There were two outstanding questions the last time this issue was discussed:

1. Can we use the landmasked files we currently have?
2. How should these regions be stored/accessed in MET?

From the feedback, it sounds like the answer to Q1 is yes, the 1 km finer resolution files that Melissa provided will do the trick.
But the answer to Q2 is still unknown. I think the answer is in the size of these files: for reference, the size of each of the 1 km files is 302MB. I'm guessing that EMC will want access to all of the regions to create their own regions, so 5.4GB needs to be stored. Given that size requirement, I'd suggest we store all of these masks in one file, possibly complicating access for some, but keeping size constraints (hopefully) more reasonable.

LoganDawson-NOAA · 2022-02-11T17:33:08Z

@j-opatz given the size requirements, let's go with storing the masks in one file. As you mention, that will change the way the masks are accessed, but I think that may be okay for us at EMC.

Based on some of our recent EMC verification team discussions, I believe we will need to do some preprocessing and create separate masks files that are defined on the various grids we use for verification. This will allow us to save compute resources in operations. Therefore, the way the Bukovsky region masks are stored in the MET repository is less of a concern for us.

Hopefully that explanation makes sense. Please let me know if it doesn't.

j-opatz · 2022-03-07T17:25:18Z

From the METplus telecon today (3/7/22), this issue was elevated to a requirement for official release. It's also been affirmed that we'll use one file to store all of the regional files, which will create a list-style access in MET.

When we get to the testing/acceptability phase, we need to loop in Logan (@LoganDawson-NOAA) and Mallory (@malloryprow).

JohnHalleyGotway · 2022-04-22T16:50:27Z

@j-opatz and @LoganDawson-NOAA, here's some further developments.

MET does NOT like this NetCDF format because the lat/lon spacing does NOT remain constant. Reading it into a MET tool produces this error message:

ERROR  : get_data_from_lat_lon_vars() -> MET can only process Latitude/Longitude files where the longitudes are evenly spaced (dlon=0.00999451, delta[14]=0.00900269)

The non-constant lat/lon deltas suggest that it is NOT actually defined on a true lat/lon grid. Here are histograms of the deltas. Very close to 0.01, but not quite, and the deltas are not constant as they would be on a true lat/lon grid.
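
For anyone who wants to reproduce this kind of check outside of MET, here is a minimal sketch. It is not the logic inside get_data_from_lat_lon_vars(); the `first_uneven_delta` helper and its relative tolerance are assumptions chosen for illustration.

```python
def first_uneven_delta(values, rel_tol=1e-3):
    """Return (index, delta) for the first spacing that differs from the
    initial spacing by more than rel_tol (relative), or None if uniform.
    The tolerance is an assumption, not MET's actual threshold."""
    d0 = values[1] - values[0]
    for i in range(1, len(values) - 1):
        d = values[i + 1] - values[i]
        if abs(d - d0) > rel_tol * abs(d0):
            return i, d
    return None

# Evenly spaced longitudes pass the check.
even = [200.25 + 0.01 * i for i in range(20)]
print(first_uneven_delta(even))

# A single short step (0.009 instead of 0.01) at index 14 is flagged.
uneven = even[:15] + [even[14] + 0.009 + 0.01 * i for i in range(5)]
print(first_uneven_delta(uneven))
```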

I propose the following approach...
- Collapse the multiple masks down to a single field of integers which are the indices of the Bukovsky regions.
- Use python embedding with MET-10.1.0 to pass those indices as POINT DATA into the point2grid tool.
- Configure point2grid to interpolate the data to an actual 0.01 degree lat/lon grid.
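
The first step above can be sketched as follows. This is a toy illustration under stated assumptions: the per-region masks, their indices, and the `collapse_masks` helper are all hypothetical, and the overlap check is my own addition rather than something MET does.

```python
# Hypothetical per-region 0/1 masks keyed by region index (values made up).
masks = {
    1: [[1, 1, 0], [0, 0, 0]],
    2: [[0, 0, 1], [0, 1, 1]],
}

def collapse_masks(masks, fill=0):
    """Merge 0/1 masks into one integer field of region indices.
    Raises ValueError if two regions claim the same grid point."""
    first = next(iter(masks.values()))
    out = [[fill] * len(first[0]) for _ in first]
    for index, mask in masks.items():
        for j, row in enumerate(mask):
            for i, flag in enumerate(row):
                if flag:
                    if out[j][i] != fill:
                        raise ValueError(f"overlap at ({j}, {i})")
                    out[j][i] = index
    return out

region_index = collapse_masks(masks)
print(region_index)
```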

Once the MET tools are happy with this file, I'll turn to how we can most easily use it to define masks in MET.

I'll plan to proceed with this approach unless I hear direction otherwise.

LoganDawson-NOAA · 2022-04-22T17:01:26Z

@JohnHalleyGotway thanks for doing all this digging into the issue!

I haven't used the high-resolution file myself at all, so I wasn't aware of these issues with the lat/lon grid definitions. Your proposed solution sounds reasonable to me!

JohnHalleyGotway · 2022-05-02T15:26:36Z

Documenting feedback we received from Melissa via email on this issue. Please see below. And this explanation makes the data much easier to work with. There is no need to process them as point data. I plan to use python embedding to manually define the grid, accounting for the shift that Melissa describes, and then regrid it to a 0.01 degree grid.

From Melissa:

Sorry these are causing problems for you. I've gone back and looked at what was done. Something just a bit odd seems to happen when the lat/lon arrays are attached to the netcdf file that I can't explain, though the arrays are still not exactly uniform before being written to the file either. The lat/lon arrays are produced via:

lats = fspan(15.25,75.25,6059)
lons = fspan(200.25,330.25,13050)

This gives a fairly uniform spacing of 0.00990 followed by additional inconsistent decimal places. The inconsistent part is an artifact from the change to the higher resolution up until this point. When added to the files, the lat/lon values are rounded or truncated adding to the inconsistency.

You can regrid these to a uniform grid. I see no problems with that - I've regridded the 0.5 deg resolution version to a variety of different model grids over the years. Or, you can try the shapefiles (the regions were regridded onto a uniform lat/lon WRF grid to facilitate the creation of the shapefiles).

I'm going to re-emphasize this caveat though...

All of the regions need to be shifted East by 0.25 or 0.50 degrees in the shapefiles (not sure which one, as I haven't used these yet, but I'm pretty confident this is related to the original resolution of the regions). The southeast region is the best example of this... there's a png of the shapefile plotted on the region in the shapefile folder (and attached), and the region mask should be better centered over FL, not offset to the left. When you take a look at these, keep that in mind -- it's an easy additive fix. And, I think it applies to the netcdf files too, not just the shapefiles.
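
As a rough check on the fspan calls quoted above, here is a plain-Python stand-in that reproduces the nominal spacing and applies the additive shift. The `fspan` function below is my own approximation of the NCL routine (evenly spaced values with both endpoints included), and the shift value is a placeholder since the email leaves 0.25 vs 0.50 open.

```python
def fspan(start, stop, num):
    """Python stand-in for NCL's fspan: num evenly spaced values,
    endpoints included."""
    step = (stop - start) / (num - 1)
    return [start + step * i for i in range(num)]

lats = fspan(15.25, 75.25, 6059)
lons = fspan(200.25, 330.25, 13050)

# Nominal spacing is close to, but not exactly, 0.01 degrees.
dlat = lats[1] - lats[0]
dlon = lons[1] - lons[0]
print(f"dlat ~ {dlat:.5f}, dlon ~ {dlon:.5f}")

# Additive eastward shift. Whether 0.25 or 0.50 is correct is unresolved
# in the email, so this value is a placeholder assumption.
SHIFT_DEG = 0.25
shifted_lons = [lon + SHIFT_DEG for lon in lons]
```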

JohnHalleyGotway · 2022-05-02T16:48:29Z

Feedback from @LoganDawson-NOAA on 5/2/2022. In general we should INCLUDE inland lakes inside the Bukovsky masking regions. Many model runs would not resolve those lakes anyway so they should be included.

JohnHalleyGotway · 2022-05-11T20:28:41Z

Based on discussion on 5/11/22, recommend the following order of operations:

1. Start with landmasked NetCDF files as input.
2. Process each to apply a CONUS mask (either CONUS.poly or a CONUS shapefile). Remove "internal" holes for water.
3. After step 2, run gen_vx_mask to stitch together these regions into 1 field.
4. Run regrid_data_plane with python embedding to regrid to a 0.01 degree lat/lon grid that includes CONUS but not much else.
5. Run gen_vx_mask to translate the 17 CONUS sub-regions into 4 larger groups.
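
The last step amounts to a lookup from region index to group. Here is a minimal sketch of that idea in plain Python rather than gen_vx_mask. The region indices and their group assignments below are hypothetical placeholders; the real membership of each sub-region is not specified here.

```python
# Hypothetical assignment of region indices to groups. These pairings are
# placeholders for illustration, not the actual Bukovsky group membership.
GROUP_OF_REGION = {1: "West", 2: "West", 3: "Central", 4: "East", 5: "South"}

def group_mask(region_field, group_name, group_of_region=GROUP_OF_REGION):
    """Return a 0/1 mask that is 1 where the point's region is in group_name."""
    members = {r for r, g in group_of_region.items() if g == group_name}
    return [[1 if v in members else 0 for v in row] for row in region_field]

# Toy field of region indices (0 = outside every region).
region_field = [[1, 2, 3], [0, 3, 4]]
print(group_mask(region_field, "West"))
```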

JohnHalleyGotway · 2022-06-13T16:34:29Z

Discussed at 6/13/2022 METplus NOAA telecon. Target to have this completed and mature by the end of July 2022. Recommend providing an initial version no later than July 1st.

JohnHalleyGotway · 2022-06-29T19:42:45Z

@LoganDawson-NOAA I'm looking for some direction on next steps.

The question at hand is whether we want to use the 50m or 110m resolution of the Natural Earth shapefiles.

Let me give you some background to illustrate.
So far we've:

Consolidated the Bukovsky regions down into 1 field defined at 0.01 degree resolution where the value at each grid point is the value of that mask at that point.
Filled in the inland lakes and waterways by defining the value of each water grid point as the nearest valid mask value.
Applied the Natural Earth shapefiles to exclude any grid point outside of CONUS. And that's the source of this question.

The top image shows the 110m resolution while the bottom image shows the 50m resolution:

You can see that the 110m version exactly corresponds to MET's map data because that's how we defined the map data.
Note some key diffs:

The 50m version shows much greater resolution in the Puget Sound and along the west coast. It includes the Florida Keys and a detailed Chesapeake Bay. The 110m version does not.

While these are defined on a 0.01 degree grid, you'll actually run regrid_data_plane to regrid each mask to the grid being evaluated. Do you want to regrid FROM the coarser 110m version of the masks or from the higher resolution 50m version?

FYI, once we decide on the desired CONUS resolution, the next steps are:

Check for and fill in any NA values along the coast with the "correct" mask value.
Decide which mask to use for northern MN. 13 or 14? Or we could "split the difference" and replace each NA value there with the nearest non-NA mask value. The latter is actually the easiest solution.
Apply the 50m or 110m definitions of the Great Lakes to exclude them from the masks.
Update the metadata of the NetCDF file to include the masking region names corresponding to each mask value.
Provide a shell or python script example with calls to gen_vx_mask to apply these to an NCEP grid.
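
The nearest-valid fill mentioned above (used for inland water, and suggested for the NA values in northern MN) can be sketched like this. It is a brute-force toy version under my own assumptions, measuring distance in grid-index space; it is not the NEAREST_VALID code in MET.

```python
def fill_nearest_valid(field, missing=None):
    """Replace each missing point with the value of the nearest valid point.
    Distance is squared Euclidean in grid-index space. Brute force, so only
    suitable for small demo grids. Ties go to the first valid point found
    in row-major order."""
    valid = [(j, i, v) for j, row in enumerate(field)
             for i, v in enumerate(row) if v != missing]
    out = [row[:] for row in field]
    if not valid:
        return out
    for j, row in enumerate(field):
        for i, v in enumerate(row):
            if v == missing:
                nearest = min(
                    valid, key=lambda p: (p[0] - j) ** 2 + (p[1] - i) ** 2)
                out[j][i] = nearest[2]
    return out

# Toy field: None marks water/NA points between regions 13 and 14.
field = [
    [13, None, None, 14],
    [13, None, None, 14],
]
print(fill_nearest_valid(field))
```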


JohnHalleyGotway · 2022-07-05T23:32:44Z

Here's a tarfile with the two versions to consider where the boundaries are defined by the 110m vs 50m natural earth shapefiles.
Bukovsky_CONUS.tar.gz

The remaining tasks are:

Decide on the desired resolution.
Add a 2D gridded variable to define the region groups.
Add a 1D variable to define the region group names.
Provide a script to easily apply these masks to a specific grid.

Another alternative is enhancing the logic of the MET library code to handle these composite masks and make it easy for the user to select which region should be extracted.

LoganDawson-NOAA · 2022-07-06T17:20:15Z

@JohnHalleyGotway this looks fantastic! I brought up the resolution question during our EVS meeting this morning, and we'd like to move forward with the 50m resolution version.

Providing a simple script to interpolate these high-resolution masks to a specific grid would be preferable to enhancing the MET logic (at least in the short term). Since masking regions defined on each different verification grid can be treated as fixed files, it will be most efficient for us to do all of that regridding during our development phase before EVS code delivery as opposed to doing the regridding with each run of a MET command.

The final addition I see that's needed is defining the four aggregated regions that are shown in the map in the first comment on this issue. CONUS_East, CONUS_West, CONUS_South, and CONUS_Central might be good names to use for those. Do you need confirmation of which subregions belong to each regional aggregate?

JohnHalleyGotway · 2022-07-06T23:01:29Z

@LoganDawson-NOAA I believe this work is complete. Please see these 3 images showing the basic Bukovsky regions, the Bukovsky region groups, and the full CONUS region:

You can find the corresponding data in this tar file:
Bukovsky_CONUS.tar.gz

It contains:

Bukovsky_CONUS_regions_50m.nc: NetCDF file containing "region", "region_group", and "CONUS" variables, along with "region_name" and "region_group_name" to define the names for each value.
regions.ps, groups.ps, and conus.ps: Plots shown above
masks.ctable: Colortable I used for these plots.
dev_notes.txt: Some notes on the commands I ran to generate this. Note that I ran a modified version of gen_vx_mask from the feature_1940_bukovsky branch, but these changes should NOT be merged back into the develop branch.
gen_bukovsky.sh: A shell script to apply these masks to an NCEP grid. For example:

gen_bukovsky.sh G130 G130

Generates 22 output files, one for each of the basic regions, region groups, and the full CONUS region. For example, Bukovsky_G130_CONUS_South.nc contains a variable named CONUS_South. In MET config files, all you need to do is set poly = [ "Bukovsky_G130_CONUS_South.nc" ];. And that should apply that mask and write output with the VX_MASK column set to CONUS_South.

Note that it has been updated to require exactly 2 arguments (in case you want to define the target grid as something other than a named grid):

ERROR: ./gen_bukovsky.sh -> exactly 2 arguments are required!
ERROR: 1. Target grid NAME to define output file names
ERROR: 2. Target grid DEFINITION, as a named grid (e.g. G130),
ERROR:    grid specification string, or the path to a gridded data file
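
If you end up applying several of these masks at once, a small helper like the one below can build the config entry. This is only a sketch: `poly_setting` is a hypothetical helper, and the Bukovsky_<grid>_<mask>.nc naming pattern is my generalization from the single G130 example above.

```python
def poly_setting(grid_name, mask_names):
    """Build a MET-style 'poly = [ ... ];' line for the given masks.
    Assumes files are named Bukovsky_<grid>_<mask>.nc, which generalizes
    the single example file name given in the text."""
    files = [f'"Bukovsky_{grid_name}_{name}.nc"' for name in mask_names]
    return "poly = [ " + ", ".join(files) + " ];"

print(poly_setting("G130", ["CONUS_South"]))
```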

@LoganDawson-NOAA please review and let me know if this is what you're looking for.

Once you confirm, I'll close this issue.

LoganDawson-NOAA · 2022-07-08T16:00:57Z

@JohnHalleyGotway this current format is exactly what we were looking for! The script easily generates the mask files that are needed for different verification grids, and we can confirm that the MET output includes the mask VX_MASK names as expected.

I have one last ask that was an oversight on my part. Will you also include a CONUS region that is the union of CONUS_East, CONUS_West, CONUS_South, and CONUS_Central? Verification over the entire CONUS without any regional breakdowns is all that's required for some model/field combinations. Having a full CONUS region available would prevent us from having to generate stats over the four region groups before aggregating when generating graphics. This CONUS region could potentially be included in the region_group variable, but it would probably complicate/break your plot_data_plane command.

JohnHalleyGotway · 2022-07-09T17:47:22Z

@LoganDawson-NOAA, sure no problem. Adding in CONUS was easy enough. Please note that I updated the contents of the comment above and reposted a new tar file to include the CONUS region. Also note that I modified the gen_bukovsky.sh script to take 2 arguments instead of 1.

Please re-review and let me know.

JohnHalleyGotway · 2022-07-26T21:40:12Z

Marking this issue as completed. I provided a NetCDF containing the Bukovsky regions to NOAA/EMC about 2 weeks ago, and they have been using them with no complaint.

Note that this NetCDF file was NOT added to the MET repository itself but may be stored in the NOAA EVS repository.

JohnHalleyGotway added this to the MET 10.1.0 milestone Oct 12, 2021

JohnHalleyGotway assigned JohnHalleyGotway, j-opatz and LoganDawson-NOAA Oct 12, 2021

TaraJensen removed the alert: NEED ACCOUNT KEY and alert: NEED CYCLE ASSIGNMENT labels Oct 12, 2021

j-opatz changed the title ~~Add Bukovsky regions to be used as masking regions to the MET repository.~~ Investigating Bukovsky regions to be used as masking regions Oct 21, 2021

j-opatz added the required: FOR OFFICIAL RELEASE label Mar 7, 2022

JohnHalleyGotway changed the title ~~Investigating Bukovsky regions to be used as masking regions~~ Add Bukovsky masking region definitions to the MET repository. Mar 8, 2022

JohnHalleyGotway modified the milestones: MET 10.1.0, Consider for Next Release Mar 11, 2022

JohnHalleyGotway moved this from To Do to In Progress in MET 11.0.0-beta1 (6/22/22) Apr 28, 2022

TaraJensen added the reporting: DTC NOAA R2O label Apr 28, 2022

JohnHalleyGotway added priority: high High Priority and removed priority: high labels May 9, 2022

JohnHalleyGotway removed this from MET 11.0.0-beta1 (6/22/22) Jun 13, 2022

JohnHalleyGotway added this to MET-11.0.0-beta2 (8/9/22) Jun 13, 2022

JohnHalleyGotway moved this to To Do in MET-11.0.0-beta2 (8/9/22) Jun 13, 2022

JohnHalleyGotway added the required: FOR DEVELOPMENT RELEASE label and removed the required: FOR OFFICIAL RELEASE label Jun 13, 2022

JohnHalleyGotway added this to MET 11.0.0-beta1 (6/22/22) Jun 22, 2022

JohnHalleyGotway moved this to To Do in MET 11.0.0-beta1 (6/22/22) Jun 22, 2022

JohnHalleyGotway removed this from MET 11.0.0-beta1 (6/22/22) Jun 22, 2022

JohnHalleyGotway moved this from To Do to In Progress in MET-11.0.0-beta2 (8/9/22) Jun 27, 2022

JohnHalleyGotway added a commit that referenced this issue Jun 29, 2022

Per #1940, add the NEAREST_VALID interp method. Note though that this…

327afd4

… is only for development purposes and not actually intended to be merged back into the develop branch.

JohnHalleyGotway added a commit that referenced this issue Jun 30, 2022

Per #1940, modified gen_vx_mask to exactly implement the logic needed.

64a578c

JohnHalleyGotway closed this as completed Jul 26, 2022

Repository owner moved this from In Progress to Done in MET-11.0.0-beta2 (8/9/22) Jul 26, 2022
