
Define the Bukovsky masking regions for use in MET. #1940

Closed
8 of 21 tasks
JohnHalleyGotway opened this issue Oct 12, 2021 · 25 comments
Labels
MET: Masking priority: high High Priority reporting: DTC NOAA R2O NOAA Research to Operations DTC Project requestor: NOAA/EMC NOAA Environmental Modeling Center required: FOR DEVELOPMENT RELEASE Required to be completed in the development release for the assigned project type: new feature Make it do something new

Comments

@JohnHalleyGotway
Collaborator

JohnHalleyGotway commented Oct 12, 2021

Describe the New Feature

On 9/21/2021, NOAA/EMC decided to start computing verification statistics using the Bukovsky regions. In particular, there were discussions about new regions being added (a southern region if I'm not mistaken), as well as defining CONUS with the East/West/Central/South Bukovsky definitions. This coincides with EVSv1 and was already approved by the UFS V&V group.

Some of these areas need to be cleaned up, in particular the Eastern region near Maine, which includes part of Canada.

[Image: Bukovsky_regions]

Here's the website: https://www.narccap.ucar.edu/contrib/bukovsky/

The regions are defined on a pretty coarse 1/2 degree grid, which is appropriate for climate simulations. However, applying them to high-resolution regional domains will produce jagged results. Recommend coordinating with Dr. Bukovsky to define these regions as a set of lat/lon points or shapefiles. That would allow them to be applied cleanly to model domains of any resolution.
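To illustrate why polygon definitions help: a region given as lat/lon vertices can be tested against every point of any target grid, so the mask edge is only as jagged as that grid. The following is a minimal sketch using a planar even-odd (ray-casting) test. The box is a made-up example rather than a Bukovsky region, and the sketch is not meant to reproduce MET's own polyline logic.

```python
# Minimal sketch: test grid points against a region defined as lat/lon vertices.
# Assumes a simple planar polygon (straight edges in lat/lon space, no pole or
# dateline crossing). The box below is a made-up example, not a Bukovsky region.


def point_in_polygon(lat, lon, vertices):
    """Even-odd ray-casting test: True if (lat, lon) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for k in range(n):
        lat1, lon1 = vertices[k]
        lat2, lon2 = vertices[(k + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            lon_cross = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            # Count crossings east of the point; an odd count means inside.
            if lon < lon_cross:
                inside = not inside
    return inside


def polygon_mask(lats, lons, vertices):
    """Boolean mask (rows = lats, cols = lons) for a grid of any resolution."""
    return [[point_in_polygon(la, lo, vertices) for lo in lons] for la in lats]


box = [(30.0, -100.0), (40.0, -100.0), (40.0, -90.0), (30.0, -90.0)]
print(point_in_polygon(35.0, -95.0, box))  # True
print(point_in_polygon(45.0, -95.0, box))  # False
```

Because the polygon is resolution-independent, the same vertex list can be evaluated on a 0.5 degree grid or a 3 km grid.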

The tasks for this issue are:

  • Coordinate with Dr. Bukovsky to develop lat/lon or shapefile versions of these regions, if possible.
  • Coordinate with NOAA/EMC to ensure that we include CONUS-only versions of these regions.
  • Include the resulting lat/lon or shapefiles in the MET repository.
  • Coordinate with NOAA/EMC to test and validate their use.

There are 27 individual regions defined, along with 13 region groups. However, those 27 regions do not consider international borders. NOAA/EMC is specifically interested in 4 region groups: West, Central, East, and South, but they want those intersected with CONUS. Ideally, we'd provide definitions for each of the Bukovsky regions, along with their CONUS intersections. This may or may not be possible.

Acceptance Testing

List input data types and sources.
Describe tests required for new functionality.

Time Estimate

Estimate the amount of work required here.
Issues should represent approximately 1 to 3 days of work.

Sub-Issues

Consider breaking the new feature down into sub-issues.

  • Add a checkbox for each sub-issue here.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

2793541

Define the Metadata

Assignee

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Select Repository and/or Organization level Project(s) or add alert: NEED PROJECT ASSIGNMENT label
  • Select Milestone as the next official version or Future Versions

Define Related Issue(s)

Consider the impact to the other METplus components.

New Feature Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s) and Linked issues
    Select: Repository level development cycle Project for the next official release
    Select: Milestone as the next official version
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
@JohnHalleyGotway JohnHalleyGotway added requestor: NOAA/EMC NOAA Environmental Modeling Center type: new feature Make it do something new priority: high alert: NEED ACCOUNT KEY Need to assign an account key to this issue alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle MET: Masking labels Oct 12, 2021
@JohnHalleyGotway JohnHalleyGotway added this to the MET 10.1.0 milestone Oct 12, 2021
@TaraJensen TaraJensen removed alert: NEED ACCOUNT KEY Need to assign an account key to this issue alert: NEED CYCLE ASSIGNMENT Need to assign to a release development cycle labels Oct 12, 2021
@j-opatz j-opatz changed the title Add Bukovsky regions to be used as masking regions to the MET repository. Investigating Bukovsky regions to be used as masking regions Oct 21, 2021
@j-opatz
Contributor

j-opatz commented Nov 4, 2021

When I reached out to Melissa Bukovsky about obtaining finer resolution files, she had some good news: she does, in fact, have 1km resolution files on hand. However, the finer-resolution files still suffer from the ragged steps around bodies of water and land boundaries that were applied at the 0.5 degree resolution.

Looking through some of the files, there do appear to be irregularities around water bodies. This applies to large bodies (oceans, Great Lakes, etc.) as well as small inland ones (lakes, smaller rivers, etc.). I made plot_data_plane images of the three areas flagged as concerns, either by Melissa or by Logan when he originally created the 4 summary regions: the Northeast, the Southeast, and the Great Lakes.
[Images: GreatLakes_check, NorthAtlantic_check, landmask_check]

It's important to note that these files already had landmasking applied, so the 0's off the west coast of FL (for example) are the result of Melissa trimming off the westward shift present in the original 1km source file. So they'd probably work as desired when fed to MET, but the other pictures show there's some cleanup to do, not to mention the possible issue of inland lakes being masked as water.
We do have access to the original 1km files (the ones that suffer from shifts), and we could apply our own landmasking in MET that better fits what EMC needs. We also need to decide whether these regions should be individual files (size-intensive but logically convenient) or one file with each region accessed by a different value (e.g., the Southeast stored as 17), which is size-friendly but complicates access slightly.
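The single-file option can be pictured as collapsing a stack of per-region 0/1 masks into one integer field, then pulling a region back out by its value. A minimal NumPy sketch, assuming regions never overlap and 0 means "no region"; the 2x3 masks are invented for illustration.

```python
import numpy as np


def collapse_masks(masks):
    """Collapse a (region, lat, lon) stack of 0/1 masks into one integer field.

    Region k (0-based position in the stack) is stored as value k + 1;
    grid points in no region keep the value 0.
    """
    masks = np.asarray(masks, dtype=bool)
    # The integer layout only works if each point belongs to at most one region.
    if (masks.sum(axis=0) > 1).any():
        raise ValueError("overlapping regions: a grid point has more than one mask")
    field = np.zeros(masks.shape[1:], dtype=np.int16)
    for k, mask in enumerate(masks):
        field[mask] = k + 1
    return field


def extract_region(field, value):
    """Recover a single 0/1 region mask from the integer field."""
    return (field == value).astype(np.uint8)


stack = [
    [[1, 1, 0], [0, 0, 0]],  # hypothetical region 1
    [[0, 0, 1], [1, 0, 0]],  # hypothetical region 2
]
field = collapse_masks(stack)
print(field.tolist())  # [[1, 1, 2], [2, 0, 0]]
print(extract_region(field, 2).tolist())  # [[0, 0, 1], [1, 0, 0]]
```

The trade-off is the one described above: one small field instead of many large ones, at the cost of an extra step (a value comparison) to get each region back.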

@j-opatz
Contributor

j-opatz commented Feb 10, 2022

Feedback from Logan:

We think it will be critical to have the Bukovsky regions available in the next release, as it sounds like METplus-4.1.0 and MET-10.1.0 will be the versions we will have installed into ops for EVS development (and potentially implementation). If it's not feasible to get the regions into the next coordinated METplus release, we would probably store them separately in our 'fix' directory within EVS.
Looking back at the latest status of the GitHub issue, I think we can likely live with some of the coastline shifts and lake masking that exists. That's less problematic than the grid cells with overlapping regions or the like.

After some feedback from the team, it sounds like these region files are best stored in the MET repository, along with the other region files.

There were two outstanding questions at the last time this issue was discussed:

  • Can we use the landmasked files we currently have?

  • How should these regions be stored/accessed in MET?

From the feedback, it sounds like the answer to Q1 is yes, the 1 km finer resolution files that Melissa provided will do the trick.
But the answer to Q2 is still unknown. I think the answer is in the size of these files: for reference, the size of each of the 1 km files is 302MB. I'm guessing that EMC will want access to all of the regions to create their own regions, so 5.4GB needs to be stored. Given that size requirement, I'd suggest we store all of these masks in one file, possibly complicating access for some, but keeping size constraints (hopefully) more reasonable.

@LoganDawson-NOAA

@j-opatz given the size requirements, let's go with storing the masks in one file. As you mention, that will change the way the masks are accessed, but I think that may be okay for us at EMC.

Based on some of our recent EMC verification team discussions, I believe we will need to do some preprocessing and create separate mask files that are defined on the various grids we use for verification. This will allow us to save compute resources in operations. Therefore, the way the Bukovsky region masks are stored in the MET repository is less of a concern for us.

Hopefully that explanation makes sense. Please let me know if it doesn't.

@j-opatz j-opatz added the required: FOR OFFICIAL RELEASE Required to be completed in the official release for the assigned milestone label Mar 7, 2022
@j-opatz
Copy link
Contributor

j-opatz commented Mar 7, 2022

From the METplus telecon today (3/7/22), this issue was elevated to a requirement for the official release. It was also affirmed that we'll store all of the regions in a single file, which means MET will access them list-style (one value per region).

When we get to the testing/acceptability phase, we need to loop in Logan (@LoganDawson-NOAA) and Mallory (@malloryprow).

@JohnHalleyGotway JohnHalleyGotway changed the title Investigating Bukovsky regions to be used as masking regions Add Bukovsky masking region definitions to the MET repository. Mar 8, 2022
@JohnHalleyGotway JohnHalleyGotway changed the title Add Bukovsky masking region definitions to the MET repository. Define the Bukovsky masking regions for use in MET Mar 15, 2022
@JohnHalleyGotway JohnHalleyGotway changed the title Define the Bukovsky masking regions for use in MET Define the Bukovsky masking regions for use in MET. Mar 15, 2022
@georgemccabe
Collaborator

georgemccabe commented Mar 21, 2022

A tar file with the 1km masking regions is available in a temporary location on the web here: https://dtcenter.ucar.edu/dfiles/code/METplus/METplus_Data/Bukovsky_1km.tgz
Per discussion in the METplus NOAA Telecon today, it is not urgent to make these files available on WCOSS and more investigation into these data may be needed to determine a suitable access point for them.

@j-opatz
Contributor

j-opatz commented Mar 25, 2022

I've created an ensemble NetCDF file of the regions, located on Seneca under /d1/personal/jopatz/workbench/Bukovsky/Bukovsky_regions.nc.

@JohnHalleyGotway take a look and see if this format is acceptable for MET. If not, I can tweak whatever we need.

@JohnHalleyGotway
Collaborator Author

JohnHalleyGotway commented Apr 21, 2022

@j-opatz finally taking a look at this. I see that this data is stored as a 3-dimensional array of doubles:
double landmask(region, lat, lon);
That yields a whopping 11GB file.
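The size follows directly from the array shape: every region adds one 6059 x 13050 layer, and each double costs 8 bytes. A back-of-the-envelope sketch (uncompressed, ignoring NetCDF headers) comparing element types:

```python
import numpy as np

# Grid dimensions quoted in this thread for the 1km Bukovsky files.
N_LAT, N_LON = 6059, 13050


def var_bytes(n_region, dtype):
    """Uncompressed bytes for n_region layers of N_LAT x N_LON values."""
    return n_region * N_LAT * N_LON * np.dtype(dtype).itemsize


for dtype in ("float64", "int16", "int8"):
    print(f"{dtype:>7}: {var_bytes(1, dtype) / 1e6:8.1f} MB per region layer")
```

Going from double to a 1-byte type cuts the size by a factor of 8, and a single 2D integer field removes the region dimension altogether.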
If we keep it this way, we should switch from double to something smaller (boolean or integer). Another alternative would be storing a single field of integers, as is done in this file:
https://github.com/dtcenter/MET/blob/main_v10.1/met/data/tc_data/basin_global_tenth_degree.nc

The basin and rsmc variables are floats (though they should be integers):

	float basin(lat, lon) ;
	float rsmc(lat, lon) ;

And basin_name(nbasin, mxstr) and rsmc_name(nrmsc, mxstr) define how to interpret those basin and rsmc values.

The upside of doing it this way would be a much smaller file. And we know that each grid point belongs to 1, and only 1, masking region. The downside is that we'd need code changes in MET to actually read those masking region names and write them to the VX_MASK column of the output.
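The lookup itself is simple once the names are available: each integer value indexes a name, which is what would end up in the VX_MASK column. A minimal NumPy sketch; the region names, the "value == list index" convention, and 0 meaning "outside every region" are assumptions for illustration, not the layout of the MET basin file.

```python
import numpy as np


def region_name_lookup(field, names):
    """Return {name: number of grid points} for each nonzero region value."""
    values, counts = np.unique(np.asarray(field), return_counts=True)
    return {names[v]: int(c) for v, c in zip(values, counts) if v != 0}


def mask_for_name(field, names, name):
    """0/1 mask for one named region, as a tool writing VX_MASK = name would need."""
    return (np.asarray(field) == names.index(name)).astype(np.uint8)


names = ["NONE", "RegionA", "RegionB"]  # hypothetical; index == field value
field = np.array([[1, 1, 2], [2, 0, 0]], dtype=np.int16)
print(region_name_lookup(field, names))  # {'RegionA': 2, 'RegionB': 2}
print(mask_for_name(field, names, "RegionB").tolist())  # [[0, 0, 1], [1, 0, 0]]
```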

@JohnHalleyGotway
Collaborator Author

JohnHalleyGotway commented Apr 21, 2022

@j-opatz and @LoganDawson-NOAA, I have a question about the dimensions.

  • Longitude:
    • from -159.75 to -29.75 = 130 degrees
    • 13050 longitudes
    • delta lon = 0.009962 degrees
  • Latitude:
    • from 15.25 to 75.25 = 60 degrees
    • 6059 latitudes
    • delta lat = 0.009904 degrees

This seems awfully suspicious and a total missed opportunity for simplicity. Shouldn't we have made it a 6001 x 13001 grid with delta lat = delta lon = 0.01 degree spacing?

Any idea if this result was on purpose or accidental?

I could try using "regrid_data_plane" to shift it over to a simpler 0.01 degree grid but would like to make sure you agree on that approach.
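The deltas above, and the dimensions of the proposed 0.01 degree grid, follow from simple arithmetic on the endpoints. The sketch also builds a grid specification string, assuming the "latlon Nx Ny lat_ll lon_ll delta_lat delta_lon" format (longitudes in degrees east) that regrid_data_plane accepts. Please verify that format against your MET version.

```python
def spacing(start, end, n):
    """Spacing of n evenly spaced points from start to end, inclusive."""
    return (end - start) / (n - 1)


def n_points(start, end, delta):
    """Number of points needed to span start..end inclusive at spacing delta."""
    return round((end - start) / delta) + 1


def latlon_spec(lat_ll, lon_ll, lat_ur, lon_ur, delta):
    """Grid spec string, assuming MET's 'latlon Nx Ny lat_ll lon_ll dlat dlon' form."""
    nx = n_points(lon_ll, lon_ur, delta)
    ny = n_points(lat_ll, lat_ur, delta)
    return f"latlon {nx} {ny} {lat_ll} {lon_ll} {delta} {delta}"


print(spacing(-159.75, -29.75, 13050))  # ~0.009962 (current longitudes)
print(spacing(15.25, 75.25, 6059))      # ~0.009904 (current latitudes)
print(latlon_spec(15.25, -159.75, 75.25, -29.75, 0.01))
# latlon 13001 6001 15.25 -159.75 0.01 0.01
```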

@JohnHalleyGotway
Collaborator Author

@j-opatz and @LoganDawson-NOAA, here's some further developments.

  • MET does NOT like this NetCDF format because that lat/lon spacing does NOT remain constant. Reading it into a MET tool produces this error message:
ERROR  : get_data_from_lat_lon_vars() -> MET can only process Latitude/Longitude files where the longitudes are evenly spaced (dlon=0.00999451, delta[14]=0.00900269)
  • The non-constant lat/lon deltas suggest that it is NOT actually defined on a true lat/lon grid. Here are histograms of the deltas: very close to 0.01, but not quite, and not constant as they would be on a true lat/lon grid.

[Images: histograms of the latitude and longitude deltas]

  • I propose the following approach...
    • Collapse the multiple masks down to a single field of integers which are the indices of the Bukovsky regions.
    • Use python embedding with MET-10.1.0 to pass those indices as POINT DATA into the point2grid tool.
    • Configure point2grid to interpolate the data to an actual 0.01 degree lat/lon grid.

Once the MET tools are happy with this file, I'll turn to how we can most easily use it to define masks in MET.

I'll plan to proceed with this approach unless I hear direction otherwise.
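The evenly-spaced requirement behind the MET error above can be checked before a file is handed to MET. A minimal NumPy sketch that reports the first delta deviating from the mean spacing; the relative tolerance is an assumption, not MET's internal threshold, and this is not MET code.

```python
import numpy as np


def check_even_spacing(coords, rel_tol=1e-4):
    """Return the mean spacing, or raise ValueError if any delta deviates from it."""
    coords = np.asarray(coords, dtype=np.float64)
    deltas = np.diff(coords)
    dmean = (coords[-1] - coords[0]) / (coords.size - 1)
    # Indices of deltas that differ from the mean by more than the tolerance.
    bad = np.flatnonzero(np.abs(deltas - dmean) > rel_tol * abs(dmean))
    if bad.size:
        i = int(bad[0])
        raise ValueError(f"not evenly spaced (d={dmean:.8g}, delta[{i}]={deltas[i]:.8g})")
    return float(dmean)


print(check_even_spacing(np.linspace(0.0, 1.0, 11)))  # ~0.1
```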

@LoganDawson-NOAA

@JohnHalleyGotway thanks for doing all this digging into the issue!

I haven't used the high-resolution file myself at all, so I wasn't aware of these issues with its lat/lon grid definition. Your proposed solution sounds reasonable to me!

@TaraJensen TaraJensen added the reporting: DTC NOAA R2O NOAA Research to Operations DTC Project label Apr 28, 2022
@JohnHalleyGotway
Collaborator Author

Documenting feedback we received from Melissa via email on this issue (see below). Her explanation makes the data much easier to work with: there is no need to process them as point data. I plan to use python embedding to manually define the grid, accounting for the shift that Melissa describes, and then regrid it to a 0.01 degree grid.


From Melissa:

Sorry these are causing problems for you. I've gone back and looked at what was done. Something just a bit odd seems to happen when the lat/lon arrays are attached to the netcdf file that I can't explain, though the arrays are still not exactly uniform before being written to the file either. The lat/lon arrays are produced via:

lats = fspan(15.25,75.25,6059)
lons = fspan(200.25,330.25,13050)

This gives a fairly uniform spacing of 0.00990 followed by additional inconsistent decimal places. The inconsistent part is an artifact from the change to the higher resolution up until this point. When added to the files, the lat/lon values are rounded or truncated adding to the inconsistency.

You can regrid these to a uniform grid. I see no problems with that - I've regridded the 0.5 deg resolution version to a variety of different model grids over the years. Or, you can try the shapefiles (the regions were regridded onto a uniform lat/lon WRF grid to facilitate the creation of the shapefiles).

I'm going to re-emphasize this caveat though...

All of the regions need to be shifted East by 0.25 or 0.50 degrees in the shapefiles (not sure which one, as I haven't used these yet, but I'm pretty confident this is related to the original resolution of the regions). The southeast region is the best example of this... there's a png of the shapefile plotted on the region in the shapefile folder (and attached), and the region mask should be better centered over FL, not offset to the left. When you take a look at these, keep that in mind -- it's an easy additive fix. And, I think it applies to the netcdf files too, not just the shapefiles.
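Melissa's description can be reproduced in a few lines to see where the uneven deltas come from and how the eastward shift would be applied. A minimal NumPy sketch: np.linspace stands in for NCL's fspan, casting to float32 stands in (as an assumption) for the rounding/truncation on write, and shift_east is a hypothetical helper for the additive longitude fix. The shift changes only the coordinate values; the mask values are untouched.

```python
import numpy as np

# NCL's fspan(start, end, n) returns n evenly spaced values, like np.linspace.
lats = np.linspace(15.25, 75.25, 6059)
lons = np.linspace(200.25, 330.25, 13050)

# 200.25..330.25 degrees east (0..360 convention) is -159.75..-29.75 (-180..180).
lons_180 = np.where(lons > 180.0, lons - 360.0, lons)

# Assumption: reduced-precision storage (float32 here) mimics the rounding on
# write described in the email; it makes the deltas non-uniform.
lat_deltas32 = np.diff(lats.astype(np.float32))
print(lat_deltas32.min(), lat_deltas32.max())


def shift_east(lon_deg, offset_deg):
    """Shift longitudes east by offset_deg and wrap to [-180, 180)."""
    shifted = np.asarray(lon_deg, dtype=np.float64) + offset_deg
    return (shifted + 180.0) % 360.0 - 180.0


# The offset (0.25 or 0.50 degrees) is still undecided, so try both.
for offset in (0.25, 0.50):
    print(offset, shift_east(lons_180[[0, -1]], offset).tolist())
```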

@JohnHalleyGotway
Collaborator Author

Feedback from @LoganDawson-NOAA on 5/2/2022. In general we should INCLUDE inland lakes inside the Bukovsky masking regions. Many model runs would not resolve those lakes anyway so they should be included.

@JohnHalleyGotway JohnHalleyGotway added priority: high High Priority and removed priority: high labels May 9, 2022
@JohnHalleyGotway
Collaborator Author

Based on discussion on 5/11/22, recommend the following order of operations:

  1. Start with landmasked NetCDF files as input.
  2. Process each to apply a CONUS mask (either CONUS.poly or a CONUS shapefile). Remove "internal" holes for water.
  3. After 2, run gen_vx_mask to stitch together these regions into 1 field.
  4. Run regrid_data_plane with python embedding to regrid to a 0.01 degree lat/lon grid that includes CONUS but not much else.
  5. Run gen_vx_mask to translate the 17 CONUS sub-regions into 4 larger groups.
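Step 5 amounts to a lookup table from sub-region value to group value. A minimal NumPy sketch; the region-to-group membership below is hypothetical (the real assignment of the CONUS sub-regions to the four groups has to come from EMC), and the group numbering is an assumption.

```python
import numpy as np

# Hypothetical group numbering; 0 means "no group".
GROUP_NAMES = ["NONE", "CONUS_West", "CONUS_Central", "CONUS_East", "CONUS_South"]


def regions_to_groups(region_field, region_to_group, n_regions):
    """Translate a field of region values (0..n_regions) into group values."""
    lut = np.zeros(n_regions + 1, dtype=np.int16)  # index 0 stays "no group"
    for region, group in region_to_group.items():
        lut[region] = group
    # Fancy indexing applies the table to every grid point at once.
    return lut[np.asarray(region_field)]


hypothetical = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4}
field = np.array([[1, 2, 3], [4, 5, 0]])
print(regions_to_groups(field, hypothetical, n_regions=5).tolist())
# [[1, 1, 2], [3, 4, 0]]
```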

@JohnHalleyGotway
Collaborator Author

Discussed at 6/13/2022 METplus NOAA telecon. Target to have this completed and mature by the end of July 2022. Recommend providing an initial version no later than July 1st.

@JohnHalleyGotway JohnHalleyGotway added required: FOR DEVELOPMENT RELEASE Required to be completed in the development release for the assigned project and removed required: FOR OFFICIAL RELEASE Required to be completed in the official release for the assigned milestone labels Jun 13, 2022
@JohnHalleyGotway
Collaborator Author

JohnHalleyGotway commented Jun 29, 2022

@LoganDawson-NOAA I'm looking for some direction on next steps.

The question at hand is whether we want to use the 50m or 110m resolution of the Natural Earth shapefiles.

Let me give you some background to illustrate.
So far we've:

  • Consolidated the Bukovsky regions into 1 field defined at 0.01 degree resolution, where the value at each grid point identifies the Bukovsky region containing that point.
  • Filled in the inland lakes and waterways by defining the value of each water grid point as the nearest valid mask value.
  • Applied the Natural Earth shapefiles to exclude any grid point outside of CONUS. And that's the source of this question.

The top image shows the 110m resolution while the bottom image shows the 50m resolution:
[Image: CONUS masks bounded by the 110m (top) and 50m (bottom) Natural Earth shapefiles]

You can see that the 110m version exactly corresponds to MET's map data because that's how we defined the map data.
Note some key diffs:

  • The 50m version shows much greater detail in Puget Sound and along the west coast, and it includes the Florida Keys and a detailed Chesapeake Bay. The 110m version does not.

While these are defined on a 0.01 degree grid, you'll actually run regrid_data_plane to regrid each mask to the grid being evaluated. Do you want to regrid FROM the coarser 110m version of the masks or from the higher resolution 50m version?

FYI, once we decide on the desired CONUS resolution, the next steps are:

  • Check for and fill in any NA values along the coast with the "correct" mask value.
  • Decide which mask to use for northern MN. 13 or 14? Or we could "split the difference" and replace each NA value there with the nearest non-NA mask value. The latter is actually the easiest solution.
  • Apply the 50m or 110m definitions of the Great Lakes to exclude them from the masks.
  • Update the metadata of the NetCDF file to include the masking region names corresponding to each mask value.
  • Provide a shell or python script example with calls to gen_vx_mask to apply these to an NCEP grid.
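The "nearest non-NA mask value" fill (used for the inland lakes and proposed for northern MN) can be sketched as a multi-source breadth-first search that grows every valid region outward into the missing points. This is a simplification and may not match how the fill was actually done here: distance is counted in 4-neighbor grid steps rather than great-circle distance, ties go to whichever region reaches a point first, and -1 is assumed to mark NA.

```python
from collections import deque

import numpy as np


def fill_nearest(field, missing=-1):
    """Replace each `missing` value with the value of the nearest valid point."""
    field = np.array(field, copy=True)
    ny, nx = field.shape
    # Seed the search with every valid (non-missing) grid point.
    queue = deque(zip(*np.nonzero(field != missing)))
    while queue:
        j, i = queue.popleft()
        for dj, di in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            jj, ii = j + dj, i + di
            if 0 <= jj < ny and 0 <= ii < nx and field[jj, ii] == missing:
                # First visit wins, so each point takes the closest region value.
                field[jj, ii] = field[j, i]
                queue.append((jj, ii))
    return field


print(fill_nearest(np.array([[1, -1, -1, 2]])).tolist())  # [[1, 1, 2, 2]]
```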

JohnHalleyGotway added a commit that referenced this issue Jun 29, 2022
… is only for development purposes and not actually intended to be merged back into the develop branch.
@JohnHalleyGotway
Collaborator Author

JohnHalleyGotway commented Jul 5, 2022

Here's a tarfile with the two versions to consider where the boundaries are defined by the 110m vs 50m natural earth shapefiles.
Bukovsky_CONUS.tar.gz

The remaining tasks are:

  • Decide on the desired resolution.
  • Add a 2D gridded variable to define the region groups.
  • Add a 1D variable to define the region group names.
  • Provide a script to easily apply these masks to a specific grid.

Another alternative is enhancing the logic of the MET library code to handle these composite masks and make it easy for the user to select which region should be extracted.
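For the 1D name variables, one option is the fixed-width character array layout used by basin_name(nbasin, mxstr) in basin_global_tenth_degree.nc. A minimal NumPy sketch of packing and unpacking names that way; mxstr = 16 and NUL padding are assumptions, and netCDF4.stringtochar / netCDF4.chartostring do the same job if the netCDF4 Python library is available.

```python
import numpy as np


def names_to_chars(names, mxstr):
    """Pack names into an (n, mxstr) array of single bytes, NUL-padded."""
    if any(len(n) > mxstr for n in names):
        raise ValueError("a name is longer than mxstr")
    fixed = np.array([n.encode("ascii") for n in names], dtype=f"S{mxstr}")
    # Reinterpret each fixed-width string as mxstr one-byte characters.
    return fixed.view("S1").reshape(len(names), mxstr)


def chars_to_names(chars):
    """Unpack an (n, mxstr) single-byte array back into Python strings."""
    return [b"".join(row).decode("ascii") for row in np.asarray(chars, dtype="S1")]


chars = names_to_chars(["CONUS_East", "CONUS_South"], 16)
print(chars.shape)  # (2, 16)
print(chars_to_names(chars))  # ['CONUS_East', 'CONUS_South']
```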

@LoganDawson-NOAA

@JohnHalleyGotway this looks fantastic! I brought up the resolution question during our EVS meeting this morning, and we'd like to move forward with the 50 m resolution version.

Providing a simple script to interpolate these high-resolution masks to a specific grid would be preferable to enhancing the MET logic (at least in the short term). Since masking regions defined on each different verification grid can be treated as fixed files, it will be most efficient for us to do all of that regridding during our development phase before EVS code delivery as opposed to doing the regridding with each run of a MET command.

The final addition I see that's needed is defining the four aggregated regions that are shown in the map in the first comment on this issue. CONUS_East, CONUS_West, CONUS_South, and CONUS_Central might be good names to use for those. Do you need confirmation of which subregions belong to each regional aggregate?

@JohnHalleyGotway
Copy link
Collaborator Author

JohnHalleyGotway commented Jul 6, 2022

@LoganDawson-NOAA I believe this work is complete. Please see these 3 images showing the basic Bukovsky regions, the Bukovsky region groups, and the full CONUS region:

[Image: the basic Bukovsky regions, the Bukovsky region groups, and the full CONUS region]

You can find the corresponding data in this tar file:
Bukovsky_CONUS.tar.gz

It contains:

  • Bukovsky_CONUS_regions_50m.nc: NetCDF file containing "region", "region_group", and "CONUS" variables, along with "region_name" and "region_group_name" to define the names for each value.
  • regions.ps, groups.ps, and conus.ps: Plots shown above
  • masks.ctable: Colortable I used for these plots.
  • dev_notes.txt: Some notes on the commands I ran to generate this. Note that I ran a modified version of gen_vx_mask from the feature_1940_bukovsky branch, but these changes should NOT be merged back into the develop branch.
  • gen_bukovsky.sh: A shell script to apply these masks to an NCEP grid. For example:
gen_bukovsky.sh G130 G130

This generates 22 output files, one for each of the basic regions, the region groups, and the full CONUS region. For example, Bukovsky_G130_CONUS_South.nc contains a variable named CONUS_South. In MET config files, all you need to do is set poly = [ "Bukovsky_G130_CONUS_South.nc" ];, which applies that mask and writes output with the VX_MASK column set to CONUS_South.

Note that it has been updated to require exactly 2 arguments (in case you want to define the target grid as something other than a named grid):

ERROR: ./gen_bukovsky.sh -> exactly 2 arguments are required!
ERROR: 1. Target grid NAME to define output file names
ERROR: 2. Target grid DEFINITION, as a named grid (e.g. G130),
ERROR:    grid specification string, or the path to a gridded data file

@LoganDawson-NOAA please review and let me know if this is what you're looking for.

Once you confirm, I'll close this issue.

@LoganDawson-NOAA

@JohnHalleyGotway this current format is exactly what we were looking for! The script easily generates the mask files needed for different verification grids, and we can confirm that the MET output includes the VX_MASK names as expected.

I have one last ask that was an oversight on my part. Will you also include a CONUS region that is the union of CONUS_East, CONUS_West, CONUS_South, and CONUS_Central? Verification over the entire CONUS without any regional breakdowns is all that's required for some model/field combinations. Having a full CONUS region available would prevent us from having to generate stats over the four region groups before aggregating when generating graphics. This CONUS region could potentially be included in the region_group variable, but it would probably complicate/break your plot_data_plane command.
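The requested CONUS region is just the union of the four group masks. A minimal NumPy sketch, assuming a region_group field whose values 1 to 4 are CONUS_West, CONUS_Central, CONUS_East, and CONUS_South (hypothetical numbering) and 0 elsewhere:

```python
import numpy as np


def conus_mask(region_group, group_values=(1, 2, 3, 4)):
    """1 where the point belongs to any of the four region groups, else 0."""
    return np.isin(region_group, group_values).astype(np.uint8)


region_group = np.array([[0, 1, 2], [3, 4, 0]])
print(conus_mask(region_group).tolist())  # [[0, 1, 1], [1, 1, 0]]
```

Keeping CONUS as its own variable, rather than as another value in region_group, avoids a point needing two values at once.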

@JohnHalleyGotway
Collaborator Author

@LoganDawson-NOAA, sure no problem. Adding in CONUS was easy enough. Please note that I updated the contents of the comment above and reposted a new tar file to include the CONUS region. Also note that I modified the gen_bukovsky.sh script to take 2 arguments instead of 1.

Please re-review and let me know.

@JohnHalleyGotway
Collaborator Author

Marking this issue as completed. I provided a NetCDF containing the Bukovsky regions to NOAA/EMC about 2 weeks ago, and they have been using them with no complaint.

Note that this NetCDF file was NOT added to the MET repository itself but may be stored in the NOAA EVS repository.
