Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle expected no input for ascii2nc and point2grid #1630

Closed
17 of 20 tasks
georgemccabe opened this issue Jan 22, 2021 · 9 comments · Fixed by #1637 or #1660
Closed
17 of 20 tasks

Handle expected no input for ascii2nc and point2grid #1630

georgemccabe opened this issue Jan 22, 2021 · 9 comments · Fixed by #1637 or #1660
Assignees
Labels
reporting: DTC NOAA R2O NOAA Research to Operations DTC Project requestor: NOAA/EMC NOAA Environmental Modeling Center type: enhancement Improve something that it is currently doing
Milestone

Comments

@georgemccabe
Copy link
Collaborator

georgemccabe commented Jan 22, 2021

Based on discussion in this MET Help ticket: https://rt.rap.ucar.edu/rt/Ticket/Display.html?id=98329

Describe the Enhancement

Logan (NOAA) would like to process data with ascii2nc that occasionally has no obs data, which is expected. When this is the case, we would like ascii2nc to write out a file with no observations in it. This could be a configurable setting in case other users prefer the tool to skip writing a file if no data is present.

The data is then passed into point2grid. It is unclear how point2grid would react to an empty input file. We should also investigate what is needed to handle this situation in that application as well.

Time Estimate

Estimate the amount of work required here.
1 day.

Sub-Issues

No sub-issues required.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

2791541

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required: John HG
  • Select scientist(s) or no scientist required: Logan

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Review projects and select relevant Repository and Organization ones or add "alert:NEED PROJECT ASSIGNMENT" label
  • Select milestone to next major version milestone or "Future Versions"

Define Related Issue(s)

Consider the impact to the other METplus components.

Enhancement Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding Source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s), Project(s), Milestone, and Linked issues
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
@georgemccabe georgemccabe added type: enhancement Improve something that it is currently doing component: application code requestor: NOAA/EMC NOAA Environmental Modeling Center alert: NEED MORE DEFINITION Not yet actionable, additional definition required alert: NEED ACCOUNT KEY Need to assign an account key to this issue labels Jan 22, 2021
@georgemccabe georgemccabe added this to the MET 10.0.0 milestone Jan 22, 2021
@georgemccabe georgemccabe added this to To do in MET-10.0.0-beta3 (1/27/21) via automation Jan 22, 2021
JohnHalleyGotway added a commit that referenced this issue Jan 25, 2021
…(which returns bad status) to a warning message.
JohnHalleyGotway added a commit that referenced this issue Jan 25, 2021
…elds of 0's or bad data to the output. Change previous error message to warning. Also, update LOTS of warning and error log messages to make them consistent.
@JohnHalleyGotway
Copy link
Collaborator

JohnHalleyGotway commented Jan 25, 2021

As of met-9.1, zero observations causes ascii2nc to error out:

ERROR  : Zero observations retained.
ERROR  : Cannot create NetCDF Observation file: sample_ascii_obs_empty.nc

Updated ascii2nc to instead print a warning message and create an empty output file:

WARNING: 
WARNING: Zero observations retained!
WARNING: 
DEBUG 2: Processing observations for 0 headers.
DEBUG 1: Creating NetCDF Observation file: sample_ascii_obs_empty.nc
DEBUG 2: Finished processing 0 observations for 0 headers.

As of met-9.1, zero observations causes point2grid to error out:

WARNING: get_grid_mapping(MET_obs)  no valid point observation data
ERROR  : process_point_file() ->  No valid data for the variable [APCP]

Updated point2grid to instead print a warning message and create an output file with default values:

WARNING: 
WARNING: get_grid_mapping(MET_obs) -> no valid point observation data!
WARNING: 
WARNING: 
WARNING: process_point_file() -> No valid data for the variable [APCP]
WARNING: 
DEBUG 1: Writing output file: sample_point2grid.nc

@JohnHalleyGotway JohnHalleyGotway moved this from To do to In progress in MET-10.0.0-beta3 (1/27/21) Jan 25, 2021
@JohnHalleyGotway JohnHalleyGotway linked a pull request Jan 25, 2021 that will close this issue
10 tasks
@JohnHalleyGotway JohnHalleyGotway removed the alert: NEED MORE DEFINITION Not yet actionable, additional definition required label Jan 25, 2021
JohnHalleyGotway added a commit that referenced this issue Jan 25, 2021
…n there are no obs) and within each loop iteration (for when there are multiple fields to process).
JohnHalleyGotway added a commit that referenced this issue Jan 26, 2021
* Per #1630, update ascii2nc to change zero observations from an error (which returns bad status) to a warning message.

* Per #1630, update point2grid to read an empty input file and write fields of 0's or bad data to the output. Change previous error message to warning. Also, update LOTS of warning and error log messages to make them consistent.

* Per #1630, need to initialize the dataplanes before the loop (for when there are no obs) and within each loop iteration (for when there are multiple fields to process).
@JohnHalleyGotway JohnHalleyGotway moved this from In progress to Done in MET-10.0.0-beta3 (1/27/21) Jan 26, 2021
@JohnHalleyGotway
Copy link
Collaborator

This is fixed for the met-10.0.0 beta3 release. @LoganDawson-NOAA once that release is available, please test to confirm that ascii2nc and point2grid handle zero observations as you need and expect. If you find any issues, we can just re-open this issue.

JohnHalleyGotway added a commit that referenced this issue Jan 26, 2021
* Getting rid of compiler warnings in PB2NC by replacing several instances of the NULL pointer with the nul character (\0) instead.

* Fix typo in config_options.rst.

* Feature 1408 var_name_for_grib_code (#1617)

* #1408 Added get_var_id

* #1408 Check variable name in the configuration to use the variable name instewad of grib code

* #1408 Added point2grid_ascii2nc_surfrad_DW_PSP_by_name

* Feature 1580 2d time (#1616)

* #1580 Added get_grid_from_lat_lon_vars

* #1580 Added get_grid_from_lat_lon_vars and support 2D time variable

* #1580 Support int type variable without scale_factor and add_offset attributes

* #1580 Support 2D time variable. Implemented filtering by valid_time

* #1580 Bug fix: read time with dimension 0

* #1580 Support time variable with no dimension

* #1580 Initial release

* #1580 Added point2grid_2D_time

* #1580 Check project attribute for GOES

* #1580 Changed NULL to 0 to avoid co,pilation warning

* #1580 Added point2grid_2D_time

* #1580 Added "point2grid configuration file" section

* #1580 Changed to_grid for point2grid_NCCF_UK & point2grid_2D_time

Co-authored-by: Howard Soh <hsoh@kiowa.rap.ucar.edu>
Co-authored-by: John Halley Gotway <johnhg@kiowa.rap.ucar.edu>

* feature 1580 nccf (#1619)

* #1580 Correct the precision at _apply_scale_factor

* #1580 Added unit test plot_data_plane_NCCF_time

* #1580 Changed argument type to double at _apply_scale_factor(double)

* Bugfix 1618 develop pb2nc (#1623)

Co-authored-by: Howard Soh <hsoh@kiowa.rap.ucar.edu>

* Feature 1624 OBS_COMMAND (#1625)

* Per #1627, add grid_data.regrid config option for PlotPointObs and update the tool to do the requested regridding. Still need to update the docs.

* Per #1627, update docs about grid_data.regrid config option for PlotPointObs.

* Per #1627, add another call to plot_point_obs to exercise the new regrid functionality.

* Feature 1624 obs_command second try (#1629)

* Per #1624, define OBS_COMMAND.

* Per #1624, unset the test-specific environment variables after completing the run.

* Per #1624, after PR #1625 merged these changes into develop, they caused 2 unexpected diffs in the NB output. These were caused by enviornment variables being unset after each test. Updating unit_netcdf.xml and unit_point2grid.xml to define more test-specific environment variables to reproduce previous NB output.

* Organizing NB climatology and point2grid output files into the appopriate directories rather than having them at the top-level directory.

* Update pull_request_template.md

* Update the point2grid unit tests to write their temp files to the point2grid subdirectory instead of the top-level test output directory.

* Update appendixC.rst

Split the definition of H_RATE and POD

* Feature 1626 tc_gen (#1633)

* Per #1448, many changes for TC-Gen. Replace the oper_genesis dictionary with the oper_technique string. Add genesis_init_diff config entry. Update config_constants.h accordingly and the tc_gen_conf_info.h/.cc to parse the updated config entries.

* Per #1448, large overhaul of the tc_gen matching logic. This work is not yet complete. Still need to compute categorical MISSES but the current version does compile.

* Per #1448, add GenesisInfoArray::has_storm_id() function and remove the unused set_dland() function.

* Per #1448, more updates. Define the best genesis events while parsing the best tracks. We need to know the best genesis events in order to count up the forecast misses.

* Per #1448, lots more changes for tc_gen. Create a PairDataGenesis class to store genesis pairs. This will be needed to write a matched pair line type.

* Per #1448, minor tweaks to log messages.

* Per #1448, update PairDataGenesis class to store the BEST track Storm ID since the forecast genesis do not have meaningful Storm ID's.

* Per #1448, in GenesisInfoArray::add(), do NOT store multiple genesis events for the same storm, but do print a useful Debug(3) log message about it.

* Per #1448, update PairDataGenesis::has_case() logic to check the storm id and initialization time but NOT require an exact forecast hour match.

* Per #1448, update the tc_gen log messages to more concisely and consistently report the storm id.

* Per #1448, update the PairDataGenesis logic a bit to have all the misses and hits in chronological order.

* Per #1448, add genesis_init_diff entry.

* Per #1448, set the default genesis_init_diff entry to 48 hours since that's what Dan H used in his examples.

* Per #1448, work on comments and log messages.

* Per #1448, reimplement TrackInfoArray as a vector instead of managing the memory myself. This makes the implmentation of TrackInfoArray::erase_storm_id() very easy. Replace n_tracks() function with n() in several places.

* Per #1448, add valid_freq and basin_file config entries. Also rename load_dland.h/.cc to load_tc_data.h/.cc and add code to read the basin file.

* Per #1448, add GenesisInfoArray::erase_storm_id().

* Per #1448, update tc_gen code to handle new config options.

* Per #1448, had my units wrong. Was processing seconds when I thought it was hours!

* Per #1448, making test TC-Gen config file consistent with the default.

* Per #1448, also track the obs valid times.

* Per #1448, switch from tech1/tech2 to dev/ops methods. Update log messages and add lots of details to the tc_gen documentation.

* Per #1430, in tc_gen enable dev_method_flag, ops_method_flag, ci_alpha, and output_flag to be specified separately for each filter. Also add nc_pairs_flag and genesis_track_points_window config options. Add config constants entries for these options and update tc_gen to handle all of these changes.

* Per #1430, consolidate the parse_grid_mask() code a bit to avoid redundancy.:

* Per #1430, just cleaning up some messy comments.

* Per #1430, adding hooks for writing NetCDF output file.

* Per #1430, update DataPlane::set_size() function to take a 3rd argument to specify how the DataPlane should be initialized.

* Per #1430, update the nc_pairs_flag options and update the code to parse them.

* Per #1430, update the TrackInfo class to track and report the min/max warm core information.

* Per #1430, current state of development. Still a work in progress. I'm getting runtime segfaults when testing and I still need to NOT overcount the BEST track hits.

* Per #1430, committing changes described by #1430 (comment)

* Per #1430, forgot to rename genesis_match_window to genesis_hit_window as it is in the code.

* Per #1430, chaning GenesisInfo to just inherit directly from TrackInfo. Frankly, I should have thought of this a LONG time ago.

* Per #1430, change the default desc setting from NA to ALL and add the best_unique_flag option.

* Per #1430, simplify the logic now that GenesisInfo is derived from TrackInfo. Also support the best_unique_flag config option.

* Per #1430, instead of storing 12 individual DataPlane objects, store them in a map to make writing their output more convenient.

* Per #1430, updating documentation and comments.

* Per #1430, more doc updates.

* Per #1430, update unit test to only write NetCDF counts for the AL_BASIN and not the other filters.

* Per #1430, fix parsing logic for nc_pairs_flag = TRUE.

* Per #1430, fix bug. Check the VxOpt.NcInfo before calling write_nc(), not the top-level one.

* Per #1430, the docker build of tc_gen failed.

* Per #1430, working on DockerHub compilation.

* Per #1430, getting DockerHub build working.

* One more try.

* Per #1597, add hooks for new GENMPR stat line type.

* Per #1597, add config file option and column definitions for the GENMPR line type.

* Per #1597, finish writing the GENMPR line type.

* Per #1597, change the default output grid from a global 5 degree to global 1 degree grid.

* Per #1597, change GENMPR output columns to GEN_TDIFF and INIT_TDIFF since they're reported in HHMMSS format instead of seconds. Also, tweak the config file for the tc-gen unit test.

* Per #1597, have to add GENMPR header columns for Stat-Analysis and test scripts to handle it.

* Per #1597, update Stat-Analysis to handle the GENMPR line type.

* Per #1597, user's guide updates for the GENMPR and NetCDF output file.

* Per #1597, add AGEN_INIT and AGEN_FHR columns.

* Per #1597, add AGEN_INIT and AGEN_FHR columns.

* Per #1597, remove the AGEN_TIME and BGEN_TIME columns from the GENMPR line type and instead write the genesis times to the FCST_VALID_BEG/END and OBS_VALID_BEG/END header columns.

* Remove some unused output column name definitions. There are a remnant from very early versions of MET which included the CTP, CFP, and COP line types.

* Per #1597, update config file options to use dev_hit_radius, dev_hit_window, and opt_hit_tdiff. Also update log message to switch from 'lead' to 'forecast hour'.

* Per #1626, add met_regrid_nearest() utility function since I'm calling it twice.

* Per #1626, update the basin_global_tenth_degree.nc basin definition file to include basin name abbreviations.

* Per #1626, update load_tc_data.h/.cc to also read the basin abbreviations from the NetCDF basin file.

* Per #1626, add TC-Gen config file options for init_inc, init_exc, and basin_mask. Updated the library and application code, and updated the user's guide.

* Fixing Fortify warnings for 'Poor Style: Variable Never Used' in 6 files.

* Fix Fortify warnings for 'Uninitialized variable' in tc_gen.cc and point2grid.cc.

* Fix Fortify warnings for 'Poor Style: Redundant Initialization' in plot_point_obs.cc and point2grid.cc.

* Feature 1346 valid time attr (#1634)

* #1346 get_att_value_unixtime supports yyyymmdd_hhmmss, too

* #1346 Check valid_time & init_time attributes, too

* #1346 Check valid_time & init_time attributes, too

Co-authored-by: Howard Soh <hsoh@kiowa.rap.ucar.edu>

* Feature 1473 python errors (#1615)

* Added sample script to read ascii data and create an xarray.

* Disabled use_xarray exit for testing.

* Get attrs from DataArray if using xarray.

* Removed some comments.

* Revised error messages for use with both numpy and xarray.

* Removing commented out code.

Co-authored-by: David Fillmore <fillmore@kiowa.rap.ucar.edu>
Co-authored-by: johnhg <johnhg@ucar.edu>

* Feature 1630 zero obs (#1637)

* Per #1630, update ascii2nc to change zero observations from an error (which returns bad status) to a warning message.

* Per #1630, update point2grid to read an empty input file and write fields of 0's or bad data to the output. Change previous error message to warning. Also, update LOTS of warning and error log messages to make them consistent.

* Per #1630, need to initialize the dataplanes before the loop (for when there are no obs) and within each loop iteration (for when there are multiple fields to process).

* Bugfix 1638 develop climo cdf (#1639)

* Per #1638, correct the order of arguments in the call to the normal_cdf() utility function.

* Per #1638, update the logic in derive_climo_prob(). For CDP thresholds, the constant climo probability should be based on the inequality type where less-than-types match the threshold percentile value while greater-than-types are 1.0 minus the threshold percentile.

* Per #1638, update normal_cdf() to initialize the output CDF field using the climo mean field instead of the observation data field. This makes the timestamps consistent for the climo mean, stdev, and cdf variables in the Grid-Stat NetCDF matched pairs output file.

* Update tc_gen.cc

Co-authored-by: hsoh-u <hsoh@ucar.edu>
Co-authored-by: Howard Soh <hsoh@kiowa.rap.ucar.edu>
Co-authored-by: John Halley Gotway <johnhg@kiowa.rap.ucar.edu>
Co-authored-by: j-opatz <59586397+j-opatz@users.noreply.github.com>
Co-authored-by: David Fillmore <davidfillmore@users.noreply.github.com>
Co-authored-by: David Fillmore <fillmore@kiowa.rap.ucar.edu>
@LoganDawson-NOAA
Copy link

@JohnHalleyGotway I tested out met-10.0.0 beta3 on a date with zero observations, and ascii2nc isn't quite producing output that point2grid can process. For comparison, I've included differences from the headers of the netcdf files resulting from ascii2nc when observations exist versus when there are no obs to process. Both files were produced with met-10.0.0 beta3.

When there are observations to be processed by ascii2nc, the dimensions are listed as follows and key variable and global attribute differences are:

dimensions:
        mxstr = 16 ;
        mxstr2 = 40 ;
        mxstr3 = 80 ;
        nobs = 3 ;
        nhdr = 3 ;
        nhdr_typ = 1 ;
        nhdr_sid = 1 ;
        nhdr_vld = 3 ;
        nobs_qty = 2 ;
        obs_var_num = 1 ;
variables:
        int obs_vid(nobs) ;
                obs_vid:long_name = "index of variable names at var_name" ;
                obs_vid:_FillValue = -9999 ;
        char obs_var(obs_var_num, mxstr2) ;
                obs_var:long_name = "variable names" ;

// global attributes:
                :MET_Obs_version = "1.02" ;
                :use_var_id = "true" ;

When no observations exist, the dimensions are listed as follows and key variable and global attribute differences are:

dimensions:
        mxstr = 16 ;
        mxstr2 = 40 ;
        mxstr3 = 80 ;
        nobs = UNLIMITED ; // (0 currently)
        nhdr = UNLIMITED ; // (0 currently)
        nhdr_typ = UNLIMITED ; // (0 currently)
        nhdr_sid = UNLIMITED ; // (0 currently)
        nhdr_vld = UNLIMITED ; // (0 currently)
        nobs_qty = UNLIMITED ; // (0 currently)
variables:
        int obs_gc(nobs) ;
                obs_gc:long_name = "grib code corresponding to the observation type" ;
                obs_gc:_FillValue = -9999 ;

// global attributes:
                :MET_Obs_version = "1.02" ;
                :use_var_id = "false" ;

In the correct output, the value for obs_var is "Fscale" which is the variable expected by point2grid. Without the obs_var variable in the ascii2nc output, point2grid begins creating a netcdf output file with the correct lat/lon data, but it hangs and has to be forcibly terminated.

For additional reference, I've attached files showing the commands and output that result from ascii2nc and point2grid when processing zero observations:
ascii2nc_empty.txt
point2grid_empty.txt

The same information is provided when there are observations available:
ascii2nc_w-obs.txt
point2grid_w-obs.txt

@JohnHalleyGotway
Copy link
Collaborator

JohnHalleyGotway commented Jan 29, 2021 via email

@JohnHalleyGotway JohnHalleyGotway added this to To do in MET-10.0.0-beta4 (3/2/21) via automation Jan 30, 2021
@JohnHalleyGotway
Copy link
Collaborator

JohnHalleyGotway commented Jan 30, 2021

Reopening this issue and assigning it to the met-10.0.0-beta4 release.

My original point2grid test did actually run fine:

point2grid empty_obs.nc G211 empty_grid.nc -field 'name="TMP"; level="Z2";' -v 10

However, running with any name string that is NOT a standard GRIB code abbreviation (like "Fscale" or "bad_string") causes point2grid to hang:

point2grid empty_obs.nc G211 empty_grid.nc -field 'name="Fscale"; level="Z2";' -v 10

Reopening and will fix.

MET-10.0.0-beta4 (3/2/21) automation moved this from To do to In progress Jan 30, 2021
@JohnHalleyGotway
Copy link
Collaborator

@hsoh-u I found/fixed the bug that was causing point2grid to hang. It was a cut/paste error.

In PointToGridConfInfo::get_var_id(), we had var_name_map.end() that should be def_var_name_map.end(). Fixing that gets rid of the runtime hang.

But bigger question. When the input to point2grid has 0 observations, and I request "Fscale", it errors out with:

bin/point2grid empty_obs.nc G211 empty_grid.nc -field 'name="Fscale"; level="";'
ERROR  : process_point_file() -> Invalid GRIB code [Fscale]
ERROR  : Try setting the "name" in the "-field" command line option to one of the available names:
ERROR  : 	GRIB codes: 

I can make this go away by adding an entry for "Fscale" to the var_name_map in the config file:

var_name_map = [
...
{ key = "Fscale";   val = "Fscale"; } 
];

And then rerunning with that config file, I now get a warning about 0 obs with an output file.

bin/point2grid empty_obs.nc G211 empty_grid.nc -field 'name="Fscale"; level="";' -config Point2GridConfig 
WARNING: get_grid_mapping(MET_obs) -> no valid point observation data!
WARNING: process_point_file() -> No valid data for the variable [Fscale]
DEBUG 1: Writing output file: empty_grid.nc

Can you confirm that this is the desired behavior? If the user provides a variable name that (1) doesn't appear in the input NetCDF file and (2) isn't listed in the config file, the point2grid tool should error out? Is that the right behavior?

@hsoh-u
Copy link
Collaborator

hsoh-u commented Feb 1, 2021

I don't think it's right behavior.
It's a side effect. Without mapping in config file, MET knows "Fscale" is not valid GRIB code. MET exits with an error message (without scanning the GRIB code variable). The existence at the mapping from the configuration makes MET scan the GRIB code variable like the other filtering logic (Note: The mapping is "GRIB code_string" to "variable_name", for example, "11" to "TMP") ==> Should not allow non-numeric key at the mapping configuration. If allowed, MET should detect this early.

Why this happens:
Step 1: If the variable index is used, MET knows if the requested variable exists or not very quickly without scanning a whole variable. This is an advantage of using variable name and index mapping (only if variable index is used).

Step 2: Scanning & filtering all obs data. The mapping allows "variable name" for GRIB code. The obs data could be filtered by requested GRIB code, but MET knows it after scanning all obs. data.

The point2grid runs fast now, so two scanning for GRIB code will be OK.

@JohnHalleyGotway
Copy link
Collaborator

Howard, I didn't follow all of those details, but it sounds like you have a good idea for how it should work! Let's proceed as follows:
(1) I'll clarify the point of this issue to make sure you understand the needed behavior.
(2) I'll push my local feature branch (feature_1630_point2grid_hang) up to GitHub and assign you to this issue.
(3) You pull my feature branch code, make whatever updates should be made, and push back up to GitHub.
(4) I'll pull your updates from GitHub, recompile, and retest to make sure I understand them and that it handles Logan's situation correctly.

Logan is running point2grid every day using daily storm reports as input to ascii2nc and point2grid. When storms are present, it works fine and as expected. But when there are no storms, ascii2nc errors out. I've already updated ascii2nc to NOT error out with 0 input obs. And now we want point2grid to do the same. With 0 input observations to process, it should just create an output field of all bad data values and counts and mask values of 0.

Ideally, Logan would NOT need to specify "Fscale" in the config file. Instead, point2grid would just process the commands, realize that no Fscale observations are present, print a warning, and produce an "empty" output file.

@TaraJensen TaraJensen added reporting: DTC NOAA R2O NOAA Research to Operations DTC Project and removed alert: NEED ACCOUNT KEY Need to assign an account key to this issue labels Feb 3, 2021
hsoh-u pushed a commit that referenced this issue Feb 9, 2021
@hsoh-u hsoh-u moved this from In progress to Pull request review in MET-10.0.0-beta4 (3/2/21) Feb 9, 2021
@hsoh-u hsoh-u moved this from Pull request review to In progress in MET-10.0.0-beta4 (3/2/21) Feb 9, 2021
@JohnHalleyGotway JohnHalleyGotway moved this from In progress to Pull request review in MET-10.0.0-beta4 (3/2/21) Feb 9, 2021
@JohnHalleyGotway JohnHalleyGotway linked a pull request Feb 9, 2021 that will close this issue
10 tasks
@JohnHalleyGotway
Copy link
Collaborator

Fixed in develop by PR #1660. @LoganDawson-NOAA, please test this functionality again once met-10.0.0_beta4 is released.

MET-10.0.0-beta4 (3/2/21) automation moved this from Pull request review to Done Feb 10, 2021
JohnHalleyGotway added a commit that referenced this issue Feb 18, 2021
* Per 1646, one line fix for cut-and-paste error. (#1647)

* Per #1644, no actual code changes here. Just formatting and spacing. For example, replace double ;; with single ;'

* Per #1644, FOUND THE BUG! It's a copy/paste error. We had var_name_map.end() that should be def_var_name_map.end(). Fixing that gets rid of the runtime hang.'

* Per #1643, redefine the contents of the existing AREA_RATIO output column from MODE. Define it as FCST/OBS object area instead of min/max. Update the User's Guide to note the change and also clarify that the MTD VOLUME_RATIO output really is FCST/OBS. (#1650)

* Feature 1644 ps_log (#1651)

* Per #1644, write rejection reason codes at verbosity 2 when there are 0 matched pairs.

* Per #1644, add a few sentences to Point-Stat, Practical Information chapter about debugging 0 matched pairs.

* The mode_conv.pl logic was slightly broken. MET PR #1650 should have broken the NB but it did not. Turns out the diffing logic is NOT properly distinguishing between single and pair object lines. It does this by looking for an underscore in the OBJECT_ID column. When we added FCST_UNITS and OBS_UNITS, that shifted OBJECT_ID up 2 spots, but the code was still checking the (0-based) 20th column instead of the 22nd. Fixing this now and will rerun NB20210202 to confirm it works again.

* The diffing logic for MODE pair lines still was not correct. We'd added the ASPECT_DIFF and CURVATURE_RATIO columns a while ago, but they were missing from the diff logic. This logic really is not good. We need to make it more robust, reading the version-specific header columns from a table file instead of hard-coding them!

* Feature 1653 rscripts (#1654)

* Per #1653, update plot_cnt.R and plot_mpr.R to remove the version-specific header columns.

* Per #1653, nice enhancments to these Rscripts to make them more independent of the MET version number.

* Per #1653, more tweaks

* Per #1653, if no input files are provided, error out with a useful message.

* Per #1653, while the scripts ran fine using R 4.0.2 on my Mac, they fail on eyewall using R 3.4.0. Adding as.character() to get past that error.

* Feature 1655 nc_log (#1656)

* #1630 Display a warning instead of error message with invalid variable if the input data is empty

* Feature 1658 grib_tables (#1659)

* Per #1658, update MXUPHL entries.

* Per #1658, updating long name for MAXREF, MAXUVV, and MAXDVV.

* Modified format of release notes

* Feature 1450 hersbach (#1662)

* Per #1450, add new ECNT columns for Hersback CRPS. Still need to actually compute the stats though.

* Per #1450, update NumArray functions to only sort if the data is not yet sorted. And check for bad data when computing the standard deviation.

* Per #1450, add code to compute the empirical CRPS value.

* Per #1450, large change to the new output for the empirical CRPS. In order to aggregate decomposed empirical CRPS reliability and potential correctly, we'd need to write (n+1)*2 additional columns. While the empirical crps can be aggregated as a weighted mean, the decomposition cannot. It just isn't feasible to do this in the ECNT line type. If this reliability and potential really are required, recommend that we add an entirely new CRPS line type instead of tacking onto ECNT. These changes simply remove reliabilit and potential from the output.

* Per #1450 and #1451, replacing single CRPS_CLIMO column with CRPSCL and CRPSCL_EMP which will be needed for #1451.

* Per #1450, delete temp files I'd accidentally committed.

* Per #1450, update the user's guide with CRPS updates.

* Fix bug replacing crpss_emp with crpss_gaus.

Co-authored-by: John Halley Gotway <johnhg@kiowa.rap.ucar.edu>
Co-authored-by: Howard Soh <hsoh@kiowa.rap.ucar.edu>
Co-authored-by: hsoh-u <hsoh@ucar.edu>
Co-authored-by: Julie.Prestopnik <jpresto@ucar.edu>
JohnHalleyGotway added a commit that referenced this issue Feb 25, 2021
* Per 1646, one line fix for cut-and-paste error. (#1647)

* Per #1644, no actual code changes here. Just formatting and spacing. For example, replace double ;; with single ;'

* Per #1644, FOUND THE BUG! It's a copy/paste error. We had var_name_map.end() that should be def_var_name_map.end(). Fixing that gets rid of the runtime hang.'

* Per #1643, redefine the contents of the existing AREA_RATIO output column from MODE. Define it as FCST/OBS object area instead of min/max. Update the User's Guide to note the change and also clarify that the MTD VOLUME_RATIO output really is FCST/OBS. (#1650)

* Feature 1644 ps_log (#1651)

* Per #1644, write rejection reason codes at verbosity 2 when there are 0 matched pairs.

* Per #1644, add a few sentences to Point-Stat, Practical Information chapter about debugging 0 matched pairs.

* The mode_conv.pl logic was slightly broken. MET PR #1650 should have broken the NB but it did not. Turns out the diffing logic is NOT properly distinguishing between single and pair object lines. It does this by looking for an underscore in the OBJECT_ID column. When we added FCST_UNITS and OBS_UNITS, that shifted OBJECT_ID up 2 spots, but the code was still checking the (0-based) 20th column instead of the 22nd. Fixing this now and will rerun NB20210202 to confirm it works again.

* The diffing logic for MODE pair lines still was not correct. We'd added the ASPECT_DIFF and CURVATURE_RATIO columns a while ago, but they were missing from the diff logic. This logic really is not good. We need to make it more robust, reading the version-specific header columns from a table file instead of hard-coding them!

* Feature 1653 rscripts (#1654)

* Per #1653, update plot_cnt.R and plot_mpr.R to remove the version-specific header columns.

* Per #1653, nice enhancments to these Rscripts to make them more independent of the MET version number.

* Per #1653, more tweaks

* Per #1653, if no input files are provided, error out with a useful message.

* Per #1653, while the scripts ran fine using R 4.0.2 on my Mac, they fail on eyewall using R 3.4.0. Adding as.character() to get past that error.

* Feature 1655 nc_log (#1656)

* #1630 Display a warning instead of error message with invalid variable if the input data is empty

* Feature 1658 grib_tables (#1659)

* Per #1658, update MXUPHL entries.

* Per #1658, updating long name for MAXREF, MAXUVV, and MAXDVV.

* Modified format of release notes

* Feature 1450 hersbach (#1662)

* Per #1450, add new ECNT columns for Hersback CRPS. Still need to actually compute the stats though.

* Per #1450, update NumArray functions to only sort if the data is not yet sorted. And check for bad data when computing the standard deviation.

* Per #1450, add code to compute the empirical CRPS value.

* Per #1450, large change to the new output for the empirical CRPS. In order to aggregate decomposed empirical CRPS reliability and potential correctly, we'd need to write (n+1)*2 additional columns. While the empirical crps can be aggregated as a weighted mean, the decomposition cannot. It just isn't feasible to do this in the ECNT line type. If this reliability and potential really are required, recommend that we add an entirely new CRPS line type instead of tacking onto ECNT. These changes simply remove reliabilit and potential from the output.

* Per #1450 and #1451, replacing single CRPS_CLIMO column with CRPSCL and CRPSCL_EMP which will be needed for #1451.

* Per #1450, delete temp files I'd accidentally committed.

* Per #1450, update the user's guide with CRPS updates.

* Fix bug replacing crpss_emp with crpss_gaus.

* #1657 Added TIME_EPSILON

* #1657 Corrected 1 second offset by the precision error

* #1657 Added AccumTime

* #1657 Read the time from "bounds" attribute and set the max value from the bounds time variable

* #1657 Corrected 1 second offset by the precision error

* Per #1439, add check_mask_names() utility function which errors out if the list of masking region names is non-unique. Update Point-Stat and Grid-Stat to call it. (#1679)

* Feature 1451 crpss (#1676)

* Per #1450, add new ECNT columns for Hersback CRPS. Still need to actually compute the stats though.

* Per #1450, update NumArray functions to only sort if the data is not yet sorted. And check for bad data when computing the standard deviation.

* Per #1450, add code to compute the empirical CRPS value.

* Per #1450, large change to the new output for the empirical CRPS. In order to aggregate decomposed empirical CRPS reliability and potential correctly, we'd need to write (n+1)*2 additional columns. While the empirical crps can be aggregated as a weighted mean, the decomposition cannot. It just isn't feasible to do this in the ECNT line type. If this reliability and potential really are required, recommend that we add an entirely new CRPS line type instead of tacking onto ECNT. These changes simply remove reliabilit and potential from the output.

* Per #1450 and #1451, replacing single CRPS_CLIMO column with CRPSCL and CRPSCL_EMP which will be needed for #1451.

* Per #1450, delete temp files I'd accidentally committed.

* Per #1450, update the user's guide with CRPS updates.

* Per #1451, instead of computing the climo crps on the fly, compute and store it separately for each point.

* Per #1451, the ECNT line type will no longer be written separately for each CDF bin. Removing the bin-related arguments from the write_ecnt functions.

* Per #1451, the climo_cdf.write_bins option no longer applies. Since Ensemble-Stat will no longer compute stats separately for each climo bin, I'm removing the reference to write_bins from the Ensemble-Stat config files.

* Per #1451, compute and store the climo CRPS for each point. Also, break apart the normal climo computation into separate functions for crps, ign, and pit.

* Per #1451, update Ensemble-Stat logic to no longer subset pairs into climo CDF bins. We had done this to be consistent with the use of climo data in point and grid-stat. But this change to the handling of climo data is consitent with the NOAA/EMC approach.

* Per #1451, split out the setting of climo CDF thresholds into a separate function so that it can also be called by stat-analysis.

* Per #1451, in the Ensemble-Stat ORANK line type, rename CLIMO to CLIMO_MEAN and add a CLIMO_STDEV column.

* Per #1451, also need to update gsidens2orank to write a climo_stdev column.

* Per #1451, switch from constant pointer to ClimoCDFInfo object to a copy to make the logic of doing this in Stat-Analysis a little easier.

* Per #1451, the HiRA method in Point-Stat computes an ECNT output line type. Needed to call set_climo_cdf() there so that we know how many climo values to use when computing the empirical climo CRPS.

* Per #1451, need to store climo_cdf for both grid and point verification.

* Per #1451, update to write the CLIMO_STDEV header column for the ORANK line type.

* Per #1451, in Ensemble-Stat when doing point verification, check for empty OBS_UNIT string and write NA instead.

* Per #1451, update unit tests by enhancing the climatology call to Ensemble-Stat to also include point verification. Tweak the Ensemble-Stat cofiguration for that and also add a call to pb2nc to prepare the point observations for use.

* Per #1450, added a new section to the Ensemble-Stat chapter describing how climo mean/stdev are used in the computation of the skill scores.

* Update ensemble-stat.rst

Co-authored-by: j-opatz <59586397+j-opatz@users.noreply.github.com>

* #1677 Update the refence time (from time_bnds variable) (#1680)

Co-authored-by: Howard Soh <hsoh@kiowa.rap.ucar.edu>

* Feature 1135 stat_analysis (#1681)

* Per #1135, add fcst/obs_init/valid_inc/exc options for STAT-Analysis jobs.

* Per #1135, update all the STATAnalysis config files to include entries for the new fcst/obs_init/valid_inc/exc options.

* Per #1135, add documentation for fcst/obs_init/valid_inc/exc options to the STAT-Analysis chapter. Also, clarify the description for the existing options.

* Per #1135, adding another call to stat_analysis to check the time filtering options.

* Per #1135, just renaming stat_analysis output file.

* Apply suggestions from code review

Co-authored-by: jprestop <jpresto@ucar.edu>

Co-authored-by: John Halley Gotway <johnhg@kiowa.rap.ucar.edu>
Co-authored-by: jprestop <jpresto@ucar.edu>

Co-authored-by: John Halley Gotway <johnhg@kiowa.rap.ucar.edu>
Co-authored-by: Howard Soh <hsoh@kiowa.rap.ucar.edu>
Co-authored-by: hsoh-u <hsoh@ucar.edu>
Co-authored-by: Julie.Prestopnik <jpresto@ucar.edu>
Co-authored-by: j-opatz <59586397+j-opatz@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reporting: DTC NOAA R2O NOAA Research to Operations DTC Project requestor: NOAA/EMC NOAA Environmental Modeling Center type: enhancement Improve something that it is currently doing
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

5 participants