Treat gridded fields of entirely missing data as missing files and fix python embedding to call common data processing code. #1494

Closed
21 tasks
JohnHalleyGotway opened this issue Sep 16, 2020 · 1 comment · Fixed by #1600
Labels: priority: blocker (Blocker); requestor: Navy/NRL (Naval Research Laboratory); type: enhancement (Improve something that it is currently doing)

Comments

@JohnHalleyGotway

JohnHalleyGotway commented Sep 16, 2020

This issue was originally called:
Update Ensemble-Stat to better handle python-embedding failures and entire fields of missing data.
However, I updated it to more clearly state the actual fix.

Describe the Enhancement

This issue arose when John O was setting up a METplus use case for NRL that uses python-embedding to call Ensemble-Stat. Data for all 7 ensemble members live within the same variable in the same NetCDF file. The python-embedding script is called 7 times to pull values for each of the 7 members.

While the python script runs without error, 3 of the 7 members contain a full field of missing data values. In Ensemble-Stat, all 7 members appear to be "valid" so the ens.ens_thresh threshold is satisfied. However, no grid point contains 7 valid ensemble values, meaning that no ensemble statistics are computed.

The underlying rule here is that, after accounting for missing input files, at each grid point there can be 0 bad data values. That means, you can't have 7 valid ensemble values for the first grid point, and then only 4 for the next... because then we can't group the ensemble ranks together into a RANK histogram. It's OK for entire files to be missing... but not OK to have missing data values within the fields.

In John O's case, all 7 calls to the python-embedding script run without error, but 3 of the 7 calls return fields that consist entirely of missing data values. As a result, no grid point contains 7 valid ensemble member values, and every grid point is discarded.
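To make the failure mode concrete, here is a minimal sketch in plain Python. It is not MET code: the `BAD_DATA` sentinel, the 4-point grid, and the `count_complete_points` helper are all assumptions made for illustration. It only shows why 3 entirely missing members leave no grid point with a full set of 7 values.

```python
BAD_DATA = -9999.0  # assumed missing-data sentinel for this sketch


def count_complete_points(members):
    """Count grid points where every ensemble member has a valid value."""
    n_points = len(members[0])
    return sum(
        all(field[i] != BAD_DATA for field in members)
        for i in range(n_points)
    )


# 7 members on a hypothetical 4-point grid; the last 3 are entirely missing.
valid = [1.0, 2.0, 3.0, 4.0]
missing = [BAD_DATA] * 4
members = [valid] * 4 + [missing] * 3

print(count_complete_points(members))      # all 7 members considered
print(count_complete_points(members[:4]))  # only the 4 usable members
```

With all 7 members included, no grid point qualifies. Dropping the 3 empty members, as if their files were missing, leaves every grid point usable.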

This task is to update the logic of Ensemble-Stat in 2 ways:

(1) Currently, if the python-embedding script returns bad status, the entire Ensemble-Stat run exits. Update the python-embedding logic to allow for runtime failures without the tool exiting. Then treat a python-embedding failure as if it were a missing input file... and count that against ens.ens_thresh.

(2) When reading ensemble data, check to see if all data is bad data. If so, also treat that as if it were a missing input file. Question... should this check only be applied for python-embedding, or for all input file types?
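The two proposed changes could be sketched roughly as follows. This is an illustrative Python model, not the Ensemble-Stat implementation: `select_usable_members`, the use of `None` to represent a missing file or failed script, and the reading of `ens.ens_thresh` as a minimum ratio of usable members are assumptions of this sketch.

```python
BAD_DATA = -9999.0  # assumed missing-data sentinel for this sketch


def is_all_bad(field):
    """Return True if every value in the field is the missing-data sentinel."""
    return all(value == BAD_DATA for value in field)


def select_usable_members(fields, ens_thresh):
    """Drop unusable members, then check the usable ratio against ens_thresh.

    A member is unusable if it is None (hypothetical marker for a missing
    file or a failed python-embedding call) or if it is entirely bad data.
    """
    usable = [f for f in fields if f is not None and not is_all_bad(f)]
    ratio = len(usable) / len(fields)
    if ratio < ens_thresh:
        raise ValueError(f"only {len(usable)} of {len(fields)} members usable")
    return usable


good = [1.0, 2.0, 3.0]
all_bad = [BAD_DATA] * 3
# 4 good members, 1 failed script call, 2 fields of entirely missing data.
fields = [good, good, good, good, None, all_bad, all_bad]

print(len(select_usable_members(fields, ens_thresh=0.5)))
```

Here 4 of 7 members survive, which satisfies a threshold of 0.5 but would fail a threshold of 1.0.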

Some issues to consider:

  • We would like to NOT have to execute the python embedding scripts multiple times because that's slow!
  • For python-embedding, we can actually have multiple fields requested. How do we handle the case when the python-embedding script runs fine for the first field of the first file, but then fails for the second field of the first file?

Find data for this in eyewall:/d1/projects/nrl_aerosol/mp_work

Time Estimate

2 days.

Sub-Issues

Consider breaking the enhancement down into sub-issues.

  • Add a checkbox for each sub-issue here.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Funding Source

Define the source of funding and account keys here or state NONE.

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required
  • Select scientist(s) or no scientist required

Labels

  • Select component(s)
  • Select priority
  • Select requestor(s)

Projects and Milestone

  • Review projects and select relevant Repository and Organization ones or add "alert:NEED PROJECT ASSIGNMENT" label
  • Select milestone to next major version milestone or "Future Versions"

Define Related Issue(s)

Consider the impact to the other METplus components.

Enhancement Checklist

See the METplus Workflow for details.

  • Complete the issue definition above, including the Time Estimate and Funding Source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>_<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update unit tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s), Project(s), Milestone, and Linked issues
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
@JohnHalleyGotway JohnHalleyGotway added type: enhancement Improve something that it is currently doing component: application code priority: medium Medium Priority requestor: Navy/NRL Naval Research Laboratory alert: NEED ACCOUNT KEY Need to assign an account key to this issue labels Sep 16, 2020
@JohnHalleyGotway JohnHalleyGotway added this to the MET 10.0 milestone Sep 16, 2020
@JohnHalleyGotway JohnHalleyGotway self-assigned this Sep 16, 2020
@JohnHalleyGotway JohnHalleyGotway added this to To do in MET-10.0.0-beta1 (10/22/20) via automation Sep 16, 2020
@TaraJensen TaraJensen added priority: blocker Blocker and removed alert: NEED ACCOUNT KEY Need to assign an account key to this issue priority: medium Medium Priority labels Sep 22, 2020
@JohnHalleyGotway JohnHalleyGotway added this to To do in MET-10.0.0-beta2 (12/7/20) via automation Oct 13, 2020
JohnHalleyGotway added a commit that referenced this issue Dec 4, 2020
JohnHalleyGotway added a commit that referenced this issue Dec 4, 2020
… change its return from void to bool. And simply return false, if the raw input data is all bad data. Second, print debug(4) log messages to list out the range of valid data and timing information. It'll be really nice to make this consistent across all the input file types.
JohnHalleyGotway added a commit that referenced this issue Dec 4, 2020
… to account for the fact that it now returns a bool instead of void.
@JohnHalleyGotway JohnHalleyGotway changed the title Update Ensemble-Stat to better handle python-embedding failures and entire fields of missing data. Treat gridded fields of entirely missing data as a missing files, and fix python embedding to call common data processing code. Dec 5, 2020
JohnHalleyGotway added a commit that referenced this issue Dec 5, 2020
JohnHalleyGotway added a commit that referenced this issue Dec 5, 2020
…s_data_plane() function to apply data shifting, censoring, conversion, and check for all missing data.
@JohnHalleyGotway

Made the following changes:

  • Updated the common library function, process_data_plane()...
    • Return boolean instead of void.
    • Return false if the input is entirely missing data values.
    • Print Debug(4) messages about the data planes processed (previously this was only printed for CF-compliant NetCDF data)
  • Updated the library code for vx_data2d_grib, vx_data2d_grib2, vx_data2d_nc_met, vx_data2d_nc_pinterp, and vx_data_nccf to check the new boolean return status from the process_data_plane() function.
  • Updated the library code for vx_data2d_python to call process_data_plane() in the first place. Prior to this, data shifting, censoring, and conversion logic was not working with python embedding.
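The pattern described above, a processing function that reports an entirely missing field through its boolean return value and callers that check it, can be modeled in a few lines of Python. The real `process_data_plane()` is C++ library code; the signature, the `convert` argument, and the `read_data_plane` caller below are assumptions of this sketch.

```python
BAD_DATA = -9999.0  # assumed missing-data sentinel for this sketch


def process_data_plane(values, convert=None):
    """Process a field in place; return False if it is entirely bad data."""
    if all(v == BAD_DATA for v in values):
        return False
    if convert is not None:
        # Apply the conversion to valid values only.
        for i, v in enumerate(values):
            if v != BAD_DATA:
                values[i] = convert(v)
    return True


def read_data_plane(raw, convert=None):
    """Hypothetical reader: returns None when the field is unusable."""
    plane = list(raw)
    if not process_data_plane(plane, convert):
        return None
    return plane


print(read_data_plane([BAD_DATA, BAD_DATA]))
print(read_data_plane([273.15, BAD_DATA], convert=lambda k: k - 273.15))
```

A caller that receives `None` can then handle the field the same way it would handle a missing input file.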

@JohnHalleyGotway JohnHalleyGotway changed the title Treat gridded fields of entirely missing data as a missing files, and fix python embedding to call common data processing code. Treat gridded fields of entirely missing data as missing files, and fix python embedding to call common data processing code. Dec 5, 2020
@JohnHalleyGotway JohnHalleyGotway changed the title Treat gridded fields of entirely missing data as missing files, and fix python embedding to call common data processing code. Treat gridded fields of entirely missing data as missing files and fix python embedding to call common data processing code. Dec 5, 2020
JohnHalleyGotway added a commit that referenced this issue Dec 5, 2020
…that I ran across when testing for met-help.
@JohnHalleyGotway JohnHalleyGotway linked a pull request Dec 6, 2020 that will close this issue
8 tasks
JohnHalleyGotway added a commit that referenced this issue Dec 6, 2020
Merging into the develop branch to get these changes included for the met-10.0.0-beta2 development release. These are good and worthwhile changes to make on their own. However, I do still need @j-opatz to confirm that they actually fix the original issue with python embedding in ensemble-stat!

* Per #1494, add DataPlane::is_all_bad_data() function to determine whether the field is all bad data.

* Per #1494, update the process_data_plane() function in 2 ways. First, change its return from void to bool. And simply return false, if the raw input data is all bad data. Second, print debug(4) log messages to list out the range of valid data and timing information. It'll be really nice to make this consistent across all the input file types.

* Per #1494, update the logic for all the calls to process_data_plane() to account for the fact that it now returns a bool instead of void.

* Per #1494, refine the Data plane debug(4) log messages.

* Per #1494, work on log message.

* Per #1494, updated vx_data2d_python library to call the common process_data_plane() function to apply data shifting, censoring, conversion, and check for all missing data.

* Per #1494, just fixing a small typo in an unrelated source code file that I ran across when testing for met-help.
@JohnHalleyGotway JohnHalleyGotway moved this from To do to Done in MET-10.0.0-beta2 (12/7/20) Dec 6, 2020
JohnHalleyGotway added a commit that referenced this issue Dec 7, 2020
* Task 1455 doc (#1550)

* first stab at converting to sphinx

* removing all slashes

* adding new link to README.rst file

* working on lists

* Made formatting changes

* Finished fcst section

* fixing spelling, bolding and italics issues

* updating web links

* working on formatting

* updating formatting

* formatting

* first attempt to clean up formatting completed.

* adding README to the index file

* fixing warning errors

* Bringing README_TC into sphinx.  Updating section headers

* Adding README_TC

* Made formatting updates to README.rst

* corrected section under wavelet

* small changes

* removing met/data/config/README since it is now in met/docs/Users_Guide

* Added some formatting for headers

* fixing chapters & sections

* Fixed warnings from building

Co-authored-by: Julie.Prestopnik <jpresto@ucar.edu>

* Add debug level 4 message to list out the number of GRIB2 records inventoried. This helps debugging issues with MET potentially not reading all input GRIB2 records on WCOSS.

* Update Makefile.am

PR #1550 broke the build. It removed the data/config/README file but left a reference to it in Makefile.am. I'm removing that reference directly in the develop branch to get the Docker build, nightly regression test, and nightly Fortify build working.

* Bugfix 1554 develop ncdump (#1556)

* Task 1455 doc (#1557)

* first stab at converting to sphinx

* removing all slashes

* adding new link to README.rst file

* working on lists

* Made formatting changes

* Finished fcst section

* fixing spelling, bolding and italics issues

* updating web links

* working on formatting

* updating formatting

* formatting

* first attempt to clean up formatting completed.

* adding README to the index file

* fixing warning errors

* Bringing README_TC into sphinx.  Updating section headers

* Adding README_TC

* Made formatting updates to README.rst

* corrected section under wavelet

* small changes

* removing met/data/config/README since it is now in met/docs/Users_Guide

* Added some formatting for headers

* fixing chapters & sections

* Fixed warnings from building

* adding in code blocks

* removing slashes

* changes

* Made changes to formatting

* removing For example code blocks

* major updates

* first pass at document conversion complete.

* cleaning up questions about dashes

* Made some formatting modifications

* Removing README_TC because it is being replaced by README_TC.rst in met/docs/Users_Guide

* Removing the reference to the README_TC file

* Making title capitalization consistent with README

* Added a space in timestring

* changing to 'time string' with a space between the words.

* adding a link to the new README_TC location in met/docs/Users_Guide

* Modified references to README and README_TC

Co-authored-by: Julie.Prestopnik <jpresto@ucar.edu>

* Bugfix 1562 develop grid_diag (#1564)

* Per #1562, add the same grid_diag fix for the develop branch.

* Per #1562, removing the poly = CONUS.poly mask from GridDiagConfig_TMP. That setting masked a problem in the handling of missing data. Exercising the mask.poly option is tested in another unit test. This will change the output and break the nightly build, but that's good since we'll do more thorough testing.

* Per #1508, change the verbosity in unit_tc_gen.xml from -v 2 to -v 5 to print out some additional log messages that may help in debugging the intermittent file list failure.

* Removed references data/config/README

* Feature 1528 plot_point_obs (#1560)

* Per #1528, adding default config file for PlotPointObs tool.

* Per #1528, added line_width to config file.

* Per #1528, adding PlotPointObs config file and it actually compiles. Now I need to do all the work.

* Per #1528, making some progress adding Observation objects and storing the unique locations in a set. Next, I need to parse the options from the config file.

* Per #1528, add config constants for plot_point_obs options and update PlotInfo to include an on/off flag and colorbar flag.

* Per #1528, update plot_point_obs config file options and plot_point_obs_conf_info.h/.cc to process them.

* Per #1528, add a couple more config keys for Plot-Point-Obs.

* Per #1528, add an obs_gc array to the default Plot-Point-Obs config file.

* Per #1528, make a few local plotting functions global so that Plot-Point-Obs can call them.

* Per #1528, lots of changes to Plot-Point-Obs. Getting closer. Still need to finish up coding, add tests, and update the documentation.

* Per #1528, I found that PlotInfo.colorbar_spacing was never actually used in the code. So I removed it from the MODE and PlotPointObs config files and removed it from the PlotInfo object.

* Per #1528, add timestring_to_time_t() utility function to vx_cal library.

* Per #1528, cleanup... just removed commented out code.

* Per #1528, update plot_point_obs.cc to parse/process the observation valid time correctly.

* For #1528, change the library order to make the linker happy.

* Per #1528, I had changed message_type to msg_typ but failed to updated the default config file.

* Per #1527, making the usage statement slightly more concise.

* Per #1528, changing the default line_color and fill_color entries to be empty arrays.

* Per #1528, add in support for the convert and data censoring logic.

* Per #1528, update the documentation for plot_point_obs to reflect the new usage.

* 1528 Consider making the plotting options of plot_point_obs more configurable (#1559)

* #1528 Added get_dim_size

* #1528 Added qty_list

* #1528 Get the quality flag string. Get the character dimensions from the variable

* Per #1528, slight reformatting of source code for consistent line lengths.

* Per #1528, setting the default dotsize to 1.0, as it was previously (not 10!).

* Per #1528, fix logic so that if a fill color table is specified fill_point is set to true. Also adjust margins.

* Per #1528, add a new call to plot_point_obs in unit_plot_point_obs.xml to exercise the new config file options.

* Per #1528, only make the margins bigger if we're actually plotting a colorbar. Otherwise, retain the previous margin sizes.

* Update met/data/config/PlotPointObsConfig_default

Co-authored-by: jprestop <jpresto@ucar.edu>

* Update PlotPointObsConfig_default

* Update test/config/PlotPointObsConfig

Co-authored-by: jprestop <jpresto@ucar.edu>

* Consistent formatting

* Per #1528, based on Michelle's feedback, add the -title string option to the plot_point_obs usage statement.

Co-authored-by: John Halley Gotway <johnhg@kiowa.rap.ucar.edu>
Co-authored-by: hsoh-u <hsoh@ucar.edu>
Co-authored-by: jprestop <jpresto@ucar.edu>

* Feature 1568 ncl (#1570)

* Per #1568, add python script to convert NCL *.rgb colormaps to MET *.ctable color tables. Also store 270 new or updated colortables that are the output from this script. Note that some of the existing colortables have changed ever so slightly. That's the result of some imprecise rounding earlier and more precise rounding now.

* Per #1568, update and rerun the rgb2ctable.py conversion script to write a header at the top of each colortable file to list the source NCL colormap.

* Updated README and README_TC references in test/config and test/config/ref_config areas

* Updated reference formatting and modified some versions to be consistent with other section of the User's Guide (#1577)

* Feature 1574 rotlatlon (#1576)

* Feature 1474 README (#1582)

* Changed name of README and README_TC, modified references to those, and cleaned up some formatting.

* Fixed formatting and language

* Update data_io.rst

* Update data_io.rst

* Update config_options_tc.rst

Co-authored-by: johnhg <johnhg@ucar.edu>

* Added tilde files and temp files surrounded by # to .gitignore

* Task 1455 doc (#1585)

* first stab at converting to sphinx

* removing all slashes

* adding new link to README.rst file

* working on lists

* Made formatting changes

* Finished fcst section

* fixing spelling, bolding and italics issues

* updating web links

* working on formatting

* updating formatting

* formatting

* first attempt to clean up formatting completed.

* adding README to the index file

* fixing warning errors

* Bringing README_TC into sphinx.  Updating section headers

* Adding README_TC

* Made formatting updates to README.rst

* corrected section under wavelet

* small changes

* removing met/data/config/README since it is now in met/docs/Users_Guide

* Added some formatting for headers

* fixing chapters & sections

* Fixed warnings from building

* adding in code blocks

* removing slashes

* changes

* Made changes to formatting

* removing For example code blocks

* major updates

* first pass at document conversion complete.

* cleaning up questions about dashes

* Made some formatting modifications

* Removing README_TC because it is being replaced by README_TC.rst in met/docs/Users_Guide

* Removing the reference to the README_TC file

* Making title capitalization consistent with README

* Added a space in timestring

* changing to 'time string' with a space between the words.

* adding a link to the new README_TC location in met/docs/Users_Guide

* Modified references to README and README_TC

* small formatting changes

* small formatting changes

* fixing tabs

* fixing spacing around number 11

* removing parenthesis around reference dates.

* adding parenthesis back in.

* fixing references

* updating references

* Update appendixC.rst

Removed space from "HAUSDOR FF"

* Update plotting.rst

Changed a couple of references of Plot_Point_Obs to Plot-Point-Obs

* Update point-stat.rst

Added oxford commas

Co-authored-by: Julie.Prestopnik <jpresto@ucar.edu>

* Feature 1355 ioda (#1587)

* #1355 Added makefile for ioda2.nc

* #1355 Added ioda2nc

* #1355 Added unit_ioda2nc.xml

* #1355 Added yyyymmddThhmmss_to_unix and is_yyyymmddThhmmss

* #1355 Added yyyymmddThhmmss_to_unix and is_yyyymmddThhmmss

* #1355 Added parse_conf_metadata_map and parse_conf_obs_name_map

* #1355 Added conf_key_obs_name_map, conf_key_metadata_map, and conf_key_missing_thresh

* #1355 Exception handling at get_att_value_chars

* #1355 Initial release

* #1355 Initial release

* #1355 Cleanup

* #1355 Corrected NC_BYTE value

* #1355 Initial release

* #1355 Added IODA2NCConfig_efault

* #1355 Added IODA2NCConfig_default

* #1355 Turn off time_summary

* #1355 Changed missing_thresh +-1e9

* #1355 Terminate string

* #1355 Corrected echo statement

* #1355 Removed unused variable

* #1355 Make the string null terminated

* #1355 Move the IODA2NCConfig_default to above to avoid merge conflict

* #1355 To avoid a merge conflict

* #1355 To avoid a merge conflict

* #1355 To avoid a merge conflict

* #1355 To avoid a merge conflict

* #1355 To avoid a merge conflict

* #1355 To avoid a merge conflict

* #1355 To avoid a merge conflict

* #1355 To avoid a merge conflict

* Per #1355, add .gitignore file for ioda2nc.

* Per #1355, had to add test stub for the new ioda2nc tool to enable 'make test' to run.

* Per #1355, tweak met/scripts/Makefile to NOT ignore the ENABLE_PYTHON configuration option when constructing the list of tests.

* Per #1355, ignore the scripts/ioda2nc file.

Co-authored-by: John Halley Gotway <johnhg@kiowa.rap.ucar.edu>

* Per #1590, change V10.0 to V10.0.0 to make use of X.Y.Z version numbering. (#1591)

* Task_1455_doc (#1595)

* first stab at converting to sphinx

* removing all slashes

* adding new link to README.rst file

* working on lists

* Made formatting changes

* Finished fcst section

* fixing spelling, bolding and italics issues

* updating web links

* working on formatting

* updating formatting

* formatting

* first attempt to clean up formatting completed.

* adding README to the index file

* fixing warning errors

* Bringing README_TC into sphinx.  Updating section headers

* Adding README_TC

* Made formatting updates to README.rst

* corrected section under wavelet

* small changes

* removing met/data/config/README since it is now in met/docs/Users_Guide

* Added some formatting for headers

* fixing chapters & sections

* Fixed warnings from building

* adding in code blocks

* removing slashes

* changes

* Made changes to formatting

* removing For example code blocks

* major updates

* first pass at document conversion complete.

* cleaning up questions about dashes

* Made some formatting modifications

* Removing README_TC because it is being replaced by README_TC.rst in met/docs/Users_Guide

* Removing the reference to the README_TC file

* Making title capitalization consistent with README

* Added a space in timestring

* changing to 'time string' with a space between the words.

* adding a link to the new README_TC location in met/docs/Users_Guide

* Modified references to README and README_TC

* small formatting changes

* small formatting changes

* fixing tabs

* fixing spacing around number 11

* removing parenthesis around reference dates.

* adding parenthesis back in.

* fixing references

* updating references

* Update appendixC.rst

Removed space from "HAUSDOR FF"

* Update plotting.rst

Changed a couple of references of Plot_Point_Obs to Plot-Point-Obs

* Update point-stat.rst

Added oxford commas

* bolding Config and italicizing directory names.

* Modified format

* italicizing directories.

* removed extra tool

* italicizing directories

* bolding

* Update plotting.rst

Changed Plot_Point_Obs to Plot-Point-Obs

Co-authored-by: Julie.Prestopnik <jpresto@ucar.edu>

* Per #1598, update comment in all the MET config files. (#1599)

* Feature 1494 ens_stat (#1600)

Merging into the develop branch to get these changes included for the met-10.0.0-beta2 development release. These are good and worthwhile changes to make on their own. However, I do still need @j-opatz to confirm that they actually fix the original issue with python embedding in ensemble-stat!

* Per #1494, add DataPlane::is_all_bad_data() function to determine whether the field is all bad data.

* Per #1494, update the process_data_plane() function in 2 ways. First, change its return from void to bool. And simply return false, if the raw input data is all bad data. Second, print debug(4) log messages to list out the range of valid data and timing information. It'll be really nice to make this consistent across all the input file types.

* Per #1494, update the logic for all the calls to process_data_plane() to account for the fact that it now returns a bool instead of void.

* Per #1494, refine the Data plane debug(4) log messages.

* Per #1494, work on log message.

* Per #1494, updated vx_data2d_python library to call the common process_data_plane() function to apply data shifting, censoring, conversion, and check for all missing data.

* Per #1494, just fixing a small typo in an unrelated source code file that I ran across when testing for met-help.

Co-authored-by: lisagoodrich <33230218+lisagoodrich@users.noreply.github.com>
Co-authored-by: Julie.Prestopnik <jpresto@ucar.edu>
Co-authored-by: John Halley Gotway <johnhg@kiowa.rap.ucar.edu>
Co-authored-by: hsoh-u <hsoh@ucar.edu>
3 participants