Update develop-ref after #2877 (#2881)

* 2673 Moved dvariable declaration after include * #2673 Move down namespace below include * Feature #2395 wdir (#2820) * Per #2395, add new columns to VL1L2, VAL1L2, and VCNT line types for wind direction statistics. Work still in progress. * Per #2395, write the new VCNT columns to the output and document the additions to the VL1L2, VAL1L2, and VCNT columns. * Per #2395, add the definition of new statistics to Appendix G. * Per #2395, update file version history. * Per #2395, tweak warning message about zero wind vectors and update grid-stat and point-stat to log calls to the do_vl1l2() function. * Per #2395, refine the weights for wind direction stats, ignoring the undefined directions. * Update src/tools/core/stat_analysis/aggr_stat_line.cc * Update src/tools/core/stat_analysis/parse_stat_line.cc * Update src/tools/core/stat_analysis/aggr_stat_line.cc * Recent changes to branch protection rules for the develop branch have broken the logic of the update_truth.yml GHA workflow. Instead of submitting a PR to merge develop into develop-ref directly, use an intermediate update_truth_for_develop branch. * Feature #2280 ens_prob (#2823) * Per #2280, update to support probability threshold strings like ==8, where 8 is the number of ensemble members, to create probability bins centered on the n/8 for n = 0 ... 8. * Per #2280, update docs about probability threshold settings. * Per #2280, use a loose tolerance when checking for consistent bin widths. * Per #2280, add a new unit test for grid_stat to demonstrate processing the output from gen_ens_prod. * Per #2280, when verifying NMEP probability forecasts, smooth the obs data first. * Per #2280, only request STAT output for the PCT line type to match unit_grid_stat.xml and minimize the new output files. * Per #2280, update config option docs. * Per #2280, update config option docs. * #2673 Change 0 to nullptr * #2673 Change 0 to nullptr * #2673 Change 0 to nullptr * #2673 Change 0 to nullptr * #2673 Change 0 to nullptr * #2673 Removed the redundant parentheses with return * #2673 Removed the redundant parentheses with return * #2673 Removed the redundant parentheses with return * #2673 Removed the redundant parentheses with return * #2673 Removed the redundant parentheses with return * #2673 restored return statement * #2673 Added std namespace * #2673 Moved down 'using namespace' statement. Removed trailing spaces * #2673 Moved down 'using namespace' statement. * #2673 Moved down 'using namespace' statement. * #2673 Moved down 'using namespace' statement. * #2673 Moved down 'using namespace' statement. * #2673 Added std namespace * #2673 Added std namespace * #2673 Added std namespace * #2673 Changed literal 1 to boolean value, true * Feature #2673 enum_to_string (#2835) * Feature #2583 ecnt (#2825) * Unrelated to #2583, fix typo in code comments. * Per #2583, add hooks write 3 new ECNT columns for observation error data. * Per #2583, make error messages about mis-matched array lengths more informative. * Per #2583, switch to more concise variable naming conventions of ign_oerr_cnv, ign_oerr_cor, and dawid_seb. * Per #2583, fix typo to enable compilation * Per #2583, define the 5 new ECNT column names. * Per #2583, add 5 new columns to the ECNT table in the Ensemble-Stat chapter * Per #2583, update stat_columns.cc to write these 5 new ECNT columns * Per #2583, update ECNTInfo class to compute the 5 new ECNT statistics. * Per #2583, update stat-analysis to parse the 5 new ECNT columns. * Per #2583, update aggregate_stat logic for 5 new ECNT columns. * Per #2583, update PairDataEnsemble logic for 5 new ECNT columns * Per #2583, update vx_statistics library with obs_error handling logic for the 5 new ECNT columns * Per #2583, changes to make it compile * Per #2583, changes to make it compile * Per #2583, switch to a consistent ECNT column naming convention with OERR at the end. Using IGN_CONV_OERR and IGN_CORR_OERR. * Per #2583, define ObsErrorEntry::variance() with a call to the dist_var() utility function. * Per #2583, update PairDataEnsemble::compute_pair_vals() to compute the 5 new stats with the correct inputs. * Per #2583, add DEBUG(10) log messages about computing these new stats. * Per #2583, update Stat-Analysis to compute these 5 new stats from the ORANK line type. * Per #2583, whitespace and comments. * Per #2583, update the User's Guide. * Per #2583, remove the DS_ADD_OERR and DS_MULT_OERR ECNT columns and rename DS_OERR as DSS, since observation error is not actually involved in its computation. * Per #2583, minor update to Appendix C * Per #2583, rename ECNT line type statistic DSS to IDSS. * Per #2583, fix a couple of typos * Per #2583, more error checking. * Per #2583, remove the ECNT IDSS column since its just 2*pi*IGN, the existing ignorance score, and only provides meaningful information when combined with the other Dawid-Sebastiani statistics that have already been removed. * Per #2583, add Eric's documentation of these new stats to Appendix C. Along the way, update the DOI links in the references based on this APA style guide: https://apastyle.apa.org/style-grammar-guidelines/references/dois-urls#:~:text=Include%20a%20DOI%20for%20all,URL%2C%20include%20only%20the%20DOI. * Per #2583, fix new equations with embedded underscores for PDF by defining both html and pdf formatting options. * Per #2583, update the ign_conv_oerr equation to include a 2 *pi multiplier for consistency with the existing ignorance score. Also, fix the documented equations. * Per #2583, remove log file that was inadvertently added on this branch. * Per #2583, simplify ObsErrorEntry::variance() implementation. For the distribution type of NONE, return a variance of 0.0 rather than bad data, as discussed with @michelleharrold and @JeffBeck-NOAA on 3/8/2024. --------- Co-authored-by: MET Tools Test Account <met_test@seneca.rap.ucar.edu> * Revert #2825 since more documentation and testing is needed (#2837) This reverts commit 108a895. * Feature #2583 ecnt fix IGN_OERR_CORR (#2838) * Unrelated to #2583, fix typo in code comments. * Per #2583, add hooks write 3 new ECNT columns for observation error data. * Per #2583, make error messages about mis-matched array lengths more informative. * Per #2583, switch to more concise variable naming conventions of ign_oerr_cnv, ign_oerr_cor, and dawid_seb. * Per #2583, fix typo to enable compilation * Per #2583, define the 5 new ECNT column names. * Per #2583, add 5 new columns to the ECNT table in the Ensemble-Stat chapter * Per #2583, update stat_columns.cc to write these 5 new ECNT columns * Per #2583, update ECNTInfo class to compute the 5 new ECNT statistics. * Per #2583, update stat-analysis to parse the 5 new ECNT columns. * Per #2583, update aggregate_stat logic for 5 new ECNT columns. * Per #2583, update PairDataEnsemble logic for 5 new ECNT columns * Per #2583, update vx_statistics library with obs_error handling logic for the 5 new ECNT columns * Per #2583, changes to make it compile * Per #2583, changes to make it compile * Per #2583, switch to a consistent ECNT column naming convention with OERR at the end. Using IGN_CONV_OERR and IGN_CORR_OERR. * Per #2583, define ObsErrorEntry::variance() with a call to the dist_var() utility function. * Per #2583, update PairDataEnsemble::compute_pair_vals() to compute the 5 new stats with the correct inputs. * Per #2583, add DEBUG(10) log messages about computing these new stats. * Per #2583, update Stat-Analysis to compute these 5 new stats from the ORANK line type. * Per #2583, whitespace and comments. * Per #2583, update the User's Guide. * Per #2583, remove the DS_ADD_OERR and DS_MULT_OERR ECNT columns and rename DS_OERR as DSS, since observation error is not actually involved in its computation. * Per #2583, minor update to Appendix C * Per #2583, rename ECNT line type statistic DSS to IDSS. * Per #2583, fix a couple of typos * Per #2583, more error checking. * Per #2583, remove the ECNT IDSS column since its just 2*pi*IGN, the existing ignorance score, and only provides meaningful information when combined with the other Dawid-Sebastiani statistics that have already been removed. * Per #2583, add Eric's documentation of these new stats to Appendix C. Along the way, update the DOI links in the references based on this APA style guide: https://apastyle.apa.org/style-grammar-guidelines/references/dois-urls#:~:text=Include%20a%20DOI%20for%20all,URL%2C%20include%20only%20the%20DOI. * Per #2583, fix new equations with embedded underscores for PDF by defining both html and pdf formatting options. * Per #2583, update the ign_conv_oerr equation to include a 2 *pi multiplier for consistency with the existing ignorance score. Also, fix the documented equations. * Per #2583, remove log file that was inadvertently added on this branch. * Per #2583, simplify ObsErrorEntry::variance() implementation. For the distribution type of NONE, return a variance of 0.0 rather than bad data, as discussed with @michelleharrold and @JeffBeck-NOAA on 3/8/2024. * Per #2583, updates to ensemble-stat.rst recommended by @michelleharrold and @JeffBeck-NOAA. * Per #2583, implement changes to the IGN_CORR_OERR corrected as directed by @ericgilleland. --------- Co-authored-by: MET Tools Test Account <met_test@seneca.rap.ucar.edu> * Update the pull request template to include a question about expected impacts to existing METplus Use Cases. * #2830 Changed enum Builtin to enum class * #2830 Converted enum to enum class at config_constants.h * Feature #2830 bootstrap enum (#2843) * Bugfix #2833 develop azimuth (#2840) * Per #2833, fix n-1 bug when defining the azimuth delta for range/azimuth grids. * Per #2833, when definng TcrmwData:range_max_km, divide by n_range - 1 since the range values start at 0. * Per #2833, remove max_range_km from the TC-RMW config file. Set the default rmw_scale to NA so that its not used by default. And update the documentation. Still actually need to make the logic of the code work as it should. * Per #2833, update tc_rmw to define the range as either a function of rmw or using explicit spacing in km. * Per #2833, update the TCRMW Config files to remove the max_range_km entry, and update the unit test for one call to use RMW ranges and the other to use ranges defined in kilometers. * Per #2833, just correct code comments. * Per #2833, divide by n - 1 when computing the range delta, rather than n. * Per #2833, correct the handling of the maximum range in the tc-rmw tool. For fixed delta km, need to define the max range when setting up the grid at the beginning. --------- Co-authored-by: MET Tools Test Account <met_test@seneca.rap.ucar.edu> * #2830 Changed enum PadSize to enum class * #2830 Removed redundant parantheses * #2830 Removed commenyted out code * #2830 Use auto * #2830 Changed enum to enum class for DistType, InterpMthd, GridTemplates, and NormalizeType * #2830 Moved enum_class_as_integer from header file to cc files * #2830 Added enum_as_int.hpp * #2830 Added enum_as_int.hpp * Deleted enum_class_as_integer and renamed it to enum_class_as_int * Removed redundant paranthese * #2830 Changed enum to enumclass * #2830 Changed enum_class_as_integer to enum_class_as_int * Feature #2379 sonarqube gha (#2847) * Per #2379, testing initial GHA SonarQube setup. * Per #2379, switch to only analyzing the src directory. * Per #2379, move more config logic from sonar-project.properties into the workflow. #ci-skip-all * Per #2379, try removing + symbols * Per #2379, move projectKey into xml workflow and remove sonar-project.properties. * Per #2379, try following the instructions at https://github.com/sonarsource-cfamily-examples/linux-autotools-gh-actions-sq/blob/main/.github/workflows/build.yml ci-skip-all * Per #2379, see details of progress described in this issue comment: #2379 (comment) * Unrelated to #2379, just removing spurious space that gets flagged as a diff when re-running enum_to_string on seneca. * Per #2379, try running SonarQube through GitHub. * Per #2379, remove empty env section and also disable the testing workflow temporarily during sonarqube development. * Per #2379, fix docker image name. * Per #2379, delete unneeded script. * Per #2379, update GHA to scan Python code and push to the correct SonarQube projects. * Per #2379, update GHA SonarQube project names * Per #2379, update the build job name * Per #2379, update the comile step name * Per #2379, switch to consistent SONAR variable names. * Per #2379, fix type in sed expressions. * Per #2379, just rename the log artifact * Per #2379, use time_command wrapper instead of run_command. * Per #2379, fix bad env var name * Per #2379, switch from egrep to grep. * Per #2379, just try cat-ting the logfile * Per #2379, test whether cat-ting the log file actually works. * Per #2379, revert back * Per #2379, mention SonarQube in the PR template. Make workflow name more succinct. * Per #2379, add SONAR_REFERENCE_BRANCH setting to define the sonar.newCode.referenceBranch property. The goal is to define the comparison reference branch for each SonarQube scan. * Per #2379, have the sonarqube.yml job print the reference branch it's using * Per #2379, intentionally introduce a new code smell to see if SonarQube correctly flag it as appearing in new code. * Per #2379, trying adding the SonarQube quality gate check. * Per #2379, add logic for using the report-task.txt output files to check the quality gate status for both the python and cxx scans. * Per #2379 must use unique GHA id's * Per #2379, working on syntax for quality gate checks * Per #2379, try again. * Per #2379, try again * Per #2379, try again * Per #2379, try again * Per #2379, try again * Per #2379, try again * Per #2379, try yet again * Per #2379 * Per #2379, add more debug * Per #2379, remove -it option from docker run commands * Per #2379, again * Per #2379, now that the scan works as expected, remove the intentional SonarQube code smell as well as debug logging. * Hotfix related to #2379. The sonar.newCode.referenceBranch and sonar.branch.name cannot be set to the same string! Only add the newCode definition when they differ. * #2830 Changed enum STATJobType to enum class * #2830 Changed STATLineType to enum class * #2830 Changed Action to enum class * #2830 Changed ModeDataType to enum class * #2830 Changed StepCase to enum class * #2830 Changed enum to enum class * #2830 Changed GenesisPairCategory to enum class * #2830 Removed rediundabt parenrthese * #2830 Reduced same if checking * #2830 Cleanup * #2830 USe empty() instead of lebgth checking * #2830 Adjusted indentations * Feature #2379 develop sonarqube updates (#2850) * Per #2379, move rgb2ctable.py into the python utility scripts directory for better organization and to enable convenient SonarQube scanning. * Per #2379, remove point.py from the vx_python3_utils directory which cleary was inadvertenlty added during development 4 years ago. As far as I can tell it isn't being called by any other code and doesn't belong in the repository. Note that scripts/python/met/point.py has the same name but is entirely different. * Per #2379, update the GHA SonarQube scan to do a single one with Python and C++ combined. The nightly build script is still doing 2 separate scans for now. If this all works well, they could also be combined into a single one. * Per #2379, eliminate MET_CONFIG_OPTIONS from the SonarQube workflow since it doesn't need to be and probably shouldn't be configurable. * Per #2379, trying to copy report-task.txt out of the image * Per #2379, update build_met_sonarqube.sh to check the scan return status * Per #2379, fix bash assignment syntax * Per #2379, remove unused SCRIPT_DIR envvar * Per #2379, switch to a single SonarQube scan for MET's nightly build as well * Feature 2654 ascii2nc polar buoy support (#2846) * Added iabp data type, and modified file_handler to filter based on time range, which was added as a command line option * handle time using input year, hour, min, and doy * cleanup and switch to position day of year for time computations * Added an ascii2nc unit test for iabp data * Added utility scripts to pull iabp data from the web and find files in a time range * Modified iabp_handler to always output a placeholder 'location' observation with value 1 * added description of IABP data python utility scripts * Fixed syntax error * Fixed Another syntax error. * Slight reformat of documentation * Per #2654, update the Makefiles in scripts/python/utility to include all the python scripts that should be installed. * Per #2654, remove unused code from get_iabp_from_web.py that is getting flagged as a bug by SonarQube. * Per #2654, fix typo in docs --------- Co-authored-by: John Halley Gotway <johnhg@ucar.edu> Co-authored-by: MET Tools Test Account <met_test@seneca.rap.ucar.edu> * Feature #2786 rpss_from_prob (#2861) * Per #2786, small change to a an error message unrelated to this development. * Per #2786, add RPSInfo::set_climo_prob() function to derive the RPS line type from climatology probability bins. And update Ensemble-Stat to call it. * Per #2786, minor change to clarify error log message. * Per #2786, for is_prob = TRUE input, the RPS line type is the only output option. Still need to update docs! * Per #2786, add new call to Ensemble-Stat to test computing RPS from climo probabilities * Per #2786, use name rps_climo_bin_prob to be very explicit. * Per #2786, redefine logic of RPSInfo::set_climo_bin_prob() to match the CPC definition. Note that reliability, resolution, uncertainty, and RPSS based on the sample climatology are all set to bad data. Need to investigate whether they can be computed using these inputs. * Per #2786, remove the requirement that any fcst.prob_cat_thresh thresholds must be defined. If they are defined, pass them through to the FCST_THRESH output column. If not, write NA. Add check to make sure the event occurs in exactly 1 category. * Per #2786, don't enforce fcst.prob_cat_thresh == obs.prob_cat_thresh for probabilistic inputs. And add more is_prob checks so that only the RPS line type can be written when given probabilistic inputs. * updated documentation * Per #2786, call rescale_probability() function to convert from 0-100 probs to 0-1 probs. --------- Co-authored-by: j-opatz <jopatz@ucar.edu> * Feature #2862 v12.0.0-beta4 (#2864) * Feature #2379 develop single_sq_project (#2865) * Hotfix to the documentation in the develop branch. Issue #2858 was closed as a duplicate of #2857. I had included it in the MET-12.0.0-beta4 release notes, but the work is not yet actually complete. * Feature 2842 ugrid config (#2852) * #2842 Removed UGrid related setting * #2842 Corrected vertical level for data_plane_array * #2842 Do not allow the time range * #2842 The UGridConfig file can be passed as ugrid_dataset * #2842 Changed -config option to -ugrid_config * #2842 Deleted UGrid configurations * 2842 Fix a compile error when UGrid is disabled * #2842 Cleanup * #2842 Added an unittest point_stat_ugrid_mpas_config * #2842 Added a PointStatConfig without UGrid dataset. * #2842 Corrected ty[po at the variable name * Switched from time_centered to time_instant. I think time_centered is the center of the forecast lead window and time_instant is the time the forecast is valid (end of forecast window). * #2842 Removed ugrid_max_distance_km and unused metadata names * #2842 Restored time variable time_instant for LFric * #2842 Adjust lon between -180 and 180 * #2842 Adjust lon between -180 and 180 * #2842 Adjust lon between -180 and 180 * #2842 Adjusted lon to between -180 to 180 * #2842 Changed variable names * Per #2842, switch from degrees east to west right when the longitudes are read. * #2842, switch from degrees east to west right when the longitudes are read * #2842 Cleanup debug messages --------- Co-authored-by: Howard Soh <hsoh@seneca.rap.ucar.edu> Co-authored-by: Daniel Adriaansen <dadriaan@ucar.edu> Co-authored-by: John Halley Gotway <johnhg@ucar.edu> * Feature 2753 comp script config (#2868) * set dynamic library file extension to .dylib if running on MacOS and .so otherwise * Added disabling of jasper documentation for compiliation on Hera * Updated * remove extra export of compiler env vars * include full path to log file so it is easier to file the log file to examine when a command fails * send cmake output to a log file * remove redundant semi-colon * use full path to log file so it is easier to examine on failure * use run_cmd to catch if rm command fails * Modifications for compilation on hera, gaea, and orion * Updating * fixed variable name * clean up if/else statements * set TIFF_LIBRARY_RELEASE argument to use full path to dynamic library file to prevent failure installing proj library * set LDFLAGS so that LDFLAGS value set in the user's environment will also be used * Updated based on gaea, orion, and hera installs * Updated * change extension of dynamic library files only if architecture is arm64 because older Macs still use .so * added netcdf library to args to prevent error installing NetCDF-CXX when PROJ has been installed in the same run of the script -- PATH is set in the COMPILE_PROJ if block that causes this flag from being added automatically * clean up how rpath and -L are added to LDFLAGS so that each entry is separate -- prevents errors installing on Mac arm64 because multiple rpath values aren't read using :. Also use MET_PROJLIB * Updated * removed -ltiff from MET libs * only add path to rpath and -L arguments if they are not already included in LDFLAGS * changed from using LIB_TIFF (full path to tiff lib file) to use TIFF_LIB_DIR (dir containing tiff lib file). Added TIFF_INCLUDE_DIR to proj compilation and -DJAS_ENABLE_DOC to jasper compliation taken from @jprestop branch * update comments * ensure all MET_* and MET_*LIB variables are added to the rpath for consistency * remove unnecessary if block and only export LDFLAGS at the end of setting locally * Updated * Added section for adding <VALUE>/lib64 and rearranged placement of ADDTL_DIR * Commenting out the running of the Jasper lib tests * Updating and/or removing files * Updating and/or removing files * Latest udpates which include the addition of the tiff library for proj * Remove commented out line. Co-authored-by: John Halley Gotway <johnhg@ucar.edu> * Make indentation consistent. Co-authored-by: John Halley Gotway <johnhg@ucar.edu> * Make indentation consistent. Co-authored-by: John Halley Gotway <johnhg@ucar.edu> * Make indentation consistent. Co-authored-by: John Halley Gotway <johnhg@ucar.edu> * Per 2753, added -lm to configure_lib_args for NetCDF-CXX * Per #2753 updating acorn files * Per #2753, update wcoss2 files * Per #2753, updating acorn file to include MET_PYTHON_EXE * Per #2753, updated files for 12.0.0 for derecho * Per #2753, updated derecho file adding MET_PYTHON_EXE and made corrections * Updating config files * Updating orion files * Updates for gaea's files * Updating gaea modulefile * Removing modulefile for cheyenne * Added MET_PYTHON_EXE * Added MET_PYTHON_EXE to hera too * Adding file for hercules * Removing equals sign from setenv * Adding file for hercules * Updated script to add libjpeg installation for grib2c * Per #2753, Adding file for casper --------- Co-authored-by: George McCabe <23407799+georgemccabe@users.noreply.github.com> Co-authored-by: John Halley Gotway <johnhg@ucar.edu> * Feature #2795 level_mismatch_warning (#2873) * Per #2795, move the warning message about level mismatch from the config validation step to when the forecast files are being processed. Only check this when the number of forecast fields is greater than 1, but no longer limit the check to pressure levels only. * Per #2795, add comments * Whitespace * Per #2795, port level mismatch fix over to Ensemble-Stat. Check it for each verification task, but only print it once for each task, rather than once for each task * ensemble member. * Feature #2870 removing_MISSING_warning (#2872) * Per #2870, define utility functions for parsing the file type from a file list and for logging missing files, checking for the MISSING keyword. Also, update Ensemble-Stat and Gen-Ens-Prod to call these functions. * Per #2870, update the gen_ens_prod tests to demonstrate the use of the MISSING keyword for missing files. METplus uses this keyword for Ensemble-Stat and Gen-Ens-Prod. * Feature 2842 ugrid config (#2875) * #2842 Removed UGrid related setting * #2842 Corrected vertical level for data_plane_array * #2842 Do not allow the time range * #2842 The UGridConfig file can be passed as ugrid_dataset * #2842 Changed -config option to -ugrid_config * #2842 Deleted UGrid configurations * 2842 Fix a compile error when UGrid is disabled * #2842 Cleanup * #2842 Added an unittest point_stat_ugrid_mpas_config * #2842 Added a PointStatConfig without UGrid dataset. * #2842 Corrected ty[po at the variable name * Switched from time_centered to time_instant. I think time_centered is the center of the forecast lead window and time_instant is the time the forecast is valid (end of forecast window). * #2842 Removed ugrid_max_distance_km and unused metadata names * #2842 Restored time variable time_instant for LFric * #2842 Adjust lon between -180 and 180 * #2842 Adjust lon between -180 and 180 * #2842 Adjust lon between -180 and 180 * #2842 Adjusted lon to between -180 to 180 * #2842 Changed variable names * Per #2842, switch from degrees east to west right when the longitudes are read. * #2842, switch from degrees east to west right when the longitudes are read * #2842 Cleanup debug messages * #2842 Disabled output types except STAT for sl1l2 * #2842 Disabled output types except STAT for sl1l2 and MPR * #2842 Reduced output files for UGrid --------- Co-authored-by: Howard Soh <hsoh@seneca.rap.ucar.edu> Co-authored-by: Daniel Adriaansen <dadriaan@ucar.edu> Co-authored-by: John Halley Gotway <johnhg@ucar.edu> * Hotfix to develop branch to remove duplicate test named 'point_stat_ugrid_mpas_config'. That was causing unit_ugrid.xml to fail because it was still looking for .txt output files that are no longer being generated. * Feature 2748 document ugrid (#2869) * Initial documentation of the UGRID capability. * Fixes error in references, adds appendix to index, and adds sub-section for configuration entries and a table for metadata map items. * Corrects LFRic, rewords section on UGRID conventions, updates description of using GridStat, and removes mention of nodes. * Forgot one more mention of UGRID conventions. * Incorporates more suggestions from @willmayfield. * Switches to numerical table reference. * Feature #2781 Convert MET NetCDF point obs to Pandas DataFrame (#2877) * Per #2781, added function to convert MET NetCDF point observation data to pandas so it can be read and modified in a python embedding script. Added example python embedding script * ignore python cache files * fixed function call * reduce cognitive complexity to satisfy SonarQube and add boolean return value to catch if function fails to read data * clean up script and add comments * replace call to object function that doesn't exist, handle exception when file passed to script cannot be read by the NetCDF library * rename example script * add new example script to makefiles * fix logic to build pandas DataFrame to properly get header information from observation header IDs * Per #2781, add unit test to demonstrate python embedding script that reads MET NetCDF point observation file and converts it to a pandas DataFrame * Per #2781, added init function for nc_point_obs to take an input filename. Also raise TypeError exception from nc_point_obs.read_data() if input file cannot be read * call parent class init function to properly initialize nc_point_obs --------- Co-authored-by: Howard Soh <hsoh@seneca.rap.ucar.edu> Co-authored-by: John Halley Gotway <johnhg@ucar.edu> Co-authored-by: Howard Soh <hsoh@ucar.edu> Co-authored-by: MET Tools Test Account <met_test@seneca.rap.ucar.edu> Co-authored-by: davidalbo <dave@ucar.edu> Co-authored-by: j-opatz <jopatz@ucar.edu> Co-authored-by: Daniel Adriaansen <dadriaan@ucar.edu> Co-authored-by: Julie Prestopnik <jpresto@ucar.edu> Co-authored-by: George McCabe <23407799+georgemccabe@users.noreply.github.com> Co-authored-by: metplus-bot <97135045+metplus-bot@users.noreply.github.com>
dtcenter · May 8, 2024 · ca7ca9f · ca7ca9f
1 parent fb4d8f6
commit ca7ca9f
Show file tree

Hide file tree

Showing 13 changed files with 276 additions and 68 deletions.
diff --git a/.gitignore b/.gitignore
@@ -19,3 +19,5 @@ make.log
 make_install.log
 .idea
 cmake-build-debug
+
+__pycache__
diff --git a/docs/Users_Guide/appendixH.rst b/docs/Users_Guide/appendixH.rst
@@ -0,0 +1,87 @@
+.. _appendixH:
+
+*****************************
+Appendix H Unstructured Grids
+*****************************
+
+Introduction
+============
+
+METv12.0.0+ includes limited support for using datasets on unstructured grids (UGRIDs) in the PointStat and GridStat tools. This support includes verifying UGRID forecasts against point observations (PointStat), and verifying UGRID forecasts regridded to a structured grid against gridded observations on a structured grid (GridStat). The implementation of UGRID support in METv12.0.0+ provides a mechanism for a user to describe the topology of their unstructured grid. Thus far, development to support datasets on unstructured grids has been driven by the LFRic (“elfrick”) NWP model (:ref:`Adams et al. 2019 <Adams-2019>`), and the Model for Prediction Across Scales (MPAS) NWP model (:ref:`Skamarock et al., 2012 <Skamarock-2012>`). However, the support for unstructured grids was implemented in such a way that additional unstructured grids can be utilized, with the appropriate mapping of the user's topology to the elements that MET requires.
+
+Unstructured Grid Files
+=======================
+
+Support for UGRID files in NetCDF format is provided. MET will attempt to auto-detect UGRID files. If the file is CF-compliant and the “Conventions” global attribute is “UGRID” or “MPAS”, or if the global attribute “mesh_spec” is present, MET recognizes the file as a UGRID file. If the user is not using a supported UGRID, or a UGRID file that does not contain the required global attributes for autodetection, the user should set the **file_type** configuration entry to **NETCDF_UGRID**. In some cases, the UGRID topology is provided in an ancillary file separate from forecast variables. This is controlled via the **ugrid_coordinates_file** configuration entry. This file is checked first, and then the UGRID data file. If the same elements are found in both files, the information from the data file takes precedence, and thus it is recommended to leave **ugrid_coordinates_file** unset if all required elements are found in the data file. If both a data and coordinates file are provided, the shape of the latitude and longitude dimensions are cross-referenced as a limited enforcement measure that the coordinates file and data file belong to the same UGRID.
+
+Required Unstructured Grid Metadata
+===================================
+
+To correctly parse the UGRID topology, MET needs the following information that must be contained within the UGRID coordinates file or the UGRID data file. It is often the case that the variable names for these elements differ between UGRIDs. If the user is not using one of the supported UGRIDs, they will need to define the **ugrid_metadata_map** in a custom configuration file that they provide on the command line using the **-ugrid_config** command line argument, which tells MET the variable names that correspond to the following elements:
+
+.. _table_ugrid_metadata:
+
+.. list-table:: Required UGRID Metadata
+  :widths: auto
+  :header-rows: 1
+
+  * - Metadata Map Key
+    - Description
+    - Required
+  * - dim_face
+    - Dimension name for UGRID cells
+    - Required
+  * - lat_face
+    - Coordinate variable name for the latitude of each UGRID cell
+    - Required
+  * - lon_face
+    - Coordinate variable name for the longitude of each UGRID cell
+    - Required
+  * - dim_time
+    - Dimension name for the time coordinate
+    - Required
+  * - time
+    - Coordinate variable for time
+    - Required
+  * - dim_vert
+    - Dimension name for the vertical coordinate, if present
+    - Optional
+  * - vert_face
+    - Coordinate variable name for the vertical coordinate
+    - Optional
+
+Unstructured Grid Configuration Entries
+=======================================
+
+In the PointStat and GridStat config files, a user can control aspects of the UGRID capability. Since the UGRID support in MET is optional, these items are not included in the default MET config files. If the user requires modification for any of the following, they must add them to their MET config file.
+
+ugrid_dataset
+-------------
+
+If the UGRID dataset is supported by MET, then this item is set to the string identifying the UGRID. Currently supported ugrid_dataset options include “lfric”, and “mpas”. If this item is not set, the user must provide a separate UGRID configuration file on the command line using the **-ugrid_config** command line argument.
+
+ugrid_max_distance_km
+---------------------
+
+For PointStat, this is the distance from each UGRID cell center that PointStat will search outward to collect observations to use for verification. The default is 0 km, which by default will use the closest observation to the UGRID cell for verification. For GridStat, this is the distance from each forecast grid point to search for observation grid cells to include when interpolating to the forecast grid point locations. Currently, only the nearest point observation or gridded observation cell is used. See `Unstructured Grid Limitations`_ for additional details about interpolation when using GridStat.
+
+ugrid_coordinates_file
+----------------------
+
+The absolute path to the ancillary NetCDF file that describes the UGRID topology. This file is checked for the required UGRID metadata first, and the data file is checked second and takes precedence over this file if the same elements are found in both files.
+
+ugrid_metadata_map
+------------------
+
+The mapping dictionary which allows a user to specify the variable names of the required UGRID topology elements that are required by MET. For "lfric" and "mpas", **ugrid_metadata_map** comes pre-configured with MET and a user can simply set **ugrid_dataset**. Otherwise, the user must create a separate UGRID configuration file containing the **ugrid_metadata_map** containing the elements from :numref:`table_ugrid_metadata` and provide it on the command line using the **-ugrid_config** command line argument.
+
+Unstructured Grid Limitations
+=============================
+
+We anticipate expanding the UGRID capabilities in MET in the near future. Until then, users are encouraged to be mindful of the following limitations:
+
+1. In GridStat, there is no support for verifying directly on a UGRID. The UGRID (either forecast, obs, or both) must be regridded to a structured verification grid prior to verification being performed. In most cases, gridded observations are on a structured grid and so the users are encouraged to regrid their UGRID forecast to the structured observation grid. If the user wishes to verify a UGRID observation against a UGRID forecast (for example, a UGRID model analysis), the user must define a custom verification grid tailored to their UGRID characteristics, being mindful that the regridding can be slow for large verification grids. The user is referred to :ref:`appendixB` for guidance on defining a structured verification grid.
+
+2. Data at cell edges are currently not supported, only those variables which have data at the cell centers are supported. Users should note in particular that wind components that are typically derived using data at cell edges are currently unsupported.
+
+3. No aggregation methods of point observations within the **ugrid_max_distance_km** are supported except NEAREST, and no aggregation methods of gridded observations within the **ugrid_max_distance_km** are supported except NEAREST.
diff --git a/docs/Users_Guide/config_options.rst b/docs/Users_Guide/config_options.rst
@@ -1047,6 +1047,8 @@ to be verified. This dictionary may include the following entries:
                                   single model level.
        file_type = NETCDF_NCCF;   NetCDF following the Climate Forecast
                                   (CF) convention.
+       file_type = NETCDF_UGRID;  NetCDF containing data on an
+                                  unstructured grid.
        file_type = PYTHON_NUMPY;  Run a Python script to load data into
                                   a NumPy array.
        file_type = PYTHON_XARRAY; Run a Python script to load data into

diff --git a/docs/Users_Guide/grid-stat.rst b/docs/Users_Guide/grid-stat.rst
@@ -172,6 +172,7 @@ The usage statement for the Grid-Stat tool is listed below:
          fcst_file
          obs_file
          config_file
+         [-ugrid_config config_file]
          [-outdir path]
          [-log file]
          [-v level]
@@ -191,13 +192,15 @@ Required Arguments for grid_stat
 Optional Arguments for grid_stat
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-4. The **-outdir path** indicates the directory where output files should be written.
+4. The **-ugrid_config** option provides a way for a user to provide a separate config file with metadata about their UGRID.
 
-5. The **-log file** option directs output and errors to the specified log file. All messages will be written to that file as well as standard out and error. Thus, users can save the messages without having to redirect the output on the command line. The default behavior is no log file.
+5. The **-outdir path** indicates the directory where output files should be written.
 
-6. The **-v level** option indicates the desired level of verbosity. The contents of "level" will override the default setting of 2. Setting the verbosity to 0 will make the tool run with no log messages, while increasing the verbosity above 1 will increase the amount of logging.
+6. The **-log file** option directs output and errors to the specified log file. All messages will be written to that file as well as standard out and error. Thus, users can save the messages without having to redirect the output on the command line. The default behavior is no log file.
 
-7. The **-compress level** option indicates the desired level of compression (deflate level) for NetCDF variables. The valid level is between 0 and 9. The value of "level" will override the default setting of 0 from the configuration file or the environment variable MET_NC_COMPRESS. Setting the compression level to 0 will make no compression for the NetCDF output. Lower number is for fast compression and higher number is for better compression.
+7. The **-v level** option indicates the desired level of verbosity. The contents of "level" will override the default setting of 2. Setting the verbosity to 0 will make the tool run with no log messages, while increasing the verbosity above 1 will increase the amount of logging.
+
+8. The **-compress level** option indicates the desired level of compression (deflate level) for NetCDF variables. The valid level is between 0 and 9. The value of "level" will override the default setting of 0 from the configuration file or the environment variable MET_NC_COMPRESS. Setting the compression level to 0 will make no compression for the NetCDF output. Lower number is for fast compression and higher number is for better compression.
 
 An example of the grid_stat calling sequence is listed below:
 

diff --git a/docs/Users_Guide/index.rst b/docs/Users_Guide/index.rst
@@ -81,6 +81,7 @@ The National Center for Atmospheric Research (NCAR) is sponsored by NSF. The DTC
    appendixE
    appendixF
    appendixG
+   appendixH
 
 .. only:: html
 

diff --git a/docs/Users_Guide/point-stat.rst b/docs/Users_Guide/point-stat.rst
@@ -277,6 +277,7 @@ The usage statement for the Point-Stat tool is shown below:
          fcst_file
          obs_file
          config_file
+         [-ugrid_config config_file]
          [-point_obs file]
          [-obs_valid_beg time]
          [-obs_valid_end time]
@@ -298,17 +299,19 @@ Required Arguments for point_stat
 Optional Arguments for point_stat
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-4. The **-point_obs** file may be used to pass additional NetCDF point observation files to be used in the verification. Python embedding for point observations is also supported, as described in :numref:`pyembed-point-obs-data`.
+5. The **-ugrid_config** option provides a way for a user to provide a separate config file with metadata about their UGRID.
 
-5. The **-obs_valid_beg** time option in YYYYMMDD[_HH[MMSS]] format sets the beginning of the observation matching time window, overriding the configuration file setting.
+5. The **-point_obs** file may be used to pass additional NetCDF point observation files to be used in the verification. Python embedding for point observations is also supported, as described in :numref:`pyembed-point-obs-data`.
 
-6. The **-obs_valid_end** time option in YYYYMMDD[_HH[MMSS]] format sets the end of the observation matching time window, overriding the configuration file setting.
+6. The **-obs_valid_beg** time option in YYYYMMDD[_HH[MMSS]] format sets the beginning of the observation matching time window, overriding the configuration file setting.
 
-7. The **-outdir path** indicates the directory where output files should be written. 
+7. The **-obs_valid_end** time option in YYYYMMDD[_HH[MMSS]] format sets the end of the observation matching time window, overriding the configuration file setting.
 
-8. The **-log file** option directs output and errors to the specified log file. All messages will be written to that file as well as standard out and error. Thus, users can save the messages without having to redirect the output on the command line. The default behavior is no log file. 
+8. The **-outdir path** indicates the directory where output files should be written. 
 
-9. The **-v level** option indicates the desired level of verbosity. The value of "level" will override the default setting of 2. Setting the verbosity to 0 will make the tool run with no log messages, while increasing the verbosity will increase the amount of logging.
+9. The **-log file** option directs output and errors to the specified log file. All messages will be written to that file as well as standard out and error. Thus, users can save the messages without having to redirect the output on the command line. The default behavior is no log file. 
+
+10. The **-v level** option indicates the desired level of verbosity. The value of "level" will override the default setting of 2. Setting the verbosity to 0 will make the tool run with no log messages, while increasing the verbosity will increase the amount of logging.
 
 An example of the point_stat calling sequence is shown below:
 

diff --git a/docs/Users_Guide/refs.rst b/docs/Users_Guide/refs.rst
@@ -10,6 +10,14 @@ References
 |   Atlantic Basin. *Weather and Forecasting*,  13, 1005-1015.
 |
 
+.. _Adams-2019:
+
+| Adams, S. V., R. W. Ford, M. Hambley, J.M. Hobson, I. Kavčič, C. M. Maynard,
+|   T. Melvin, E. H. Müller, S. Mullerworth, A. R. Porter, M. Rezny, B. J. Shipway,
+|   and R. Wong, 2019: LFRic: Meeting the challenges of scalability and performance
+|   portability in Weather and Climate models. *Journal of Parallel and Distributed Computing*,
+|   **132**, 383-396, doi: https://doi.org/10.1016/j.jpdc.2019.02.007.
+
 .. _Ahijevych-2009:
 
 | Ahijevych, D., E. Gilleland, B.G. Brown, and E.E. Ebert, 2009: Application of
@@ -333,6 +341,14 @@ References
 |   and Recommendations. *Monthly Weather Review*, 145, 3397-3418.
 |
 
+.. _Skamarock-2012:
+
+| Skamarock, W. C., J. B. Klemp, M. G. Duda, L. D. Fowler, S. Park, and 
+|   T. Ringler, 2012: A Multiscale Nonhydrostatic Atmospheric Model Using
+|   Centroidal Voronoi Tesselations and C-Grid Staggering. *Mon. Wea. Rev.*,
+|   **140**, 3090-3105, doi: https://doi.org/10.1175/MWR-D-11-00215.1.
+|
+
 .. _Stephenson-2000:
 
 | Stephenson, D.B., 2000: Use of the "Odds Ratio" for diagnosing

diff --git a/internal/test_unit/xml/unit_python.xml b/internal/test_unit/xml/unit_python.xml
@@ -557,6 +557,20 @@
     </output>
   </test>
 
+  <test name="python_plot_point_obs_met_nc_to_pandas">
+    <exec>&MET_BIN;/plot_point_obs</exec>
+    <param> \
+      'PYTHON_NUMPY=&MET_BASE;/python/examples/read_met_point_obs_pandas.py &OUTPUT_DIR;/pb2nc/ndas.20120409.t12z.prepbufr.tm00.nc' \
+      &OUTPUT_DIR;/python/ndas.20120409.t12z.prepbufr.tm00.nr_met_nc_to_pandas.ps \
+      -data_file &DATA_DIR_MODEL;/grib2/nam/nam_2012040900_F012.grib2 \
+      -dotsize 2.0 \
+      -v 1
+    </param>
+    <output>
+      <ps>&OUTPUT_DIR;/python/ndas.20120409.t12z.prepbufr.tm00.nr_met_nc_to_pandas.ps</ps>
+    </output>
+  </test>
+
   <test name="python_ensemble_stat_FILE_LIST">
     <exec>echo "&DATA_DIR_MODEL;/grib1/arw-fer-gep1/arw-fer-gep1_2012040912_F024.grib \
                 &DATA_DIR_MODEL;/grib1/arw-fer-gep5/arw-fer-gep5_2012040912_F024.grib \

diff --git a/scripts/python/examples/Makefile.am b/scripts/python/examples/Makefile.am
@@ -32,7 +32,8 @@ pythonexamples_DATA = \
 	read_ascii_numpy.py \
 	read_ascii_point.py \
 	read_ascii_xarray.py \
-	read_met_point_obs.py
+	read_met_point_obs.py \
+	read_met_point_obs_pandas.py
 
 EXTRA_DIST = ${pythonexamples_DATA}
 

diff --git a/scripts/python/examples/Makefile.in b/scripts/python/examples/Makefile.in
@@ -317,7 +317,8 @@ pythonexamples_DATA = \
 	read_ascii_numpy.py \
 	read_ascii_point.py \
 	read_ascii_xarray.py \
-	read_met_point_obs.py
+	read_met_point_obs.py \
+	read_met_point_obs_pandas.py
 
 EXTRA_DIST = ${pythonexamples_DATA}
 MAINTAINERCLEANFILES = Makefile.in

diff --git a/scripts/python/examples/read_met_point_obs_pandas.py b/scripts/python/examples/read_met_point_obs_pandas.py
@@ -0,0 +1,52 @@
+import os
+import sys
+
+from met.point_nc import nc_point_obs
+
+# Description: Reads a point observation NetCDF file created by MET and passes
+# the data to another MET tool via Python Embedding. This script can be copied
+# to perform modifications to the data before it is passed to MET.
+# Example: plot_point_obs "PYTHON_NUMPY=pyembed_met_point_nc.py in.nc" out.ps
+# Contact: George McCabe <mccabe@ucar.edu>
+
+# Read and format the input 11-column observations:
+#   (1)  string:  Message_Type
+#   (2)  string:  Station_ID
+#   (3)  string:  Valid_Time(YYYYMMDD_HHMMSS)
+#   (4)  numeric: Lat(Deg North)
+#   (5)  numeric: Lon(Deg East)
+#   (6)  numeric: Elevation(msl)
+#   (7)  string:  Var_Name(or GRIB_Code)
+#   (8)  numeric: Level
+#   (9)  numeric: Height(msl or agl)
+#   (10) string:  QC_String
+#   (11) numeric: Observation_Value
+
+print(f"Python Script:\t{sys.argv[0]}")
+
+if len(sys.argv) != 2:
+    print("ERROR: pyembed_met_point_nc.py -> Specify only 1 input file")
+    sys.exit(1)
+
+# Read the input file as the first argument
+input_file = os.path.expandvars(sys.argv[1])
+print("Input File:\t" + repr(input_file))
+
+# Read MET point observation NetCDF file
+try:
+    point_obs = nc_point_obs(input_file)
+except TypeError:
+    print(f"ERROR: Could not read MET point data file {input_file}")
+    sys.exit(1)
+
+# convert point observation data to a pandas DataFrame
+df = point_obs.to_pandas()
+
+##################################################
+# perform any modifications to the data here     #
+##################################################
+
+# convert pandas DataFrame to list format that is expected by MET
+point_data = df.values.tolist()
+print(f"     point_data: Data Length:\t{len(point_data)}")
+print(f"     point_data: Data Type:\t{type(point_data)}")
diff --git a/scripts/python/met/point.py b/scripts/python/met/point.py
@@ -185,7 +185,7 @@ def check_point_data(self):
    #   return met_point_tools.convert_to_ndarray(value_list)
 
    def dump(self):
-      met_base_point.print_point_data(self.get_point_data())
+      met_point_tools.print_point_data(self.get_point_data())
 
    def get_count_string(self):
       return f' nobs={self.nobs} nhdr={self.nhdr} ntyp={self.nhdr_typ} nsid={self.nhdr_sid} nvld={self.nhdr_vld} nqty={self.nobs_qty} nvar={self.nobs_var}'