Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature 342 write mpr file #357

Closed
wants to merge 14 commits into from
198 changes: 198 additions & 0 deletions docs/Users_Guide/aggregation.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,198 @@
***********
Aggregation
***********

Aggregation is an option that can be applied to MET stat output (in
the appropriate format) to calculate aggregation statistics and confidence intervals.
Input data must first be reformatted using the METdataio METreformat module to
label all the columns with the corresponding statistic name specified in the
`MET User's Guide <https://met.readthedocs.io/en/develop/Users_Guide/index.html>`_
for `point-stat <https://met.readthedocs.io/en/develop/Users_Guide/point-stat.html>`_,
`grid-stat <https://met.readthedocs.io/en/develop/Users_Guide/grid-stat.html>`_, or
`ensemble-stat <https://met.readthedocs.io/en/develop/Users_Guide/ensemble-stat.html>`_ .stat output data.

Python Requirements
===================

The third-party Python packages and the corresponding version numbers are found
in the requirements.txt and nco_requirements.txt files:

**For Non-NCO systems**:

* `requirements.txt <https://github.com/dtcenter/METcalcpy/blob/develop/requirements.txt>`_

**For NCO systems**:

* `nco_requirements.txt <https://github.com/dtcenter/METcalcpy/blob/develop/nco_requirements.txt>`_


Retrieve Code
=============

Refer to the `Installation Guide <https://metcalcpy.readthedocs.io/en/develop/Users_Guide/installation.html>`_
for instructions.


Retrieve Sample Data
====================

The sample data used for this example is located in the $METCALCPY_BASE/test directory,
where **$METCALCPY_BASE** is the full path to the location of the METcalcpy source code
(e.g. /User/my_dir/METcalcpy).
The example data file used for this example is **rrfs_ecnt_for_agg.data**.
This data was reformatted from the MET .stat output using the METdataio METreformat module.
The reformatting step labels the columns with the corresponding statistics, based on the MET tool (point-stat,
grid-stat, or ensemble-stat). The ECNT linetype of
the MET grid-stat output has been reformatted to include the statistics names for all
`ECNT <https://met.readthedocs.io/en/develop/Users_Guide/ensemble-stat.html#id2>`_ specific columns.


Input data **must** be in this format prior to using the aggregation
module, agg_stat.py.

The example data can be copied to a working directory, or left in this directory. The location
of the data will be specified in the YAML configuration file.

Please refer to the METdataio User's Guide for instructions for reformatting MET .stat files :
https://metdataio.readthedocs.io/en/develop/Users_Guide/reformat_stat_data.html


Aggregation
===========

The agg_stat module, **agg_stat.py** to is used to calculate aggregated statistics and confidence intervals.
This module can be run as a script at the command-line, or imported in another Python script.

A required YAML configuration file, **config_agg_stat.yaml** file is used to define the location of
input data and the name and location of the output file.

The agg_stat module support the ECNT linetype that are output from the MET
**ensemble-stat** tool

The input to the agg_stat module must have the appropriate format. The ECNT linetype must first be
`reformatted via the METdataio METreformat module <https://metdataio.readthedocs.io/en/develop/Users_Guide/reformat_stat_data.html>`_
by following the instructions under the **Reformatting for computing aggregation statistics with METcalcpy agg_stat**
header.

Modify the YAML configuration file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The config_agg_stat.yaml is required to perform aggregation statistics calculations. This
configuration file is located in the $METCALCPY_BASE/metcalcpy/pre_processing/aggregation/config
directory. The $METCALCPY_BASE is the directory where the METcalcpy source code is
saved (e.g. /Users/my_acct/METcalcpy). Change directory to $METCALCPY_BASE/metcalcpy/pre_processing/aggregation/config
and modify the config_agg_stat.yaml file.

1. Specify the input and output files

.. code-block:: yaml

agg_stat_input: /path-to/test/data/rrfs_ecnt_for_agg.data
agg_stat_output: /path-to/ecnt_aggregated.data

Replace the *path-to* in the above two settings to the location where the input data
was stored (either in a working directory or the $METCALCPY_BASE/test directory). **NOTE**:
Use the **full path** to the input and output directories (no environment variables).

2. Specify the meteorological and the stat variables:

.. code-block:: yaml

fcst_var_val_1:
TMP:
- ECNT_RMSE
- ECNT_SPREAD_PLUS_OERR

3. Specify the selected models/members:

.. code-block:: yaml

series_val_1:
model:
- RRFS_GEFS_GF.SPP.SPPT

4. Specify the selected statistics to be aggregated, in this case, the RMSE and SPREAD_PLUS_OERR
statistics from the ECNT ensemble-stat tool output are to be calculated. The aggregated statistics
are named ECNT_RMSE and ECNT_SPREAD_PLUS_OERR (append original statistic name with the linetype):

list_stat_1:
- ECNT_RMSE
- ECNT_SPREAD_PLUS_OERR

The full **config_agg_stat.yaml** file is shown below:


.. literalinclude:: ../../metcalcpy/pre_processing/aggregation/config/config_agg_stat.yaml



**NOTE**: Use full directory paths when specifying the location of the input file and output
file.


Set the Environment and PYTHONPATH
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

bash shell:

.. code-block:: ini

export METCALCPY_BASE=/path-to-METcalcpy

csh shell:

.. code-block:: ini

setenv METCALCPY_BASE /path-to-METcalcpy


where *path-to-METcalcpy* is the full path to where the METcalcpy source code is located
(e.g. /User/my_dir/METcalcpy)

bash shell:

.. code-block:: ini

export PYTHONPATH=$METCALCPY_BASE/:$METCALCPY_BASE/metcalcpy

csh shell

.. code-block:: ini

setenv PYTHONPATH $METCALCPY_BASE/:$METCALCPY_BASE/metcalcpy


Where $METCALCPY_BASE is the full path to where the METcalcpy code resides (e.g. /User/
my_dir/METcalcpy).

Run the python script:
^^^^^^^^^^^^^^^^^^^^^^

The following are instructions for performing aggregation from the command-line:

.. code-block:: yaml


python $METCALCPY_BASE/metcalcpy/agg_stat.py $METCALCPY_BASE/metcalcpy/pre_processing/aggregation/config/config_stat_agg.yaml


This will generate the file **ecnt_aggregated.data** (from the agg_stat_output setting) which now contains the
aggregated statistics data.


Additionally, the agg_stat.py module can be invoked by another script or module
by importing the package:

.. code-block:: ini

from metcalcpy.agg_stat import AggStat

AGG_STAT = AggStat(PARAMS)
AGG_STAT.calculate_stats_and_ci()

where PARAMS is a dictionary containing the parameters indicating the
location of input and output data. The structure is similar to the
original Rscript template from which this Python implementation was derived.

**NOTE**: Remember to use the same PYTHONPATH defined above to ensure that the agg_stat module is found by
the Python import process.
2 changes: 2 additions & 0 deletions docs/Users_Guide/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,8 @@ National Center for Atmospheric Research (NCAR) is sponsored by NSF.
installation
vertical_interpolation
difficulty_index
aggregation
write_mpr
release-notes

**Indices and tables**
Expand Down
98 changes: 98 additions & 0 deletions docs/Users_Guide/write_mpr.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,98 @@
**********************
Write MPR
**********************

Description
===========

This program writes data to an output file in MET’s Matched Pair (MPR) format. It
takes several inputs, which are described in the list below. The script will compute
the observation input and total number of observations. It will also check to see if
the output directory is present and will create that directory if it does not exist.

Example
=======

Examples for how to use this script can be found in the driver scripts of the use cases
listed below.

* `Stratosphere Polar <https://metplus.readthedocs.io/en/latest/generated/model_applications/s2s/UserScript_fcstGFS_obsERA_StratospherePolar.html#sphx-glr-generated-model-applications-s2s-userscript-fcstgfs-obsera-stratospherepolar-py>`_
* `Blocking <https://metplus.readthedocs.io/en/latest/generated/model_applications/s2s_mid_lat/UserScript_fcstGFS_obsERA_Blocking.html#sphx-glr-generated-model-applications-s2s-mid-lat-userscript-fcstgfs-obsera-blocking-py>`_
* `Weather Regime <https://metplus.readthedocs.io/en/latest/generated/model_applications/s2s_mid_lat/UserScript_fcstGFS_obsERA_WeatherRegime.html#sphx-glr-generated-model-applications-s2s-mid-lat-userscript-fcstgfs-obsera-weatherregime-py>`_

Information about Input Data
============================

At this time, all input arrays have to be one dimensional only and should be the same size.
The script does not make an attempt to check if input arrays are the same size. If any of
your input arrays are larger than the observation input array, the data will be chopped at
the length of the observation input. If an array is shorter than the observation input, the
program will error.

Currently, the the following variables cannot be set and will be output as NA: FCST_THRESH,
OBS_THRESH, COV_THRESH, ALPHA, OBS_QC, CLIMO_MEAN, CLIMO_STDEV, CLIMO_CDF. Additionally the
following variables also cannot be set and have default values: INTERP_MTHD = NEAREST,
INTERP_PNTS = 1, and OBTYPE = ADPUPA.

data_fcst: 1D array float
forecast data to write to MPR file
data_obs: 1D array float
observation data to write to MPR file
lats_in: 1D array float
data latitudes
lons_in: 1D array float
data longitudes
fcst_lead: 1D array string of format HHMMSS
forecast lead time
fcst_valid: 1D array string of format YYYYmmdd_HHMMSS
forecast valid time
obs_lead: 1D array string of format HHMMSS
observation lead time
obs_valid: 1D array string of format YYYYmmdd_HHMMSS
observation valid time
mod_name: string
output model name (the MODEL column in MET)
desc: string
output description (the DESC column in MET)
fcst_var: 1D array string
forecast variable name
fcst_unit: 1D array string
forecast variable units
fcst_lev: 1D array string
forecast variable level
obs_var: 1D array string
observation variable name
obs_unit: 1D array string
observation variable units
obs_lev: 1D array string
observation variable level
maskname: string
name of the verification masking region
obsslev: 1D array string
Pressure level of the observation in hPA or accumulation
interval in hours
outdir: string
Full path including where the output data should go
outfile_prefix: string
Prefix to use for the output filename. The time stamp will
be added in MET's format based off the first forecast time


Run from a python script
=========================

* Make sure you have these required Python packages:

* Python 3.7

* metcalcpy

* numpy

* os

.. code-block:: ini

write_mpr_file(data_fcst,data_obs,lats_in,lons_in,fcst_lead,fcst_valid,obs_lead,obs_valid,mod_name,desc,fcst_var,fcst_unit,fcst_lev,obs_var,obs_unit,obs_lev,maskname,obsslev,outdir,outfile_prefix)

The output fill be a .stat file located in outdir with data in `MET's Matched Pair Format <https://met.readthedocs.io/en/latest/Users_Guide/point-stat.html#id24>`_. The file will be labeled with outfile_prefix and then have lead time, valid YYYYMMDD, and valid HHMMSS stamped onto the file name.