Merge remote-tracking branch 'remotes/origin/devel'
matt-kendall committed Mar 6, 2015
2 parents 29019b4 + e840e95 commit 5f45283
Showing 26 changed files with 656 additions and 255 deletions.
121 changes: 86 additions & 35 deletions doc/colocation.rst
@@ -4,7 +4,9 @@
Co-location
===========

One of the key features of the Community Intercomparison Suite (CIS) is the ability to co-locate one or more arbitrary data sets onto a common set of coordinates. This page briefly describes how to perform co-location in a number of scenarios.
One of the key features of the Community Intercomparison Suite (CIS) is the ability to co-locate one or
more arbitrary data sets onto a common set of coordinates. This page briefly describes how to perform co-location
in a number of scenarios.

To perform co-location, run a command of the format::

@@ -24,50 +26,68 @@ where:

``<samplegroup>``
is of the format ``<filename>[:<options>]``. The available options are described in more detail below. They are entered
in a comma separated list, such as ``variable=Temperature,colocator=bin,kernel=mean``. Not all combinations of
colocator and kernel options can be used - see the descriptions below.
in a comma separated list, such as ``variable=Temperature,colocator=bin,kernel=mean``. Not all combinations of
colocator and data are available; see :ref:`Available Colocators <available>`.

* ``<filename>`` is a single filename with the points to colocate onto.

* ``variable`` is an optional argument used to specify which variable's coordinates to use for colocation. If a variable is specified, a missing value will be set in the output file at every point for which the sample variable has a missing value. If a variable is not specified, non-missing values will be set at all sample points unless colocation at a point does not result in a valid value.

* ``colocator`` is a mandatory argument that specifies the colocation method. Parameters for the colocator, if any, are placed in square brackets after the colocator name, for example, ``colocator=box[fill_value=-999,h_sep=1km]``. The colocators available are:

* ``bin`` For use only with gridded sample points. Data points are placed in bins corresponding to the cell bounds surrounding each grid point. The bounds are taken from the gridded data if they are defined, otherwise the mid-points between grid points are used. The binned points should then be processed by one of the kernels to give a numeric value for each bin.

* ``box`` For use with gridded and ungridded sample points and data. A search region is defined by the parameters and points within the defined separation of each sample point are associated with the point. The points should then be processed by one of the kernels to give a numeric value for each bin. The parameters defining the search box are:

* h_sep - the horizontal separation. The units can be specified as km or m (for example ``h_sep=1.5km``); if none are specified then the default is km.
* a_sep - the altitude separation. The units can be specified as km or m, as for h_sep; if none are specified then the default is m.
* p_sep - the pressure separation. This is not an absolute separation as for h_sep and a_sep, but a relative one, so is specified as a ratio. For example a constraint of p_sep = 2, for a point at 10 hPa, would cover the range 5 hPa < points < 20 hPa. Note that p_sep >= 1.
* t_sep - the time separation. This can be specified in years, months, days, hours, minutes or seconds using ``PnYnMnDTnHnMnS`` (the T separator can be replaced with a colon or a space, but if using a space quotes are required). For example to specify a time separation of one and a half months and thirty minutes you could use ``t_sep=P1M15DT30M``. It is worth noting that the units for time comparison are fractional days, so that years are converted to the number of days in a Gregorian year, and months are 1/12th of a Gregorian year.

If h_sep is specified, a k-d tree index based on longitudes and latitudes of data points is used to speed up the search for points. If h_sep is not specified, an exhaustive search is performed for points satisfying the other separation constraints.
* ``variable`` is an optional argument used to specify which variable's coordinates to use for colocation.
If a variable is specified, a missing value will be set in the output file at every point for which the sample
variable has a missing value. If a variable is not specified, non-missing values will be set at all sample points
unless colocation at a point does not result in a valid value.

* ``colocator`` is an optional argument that specifies the colocation method. Parameters for the colocator, if any,
are placed in square brackets after the colocator name, for example, ``colocator=box[fill_value=-999,h_sep=1km]``.
If not specified, a :ref:`Default Colocator <available>` is identified for your data / sample combination.
The colocators available are:

* ``bin`` For use only with ungridded data and gridded sample points. Data points are placed in bins corresponding
to the cell bounds surrounding each grid point. The bounds are taken from the gridded data if they are defined,
otherwise the mid-points between grid points are used. The binned points should then be processed by one of the
kernels to give a numeric value for each bin.

* ``box`` For use with gridded and ungridded sample points and data. A search region is defined by the parameters
and points within the defined separation of each sample point are associated with the point. The points should
then be processed by one of the kernels to give a numeric value for each bin. The parameters defining the search box are:

* ``h_sep`` - the horizontal separation. The units can be specified as km or m (for example ``h_sep=1.5km``); if
none are specified then the default is km.
* ``a_sep`` - the altitude separation. The units can be specified as km or m, as for h_sep; if none are specified
then the default is m.
* ``p_sep`` - the pressure separation. This is not an absolute separation as for h_sep and a_sep, but a relative
one, so is specified as a ratio. For example a constraint of p_sep = 2, for a point at 10 hPa, would cover the
range 5 hPa < points < 20 hPa. Note that p_sep >= 1.
* ``t_sep`` - the time separation. This can be specified in years, months, days, hours, minutes or seconds using
``PnYnMnDTnHnMnS`` (the T separator can be replaced with a colon or a space, but if using a space quotes are
required). For example to specify a time separation of one and a half months and thirty minutes you could use
``t_sep=P1M15DT30M``. It is worth noting that the units for time comparison are fractional days, so that
years are converted to the number of days in a Gregorian year, and months are 1/12th of a Gregorian year.

If ``h_sep`` is specified, a k-d tree index based on longitudes and latitudes of data points is used to speed up
the search for points. If ``h_sep`` is not specified, an exhaustive search is performed for points satisfying the
other separation constraints.

* ``lin`` For use with gridded source data only. A value is calculated by linear interpolation for each sample point.

* ``nn`` For use with gridded source data only. The data point closest to each sample point is found, and the data value is set at the sample point.
* ``nn`` For use with gridded source data only. The data point closest to each sample point is found, and the
data value is set at the sample point.

* ``dummy`` For use with ungridded data only. Returns the source data as the colocated data irrespective of the sample points. This might be useful if variables from the original sample file are wanted in the output file but are already on the correct sample points.
* ``dummy`` For use with ungridded data only. Returns the source data as the colocated data irrespective of the
sample points. This might be useful if variables from the original sample file are wanted in the output file but
are already on the correct sample points.

Colocators have the following general optional parameters, which can be used in addition to any specific ones listed above:

* fill_value - The numerical value to apply to the colocated point if there are no points which satisfy the constraint.
* var_name - Specifies the name of the variable in the resulting NetCDF file.
* var_long_name - Specifies the variable's long name.
* var_units - Specifies the variable's units.

.. warning:: When colocating two data sets with different spatio-temporal domains, the sampling points should be within the spatio-temporal domain of the source data. Otherwise, depending on the co-location options selected, strange artefacts can occur, particularly with linear interpolation. Spatio-temporal domains can be reduced in CIS with :ref:`aggregation` or :ref:`subsetting`.
* ``fill_value`` - The numerical value to apply to the colocated point if there are no points which satisfy the constraint.
* ``var_name`` - Specifies the name of the variable in the resulting NetCDF file.
* ``var_long_name`` - Specifies the variable's long name.
* ``var_units`` - Specifies the variable's units.

* ``kernel`` is used to specify the kernel to use for colocation methods that create an intermediate set of points for
further processing, that is box and bin. The default kernel for box and bin is *moments*. The built-in kernel
methods currently available are:

``kernel`` is used to specify the kernel to use for colocation methods that create an intermediate set of points for further processing, that is box and bin. Choosing a kernel is mandatory for the box and bin colocators, no defaults are provided. The built-in kernel methods currently available are:

* nn_t (or nn_time) - nearest neighbour in time algorithm
* nn_h (or nn_horizontal) - nearest neighbour in horizontal distance
* nn_a (or nn_altitude) - nearest neighbour in altitude
* nn_p (or nn_pressure) - nearest neighbour in pressure (as in a vertical coordinate). Note that, like the p_sep constraint, this works on the ratio of pressures, so the nearest neighbour to a point with a value of 10 hPa, out of a choice of 5 hPa and 19 hPa, would be 19 hPa, as 19/10 < 10/5.
* mean - an averaging kernel that returns the mean values of any points found by the colocation method
* moments - an averaging kernel that returns the mean, standard deviation and the number of points remaining after
* ``moments`` - **Default**. This is an averaging kernel that returns the mean, standard deviation and the number of points remaining after
the specified constraint has been applied. This can be used for gridded or ungridded sample points where the
colocator is one of 'bin' or 'box'. The names of the variables in the output file are the name of the input
variable with a suffix to identify which quantity they represent:
@@ -81,16 +101,47 @@ where:
* *Number of points* - suffix: ``_num_points`` - The number of data points mapped to that sample grid point
(data points with missing values are excluded)

* ``mean`` - an averaging kernel that returns the mean values of any points found by the colocation method
* ``nn_t`` (or ``nn_time``) - nearest neighbour in time algorithm
* ``nn_h`` (or ``nn_horizontal``) - nearest neighbour in horizontal distance
* ``nn_a`` (or ``nn_altitude``) - nearest neighbour in altitude
* ``nn_p`` (or ``nn_pressure``) - nearest neighbour in pressure (as in a vertical coordinate). Note that, like the
``p_sep`` constraint, this works on the ratio of pressures, so the nearest neighbour to a point with a value of
10 hPa, out of a choice of 5 hPa and 19 hPa, would be 19 hPa, as 19/10 < 10/5.

``product`` is an optional argument used to specify the type of files being read. If omitted, the program will attempt to determine which product to use based on the filename, as listed at :ref:`data-products-reading`.
* ``product`` is an optional argument used to specify the type of files being read. If omitted, the program will
attempt to determine which product to use based on the filename, as listed at :ref:`data-products-reading`.

``<outputfile>``
is an optional argument to specify the name to use for the file output. For ungridded data this is automatically given a ``.nc`` extension and prepended with ``cis-`` to make it distinguishable as a colocated file. For gridded data this is only given the ``.nc`` extension.
is an optional argument specifying the file to output to. This will be automatically given a ``.nc`` extension if not
present and if the output is ungridded, will be prepended with ``cis-`` to identify it as a CIS output file. This must
not be the same file path as any of the input files. If not provided, the default output filename is *out.nc*.

A full example would be::

$ cis col rain:"my_data_??.*" my_sample_file:colocator=box[h_sep=50km,t_sep=6000S],kernel=nn_t -o my_col
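
The relative pressure separation used by ``p_sep`` and by the ``nn_p`` kernel can be illustrated with a short Python sketch. This is not CIS code: the function names ``pressure_ratio``, ``within_p_sep`` and ``nearest_in_pressure`` are hypothetical, and the sketch only assumes that the "distance" between two pressures is the ratio of the larger to the smaller value, as described in the option list above.

```python
def pressure_ratio(p_sample, p_data):
    """Ratio of the larger pressure to the smaller one (always >= 1)."""
    return max(p_sample, p_data) / min(p_sample, p_data)


def within_p_sep(p_sample, p_data, p_sep):
    """True if p_data lies strictly inside the relative window around p_sample."""
    if p_sep < 1:
        raise ValueError("p_sep is a ratio and must be >= 1")
    return pressure_ratio(p_sample, p_data) < p_sep


def nearest_in_pressure(p_sample, candidates):
    """Pick the candidate pressure with the smallest ratio to p_sample."""
    return min(candidates, key=lambda p: pressure_ratio(p_sample, p))


# For a sample point at 10 hPa and p_sep = 2, the window is 5 hPa < p < 20 hPa.
print(within_p_sep(10.0, 19.0, 2.0))           # inside the window
print(within_p_sep(10.0, 5.0, 2.0))            # on the boundary, so excluded
print(nearest_in_pressure(10.0, [5.0, 19.0]))  # 19/10 < 10/5, so 19 hPa is chosen
```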

.. warning:: When colocating two data sets with different spatio-temporal domains, the sampling points should be
within the spatio-temporal domain of the source data. Otherwise, depending on the co-location options selected,
strange artefacts can occur, particularly with linear interpolation. Spatio-temporal domains can be reduced in
CIS with :ref:`Aggregation <aggregation>` or :ref:`Subsetting <subsetting>`.
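
As a rough illustration of how a ``t_sep`` value in the ``PnYnMnDTnHnMnS`` form maps onto fractional days, the following Python sketch converts such a string by hand. It is an assumption-laden sketch, not the CIS parser: the name ``t_sep_to_days`` is hypothetical, only whole-number fields are handled, and a Gregorian year is taken to be 365.2425 days with a month being one twelfth of that.

```python
import re

GREGORIAN_YEAR_DAYS = 365.2425  # assumed mean length of a Gregorian year in days

# Date part (years, months, days), then an optional time part introduced by
# 'T', ':' or a space (hours, minutes, seconds).
_DURATION = re.compile(
    r"^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:[T: ](?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def t_sep_to_days(t_sep):
    """Convert a duration such as 'P1M15DT30M' to fractional days."""
    match = _DURATION.match(t_sep)
    if match is None:
        raise ValueError("not a PnYnMnDTnHnMnS duration: %r" % t_sep)
    # Fields that are absent from the string count as zero.
    parts = {name: float(value or 0) for name, value in match.groupdict().items()}
    return (
        parts["years"] * GREGORIAN_YEAR_DAYS
        + parts["months"] * GREGORIAN_YEAR_DAYS / 12
        + parts["days"]
        + parts["hours"] / 24
        + parts["minutes"] / 1440
        + parts["seconds"] / 86400
    )


# One and a half months (one month plus 15 days) and thirty minutes.
print(t_sep_to_days("P1M15DT30M"))
```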


.. _available:

Available Colocators and Kernels
================================

====================== ========================= ================= =================
Colocation type
(data -> sample)       Available Colocators      Default Colocator Default Kernel
====================== ========================= ================= =================
Gridded -> gridded     ``lin``, ``nn``, ``box``  ``lin``           *None*
Ungridded -> gridded   ``bin``, ``box``          ``bin``           ``moments``
Gridded -> ungridded   ``nn``, ``lin``           ``nn``            *None*
Ungridded -> ungridded ``box``                   ``box``           ``moments``
====================== ========================= ================= =================
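
The table above can be read as a simple lookup from the (data, sample) combination to the available and default options. The Python sketch below encodes it that way. It is purely illustrative: the ``DEFAULTS`` mapping and the ``default_colocation`` function are hypothetical names, not part of the CIS API.

```python
# Keys are (data kind, sample kind); values mirror the rows of the table.
DEFAULTS = {
    ("gridded", "gridded"): {
        "colocators": ("lin", "nn", "box"), "default": "lin", "kernel": None},
    ("ungridded", "gridded"): {
        "colocators": ("bin", "box"), "default": "bin", "kernel": "moments"},
    ("gridded", "ungridded"): {
        "colocators": ("nn", "lin"), "default": "nn", "kernel": None},
    ("ungridded", "ungridded"): {
        "colocators": ("box",), "default": "box", "kernel": "moments"},
}


def default_colocation(data_kind, sample_kind):
    """Return (default colocator, default kernel) for a data/sample pair."""
    try:
        row = DEFAULTS[(data_kind, sample_kind)]
    except KeyError:
        raise ValueError("unknown combination: %s -> %s" % (data_kind, sample_kind))
    return row["default"], row["kernel"]


print(default_colocation("ungridded", "gridded"))
```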


Colocation output files
=======================
22 changes: 17 additions & 5 deletions doc/evaluation.rst
@@ -31,7 +31,7 @@ using the 'eval' command. For example, you might want to interpolate a value bet

The evaluate syntax looks like this::

$ cis eval <datagroup>... <expr> [-o [<output_var>:]<outputfile>]
$ cis eval <datagroup>... <expr> <units> [-o [<output_var>:]<outputfile>] [--attributes <attributes>]

where square brackets denote optional commands and:

@@ -66,7 +66,7 @@ where square brackets denote optional commands and:

* ``<productname>`` is an optional CIS data product to use (see :ref:`Data Products <data-products-reading>`):

See :ref:`datagroups` for a more detailed explanation of datagroups.
See :ref:`datagroups` for a more detailed explanation of datagroups.

.. _expr:

@@ -114,6 +114,11 @@ See :ref:`datagroups` for a more detailed explanation of datagroups.
CIS eval command will flatten ungridded data so that structure present in the input files will be ignored. This
allows you to compare ungridded data with different shapes, e.g. (3,5) and (15,)

``<units>``
is a mandatory argument describing the units of the resulting expression. This should be a
`CF compliant <http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/build/ch03.html#table-supported-units>`_
units string, e.g. ``"kg m^-3"``. Where this contains spaces, the whole string should be enclosed in quotes.

``<outputfile>``
is an optional argument specifying the file to output to. This will be automatically given a ``.nc`` extension if not
present and if the output is ungridded, will be prepended with ``cis-`` to identify it as a CIS output file. This must
@@ -123,6 +128,13 @@ See :ref:`datagroups` for a more detailed explanation of datagroups.
the output file, e.g. ``-o my_new_var:output_filename.nc``. If not provided, the default output variable name is
*calculated_variable*.

``<attributes>``
is an optional argument allowing users to provide additional metadata to be included in the evaluation output variable.
This should be indicated by the attributes flag (``--attributes`` or ``-a``). The attributes should then follow in
comma-separated, key=value pairs, for example ``--attributes standard_name=convective_rainfall_amount,echam_version=6.1.00``.
Whitespace is permitted in both the names and the values, but they must then be enclosed in quotes: ``-a "operating system = AIX 6.1 Power6"``.
Colons or equals signs may not be used in attribute names or values.
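
To make the ``key=value`` format concrete, the following Python sketch splits an attributes string into a dictionary under the rules stated above. It is only a sketch: ``parse_attributes`` is a hypothetical helper, not the function CIS uses, and it assumes that whitespace around names and values is not significant.

```python
def parse_attributes(text):
    """Split 'key=value,key=value' into a dict of string attributes."""
    attributes = {}
    for pair in text.split(","):
        # Exactly one '=' per pair: equals signs may not appear in names or values.
        if pair.count("=") != 1:
            raise ValueError("expected a single key=value pair, got %r" % pair)
        key, value = (part.strip() for part in pair.split("="))
        if not key:
            raise ValueError("missing attribute name in %r" % pair)
        # Colons are not allowed in names or values either.
        if ":" in key or ":" in value:
            raise ValueError("colons are not allowed in %r" % pair)
        attributes[key] = value
    return attributes


print(parse_attributes("standard_name=convective_rainfall_amount,echam_version=6.1.00"))
```

Values are kept as strings here, so a version such as ``6.1.00`` is not converted to a number.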


Evaluation Examples
===================
@@ -158,7 +170,7 @@ We then linearly interpolate the HadGEM data onto the ECHAM grid::

Next we subtract the two fields using::

$ cis eval od550aer=a:echam-od550aer.nc od550=b:hadgem-od550aer-collocated.nc "a-b" -o modeldifference
$ cis eval od550aer=a:echam-od550aer.nc od550=b:hadgem-od550aer-collocated.nc "a-b" 1 -o modeldifference

Finally we plot the evaluated output::

@@ -178,7 +190,7 @@ The file agoufou.lev20 refers to ``/group_workspaces/jasmin/cis/data/aeronet/AOT

The AE is calculated using an eval statement::

$ cis eval AOT_440,AOT_870:agoufou.lev20 "(-1)* (numpy.log(AOT_870/AOT_440)/numpy.log(870./440.))" -o alfa
$ cis eval AOT_440,AOT_870:agoufou.lev20 "(-1)* (numpy.log(AOT_870/AOT_440)/numpy.log(870./440.))" 1 -o alfa
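
The expression passed to ``eval`` above has the form of an Ångström exponent computed from optical depths at 440 nm and 870 nm. As a sanity check of the arithmetic, the same formula can be written as a plain Python function. The name ``angstrom_exponent`` and the sample values are illustrative assumptions, and scalar inputs are used here rather than the arrays CIS would pass.

```python
import math


def angstrom_exponent(aot_440, aot_870):
    """AE = -ln(AOT_870 / AOT_440) / ln(870 / 440)."""
    return -math.log(aot_870 / aot_440) / math.log(870.0 / 440.0)


# Build an AOT pair that follows a power law with exponent 1.3 and recover it.
aot_440 = 0.5
aot_870 = aot_440 * (870.0 / 440.0) ** -1.3
print(angstrom_exponent(aot_440, aot_870))
```

The result is dimensionless, which is consistent with the units argument ``1`` in the command above.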

Plotting it shows the expected correlation::

@@ -238,7 +250,7 @@ following two plots::
First we perform an evaluation using the `numpy.masked_where <http://docs.scipy.org/doc/numpy/reference/generated/numpy.ma.masked_where.html#numpy.ma.masked_where>`_
method to produce an optical depth variable that is masked at all points where the cloud cover is more than 20%::

$ cis eval Cloud_Fraction_Ocean=cloud,Optical_Depth_Land_And_Ocean=od:MOD04_L2.A2010001.2255.005.2010005215814.hdf "numpy.ma.masked_where(cloud > 0.2, od)" -o od:masked_optical_depth.nc
$ cis eval Cloud_Fraction_Ocean=cloud,Optical_Depth_Land_And_Ocean=od:MOD04_L2.A2010001.2255.005.2010005215814.hdf "numpy.ma.masked_where(cloud > 0.2, od)" 1 -o od:masked_optical_depth.nc
$ cis plot od:cis-masked_optical_depth.nc --xmin 132 --xmax 162 --ymin -70 --title "Aerosol optical depth" --cbarscale 0.5 --itemwidth 10 -o masked_optical_depth.png

.. image:: img/eval/modis_masked_optical_depth.png
8 changes: 7 additions & 1 deletion doc/plotting.rst
@@ -17,6 +17,11 @@ This will attempt to locate the variable ``variable`` in all of the specified ``
``heatmap``
a heatmap especially suitable for gridded data

.. warning::
Basemap versions <= 1.0.7 have known issues when plotting heatmaps, particularly when using ``--xmin`` or ``--xmax``
options. Use a newer version if available, otherwise check your output for validity, especially around the meridians.


``contour``
a standard contour plot, see :ref:`contour options <contour-options>`

@@ -176,7 +181,8 @@ There are a number of plot formatting options available:
height of the plot in inches

``--cbarscale``
this can be used to change the size of the colourbar when plotting, use --cbarscale 0.5 for lat-lon plots of the entire Earth (this is a temporary fix)
this can be used to change the size of the colourbar when plotting; it defaults to 0.55 for vertical colourbars and 1.0
for horizontal ones.

``--coastlinescolour``
The colour of the coastlines on a map, see :ref:`colours-and-markers`
4 changes: 2 additions & 2 deletions jasmin_cis/__init__.py
@@ -1,5 +1,5 @@

__author__ = "David Michel, Daniel Wallis, Duncan Watson-Parris, Richard Wilkinson, Ian Bush, Matt Kendall, John Holt"
__version__ = "0.8.2"
__status__ = "Phase 3, Sprint 9 Interim Release"
__version__ = "1.0.0"
__status__ = "Phase 3 final release"
__website__ = "http://proj.badc.rl.ac.uk/cedaservices/wiki/JASMIN/CommunityIntercomparisonSuite"
