# Climate Data Utilities

The `cdutil` module provides tools that are more specific to the Earth sciences.

For convenience, the `averager` from `genutil` is linked back into `cdutil`.

# Time Operations

Some of the most useful tools live in the `times` submodule.

## Time Bounds

Most of our tools rely on the bounds, so having correct time bounds is fundamental.

Sometimes, when dealing with edge-case datasets, one might need to reset the time bounds.

A few utilities for this are provided as part of `cdutil`.

In [1]:
import cdms2
import cdutil
my_time_axis = cdms2.createAxis([ 15.5,  45. ,  74.5, 105. , 135.5, 166. , 196.5, 227.5, 258. , 288.5, 319. , 349.5])
my_time_axis.units = "days since 2019"
my_time_axis.designateTime()
my_time_axis.id = "time"
print(my_time_axis.getBounds())

None


In [2]:
cdutil.setAxisTimeBoundsMonthly(my_time_axis)
my_time_axis.getBounds()

array([[  0.,  31.],
       [ 31.,  59.],
       [ 59.,  90.],
       [ 90., 120.],
       [120., 151.],
       [151., 181.],
       [181., 212.],
       [212., 243.],
       [243., 273.],
       [273., 304.],
       [304., 334.],
       [334., 365.]])

This sets the bounds at the beginning and end of the cell's current month, respecting the calendar:

In [3]:
import cdtime
my_time_axis.setCalendar(cdtime.Calendar360)
cdutil.setAxisTimeBoundsMonthly(my_time_axis)
my_time_axis.getBounds()

array([[  0.,  30.],
       [ 30.,  60.],
       [ 60.,  90.],
       [ 90., 120.],
       [120., 150.],
       [150., 180.],
       [180., 210.],
       [210., 240.],
       [240., 270.],
       [270., 300.],
       [300., 330.],
       [330., 360.]])
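
The calendar dependence shown above can be reproduced without `cdutil`. The following is a minimal sketch using only the Python standard library, not the library's implementation: `month_bounds` is a hypothetical helper, and the 360-day case simply assumes twelve 30-day months.

```python
import calendar

def month_bounds(year, days_in_month=None):
    """Return (start, end) offsets in days since Jan 1 of `year` for each month.

    If `days_in_month` is given (e.g. 30 for a 360-day calendar), every month
    gets that length; otherwise real Gregorian month lengths are used.
    """
    bounds, start = [], 0
    for month in range(1, 13):
        length = days_in_month or calendar.monthrange(year, month)[1]
        bounds.append((start, start + length))
        start += length
    return bounds

gregorian = month_bounds(2019)
cal360 = month_bounds(2019, days_in_month=30)
print(gregorian[:3])  # [(0, 31), (31, 59), (59, 90)]
print(cal360[:3])     # [(0, 30), (30, 60), (60, 90)]
```

The two results match the first rows of the arrays printed by `getBounds()` for the default and 360-day calendars.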

In [4]:
my_time_axis.setCalendar(cdtime.DefaultCalendar)
cdutil.setAxisTimeBoundsMonthly(my_time_axis)

Some models store the cell value at the end of the month to reflect that it is cumulative; for example, the monthly value for January would be stored on February 1st.

In [5]:
my_time_axis = cdms2.createAxis([ 31.,  59.,  90., 120., 151., 181., 212., 243., 273., 304., 334., 365.])
my_time_axis.units = 'days since 2019'
my_time_axis.id = 'time'
my_time_axis.designateTime()
print(my_time_axis.asComponentTime())

[2019-2-1 0:0:0.0, 2019-3-1 0:0:0.0, 2019-4-1 0:0:0.0, 2019-5-1 0:0:0.0, 2019-6-1 0:0:0.0, 2019-7-1 0:0:0.0, 2019-8-1 0:0:0.0, 2019-9-1 0:0:0.0, 2019-10-1 0:0:0.0, 2019-11-1 0:0:0.0, 2019-12-1 0:0:0.0, 2020-1-1 0:0:0.0]


This confuses our function, which assigns each value to the month that starts on that date:

In [6]:
cdutil.setAxisTimeBoundsMonthly(my_time_axis)
my_time_axis.getBounds()

array([[ 31.,  59.],
       [ 59.,  90.],
       [ 90., 120.],
       [120., 151.],
       [151., 181.],
       [181., 212.],
       [212., 243.],
       [243., 273.],
       [273., 304.],
       [304., 334.],
       [334., 365.],
       [365., 396.]])

Because this happens frequently, the `stored` option handles this case:

In [7]:
cdutil.setAxisTimeBoundsMonthly(my_time_axis, stored=1)
my_time_axis.getBounds()

array([[  0.,  31.],
       [ 31.,  59.],
       [ 59.,  90.],
       [ 90., 120.],
       [120., 151.],
       [151., 181.],
       [181., 212.],
       [212., 243.],
       [243., 273.],
       [273., 304.],
       [304., 334.],
       [334., 365.]])
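
One way to picture what `stored=1` changes is sketched below with the standard library only. This is an illustration under assumptions, not how `cdutil` is implemented: `monthly_bounds_for` is a hypothetical helper, and stepping back one day is just one simple way to attribute an end-of-month value to the month it closes.

```python
from datetime import date, timedelta
import calendar

ORIGIN = date(2019, 1, 1)  # matches the "days since 2019" units

def monthly_bounds_for(day_offset, stored=False):
    """Bounds (in days since ORIGIN) of the month a time value belongs to.

    With stored=True the value is assumed to sit at the END of its month,
    so we step back one day before looking up the month.
    """
    when = ORIGIN + timedelta(days=day_offset)
    if stored:
        when -= timedelta(days=1)
    first = date(when.year, when.month, 1)
    length = calendar.monthrange(when.year, when.month)[1]
    start = (first - ORIGIN).days
    return (start, start + length)

print(monthly_bounds_for(31))               # (31, 59): treated as February
print(monthly_bounds_for(31, stored=True))  # (0, 31): treated as January
```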

Other convenience functions are available: `setTimeBoundsYearly` and `setTimeBoundsDaily`.

`setTimeBoundsDaily` takes a `frequency` argument, the number of time steps per day, which lets you define sub-daily frequencies:

e.g. for twice-daily data use frequency=2
     for 6 hourly data use frequency=4
     for   hourly data use frequency=24


In [8]:
my_time_axis = cdms2.createAxis(list(range(0, 96, 6)))
my_time_axis.units = 'hours since 2019'
my_time_axis.id = 'time'
my_time_axis.designateTime()
print(my_time_axis.asComponentTime())

[2019-1-1 0:0:0.0, 2019-1-1 6:0:0.0, 2019-1-1 12:0:0.0, 2019-1-1 18:0:0.0, 2019-1-2 0:0:0.0, 2019-1-2 6:0:0.0, 2019-1-2 12:0:0.0, 2019-1-2 18:0:0.0, 2019-1-3 0:0:0.0, 2019-1-3 6:0:0.0, 2019-1-3 12:0:0.0, 2019-1-3 18:0:0.0, 2019-1-4 0:0:0.0, 2019-1-4 6:0:0.0, 2019-1-4 12:0:0.0, 2019-1-4 18:0:0.0]


In [9]:
cdutil.setTimeBoundsDaily(my_time_axis, frequency=4)
my_time_axis.getBounds()[:3]

array([[ 0.,  6.],
       [ 6., 12.],
       [12., 18.]])
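
The relation between `frequency` and the cell width can be sketched in plain Python. This is a hedged illustration that assumes the time values are expressed in hours since some midnight; `subdaily_bounds` is a hypothetical helper, not part of `cdutil`.

```python
import math

def subdaily_bounds(hours, frequency):
    """Bounds in hours for time values given in hours since some midnight."""
    width = 24 / frequency  # length of one cell in hours
    bounds = []
    for t in hours:
        # Snap each value down to the start of the cell that contains it
        start = math.floor(t / width) * width
        bounds.append((start, start + width))
    return bounds

six_hourly = subdaily_bounds(range(0, 96, 6), frequency=4)
print(six_hourly[:3])  # [(0.0, 6.0), (6.0, 12.0), (12.0, 18.0)]
```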

## Temporal Averaging


### Predefined time averaging functions



Averaging over time is a special problem in climate data analysis. The cdutil package pays special attention to this issue to make the extraction of time averages and climatologies simple. Apart from functions that enable easy computation of annual, seasonal and monthly averages and climatologies, one can also define seasons other than those already available and specify criteria for data availability and temporal distribution to suit individual needs.

Note: It is essential that the data have an appropriate axis designated as the “time” axis. In addition, the results depend on the time axis having correctly set bounds. If bounds are not stored with the data in files, default bounds are generated by the data extraction steps in cdms. However, they are not always correct, so the user must verify that the bounds are set correctly. The default time bounds set by cdms put each time point in the middle of its cell (for example, time axis values of 0, 1, ... give bounds of [-0.5, 0.5], [0.5, 1.5], ...), which does not always match calendar months; in that case the user can reset them with the `setTimeBoundsMonthly` function. The examples in the rest of this section use ten years of monthly `ts` data, loaded as follows:

In [10]:
ipsl_ts_file = cdms2.open("/global/cscratch1/sd/cmip6/CMIP6/CMIP/IPSL/IPSL-CM6A-LR/historical/r1i1p1f1/Amon/ts/gr/v20180803/ts_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc")
ts = ipsl_ts_file('ts', time=("2000", "2010", "con") ) # 10 years
print(ts.shape)

(120, 143, 144)
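
To see why the default bounds mentioned in the note may not match calendar months, here is a minimal sketch of a midpoint rule in plain Python. It assumes cell edges are placed halfway between neighbouring time values, as the note describes; `midpoint_bounds` is a hypothetical helper and not the cdms routine itself.

```python
def midpoint_bounds(values):
    """Generic bounds: cell edges halfway between neighbouring points."""
    edges = [(a + b) / 2 for a, b in zip(values, values[1:])]
    first = values[0] - (edges[0] - values[0])    # extrapolate the left edge
    last = values[-1] + (values[-1] - edges[-1])  # extrapolate the right edge
    edges = [first] + edges + [last]
    return list(zip(edges, edges[1:]))

print(midpoint_bounds([0, 1, 2]))
# [(-0.5, 0.5), (0.5, 1.5), (1.5, 2.5)]
print(midpoint_bounds([15.5, 45.0, 74.5]))
# [(0.75, 30.25), (30.25, 59.75), (59.75, 89.25)]
```

For the mid-month values 15.5, 45.0 and 74.5 (days since 2019), the true month edges are 0, 31, 59 and 90, so the midpoint bounds are close but not exact.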


The predefined time averaging periods are:

    JAN, FEB, MAR, ..., DEC (months)
    DJF, MAM, JJA, SON (seasons)
    YEAR (annual means)
    ANNUALCYCLE (monthly means for each month of the year)
    SEASONALCYCLE (means for the 4 predefined seasons)

Some simple examples of time averaging operations are shown here.

In [18]:
# The individual DJF (December-January-February)
# seasons are extracted using
djfs = cdutil.DJF(ts)
print(djfs.shape)  # 11 seasons: partial DJFs at the start (Jan-Feb 2000) and end (Dec 2009) of the 10-year period
# To compute the DJF climatology of the variable
djfclim = cdutil.DJF.climatology(ts)
print(djfclim.shape)
# To extract DJF seasonal anomalies (from climatology)
djf_anom = cdutil.DJF.departures(ts)
print(djf_anom.shape)
# The monthly anomalies (departures from the annual cycle) are computed by:
ac_anom = cdutil.ANNUALCYCLE.departures(ts)
print(ac_anom.shape, ts.max(),ac_anom.max())
print(djfs[0].max(), djfs[-1].max())

(11, 143, 144)
(1, 143, 144)
(11, 143, 144)
(120, 143, 144) 317.08307 15.543907165527344
310.1746653238932 311.6000671386719
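
The count of 11 DJF seasons in 10 calendar years follows from how months are grouped. The sketch below uses only plain Python and assumes that each December is attached to the season ending in the following year; it illustrates the grouping and is not `cdutil`'s own code.

```python
from collections import defaultdict

# Monthly time steps for January 2000 through December 2009 (120 months)
months = [(y, m) for y in range(2000, 2010) for m in range(1, 13)]

seasons = defaultdict(list)
for year, month in months:
    if month in (12, 1, 2):
        # December belongs to the DJF season that ends the following year
        season_year = year + 1 if month == 12 else year
        seasons[season_year].append((year, month))

print(len(seasons))   # 11
print(seasons[2000])  # [(2000, 1), (2000, 2)]
print(seasons[2010])  # [(2009, 12)]
```

The first and last seasons are incomplete, which matters once data coverage criteria are applied.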


### Creating Custom Seasons

You can even create your own “custom seasons” beyond the pre-defined seasons listed above. For example:

In [12]:
JJAS = cdutil.times.Seasons('JJAS')
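
The predefined names suggest that a season name is read as a run of consecutive month initials. Under that assumption, the mapping from a name to month numbers can be sketched as follows; `season_months` is a hypothetical helper for illustration and not the parser used by `cdutil`.

```python
MONTH_INITIALS = "JFMAMJJASOND"

def season_months(name):
    """Map a string of month initials (e.g. 'JJAS') to month numbers 1..12."""
    doubled = MONTH_INITIALS * 2  # lets seasons wrap past December
    start = doubled.find(name.upper())
    if start < 0 or len(name) > 12:
        raise ValueError(f"not a run of consecutive months: {name!r}")
    return [(start + i) % 12 + 1 for i in range(len(name))]

print(season_months("JJAS"))  # [6, 7, 8, 9]
print(season_months("DJF"))   # [12, 1, 2]
```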

### Specifying Time Periods for Climatologies

So far we have seen how to compute the means, climatologies, and anomalies for the entire length of the time series. A typical application may require climatologies to be computed over a specified time interval and then used in calculating departures. For example, to compute the DJF climatology for the period 2002-2007 we would do the following:



In [13]:
import cdtime
start_time = cdtime.comptime(2002)
print('start_time = ', start_time)
end_time = cdtime.comptime(2008)
print('end_time = ', end_time)

start_time =  2002-1-1 0:0:0.0
end_time =  2008-1-1 0:0:0.0


Note that we created the time point `end_time` at the beginning of 2008, so we can select all the time steps between `start_time` and `end_time`, excluding `end_time` itself, by specifying the option `'con'`. Its first two characters, `co`, are shorthand for closed at `start_time` and open at `end_time`. More options and details about them can be found in the Climate Data Management System Manual.
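
The closed/open selection can be illustrated with the standard library. This sketch assumes monthly time points on the first of each month and is not how cdms2 evaluates the selector; it only shows which time steps a half-open interval keeps.

```python
from datetime import date

# First-of-month time points for January 2000 through December 2009
times = [date(y, m, 1) for y in range(2000, 2010) for m in range(1, 13)]

start_time = date(2002, 1, 1)
end_time = date(2008, 1, 1)

# 'co': closed at the start (<=), open at the end (<)
selected = [t for t in times if start_time <= t < end_time]

print(len(selected))              # 72
print(selected[0], selected[-1])  # 2002-01-01 2007-12-01
```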

In [14]:
djfclim = cdutil.DJF.climatology(ts(time=(start_time, end_time, 'con')))

Now that we have our climatology over the desired period, we can compute anomalies over the full period relative to that climatology.



In [15]:
djfdep2 = cdutil.DJF.departures(ts, ref=djfclim)
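
The arithmetic behind departures from a reference climatology is simple, as the following sketch shows. The seasonal means are hypothetical numbers chosen for illustration, and the climatology is assumed to be an unweighted mean of the seasonal means in the reference period.

```python
# Hypothetical DJF seasonal means (one value per season, in kelvin)
djf_means = {2001: 287.0, 2002: 287.4, 2003: 287.1, 2004: 287.6, 2005: 287.9}

# Climatology computed from a reference sub-period only
ref_years = range(2002, 2005)  # 2002, 2003, 2004
ref_clim = sum(djf_means[y] for y in ref_years) / len(ref_years)

# Departures for every season, relative to that reference climatology
departures = {y: v - ref_clim for y, v in djf_means.items()}

print(round(ref_clim, 2))          # 287.37
print(round(departures[2005], 2))  # 0.53
```

By construction, the departures within the reference period sum to zero, while seasons outside it can be offset.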

### Specifying Data Coverage Criteria

The real power of these functions lies in the ability to specify the minimum data coverage, and the distribution of that data in time, required for an average to be computed. By default, the functions that compute seasonal averages, climatologies, etc. only require that some data be present. Now let’s say you would like to extract DJF, but only for seasons where at least 50% of the data is present. You would do:

In [21]:
djfs = cdutil.DJF(ts, criteriaarg=[.5, None])
print(djfs[0].max(), djfs[-1].max())

310.1746653238932 --


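The above statement computes the DJF average with `criteriaarg` (passed as a list), which takes two arguments. The last season is now masked (`--`) because it contains only December 2009, which is less than 50% of a DJF season.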

    The first argument is the minimum fraction of time that must have data for the seasonal mean to be computed. You can pass a value between 0.0 and 1.0 (inclusive), or an expression such as 3.0/4.0 (if you need at least 3 out of 4 months of data, as for the JJAS season defined previously).
    The second argument in this `criteriaarg` is None, which means no “centroid function” is used. Otherwise, this argument is the maximum allowed value of the “centroid function”, a number between 0 and 1 that measures how unevenly the available data is spread around the middle of the time interval. A centroid of 0.0 means the data is evenly distributed across the interval: for a DJF average, if data is available for Dec, Jan and Feb, the centroid is 0.0. If only December has data, the centroid is 1.0. The following criteria therefore “mask” (ignore) any DJF season whose centroid is above 0.5, such as one with only December present:
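
The minimum-fraction criterion can be sketched with the standard library. This is an illustration under assumptions rather than `cdutil`'s implementation: coverage is weighted by Gregorian month lengths, and `djf_coverage` is a hypothetical helper.

```python
import calendar

def djf_coverage(season_year, available):
    """Fraction of a DJF season's days covered by the available months.

    `season_year` is the year of the season's January and February;
    `available` is a set of (year, month) pairs that have data.
    """
    season = [(season_year - 1, 12), (season_year, 1), (season_year, 2)]
    days = {ym: calendar.monthrange(*ym)[1] for ym in season}
    covered = sum(n for ym, n in days.items() if ym in available)
    return covered / sum(days.values())

# Data from January 2000 through December 2009, as in the example above
available = {(y, m) for y in range(2000, 2010) for m in range(1, 13)}

for season_year in (2000, 2010):
    fraction = djf_coverage(season_year, available)
    status = "kept" if fraction >= 0.5 else "masked"
    print(season_year, round(fraction, 2), status)
# 2000 0.66 kept
# 2010 0.34 masked
```

With a threshold of 0.5, the first season (January and February only) survives while the last one (December only) is masked, which is consistent with the `--` printed earlier.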


In [22]:
djfs = cdutil.DJF(ts, criteriaarg = [.5, .5])

In the case of computing an annual mean, having data only in Jan and Dec months leads to a centroid value of 0 for the regular centroid, and the resulting annual mean for the year is biased toward the winter. In this situation, you should use a cyclical centroid where the circular nature of the year is recognised and the centroid is calculated accordingly. Here are some examples of typical usage:

1. Default behaviour, i.e. `criteriaarg=[0., None]`.

In [23]:
annavg_1 = cdutil.YEAR(ts)

2. Compute the annual average only when at least half of the months have data.

In [24]:
annavg_2 = cdutil.YEAR(ts, criteriaarg = [0.5,None])

3. Compute the annual average only when a minimum number of months (8 out of 12) have data.

In [25]:
annavg_3 = cdutil.YEAR(ts, criteriaarg = [8./12., None])

4. Same criteria as in 3, but accounting for the fact that a year is cyclical, i.e. Dec and Jan are adjacent months. The centroid is therefore computed over a circle where Dec and Jan are contiguous.

In [26]:
annavg_4 = cdutil.YEAR(ts, criteriaarg = [8./12., 0.1, 'cyclical'])
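
To see why a cyclical centroid helps in the Jan and Dec case described earlier, the sketch below compares a linear centroid with a circular one. Both formulas are illustrative assumptions with equal weight per month, not the exact definitions used by `cdutil`.

```python
import cmath
import math

def linear_centroid(months):
    """|mean position| of the months, with the year mapped onto [-1, 1]."""
    positions = [2 * (m - 0.5) / 12 - 1 for m in months]
    return abs(sum(positions) / len(positions))

def circular_centroid(months):
    """Length of the mean unit vector when months are placed on a circle."""
    vectors = [cmath.exp(2j * math.pi * (m - 0.5) / 12) for m in months]
    return abs(sum(vectors) / len(vectors))

jan_dec = [1, 12]
print(round(linear_centroid(jan_dec), 3))         # 0.0
print(round(circular_centroid(jan_dec), 3))       # 0.966
print(round(circular_centroid(range(1, 13)), 3))  # 0.0
```

On a line, January and December balance each other and look evenly spread. On a circle they sit next to each other, so the measure is close to 1 and a threshold such as 0.1 would reject that year.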