Still need to figure out:
* Running mean

In [45]:
import numpy
import pandas
import xray; print(xray.__version__)

0.5.0


### Reading data

In [46]:
#infile = '/Users/damienirving/Downloads/Data/va_ERAInterim_500hPa_2006-030day-runmean_native.nc'
infile = '/Users/damienirving/Downloads/Data/sic_ERAInterim_surface_1993-1994-daily_native-shextropics30.nc'
dset = xray.open_dataset(infile)

In [47]:
dset

<xray.Dataset>
Dimensions:    (latitude: 81, longitude: 480, time: 730)
Coordinates:
  * longitude  (longitude) float32 0.0 0.75 1.5 2.25 3.0 3.75 4.5 5.25 6.0 ...
  * latitude   (latitude) float32 -90.0 -89.25 -88.5 -87.75 -87.0 -86.25 ...
  * time       (time) datetime64[ns] 1993-01-01T18:00:00 1993-01-02T18:00:00 ...
Data variables:
    sic        (time, latitude, longitude) float64 nan nan nan nan nan nan ...
Attributes:
    CDI: Climate Data Interface version 1.5.3 (http://code.zmaw.de/projects/cdi)
    Conventions: CF-1.0
    history: Mon Jun 22 16:13:38 2015: cdo seldate,1993-01-01,1994-12-31 -sellonlatbox,0,359.9,-90,-30 sic_ERAInterim_surface_daily_native.nc sic_ERAInterim_surface_1993-1994-daily_native-shextropics30.nc
Thu Mar 19 13:05:10 2015: ncatted -O -a axis,time,c,c,T /mnt/meteo0/data/simmonds/dbirving/ERAInterim/data/sic_ERAInterim_surface_daily_native.nc
Thu Mar 19 11:42:24 2015: ncatted -O -a missing_value,sic,o,f,-32767. /mnt/meteo0/data/simmonds/dbirving/ERAInterim

In [48]:
data = dset[['sic']]
type(data)

xray.core.dataset.Dataset

In [49]:
data

<xray.Dataset>
Dimensions:    (latitude: 81, longitude: 480, time: 730)
Coordinates:
  * latitude   (latitude) float32 -90.0 -89.25 -88.5 -87.75 -87.0 -86.25 ...
  * longitude  (longitude) float32 0.0 0.75 1.5 2.25 3.0 3.75 4.5 5.25 6.0 ...
  * time       (time) datetime64[ns] 1993-01-01T18:00:00 1993-01-02T18:00:00 ...
Data variables:
    sic        (time, latitude, longitude) float64 nan nan nan nan nan nan ...
Attributes:
    CDI: Climate Data Interface version 1.5.3 (http://code.zmaw.de/projects/cdi)
    Conventions: CF-1.0
    history: Mon Jun 22 16:13:38 2015: cdo seldate,1993-01-01,1994-12-31 -sellonlatbox,0,359.9,-90,-30 sic_ERAInterim_surface_daily_native.nc sic_ERAInterim_surface_1993-1994-daily_native-shextropics30.nc
Thu Mar 19 13:05:10 2015: ncatted -O -a axis,time,c,c,T /mnt/meteo0/data/simmonds/dbirving/ERAInterim/data/sic_ERAInterim_surface_daily_native.nc
Thu Mar 19 11:42:24 2015: ncatted -O -a missing_value,sic,o,f,-32767. /mnt/meteo0/data/simmonds/dbirving/ERAInterim

In [50]:
dset['sic'].attrs

OrderedDict([(u'standard_name', u'sea_ice_fraction'), (u'long_name', u'sea_ice_fraction'), (u'units', u'(0 - 1)')])

In [51]:
# Get the order of the dimensions
data['sic'].dims

(u'time', u'latitude', u'longitude')

#### Missing values

[This page](http://xray.readthedocs.org/en/stable/io.html) suggests that xray recognises the `_FillValue` attribute.

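However, once you've read the data in, the `_FillValue` attribute no longer appears among the variable's attributes (xray uses it on reading to replace missing values with NaN, as the check below shows). When you write the output file, there is no `_FillValue` attribute, and any missing values appear as `NaNf` when you inspect the file with ncdump. This doesn't appear to be a problem for cdo.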

In [52]:
missing_data = data.sel(longitude=3.75, latitude=-81)['sic'].values
numpy.isnan(missing_data[0])

True
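As a rough illustration of what this masking amounts to, here is a NumPy sketch of CF-style decoding (fill value to NaN) and encoding (NaN back to fill value). It is not xray's actual implementation. The fill value of -32767 is taken from the `missing_value` entry in the file history above, and the data values are made up. Recent xarray releases accept an `encoding` argument to `to_netcdf` (e.g. `encoding={'sic': {'_FillValue': -32767.}}`) for writing a fill value back out; whether xray 0.5 supports this has not been checked here.

```python
import numpy as np

# Fill value set on 'sic' according to the file history (ncatted ... missing_value ... -32767.)
FILL = -32767.0

# Hypothetical raw values as stored on disk
raw = np.array([FILL, 0.0, 0.35, 1.0, FILL])

# Decoding (what happens on read): fill values become NaN
decoded = np.where(raw == FILL, np.nan, raw)

# Encoding (before writing): NaN becomes the fill value again
encoded = np.where(np.isnan(decoded), FILL, decoded)
```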

### Climatologies

You can group by `month`, `day`, `dayofyear`, `dayofweek` or any other datetime component of the [`pandas.DatetimeIndex` class](http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DatetimeIndex.html), using the `'time.<component>'` syntax.
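To see what `groupby('time.month')` is doing, here is a pandas-only sketch of the same monthly climatology on a 1-D series. The values are hypothetical (each day's index), chosen only so the result is easy to check; the dates match the 730-day record read above.

```python
import numpy as np
import pandas as pd

# Daily timestamps for 1993-1994, matching the 730-day record above
times = pd.date_range('1993-01-01', '1994-12-31', freq='D')

# Hypothetical stand-in values: the day index, so group means are easy to verify
series = pd.Series(np.arange(len(times), dtype=float), index=times)

# 'time.month' corresponds to the DatetimeIndex.month component (integers 1..12)
months = times.month

# Monthly climatology: average every value that falls in the same calendar month
mon_clim = series.groupby(months).mean()
```

January's mean here pools 31 days from 1993 with 31 days from 1994, just as the xray climatology pools each month across both years.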

#### Monthly

In [53]:
mon_clim = data.groupby('time.month').mean('time')

In [54]:
mon_clim

<xray.Dataset>
Dimensions:    (latitude: 81, longitude: 480, month: 12)
Coordinates:
  * latitude   (latitude) float32 -90.0 -89.25 -88.5 -87.75 -87.0 -86.25 ...
  * longitude  (longitude) float32 0.0 0.75 1.5 2.25 3.0 3.75 4.5 5.25 6.0 ...
  * month      (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Data variables:
    sic        (month, latitude, longitude) float64 nan nan nan nan nan nan ...

#### Daily

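(Feb 28 is day 59. In a leap year, Feb 29 is day 60, so every later date is shifted by one. Neither 1993 nor 1994 is a leap year, hence the 365 `dayofyear` values below.)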

In [55]:
day_clim = data.groupby('time.dayofyear').mean('time')
day_clim

<xray.Dataset>
Dimensions:    (dayofyear: 365, latitude: 81, longitude: 480)
Coordinates:
  * latitude   (latitude) float32 -90.0 -89.25 -88.5 -87.75 -87.0 -86.25 ...
  * longitude  (longitude) float32 0.0 0.75 1.5 2.25 3.0 3.75 4.5 5.25 6.0 ...
  * dayofyear  (dayofyear) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
Data variables:
    sic        (dayofyear, latitude, longitude) float64 nan nan nan nan nan ...
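Because `dayofyear` is an ordinal position rather than a calendar date, a leap year puts March 1 in a different group from March 1 of a common year. A quick pandas check of the numbering (the years are arbitrary examples):

```python
import pandas as pd

# Common year: Feb 28 is day 59 and Mar 1 is day 60
feb28_common = pd.Timestamp('1993-02-28').dayofyear
mar1_common = pd.Timestamp('1993-03-01').dayofyear

# Leap year: Feb 29 takes day 60, pushing Mar 1 to day 61
feb29_leap = pd.Timestamp('1996-02-29').dayofyear
mar1_leap = pd.Timestamp('1996-03-01').dayofyear
```

This doesn't affect the 1993-1994 data, but any record spanning a leap year produces 366 groups, with each post-February group mixing slightly different calendar dates.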

#### Daily anomaly 

In [56]:
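# Synthetic daily series: 4000 days of random numbers starting 2000-01-01
dummy_dset = xray.Dataset({'foo': ('time', numpy.random.randn(4000)),
                           'time': pandas.date_range('2000-01-01', periods=4000)})
# Daily climatology: mean over all years for each day of the year
clim = dummy_dset.groupby('time.dayofyear').mean('time')
# Anomaly: subtract the matching day-of-year climatology from every time step
anom = dummy_dset.groupby('time.dayofyear') - clim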

In [57]:
dummy_dset

<xray.Dataset>
Dimensions:  (time: 4000)
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 ...
Data variables:
    foo      (time) float64 -0.5174 1.542 -1.078 0.7001 -0.7263 -0.6313 ...

In [58]:
clim

<xray.Dataset>
Dimensions:    (dayofyear: 366)
Coordinates:
  * dayofyear  (dayofyear) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ...
Data variables:
    foo        (dayofyear) float64 -0.3199 0.3197 -0.333 -0.1106 -0.2096 ...
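The same climatology-then-subtract pattern can be sketched in pandas alone, which may be handy for checking the xray result. This assumes a fixed random seed so the series is reproducible; `transform('mean')` broadcasts each day-of-year mean back onto every time step in its group, which is the effect of `dummy_dset.groupby('time.dayofyear') - clim`.

```python
import numpy as np
import pandas as pd

# Synthetic daily series like dummy_dset, with a fixed seed for reproducibility
times = pd.date_range('2000-01-01', periods=4000, freq='D')
foo = np.random.default_rng(0).standard_normal(len(times))
series = pd.Series(foo, index=times)

# Daily climatology: one mean per day of year (366 groups, as leap years are present)
doy = times.dayofyear
clim = series.groupby(doy).mean()

# Anomaly: subtract each value's own day-of-year mean
anom = series - series.groupby(doy).transform('mean')
```

By construction, the anomalies in each day-of-year group average to zero.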

### Selection

It's possible to select a single dimension value or a slice of values:

In [60]:
data.sel(time='1993-03-25')

<xray.Dataset>
Dimensions:    (latitude: 81, longitude: 480, time: 1)
Coordinates:
  * latitude   (latitude) float32 -90.0 -89.25 -88.5 -87.75 -87.0 -86.25 ...
  * longitude  (longitude) float32 0.0 0.75 1.5 2.25 3.0 3.75 4.5 5.25 6.0 ...
  * time       (time) datetime64[ns] 1993-03-25T18:00:00
Data variables:
    sic        (time, latitude, longitude) float64 nan nan nan nan nan nan ...
Attributes:
    CDI: Climate Data Interface version 1.5.3 (http://code.zmaw.de/projects/cdi)
    Conventions: CF-1.0
    history: Mon Jun 22 16:13:38 2015: cdo seldate,1993-01-01,1994-12-31 -sellonlatbox,0,359.9,-90,-30 sic_ERAInterim_surface_daily_native.nc sic_ERAInterim_surface_1993-1994-daily_native-shextropics30.nc
Thu Mar 19 13:05:10 2015: ncatted -O -a axis,time,c,c,T /mnt/meteo0/data/simmonds/dbirving/ERAInterim/data/sic_ERAInterim_surface_daily_native.nc
Thu Mar 19 11:42:24 2015: ncatted -O -a missing_value,sic,o,f,-32767. /mnt/meteo0/data/simmonds/dbirving/ERAInterim/data/sic_ERAInterim_surfa

In [61]:
data.sel(time=slice('1993-03-25', '1993-04-25'))

<xray.Dataset>
Dimensions:    (latitude: 81, longitude: 480, time: 32)
Coordinates:
  * latitude   (latitude) float32 -90.0 -89.25 -88.5 -87.75 -87.0 -86.25 ...
  * longitude  (longitude) float32 0.0 0.75 1.5 2.25 3.0 3.75 4.5 5.25 6.0 ...
  * time       (time) datetime64[ns] 1993-03-25T18:00:00 1993-03-26T18:00:00 ...
Data variables:
    sic        (time, latitude, longitude) float64 nan nan nan nan nan nan ...
Attributes:
    CDI: Climate Data Interface version 1.5.3 (http://code.zmaw.de/projects/cdi)
    Conventions: CF-1.0
    history: Mon Jun 22 16:13:38 2015: cdo seldate,1993-01-01,1994-12-31 -sellonlatbox,0,359.9,-90,-30 sic_ERAInterim_surface_daily_native.nc sic_ERAInterim_surface_1993-1994-daily_native-shextropics30.nc
Thu Mar 19 13:05:10 2015: ncatted -O -a axis,time,c,c,T /mnt/meteo0/data/simmonds/dbirving/ERAInterim/data/sic_ERAInterim_surface_daily_native.nc
Thu Mar 19 11:42:24 2015: ncatted -O -a missing_value,sic,o,f,-32767. /mnt/meteo0/data/simmonds/dbirving/ERAInterim/

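For an arbitrary list of dimension values, you need to provide values that match the coordinates exactly. For a single date or a slice, xray/pandas can interpret a partial date string such as `'1993-03-25'`, even though the stored times are at 18:00.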

In [62]:
data['latitude'][0:20]

<xray.DataArray 'latitude' (latitude: 20)>
array([-90.  , -89.25, -88.5 , -87.75, -87.  , -86.25, -85.5 , -84.75,
       -84.  , -83.25, -82.5 , -81.75, -81.  , -80.25, -79.5 , -78.75,
       -78.  , -77.25, -76.5 , -75.75], dtype=float32)
Coordinates:
  * latitude  (latitude) float32 -90.0 -89.25 -88.5 -87.75 -87.0 -86.25 ...
Attributes:
    standard_name: latitude
    long_name: latitude
    units: degrees_north
    axis: Y

In [63]:
data.sel(latitude=[-89.25, -81])

<xray.Dataset>
Dimensions:    (latitude: 2, longitude: 480, time: 730)
Coordinates:
  * latitude   (latitude) float32 -89.25 -81.0
  * longitude  (longitude) float32 0.0 0.75 1.5 2.25 3.0 3.75 4.5 5.25 6.0 ...
  * time       (time) datetime64[ns] 1993-01-01T18:00:00 1993-01-02T18:00:00 ...
Data variables:
    sic        (time, latitude, longitude) float64 nan nan nan nan nan nan ...
Attributes:
    CDI: Climate Data Interface version 1.5.3 (http://code.zmaw.de/projects/cdi)
    Conventions: CF-1.0
    history: Mon Jun 22 16:13:38 2015: cdo seldate,1993-01-01,1994-12-31 -sellonlatbox,0,359.9,-90,-30 sic_ERAInterim_surface_daily_native.nc sic_ERAInterim_surface_1993-1994-daily_native-shextropics30.nc
Thu Mar 19 13:05:10 2015: ncatted -O -a axis,time,c,c,T /mnt/meteo0/data/simmonds/dbirving/ERAInterim/data/sic_ERAInterim_surface_daily_native.nc
Thu Mar 19 11:42:24 2015: ncatted -O -a missing_value,sic,o,f,-32767. /mnt/meteo0/data/simmonds/dbirving/ERAInterim/data/sic_ERAInterim_surface_da

In [64]:
data['time'][0:10]

<xray.DataArray 'time' (time: 10)>
array(['1993-01-02T05:00:00.000000000+1100',
       '1993-01-03T05:00:00.000000000+1100',
       '1993-01-04T05:00:00.000000000+1100',
       '1993-01-05T05:00:00.000000000+1100',
       '1993-01-06T05:00:00.000000000+1100',
       '1993-01-07T05:00:00.000000000+1100',
       '1993-01-08T05:00:00.000000000+1100',
       '1993-01-09T05:00:00.000000000+1100',
       '1993-01-10T05:00:00.000000000+1100',
       '1993-01-11T05:00:00.000000000+1100'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 1993-01-01T18:00:00 1993-01-02T18:00:00 ...
Attributes:
    standard_name: time

In [65]:
# numpy prints these times in local time (UTC+11 here); the stored values are 18:00 UTC
data.sel(time=[numpy.datetime64('1993-01-03T18:00:00'),
               numpy.datetime64('1993-01-09T18:00:00')])

<xray.Dataset>
Dimensions:    (latitude: 81, longitude: 480, time: 2)
Coordinates:
  * latitude   (latitude) float32 -90.0 -89.25 -88.5 -87.75 -87.0 -86.25 ...
  * longitude  (longitude) float32 0.0 0.75 1.5 2.25 3.0 3.75 4.5 5.25 6.0 ...
  * time       (time) datetime64[ns] 1993-01-03T18:00:00 1993-01-09T18:00:00
Data variables:
    sic        (time, latitude, longitude) float64 nan nan nan nan nan nan ...
Attributes:
    CDI: Climate Data Interface version 1.5.3 (http://code.zmaw.de/projects/cdi)
    Conventions: CF-1.0
    history: Mon Jun 22 16:13:38 2015: cdo seldate,1993-01-01,1994-12-31 -sellonlatbox,0,359.9,-90,-30 sic_ERAInterim_surface_daily_native.nc sic_ERAInterim_surface_1993-1994-daily_native-shextropics30.nc
Thu Mar 19 13:05:10 2015: ncatted -O -a axis,time,c,c,T /mnt/meteo0/data/simmonds/dbirving/ERAInterim/data/sic_ERAInterim_surface_daily_native.nc
Thu Mar 19 11:42:24 2015: ncatted -O -a missing_value,sic,o,f,-32767. /mnt/meteo0/data/simmonds/dbirving/ERAInterim/data/
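The difference between slicing with partial date strings and selecting a list of exact values can be reproduced with a plain pandas Series indexed like `data['time']` (daily steps at 18:00 UTC). This is a sketch with stand-in values, assuming a recent pandas in which `.loc` raises `KeyError` when list labels are absent from the index.

```python
import pandas as pd

# Daily steps at 18:00, like the time coordinate above; values are stand-ins
times = pd.date_range('1993-01-01 18:00', periods=730, freq='D')
s = pd.Series(range(730), index=times)

# Partial date strings in a slice are resolved by pandas; both ends are inclusive
window = s.loc['1993-03-25':'1993-04-25']

# A list of labels must match exactly, including the 18:00 time of day
picked = s.loc[[pd.Timestamp('1993-01-03 18:00'), pd.Timestamp('1993-01-09 18:00')]]

# Midnight on the same date is not in the index, so the lookup fails
try:
    s.loc[[pd.Timestamp('1993-01-03')]]
    exact_required = False
except KeyError:
    exact_required = True
```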

### Maths / stats

#### Basic stats along a dimension

In [73]:
outdata = data.mean(dim='time')
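Since `sic` is NaN at some grid points (e.g. latitude -81, longitude 3.75 above), it matters whether the mean skips missing values. Recent xarray releases skip NaN by default for floating-point data (the `skipna` argument); xray 0.5's behaviour has not been checked here. The difference, sketched in NumPy on a hypothetical (time, latitude, longitude) array:

```python
import numpy as np

# Hypothetical (time, latitude, longitude) array with one missing value
arr = np.array([[[0.2, np.nan]],
                [[0.4, 0.6]]])  # shape (2, 1, 2)

# Plain mean over time: the NaN propagates to that grid point
plain = arr.mean(axis=0)

# NaN-skipping mean over time: averages only the valid values
skipping = np.nanmean(arr, axis=0)
```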

#### Running mean

I still need to figure out how to apply the pandas [moving (rolling) statistics](http://pandas.pydata.org/pandas-docs/stable/computation.html#moving-rolling-statistics-moments) to an `xray.DataArray`. [This page](http://xray.readthedocs.org/en/stable/pandas.html) describes converting between xray and pandas objects, which may be one way to do it.
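In the meantime, a running mean along the time axis can be sketched directly in NumPy on the array from `.values`. This assumes NumPy 1.20 or later (for `sliding_window_view`) and trims the ends rather than padding them, so a 30-day window on 730 days would return 701 steps. Later versions of xarray (the renamed xray) also provide a `rolling` method on DataArrays.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def running_mean(arr, window):
    """Running mean over `window` consecutive time steps (axis 0).

    Returns arr.shape[0] - window + 1 steps with no padding, so the result
    is shorter than the input. A NaN affects only the windows containing it.
    """
    arr = np.asarray(arr, dtype=float)
    # View of shape (n - window + 1, *arr.shape[1:], window); no data is copied
    windows = sliding_window_view(arr, window, axis=0)
    # Average over the trailing window axis
    return windows.mean(axis=-1)


# Usage on a (time, latitude, longitude)-shaped array
field = np.arange(10.0).reshape(10, 1, 1)
smoothed = running_mean(field, 3)

# A single NaN only spoils the windows that include it
with_gap = running_mean(np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0]), 2)
```

For a single 1-D series, `pandas.Series(x).rolling(window).mean()` gives the same values but keeps the original length, with NaN in the first `window - 1` positions.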

### Writing to file

In [74]:
outdata

<xray.Dataset>
Dimensions:    (latitude: 81, longitude: 480)
Coordinates:
  * latitude   (latitude) float32 -90.0 -89.25 -88.5 -87.75 -87.0 -86.25 ...
  * longitude  (longitude) float32 0.0 0.75 1.5 2.25 3.0 3.75 4.5 5.25 6.0 ...
Data variables:
    sic        (latitude, longitude) float64 nan nan nan nan nan nan nan nan ...

In [75]:
outdata.attrs = dset.attrs

In [76]:
outdata.attrs

OrderedDict([(u'CDI', u'Climate Data Interface version 1.5.3 (http://code.zmaw.de/projects/cdi)'), (u'Conventions', u'CF-1.0'), (u'history', u'Mon Jun 22 16:13:38 2015: cdo seldate,1993-01-01,1994-12-31 -sellonlatbox,0,359.9,-90,-30 sic_ERAInterim_surface_daily_native.nc sic_ERAInterim_surface_1993-1994-daily_native-shextropics30.nc\nThu Mar 19 13:05:10 2015: ncatted -O -a axis,time,c,c,T /mnt/meteo0/data/simmonds/dbirving/ERAInterim/data/sic_ERAInterim_surface_daily_native.nc\nThu Mar 19 11:42:24 2015: ncatted -O -a missing_value,sic,o,f,-32767. /mnt/meteo0/data/simmonds/dbirving/ERAInterim/data/sic_ERAInterim_surface_daily_native.nc\nWed Mar 18 17:08:32 2015: cdo mergetime sic_ERAInterim_surface_daily-2014-11-01-to-2014-12-31_native.nc ../data/sic_ERAInterim_surface_daily_native_orig.nc ../data/sic_ERAInterim_surface_daily_native.nc\nMon Jan  5 18:29:47 2015: ncatted -O -a long_name,sic,o,c,sea_ice_fraction sic_ERAInterim_surface_daily_native.nc\nMon Jan  5 18:28:45 2015: ncatted -O 

In [77]:
outdata['sic'].attrs = dset['sic'].attrs

In [78]:
outdata['sic'].attrs

OrderedDict([(u'standard_name', u'sea_ice_fraction'), (u'long_name', u'sea_ice_fraction'), (u'units', u'(0 - 1)')])

In [79]:
outdata.to_netcdf('test_xray.nc', format='NETCDF3_CLASSIC')