# Subsetting a dataset by time and space (Slicing and Dicing)

xarray creates labelled coordinate indexes for CF-compliant data. This makes selecting subsets of the data in time and space straightforward.

This tutorial covers some of the common usage patterns for subsetting data.

Reload the library and the dataset from the previous notebook:

In [1]:
import xarray

In [2]:
ds = xarray.open_dataset('http://dapds00.nci.org.au/thredds/dodsC/rr3/CMIP5/output1/CSIRO-BOM/ACCESS1-3/historical/mon/atmos/Amon/r1i1p1/latest/tas/tas_Amon_ACCESS1-3_historical_r1i1p1_185001-200512.nc')

Select just the data variable `tas` and save a reference to it in another Python variable:

In [3]:
tas = ds.tas

In [4]:
tas

xarray builds on top of NumPy and stores its data internally as NumPy arrays. It supports many NumPy operations, so it is possible to find out the shape of the underlying data and to use NumPy-style indexing:

In [5]:
tas.shape

(1872, 145, 192)

In [6]:
tas[0,:]

Selecting just the first time index creates a DataArray with no time dimension. `time` remains as a scalar coordinate that is no longer associated with any dimension, as indicated by the missing `*` beside it, and it now holds a single value: that of the first time index. The index selection above is equivalent to using `isel`:

In [7]:
tas.isel(time=0)

One way is more compact, and one more descriptive, but they have the same result.
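The equivalence can be checked without network access on a small synthetic array. This is a minimal sketch: `demo` and its coordinate values are made up for illustration and only mimic the `(time, lat, lon)` layout of `tas`.

```python
import numpy as np
import pandas as pd
import xarray

# Hypothetical stand-in for `tas`: 3 monthly time steps on a tiny 2 x 4 grid.
demo = xarray.DataArray(
    np.arange(24.0).reshape(3, 2, 4),
    dims=("time", "lat", "lon"),
    coords={
        "time": pd.date_range("2000-01-01", periods=3, freq="MS"),
        "lat": [-10.0, 10.0],
        "lon": [0.0, 90.0, 180.0, 270.0],
    },
)

by_position = demo[0, :]        # NumPy-style positional indexing
by_name = demo.isel(time=0)     # the same selection, naming the dimension

# Both selections hold the same values, coordinates and metadata.
same = by_position.identical(by_name)

# The time dimension is gone, but time survives as a scalar coordinate.
remaining_dims = by_name.dims
time_is_scalar = by_name.coords["time"].ndim == 0
```

Naming the dimension in `isel` also means the call keeps working if the dimension order of the array changes, which positional indexing does not guarantee.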

The power of xarray comes from the close association of data with coordinates, so the equivalent `.sel` method can be used with coordinate values instead of integer positions. For example, to select an area that includes the Indian Ocean and Australia, use `slice` to indicate the required range of latitude and longitude values and pass these as key/value pairs to `sel`. Unlike `range` in basic Python, which excludes the upper bound, `slice` in `sel` includes coordinate values less than **or equal** to the upper bound:

In [8]:
tas.sel(lon=slice(20,160),lat=slice(-80,25))
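The inclusive upper bound is easiest to see on a coordinate with round values. The following sketch assumes a made-up one-dimensional array, `demo`, with longitudes every 30 degrees; it is not the `tas` grid.

```python
import numpy as np
import xarray

# Hypothetical longitude coordinate: 0, 30, 60, ..., 330 degrees.
lon = np.arange(0.0, 360.0, 30.0)
demo = xarray.DataArray(np.arange(lon.size), dims="lon", coords={"lon": lon})

# Label-based slicing with sel keeps both end points when they exist.
subset = demo.sel(lon=slice(30, 90))
selected_lons = subset.lon.values.tolist()

# Basic Python range stops before the upper bound.
python_range = list(range(30, 90, 30))
```

Here `selected_lons` contains 30, 60 and 90, while `python_range` contains only 30 and 60.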

Operators can be chained, so multiple operations can be performed sequentially. For example, to select the above area and the first time index:

In [9]:
tas.isel(time=0).sel(lon=slice(20,160), lat=slice(-80,25))
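Because each call returns a new DataArray, the order of the chained selections should not change the result when they act on different dimensions. A small sketch on a hypothetical array (`demo` and `region` are illustrative names, not part of the tutorial dataset):

```python
import numpy as np
import pandas as pd
import xarray

# Hypothetical coarse grid with the same dimension names as `tas`.
demo = xarray.DataArray(
    np.arange(2 * 5 * 6, dtype=float).reshape(2, 5, 6),
    dims=("time", "lat", "lon"),
    coords={
        "time": pd.date_range("2000-01-01", periods=2, freq="MS"),
        "lat": np.linspace(-80.0, 80.0, 5),   # -80, -40, 0, 40, 80
        "lon": np.arange(0.0, 360.0, 60.0),   # 0, 60, ..., 300
    },
)

# Keep the region in a dict so it can be reused in several calls.
region = dict(lon=slice(20, 160), lat=slice(-80, 25))

time_first = demo.isel(time=0).sel(**region)
region_first = demo.sel(**region).isel(time=0)

order_is_irrelevant = time_first.identical(region_first)
```

Storing the bounds in a dictionary and unpacking it with `**` is one way to avoid retyping the same region in every cell.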

In this case it is convenient to use `isel` to select the time, rather than specifying a date, but it is also possible to specify the date explicitly using `sel`:

In [10]:
tas.sel(time='1850-01-16T12:00:00', lon=slice(20,160), lat=slice(-80,25))

It is also possible to use `slice` for the `time` dimension. To select March to November of 1871:

In [11]:
tas.sel(time=slice('1871-03','1871-11'), lon=slice(20,160), lat=slice(-80,25))
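Partial date strings such as `'1871-11'` cover the whole month, so a mid-month timestamp in November is still inside the slice. The sketch below assumes a made-up monthly series for 1871 with mid-month timestamps, stored as standard `datetime64` values.

```python
import numpy as np
import pandas as pd
import xarray

# Hypothetical monthly series for 1871, stamped on the 16th of each month.
time = pd.date_range("1871-01-01", periods=12, freq="MS") + pd.Timedelta(days=15)
demo = xarray.DataArray(np.arange(12.0), dims="time", coords={"time": time})

# Both bounds are given at month resolution and both months are included.
subset = demo.sel(time=slice("1871-03", "1871-11"))
selected_months = subset.time.dt.month.values.tolist()
```

The result holds nine time steps, March through November.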

The `slice` operator selects values between a lower and an upper bound. If a single coordinate value is required when using `sel`, it must either correspond to an *exact* value in the coordinate array, or the `method` argument must be specified to tell xarray how to choose a value. For example, to select just the values in the cell closest to Brisbane:

In [12]:
tas.sel(lat=-27.47, lon=153.03, method='nearest')

So the closest location in the data was at `lat=-27.5`, `lon=153.8`.
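The difference between an exact lookup and `method='nearest'` can be reproduced offline. This sketch assumes a regular grid with 1.25 degree latitude and 1.875 degree longitude spacing, chosen to match the `(145, 192)` spatial shape shown earlier; the real coordinate values of the dataset are not checked here.

```python
import numpy as np
import xarray

# Assumed regular grid: 145 latitudes and 192 longitudes.
lat = np.linspace(-90.0, 90.0, 145)   # step of 1.25 degrees
lon = np.arange(192) * 1.875          # 0 to 358.125 degrees
demo = xarray.DataArray(
    np.zeros((lat.size, lon.size)),
    dims=("lat", "lon"),
    coords={"lat": lat, "lon": lon},
)

# Without `method`, the requested labels must exist exactly in the index.
try:
    demo.sel(lat=-27.47, lon=153.03)
    exact_lookup_failed = False
except KeyError:
    exact_lookup_failed = True

# With method="nearest", the closest grid cell is returned instead.
cell = demo.sel(lat=-27.47, lon=153.03, method="nearest")
nearest_lat = float(cell.lat)
nearest_lon = float(cell.lon)
```

On this assumed grid the nearest cell is at latitude -27.5 and longitude 153.75, which shows as 153.8 when rounded to one decimal place.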