# Subsetting a dataset by time and space (Slicing and Dicing)

Reload the library and the dataset from the previous notebook

In [2]:
import xarray

In [3]:
ds = xarray.open_dataset('http://dapds00.nci.org.au/thredds/dodsC/rr3/CMIP5/output1/CSIRO-BOM/ACCESS1-3/historical/mon/atmos/Amon/r1i1p1/latest/tas/tas_Amon_ACCESS1-3_historical_r1i1p1_185001-200512.nc')

Select just the data variable `tas` and save it to another variable

In [4]:
tas = ds.tas
tas

<xarray.DataArray 'tas' (time: 1872, lat: 145, lon: 192)>
[52116480 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 1850-01-16T12:00:00 ... 2005-12-16T12:00:00
  * lat      (lat) float64 -90.0 -88.75 -87.5 -86.25 ... 86.25 87.5 88.75 90.0
  * lon      (lon) float64 0.0 1.875 3.75 5.625 7.5 ... 352.5 354.4 356.2 358.1
    height   float64 ...
Attributes:
    standard_name:     air_temperature
    long_name:         Near-Surface Air Temperature
    units:             K
    cell_methods:      time: mean
    cell_measures:     area: areacella
    history:           2012-02-05T23:49:51Z altered by CMOR: Treated scalar d...
    associated_files:  baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation...

xarray builds on top of NumPy and stores its data internally as NumPy arrays. It supports many NumPy operations, so it is possible to find out the shape of the underlying data and to use NumPy-style indexing

In [None]:
tas.shape

In [None]:
tas[0,:]

Selecting just the first time index creates a DataArray with no `time` dimension. `time` is still present, but only as a scalar coordinate rather than a dimension coordinate, as indicated by the missing `*` beside it. The index selection above is equivalent to using `isel` like so

In [None]:
tas.isel(time=0)

The first form is more compact and the second more descriptive, but they give the same result.
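The equivalence of the two forms can be checked without downloading anything. The sketch below uses a small synthetic array (`demo`, with made-up values and a coarse grid) as a stand-in for `tas`; only the dimension names are taken from the real dataset.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for `tas`: 3 time steps on a coarse 4 x 5 grid.
# The values and grid spacing are invented for illustration only.
time = pd.to_datetime(["1850-01-16", "1850-02-15", "1850-03-16"])
demo = xr.DataArray(
    np.arange(3 * 4 * 5, dtype="float32").reshape(3, 4, 5),
    coords={
        "time": time,
        "lat": np.linspace(-90, 90, 4),
        "lon": np.linspace(0, 288, 5),
    },
    dims=("time", "lat", "lon"),
    name="tas",
)

by_position = demo[0, :]     # NumPy-style: index 0 along the first dimension
by_name = demo.isel(time=0)  # same selection, naming the dimension explicitly

print(by_position.dims)                # the time dimension is gone
print(by_position.identical(by_name))  # both forms give the same DataArray
print("time" in by_position.coords)    # time survives as a scalar coordinate
```

The positional form relies on knowing that `time` is the first dimension, whereas `isel(time=0)` keeps working if the dimension order changes.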

The power of xarray comes from the close association of data with coordinates, so it is possible to use the equivalent `.sel` method with coordinate values instead of integer positions. For example, to select an area that includes the Indian Ocean and Australia, use `slice` to indicate the range of latitude and longitude values required and pass them as keyword arguments to `sel`

In [None]:
tas.sel(lon=slice(20,160),lat=slice(-80,25))
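To see exactly which grid points such a selection returns, here is a minimal offline sketch. It assumes a synthetic array (`demo`, filled with zeros) on the same 145 x 192 latitude/longitude grid shown in the `tas` output above; the data values are not real.

```python
import numpy as np
import xarray as xr

# Synthetic grid matching the coordinates printed for `tas`:
# latitude every 1.25 degrees, longitude every 1.875 degrees from 0 eastward.
lat = np.linspace(-90, 90, 145)
lon = np.arange(192) * 1.875
demo = xr.DataArray(
    np.zeros((lat.size, lon.size), dtype="float32"),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
)

region = demo.sel(lon=slice(20, 160), lat=slice(-80, 25))

# Label-based slices include both end points when they exist in the coordinate
print(float(region.lat[0]), float(region.lat[-1]))  # -80.0 25.0
# Bounds that fall between grid points select only the points inside the range
print(float(region.lon[0]), float(region.lon[-1]))  # 20.625 159.375
print(region.shape)
```

Note that the slice bounds are given in the same order as the coordinate values (ascending here), and that longitudes in this dataset run from 0 to 360 rather than from -180 to 180.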

Method calls can be chained, so multiple operations can be performed sequentially. For example, to select the above area and the first time index

In [None]:
tas.isel(time=0).sel(lon=slice(20,160), lat=slice(-80,25))

In this case it is more convenient to use `isel` to select the time, rather than specifying a date, but this would also work

In [None]:
tas.sel(time='1850-01-16', lon=slice(20,160), lat=slice(-80,25))

It is also possible to use `slice` for the `time` dimension. To select March to November of 1871:

In [None]:
tas.sel(time=slice('1871-03','1871-11'), lon=slice(20,160), lat=slice(-80,25))
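Partial date strings such as `'1871-03'` are expanded to cover the whole month. The following sketch illustrates this on a synthetic monthly series (`demo`, one made-up value per month of 1871, stamped mid-month at 12:00 to resemble the real time axis); it does not touch the remote dataset.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly time axis for 1871, one time stamp per month on the 16th at 12:00
time = pd.date_range("1871-01-01", periods=12, freq="MS") + pd.Timedelta(days=15, hours=12)
demo = xr.DataArray(np.arange(12.0), coords={"time": time}, dims=("time",))

# '1871-03' is read as the start of March and '1871-11' as the end of November
subset = demo.sel(time=slice("1871-03", "1871-11"))
print(subset.time.dt.month.values)  # months 3 through 11

# A single partial date string selects everything inside that period,
# so the result keeps a time dimension (here of length 1)
march = demo.sel(time="1871-03")
print(march.sizes["time"])
```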

`slice` selects values between a lower and an upper bound. If a single coordinate value is required when using `sel`, it must either correspond to an *exact* value in the coordinate array, or the `method` argument must be specified to tell xarray how to choose a value. For example, to select just the values in the grid cell closest to Brisbane

In [5]:
tas.sel(lat=-27.47, lon=153.03, method='nearest')

<xarray.DataArray 'tas' (time: 1872)>
array([296.4649 , 296.54944, 296.43167, ..., 293.6983 , 294.73325, 296.40576],
      dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 1850-01-16T12:00:00 ... 2005-12-16T12:00:00
    lat      float64 -27.5
    lon      float64 153.8
    height   float64 ...
Attributes:
    standard_name:     air_temperature
    long_name:         Near-Surface Air Temperature
    units:             K
    cell_methods:      time: mean
    cell_measures:     area: areacella
    history:           2012-02-05T23:49:51Z altered by CMOR: Treated scalar d...
    associated_files:  baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation...
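The difference between an exact lookup and `method='nearest'` can be reproduced offline. This sketch assumes a synthetic array (`demo`, filled with zeros) on the same latitude/longitude grid as `tas`; the Brisbane coordinates are the ones used above.

```python
import numpy as np
import xarray as xr

# Synthetic grid matching the coordinates printed for `tas`
lat = np.linspace(-90, 90, 145)
lon = np.arange(192) * 1.875
demo = xr.DataArray(
    np.zeros((lat.size, lon.size), dtype="float32"),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
)

# Without `method`, the requested values must exist in the coordinate arrays
exact_failed = False
try:
    demo.sel(lat=-27.47, lon=153.03)
except KeyError:
    exact_failed = True
print("exact lookup failed:", exact_failed)

# With method='nearest', xarray picks the closest grid point along each dimension
point = demo.sel(lat=-27.47, lon=153.03, method="nearest")
print(float(point.lat), float(point.lon))  # -27.5 153.75
```

The selected longitude is 153.75, which the xarray summary above displays rounded as 153.8.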