Last Updated: 08-03-2017

# Table of Contents

- [1 Indexing and Selecting Data](#Indexing-and-Selecting-Data)
  - [1.1 Positional indexing](#Positional-indexing)
  - [1.2 Indexing with labeled dimensions](#Indexing-with-labeled-dimensions)
    - [1.2.1 Use a dictionary as the argument for array positional or label based array indexing](#Use-a-dictionary-as-the-argument-for-array-positional-or-label-based-array-indexing)
    - [1.2.2 Use the ```sel()``` and ```isel()``` convenience methods](#Use-the-sel()-and-isel()-convenience-methods)
  - [1.3 Pointwise indexing](#Pointwise-indexing)
  - [1.4 Dataset indexing](#Dataset-indexing)
  - [1.5 Dropping Labels](#Dropping-Labels)
  - [1.6 Nearest neighbor lookups](#Nearest-neighbor-lookups)
  - [1.7 Masking with ```where```](#Masking-with-where)
  - [1.8 Multi-level indexing](#Multi-level-indexing)
  - [1.9 Orthogonal(outer) vs. vectorized indexing](#Orthogonal(outer)-vs.-vectorized-indexing)
  - [1.10 Key Points](#Key-Points)

In [1]:
import xarray as xr
import numpy as np 
import pandas as pd

In [2]:
%time ds = xr.open_dataset('/home/abanihi/Documents/climate-data/ERM/t85.an.sfc/e4moda.an.sfc.t85.sst.1957-2002.nc')

CPU times: user 48 ms, sys: 0 ns, total: 48 ms
Wall time: 48.3 ms


# Indexing and Selecting Data


Similarly to pandas objects, xarray objects support both integer and
label based lookups along each dimension. However, xarray objects also
have named dimensions, so you can optionally use dimension names instead
of relying on the positional ordering of dimensions.

Thus in total, xarray supports four different kinds of indexing, as
described below and summarized in this table:


| **Dimension lookup** | **Index lookup** | **```DataArray```**                                               | **```Dataset```**                                               |
|----------------------|------------------|-------------------------------------------------------------------|-----------------------------------------------------------------|
| Positional           | By integer       | ```arr[:, 0]```                                                   | *not available*                                                 |
| Positional           | By label         | ```arr.loc[:, 'IA']```                                            | *not available*                                                 |
| By name              | By integer       | ```ds.isel(space=0)``` or  <br> ```arr[dict(space=0)]```          | ```ds.isel(space=0)``` or <br> ```ds[dict(space=0)]```          |
| By name              | By label         | ```arr.sel(space='IA')``` or <br> ```arr.loc[dict(space='IA')]``` | ```ds.sel(space='IA')``` or <br> ```ds.loc[dict(space='IA')]``` |
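
The four rows of the table can be checked against each other on a small array. The sketch below is only an illustration: `arr`, `ds_toy`, and the `space` labels are made-up stand-ins for the names used in the table, not part of the SST dataset loaded above.

```python
import numpy as np
import xarray as xr

# Hypothetical toy array standing in for `arr` in the table.
arr = xr.DataArray(
    np.arange(12).reshape(4, 3),
    dims=("time", "space"),
    coords={"time": [0, 1, 2, 3], "space": ["IA", "IL", "IN"]},
)

# The four kinds of lookup, all selecting the same column.
by_position_integer = arr[:, 0]
by_position_label = arr.loc[:, "IA"]
by_name_integer = arr.isel(space=0)
by_name_label = arr.sel(space="IA")

all_equal = (
    by_position_integer.equals(by_position_label)
    and by_name_integer.equals(by_name_label)
    and by_position_integer.equals(by_name_label)
)

# A Dataset only supports the two "by name" rows.
ds_toy = arr.to_dataset(name="v")
ds_integer_match = ds_toy.isel(space=0).equals(ds_toy[dict(space=0)])
ds_label_match = ds_toy.sel(space="IA").equals(ds_toy.loc[dict(space="IA")])

print(all_equal, ds_integer_match, ds_label_match)
```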

## Positional indexing

Indexing an ```xarray.DataArray``` directly works (mostly) just like it
does for NumPy arrays, except that the returned object is always another
```DataArray```:


In [3]:
sst = ds['SST']
sst

<xarray.DataArray 'SST' (time: 540, lat: 128, lon: 256)>
[17694720 values with dtype=float64]
Coordinates:
  * time     (time) datetime64[ns] 1957-09-01 1957-10-01 1957-11-01 ...
  * lat      (lat) float32 -88.9277 -87.5387 -86.1415 -84.7424 -83.3426 ...
  * lon      (lon) float32 0.0 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...
Attributes:
    units:      K
    long_name:  Sea surface temperature

In [4]:
sst[:2]

<xarray.DataArray 'SST' (time: 2, lat: 128, lon: 256)>
array([[[        nan,         nan, ...,         nan,         nan],
        [        nan,         nan, ...,         nan,         nan],
        ..., 
        [ 271.460205,  271.460205, ...,  271.460205,  271.460205],
        [ 271.460205,  271.460205, ...,  271.460205,  271.460205]],

       [[        nan,         nan, ...,         nan,         nan],
        [        nan,         nan, ...,         nan,         nan],
        ..., 
        [ 271.460693,  271.460693, ...,  271.460693,  271.460693],
        [ 271.460693,  271.460693, ...,  271.460693,  271.460693]]])
Coordinates:
  * time     (time) datetime64[ns] 1957-09-01 1957-10-01
  * lat      (lat) float32 -88.9277 -87.5387 -86.1415 -84.7424 -83.3426 ...
  * lon      (lon) float32 0.0 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...
Attributes:
    units:      K
    long_name:  Sea surface temperature

In [5]:
sst[:, [0, 0]]

<xarray.DataArray 'SST' (time: 540, lat: 2, lon: 256)>
[276480 values with dtype=float64]
Coordinates:
  * time     (time) datetime64[ns] 1957-09-01 1957-10-01 1957-11-01 ...
  * lat      (lat) float32 -88.9277 -88.9277
  * lon      (lon) float32 0.0 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...
Attributes:
    units:      K
    long_name:  Sea surface temperature

In [6]:
sst[0,0,0]

<xarray.DataArray 'SST' ()>
array(nan)
Coordinates:
    time     datetime64[ns] 1957-09-01
    lat      float32 -88.9277
    lon      float32 0.0
Attributes:
    units:      K
    long_name:  Sea surface temperature

In [7]:
sst[0, :, :]

<xarray.DataArray 'SST' (lat: 128, lon: 256)>
array([[        nan,         nan,         nan, ...,         nan,         nan,
                nan],
       [        nan,         nan,         nan, ...,         nan,         nan,
                nan],
       [        nan,         nan,         nan, ...,         nan,         nan,
                nan],
       ..., 
       [ 271.460205,  271.460205,  271.460205, ...,  271.460205,  271.460205,
         271.460205],
       [ 271.460205,  271.460205,  271.460205, ...,  271.460205,  271.460205,
         271.460205],
       [ 271.460205,  271.460205,  271.460205, ...,  271.460205,  271.460205,
         271.460205]])
Coordinates:
    time     datetime64[ns] 1957-09-01
  * lat      (lat) float32 -88.9277 -87.5387 -86.1415 -84.7424 -83.3426 ...
  * lon      (lon) float32 0.0 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...
Attributes:
    units:      K
    long_name:  Sea surface temperature

- This method of handling arrays should be familiar to anyone who has worked with arrays in MATLAB or NumPy. The drawback is that an integer index position carries no meaning on its own: to select a specific date, for example, we would first have to write a function that maps the date to its integer position along the time dimension. xarray therefore also lets us perform positional indexing using labels instead of integers:

In [8]:
sst.loc['1957-09-01', :, :]

<xarray.DataArray 'SST' (lat: 128, lon: 256)>
array([[        nan,         nan,         nan, ...,         nan,         nan,
                nan],
       [        nan,         nan,         nan, ...,         nan,         nan,
                nan],
       [        nan,         nan,         nan, ...,         nan,         nan,
                nan],
       ..., 
       [ 271.460205,  271.460205,  271.460205, ...,  271.460205,  271.460205,
         271.460205],
       [ 271.460205,  271.460205,  271.460205, ...,  271.460205,  271.460205,
         271.460205],
       [ 271.460205,  271.460205,  271.460205, ...,  271.460205,  271.460205,
         271.460205]])
Coordinates:
    time     datetime64[ns] 1957-09-01
  * lat      (lat) float32 -88.9277 -87.5387 -86.1415 -84.7424 -83.3426 ...
  * lon      (lon) float32 0.0 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...
Attributes:
    units:      K
    long_name:  Sea surface temperature
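
To make the contrast concrete, the sketch below does the date-to-integer mapping by hand and compares it with the label lookup. It assumes a small made-up monthly series (`toy`) rather than the SST file, and uses the pandas index that xarray exposes through ```indexes```.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly series standing in for the SST time axis.
time = pd.date_range("1957-09-01", periods=6, freq="MS")
toy = xr.DataArray(np.arange(6.0), dims="time", coords={"time": time})

# Manual route: find the integer position of a date, then index by integer.
position = toy.indexes["time"].get_loc(pd.Timestamp("1957-11-01"))
by_integer = toy[position]

# Label route: xarray performs the same lookup for us.
by_label = toy.loc["1957-11-01"]

print(position, by_integer.item(), by_label.item())
```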

## Indexing with labeled dimensions

- This is great, but we still need to keep track of the fact that the first index position is the time dimension, the second is latitude, and so on. Rather than looking up a dimension by position, xarray lets us use the dimension name instead:

###  Use a dictionary as the argument for array positional or label based array indexing

In [9]:
# index by integer array indices
sst[dict(time=0, lat=slice(None, 2), lon=slice(None))]

<xarray.DataArray 'SST' (lat: 2, lon: 256)>
array([[ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan]])
Coordinates:
    time     datetime64[ns] 1957-09-01
  * lat      (lat) float32 -88.9277 -87.5387
  * lon      (lon) float32 0.0 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...
Attributes:
    units:      K
    long_name:  Sea surface temperature

In [10]:
# index by dimension coordinate labels
sst.loc[dict(time=slice('1957-09-01', '1957-12-01'))]

<xarray.DataArray 'SST' (time: 4, lat: 128, lon: 256)>
[131072 values with dtype=float64]
Coordinates:
  * time     (time) datetime64[ns] 1957-09-01 1957-10-01 1957-11-01 1957-12-01
  * lat      (lat) float32 -88.9277 -87.5387 -86.1415 -84.7424 -83.3426 ...
  * lon      (lon) float32 0.0 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...
Attributes:
    units:      K
    long_name:  Sea surface temperature

### Use the ```sel()``` and ```isel()``` convenience methods

In [11]:
# index by integer array indices
sst.isel(time=slice(0, 1), lat=slice(None, 10), lon=slice(None, 100))

<xarray.DataArray 'SST' (time: 1, lat: 10, lon: 100)>
array([[[ nan,  nan, ...,  nan,  nan],
        [ nan,  nan, ...,  nan,  nan],
        ..., 
        [ nan,  nan, ...,  nan,  nan],
        [ nan,  nan, ...,  nan,  nan]]])
Coordinates:
  * time     (time) datetime64[ns] 1957-09-01
  * lat      (lat) float32 -88.9277 -87.5387 -86.1415 -84.7424 -83.3426 ...
  * lon      (lon) float32 0.0 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...
Attributes:
    units:      K
    long_name:  Sea surface temperature

In [12]:
# index by dimension coordinate labels
sst.sel(time=slice('1957-09-01', '1957-10-01'))

<xarray.DataArray 'SST' (time: 2, lat: 128, lon: 256)>
array([[[        nan,         nan, ...,         nan,         nan],
        [        nan,         nan, ...,         nan,         nan],
        ..., 
        [ 271.460205,  271.460205, ...,  271.460205,  271.460205],
        [ 271.460205,  271.460205, ...,  271.460205,  271.460205]],

       [[        nan,         nan, ...,         nan,         nan],
        [        nan,         nan, ...,         nan,         nan],
        ..., 
        [ 271.460693,  271.460693, ...,  271.460693,  271.460693],
        [ 271.460693,  271.460693, ...,  271.460693,  271.460693]]])
Coordinates:
  * time     (time) datetime64[ns] 1957-09-01 1957-10-01
  * lat      (lat) float32 -88.9277 -87.5387 -86.1415 -84.7424 -83.3426 ...
  * lon      (lon) float32 0.0 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...
Attributes:
    units:      K
    long_name:  Sea surface temperature
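
The dictionary form and the convenience methods select the same data. A minimal sketch on a made-up array (`toy`, with hypothetical coordinate values) shows the equivalence, and also one case where the dictionary form is the one to reach for: assigning values into a selection.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical small array with the same dimension names as SST.
time = pd.date_range("1957-09-01", periods=4, freq="MS")
toy = xr.DataArray(
    np.arange(24.0).reshape(4, 2, 3),
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": [-10.0, 10.0], "lon": [0.0, 1.5, 3.0]},
)

# Integer-based: dictionary indexing versus isel().
same_by_integer = toy[dict(time=0, lon=slice(None, 2))].equals(
    toy.isel(time=0, lon=slice(None, 2))
)

# Label-based: .loc with a dictionary versus sel().
window = slice("1957-09-01", "1957-10-01")
same_by_label = toy.loc[dict(time=window)].equals(toy.sel(time=window))

# Assignment goes through the dictionary form; assigning to the
# result of sel() or isel() would not modify `toy`.
toy.loc[dict(time=window, lat=10.0)] = -1.0
n_overwritten = int((toy == -1.0).sum())

print(same_by_integer, same_by_label, n_overwritten)
```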

## Pointwise indexing


xarray pointwise indexing supports indexing along multiple labeled dimensions using list-like objects. While ```isel()``` performs orthogonal indexing, the ```isel_points()``` method behaves like NumPy indexing with multiple lists (e.g. ```arr[[0, 1], [0, 1]]```), picking one element per position in the lists:

In [13]:
# index by integer array indices
sst.isel_points(time=[0, 3, 10], lat=[0, 1, 6])

<xarray.DataArray 'SST' (points: 3, lon: 256)>
array([[ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan]])
Coordinates:
    time     (points) datetime64[ns] 1957-09-01 1957-12-01 1958-07-01
    lat      (points) float32 -88.9277 -87.5387 -80.5421
  * lon      (lon) float32 0.0 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...
Dimensions without coordinates: points
Attributes:
    units:      K
    long_name:  Sea surface temperature

There is also ```sel_points()```, which analogously allows you to do point-wise indexing by label:

In [14]:
times = pd.to_datetime(['1957-09-01', '1957-10-01', '1957-11-01'])
times

DatetimeIndex(['1957-09-01', '1957-10-01', '1957-11-01'], dtype='datetime64[ns]', freq=None)

In [15]:
sst.sel_points(time=times)

<xarray.DataArray 'SST' (points: 3, lon: 256, lat: 128)>
array([[[        nan,         nan, ...,  271.460205,  271.460205],
        [        nan,         nan, ...,  271.460205,  271.460205],
        ..., 
        [        nan,         nan, ...,  271.460205,  271.460205],
        [        nan,         nan, ...,  271.460205,  271.460205]],

       [[        nan,         nan, ...,  271.460693,  271.460693],
        [        nan,         nan, ...,  271.460693,  271.460693],
        ..., 
        [        nan,         nan, ...,  271.460693,  271.460693],
        [        nan,         nan, ...,  271.460693,  271.460693]],

       [[        nan,         nan, ...,  271.460449,  271.460449],
        [        nan,         nan, ...,  271.460449,  271.460449],
        ..., 
        [        nan,         nan, ...,  271.460449,  271.460449],
        [        nan,         nan, ...,  271.460449,  271.460449]]])
Coordinates:
    time     (points) datetime64[ns] 1957-09-01 1957-10-01 1957-11-01
  * lat 

The equivalent pandas method to ```sel_points``` is ```lookup()```.
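
Later xarray releases deprecated and then removed ```isel_points()``` and ```sel_points()```. In those releases the same pointwise selection is written by passing ```DataArray``` indexers that share a dimension to ```isel()``` or ```sel()```. The sketch below assumes such a release and a made-up array; the dimension name `points` is chosen only to mirror the output above.

```python
import numpy as np
import xarray as xr

# Hypothetical small array with the same dimension names as SST.
toy = xr.DataArray(
    np.arange(24).reshape(4, 3, 2),
    dims=("time", "lat", "lon"),
    coords={"time": [0, 1, 2, 3], "lat": [-10.0, 0.0, 10.0], "lon": [0.0, 1.5]},
)

# Indexers that share the dimension "points" are paired element by element.
time_idx = xr.DataArray([0, 3], dims="points")
lat_idx = xr.DataArray([0, 2], dims="points")
pointwise = toy.isel(time=time_idx, lat=lat_idx)

# Plain lists are applied independently per dimension (orthogonal indexing).
orthogonal = toy.isel(time=[0, 3], lat=[0, 2])

# The label-based counterpart uses sel() with the same kind of indexers.
by_label = toy.sel(
    time=xr.DataArray([0, 3], dims="points"),
    lat=xr.DataArray([-10.0, 10.0], dims="points"),
)

print(pointwise.sizes, orthogonal.sizes)
```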

## Dataset indexing

We can also use these methods to index all variables in a dataset simultaneously, returning a new dataset:

In [16]:
ds.isel(time=slice(0, 1), lat=slice(0, 1), lon=slice(0, 1))

<xarray.Dataset>
Dimensions:     (lat: 1, lon: 1, time: 1)
Coordinates:
  * time        (time) datetime64[ns] 1957-09-01
  * lat         (lat) float32 -88.9277
  * lon         (lon) float32 0.0
Data variables:
    gw          (lat) float32 0.000449381
    date        (time) int32 19570901
    datesec     (time) int32 0
    yyyymmddhh  (time) int32 1957090100
    SST         (time, lat, lon) float64 nan
Attributes:
    title:                     \nERA40 T85 Surface Analysis: created at NCAR
    temporal_span:             \nThe entire ERA40 archive spans 45 years: Sep...
    source_original:           \nEuropean Center for Medium-Range Weather For...
    story:                     \nThis dataset is a netCDF version of ds126.0 ...
    source_NCAR:               \nData Support Section                        ...
    source_format:             \nThe original ECMWF and the derived T85 are i...
    source_file:               \nMSS: /DSS/U82386
    source_availability:       \nThe ERA-40 data a

In [17]:
ds.sel(time='1957-09-01')

<xarray.Dataset>
Dimensions:     (lat: 128, lon: 256)
Coordinates:
    time        datetime64[ns] 1957-09-01
  * lat         (lat) float32 -88.9277 -87.5387 -86.1415 -84.7424 -83.3426 ...
  * lon         (lon) float32 0.0 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...
Data variables:
    gw          (lat) float32 0.000449381 0.00104581 0.0016425 0.00223829 ...
    date        int32 19570901
    datesec     int32 0
    yyyymmddhh  int32 1957090100
    SST         (lat, lon) float64 nan nan nan nan nan nan nan nan nan nan ...
Attributes:
    title:                     \nERA40 T85 Surface Analysis: created at NCAR
    temporal_span:             \nThe entire ERA40 archive spans 45 years: Sep...
    source_original:           \nEuropean Center for Medium-Range Weather For...
    story:                     \nThis dataset is a netCDF version of ds126.0 ...
    source_NCAR:               \nData Support Section                        ...
    source_format:             \nThe original ECMWF and the

In [18]:
ds.isel_points(time=[2, 3, 5], lat=[0, 3, 5], lon=[23, 45, 68], dim='points')

<xarray.Dataset>
Dimensions:     (points: 3)
Coordinates:
    time        (points) datetime64[ns] 1957-11-01 1957-12-01 1958-02-01
    lat         (points) float32 -88.9277 -84.7424 -81.9425
    lon         (points) float32 32.338 63.27 95.608
Dimensions without coordinates: points
Data variables:
    gw          (points) float32 0.000449381 0.00223829 0.00342553
    date        (points) int32 19571101 19571201 19580201
    datesec     (points) int32 0 0 0
    yyyymmddhh  (points) int32 1957110100 1957120100 1958020100
    SST         (points) float64 nan nan nan
Attributes:
    title:                     \nERA40 T85 Surface Analysis: created at NCAR
    temporal_span:             \nThe entire ERA40 archive spans 45 years: Sep...
    source_original:           \nEuropean Center for Medium-Range Weather For...
    story:                     \nThis dataset is a netCDF version of ds126.0 ...
    source_NCAR:               \nData Support Section                        ...
    source_format

Positional indexing on a dataset is not supported because the ordering of dimensions in a dataset is somewhat ambiguous (it can vary between different arrays). However, you can do normal indexing with labeled dimensions:

In [19]:
ds[dict(time=slice(0, 1), lat=slice(0, 5), lon=slice(0, 3))]

<xarray.Dataset>
Dimensions:     (lat: 5, lon: 3, time: 1)
Coordinates:
  * time        (time) datetime64[ns] 1957-09-01
  * lat         (lat) float32 -88.9277 -87.5387 -86.1415 -84.7424 -83.3426
  * lon         (lon) float32 0.0 1.406 2.812
Data variables:
    gw          (lat) float32 0.000449381 0.00104581 0.0016425 0.00223829 ...
    date        (time) int32 19570901
    datesec     (time) int32 0
    yyyymmddhh  (time) int32 1957090100
    SST         (time, lat, lon) float64 nan nan nan nan nan nan nan nan nan ...
Attributes:
    title:                     \nERA40 T85 Surface Analysis: created at NCAR
    temporal_span:             \nThe entire ERA40 archive spans 45 years: Sep...
    source_original:           \nEuropean Center for Medium-Range Weather For...
    story:                     \nThis dataset is a netCDF version of ds126.0 ...
    source_NCAR:               \nData Support Section                        ...
    source_format:             \nThe original ECMWF and the d

In [20]:
ds.loc[dict(time='1957-09-01')]

<xarray.Dataset>
Dimensions:     (lat: 128, lon: 256)
Coordinates:
    time        datetime64[ns] 1957-09-01
  * lat         (lat) float32 -88.9277 -87.5387 -86.1415 -84.7424 -83.3426 ...
  * lon         (lon) float32 0.0 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...
Data variables:
    gw          (lat) float32 0.000449381 0.00104581 0.0016425 0.00223829 ...
    date        int32 19570901
    datesec     int32 0
    yyyymmddhh  int32 1957090100
    SST         (lat, lon) float64 nan nan nan nan nan nan nan nan nan nan ...
Attributes:
    title:                     \nERA40 T85 Surface Analysis: created at NCAR
    temporal_span:             \nThe entire ERA40 archive spans 45 years: Sep...
    source_original:           \nEuropean Center for Medium-Range Weather For...
    story:                     \nThis dataset is a netCDF version of ds126.0 ...
    source_NCAR:               \nData Support Section                        ...
    source_format:             \nThe original ECMWF and the
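
Because each variable in a dataset can have its own set of dimensions, indexing by name is applied only to the variables that actually have the named dimension. A small sketch with a made-up dataset (`ds_toy`, loosely modelled on the `SST` and `gw` variables above) illustrates this:

```python
import numpy as np
import xarray as xr

# Hypothetical dataset: SST has three dimensions, gw only has lat.
ds_toy = xr.Dataset(
    {
        "SST": (("time", "lat", "lon"), np.zeros((2, 3, 4))),
        "gw": (("lat",), np.array([0.1, 0.2, 0.3])),
    },
    coords={
        "time": [0, 1],
        "lat": [-10.0, 0.0, 10.0],
        "lon": [0.0, 1.0, 2.0, 3.0],
    },
)

# time=0 only affects SST; the lat slice affects both variables.
subset = ds_toy.isel(time=0, lat=slice(0, 2))

print(subset["SST"].dims, subset["SST"].shape)
print(subset["gw"].dims, subset["gw"].shape)
```

A purely positional index such as `[0, :2]` would have no single meaning here, since position 0 is `time` for one variable and `lat` for the other.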

## Dropping Labels

The ```drop()``` method returns a new object with the listed index labels along a dimension dropped:

In [21]:
ds.drop([0.0], dim='lon')

<xarray.Dataset>
Dimensions:     (lat: 128, lon: 255, time: 540)
Coordinates:
  * time        (time) datetime64[ns] 1957-09-01 1957-10-01 1957-11-01 ...
  * lat         (lat) float32 -88.9277 -87.5387 -86.1415 -84.7424 -83.3426 ...
  * lon         (lon) float32 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...
Data variables:
    gw          (lat) float32 0.000449381 0.00104581 0.0016425 0.00223829 ...
    date        (time) int32 19570901 19571001 19571101 19571201 19580101 ...
    datesec     (time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    yyyymmddhh  (time) int32 1957090100 1957100100 1957110100 1957120100 ...
    SST         (time, lat, lon) float64 nan nan nan nan nan nan nan nan nan ...
Attributes:
    title:                     \nERA40 T85 Surface Analysis: created at NCAR
    temporal_span:             \nThe entire ERA40 archive spans 45 years: Sep...
    source_original:           \nEuropean Center for Medium-Range Weather For...
    story:                     
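
In more recent xarray releases, dropping index labels is spelled ```drop_sel()```. The sketch below assumes such a release and a made-up dataset (`ds_toy`); it is meant as an equivalent of the call above, not a replacement for it in the xarray version this notebook was run with.

```python
import numpy as np
import xarray as xr

# Hypothetical one-variable dataset with four longitude labels.
ds_toy = xr.Dataset(
    {"v": (("lon",), np.arange(4))},
    coords={"lon": [0.0, 1.5, 3.0, 4.5]},
)

# Drop the label 0.0 along lon; the original dataset is left unchanged.
trimmed = ds_toy.drop_sel(lon=[0.0])

print(trimmed["lon"].values.tolist())
```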

## Nearest neighbor lookups

The label based selection methods ```sel()```, ```reindex()``` and ```reindex_like()``` all support the ```method``` and ```tolerance``` keyword arguments. The ```method``` parameter enables nearest neighbor (inexact) lookups through the values ```pad```, ```backfill``` or ```nearest```:

In [22]:
ds.sel(lat=[-88.9277, -87.5387], method='nearest')

<xarray.Dataset>
Dimensions:     (lat: 2, lon: 256, time: 540)
Coordinates:
  * time        (time) datetime64[ns] 1957-09-01 1957-10-01 1957-11-01 ...
  * lat         (lat) float32 -88.9277 -87.5387
  * lon         (lon) float32 0.0 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...
Data variables:
    gw          (lat) float32 0.000449381 0.00104581
    date        (time) int32 19570901 19571001 19571101 19571201 19580101 ...
    datesec     (time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    yyyymmddhh  (time) int32 1957090100 1957100100 1957110100 1957120100 ...
    SST         (time, lat, lon) float64 nan nan nan nan nan nan nan nan nan ...
Attributes:
    title:                     \nERA40 T85 Surface Analysis: created at NCAR
    temporal_span:             \nThe entire ERA40 archive spans 45 years: Sep...
    source_original:           \nEuropean Center for Medium-Range Weather For...
    story:                     \nThis dataset is a netCDF version of ds126.0 ...
    

In [23]:
ds.sel(lat=0.1, method='backfill')

<xarray.Dataset>
Dimensions:     (lon: 256, time: 540)
Coordinates:
  * time        (time) datetime64[ns] 1957-09-01 1957-10-01 1957-11-01 ...
    lat         float32 0.700384
  * lon         (lon) float32 0.0 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...
Data variables:
    gw          float32 0.0244462
    date        (time) int32 19570901 19571001 19571101 19571201 19580101 ...
    datesec     (time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    yyyymmddhh  (time) int32 1957090100 1957100100 1957110100 1957120100 ...
    SST         (time, lon) float64 298.0 298.2 298.3 298.4 298.4 298.4 ...
Attributes:
    title:                     \nERA40 T85 Surface Analysis: created at NCAR
    temporal_span:             \nThe entire ERA40 archive spans 45 years: Sep...
    source_original:           \nEuropean Center for Medium-Range Weather For...
    story:                     \nThis dataset is a netCDF version of ds126.0 ...
    source_NCAR:               \nData Support Secti

In [24]:
ds.reindex(lat=[-88.9277, -87.5387], method='pad')

<xarray.Dataset>
Dimensions:     (lat: 2, lon: 256, time: 540)
Coordinates:
  * lat         (lat) float64 -88.93 -87.54
  * time        (time) datetime64[ns] 1957-09-01 1957-10-01 1957-11-01 ...
  * lon         (lon) float32 0.0 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...
Data variables:
    gw          (lat) float32 0.000449381 0.00104581
    date        (time) int32 19570901 19571001 19571101 19571201 19580101 ...
    datesec     (time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    yyyymmddhh  (time) int32 1957090100 1957100100 1957110100 1957120100 ...
    SST         (time, lat, lon) float64 nan nan nan nan nan nan nan nan nan ...
Attributes:
    title:                     \nERA40 T85 Surface Analysis: created at NCAR
    temporal_span:             \nThe entire ERA40 archive spans 45 years: Sep...
    source_original:           \nEuropean Center for Medium-Range Weather For...
    story:                     \nThis dataset is a netCDF version of ds126.0 ...
    sour

The ```tolerance``` argument limits the maximum distance for valid matches with an inexact lookup:

In [25]:
ds.reindex(lat=[-88.9277, -87.5387], method='nearest', tolerance=0.2)

<xarray.Dataset>
Dimensions:     (lat: 2, lon: 256, time: 540)
Coordinates:
  * lat         (lat) float64 -88.93 -87.54
  * time        (time) datetime64[ns] 1957-09-01 1957-10-01 1957-11-01 ...
  * lon         (lon) float32 0.0 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...
Data variables:
    gw          (lat) float32 0.000449381 0.00104581
    date        (time) int32 19570901 19571001 19571101 19571201 19580101 ...
    datesec     (time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    yyyymmddhh  (time) int32 1957090100 1957100100 1957110100 1957120100 ...
    SST         (time, lat, lon) float64 nan nan nan nan nan nan nan nan nan ...
Attributes:
    title:                     \nERA40 T85 Surface Analysis: created at NCAR
    temporal_span:             \nThe entire ERA40 archive spans 45 years: Sep...
    source_original:           \nEuropean Center for Medium-Range Weather For...
    story:                     \nThis dataset is a netCDF version of ds126.0 ...
    sour
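
The real latitude grid makes it hard to see what each option does, so here is a minimal sketch on a made-up coordinate (`lat` values 0, 1, 2) where the expected matches are easy to follow:

```python
import numpy as np
import xarray as xr

# Hypothetical 1-D array with evenly spaced labels.
toy = xr.DataArray(
    [10.0, 20.0, 30.0], dims="lat", coords={"lat": [0.0, 1.0, 2.0]}
)

# nearest: the closest label to 1.2 is 1.0.
nearest = toy.sel(lat=1.2, method="nearest")
# pad: the last label at or before 1.8 is 1.0.
padded = toy.sel(lat=1.8, method="pad")
# backfill: the first label at or after 1.2 is 2.0.
backfilled = toy.sel(lat=1.2, method="backfill")

# tolerance: 0.1 is within 0.2 of a label, 1.5 is not, so the
# second entry of the reindexed result is filled with NaN.
reindexed = toy.reindex(lat=[0.1, 1.5], method="nearest", tolerance=0.2)

print(nearest.item(), padded.item(), backfilled.item())
print(reindexed.values)
```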

## Masking with ```where```

Indexing methods on xarray objects generally return a subset of the original data. 
However, it is sometimes useful to select an object with the same shape as the original data, 
but with some elements masked. To do this type of selection in xarray, use ```where()```:

In [26]:
ds.where((ds.lon + ds.lat > 0.0) & (ds.lon + ds.lat < 5.0))

<xarray.Dataset>
Dimensions:     (lat: 128, lon: 256, time: 540)
Coordinates:
  * time        (time) datetime64[ns] 1957-09-01 1957-10-01 1957-11-01 ...
  * lat         (lat) float32 -88.9277 -87.5387 -86.1415 -84.7424 -83.3426 ...
  * lon         (lon) float32 0.0 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...
Data variables:
    gw          (lat, lon) float32 nan nan nan nan nan nan nan nan nan nan ...
    date        (time, lon, lat) float64 nan nan nan nan nan nan nan nan nan ...
    datesec     (time, lon, lat) float64 nan nan nan nan nan nan nan nan nan ...
    yyyymmddhh  (time, lon, lat) float64 nan nan nan nan nan nan nan nan nan ...
    SST         (time, lat, lon) float64 nan nan nan nan nan nan nan nan nan ...
Attributes:
    title:                     \nERA40 T85 Surface Analysis: created at NCAR
    temporal_span:             \nThe entire ERA40 archive spans 45 years: Sep...
    source_original:           \nEuropean Center for Medium-Range Weather For...
    story:         

By default ```where``` maintains the original size of the data. When the selected data is much smaller than the original, the option ```drop=True``` clips coordinate elements that are fully masked:

In [27]:
ds.where((ds.lon + ds.lat > 0.0) & (ds.lon + ds.lat < 5.0), drop=True)

<xarray.Dataset>
Dimensions:     (lat: 68, lon: 67, time: 540)
Coordinates:
  * time        (time) datetime64[ns] 1957-09-01 1957-10-01 1957-11-01 ...
  * lat         (lat) float32 -88.9277 -87.5387 -86.1415 -84.7424 -83.3426 ...
  * lon         (lon) float32 0.0 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...
Data variables:
    gw          (lat, lon) float32 nan nan nan nan nan nan nan nan nan nan ...
    date        (time, lon, lat) float64 nan nan nan nan nan nan nan nan nan ...
    datesec     (time, lon, lat) float64 nan nan nan nan nan nan nan nan nan ...
    yyyymmddhh  (time, lon, lat) float64 nan nan nan nan nan nan nan nan nan ...
    SST         (time, lat, lon) float64 nan nan nan nan nan nan nan nan nan ...
Attributes:
    title:                     \nERA40 T85 Surface Analysis: created at NCAR
    temporal_span:             \nThe entire ERA40 archive spans 45 years: Sep...
    source_original:           \nEuropean Center for Medium-Range Weather For...
    story:           
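
The effect of ```drop=True``` is easier to see on a small grid. The sketch below uses a made-up 3 x 4 array (`toy`) and a condition of the same form as the one above:

```python
import numpy as np
import xarray as xr

# Hypothetical small grid with integer-valued coordinates.
toy = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=("lat", "lon"),
    coords={"lat": [-1.0, 0.0, 1.0], "lon": [0.0, 1.0, 2.0, 3.0]},
)

# True only where lon + lat == 1 on this grid.
cond = (toy.lon + toy.lat > 0.0) & (toy.lon + toy.lat < 2.0)

# Same shape as the input, NaN wherever the condition is False.
masked = toy.where(cond)

# Rows and columns that are entirely masked are removed; here lon=3.0
# never satisfies the condition, so that column is dropped.
clipped = toy.where(cond, drop=True)

print(masked.shape, int(masked.count()))
print(clipped.shape, clipped.lon.values.tolist())
```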

## Multi-level indexing

Just like pandas, advanced indexing on multi-level indexes is possible with ```loc``` and ```sel```. You can slice a multi-index by providing multiple indexers, i.e., a tuple of slices, labels, list of labels, or any selector allowed by pandas:

In [28]:
midx = pd.MultiIndex.from_product([list('abc'), [0, 1]], names=('one', 'two'))

In [29]:
mda = xr.DataArray(np.random.rand(6, 3), [('x', midx), ('y', range(3))])
mda

<xarray.DataArray (x: 6, y: 3)>
array([[ 0.216008,  0.270145,  0.058944],
       [ 0.472134,  0.781544,  0.248329],
       [ 0.862006,  0.343766,  0.740999],
       [ 0.257334,  0.007409,  0.539816],
       [ 0.359773,  0.27757 ,  0.766447],
       [ 0.047914,  0.210177,  0.965813]])
Coordinates:
  * x        (x) MultiIndex
  - one      (x) object 'a' 'a' 'b' 'b' 'c' 'c'
  - two      (x) int64 0 1 0 1 0 1
  * y        (y) int64 0 1 2

In [30]:
mda.sel(x=(list('ab'), [0]))

<xarray.DataArray (x: 2, y: 3)>
array([[ 0.216008,  0.270145,  0.058944],
       [ 0.862006,  0.343766,  0.740999]])
Coordinates:
  * x        (x) MultiIndex
  - one      (x) object 'a' 'b'
  - two      (x) int64 0 0
  * y        (y) int64 0 1 2

- You can also select multiple elements by providing a list of labels or tuples or a slice of tuples.
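
A minimal sketch of these selections follows. It makes two assumptions that differ from the cells above: the multi-index is built with ```stack()``` instead of ```pd.MultiIndex.from_product```, and the data are consecutive integers so the selected rows are easy to recognise.

```python
import numpy as np
import xarray as xr

# Hypothetical array; stacking "one" and "two" creates a multi-index "x"
# with the same levels as `midx` above.
base = xr.DataArray(
    np.arange(18).reshape(3, 2, 3),
    dims=("one", "two", "y"),
    coords={"one": list("abc"), "two": [0, 1], "y": [0, 1, 2]},
)
mda_toy = base.stack(x=("one", "two"))

# A list of tuples selects several full index entries.
picked = mda_toy.sel(x=[("a", 0), ("b", 1)])

# Level names can also be used directly as keyword arguments.
single = mda_toy.sel(one="b", two=1)
level = mda_toy.sel(one="a")

print(picked.transpose("x", "y").values)
print(single.values, level.size)
```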

## Orthogonal(outer) vs. vectorized indexing

In [31]:
ds.SST[ds.SST['time.month'] > 6]

<xarray.DataArray 'SST' (time: 270, lat: 128, lon: 256)>
array([[[        nan,         nan, ...,         nan,         nan],
        [        nan,         nan, ...,         nan,         nan],
        ..., 
        [ 271.460205,  271.460205, ...,  271.460205,  271.460205],
        [ 271.460205,  271.460205, ...,  271.460205,  271.460205]],

       [[        nan,         nan, ...,         nan,         nan],
        [        nan,         nan, ...,         nan,         nan],
        ..., 
        [ 271.460693,  271.460693, ...,  271.460693,  271.460693],
        [ 271.460693,  271.460693, ...,  271.460693,  271.460693]],

       ..., 
       [[        nan,         nan, ...,         nan,         nan],
        [        nan,         nan, ...,         nan,         nan],
        ..., 
        [ 271.460205,  271.460205, ...,  271.460205,  271.460205],
        [ 271.460205,  271.460205, ...,  271.460205,  271.460205]],

       [[        nan,         nan, ...,         nan,         nan],
        [  

In [32]:
ds.SST.values[ds.SST.values != 275.0]

array([          nan,           nan,           nan, ...,  271.46044922,
        271.46044922,  271.46044922])
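
The two cells above behave differently: indexing the ```DataArray``` with a one-dimensional boolean mask keeps whole slices along that dimension, while NumPy boolean indexing with a full-size mask flattens the result. The same split shows up with integer lists. A small sketch on a made-up 3 x 3 array (`toy`, `values`) makes the difference visible:

```python
import numpy as np
import xarray as xr

values = np.arange(9).reshape(3, 3)
# Hypothetical labeled wrapper around the same data.
toy = xr.DataArray(values, dims=("x", "y"), coords={"x": [10, 20, 30]})

# xarray: lists are applied per dimension (orthogonal / outer indexing).
outer = toy[[0, 1], [0, 1]]

# NumPy: lists are paired element by element (vectorized indexing).
vectorized = values[[0, 1], [0, 1]]

# A 1-D boolean mask along x keeps whole rows of the DataArray.
rows = toy[toy.x > 10]

# A full-size boolean mask on the NumPy values flattens the result.
flattened = values[values != 4]

print(outer.shape, vectorized.shape, rows.shape, flattened.shape)
```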

## Key Points
- xarray’s labeled dimensions free the user from having to track the positional ordering of dimensions when accessing data, which makes for a simpler workflow.