This notebook explores xarray's pointwise indexing, which allows you to extract cell values from a datacube for individual (t, x, y) tuples.  This is particularly useful for extracting values from a gridded data set (e.g. reanalysis) along a ship track, flight line or drifting-station trajectory.

In [1]:
import numpy as np
import xarray as xr

For starters, I'm just going to set up a simple example using a 3-D array.

In [2]:
da = xr.DataArray(np.arange(4*12).reshape((4,3, 4)), dims=['t', 'x', 'y'], coords={'t': [0,1,2,3], 'x': [0, 1, 2], 'y': ['a', 'b', 'c', 'd']})
da

<xarray.DataArray (t: 4, x: 3, y: 4)>
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]],

       [[24, 25, 26, 27],
        [28, 29, 30, 31],
        [32, 33, 34, 35]],

       [[36, 37, 38, 39],
        [40, 41, 42, 43],
        [44, 45, 46, 47]]])
Coordinates:
  * t        (t) int64 0 1 2 3
  * x        (x) int64 0 1 2
  * y        (y) <U1 'a' 'b' 'c' 'd'

To use pointwise indexing, the indices or labels have to be passed as DataArrays that share a common dimension.  Here I'm setting up index arrays, all along dimension t, that I will use with isel.

In [19]:
it = xr.DataArray([0,1,2,3], dims='t')
ix = xr.DataArray([0,1,2,2], dims='t')
iy = xr.DataArray([0,1,3,2], dims='t')

We just use isel in the normal way.  This returns a 4-element DataArray, one value per (t, x, y) tuple.

In [20]:
da.isel(t=it, x=ix, y=iy)

<xarray.DataArray (t: 4)>
array([ 0, 17, 35, 46])
Coordinates:
  * t        (t) int64 0 1 2 3
    x        (t) int64 0 1 2 2
    y        (t) <U1 'a' 'b' 'd' 'c'
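The indexers do not have to reuse an existing dimension name. As a minimal sketch, the same selection can be made with all three indexers sharing a new dimension; the name `points` and the variable names `cube`, `pt`, `px`, `py` are my own choices for illustration, not anything required by xarray.

```python
import numpy as np
import xarray as xr

# Same toy cube as above: 4 time steps, 3 x positions, 4 y positions.
cube = xr.DataArray(
    np.arange(4 * 12).reshape((4, 3, 4)),
    dims=['t', 'x', 'y'],
    coords={'t': [0, 1, 2, 3], 'x': [0, 1, 2], 'y': ['a', 'b', 'c', 'd']},
)

# All three indexers share one new dimension, here called 'points'
# (an arbitrary name chosen for this sketch).
pt = xr.DataArray([0, 1, 2, 3], dims='points')
px = xr.DataArray([0, 1, 2, 2], dims='points')
py = xr.DataArray([0, 1, 3, 2], dims='points')

track = cube.isel(t=pt, x=px, y=py)
print(track.dims)    # ('points',)
print(track.values)  # [ 0 17 35 46]
```

With this layout t, x and y all come back as coordinates along `points`, which keeps the original time axis name free for other uses.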

If we just use lists or numpy arrays, each dimension is indexed independently (orthogonal indexing), so we get a 4 x 4 x 4 DataArray back instead of four point values.

In [5]:
da.isel(t=[0,1,2,3], x=[0,1,2,2], y=[0,1,2,3])

<xarray.DataArray (t: 4, x: 4, y: 4)>
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23],
        [20, 21, 22, 23]],

       [[24, 25, 26, 27],
        [28, 29, 30, 31],
        [32, 33, 34, 35],
        [32, 33, 34, 35]],

       [[36, 37, 38, 39],
        [40, 41, 42, 43],
        [44, 45, 46, 47],
        [44, 45, 46, 47]]])
Coordinates:
  * t        (t) int64 0 1 2 3
  * x        (x) int64 0 1 2 2
  * y        (y) <U1 'a' 'b' 'c' 'd'
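To make the difference between the two behaviours concrete, here is a small sketch comparing both with plain NumPy indexing. The variable names (`values`, `cube`, `outer`, `pointwise`) and the dimension name `p` are assumptions made for this example.

```python
import numpy as np
import xarray as xr

values = np.arange(4 * 12).reshape((4, 3, 4))
cube = xr.DataArray(values, dims=['t', 'x', 'y'])

it, ix, iy = [0, 1, 2, 3], [0, 1, 2, 2], [0, 1, 3, 2]

# Lists index each dimension independently (orthogonal / outer indexing),
# which corresponds to NumPy's np.ix_ helper.
outer = cube.isel(t=it, x=ix, y=iy)
same_as_ix = np.array_equal(outer.values, values[np.ix_(it, ix, iy)])

# DataArray indexers that share a dimension are matched element by element,
# which corresponds to NumPy's integer-array ("fancy") indexing.
pointwise = cube.isel(
    t=xr.DataArray(it, dims='p'),
    x=xr.DataArray(ix, dims='p'),
    y=xr.DataArray(iy, dims='p'),
)
same_as_fancy = np.array_equal(pointwise.values, values[it, ix, iy])

print(outer.shape, same_as_ix)         # (4, 4, 4) True
print(pointwise.shape, same_as_fancy)  # (4,) True
```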

We can also access elements by label using sel, for example:

In [6]:
dy = xr.DataArray(['a','b','c','d'], dims='t')
da.sel(t=it, x=ix, y=dy)

<xarray.DataArray (t: 4)>
array([ 0, 17, 34, 47])
Coordinates:
  * t        (t) int64 0 1 2 3
    x        (t) int64 0 1 2 2
    y        (t) <U1 'a' 'b' 'c' 'd'

What if we use timestamps?

In [7]:
import pandas as pd

In [8]:
time = pd.date_range('2018-08-15', periods=4)
time

DatetimeIndex(['2018-08-15', '2018-08-16', '2018-08-17', '2018-08-18'], dtype='datetime64[ns]', freq='D')

In [9]:
da = xr.DataArray(np.arange(4*12).reshape((4,3, 4)), dims=['t', 'x', 'y'], coords={'t': time, 'x': [0, 1, 2], 'y': ['a', 'b', 'c', 'd']})
da

<xarray.DataArray (t: 4, x: 3, y: 4)>
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]],

       [[24, 25, 26, 27],
        [28, 29, 30, 31],
        [32, 33, 34, 35]],

       [[36, 37, 38, 39],
        [40, 41, 42, 43],
        [44, 45, 46, 47]]])
Coordinates:
  * t        (t) datetime64[ns] 2018-08-15 2018-08-16 2018-08-17 2018-08-18
  * x        (x) int64 0 1 2
  * y        (y) <U1 'a' 'b' 'c' 'd'

In [10]:
import datetime as dt

In [21]:
t = xr.DataArray([dt.datetime(2018,8,16), 
                  dt.datetime(2018,8,16), 
                  dt.datetime(2018,8,18), 
                  dt.datetime(2018,8,18)], 
                 dims=['t'])
ix = xr.DataArray([0,1,2,2], dims='t')
# y is labelled 'a'..'d' in this DataArray, so sel needs labels rather than integers
ly = xr.DataArray(['a','b','d','c'], dims='t')

In [23]:
da.sel(t=t, x=ix, y=ly)

<xarray.DataArray (t: 4)>
array([12, 17, 47, 46])
Coordinates:
  * t        (t) datetime64[ns] 2018-08-16 2018-08-16 2018-08-18 2018-08-18
    x        (t) int64 0 1 2 2
    y        (t) <U1 'a' 'b' 'd' 'c'

Need to figure out how to use method='nearest' for times that are not in the time coordinate.

In [28]:
t = xr.DataArray([dt.datetime(1979,8,16), 
                  dt.datetime(2018,8,16), 
                  dt.datetime(2018,8,18), 
                  dt.datetime(2018,8,19)], 
                 dims=['t'])
t

<xarray.DataArray (t: 4)>
array(['1979-08-16T00:00:00.000000000', '2018-08-16T00:00:00.000000000',
       '2018-08-18T00:00:00.000000000', '2018-08-19T00:00:00.000000000'],
      dtype='datetime64[ns]')
Dimensions without coordinates: t

In [29]:
da = xr.DataArray(np.arange(4*12).reshape((4,3, 4)), dims=['t', 'x', 'y'], 
                  coords={'t': time, 'x': [0, 1, 2], 'y': [0,1,2,3]})
da

<xarray.DataArray (t: 4, x: 3, y: 4)>
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]],

       [[24, 25, 26, 27],
        [28, 29, 30, 31],
        [32, 33, 34, 35]],

       [[36, 37, 38, 39],
        [40, 41, 42, 43],
        [44, 45, 46, 47]]])
Coordinates:
  * t        (t) datetime64[ns] 2018-08-15 2018-08-16 2018-08-17 2018-08-18
  * x        (x) int64 0 1 2
  * y        (y) int64 0 1 2 3

In [30]:
da.sel(t=t, x=ix, y=iy, method='nearest')

<xarray.DataArray (t: 4)>
array([ 0, 17, 47, 46])
Coordinates:
  * t        (t) datetime64[ns] 2018-08-15 2018-08-16 2018-08-18 2018-08-18
    x        (t) int64 0 1 2 2
    y        (t) int64 0 1 3 2
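The same pattern carries over to the ship-track use case mentioned at the top. The following is only a sketch: the grid, the dimension names (`time`, `lat`, `lon`, `obs`) and the track positions are all made up for illustration, and method='nearest' simply snaps each observation to the closest grid cell on every axis.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical gridded field: daily values on a coarse lat/lon grid.
time = pd.date_range('2018-08-15', periods=4)
field = xr.DataArray(
    np.arange(4 * 3 * 4, dtype=float).reshape((4, 3, 4)),
    dims=['time', 'lat', 'lon'],
    coords={'time': time,
            'lat': [70.0, 72.5, 75.0],
            'lon': [0.0, 2.5, 5.0, 7.5]},
)

# Hypothetical ship positions, one entry per observation.
track_time = xr.DataArray(
    pd.to_datetime(['2018-08-15 06:00', '2018-08-16 18:00', '2018-08-18 01:00']),
    dims='obs',
)
track_lat = xr.DataArray([70.4, 73.9, 74.8], dims='obs')
track_lon = xr.DataArray([0.3, 4.1, 7.0], dims='obs')

# One value per observation, taken from the nearest grid cell in time and space.
along_track = field.sel(time=track_time, lat=track_lat, lon=track_lon,
                        method='nearest')
print(along_track.values)  # [ 0. 34. 47.]
```

The coordinates of the result are the grid times and positions that were matched, not the requested ones, so it is worth checking how far each observation was moved.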

In [36]:
da.sel(t=t, x=ix, y=iy, method='nearest', tolerance=dt.timedelta(days=1))

KeyError: "not all values found in index 't'"
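The KeyError is raised because 1979-08-16 has no grid time within one day of it, and sel fails if any requested label cannot be matched. One possible workaround, sketched here under my own assumptions (the names `wanted`, `px`, `py`, `pos`, `ok` and the dimension `points` are invented for this example), is to do the nearest-time lookup with pandas' Index.get_indexer, which returns -1 for unmatched values, and then select only the matched points with isel.

```python
import datetime as dt

import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range('2018-08-15', periods=4)
cube = xr.DataArray(
    np.arange(4 * 12).reshape((4, 3, 4)),
    dims=['t', 'x', 'y'],
    coords={'t': time, 'x': [0, 1, 2], 'y': [0, 1, 2, 3]},
)

# Requested times; the first one is far outside the grid's time range.
wanted = pd.DatetimeIndex([dt.datetime(1979, 8, 16), dt.datetime(2018, 8, 16),
                           dt.datetime(2018, 8, 18), dt.datetime(2018, 8, 19)])
px = np.array([0, 1, 2, 2])
py = np.array([0, 1, 3, 2])

# Positions of the nearest grid times; -1 where nothing lies within the tolerance.
pos = cube.indexes['t'].get_indexer(wanted, method='nearest',
                                    tolerance=pd.Timedelta(days=1))
ok = pos >= 0

# Pointwise selection restricted to the observations that found a match.
picked = cube.isel(
    t=xr.DataArray(pos[ok], dims='points'),
    x=xr.DataArray(px[ok], dims='points'),
    y=xr.DataArray(py[ok], dims='points'),
)
print(pos)            # [-1  1  3  3]
print(picked.values)  # [17 47 46]
```

The boolean mask `ok` can be kept alongside the result to record which of the original observations were dropped.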

In [34]:
dt.timedelta(days=0.5)

datetime.timedelta(0, 43200)