<img src="http://xarray.pydata.org/en/stable/_static/dataset-diagram-logo.png" align="right" width="30%">

# Working with labeled data

Learning goals:

- Use different forms of indexing to select data based on position and
  coordinates
- Select datetime ranges
- Interpolate data to new coordinates

## Named dimensions

As mentioned in the previous session, labeled dimensions make code much easier
to understand. Compare pure `numpy` indexing:


In [1]:
import numpy as np
import pandas as pd
import xarray as xr

np.random.seed(0)

In [2]:
# axis0: x, axis1: y
np_array = np.random.randn(3, 4)
np_array[1, 3]

-0.1513572082976979

and slicing:


In [3]:
np_array[:2, 1:]

array([[ 0.40015721,  0.97873798,  2.2408932 ],
       [-0.97727788,  0.95008842, -0.15135721]])

with indexing by dimension name:


In [4]:
arr = xr.DataArray(np_array, dims=("x", "y"))
arr.isel(x=1, y=3)

This is the same as


In [5]:
arr[{"x": 1, "y": 3}]
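As a minimal sketch of the equivalence between the three spellings, the following uses a small integer array (illustrative data, not the random array above) so that the selected value is easy to check by eye:

```python
import numpy as np
import xarray as xr

# Illustrative data: the value at position (i, j) is 4 * i + j.
values = np.arange(12).reshape(3, 4)
demo = xr.DataArray(values, dims=("x", "y"))

positional = values[1, 3]         # plain numpy: the axis order must be remembered
by_name = demo.isel(x=1, y=3)     # named dimensions
by_dict = demo[{"x": 1, "y": 3}]  # dictionary form of the same lookup
swapped = demo.isel(y=3, x=1)     # keyword order does not matter

print(positional, by_name.item(), by_dict.item(), swapped.item())
```

All four expressions pick the same element; only the `numpy` version depends on knowing which axis comes first.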

Python only allows the `start:stop` syntax inside square brackets, so slices
passed as keyword arguments have to be constructed explicitly with `slice`:


In [6]:
ds = xr.Dataset(
    {
        "a": (("x", "y"), np.random.randn(3, 4)),
        "b": (("x", "y"), np.random.randn(3, 4)),
    }
)
ds.isel(x=slice(None, 2), y=slice(1, None))
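To make the correspondence with `numpy` slicing concrete, here is a small sketch on illustrative data (the name `demo` is not part of the tutorial). It checks that `slice(None, 2)` and `slice(1, None)` play the role of `:2` and `1:`:

```python
import numpy as np
import xarray as xr

values = np.arange(12).reshape(3, 4)
demo = xr.DataArray(values, dims=("x", "y"))

# Same selection as values[:2, 1:], written with explicit slice objects.
subset = demo.isel(x=slice(None, 2), y=slice(1, None))
np.testing.assert_array_equal(subset.values, values[:2, 1:])

# A dimension that is not mentioned is kept whole.
rows_only = demo.isel(x=slice(None, 2))
print(subset.shape, rows_only.shape)
```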

We can also use these names to peek at the data if the automatic preview is not
enough:


In [7]:
ds.head(x=2, y=3)

See also `tail` and `thin`.
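A short sketch of how the three methods differ, again on illustrative data: `head` and `tail` keep the first or last `n` entries along each named dimension, while `thin` keeps every `n`-th entry.

```python
import numpy as np
import xarray as xr

demo = xr.DataArray(np.arange(24).reshape(4, 6), dims=("x", "y"))

first = demo.head(x=2, y=3)        # first 2 entries along x, first 3 along y
last = demo.tail(x=2, y=3)         # last 2 entries along x, last 3 along y
every_nth = demo.thin(x=2, y=3)    # every 2nd entry along x, every 3rd along y

print(first.shape, last.shape, every_nth.shape)
```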


## Coordinate labels and label based indexing


xarray objects become much more interesting when adding coordinate labels:


In [8]:
arr = xr.DataArray(
    np.random.randn(4, 6),
    dims=("x", "y"),
    coords={
        "x": [-3.2, 2.1, 5.3, 6.5],
        "y": pd.date_range("2009-01-05", periods=6, freq="M"),
    },
)
arr

To select data by coordinate labels instead of integer indices we can use the
same syntax, using `sel` instead of `isel`:


In [9]:
arr.sel(x=5.3, y="2009-04-30")  # or arr.loc[{"x": 5.3, "y": "2009-04-30"}]

This requires us to specify exact coordinate values. If we don't have those, we
can use the `method` parameter (see `Dataset.sel` for documentation):


In [10]:
arr.sel(x=4, y="2009-04-01", method="nearest")
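The sketch below illustrates the difference between exact and nearest-neighbour lookup on a one-dimensional array with the same `x` labels. The helper `select_or_none` is hypothetical and exists only to turn the `KeyError` raised by a failed lookup into `None`. The `tolerance` argument limits how far away the nearest label may be.

```python
import xarray as xr

demo = xr.DataArray(
    [10.0, 20.0, 30.0, 40.0],
    dims="x",
    coords={"x": [-3.2, 2.1, 5.3, 6.5]},
)


def select_or_none(array, **kwargs):
    """Hypothetical helper: return array.sel(**kwargs), or None on KeyError."""
    try:
        return array.sel(**kwargs)
    except KeyError:
        return None


exact = select_or_none(demo, x=4)                      # no label equals 4
nearest = select_or_none(demo, x=4, method="nearest")  # picks the label 5.3
too_far = select_or_none(demo, x=4, method="nearest", tolerance=0.5)
within = select_or_none(demo, x=4, method="nearest", tolerance=1.5)
```

Here `exact` and `too_far` are `None`, while `nearest` and `within` both hold the value stored at `x=5.3`.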

We can also select multiple values:


In [11]:
arr.sel(x=[-3.2, 6.5], y=slice("2009-02-28", "2009-05-31"))
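One detail worth checking when slicing by label: as in `pandas`, label-based slices include both endpoints, whereas positional slices exclude the stop index. A minimal sketch on illustrative data:

```python
import numpy as np
import xarray as xr

demo = xr.DataArray(
    np.arange(5),
    dims="x",
    coords={"x": [10, 20, 30, 40, 50]},
)

by_label = demo.sel(x=slice(20, 40))     # labels 20, 30 and 40 (stop included)
by_position = demo.isel(x=slice(1, 3))   # positions 1 and 2 (stop excluded)

print(by_label["x"].values, by_position["x"].values)
```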

If instead of selecting data we want to drop it, we can use `drop_sel`:


In [12]:
arr.drop_sel(x=[-3.2, 5.3])
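As a small sketch on illustrative data: `drop_sel` removes the listed labels and keeps everything else. By default a label that does not exist raises an error; passing `errors="ignore"` skips such labels instead.

```python
import numpy as np
import xarray as xr

demo = xr.DataArray(
    np.arange(4),
    dims="x",
    coords={"x": [-3.2, 2.1, 5.3, 6.5]},
)

kept = demo.drop_sel(x=[-3.2, 5.3])

# 99.0 is not a label of x; errors="ignore" drops only the labels that exist.
tolerant = demo.drop_sel(x=[-3.2, 99.0], errors="ignore")

print(kept["x"].values, tolerant["x"].values)
```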

### Exercises


In [13]:
ds = xr.tutorial.open_dataset("air_temperature")
ds

1. Select the first 30 entries of latitude and 20th to 40th entries of longitude


In [14]:
# your code here

2. Select all data at 75 degrees north and between Jan 1, 2013 and Oct 15, 2013


In [15]:
# your code here

3. Remove all entries at 260 and 270 degrees longitude


In [16]:
# your code here

## Interpolation

If we want to look at values between the current grid cells (interpolation), we
can do that with `interp` (requires `scipy`):


In [17]:
arr.interp(
    x=np.linspace(2, 6, 10),
    y=pd.date_range("2009-04-01", "2009-04-30", freq="D"),
)

When trying to extrapolate, the resulting values will be `nan`.
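The behaviour at the edges can be seen in a one-dimensional sketch (illustrative data, `scipy` assumed to be installed). Points inside the coordinate range are interpolated linearly, points outside become `nan`, and passing `kwargs={"fill_value": "extrapolate"}` through to the underlying `scipy` interpolator is one way to request linear extrapolation for 1-D interpolation.

```python
import numpy as np
import xarray as xr

demo = xr.DataArray(
    [0.0, 10.0, 20.0],
    dims="x",
    coords={"x": [0.0, 1.0, 2.0]},
)

inside = demo.interp(x=[0.5, 1.5])     # within the range of x
outside = demo.interp(x=[-1.0, 3.0])   # beyond both ends of x
extrapolated = demo.interp(x=[3.0], kwargs={"fill_value": "extrapolate"})

print(inside.values, outside.values, extrapolated.values)
```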

If we already have an object with the desired coordinates, we can use
`interp_like`:


In [18]:
other = xr.DataArray(
    dims=("x", "y"),
    coords={
        "x": np.linspace(2, 4, 10),
        "y": pd.date_range("2009-04-01", "2009-04-30", freq="D"),
    },
)
arr.interp_like(other)

### Exercises

Refine the grid along latitude and longitude, reducing the step size from 2.5
degrees to 1 degree.


In [19]:
# your code here

## Broadcasting and automatic alignment

Labels help with combining arrays with different coordinates:


In [20]:
a = xr.DataArray(
    np.random.randn(3, 4),
    dims=("x", "y"),
    coords={"x": ["a", "b", "c"], "y": np.arange(4)},
)
b = xr.DataArray(
    np.random.randn(2, 7),
    dims=("x", "y"),
    coords={"x": ["b", "d"], "y": [-2, -1, 0, 1, 2, 3, 4]},
)

a + b

This will automatically select only the labels common to both arrays (an inner
join) and then perform the operation.
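To see which labels survive the inner join, the sketch below repeats the setup with arrays of ones instead of random numbers (the names `left`, `right` and `total` are illustrative), so the result is easy to predict:

```python
import numpy as np
import xarray as xr

left = xr.DataArray(
    np.ones((3, 4)),
    dims=("x", "y"),
    coords={"x": ["a", "b", "c"], "y": np.arange(4)},
)
right = xr.DataArray(
    np.ones((2, 7)),
    dims=("x", "y"),
    coords={"x": ["b", "d"], "y": np.arange(-2, 5)},
)

total = left + right

# Only x="b" and y=0..3 appear in both arrays.
print(total["x"].values, total["y"].values)
```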


Broadcasting works similarly:


In [21]:
arr1 = xr.DataArray(
    np.random.randn(3),
    dims="x",
    coords={"x": ["a", "b", "c"]},
)
arr2 = xr.DataArray(
    np.random.randn(4),
    dims="y",
    coords={"y": np.arange(4)},
)

arr1 + arr2

where both arrays were automatically broadcast against each other:


In [22]:
arr1_, arr2_ = xr.broadcast(arr1, arr2)

In [23]:
arr1_

In [24]:
arr2_

and then the operation (a sum) was executed.
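The two steps can be checked against each other in a small sketch with fixed values (the names `col`, `row` and the numbers are illustrative). Adding the arrays directly gives the same result as broadcasting them first and then adding:

```python
import numpy as np
import xarray as xr

col = xr.DataArray([1.0, 2.0, 3.0], dims="x", coords={"x": ["a", "b", "c"]})
row = xr.DataArray(
    [10.0, 20.0, 30.0, 40.0], dims="y", coords={"y": np.arange(4)}
)

total = col + row

# Explicit version of what the addition does internally.
col_b, row_b = xr.broadcast(col, row)
explicit = col_b + row_b

print(total.dims, total.shape)
print(total.sel(x="b", y=2).item())
```

Because the dimensions are matched by name, `row + col` yields the same values; only the dimension order of the result differs.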

We can also call `align` explicitly with different options:


In [25]:
a_al, b_al = xr.align(a, b, join="inner")
b_al
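For comparison, here is a sketch of two other `join` options on arrays of ones with the same coordinates as `a` and `b` (the names `left` and `right` are illustrative). `"outer"` keeps the union of the labels and `"left"` keeps the labels of the first argument; positions without data are filled with `nan`.

```python
import numpy as np
import xarray as xr

left = xr.DataArray(
    np.ones((3, 4)),
    dims=("x", "y"),
    coords={"x": ["a", "b", "c"], "y": np.arange(4)},
)
right = xr.DataArray(
    np.ones((2, 7)),
    dims=("x", "y"),
    coords={"x": ["b", "d"], "y": np.arange(-2, 5)},
)

# Union of labels: x = a, b, c, d and y = -2 .. 4.
left_outer, right_outer = xr.align(left, right, join="outer")

# Labels of the first argument only: x = a, b, c and y = 0 .. 3.
left_left, right_left = xr.align(left, right, join="left")

print(left_outer.shape, right_left.shape)
print(int(left_outer.isnull().sum()))  # number of cells filled with nan
```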