# Indexing and Selecting Data

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Indexing-and-Selecting-Data" data-toc-modified-id="Indexing-and-Selecting-Data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Indexing and Selecting Data</a></span><ul class="toc-item"><li><span><a href="#Learning-Objectives" data-toc-modified-id="Learning-Objectives-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Learning Objectives</a></span></li><li><span><a href="#Label-based-Indexing" data-toc-modified-id="Label-based-Indexing-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Label-based Indexing</a></span></li><li><span><a href="#NumPy-Positional-Indexing" data-toc-modified-id="NumPy-Positional-Indexing-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>NumPy Positional Indexing</a></span></li><li><span><a href="#Indexing-with-xarray" data-toc-modified-id="Indexing-with-xarray-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Indexing with xarray</a></span></li><li><span><a href="#Vectorized-Indexing" data-toc-modified-id="Vectorized-Indexing-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Vectorized Indexing</a></span></li><li><span><a href="#Going-Further" data-toc-modified-id="Going-Further-1.6"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>Going Further</a></span></li></ul></li></ul></div>

## Learning Objectives


- Select data by position using .isel with values or slices
- Select data by label using .sel with values or slices
- Select timeseries data by date/time with values or slices
- Use nearest-neighbor lookups with .sel

## Label-based Indexing


Scientific data is inherently labeled. For example, time series data includes timestamps that label individual periods or points in time, spatial data has coordinates (e.g. longitude, latitude, elevation), and model or laboratory experiments are often identified by unique identifiers. 

In [None]:
import xarray as xr

In [None]:
ds = xr.open_dataset('../../../data/air_temperature.nc')
ds

## NumPy Positional Indexing

NumPy arrays are indexed by position, using scalars, slices, or ranges.

In [None]:
t = ds['air'].data # numpy array 
t

In [None]:
t.shape

In [None]:
# extract a time-series for one spatial location
t[:, 20, 40]

<div class="alert alert-block alert-warning">
But wait, what labels go with 20 and 40? Was that lat/lon or lon/lat? Where are the timestamps that go along with this time series?
</div>

## Indexing with xarray

xarray offers extremely flexible indexing routines that combine the best features of NumPy and pandas for data selection.

In [None]:
da = ds['air'] # Extract data array
da

- **NumPy style indexing still works (but preserves the labels/metadata)**

In [None]:
da[:, 20, 40]

- **Positional indexing using dimension names**

In [None]:
da.isel(lat=20, lon=40)
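
`.isel` also accepts slices. The sketch below does not read `air_temperature.nc`; it assumes a small synthetic array, `demo`, whose grid (latitudes descending from 75 to 15, longitudes from 200 to 330, 2.5-degree steps) and values are illustrative stand-ins for `da`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for `da` (assumed grid and made-up values)
lat = np.arange(75.0, 14.0, -2.5)    # 25 latitudes, descending
lon = np.arange(200.0, 331.0, 2.5)   # 53 longitudes, ascending
time = pd.date_range("2013-01-01", periods=4, freq="D")
demo = xr.DataArray(
    np.arange(4 * 25 * 53, dtype=float).reshape(4, 25, 53),
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": lat, "lon": lon},
    name="air",
)

# Integer positions drop the indexed dimensions, leaving a time series
point = demo.isel(lat=20, lon=40)

# Slices keep their dimensions: first three latitudes, every tenth longitude
block = demo.isel(time=0, lat=slice(0, 3), lon=slice(None, None, 10))

print(point.dims, float(point.lat), float(point.lon))
print(block.shape)
```

Unlike the plain NumPy result, `point` still carries the `lat` and `lon` values of the selected cell as scalar coordinates.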

- **Label-based indexing**

In [None]:
da.sel(lat=50., lon=200.)
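
`.sel` also takes slices of labels, including dates. This is a minimal sketch on a synthetic array, `demo`, whose grid, 6-hourly time axis, and values are assumptions made for illustration rather than taken from the dataset. Label slices include both endpoints, and the bounds should follow the coordinate's own order.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for `da` (assumed grid, 6-hourly time steps, made-up values)
lat = np.arange(75.0, 14.0, -2.5)    # descending, like many climate grids
lon = np.arange(200.0, 331.0, 2.5)   # ascending
time = pd.date_range("2013-01-01", periods=12, freq="6h")
demo = xr.DataArray(
    np.zeros((12, 25, 53)),
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": lat, "lon": lon},
    name="air",
)

# Label slices are inclusive; lat is descending, so slice from high to low
region = demo.sel(lat=slice(50.0, 40.0), lon=slice(200.0, 210.0))

# Reversed bounds on a descending coordinate select nothing
empty = demo.sel(lat=slice(40.0, 50.0))

# A date-string slice covers every time step that falls on that day
one_day = demo.sel(time=slice("2013-01-02", "2013-01-02"))

# An exact timestamp selects a single step and drops the time dimension
instant = demo.sel(time=pd.Timestamp("2013-01-02 06:00"))

print(region.shape, empty.sizes["lat"], one_day.sizes["time"], instant.dims)
```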

- **Nearest Neighbor Lookups**

In [None]:
da.sel(lat=52.25, lon=251.8998, method='nearest')
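
To see which grid cell a nearest-neighbor lookup actually returned, inspect the coordinates of the result. The sketch below assumes a synthetic 2.5-degree grid (`demo`, hypothetical values) and also shows the `tolerance` argument of `.sel`, which raises a `KeyError` when no label lies close enough.

```python
import numpy as np
import xarray as xr

# Synthetic 2-D stand-in for one time step of `da` (assumed 2.5-degree grid)
lat = np.arange(75.0, 14.0, -2.5)
lon = np.arange(200.0, 331.0, 2.5)
demo = xr.DataArray(
    np.zeros((lat.size, lon.size)),
    dims=("lat", "lon"),
    coords={"lat": lat, "lon": lon},
    name="air",
)

# The requested point snaps to the closest grid labels
nearest = demo.sel(lat=52.25, lon=251.8998, method="nearest")
print(float(nearest.lat), float(nearest.lon))

# `tolerance` caps how far the match may be from the requested label
try:
    demo.sel(lat=52.25, method="nearest", tolerance=0.1)
    too_far = False
except KeyError:
    too_far = True  # closest latitude is 0.25 degrees away, beyond 0.1
print(too_far)
```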

- **All of these indexing methods work on the dataset too:**

In [None]:
ds.sel(lat=52.25, lon=251.8998, method='nearest')

## Vectorized Indexing

Like NumPy and pandas, xarray supports indexing many array elements at once in a vectorized manner:


In [None]:
# generate coordinates for a transect of points
lat_points = xr.DataArray([52, 52.5, 53], dims='points')
lon_points = xr.DataArray([250, 250, 250], dims='points')
lat_points

In [None]:
lon_points

In [None]:
# nearest neighbor selection along the transect
da.sel(lat=lat_points, lon=lon_points, method='nearest')
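
The shared `points` dimension is what makes this a pointwise selection. As a sketch on a synthetic array (`demo`, with an assumed grid and made-up values), compare `DataArray` indexers that share a dimension with plain lists, which select the full outer product instead:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for `da` (assumed grid and made-up values)
lat = np.arange(75.0, 14.0, -2.5)
lon = np.arange(200.0, 331.0, 2.5)
time = pd.date_range("2013-01-01", periods=4, freq="D")
demo = xr.DataArray(
    np.zeros((4, 25, 53)),
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": lat, "lon": lon},
    name="air",
)

lats = [50.0, 52.5, 55.0]
lons = [250.0, 252.5, 255.0]

# Indexers sharing the "points" dimension are paired element by element
transect = demo.sel(
    lat=xr.DataArray(lats, dims="points"),
    lon=xr.DataArray(lons, dims="points"),
)

# Plain lists select every lat/lon combination (a 3 x 3 block)
grid = demo.sel(lat=lats, lon=lons)

print(dict(transect.sizes))
print(dict(grid.sizes))
```

The transect keeps `lat` and `lon` as coordinates along `points`, so each selected value can still be traced back to its location.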

## Going Further

- [Xarray Docs - Indexing and Selecting Data](https://xarray.pydata.org/en/stable/indexing.html)

<div class="alert alert-block alert-success">
  <p>Previous: <a href="02_io.ipynb">I/O</a></p>
  <p>Next: <a href="04_agg.ipynb">Aggregation</a></p>
</div>