# Indexing and Selecting Data


---

## Learning Objectives 


- Select data by position using `.isel()` with values or slices
- Select data by coordinate label/value using `.sel()` with values or slices
- Use nearest-neighbor lookups with `.sel()`
- Use `interp()` to interpolate by coordinate labels

## Prerequisites


| Concepts | Importance | Notes |
| --- | --- | --- |
| [Understanding of xarray core data structures](./01-xarray-fundamentals.ipynb) | Necessary | |
| [Basic familiarity with NumPy indexing](https://numpy.org/doc/stable/reference/arrays.indexing.html) | Helpful | |
| [Basic familiarity with Pandas indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html) | Helpful | |

- **Time to learn**: *15-20 minutes*



---

## Imports


In [3]:
import xarray as xr

In [4]:
ds = xr.open_dataset(
    "/home/rdevinen/palm/current_version/JOBS/testing/INPUT/testing_static", engine="netcdf4")

ds

## NumPy Positional Indexing

When working with NumPy, indexing is done purely by position, using integers, slices, or ranges.

In [6]:
sf = ds["surface_fraction"].data  # retrieve numpy array
sf

array([[[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]]], dtype=float32)

In [7]:
sf.shape, sf.ndim

((3, 64, 64), 3)

Let's extract the values along the first axis at a single spatial location:


In [10]:
sf[:, 32, 0]

array([0., 0., 0.], dtype=float32)
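To see why purely positional indexing is fragile, here is a minimal sketch on a small synthetic array (the name `arr` and its shape are made up for illustration and are not part of the dataset above). The same index expression picks different values once the axes are reordered:

```python
import numpy as np

# Hypothetical stand-in for the array above: 3 layers on a 4 x 5 grid.
arr = np.arange(3 * 4 * 5, dtype="float32").reshape(3, 4, 5)

# All values along the first axis at row 2, column 0.
column = arr[:, 2, 0]
print(column)

# After reordering the axes, the same index expression means something else.
transposed = arr.transpose(2, 1, 0)  # shape is now (5, 4, 3)
other = transposed[:, 2, 0]
print(other.shape)
```

With NumPy alone, we have to remember which axis holds which dimension at every step.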

## Different choices for indexing 


Xarray supports two kinds of indexing:

- Positional indexing via `.isel()`: selects primarily by integer position (from `0` to `length-1` along the axis/dimension)
- Label indexing via `.sel()`: selects primarily by coordinate label

Xarray's indexing methods preserve the coordinate labels and associated metadata.
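The difference from plain NumPy can be sketched on a small synthetic `DataArray`. The dimension name `layer`, the coordinate values, and the `units` attribute below are assumptions made up for this illustration:

```python
import numpy as np
import xarray as xr

# Hypothetical array: 3 layers on a 4 x 5 grid with labeled y and x coordinates.
da = xr.DataArray(
    np.zeros((3, 4, 5), dtype="float32"),
    dims=("layer", "y", "x"),
    coords={"y": [1.0, 3.0, 5.0, 7.0], "x": [1.0, 3.0, 5.0, 7.0, 9.0]},
    attrs={"units": "1"},
)

raw = da.data[:, 2, 0]      # plain NumPy: only the numbers survive
picked = da.isel(y=2, x=0)  # xarray: numbers plus labels and attributes

print(type(raw).__name__)
print(picked.dims, picked["y"].item(), picked.attrs)
```

The selected `y` and `x` labels stay attached to `picked` as scalar coordinates, so we can still tell where the values came from.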



### Selection by position

The `.isel()` method is the primary access method for **purely integer-based indexing**. The following are valid inputs:
- An integer, e.g. `lat=10`
- A list or array of integers, e.g. `lon=[10, 20, 39]`
- A slice object with integers, e.g. `time=slice(2, 20)`

In [14]:
ds["surface_fraction"].isel()  # the original object i.e. no selection

In [15]:
ds["surface_fraction"].isel(x=32)

In [16]:
ds["surface_fraction"].isel(x=32, y=0)

In [17]:
ds["surface_fraction"].isel(x=32, y=slice(10, 35))
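The three kinds of input listed above can be compared side by side on a small synthetic array. This is only a sketch: the array `da` and the dimension name `layer` are invented for illustration.

```python
import numpy as np
import xarray as xr

# Hypothetical array: 3 layers on a 4 x 5 grid.
da = xr.DataArray(
    np.arange(3 * 4 * 5).reshape(3, 4, 5),
    dims=("layer", "y", "x"),
)

one = da.isel(x=2)                # an integer drops the x dimension
some = da.isel(x=[0, 2, 4])       # a list keeps x with three entries
window = da.isel(y=slice(1, 3))   # a slice keeps y; the stop is excluded
point = da.isel(x=2, y=0)         # several dimensions at once

print(one.shape, some.shape, window.shape, point.values)
```

Note that an integer removes the dimension from the result, while a list or slice keeps it, just as in NumPy.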

### Selection by label 


The `.sel()` method is the primary access method for **purely coordinate-label-based indexing**. The following are valid inputs:

- A single coordinate label, e.g. `time="2021-03-01"`
- A list or array of coordinate labels, e.g. `time=["2021-01-01", "2021-03-10", "2021-03-12"]`
- A slice object with coordinate labels, e.g. `time=slice("2021-01-01", "2021-03-01")`. (Note that, contrary to usual Python slices, both the start and the stop are included when present in the index!)

In [20]:
ds["surface_fraction"].sel(y=12.5, method='nearest')
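The inclusive behavior of label slices is easiest to see next to a positional slice. The following sketch uses a made-up one-dimensional array whose `y` coordinate values are assumptions for illustration:

```python
import numpy as np
import xarray as xr

# Hypothetical 1-D array with floating-point y labels.
da = xr.DataArray(
    np.arange(5) * 10,
    dims="y",
    coords={"y": [1.0, 3.0, 5.0, 7.0, 9.0]},
)

single = da.sel(y=3.0)                   # one exact label
several = da.sel(y=[1.0, 9.0])           # a list of exact labels
by_label = da.sel(y=slice(3.0, 7.0))     # labels 3.0, 5.0 and 7.0
by_position = da.isel(y=slice(1, 3))     # positions 1 and 2 only

print(by_label.values, by_position.values)
```

The label slice includes both `3.0` and `7.0`, whereas the positional slice stops before index `3`.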

### Nearest-neighbor lookups

When our coordinate labels are floating-point numbers rather than integers, strings, or datetime-like values, `.sel()` may raise a `KeyError`.

A call such as `ds.tas.sel(lat=39.5, lon=105.7)` fails when no coordinate value matches the requested label exactly. Floating-point numbers are represented only approximately inside the computer, and `.sel()` looks for an exact match by default, so a label that is merely close to a grid value is not found. To address this issue, `.sel()` supports the `method` and `tolerance` keyword arguments. The `method` parameter enables nearest-neighbor (inexact) lookups via `'pad'`, `'backfill'`, or `'nearest'`:

In [21]:
ds["surface_fraction"].sel(x=32, y=0, method='nearest')

See the [xarray documentation](https://xarray.pydata.org/en/stable/generated/xarray.DataArray.sel.html) for more on usage of `method` and `tolerance` parameters in `.sel()`. 
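As a rough sketch of how `method` and `tolerance` interact, consider a made-up array with `y` labels every two units (the array and its values are assumptions for illustration, not the dataset above):

```python
import numpy as np
import xarray as xr

# Hypothetical 1-D array with floating-point y labels.
da = xr.DataArray(
    np.arange(5) * 10,
    dims="y",
    coords={"y": [1.0, 3.0, 5.0, 7.0, 9.0]},
)

# An exact lookup of a label that is not in the index raises KeyError.
try:
    da.sel(y=4.2)
    exact_failed = False
except KeyError:
    exact_failed = True

nearest = da.sel(y=4.2, method="nearest")      # closest label: 5.0
padded = da.sel(y=4.2, method="pad")           # last label at or below: 3.0
backfilled = da.sel(y=4.2, method="backfill")  # first label at or above: 5.0

# A tolerance limits how far away the match may be.
try:
    da.sel(y=4.2, method="nearest", tolerance=0.5)
    too_far = False
except KeyError:
    too_far = True

print(exact_failed, nearest["y"].item(), padded["y"].item(), too_far)
```

Setting `tolerance` is a useful guard when a silently returned far-away grid point would be worse than an error.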

<div class="admonition alert alert-info">
    <p class="title" style="font-weight:bold">Tip</p>
Another way to select data without exact label matches is to use slice objects, which return every label that falls within the given bounds. For example:
</div>

In [23]:
ds["surface_fraction"].sel(x=slice(31, 33), y=slice(0, 1))

Selection methods can be chained, so multiple operations can be performed sequentially. For example, to select the first index along `nsurface_fraction` and then an area of interest:

In [25]:
ds["surface_fraction"].isel(nsurface_fraction=0).sel(x=slice(31, 33), y=slice(0, 1))
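Here is a small self-contained sketch of the same chaining pattern. The array `da`, the dimension name `layer`, and the coordinate values are made up for illustration:

```python
import numpy as np
import xarray as xr

# Hypothetical array: 3 layers on a 4 x 5 grid with labeled y and x coordinates.
da = xr.DataArray(
    np.arange(3 * 4 * 5).reshape(3, 4, 5),
    dims=("layer", "y", "x"),
    coords={"y": [1.0, 3.0, 5.0, 7.0], "x": [1.0, 3.0, 5.0, 7.0, 9.0]},
)

# Positional selection first, then a label-based window.
area = da.isel(layer=0).sel(x=slice(2, 8), y=slice(0, 4))

# Because dimensions are addressed by name, the order of the calls does not matter.
swapped = da.sel(x=slice(2, 8), y=slice(0, 4)).isel(layer=0)

print(area.shape, area.identical(swapped))
```

The slice bounds do not need to be labels that exist in the index; every label inside the bounds is returned.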

### Interpolation

If we want to interpolate along coordinates rather than look up the nearest neighbors, we can use the `.interp()` method. Using `.interp()` requires the `scipy` library.


In [21]:
ds["surface_fraction"].interp(y=[10, 10.1, 10.2], method='nearest')
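To make the difference between interpolation methods concrete, here is a minimal sketch on a made-up one-dimensional array (the values and `y` labels are assumptions for illustration). It assumes `scipy` is installed:

```python
import numpy as np
import xarray as xr

# Hypothetical 1-D array: values 0, 10, 20 at y = 1, 3, 5.
da = xr.DataArray(
    [0.0, 10.0, 20.0],
    dims="y",
    coords={"y": [1.0, 3.0, 5.0]},
)

# The default method is linear interpolation between neighboring points.
linear = da.interp(y=[2.0, 4.0])

# method="nearest" takes the value of the closest grid point instead.
nearest = da.interp(y=[2.2, 4.2], method="nearest")

print(linear.values, nearest.values)
print(nearest["y"].values)
```

Unlike `.sel(..., method="nearest")`, the result of `.interp()` carries the requested coordinate values rather than the labels of the original grid.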

---

## Summary 

- Xarray’s named dimensions and labeled coordinates free the user from having to track positional ordering of dimensions when accessing data
- Xarray provides several methods for subsetting data: `.isel()` for positions, `.sel()` for labels, and `.interp()` for interpolation


## Resources and References

- [Xarray Documentation - Indexing and Selecting Data](https://xarray.pydata.org/en/stable/indexing.html)
- [Xarray Documentation - Interpolation](https://xarray.pydata.org/en/stable/user-guide/interpolation.html)


<div class="admonition alert alert-success">
    <p class="title" style="font-weight:bold">Previous: <a href="./01-xarray-fundamentals.ipynb">Xarray Fundamentals</a></p>
    <p class="title" style="font-weight:bold">Next: <a href="./03-data-visualization.ipynb">Data Visualization</a></p>
</div>