# **Tutorial 2: Selection, Interpolation and Slicing**

**Week 1, Day 1, Introduction to the Climate System**

**Content creators:** Sloane Garelick, Julia Kent

**Content reviewers:** Danika Gupta, Younkap Nina Duplex 

**Content editors:** Yosmely Bermúdez

**Production editors:** TBD

**Our 2023 Sponsors:** TBD







### **Code and Data Sources**

Code and data for this tutorial are based on existing content from [Project Pythia](https://foundations.projectpythia.org/core/xarray/xarray-intro.html).

## **Tutorial 2 Objectives**
To assess global variations in climate variables, such as temperature and incoming solar radiation, it’s useful to be able to extract and compare subsets of data from a larger dataset. 

In this tutorial, we will explore multiple computational tools in Xarray that allow us to select data from a specific spatial and temporal range. We will practice using:


*   **`.sel()`:** select data based on coordinate values
*   **`.interp()`:** interpolate to any latitude/longitude location to extract data
*   **Slicing:** select a range (or slice) along one or more coordinates by passing a Python slice object to `.sel()`
*   **`.loc`:** extract data for a specific coordinate value or date using square-bracket indexing


## Imports

In [None]:
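# datetime ships with Python's standard library, so it does not need to be installed.
# SciPy is required by .interp(), which is used later in this tutorial.
!pip install numpy
!pip install pandas
!pip install xarray
!pip install scipy
!pip install pythia_datasets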

In [None]:
from datetime import timedelta

import numpy as np
import pandas as pd
import xarray as xr
from pythia_datasets import DATASETS

###  Create the temperature and pressure Xarray Dataset we made in Tutorial 1


In [None]:
# @title Create the temperature and pressure Xarray Dataset we made in Tutorial 1

# Temperature data: 5 days x 3 latitudes x 4 longitudes of random values around 283 K
data = 283 + 5 * np.random.randn(5, 3, 4)
times = pd.date_range('2018-01-01', periods=5)
lons = np.linspace(-120, -60, 4)
lats = np.linspace(25, 55, 3)
temp = xr.DataArray(data, coords=[times, lats, lons], dims=['time', 'lat', 'lon'])
temp.attrs['units'] = 'kelvin'
temp.attrs['standard_name'] = 'air_temperature'

# Pressure data on the same time/lat/lon grid: random values around 1000 hPa
pressure_data = 1000.0 + 5 * np.random.randn(5, 3, 4)
pressure = xr.DataArray(
    pressure_data, coords=[times, lats, lons], dims=['time', 'lat', 'lon']
)
pressure.attrs['units'] = 'hPa'
pressure.attrs['standard_name'] = 'air_pressure'

# Combine the temperature and pressure DataArrays into a Dataset
ds = xr.Dataset(data_vars={'Temperature': temp, 'Pressure': pressure})

## Subsetting and selection by coordinate values

Much of the power of labeled coordinates comes from the ability to select data based on coordinate names and values, rather than array indices. We'll explore this briefly here.
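
Label-based selection is not limited to a single `DataArray`. Calling `.sel()` on a `Dataset` such as `ds` above subsets every data variable along the named coordinate in one step. The following is a minimal sketch. It assumes a Dataset rebuilt like the one above, using seeded random values (`default_rng(0)` is an illustrative choice) so the cell runs on its own:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild a Dataset shaped like the Tutorial 1 one (seeded random values)
rng = np.random.default_rng(0)
times = pd.date_range('2018-01-01', periods=5)
lats = np.linspace(25, 55, 3)
lons = np.linspace(-120, -60, 4)
coords = [times, lats, lons]
dims = ['time', 'lat', 'lon']
temp = xr.DataArray(283 + 5 * rng.standard_normal((5, 3, 4)), coords=coords, dims=dims)
pressure = xr.DataArray(1000 + 5 * rng.standard_normal((5, 3, 4)), coords=coords, dims=dims)
ds = xr.Dataset(data_vars={'Temperature': temp, 'Pressure': pressure})

# One .sel() call subsets every data variable along the named coordinate
one_day = ds.sel(time='2018-01-02')
print(one_day)
```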

### NumPy-like selection

Suppose we want to extract all the spatial data for one single date: January 2, 2018. It's possible to achieve that with NumPy-like index selection:

In [None]:
indexed_selection = temp[1, :, :]  # Index 1 along axis 0 is the time slice we want...
indexed_selection

HOWEVER, notice that this requires us (the user / programmer) to have **detailed knowledge** of the order of the axes and the meaning of the indices along those axes!

_**Named coordinates free us from this burden...**_

### Selecting with `.sel()`

We can instead select data based on coordinate values using the `.sel()` method, which takes one or more coordinate names as keyword arguments:

In [None]:
named_selection = temp.sel(time='2018-01-02')
named_selection

We got the same result, but 
- we didn't have to know anything about how the array was created or stored
- our code is agnostic about how many dimensions we are dealing with
- the intended meaning of our code is much clearer!
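
These claims can be checked directly. The sketch below assumes the same temperature grid, rebuilt with seeded random values. It first confirms that the positional and label-based selections are identical. It then transposes the array: after the reorder, `[1, :, :]` picks a longitude instead of a date, while the `.sel()` call works unchanged.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the temperature grid (seeded random values)
rng = np.random.default_rng(0)
temp = xr.DataArray(
    283 + 5 * rng.standard_normal((5, 3, 4)),
    coords=[
        pd.date_range('2018-01-01', periods=5),
        np.linspace(25, 55, 3),
        np.linspace(-120, -60, 4),
    ],
    dims=['time', 'lat', 'lon'],
)

# Positional: relies on knowing that axis 0 is time and index 1 is 2018-01-02
indexed_selection = temp[1, :, :]
# Label-based: only the coordinate name and value are needed
named_selection = temp.sel(time='2018-01-02')
print(indexed_selection.identical(named_selection))

# Put the dimensions in a different order
reordered = temp.transpose('lon', 'lat', 'time')
print(reordered[1, :, :].dims)  # the positional index now selects along 'lon'
same_day = reordered.sel(time='2018-01-02').transpose('lat', 'lon')
print(same_day.identical(named_selection))
```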

### Approximate selection and interpolation

With time and space data, we frequently want to sample "near" the coordinate points in our dataset. Here are a few simple ways to achieve that.

#### Nearest-neighbor sampling

Suppose we want to sample the nearest datapoint within 2 days of date `2018-01-07`. Since the last day on our `time` axis is `2018-01-05`, this is well-posed.

`.sel` has the flexibility to perform nearest neighbor sampling, taking an optional tolerance:

In [None]:
temp.sel(time='2018-01-07', method='nearest', tolerance=timedelta(days=2))

where we see that `.sel` indeed pulled out the data for date `2018-01-05`.
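
The tolerance is a hard limit, not a hint. The sketch below uses the same rebuilt grid with an assumed seed. A 2-day tolerance reaches `2018-01-05`. A 1-day tolerance finds no time point close enough to `2018-01-07`, so `.sel()` raises a `KeyError` rather than returning the nearest value anyway.

```python
from datetime import timedelta

import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the temperature grid; the time axis runs from 2018-01-01 to 2018-01-05
rng = np.random.default_rng(0)
temp = xr.DataArray(
    283 + 5 * rng.standard_normal((5, 3, 4)),
    coords=[
        pd.date_range('2018-01-01', periods=5),
        np.linspace(25, 55, 3),
        np.linspace(-120, -60, 4),
    ],
    dims=['time', 'lat', 'lon'],
)

# Within tolerance: 2018-01-05 is exactly 2 days from the target, so it is returned
near = temp.sel(time='2018-01-07', method='nearest', tolerance=timedelta(days=2))
print(near['time'].values)

# Outside tolerance: no time point lies within 1 day of the target
try:
    temp.sel(time='2018-01-07', method='nearest', tolerance=timedelta(days=1))
except KeyError as err:
    print('KeyError:', err)
```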

#### Interpolation

Suppose we want to extract a timeseries for Boulder (40°N, 105°W). Since `lon=-105` is _not_ a point on our longitude axis, this requires interpolation between data points.

The `.interp()` method (see the docs [here](http://xarray.pydata.org/en/stable/interpolation.html)) works similarly to `.sel()`. Using `.interp()`, we can interpolate to any latitude/longitude location:

In [None]:
temp.interp(lon=-105, lat=40)

<div class="admonition alert alert-info">
    <p class="admonition-title" style="font-weight:bold">Info</p>
    Xarray's interpolation functionality requires the <a href="https://scipy.org/">SciPy</a> package!
</div>
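
`lat=40` is already a grid point, so the default linear interpolation only has to blend the two neighboring longitudes, -120 and -100. Because -105 sits three quarters of the way from -120 to -100, the weights are 0.25 and 0.75. The sketch below uses the same rebuilt grid with an assumed seed. It computes that blend by hand with `.sel()` and, only if SciPy is installed, compares the result with `.interp()`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the temperature grid (seeded random values)
rng = np.random.default_rng(0)
temp = xr.DataArray(
    283 + 5 * rng.standard_normal((5, 3, 4)),
    coords=[
        pd.date_range('2018-01-01', periods=5),
        np.linspace(25, 55, 3),     # lat: 25, 40, 55
        np.linspace(-120, -60, 4),  # lon: -120, -100, -80, -60
    ],
    dims=['time', 'lat', 'lon'],
)

# Linear weight for the right-hand neighbor: (-105 - (-120)) / (-100 - (-120)) = 0.75
w = (-105 - (-120)) / (-100 - (-120))
left = temp.sel(lat=40, lon=-120)
right = temp.sel(lat=40, lon=-100)
manual = (1 - w) * left.values + w * right.values  # one value per time step

try:
    import scipy  # noqa: F401  (.interp() needs SciPy)
    interpolated = temp.interp(lon=-105, lat=40)
    print(np.allclose(interpolated.values, manual))
except ImportError:
    print('SciPy is not installed; skipping the .interp() comparison')
```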

### Slicing along coordinates

Frequently we want to select a range (or _slice_) along one or more coordinate(s). We can achieve this by passing a Python [slice](https://docs.python.org/3/library/functions.html#slice) object to `.sel()`, as follows:

In [None]:
temp.sel(
    time=slice('2018-01-01', '2018-01-03'), lon=slice(-110, -70), lat=slice(25, 45)
)

<div class="admonition alert alert-info">
    <p class="admonition-title" style="font-weight:bold">Info</p>
    The calling sequence for <code>slice</code> always looks like <code>slice(start, stop[, step])</code>, where <code>step</code> is optional.
</div>

Notice how the length of each coordinate axis has changed due to our slicing.
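
To make the change in shape concrete, the sketch below prints the sizes before and after slicing, using the same rebuilt grid with an assumed seed. It also highlights a difference from ordinary Python slicing. Label-based slices in `.sel()` include both endpoints, while positional slices with `.isel()` exclude the stop index.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the temperature grid (seeded random values)
rng = np.random.default_rng(0)
temp = xr.DataArray(
    283 + 5 * rng.standard_normal((5, 3, 4)),
    coords=[
        pd.date_range('2018-01-01', periods=5),
        np.linspace(25, 55, 3),     # lat: 25, 40, 55
        np.linspace(-120, -60, 4),  # lon: -120, -100, -80, -60
    ],
    dims=['time', 'lat', 'lon'],
)

subset = temp.sel(
    time=slice('2018-01-01', '2018-01-03'), lon=slice(-110, -70), lat=slice(25, 45)
)
print(temp.sizes)    # full array
print(subset.sizes)  # after slicing

# Label-based slices include BOTH endpoints, unlike positional Python slices
print(temp.sel(lat=slice(25, 40)).lat.values)  # [25. 40.]
print(temp.isel(lat=slice(0, 1)).lat.values)   # [25.]
```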

### One more selection method: `.loc`

Label-based selection and slicing can also be done within square brackets on the `.loc` attribute of the `DataArray`:


In [None]:
temp.loc['2018-01-02']

This is sort of in between the NumPy-style selection
```
temp[1,:,:]
```
and the fully label-based selection using `.sel()`.
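
As a quick check of where `.loc` sits between the two styles, the sketch below compares `.loc` with both the positional and the `.sel()` selections of `2018-01-02`. It uses the same rebuilt grid with an assumed seed. With a single key, `.loc` applies that key to the first dimension, `time`, and keeps the remaining dimensions whole.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the temperature grid (seeded random values)
rng = np.random.default_rng(0)
temp = xr.DataArray(
    283 + 5 * rng.standard_normal((5, 3, 4)),
    coords=[
        pd.date_range('2018-01-01', periods=5),
        np.linspace(25, 55, 3),
        np.linspace(-120, -60, 4),
    ],
    dims=['time', 'lat', 'lon'],
)

# With a single key, .loc applies it to the first dimension ('time')
by_loc = temp.loc['2018-01-02']
by_sel = temp.sel(time='2018-01-02')
by_pos = temp[1, :, :]
print(by_loc.dims)
print(by_loc.identical(by_sel), by_loc.identical(by_pos))
```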

With `.loc`, we make use of the coordinate *values*, but lose the ability to specify the *names* of the various dimensions. Instead, the slicing must be done in the correct order:

In [None]:
temp.loc['2018-01-01':'2018-01-03', 25:45, -110:-70]

One advantage of using `.loc` is that we can use NumPy-style slice notation like `25:45`, rather than the more verbose `slice(25,45)`. But of course that also works:

In [None]:
temp.loc['2018-01-01':'2018-01-03', slice(25, 45), -110:-70]
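
The sketch below uses the same rebuilt grid with an assumed seed. It shows that the positional `.loc` slices select exactly the same subset as named `.sel()` slices. The named slices can be given in any keyword order because each one carries its dimension name.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the temperature grid (seeded random values)
rng = np.random.default_rng(0)
temp = xr.DataArray(
    283 + 5 * rng.standard_normal((5, 3, 4)),
    coords=[
        pd.date_range('2018-01-01', periods=5),
        np.linspace(25, 55, 3),
        np.linspace(-120, -60, 4),
    ],
    dims=['time', 'lat', 'lon'],
)

# .loc: slices are matched to dimensions by position (time, lat, lon)
by_loc = temp.loc['2018-01-01':'2018-01-03', 25:45, -110:-70]
# .sel: each slice names its dimension, so the keyword order is irrelevant
by_sel = temp.sel(
    lon=slice(-110, -70), lat=slice(25, 45), time=slice('2018-01-01', '2018-01-03')
)
print(by_loc.identical(by_sel))
```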

What *doesn't* work is passing the slices in a different order:

In [None]:
# This will generate an error
# temp.loc[-110:-70, 25:45,'2018-01-01':'2018-01-03']
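
To see the failure without leaving the line commented out, the sketch below wraps the mis-ordered call in `try`/`except`. It uses the same rebuilt grid with an assumed seed. Because `.loc` matches keys by position, the longitude slice lands on `time` and the date slice lands on `lon`. The exact exception type and message may depend on your pandas and xarray versions, so the sketch catches `Exception` broadly.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the temperature grid (seeded random values)
rng = np.random.default_rng(0)
temp = xr.DataArray(
    283 + 5 * rng.standard_normal((5, 3, 4)),
    coords=[
        pd.date_range('2018-01-01', periods=5),
        np.linspace(25, 55, 3),
        np.linspace(-120, -60, 4),
    ],
    dims=['time', 'lat', 'lon'],
)

failed = False
try:
    # Keys are matched by position: -110:-70 goes to 'time', 25:45 to 'lat',
    # and the date slice to 'lon', which cannot work
    temp.loc[-110:-70, 25:45, '2018-01-01':'2018-01-03']
except Exception as err:  # exact type may vary with pandas/xarray versions
    failed = True
    print(type(err).__name__, err)

# The same slices in the array's dimension order succeed
subset = temp.loc['2018-01-01':'2018-01-03', 25:45, -110:-70]
print(failed, dict(subset.sizes))
```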