# Book 2 of 4: Subselecting and Indexing Data


## Learning Objectives

- Understand the different selecting and indexing methods (`.isel`, `.sel`, `.where`, and `.groupby`) by applying them to an ocean heat content calculation, in which we limit our dataset by depth.

----------------

## Previously On:

 - We imported the Xarray library.
 - We opened a netCDF file containing our data and extracted an Xarray DataArray.

In [None]:
import xarray as xr

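# location of the CMIP6 ocean temperature (thetao) file used in this tutorial
path = '../../../data/'
file = path + 'thetao_Omon_historical_GISS-E2-1-G_r1i1p1f1_gn_185001-187012.nc'

# open the netCDF file as an xarray Dataset
ds = xr.open_dataset(file)

# extract the sea water potential temperature variable as a DataArray
da_thetao = ds['thetao']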

------------

## Selecting data by Index


We will start by looking at a single point. One way to select data is by its index.

In Python, the element at index `n` of an array `a` is selected with `a[n]`. Note that Python is 0-indexed, so `a[0]` returns the first element and `a[1]` returns the second. `a[-1]` returns the last element.

In [None]:
a = [0,1,2,3,4]
print(a)
print(a[0])
print(a[-1])

To select a range use `:`, as in `a[n:m]` where `n` is inclusive and `m` is exclusive. `a[1:-1]` returns all elements except the first and the last (the second through the second to last). Use `a[1:]` to return all elements except the first.

In [None]:
print(a[1:-1])
print(a[1:])

We can use the same method for a DataArray.

In [None]:
level_point = ds['lev'][0]
level_point
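
To see what positional indexing does on more than one dimension, here is a minimal sketch on a small, made-up DataArray (the values and coordinates are invented for illustration, not taken from the GISS file). The indices are applied in the order of the array's dimensions, here `(lev, lat)`.

```python
import numpy as np
import xarray as xr

# A small, made-up 2-D DataArray standing in for a (lev, lat) slice of thetao
da = xr.DataArray(
    np.arange(12).reshape(3, 4),
    dims=("lev", "lat"),
    coords={"lev": [5.0, 15.0, 25.0], "lat": [-45.0, -15.0, 15.0, 45.0]},
)

# Positional indexing follows the dimension order, here (lev, lat)
first_level = da[0]          # all latitudes at the first level
single_value = da[0, -1]     # first level, last latitude
middle = da[1:, 1:-1]        # second level onward, inner latitudes only

print(first_level.values)    # [0 1 2 3]
print(single_value.item())   # 3
print(middle.shape)          # (2, 2)
```

The drawback is that you must remember the dimension order, which is what `.isel` avoids.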

<h3 style="color:red"> Task 1 - Select a different point from the dataset ds </h3>

Now try isolating a point from any variable in the Dataset in the code cell block below:

In [None]:
# Your code here

In [None]:
# %load solutions/solution_2_1.py

-----------

## Selecting data using `.isel`

`.isel` stands for index select: like the indexing above it selects data by integer position, but it lets you name the dimension along which you are subselecting. Read more about indexing and selecting data from an Xarray Dataset [here](http://xarray.pydata.org/en/stable/indexing.html).

In [None]:
thetao_time0 = ds['thetao'].isel(time=0)
thetao_time0
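
Here is a small sketch, on a made-up `(time, lev)` array, of why naming the dimension helps: the keyword arguments to `.isel` can be given in any order, and they accept slices and lists of integers as well as single integers.

```python
import numpy as np
import xarray as xr

# Made-up (time, lev) DataArray; the real thetao also has lat and lon
da = xr.DataArray(np.arange(24).reshape(4, 6), dims=("time", "lev"))

# .isel names the dimension, so the order of the arguments does not matter
by_name = da.isel(lev=2, time=0)
by_position = da[0, 2]
print(by_name.item(), by_position.item())  # 2 2

# Slices and lists of integer positions also work
subset = da.isel(time=slice(0, 2), lev=[0, 5])
print(subset.shape)  # (2, 2)
```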

<h3 style="color:red"> Task 2 - Use `.isel` to select data from the dataset ds </h3>

Use `.isel`, specifying the dimension and the index, in the code cell block below:

In [None]:
# Your code here

In [None]:
# %load solutions/solution_2_2.py

--------

## Selecting data using `.sel`

`.sel` is similar to `.isel`, except that you specify coordinate values (labels) instead of integer indices. For example, you can select data for a specific date in `yyyy-mm-dd` format instead of having to figure out which index along the time dimension corresponds to that date.

By default `.sel` looks for an exact match. Use the `method` keyword (for example `method='nearest'`) to have `.sel` pick the closest value instead, and the `tolerance` keyword to set the maximum allowed distance between the requested value and the selected one.

To select a range of values, pass a `slice()` object. Unlike Python's positional slices, a label-based slice in `.sel` includes both endpoints. You can read more about `slice()` [here](https://docs.python.org/3/library/functions.html#slice).

In [None]:
# select data between 1860-01-01 and 1861-01-01 (label-based slices include both endpoints)
thetao_1860 = ds['thetao'].sel(time=slice('1860-01-01','1861-01-01'))
thetao_1860
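
The `method`, `tolerance`, and inclusive-slice behavior can be seen on a tiny made-up monthly series. This is a sketch under one stated assumption: it uses plain pandas datetimes stamped on the 15th of each month, whereas the real file may use a cftime calendar and different time stamps.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Made-up monthly series stamped on the 15th of each month (1860-1861).
# Assumption: plain pandas datetimes; the real file may use a cftime calendar.
times = pd.date_range("1860-01-01", periods=24, freq="MS") + pd.Timedelta(days=14)
da = xr.DataArray(np.arange(24.0), coords={"time": times}, dims="time")

# No time stamp equals 1860-03-10, so ask for the nearest one (1860-03-15)
nearest = da.sel(time="1860-03-10", method="nearest")
print(pd.Timestamp(nearest.time.values).date())  # 1860-03-15

# tolerance caps how far away the nearest match may be; 3 days is too strict here
try:
    da.sel(time="1860-03-10", method="nearest", tolerance=pd.Timedelta(days=3))
except KeyError:
    print("no time stamp within 3 days")

# Label-based slices include BOTH endpoints
first_quarter = da.sel(time=slice("1860-01-15", "1860-03-15"))
print(first_quarter.sizes["time"])  # 3
```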

<h3 style="color:red"> Task 3 - Use `.sel` to select data </h3>

Use `.sel` to specify a lat-lon point from `ds['thetao']` in the code cell block below:

In [None]:
# Your code here

In [None]:
# %load solutions/solution_2_3.py

---------

## Selecting data using `.where`

`xarray.DataArray.where` takes arguments as follows:

```python
# <NA> is a placeholder for the default: points that fail `cond` become NaN
da_filtered = da.where(cond, other=<NA>, drop=False)
```

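The `other` and `drop` keywords control what happens to the data points that do not meet the condition: `other` sets the value they are replaced with (NaN by default), while `drop=True` removes the coordinate labels along which the condition is False everywhere. The two keywords cannot be combined.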

So if you wanted to limit `lev_bnds` to the top 50 meters of the ocean you would type:

In [None]:
level_bounds_limited = ds['lev_bnds'].where(ds['lev_bnds'] < 50, drop = True)
level_bounds_limited.values
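
To see exactly what `drop=True` keeps, here is a sketch on made-up level bounds (in meters, one `[top, bottom]` pair per level). A level is only dropped when *all* of its values fail the condition, so the layer that straddles the limit survives with a NaN bottom bound.

```python
import numpy as np
import xarray as xr

# Made-up level bounds in meters, shape (lev, bnds): [top, bottom] of each layer
lev_bnds = xr.DataArray(
    [[0.0, 10.0], [10.0, 30.0], [30.0, 60.0], [60.0, 100.0]],
    dims=("lev", "bnds"),
)

limited = lev_bnds.where(lev_bnds < 50, drop=True)
# The 60-100 m layer fails everywhere and is dropped; the 30-60 m layer is
# kept, but its 60 m bottom bound is replaced by NaN
print(limited.values)
```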

<h3 style="color:red"> Task 4 - Select a data using `.where` </h3>

Limit `lev_bnds` to the top 100 meters in the code cell block below:

In [None]:
# Your code here

In [None]:
# %load solutions/solution_2_4.py

###  Using the `other` keyword in `.where`
This is close, but we want that last bound to equal our limit, not `NaN`. So instead of the `drop` keyword we will use the `other` keyword, which replaces every bound that does not meet the condition with a value we choose.

In [None]:
level_bounds_limited = ds['lev_bnds'].where(ds['lev_bnds'] < 50, other = 50)
level_bounds_limited.values

We don't want to carry around all these extra levels! Every level that lies entirely below the limit now has both bounds equal to 50, so its thickness (the difference between its bottom and top bounds) is 0. We will drop those levels.

To compute the thickness we go back to our original indexing method, this time on a 2-dimensional array.

*Remember, Python is zero-indexed:*
- `a[:,0]` means values from all (`:`) rows in the first column (index `0`).
- `a[:,1]` means values from all rows in the second column (index `1`).
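
The same column selection works on a plain NumPy array. In this sketch the bounds are made up to mimic what `other=50` produces, including a final level clipped to `[50, 50]`.

```python
import numpy as np

# Each row is one level: [top, bottom] bounds in meters (made-up values)
bounds = np.array([[0.0, 10.0], [10.0, 30.0], [30.0, 50.0], [50.0, 50.0]])

tops = bounds[:, 0]     # first column:  [ 0. 10. 30. 50.]
bottoms = bounds[:, 1]  # second column: [10. 30. 50. 50.]
print(bottoms - tops)   # [10. 20. 20.  0.]
```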

In [None]:
delta_level = abs(level_bounds_limited[:, 1] - level_bounds_limited[:, 0])
delta_level.values

Then we drop every level whose thickness in `delta_level` is zero.

In [None]:
delta_level_limited = delta_level.where(delta_level != 0, drop = True)
delta_level_limited
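
Putting the three steps together on made-up bounds makes the result easy to check by hand: with a 50 m limit, the remaining thicknesses should add up to exactly 50 m. The values and `lev` coordinates below are invented for illustration.

```python
import numpy as np
import xarray as xr

# Made-up bounds (m) for 5 levels; the 50 m limit falls inside the third layer
lev_bnds = xr.DataArray(
    [[0.0, 10.0], [10.0, 30.0], [30.0, 60.0], [60.0, 100.0], [100.0, 150.0]],
    dims=("lev", "bnds"),
    coords={"lev": [5.0, 20.0, 45.0, 80.0, 125.0]},
)
limit = 50.0

# 1. Clip every bound at the limit instead of turning it into NaN
clipped = lev_bnds.where(lev_bnds < limit, other=limit)
# 2. Layer thickness = |bottom - top|; layers entirely below the limit get 0
thickness = abs(clipped[:, 1] - clipped[:, 0])
# 3. Drop the zero-thickness layers
thickness = thickness.where(thickness != 0, drop=True)

print(thickness.values)          # [10. 20. 20.]
print(thickness["lev"].values)   # [ 5. 20. 45.]
print(float(thickness.sum()))    # 50.0
```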

<h3 style="color:red"> Task 5 - Limit temperature values to depth </h3>

We want to limit our temperature values by the same conditions. Do so in the code cell block below:

In [None]:
# Your code here

Here is my solution:

In [None]:
# %load solutions/solution_2_5.py

-----------

## Writing functions

You may want to write functions. Giving a name to a task you repeat is more readable than copying the same block of code over and over. Functions save you time: it will be easier to understand what you did when you revisit your code, easier to explain it to someone else, and you won't have to rewrite the same code repeatedly.

In python the general format of a function looks as follows:

```python
def add_2_numbers(number_1, number_2):
    sum_of_2_numbers = number_1 + number_2
    return sum_of_2_numbers
```

Here, `add_2_numbers` is the function name (it is good practice to give your function an action name that describes what it **does**), `number_1` and `number_2` are the inputs, and `sum_of_2_numbers` is the output.

Calling this function looks like this:

```python
number_3 = add_2_numbers(7, 8)
```

<h3 style="color:red"> Task 6 - Write a function to limit data by depth </h3>

Let's turn this workflow into a function that works for any depth limit. It should replace level-bound depths greater than or equal to the limit with the limit value, find the thickness (delta) of each level, and limit the level deltas and temperature values to the depth limit. Do this in the code cell block below:

In [None]:
# Your code here

Mine looks like:

In [None]:
# %load solutions/solution_2_6.py
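
For reference, here is one possible sketch of such a function. It is not necessarily what the solution file contains: the name `limit_to_depth` is hypothetical, and it assumes `lev_bnds` has dimensions `("lev", "bnds")` holding `[top, bottom]` in meters and that the temperature array shares the `lev` coordinate. The usage example runs it on made-up data.

```python
import numpy as np
import xarray as xr


def limit_to_depth(lev_bnds, thetao, depth_limit):
    """Return layer thicknesses and temperatures for the layers above depth_limit.

    Hypothetical helper: assumes lev_bnds has dims ("lev", "bnds") holding
    [top, bottom] in meters, and that thetao shares the "lev" coordinate.
    """
    # Clip bounds at the limit so the straddling layer ends exactly at depth_limit
    clipped = lev_bnds.where(lev_bnds < depth_limit, other=depth_limit)
    # Thickness of each layer; layers entirely below the limit come out as 0
    thickness = abs(clipped.isel(bnds=1) - clipped.isel(bnds=0))
    thickness = thickness.where(thickness != 0, drop=True)
    # Keep only the temperature levels that still have a nonzero thickness
    thetao_limited = thetao.sel(lev=thickness["lev"])
    return thickness, thetao_limited


# Usage on made-up data: four levels, 50 m limit
lev = [5.0, 20.0, 45.0, 80.0]
lev_bnds = xr.DataArray(
    [[0.0, 10.0], [10.0, 30.0], [30.0, 60.0], [60.0, 100.0]],
    dims=("lev", "bnds"),
    coords={"lev": lev},
)
thetao = xr.DataArray(np.array([15.0, 12.0, 8.0, 4.0]), dims="lev", coords={"lev": lev})

thickness, thetao_top = limit_to_depth(lev_bnds, thetao, 50.0)
print(thickness.values)   # [10. 20. 20.]
print(thetao_top.values)  # [15. 12.  8.]
```

Using `.isel(bnds=...)` instead of `[:, 1]` keeps the function working even if the bounds array has its dimensions in a different order.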

-----------

## Additional Method: `.groupby`

You may not need `.groupby` in this specific workflow, but it is a common tool to select data so we will explain it briefly here.

`.groupby` allows you to split your data into distinct groups, apply some functionality to each group, and recombine your data into one object.

Below is an example of computing monthly climatological means (each calendar month averaged across all years) for all data variables in our Dataset.

In [None]:
ds_monthly_mean = ds.groupby('time.month').mean()
ds_monthly_mean
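
The split-apply-combine steps are easier to follow on a short made-up series where the answer can be checked by hand: two years of monthly values, so each calendar month's mean is the average of two numbers.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of made-up monthly values: 0-11 in the first year, 12-23 in the second
times = pd.date_range("2000-01-01", periods=24, freq="MS")
da = xr.DataArray(np.arange(24.0), coords={"time": times}, dims="time")

# Split by calendar month, average each group, recombine along a new "month" dim
monthly_mean = da.groupby("time.month").mean()
print(monthly_mean["month"].values)  # [ 1  2  3  4  5  6  7  8  9 10 11 12]
print(monthly_mean.values)           # [ 6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17.]
```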

-----------

## Going Further


- [More in-depth xarray tutorial](../../bytopic/xarray/01_getting_started_with_xarray.ipynb)
- [Xarray's Indexing and Selecting Data Documentation](http://xarray.pydata.org/en/stable/indexing.html)


<div class="alert alert-block alert-success">
  <p>Previous: <a href="01_modules_and_xarray_datasets.ipynb">Modules and Xarray Datasets</a></p>
  <p>Next: <a href="03_units.ipynb">Working with Units</a></p>
</div>