# Book 2 of 4: Subselecting and Indexing Data

### Demonstrating Python Tools through the Calculation of Ocean Heat Content



## Learning Objectives

- To understand different selecting and indexing methods (`.isel`, `.sel`, `.where`, and `.groupby`) by applying them to an ocean heat content calculation, where we limit our dataset by depth.

----------------

Previously On:
 - We imported the Xarray module and loaded our data.

In [4]:
import xarray as xr

path = '../../../data/'
file = path + 'thetao_Omon_historical_GISS-E2-1-G_r1i1p1f1_gn_185001-187012.nc'

ds = xr.open_dataset(file, chunks = {'time': 16})

da_thetao = ds['thetao']

------------

## 1 -- Selecting data by Index
We will look at this for only one point. One way we can select data is by index.

In Python the `n`th index of an array, `a`, is specified by `a[n]`. Note that Python is 0-indexed, so `a[0]` returns the first element, and `a[1]` returns the second. `a[-1]` returns the last element.

In [37]:
a = [0,1,2,3,4]
print(a)
print(a[0])
print(a[-1])

To select a range use `:`, as in `a[n:m]` where `n` is inclusive and `m` is exclusive. `a[1:-1]` returns all elements except the first and the last (the second through the second to last). Use `a[1:]` to return all elements except the first.

In [38]:
print(a[1:-1])
print(a[1:])

We can use the same method for a DataArray.

In [39]:
level_point = ds['lev'][0]
level_point

# <span style="color:red"> Task 1 - Select a different point from the dataset ds </span>

Now you try isolating a point from any variable in the DataSet in the code cell block below:

In [None]:
# Your code here

-----------

## 2 -- Selecting data using `.isel`

`isel` refers to index-select and allows you to name the dimension in which you are subselecting. Read more about indexing and selecting data from an Xarray DataSet [here](http://xarray.pydata.org/en/stable/indexing.html).

In [40]:
heat_time0 = ds['thetao'].isel(time=0)
heat_time0
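
If you do not have the data file at hand, the behavior of `.isel` can be sketched on a small synthetic array. The array below and its values are hypothetical stand-ins for `ds['thetao']`, not the real data; only the `time` and `lev` dimension names are borrowed from the notebook.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for ds['thetao']: 3 time steps, 4 depth levels.
temperature = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=("time", "lev"),
    coords={"time": [0, 1, 2], "lev": [5.0, 15.0, 25.0, 35.0]},
    name="thetao",
)

# A single index removes the selected dimension from the result.
first_time = temperature.isel(time=0)

# A slice of indices keeps the dimension; the end index is exclusive.
top_two_levels = temperature.isel(lev=slice(0, 2))

print(first_time.dims)       # ('lev',)
print(top_two_levels.shape)  # (3, 2)
```

Naming the dimension means the call still works if the array's dimensions are stored in a different order.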

# <span style="color:red"> Task 2 - Use `.isel` to select data from the dataset ds </span>

Use `.isel`, specifying the dimension and the index, in the code cell block below:

In [None]:
# Your code here

--------

## 3 -- Selecting data using `.sel`

`.sel` is similar to `.isel`, except that you specify the coordinate value instead of the index. For example, you can select a date in yyyy-mm-dd format directly, instead of having to figure out which index along the time dimension corresponds to it. By default `.sel` looks for an exact match; pass `method='nearest'` if you want Xarray to find the data point closest to your specified value.

To specify multiple values use `slice`.

In [41]:
heat_1860 = ds['thetao'].sel(time=slice('1860-01-01','1861-01-01'))
heat_1860
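
As a minimal sketch of how `.sel` treats slices and nearest-neighbor lookups, the example below uses a hypothetical monthly time axis rather than the real dataset. The variable names and values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly series: 24 months starting January 1860.
time = pd.date_range("1860-01-01", periods=24, freq="MS")
temperature = xr.DataArray(np.arange(24.0), dims="time", coords={"time": time})

# Label-based slices include BOTH endpoints, unlike positional slices.
year_1860 = temperature.sel(time=slice("1860-01-01", "1860-12-01"))

# method="nearest" returns the data point closest to the requested label.
closest = temperature.sel(time="1860-03-10", method="nearest")

print(year_1860.sizes["time"])  # 12
print(closest["time"].values)   # the 1860-03-01 time stamp
```

Note that the slice in the cell above, which ends at `'1861-01-01'`, would therefore also include January 1861 if that time stamp exists in the data.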

# <span style="color:red"> Task 3 - Use `.sel` to select data </span>

Use `.sel` to specify a lat-lon point from `ds['thetao']` in the code cell block below:

In [None]:
# Your code here

---------

## 4 -- Selecting data using `.where`

`xarray.DataArray.where` takes arguments as follows:

```python
da_filtered_2_conditions = da.where(condition, other = <NA>, drop = False)
```

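The `other` and `drop` keywords specify what you want to do with the data points that do not meet the condition: `other` sets the value used to replace them (`NaN` by default), and `drop = True` removes coordinate labels along which no data point meets the condition.

So if you wanted to limit `lev_bnds` to the top 50 meters of ocean depth you would type: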

In [43]:
level_bounds_limited = ds['lev_bnds'].where(ds['lev_bnds'] < 50, drop = True)
level_bounds_limited.values
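
To make the difference between the default masking and `drop = True` concrete, here is a small sketch on hypothetical depth bounds (four levels, each 20 m thick). The numbers are invented for illustration and are not the model's actual level bounds.

```python
import numpy as np
import xarray as xr

# Hypothetical depth bounds in meters: each row is one level (top, bottom).
level_bounds = xr.DataArray(
    np.array([[0.0, 20.0], [20.0, 40.0], [40.0, 60.0], [60.0, 80.0]]),
    dims=("lev", "bnds"),
    coords={"lev": [10.0, 30.0, 50.0, 70.0]},
)

# Default: the shape is unchanged and failing points become NaN.
masked = level_bounds.where(level_bounds < 50)

# drop=True: rows or columns where nothing meets the condition are removed.
dropped = level_bounds.where(level_bounds < 50, drop=True)

print(masked.shape)    # (4, 2)
print(dropped.shape)   # (3, 2)
print(dropped.values)  # the 40-60 m level keeps its top bound; its bottom is NaN
```

The level spanning 40 to 60 m survives because its top bound meets the condition, which is why one `NaN` remains after dropping.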

# <span style="color:red"> Task 4 - Select data using `.where` </span>

Limit depth to the top 100 meters in the code cell block below:

In [None]:
# Your code here

###  - Using the `other` keyword in `.where`
Dropping gets us close, but we want the last bound to equal our limit rather than `NaN`. So we will use the `other` keyword instead of the `drop` keyword. This lets us replace every bound that does not meet the condition with a specific value.

In [44]:
level_bounds_limited = ds['lev_bnds'].where(ds['lev_bnds'] < 50, other = 50)
level_bounds_limited.values

We don't want to carry around all these extra indices! We will now drop every level where the difference between the top and bottom of the level is 0 (a step size of 0).

Here we go back to our original indexing method, this time for a 2-dimensional array.

*Remember, Python is zero-indexed* 
- `a[:,0]` means values from all (`:`) rows in the zeroth (`0`) column. 
- `a[:,1]` means values from all rows in the second (`1`) column.
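
The column indexing described above can be tried on a plain NumPy array first. The bounds below are hypothetical values chosen so the result is easy to check by eye.

```python
import numpy as np

# Hypothetical level bounds: each row is (top, bottom) in meters.
bounds = np.array([[0.0, 10.0], [10.0, 20.0], [20.0, 30.0]])

tops = bounds[:, 0]     # all rows, zeroth column
bottoms = bounds[:, 1]  # all rows, second column

thickness = bottoms - tops
print(thickness)  # [10. 10. 10.]
```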

In [45]:
delta_level = abs(level_bounds_limited[:, 1] - level_bounds_limited[:, 0])
delta_level.values

Then we limit our delta_level values to drop all values of zero.

In [46]:
delta_level_limited = delta_level.where(delta_level != 0, drop = True)
delta_level_limited
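
The three steps above (clip the bounds with `other`, take the difference, drop the zeros) can be checked on hypothetical bounds where the answer is known in advance. The array below is invented for illustration; with a 50 m limit, the remaining thicknesses should add up to 50.

```python
import numpy as np
import xarray as xr

# Hypothetical depth bounds in meters: four levels, each 20 m thick.
level_bounds = xr.DataArray(
    np.array([[0.0, 20.0], [20.0, 40.0], [40.0, 60.0], [60.0, 80.0]]),
    dims=("lev", "bnds"),
    coords={"lev": [10.0, 30.0, 50.0, 70.0]},
)

# Step 1: replace every bound at or below the limit with the limit itself.
clipped = level_bounds.where(level_bounds < 50, other=50)

# Step 2: thickness of each level after clipping.
thickness = abs(clipped[:, 1] - clipped[:, 0])

# Step 3: drop levels that now have zero thickness.
kept = thickness.where(thickness != 0, drop=True)

print(kept.values)        # [20. 20. 10.]
print(float(kept.sum()))  # 50.0
```

Checking that the thicknesses sum to the depth limit is a quick way to confirm that the clipping behaved as intended.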

# <span style="color:red"> Task 5 - Limit temperature values to depth </span>

We want to limit our temperature values by the same conditions. Do so in the code cell block below:

In [None]:
# Your code here

Here is my solution:

In [47]:
temperature_limited = ds['thetao'].where(delta_level != 0, drop = True)
temperature_limited

-----------

## 5 -- Writing functions

You may want to write functions. Naming tasks that you repeat is more readable than using the same block of code repeatedly. Functions save you time! It will be easier for you to understand what you did when you revisit code, it will be easier to explain what you did to someone else, and you won't have to rewrite the same code repeatedly.

In Python the general format of a function looks as follows:

```python
def add_2_numbers(number_1, number_2):
    sum_of_2_numbers = number_1 + number_2
    return sum_of_2_numbers
```

Where `add_2_numbers` is the function name (it is good practice to give your function an action name that describes what it **does**), `number_1` and `number_2` are inputs, and `sum_of_2_numbers` is your output.

Calling this function looks like:

```python
number_3 = add_2_numbers(7, 8)
```

# <span style="color:red"> Task 6 - Write a function to limit data by depth </span>

Let's turn this workflow into a function that repeats it for any depth limit: replace depth values beyond the limit with the limit value, find the delta values for each level, and limit the level deltas and temperature values to the depth limit. Do this in the code cell block below:

In [None]:
# Your code here

Mine looks like:

In [48]:
def limit_depth_of_variables(level_bounds, temperature, depth_limit): 
    level_bounds_limited = level_bounds.where(level_bounds < depth_limit, depth_limit)
    delta_level = abs(level_bounds_limited[:, 1] - level_bounds_limited[:, 0])
    
    delta_level_limited = delta_level.where(delta_level != 0, drop = True)
    temperature_limited = temperature.where(delta_level != 0, drop = True)
    
    return delta_level_limited, temperature_limited

In [51]:
delta_level_limited, temperature_limited = limit_depth_of_variables(ds['lev_bnds'], ds['thetao'], 50)
delta_level_limited, temperature_limited

-----------

## 6 -- Additional Method: `.groupby`

You may not need `.groupby` in this specific workflow, but it is a common tool to select data so we will explain it briefly here.

`.groupby` allows you to split your data into distinct groups, apply some functionality to each group, and recombine your data into one object.

Below is an example of computing standard monthly averages (across all years) for all data variables in our DataSet.

In [62]:
ds_monthly_mean = ds.groupby('time.month').mean()
ds_monthly_mean
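
To see the split, apply, and combine steps on numbers that are easy to verify, here is a sketch using a hypothetical two-year monthly series instead of the real dataset. The values are invented: the first year runs from 1 to 12 and the second from 3 to 14, so each monthly mean should be the month number plus one.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two hypothetical years of monthly values.
time = pd.date_range("1860-01-01", periods=24, freq="MS")
values = np.concatenate([np.arange(1.0, 13.0), np.arange(3.0, 15.0)])
series = xr.DataArray(values, dims="time", coords={"time": time})

# Split by calendar month, average each group, recombine along 'month'.
monthly_mean = series.groupby("time.month").mean()

print(monthly_mean.dims)    # ('month',)
print(monthly_mean.values)  # 2.0 for January up to 13.0 for December
```

The `time` dimension is replaced by a new `month` dimension with one entry per calendar month.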

-----------

## Going further:
- Xarray's Indexing and Selecting Data Documentation: http://xarray.pydata.org/en/stable/indexing.html

<div class="alert alert-block alert-success">
  <p>Previous: <a href="01_modules_and_xarray_datasets.ipynb">Modules and Xarray Datasets</a></p>
  <p>Next: <a href="03_selecting_and_indexing_data.ipynb">Selecting and Indexing Data</a></p>
</div>