# Book 2 of 4: Subselecting and Indexing Data

### Demonstrating Python Tools through the Calculation of Ocean Heat Content



## Learning Objectives

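- Select data from Xarray DataArrays by position, with square brackets and `.isel`, and by label, with `.sel`.
- Filter data with `.where`, using its `other` and `drop` keywords.
- Wrap a selection workflow in a reusable Python function.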

----------------

Previously On:
 - We imported the Xarray module and loaded our data.

In [None]:
import xarray as xr

path = '../../../data/'
file = path + 'thetao_Omon_historical_GISS-E2-1-G_r1i1p1f1_gn_185001-187012.nc'

ds = xr.open_dataset(file, chunks = {'time': 16})

da_thetao = ds['thetao']

------------

### 1 -- Selecting data by Index
We start with a single point. Because `ds.lev` is a DataArray (`ds` itself is a Dataset), we can select data from it by position, using an integer index in square brackets:

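Square-bracket indexing on a DataArray works by position, as it does for NumPy arrays: `ds.lev[0]` is the first element along the first dimension, `ds.lev[-1]` is the last, and `ds.lev[2:5]` is a slice of positions 2 through 4. For a multi-dimensional DataArray, the indices follow the dimension order listed in `.dims`. The result is still a DataArray, so it keeps its coordinates and attributes.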

In [None]:
level_point = ds.lev[0]
level_point
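The same square-bracket rules apply to any DataArray. Here is a minimal sketch on a small synthetic `lev` array (the level depths are made up, not read from the GISS file) showing a first element, a negative index, and a slice:

```python
import xarray as xr

# Hypothetical level depths in meters (synthetic, not from the GISS file)
depths = [5.0, 16.0, 29.0, 44.0]
lev = xr.DataArray(depths, dims="lev", coords={"lev": depths})

first = lev[0]      # position 0: a 0-d DataArray
last = lev[-1]      # negative positions count back from the end
middle = lev[1:3]   # positions 1 and 2; the end position is excluded

print(float(first), float(last), middle.values)
```

Because each result is still a DataArray, `middle.lev` still carries the matching coordinate labels.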

# <span style="color:red"> Task 1 - Select a different point from the dataset ds </span>

Now try isolating a point from any variable in the Dataset in the code cell below:

In [None]:
# Your code here

### 2 -- Selecting data using `.isel`

Let's select one point in time using Xarray's `isel` method. `isel` stands for index-select: you name the dimension you are subselecting along and give the integer position(s) to keep. Read more about indexing and selecting data from an Xarray Dataset [here](http://xarray.pydata.org/en/stable/indexing.html).

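An integer position removes that dimension from the result, while a slice (for example `time=slice(0, 12)`) or a list of positions (for example `lev=[0, 2]`) keeps it. You can index several dimensions at once, for example `da_thetao.isel(time=0, lev=0)`.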

In [None]:
# Select the first time step by position; the time dimension is dropped
thetao_time0 = da_thetao.isel(time=0)
thetao_time0
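A sketch of the three kinds of arguments `isel` accepts, run on a synthetic `(time, lev)` array standing in for `da_thetao` (all dates and values are made up):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic (time, lev) array; the value at [t, l] is 4*t + l
time = pd.date_range("1850-01-01", periods=12, freq="MS")
da = xr.DataArray(
    np.arange(48.0).reshape(12, 4),
    dims=("time", "lev"),
    coords={"time": time, "lev": [5.0, 16.0, 29.0, 44.0]},
)

one_time = da.isel(time=0)                                # integer: drops time
first_half_surface = da.isel(time=slice(0, 6), lev=0)     # slice keeps time, integer drops lev
some_levels = da.isel(lev=[0, 2])                         # list keeps lev, now length 2

print(one_time.dims, first_half_surface.dims, some_levels.sizes["lev"])
```

Positions are zero-based and slice ends are excluded, exactly as in plain Python.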

# <span style="color:red"> Task 2 - Use `.isel` to select data </span>

... in the code cell block below:

In [None]:
# Your code here

### 3 -- Selecting data using `.sel`

In [None]:
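# Label-based slice on time; squeeze('time') then drops the time dimension, which must have length 1
thetao_Jan_1860 = da_thetao.sel(time=slice('1860-01-01', '1860-02-01')).squeeze('time')
thetao_Jan_1860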
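`.sel` works like `.isel` but selects by coordinate label (dates, depths) instead of by position. The sketch below, on a made-up `(time, lev)` array, repeats the date-slice-and-squeeze pattern and also uses `method="nearest"` to pick the level closest to a depth that is not an exact label:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly (time, lev) array from Nov 1859 to Apr 1860; values are made up
time = pd.date_range("1859-11-01", periods=6, freq="MS")
da = xr.DataArray(
    np.arange(24.0).reshape(6, 4),
    dims=("time", "lev"),
    coords={"time": time, "lev": [5.0, 16.0, 29.0, 44.0]},
)

# Label slice (both ends inclusive) that contains one time step, then squeeze it away
jan_1860 = da.sel(time=slice("1860-01-01", "1860-01-31")).squeeze("time")

# Choose the level whose label is closest to 30 m instead of requiring an exact match
near_30m = jan_1860.sel(lev=30.0, method="nearest")

print(jan_1860.dims, float(near_30m.lev), float(near_30m))
```

Unlike positional slices, label slices in `.sel` include both endpoints.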

# <span style="color:red"> Task 3 - Use `.sel` to select data </span>

... in the code cell block below:

In [None]:
# Your code here

### 4 -- Selecting data using `.where`

Now that we have our data and have made sure the variables are in the correct units, let's use `xarray.DataArray.where` to filter our data to the depths over which we want to integrate ocean heat content. It takes the following arguments:

```python
# Signature sketch, not runnable as written: `<NA>` means points failing `cond` become NaN by default
da_filtered = da.where(cond, other=<NA>, drop=False)
```

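The `other` and `drop` keywords control what happens to the data points that do not meet the condition: by default they become NaN, `other` replaces them with a value of your choice instead, and `drop=True` removes the coordinate labels along which the condition is false everywhere.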
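A small made-up 1-D example comparing the three behaviors side by side: the default (NaN), a replacement value via `other`, and removal via `drop=True`:

```python
import numpy as np
import xarray as xr

# Hypothetical depths in meters
depth = xr.DataArray([1.0, 5.0, 10.0, 20.0], dims="lev")
cond = depth < 8

masked = depth.where(cond)               # failing points become NaN (the default)
filled = depth.where(cond, other=8.0)    # failing points become 8.0
dropped = depth.where(cond, drop=True)   # failing points are removed entirely

print(masked.values, filled.values, dropped.values)
```

Only `dropped` changes length; `masked` and `filled` keep all four points.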

So to limit `level_bounds_in_m` to the top 50 meters of the ocean, you would type:

In [None]:
level_bounds_limited = level_bounds_in_m.where(level_bounds_in_m < 50, drop = True)
level_bounds_limited.values

# <span style="color:red"> Task 4 - Select data using `.where` </span>

Limit depth to the top 100 meters in the code cell block below:

In [None]:
# Your code here

### 5 -- Using the `other` keyword in `.where`

This is close, but we want the last bound to be 50, not `nan`. So instead of `drop`, we use the `other` keyword, which replaces every bound that does not meet the condition with the value 50.

In [None]:
level_bounds_limited = level_bounds_in_m.where(level_bounds_in_m < 50, other = 50)
level_bounds_limited.values

But we don't want to carry around all these extra levels! Every level deeper than 50 m now has both bounds equal to 50, so its thickness is 0. We will drop every level where the difference between its top and bottom bound is 0.

To perform the depth integration, we need the step size: the thickness of each level, i.e. the distance between the two values in each level bounds pair.

Python is zero-indexed, so `a[:, 0]` means the values from all (`:`) rows in column `0` (the first, leftmost column), and `a[:, 1]` means the values from all rows in column `1` (the second, rightmost column). For the level bounds, one column holds the upper bound of each level and the other holds the lower bound.
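A sketch of the thickness calculation on hypothetical level bounds with dimensions `("lev", "bnds")`. The `bnds` name follows the usual CMIP convention but is an assumption here. The sketch also shows the named-dimension alternative `isel(bnds=...)`, which does not depend on the bounds being the second axis:

```python
import xarray as xr

# Hypothetical level bounds in meters, shape (lev, bnds)
bounds = xr.DataArray(
    [[0.0, 10.0], [10.0, 30.0], [30.0, 60.0], [60.0, 100.0]],
    dims=("lev", "bnds"),
)

# Cap every bound at 50 m; the deepest level becomes [50, 50]
capped = bounds.where(bounds < 50, other=50)

# Positional columns: all rows (:) of column 1 minus all rows of column 0
delta = abs(capped[:, 1] - capped[:, 0])

# Same thickness selected by dimension name instead of axis position
delta_by_name = abs(capped.isel(bnds=1) - capped.isel(bnds=0))

print(delta.values)
```

The last thickness is 0 because that level lies entirely below the 50 m cap.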

In [None]:
delta_level = abs(level_bounds_limited[:, 1] - level_bounds_limited[:, 0])
delta_level.values

Then we limit `delta_level` by dropping every level whose thickness is zero.

In [None]:
delta_level_limited = delta_level.where(delta_level != 0, drop = True)
delta_level_limited

# <span style="color:red"> Task 5 - ... </span>

... in the code cell block below:
We want to limit our temperature values by the same condition, dropping the levels with zero thickness.

In [None]:
# Your code here

Here is my solution:

In [None]:
temperature_limited = temperature_in_degK.where(delta_level != 0, drop = True)
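Why does a condition computed from `delta_level`, which only has a `lev` dimension, work on the multi-dimensional temperature? Xarray broadcasts the condition across the other dimensions by name, and `drop=True` removes each `lev` label where the condition is false at every point. A sketch with made-up data, assuming both arrays share the same `lev` coordinate values:

```python
import numpy as np
import xarray as xr

levels = [5.0, 20.0, 45.0, 80.0]  # hypothetical level centers in meters

# Thickness per level; the deepest level has zero thickness
delta_level = xr.DataArray([10.0, 20.0, 20.0, 0.0], dims="lev", coords={"lev": levels})

# Made-up temperatures in kelvin with an extra time dimension
temperature = xr.DataArray(
    np.full((2, 4), 285.0), dims=("time", "lev"), coords={"lev": levels}
)

# The 1-D condition is broadcast over time; lev labels failing everywhere are dropped
temperature_limited = temperature.where(delta_level != 0, drop=True)

print(temperature_limited.sizes)
```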

-----------

### 6 -- Writing functions

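**Review of functions**

A Python function is defined with `def`, a name, and a parenthesized list of parameters, followed by an indented body. A `return` statement sends values back to the caller; returning several comma-separated values returns them as a single tuple, which the caller can unpack into separate variables.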
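A minimal example with a hypothetical helper, `thickness_and_midpoint`, that returns two values which the caller unpacks:

```python
def thickness_and_midpoint(upper, lower):
    """Return the thickness and midpoint depth of one level (hypothetical helper)."""
    thickness = abs(lower - upper)
    midpoint = (upper + lower) / 2
    return thickness, midpoint  # the two values come back as one tuple


# Tuple unpacking assigns each returned value to its own name
dz, z_mid = thickness_and_midpoint(30.0, 60.0)
print(dz, z_mid)  # prints: 30.0 45.0
```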

# <span style="color:red"> Task 6 - ... </span>

Let's turn this into a function that repeats the workflow for any depth limit: it replaces bound values deeper than the limit with the limit value, finds the thickness of each level, and limits the level thicknesses and the temperature to the levels above the depth limit.
...in the code cell block below:

In [None]:
# Your code here

Mine looks like:

In [None]:
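def limit_depth_of_variables(level_bounds, temperature, depth_limit):
    """Limit level thicknesses and temperatures to the ocean above `depth_limit`.

    Levels lying entirely below the limit get zero thickness and are dropped.
    """
    # Cap every bound at the depth limit (the second positional argument is `other`)
    level_bounds_limited = level_bounds.where(level_bounds < depth_limit, depth_limit)
    # Thickness of each level: distance between its two bounds
    delta_level = abs(level_bounds_limited[:, 1] - level_bounds_limited[:, 0])

    # Keep only levels with non-zero thickness, in both variables
    delta_level_limited = delta_level.where(delta_level != 0, drop=True)
    temperature_limited = temperature.where(delta_level != 0, drop=True)

    return delta_level_limited, temperature_limited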

In [None]:
delta_level_limited, temperature_limited = limit_depth_of_variables(level_bounds_in_m, temperature_in_degK, 50)
delta_level_limited, temperature_limited
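With the limited thicknesses and temperatures in hand, the depth integral can be approximated level by level as a sum of temperature times thickness. This rectangle-rule approximation is an assumption here; later notebooks may integrate differently. A sketch with made-up stand-ins for the function's outputs, computed both with plain arithmetic and with Xarray's `weighted` interface:

```python
import xarray as xr

# Synthetic stand-ins for the function's outputs (values are made up)
delta_level_limited = xr.DataArray([10.0, 20.0, 20.0], dims="lev")
temperature_limited = xr.DataArray(
    [[290.0, 288.0, 285.0],
     [291.0, 289.0, 286.0]],
    dims=("time", "lev"),
)

# Rectangle rule: sum over levels of T_k * dz_k, one value per time step (units: K m)
integral = (temperature_limited * delta_level_limited).sum(dim="lev")

# The same weighted sum via xarray's weighted interface, plus the thickness-weighted mean
weighted = temperature_limited.weighted(delta_level_limited)
integral_w = weighted.sum(dim="lev")
mean_temp = weighted.mean(dim="lev")

print(integral.values, mean_temp.values)
```

Dividing the integral by the total thickness (50 m here) gives the thickness-weighted mean temperature of the layer.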

## Going further:

<div class="alert alert-block alert-success">
  <p>Previous: <a href="01_modules_and_xarray_datasets.ipynb">Modules and Xarray Datasets</a></p>
  <p>Next: <a href="03_selecting_and_indexing_data.ipynb">Selecting and Indexing Data</a></p>
</div>