# Book 3 of 4: Units Data

### Demonstrating Python Tools through the Calculation of Ocean Heat Content


## Learning Objectives

- Use Xarray's metadata and the `cf_units` package to convert units of dataset variables.

----------------

### Previously:

- We imported the Xarray module and loaded our data.
- We wrote a function to subselect by depth.

In [None]:
import xarray as xr

path = '../../../data/'
file = path + 'thetao_Omon_historical_GISS-E2-1-G_r1i1p1f1_gn_185001-187012.nc'
ds = xr.open_dataset(file, chunks = {'time': 16})

def limit_depth_of_variables(level_bounds, temperature, depth_limit): 
    level_bounds_limited = level_bounds.where(level_bounds < depth_limit, depth_limit)
    delta_level = abs(level_bounds_limited[:, 1] - level_bounds_limited[:, 0])
    
    delta_level_limited = delta_level.where(delta_level != 0, drop = True)
    temperature_limited = temperature.where(delta_level != 0, drop = True)
    
    return delta_level_limited, temperature_limited

delta_level_limited, temperature_limited = limit_depth_of_variables(ds['lev_bnds'], ds['thetao'], 50)

----------------

### 1 -- Checking Units Manually

In an Xarray Dataset, units are stored as attributes of variables. You can access the units as follows:

In [9]:
level_units = ds['lev'].attrs['units']

<xarray.DataArray 'lev' ()>
array(5.)
Coordinates:
    lev      float64 5.0
Attributes:
    bounds:         lev_bnds
    units:          m
    axis:           Z
    positive:       down
    long_name:      ocean depth coordinate
    standard_name:  depth
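
The same attribute lookup can be repeated for every variable at once. The following is a minimal sketch on a small synthetic dataset (the name `demo` and its values are made up for illustration, not taken from the GISS file); it collects the `units` attribute of each variable into a dictionary.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the notebook's dataset (values are made up).
demo = xr.Dataset(
    data_vars={
        "thetao": (("lev",), np.array([18.0, 17.5, 16.0]), {"units": "degC"}),
    },
    coords={
        "lev": ("lev", np.array([5.0, 16.0, 29.0]), {"units": "m"}),
    },
)

# Dataset.variables covers both data variables and coordinates.
# .get() returns None instead of raising KeyError when 'units' is absent.
units_by_name = {
    name: var.attrs.get("units") for name, var in demo.variables.items()
}
print(units_by_name)
```

Using `.get("units")` rather than `["units"]` is a deliberate choice here: some variables carry no `units` attribute, and a missing entry then shows up as `None` instead of stopping the loop.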

In [None]:
####Use mapping to alter all units

### 2 -- Using `cf_units`

In many routines, checking and converting between units is complicated. In Python, the `cf_units` package does this quickly and easily, as long as the data is CF-compliant.

In [8]:
import cf_units as cf

**What is cf_units?** 
`cf_units` is a package that stores, combines, and compares physical units, allowing the user to perform unit conversion. You can read more about this package [here](https://scitools.org.uk/cf-units/docs/latest/unit.html).

The first functionality we will use is `cf_units.Unit()`. You pass in the units either as a string or by pointing to the dataset attribute that contains them; `cf_units` checks whether the unit is supported and returns it as an instance of the class `Unit`.

First, we will look at the units at only one point along the levels dimension.

In [None]:
level_point = ds['lev'][0]
level_point

In [10]:
orig_units = cf.Unit(level_point.attrs['units'])
orig_units

Unit('m')

Then we will use `cf_units.Unit.convert` to convert from our original units to our target units.

In [11]:
target_units = cf.Unit('km')
orig_units.convert(level_point, target_units)

0.005

# <span style="color:red"> Task 1 - Perform unit conversion on a temperature value </span>

Use `cf_units` to check the units of the temperature variable (`thetao`) and, if necessary, convert it to kelvin (`degK`) in the code cell below:

In [None]:
# Your code here

--------------

### 3 -- Using `apply_ufunc`

You will notice that `cf_units.Unit.convert` returned a bare number, so we lost the coordinates and metadata contained in our Xarray DataArray. We will fix this by using `xarray.apply_ufunc` (`ufunc` is short for "universal function", NumPy's term for a function that operates element-wise on arrays).

**What is apply_ufunc?**
`apply_ufunc` is a tool from the Xarray package that applies a function written for plain arrays to Xarray objects. The function runs on the underlying array data, and Xarray re-attaches the dimensions and coordinates to the result. Attributes are dropped unless you pass `keep_attrs=True`. You can read more about `xarray.apply_ufunc` [here](http://xarray.pydata.org/en/stable/generated/xarray.apply_ufunc.html).

In this example the arguments to `apply_ufunc` are, in order: the function (`orig_units.convert`); the input arguments of that function (the DataArray and the target units); the `dask` keyword (where we request parallelized execution over the Dask chunks); and `output_dtypes` (where we specify that the datatype of the output is the same as that of the input).

In [12]:
level_bounds_in_km = xr.apply_ufunc(orig_units.convert, ds.lev_bnds, target_units, dask='parallelized', output_dtypes=[ds.lev_bnds.dtype])
level_bounds_in_km

<xarray.DataArray (lev: 40, bnds: 2)>
dask.array<shape=(40, 2), dtype=float64, chunksize=(40, 2)>
Coordinates:
  * lev      (lev) float64 5.0 16.0 29.0 44.0 ... 4.453e+03 4.675e+03 4.897e+03
Dimensions without coordinates: bnds
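
To see what `apply_ufunc` preserves, here is a minimal sketch on an in-memory DataArray. To keep it free of extra dependencies, a hand-written function `meters_to_kilometers` (a hypothetical stand-in for `orig_units.convert`) does the conversion, and no Dask options are needed because the data is a plain NumPy array.

```python
import numpy as np
import xarray as xr

# Made-up depth values with the same kind of metadata as ds['lev'].
lev_m = xr.DataArray(
    np.array([5.0, 16.0, 29.0]),
    dims="lev",
    coords={"lev": [5.0, 16.0, 29.0]},
    attrs={"units": "m", "standard_name": "depth"},
)

def meters_to_kilometers(values):
    # Receives the underlying NumPy array, not the DataArray.
    return values / 1000.0

# keep_attrs=True copies the input attributes onto the result.
lev_km = xr.apply_ufunc(meters_to_kilometers, lev_m, keep_attrs=True)

# The copied 'units' attribute still says 'm', so update it by hand.
lev_km.attrs["units"] = "km"
print(lev_km)
```

Only the data values are converted; the `lev` coordinate labels keep their original values in meters. The copied `units` attribute also has to be updated explicitly, otherwise the metadata would contradict the converted values.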

# <span style="color:red"> Task 2 -  Write a function for unit conversion</span>

This only took us three lines of code to write, but you may want to check, convert, or assert desired units for every variable in a dataset. Your code will be much easier to read if this process is inside a function that you can call in one line. So let's write a function for unit conversion in the code cell below:

In [None]:
# Your code here

I used inputs of the dataset, the variable in that dataset, the variable bounds (because often the unit attribute is associated with the variable but you want to adjust values of the variable bounds too), and the target units. You may have used different inputs.

Here is my function:

In [13]:
def change_units(ds, variable_str, variable_bounds_str, target_unit_str):
    # Units are read from the variable; the conversion is applied to its bounds.
    orig_units = cf.Unit(ds[variable_str].attrs['units'])
    target_units = cf.Unit(target_unit_str)
    variable_in_new_units = xr.apply_ufunc(
        orig_units.convert, ds[variable_bounds_str], target_units,
        dask='parallelized', output_dtypes=[ds[variable_bounds_str].dtype])
    return variable_in_new_units

# <span style="color:red"> Task 3 -  Call up your unit conversion function</span>
Now use your function to convert the level bounds (`lev_bnds`) and temperature (`thetao`) Dataset variables to units of meters ('m') and kelvin ('degK') in the code cell below:

In [None]:
# Your code here

For my function, this looks as follows:

In [15]:
level_bounds_in_m = change_units(ds, 'lev', 'lev_bnds', 'm')
level_bounds_in_m

<xarray.DataArray (lev: 40, bnds: 2)>
dask.array<shape=(40, 2), dtype=float64, chunksize=(40, 2)>
Coordinates:
  * lev      (lev) float64 5.0 16.0 29.0 44.0 ... 4.453e+03 4.675e+03 4.897e+03
Dimensions without coordinates: bnds

In [16]:
temperature_in_degK = change_units(ds, 'thetao', 'thetao', 'degK') 
temperature_in_degK

<xarray.DataArray (time: 252, lev: 40, lat: 180, lon: 288)>
dask.array<shape=(252, 40, 180, 288), dtype=float32, chunksize=(8, 40, 180, 288)>
Coordinates:
  * time     (time) object 1850-01-16 12:00:00 ... 1870-12-16 12:00:00
  * lev      (lev) float64 5.0 16.0 29.0 44.0 ... 4.453e+03 4.675e+03 4.897e+03
  * lat      (lat) float64 -89.5 -88.5 -87.5 -86.5 -85.5 ... 86.5 87.5 88.5 89.5
  * lon      (lon) float64 0.625 1.875 3.125 4.375 ... 355.6 356.9 358.1 359.4

## Going Further:
 - Reading about CF compliance:
 - Reading about the `cf_units` package:

<div class="alert alert-block alert-success">
  <p>Previous: <a href="02_subselecting_and_indexing_data.ipynb">Subselecting and Indexing Data</a></p>
  <p>Next: <a href="04_calculation_and_plotting.ipynb">Calculation and Plotting</a></p>
</div>