# Book 1 of 4: Importing Modules and the Xarray Dataset

### Demonstrating Python Tools through the Calculation of Ocean Heat Content



## Learning Objectives

- Now that you're familiar with the Jupyter Notebook workspace, let's use Python in a way that mirrors a realistic use case, introducing each Python geoscience tool at the point where you would need it. This notebook covers the first step of almost any workflow: reading in your data. We will show you how to do so with Xarray.

----------------

### 1 -- Importing Modules

This workflow uses the Xarray and cf_units modules. Read about these here[].

Modules are `.py` files containing Python code that can be imported into other Python files or used from the command line. A module contains Python classes, functions, or variables to be referenced elsewhere. This lets you tuck away widely used helper functions while still pointing to where the base code lives.

You can import a whole package, give it a different name using the `as` keyword, or import only selected functions using `from`.
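
The three import forms described above can be sketched with the standard-library `math` module, chosen here only because it is always available. The alias `m` is an arbitrary name picked for illustration.

```python
# Form 1: import the whole module under its own name
import math

# Form 2: import the module under a shorter alias using `as`
import math as m

# Form 3: import only selected names using `from`
from math import pi, sqrt

# All three refer to the same underlying function
print(math.sqrt(16.0))  # called through the full module name
print(m.sqrt(16.0))     # called through the alias
print(sqrt(16.0))       # called directly, no prefix needed
```

The alias form is the one used for Xarray in this notebook (`import xarray as xr`), so every Xarray function is then called with the `xr.` prefix.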

It is common practice to import all modules at once at the beginning of a script, but we will import modules as we use them for clarity of use. For this notebook you will only need xarray.

In [2]:
import xarray as xr

Did that work for you? If xarray is not installed, the import fails with a `ModuleNotFoundError`.

-- Let's pause --
and make sure everyone has xarray!
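
One way to check for a package without triggering an import error is sketched below using the standard-library `importlib.util.find_spec`. The helper name `is_installed` is hypothetical and not part of this notebook's workflow.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("xarray"):
    print("xarray is available")
else:
    # Typical fixes are `conda install -c conda-forge xarray` or `pip install xarray`
    print("xarray is missing; install it, then restart the kernel")
```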

----------------

### 2 -- Reading in a .nc File Using `xarray.open_dataset`

**What is Xarray?** 
Xarray is a package that allows for the labeling of dimensions in multi-dimensional datasets. [Here](http://xarray.pydata.org/en/stable/index.html) is Xarray's documentation.

**To open a file**, pass its relative path plus file name (`file`) to `xarray.open_dataset`.

In [4]:
path = '../../../data/'
file = path + 'thetao_Omon_historical_GISS-E2-1-G_r1i1p1f1_gn_185001-187012.nc'

ds = xr.open_dataset(file)
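
A relative path is resolved against the current working directory, so a `FileNotFoundError` here usually means the notebook was started from a different folder than expected. A minimal sketch for checking this with the standard-library `pathlib` is shown below. It reuses the path and file name from the cell above and makes no assumption about whether the file exists on your machine.

```python
from pathlib import Path

# Same relative location and file name as in the cell above
path = Path("../../../data")
file = path / "thetao_Omon_historical_GISS-E2-1-G_r1i1p1f1_gn_185001-187012.nc"

print("Working directory:", Path.cwd())  # where relative paths start from
print("Looking for:", file)
print("File found:", file.exists())      # False means the path needs adjusting
```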

**What is an Xarray Dataset anyway?**

Let's take a look!

You will see dimensions, coordinates, variables, and attributes.

**Dimensions**, or `dims`, are comparable to the x, y, and z axes that span your dataset along its first, second, and third dimension. What is unique about dimensions in an xarray dataset is that you can name them in a physically meaningful way. So if your data has two spatial dimensions and one time dimension, you can name them lat, lon, and time instead of 0, 1, and 2.

**Coordinates**, or `coords`, contain information about each dimension: the actual latitude, longitude, and time values, as opposed to a generic index array of the same length. You can have dimensionless coordinates or coordinateless dimensions.

**Variables**, or `data_vars`, are your data. You can have more than one variable in a dataset.

**Attributes**, or `attrs`, are everything else: all the metadata associated with this dataset.

In [5]:
ds

<xarray.Dataset>
Dimensions:    (bnds: 2, lat: 180, lev: 40, lon: 288, time: 252)
Coordinates:
  * time       (time) object 1850-01-16 12:00:00 ... 1870-12-16 12:00:00
  * lev        (lev) float64 5.0 16.0 29.0 ... 4.453e+03 4.675e+03 4.897e+03
  * lat        (lat) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
  * lon        (lon) float64 0.625 1.875 3.125 4.375 ... 355.6 356.9 358.1 359.4
Dimensions without coordinates: bnds
Data variables:
    time_bnds  (time, bnds) object ...
    lev_bnds   (lev, bnds) float64 ...
    lat_bnds   (lat, bnds) float64 ...
    lon_bnds   (lon, bnds) float64 ...
    thetao     (time, lev, lat, lon) float32 ...
Attributes:
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            CMIP
    branch_method:          standard
    branch_time_in_child:   0.0
    branch_time_in_parent:  0.0
    contact:                Kenneth Lo (cdkkl@giss.nasa.gov)
    creation_date:          2018-08-27T13:51:30Z
    data_specs_version:     01.00.23
  

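Think of this display as `ncdump` for Xarray: a summary of the dimensions, coordinates, variables, and attributes stored in the file.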
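
To see the four parts of a dataset in isolation, here is a minimal sketch that builds a small dataset in memory instead of reading a file. The values, the grid sizes, and the name `toy` are made up for illustration. Only the variable name `thetao` is borrowed from the real file.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: 2 time steps on a 3 x 4 latitude-longitude grid
temperature = np.arange(24, dtype="float32").reshape(2, 3, 4)

toy = xr.Dataset(
    data_vars={"thetao": (("time", "lat", "lon"), temperature)},
    coords={
        "time": [0, 1],
        "lat": [-10.0, 0.0, 10.0],
        "lon": [0.0, 90.0, 180.0, 270.0],
    },
    attrs={"title": "toy example"},
)

print(dict(toy.sizes))      # dimension names and their lengths
print(list(toy.coords))     # coordinate names
print(list(toy.data_vars))  # data variable names
print(toy.attrs)            # dataset-level metadata
```

The same four accessors (`sizes`, `coords`, `data_vars`, `attrs`) work on the `ds` object opened from the NetCDF file.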

**What is an Xarray DataArray?**

A DataArray is smaller than a Dataset: it holds a single variable together with all of its coordinates. A Dataset is a container for DataArrays.

One way to get a DataArray from a Dataset is to select a variable. There are two main ways of doing that: dictionary-style indexing (`ds['thetao']`) and attribute access (`ds.thetao`).

In [6]:
da = ds['thetao']
da

<xarray.DataArray 'thetao' (time: 252, lev: 40, lat: 180, lon: 288)>
[522547200 values with dtype=float32]
Coordinates:
  * time     (time) object 1850-01-16 12:00:00 ... 1870-12-16 12:00:00
  * lev      (lev) float64 5.0 16.0 29.0 44.0 ... 4.453e+03 4.675e+03 4.897e+03
  * lat      (lat) float64 -89.5 -88.5 -87.5 -86.5 -85.5 ... 86.5 87.5 88.5 89.5
  * lon      (lon) float64 0.625 1.875 3.125 4.375 ... 355.6 356.9 358.1 359.4
Attributes:
    standard_name:  sea_water_potential_temperature
    long_name:      Sea Water Potential Temperature
    comment:        Diagnostic should be contributed even for models using co...
    units:          degC
    cell_methods:   area: mean where sea time: mean
    cell_measures:  area: areacello volume: volcello
    history:        2018-08-27T13:51:26Z altered by CMOR: replaced missing va...

In [7]:
da.coords

Coordinates:
  * time     (time) object 1850-01-16 12:00:00 ... 1870-12-16 12:00:00
  * lev      (lev) float64 5.0 16.0 29.0 44.0 ... 4.453e+03 4.675e+03 4.897e+03
  * lat      (lat) float64 -89.5 -88.5 -87.5 -86.5 -85.5 ... 86.5 87.5 88.5 89.5
  * lon      (lon) float64 0.625 1.875 3.125 4.375 ... 355.6 356.9 358.1 359.4

In [8]:
da.attrs

OrderedDict([('standard_name', 'sea_water_potential_temperature'),
             ('long_name', 'Sea Water Potential Temperature'),
             ('comment',
              'Diagnostic should be contributed even for models using conservative temperature as prognostic field.'),
             ('units', 'degC'),
             ('cell_methods', 'area: mean where sea time: mean'),
             ('cell_measures', 'area: areacello volume: volcello'),
             ('history',
              '2018-08-27T13:51:26Z altered by CMOR: replaced missing value flag (-1e+30) with standard missing value (1e+20).')])

In [None]:
da = ds.thetao
da
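
The two selection styles return the same DataArray. The sketch below checks this on a small in-memory dataset. The dataset `toy` and its values are hypothetical stand-ins for the real file.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the dataset opened above
toy = xr.Dataset(
    {"thetao": (("lat", "lon"), np.zeros((2, 3), dtype="float32"))},
    coords={"lat": [-10.0, 10.0], "lon": [0.0, 120.0, 240.0]},
)

by_key = toy["thetao"]  # dictionary-style indexing
by_attr = toy.thetao    # attribute access

print(type(by_key).__name__)      # the class of the selected object
print(by_key.identical(by_attr))  # compares values, names, and attributes
```

Dictionary-style indexing is the safer habit: attribute access does not work for a variable whose name collides with an existing Dataset method or property, such as `mean`.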

You can convert a DataArray back to a Dataset using `xarray.DataArray.to_dataset`. You would want to do this if you plan to add new dimension, coordinate, or attribute information.
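
A minimal sketch of the round trip follows, using a small in-memory DataArray. The values and the extra variable `thetao_kelvin` are hypothetical, added only to show one thing a Dataset can hold that a single DataArray cannot.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the `thetao` DataArray selected above
da = xr.DataArray(
    np.zeros((2, 3), dtype="float32"),
    dims=("lat", "lon"),
    coords={"lat": [-10.0, 10.0], "lon": [0.0, 120.0, 240.0]},
    name="thetao",
)

# to_dataset() uses the DataArray's name as the variable name
ds_again = da.to_dataset()

# A Dataset can then hold further variables on the same coordinates
ds_again["thetao_kelvin"] = da + 273.15

print(list(ds_again.data_vars))
```

If the DataArray has no name, pass one explicitly, as in `da.to_dataset(name="thetao")`.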

# <span style="color:red"> Task 1 - Select a different variable from the dataset ds </span>

Now you try: isolate the DataArray for any other variable in the Dataset in the code cell below.

In [None]:
# Your code here

-----------

## Going further:

<div class="alert alert-block alert-success">
  <p>Previous: <a href="01_modules_and_xarray_datasets.ipynb">Modules and Xarray Datasets</a></p>
  <p>Next: <a href="02_subselecting_and_indexing_data.ipynb">Subselecting and Indexing Data</a></p>
</div>