# Book 1 of 4: Importing Modules and the Xarray Dataset

### Demonstrating Python Tools through the Calculation of Ocean Heat Content



## Learning Objectives

- Now that you're familiar with the Jupyter Notebook workspace, let's use Python in a way that mirrors a realistic use case, introducing each Python geoscience tool at the point where you need it. This notebook covers the first step of almost any workflow: reading in your data. We will show you how to do so with Xarray.

----------------

## 1 -- Importing Modules

Modules are `.py` files containing Python code that can be imported into other Python files or used from the command line. A module can define classes, functions, or variables to be referenced elsewhere.

You can `import` a whole package, rename it with the `as` keyword, or import only selected functions using `from`.
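As a quick illustration of the three import forms, here is a minimal sketch that uses the standard-library `math` module, chosen only because it ships with every Python installation. It is not part of this notebook's workflow.

```python
import math                  # import the whole module
import math as m             # the same module under a shorter alias
from math import sqrt, pi    # import only selected names

# All three forms reach the same function.
print(math.sqrt(16))
print(m.sqrt(16))
print(sqrt(16))

# The alias is just another name for the same module object.
print(m is math)
```

The `import xarray as xr` line used in this notebook follows the second form: `xr` is simply a shorter name for the `xarray` package.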

It is common practice to import all modules at once at the beginning of a script, but we will import modules as we use them for clarity of each module's application. For this notebook you will only need xarray.

In [12]:
import xarray as xr

Did that work for you? If xarray is not installed, the import will fail with a `ModuleNotFoundError`.

-- Let's pause --
and make sure everyone has xarray!
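If you are unsure whether xarray is installed, one way to check without triggering an error is sketched below. The helper name `is_installed` is made up for this example, and the install commands in the message are common options that may differ from how your environment is managed.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate `module_name`."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("xarray"):
    print("xarray is available")
else:
    # Typical install commands; adapt them to your own environment.
    print("xarray is missing. Try `pip install xarray` "
          "or `conda install -c conda-forge xarray`.")
```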

----------------

## 2 -- Reading in a .nc File Using `xarray.open_dataset`

**What is Xarray?** 
Xarray is a Python package for working with labeled multi-dimensional arrays: it lets you name the dimensions of your data and attach coordinates and metadata to them. [Here](http://xarray.pydata.org/en/stable/index.html) is Xarray's documentation.

**To open a file**, pass its relative path and file name (`file` in the cell below) to `xarray.open_dataset`.

In [4]:
path = '../../../data/'
file = path + 'thetao_Omon_historical_GISS-E2-1-G_r1i1p1f1_gn_185001-187012.nc'

ds = xr.open_dataset(file)
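Relative paths depend on where the notebook is run from, so a missing file is a common first stumbling block. The sketch below is one possible variation on the cell above, not a required step: it builds the same path with `pathlib` and checks that the file exists before opening it. The variable name `data_dir` is introduced here for illustration.

```python
from pathlib import Path

import xarray as xr

# Same relative location and file name as in the cell above.
data_dir = Path("../../../data")
file = data_dir / "thetao_Omon_historical_GISS-E2-1-G_r1i1p1f1_gn_185001-187012.nc"

if file.exists():
    ds = xr.open_dataset(file)
    print(ds)
else:
    # Show the absolute path that was tried, which makes typos easier to spot.
    print(f"File not found: {file.resolve()}")
```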

**What is an Xarray Dataset anyway?**

Let's take a look!

You will see dimensions, coordinates, variables, and attributes.

![Image of Dataset-Components](https://ecco-v4-python-tutorial.readthedocs.io/_images/Dataset-diagram.png)
[Diagram Source](https://ecco-v4-python-tutorial.readthedocs.io/ECCO_v4_data_structure_basics.html)

**Dimensions**, or `dims`, are comparable to the x, y, and z axes that span your dataset along its first, second, and third dimensions. What is unique about dimensions in an xarray dataset is that you can name them in a physically meaningful way. So if your data has two spatial dimensions and one time dimension, you can name them lat, lon, and time instead of 0, 1, and 2. Dimensions are stored in a dictionary that maps each dimension name to its length.

**Coordinates**, or `coords`, contain information about each dimension: the actual latitude, longitude, and time values, as opposed to a generic array of the same length. You can have dimensionless coordinates or coordinateless dimensions. Coordinates are stored as an array.

**Variables**, or `data_vars`, are your data. You can have more than one variable in a dataset. Variables are stored as an array.

**Attributes**, or `attrs`, are everything else: all the metadata associated with this dataset. Attributes are stored as a dictionary.
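To make the four components concrete, here is a small sketch that builds a made-up dataset in memory. The variable name `temp`, the coordinate values, and the `title` attribute are all invented for illustration and have nothing to do with the ocean data file. The `time` dimension is deliberately left without coordinate values, and `depth` is a scalar coordinate that belongs to no dimension.

```python
import numpy as np
import xarray as xr

toy = xr.Dataset(
    # One data variable with three named dimensions.
    data_vars={"temp": (("time", "lat", "lon"), np.zeros((2, 3, 4)))},
    coords={
        "lat": [-10.0, 0.0, 10.0],           # coordinate for the lat dimension
        "lon": [0.0, 90.0, 180.0, 270.0],    # coordinate for the lon dimension
        "depth": 5.0,                        # scalar coordinate, no dimension
    },
    attrs={"title": "toy example"},
)

print(dict(toy.sizes))       # dimension names mapped to their lengths
print(list(toy.coords))      # coordinate names ("time" is not among them)
print(list(toy.data_vars))   # data variable names
print(toy.attrs)             # metadata dictionary
```

`toy.sizes` is used here because it returns the name-to-length mapping in every recent xarray version.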

In [1]:
ds

This displays a summary of the information in your dataset. *Think: like ncdump, but for Xarray.*
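If you want something even closer to ncdump's text layout, `Dataset.info()` prints a plain-text listing. The sketch below uses a small made-up dataset so that it runs without the data file. The names `toy`, `temp`, and `title` are illustrative only.

```python
import numpy as np
import xarray as xr

toy = xr.Dataset(
    {"temp": (("time", "lat"), np.zeros((2, 3)))},
    coords={"lat": [-10.0, 0.0, 10.0]},
    attrs={"title": "toy example"},
)

print(toy)    # the same summary a notebook shows when a cell ends with `toy`
toy.info()    # ncdump-style listing of dimensions, variables, and attributes
```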

----------------------

## 3 -- What is an Xarray DataArray?

A DataArray is smaller than a Dataset. It contains only the information pertaining to one variable (all relevant coordinates, dimensions, and attributes). This is in contrast to the Dataset, which is a container for DataArrays.

One way to convert a Dataset to a DataArray is to select a variable. There are two main ways of doing that:

In [4]:
da = ds['thetao']
da

In [5]:
da = ds.thetao
da
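The two forms return the same DataArray. The sketch below checks this on a small made-up dataset (`toy` and `temp` are illustrative names) so that it runs without the data file.

```python
import numpy as np
import xarray as xr

toy = xr.Dataset({"temp": (("time", "lat"), np.arange(6.0).reshape(2, 3))})

by_key = toy["temp"]    # dictionary-style access
by_attr = toy.temp      # attribute-style access

# `identical` compares values, dimensions, coordinates, names, and attributes.
print(by_key.identical(by_attr))
print(type(by_key).__name__)
```

Dictionary-style access works for any variable name, including names that are not valid Python identifiers or that clash with an existing Dataset method. Attribute-style access is a convenience for reading only.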

# <span style="color:red"> Task 1 - Select a different variable from the dataset ds </span>

Now try isolating a DataArray for any other variable in the Dataset in the code cell below:

In [None]:
# Your code here

You can convert a DataArray back to a Dataset using `xarray.DataArray.to_dataset`. You would want to do this if you plan on adding new dimension, coordinate, or attribute information.
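One detail worth knowing is that `to_dataset` needs a name for the variable it creates. A DataArray selected from a Dataset already carries its variable name, but one built from scratch does not, so you pass `name=`. The sketch below uses a made-up array, and the names `speed` and `speed_doubled` are illustrative only.

```python
import numpy as np
import xarray as xr

# A DataArray created from scratch has no name yet.
speed = xr.DataArray(np.ones((2, 3)), dims=("time", "lat"))

# Supply the variable name explicitly when converting.
as_ds = speed.to_dataset(name="speed")

# Once it is a Dataset, more variables can be added alongside the first.
as_ds["speed_doubled"] = as_ds["speed"] * 2

print(list(as_ds.data_vars))
```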

# <span style="color:red"> Task 2 - Convert your DataArray to a Dataset </span>

Use `to_dataset` in the code cell block below:

In [None]:
# Your code here

# <span style="color:red"> Task 3 - Getting coords, dims, and attrs information out of the Dataset/DataArray </span>

Try the commands `ds.coords`, `ds.dims`, and `ds.attrs` on your dataset `ds`, or the equivalents on your DataArray `da`, in the code cell below:

In [None]:
# Your code here

Do you understand the difference between how Xarray treats coordinates and dimensions? Can you figure it out?

-----------

## Going further:
 - Xarray Documentation: http://xarray.pydata.org/en/stable/index.html

<div class="alert alert-block alert-success">
  <p>Previous: <a href="01_modules_and_xarray_datasets.ipynb">What is Git?</a></p>
  <p>Next: <a href="02_subselecting_and_indexing_data.ipynb">Subselecting and Indexing Data</a></p>
</div>