## End-to-End Workflow -- Book 1/4: Importing Modules and the Xarray Dataset

### Demonstrating Python Tools through the Calculation of Ocean Heat Content

Now that you're familiar with the Jupyter Notebook workspace, let's use Python in a way that mirrors a realistic use case, introducing each Python geoscience tool at the point where you need it.

----------------

### Step 1 -- Importing Modules

This workflow uses the Xarray and cf_units modules.

A module is a `.py` file of Python code that can be imported into other Python files or into an interactive session. It can define classes, functions, and variables for use elsewhere. This lets you tuck away widely used helper functions while keeping a clear pointer to where the underlying code lives.

You can import a whole package, give it a shorter name with the `as` keyword, or import only selected functions using `from`.
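The three import styles can be sketched with the standard-library `math` module, chosen here only because it is always available. The variable names are illustrative.

```python
# Three common import styles, shown with the standard-library math module.
import math                 # import the whole module under its own name
import math as m            # the same module, bound to a shorter alias
from math import sqrt       # import only one selected function

root_full = math.sqrt(16.0)   # call through the full module name
root_alias = m.sqrt(16.0)     # call through the alias
root_direct = sqrt(16.0)      # call the imported function directly

print(root_full, root_alias, root_direct)
print(m is math)  # the alias and the original name refer to the same module object
```

The `import xarray as xr` line used in this notebook follows the second style.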

It is common practice to import all modules at once at the beginning of a script, but we will import modules as we use them for clarity of use. For this notebook you will only need xarray.

In [1]:
import xarray as xr

Did that work for you? If xarray is not installed, the import fails with a `ModuleNotFoundError`.

-- Let's pause --
and make sure everyone has xarray!
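One way to check for a package without triggering an import error is to ask Python's import system whether it can find it. This is a minimal sketch; the helper name `is_installed` is hypothetical and not part of any library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate module_name."""
    return importlib.util.find_spec(module_name) is not None


# Report on the two modules this workflow mentions.
for name in ("xarray", "cf_units"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If xarray is reported as missing, it can usually be installed with `conda install -c conda-forge xarray` or `pip install xarray`, depending on how your environment is managed.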

----------------

### Step 2 -- Reading in a .nc File Using `xarray.open_dataset`

**What is Xarray?** 
Xarray is a package that allows for the labeling of dimensions in multi-dimensional datasets. [Here](http://xarray.pydata.org/en/stable/index.html) is Xarray's documentation.

**To open a file**, pass its relative path, including the file name (`file` in the cell below), to `xarray.open_dataset`.

In [2]:
path = '../../data/'
file = path + 'thetao_Omon_historical_GISS-E2-1-G_r1i1p1f1_gn_185001-187012.nc'

ds = xr.open_dataset(file)
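As an alternative to string concatenation, the path can be built with the standard-library `pathlib` module, which also makes it easy to check that the file exists before opening it. This is a sketch that assumes the same directory layout as the cell above; `data_dir` and `file_path` are illustrative names.

```python
from pathlib import Path

# Same relative location and file name as the cell above.
data_dir = Path("../../data")
file_path = data_dir / "thetao_Omon_historical_GISS-E2-1-G_r1i1p1f1_gn_185001-187012.nc"

# Checking first gives a clearer message than a failed open.
if file_path.exists():
    print("Found", file_path)
else:
    print("No file at", file_path.resolve())
```

`xarray.open_dataset` accepts either a string or a `Path` object, so `xr.open_dataset(file_path)` would work in place of the string version.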

**What is an Xarray Dataset anyway?**

Let's take a look!

You will see dimensions, coordinates, variables, and attributes.

**Dimensions**, or `dims`, are comparable to the x, y, and z axes that span the length of your dataset in its first, second, and third dimension. What is unique about dimensions in an xarray dataset is that you can name them in a physically meaningful way. So if your data has two spatial dimensions and one time dimension, you can name them lat, lon, and time instead of 0, 1, and 2.

**Coordinates**, or `coords`, contain information about each dimension: the actual latitude, longitude, and time values, as opposed to a generic index array of the same length. You can have coordinates that are not tied to a dimension, and dimensions that have no coordinate values.

**Variables**, or `data_vars`, are your data. You can have more than one variable in a dataset.

**Attributes**, or `attrs`, are everything else: all the metadata associated with this dataset.

In [None]:
ds
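To see the four parts described above in isolation, here is a minimal sketch that builds a small dataset from made-up numbers. The name `toy`, the variable name `temperature`, and all values are invented for illustration and have nothing to do with the file opened earlier.

```python
import numpy as np
import xarray as xr

# Made-up values on a 2 x 3 x 4 (time, lat, lon) grid.
temps = np.arange(24, dtype=float).reshape(2, 3, 4)

toy = xr.Dataset(
    # One data variable, with its dimensions named explicitly.
    data_vars={"temperature": (("time", "lat", "lon"), temps)},
    # Coordinate values for each named dimension.
    coords={
        "time": [0, 1],
        "lat": [-10.0, 0.0, 10.0],
        "lon": [0.0, 90.0, 180.0, 270.0],
    },
    # Free-form metadata.
    attrs={"title": "toy example"},
)

print(dict(toy.sizes))      # dimension names and their lengths
print(list(toy.coords))     # coordinate names
print(list(toy.data_vars))  # data variable names
print(toy.attrs)            # metadata dictionary
```

The same four accessors (`sizes`, `coords`, `data_vars`, `attrs`) can be used on `ds` to inspect the real file piece by piece.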

**What is an Xarray DataArray?**

A DataArray is smaller than a Dataset: it holds a single variable, along with that variable's dimensions, coordinates, and attributes.

One way to get a DataArray from a Dataset is to select a variable. There are two main ways of doing that:

In [None]:
da = ds['thetao']
da

In [None]:
da = ds.thetao
da
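The two selection styles return the same object in most cases, but they can differ when a variable's name collides with an existing Dataset method. The sketch below uses a made-up dataset (the name `toy` and the variable named `mean` are invented for illustration) to show the difference.

```python
import numpy as np
import xarray as xr

# A made-up dataset; one variable is deliberately named like a Dataset method.
toy = xr.Dataset(
    {
        "thetao": (("lev", "lat"), np.zeros((2, 3))),
        "mean": (("lat",), np.ones(3)),
    }
)

by_key = toy["thetao"]   # dictionary-style selection
by_attr = toy.thetao     # attribute-style selection
print(by_key.identical(by_attr))

# Dictionary-style selection always returns the variable...
print(type(toy["mean"]).__name__)
# ...while attribute-style access finds the Dataset.mean method instead.
print(callable(toy.mean))
```

For that reason, dictionary-style selection is the safer choice in scripts, while attribute-style access is convenient for interactive exploration.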

You can convert a DataArray back to a Dataset using `xarray.DataArray.to_dataset`. You would want to do this if you plan to add new dimension, coordinate, or attribute information.
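A minimal sketch of the round trip, using made-up arrays rather than the file above. The names `toy_da`, `unnamed`, and `salinity`, and the attribute text, are illustrative.

```python
import numpy as np
import xarray as xr

# A named DataArray converts directly; its name becomes the variable name.
toy_da = xr.DataArray(np.zeros((2, 3)), dims=("lat", "lon"), name="thetao")
ds_again = toy_da.to_dataset()

# Once it is a Dataset, dataset-level metadata can be attached.
ds_again.attrs["note"] = "converted from a DataArray"

# An unnamed DataArray needs a name supplied at conversion time.
unnamed = xr.DataArray(np.zeros(3), dims="x")
ds_named = unnamed.to_dataset(name="salinity")

print(list(ds_again.data_vars))
print(list(ds_named.data_vars))
```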

# <span style="color:red"> Task 1 - Select a different variable from the dataset ds </span>

Now you try: isolate the DataArray for any other variable in the Dataset in the code cell below.

In [None]:
# Your code here

<div class="alert alert-block alert-success">
  <p>Previous: <a href="01_modules_and_xarray_datasets.ipynb">Modules and Xarray Datasets</a></p>
  <p>Next: <a href="02_subselecting_and_indexing_data.ipynb">Subselecting and Indexing Data</a></p>
</div>