# Book 1 of 4: Importing Modules and the Xarray DataSet

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Book-1-of-4:-Importing-Modules-and-the-Xarray-DataSet" data-toc-modified-id="Book-1-of-4:-Importing-Modules-and-the-Xarray-DataSet-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Book 1 of 4: Importing Modules and the Xarray DataSet</a></span><ul class="toc-item"><li><span><a href="#Learning-Objectives" data-toc-modified-id="Learning-Objectives-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Learning Objectives</a></span></li><li><span><a href="#Importing-Modules" data-toc-modified-id="Importing-Modules-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Importing Modules</a></span></li><li><span><a href="#What-is-Xarray?" data-toc-modified-id="What-is-Xarray?-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>What is Xarray?</a></span></li><li><span><a href="#Reading-in-a-netCDF-file-Using-xarray.open_dataset" data-toc-modified-id="Reading-in-a-netCDF-file-Using-xarray.open_dataset-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Reading in a netCDF file Using xarray.open_dataset</a></span><ul class="toc-item"><li><span><a href="#What-is-an-Xarray-DataSet-anyway?" data-toc-modified-id="What-is-an-Xarray-DataSet-anyway?-1.4.1"><span class="toc-item-num">1.4.1&nbsp;&nbsp;</span>What is an Xarray DataSet anyway?</a></span></li><li><span><a href="#What-is-an-Xarray-DataArray?" 
data-toc-modified-id="What-is-an-Xarray-DataArray?-1.4.2"><span class="toc-item-num">1.4.2&nbsp;&nbsp;</span>What is an Xarray DataArray?</a></span></li></ul></li><li><span><a href="#Task-1---Select-a-different-variable-from-the-dataset-ds" data-toc-modified-id="Task-1---Select-a-different-variable-from-the-dataset-ds-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Task 1 - Select a different variable from the dataset ds</a></span></li><li><span><a href="#Task-2---Convert-your-DataArray-to-a-DataSet" data-toc-modified-id="Task-2---Convert-your-DataArray-to-a-DataSet-1.6"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>Task 2 - Convert your DataArray to a DataSet</a></span></li><li><span><a href="#Task-3---Getting-coords,-dims,-and-attrs-information-out-of-the-DataSet/DataArray" data-toc-modified-id="Task-3---Getting-coords,-dims,-and-attrs-information-out-of-the-DataSet/DataArray-1.7"><span class="toc-item-num">1.7&nbsp;&nbsp;</span>Task 3 - Getting coords, dims, and attrs information out of the DataSet/DataArray</a></span></li><li><span><a href="#Going-Further" data-toc-modified-id="Going-Further-1.8"><span class="toc-item-num">1.8&nbsp;&nbsp;</span>Going Further</a></span></li></ul></li></ul></div>

## Learning Objectives

- Understand and work with Python modules
- Get familiar with Xarray and its core data structures

----------------

## Importing Modules

A module is a `.py` file containing Python code that can be imported into other Python files or into an interactive session. A module can define classes, functions, and variables for use elsewhere.

You can `import` a whole module (or package) under an alias using the `as` keyword, or import only selected names from it using `from`.
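As a quick illustration of both forms, here is a minimal sketch that uses the standard-library `math` module, so it runs without installing anything extra:

```python
# Import a whole module under an alias with "as"
import math as m

# Import only selected names from a module with "from"
from math import pi, sqrt

print(m.sqrt(16))               # 4.0, accessed through the alias
print(sqrt(16) == m.sqrt(16))   # True, both names refer to the same function
print(round(pi, 2))             # 3.14
```

The same two patterns apply to third-party packages such as xarray.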

It is common practice to import all modules at once at the beginning of a script, but here we import each module where we first use it, so it is clear what each one is for. For this notebook you will only need xarray.

In [None]:
import xarray as xr

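Did that work for you? If xarray is not installed, the import fails with a `ModuleNotFoundError`. You can install it with `conda install -c conda-forge xarray` or `pip install xarray`.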
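If you would rather check for a package before importing it, the standard library can look it up without raising an error. This is a minimal sketch; `is_installed` is a hypothetical helper name, not part of xarray:

```python
import importlib.util


def is_installed(name: str) -> bool:
    """Return True if a top-level module or package called `name` can be found."""
    # find_spec returns None when no module with that name is on the search path
    return importlib.util.find_spec(name) is not None


print(is_installed("math"))    # True: part of the standard library
print(is_installed("xarray"))  # True only if xarray is installed in this environment
```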


----------------

## What is Xarray?

**Xarray** is an open source library providing high-level, easy-to-use data structures and analysis tools for working with **multi-dimensional labeled** datasets and arrays in Python.
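To see what "labeled" buys you, compare a plain NumPy array with the same values wrapped in an xarray `DataArray`. The dimension names `depth` and `lon` and their coordinate values are made up for this sketch:

```python
import numpy as np
import xarray as xr

values = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# Wrap the same values with named dimensions and coordinate labels (hypothetical)
arr = xr.DataArray(
    values,
    dims=("depth", "lon"),
    coords={"depth": [5.0, 15.0], "lon": [0.0, 120.0, 240.0]},
)

# NumPy: you must remember which axis is which and use integer positions
print(values[1, 2])                            # 6.0

# Xarray: select by dimension name and coordinate label instead
print(float(arr.sel(depth=15.0, lon=240.0)))   # 6.0

# Reductions can also refer to dimensions by name
print(arr.mean(dim="lon").values)              # [2. 5.]
```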


## Reading in a netCDF file Using xarray.open_dataset


**To open a netCDF file**, pass its relative or absolute file path to the `xarray.open_dataset` function.

Let's open the netCDF file that contains the dataset we will be using.

In [None]:
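# Relative path to the shared data directory (adjust it to match your setup)
path = '../../../data/'
# CMIP6 monthly ocean potential temperature (thetao), Jan 1850 to Dec 1870
file = path + 'thetao_Omon_historical_GISS-E2-1-G_r1i1p1f1_gn_185001-187012.nc'

# Open the netCDF file as an xarray Dataset
ds = xr.open_dataset(file)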

The `open_dataset` function opens the file and returns an `xarray.Dataset` object, which we store in `ds`.

### What is an Xarray DataSet anyway?

- Xarray's `Dataset` is a container of labeled arrays (`DataArray` objects) with aligned dimensions.
- It is designed as an in-memory representation of a netCDF dataset.
- A `Dataset` holds multiple variables that may share the same coordinates.

![](../../bytopic/xarray/images/xarray-data-structures.png)



Xarray Datasets have the following key properties:

| Attribute   | Description |
|-------------|-------------|
| `data_vars` | a dictionary-like container of `DataArray` objects corresponding to data variables |
| `dims`      | a mapping from dimension names to the fixed length of each dimension (e.g., `{'lat': 6, 'lon': 6, 'time': 8}`); recent xarray versions recommend `sizes` for this mapping |
| `coords`    | a dictionary-like container of arrays (coordinates) that label each point (e.g., 1-dimensional arrays of numbers, datetime objects or strings) |
| `attrs`     | a dictionary to hold arbitrary metadata pertaining to the dataset |
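To connect these properties to code, here is a small sketch that builds a toy Dataset in memory and inspects each one. The variable name `tos`, the grid, and the `title` attribute are invented for illustration and are unrelated to the file opened above:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy dataset: 8 monthly time steps on a 6 x 6 lat/lon grid
time = pd.date_range("1850-01-01", periods=8, freq="MS")
lat = np.linspace(-75.0, 75.0, 6)
lon = np.linspace(0.0, 300.0, 6)
rng = np.random.default_rng(0)

ds_toy = xr.Dataset(
    data_vars={"tos": (("time", "lat", "lon"), rng.random((8, 6, 6)))},
    coords={"time": time, "lat": lat, "lon": lon},
    attrs={"title": "toy example"},
)

print(list(ds_toy.data_vars))   # ['tos']
print(dict(ds_toy.sizes))       # dimension name -> length: time 8, lat 6, lon 6
print(sorted(ds_toy.coords))    # ['lat', 'lon', 'time']
print(ds_toy.attrs["title"])    # toy example
```

The sketch reads dimension lengths through `sizes` rather than `dims` to stay clear of the changing return type of `Dataset.dims` in newer xarray releases.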

Here is the header of the netCDF file (its dimensions, variables, and attributes), printed with `ncdump -h`:

In [None]:
!ncdump -h ../../../data/thetao_Omon_historical_GISS-E2-1-G_r1i1p1f1_gn_185001-187012.nc

And here is the same content as represented by our xarray Dataset:

In [None]:
ds

----------------------

### What is an Xarray DataArray?

- The DataArray is xarray's implementation of a labeled, multi-dimensional array. 
- It only contains information pertaining to one variable (all relevant coordinates, dimensions, and attributes). This is in contrast to the Dataset, which is a container for DataArrays.


It has several key properties:

| Attribute | Description |
|-----------|-------------|
| `data`    | `numpy.ndarray` or `dask.array` holding the array's values |
| `dims`    | dimension names for each axis, e.g., (`x`, `y`, `z`) or (`lat`, `lon`, `time`) |
| `coords`  | a dictionary-like container of arrays (coordinates) that label each point (e.g., 1-dimensional arrays of numbers, datetime objects or strings) |
| `attrs`   | a dictionary to hold arbitrary attributes/metadata (such as units) |
| `name`    | an arbitrary name of the array |
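Each of these properties can be set when a DataArray is built by hand. In this minimal sketch the values, coordinates, units, and the name `thetao` are illustrative only:

```python
import numpy as np
import xarray as xr

da_toy = xr.DataArray(
    np.arange(6.0).reshape(2, 3),                        # -> data
    dims=("lat", "lon"),                                 # -> dims
    coords={"lat": [-45.0, 45.0], "lon": [0.0, 120.0, 240.0]},  # -> coords
    attrs={"units": "degC"},                             # -> attrs
    name="thetao",                                       # -> name
)

print(type(da_toy.data).__name__)  # ndarray
print(da_toy.dims)                 # ('lat', 'lon')
print(da_toy.attrs["units"])       # degC
print(da_toy.name)                 # thetao
```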


<div class="alert alert-block alert-warning">
NumPy and Dask are covered in other notebooks. See <a href="#Going-Further">Going Further</a> section for more information.
</div>

You can extract a DataArray from a Dataset in two ways, with dictionary-style indexing or with attribute access:

In [None]:
da = ds['thetao']
da

In [None]:
da = ds.thetao
da
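The two styles return the same object, but attribute access only works when the variable name is a valid Python identifier and does not clash with an existing Dataset method or attribute. This sketch uses a hypothetical variable deliberately named `mean` to show the clash; dictionary-style indexing always works:

```python
import numpy as np
import xarray as xr

ds_demo = xr.Dataset(
    {
        "thetao": (("time", "lev"), np.zeros((2, 3))),
        # Hypothetical variable whose name collides with the Dataset.mean() method
        "mean": ("time", np.array([1.0, 2.0])),
    }
)

by_key = ds_demo["thetao"]
by_attr = ds_demo.thetao
print(by_key.identical(by_attr))       # True: same data, dims, and attributes

print(type(ds_demo["mean"]).__name__)  # DataArray: indexing finds the variable
print(callable(ds_demo.mean))          # True: attribute access finds the method
```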

<h2 style="color:red">Task 1 - Select a different variable from the dataset ds</h2>

Now try extracting a DataArray for any other variable in the Dataset in the code cell below:

In [None]:
# Your code here

In [None]:
# %load solutions/solution_1_1.py

<div class="alert alert-block alert-warning">
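You can convert a DataArray back to a Dataset using <code>your_xarray_dataarray.to_dataset()</code>. This is useful when you want to combine it with other variables or pass it to code that expects a Dataset. If the DataArray has no name, you must supply one, e.g., <code>to_dataset(name='my_variable')</code>.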
</div>
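Here is a small sketch of `to_dataset()` on toy arrays (the names `thetao` and `so` are illustrative). A named DataArray becomes a data variable under its own name; an unnamed one needs the `name` argument, otherwise xarray raises a `ValueError`:

```python
import numpy as np
import xarray as xr

# A named DataArray converts directly; its name becomes the variable name
da_named = xr.DataArray(np.zeros(3), dims="lev", name="thetao")
print(list(da_named.to_dataset().data_vars))            # ['thetao']

# An unnamed DataArray needs an explicit name
da_unnamed = xr.DataArray(np.zeros(3), dims="lev")
print(list(da_unnamed.to_dataset(name="so").data_vars))  # ['so']

try:
    da_unnamed.to_dataset()
    raised = False
except ValueError:
    raised = True
print(raised)                                            # True
```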

<h2 style="color:red">Task 2 - Convert your DataArray to a DataSet</h2>

Use `to_dataset()` in the code cell below to convert the DataArray you selected in Task 1:

In [None]:
# Your code here

In [None]:
# %load solutions/solution_1_2.py

<h2 style="color:red">Task 3 - Getting coords, dims, and attrs information out of the DataSet/DataArray</h2>

In the code cell below, explore the `coords`, `dims`, and `attrs` properties of your Dataset `ds` and your DataArray `da` (for example, `ds.coords` or `da.attrs`):

In [None]:
# Your code here

In [None]:
# %load solutions/solution_1_3.py

Do you understand the difference between how Xarray treats coordinates and dimensions? Can you figure it out?
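If you want a hint, here is a toy DataArray (all names hypothetical) in which dimensions and coordinates do not line up one-to-one: the dimension `x` has no coordinate, and `lat` is a 2-D coordinate that is not itself a dimension.

```python
import numpy as np
import xarray as xr

grid = xr.DataArray(
    np.zeros((2, 3)),
    dims=("y", "x"),
    coords={
        "y": [10.0, 20.0],  # coordinate along the "y" dimension
        "lat": (("y", "x"), np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]])),
    },
)

print(grid.dims)                # ('y', 'x'): the names of the axes
print(sorted(grid.coords))      # ['lat', 'y']: the label arrays
print("x" in grid.coords)       # False: "x" is a dimension without a coordinate
print(grid.coords["lat"].dims)  # ('y', 'x'): a coordinate spanning two dimensions
```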

-----------

## Going Further


- [More in-depth xarray tutorial](../../bytopic/xarray/01_getting_started_with_xarray.ipynb)
- [More in-depth numpy tutorial](../../bytopic/numpy/01_getting_started_with_numpy.ipynb)
- [Xarray Documentation](http://xarray.pydata.org/en/stable/index.html)
- [NumPy Documentation](https://numpy.org/)
- [Dask Documentation](https://dask.org/)

<div class="alert alert-block alert-success">
   <p>Previous: <a href="00_intro.ipynb">Introduction</a></p>
  <p>Next: <a href="02_subselecting_and_indexing_data.ipynb">Book 2 of 4: Subselecting and Indexing Data</a></p>
</div>