Last Updated: 7-29-2017

# Table of Contents
* [xarray architecture](#xarray-architecture)
    * [What is xarray?](#What-is-xarray?)
    * [When to use xarray:](#When-to-use-xarray:)
    * [Basic xarray data structures:](#Basic-xarray-data-structures:)
        * [`DataArray`](#DataArray)
        * [`Dataset`](#Dataset)
    * [Importing the xarray library](#Importing-the-xarray-library)
    * [Open the dataset](#Open-the-dataset)
    * [`Dataset` Properties](#Dataset-Properties)
    * [Extracting `DataArrays` from a `Dataset`](#Extracting-DataArrays-from-a-Dataset)
    * [Key Points](#Key-Points)

# xarray architecture

## What is xarray?

* originally developed by employees (Stephan Hoyer, Alex Kleeman and Eugene Brevdo) at [The Climate Corporation](https://climate.com/)
* xarray extends some of the core functionality of the Pandas library:
    * operations over _named_ dimensions
    * selection by label instead of integer location
    * powerful _groupby_ functionality
    * database-like joins
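
The first three features in the list above can be sketched on a small synthetic array. Everything here (the values, the coordinate labels, and the `season` coordinate) is made up for illustration and is not taken from the dataset used later in this notebook.

```python
import numpy as np
import xarray as xr

# Toy data: 4 time steps on a 2 x 3 grid (all values are made up).
temp = xr.DataArray(
    np.arange(24.0).reshape(4, 2, 3),
    dims=("time", "lat", "lon"),
    coords={
        "time": [0, 1, 2, 3],
        "lat": [-45.0, 45.0],
        "lon": [0.0, 120.0, 240.0],
        # Hypothetical non-index coordinate that labels each time step.
        "season": ("time", ["DJF", "DJF", "JJA", "JJA"]),
    },
)

# Operation over a named dimension: no need to remember that time is axis 0.
time_mean = temp.mean(dim="time")

# Selection by label instead of integer location.
point = temp.sel(lat=45.0, lon=120.0)

# groupby on a coordinate, then a reduction within each group.
seasonal_mean = temp.groupby("season").mean(dim="time")

print(time_mean.dims)
print(point.values)
print(seasonal_mean.sel(season="DJF").values)
```

With plain NumPy the same steps would need axis numbers and integer indices; here the dimension names and labels carry that bookkeeping.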

## When to use xarray:

* if your data are multidimensional (e.g. climate data: x, y, z, time)
* if your data are structured on a regular grid
* if you can represent your data in netCDF format

## Basic xarray data structures:
* NetCDF forms the basis of the xarray data structure
* two main data structures are the `DataArray` and the `Dataset`

### `DataArray`
* the `DataArray` is xarray's implementation of a labeled, multi-dimensional array
* the `DataArray` has these key properties:
  * `data`: N-dimensional array (NumPy or dask) holding the array's values,
  * `dims`: dimension names for each axis,
  * `coords`: dictionary-like container of arrays that label each point, and
  * `attrs`: ordered dictionary holding metadata

![Imgur](http://i.imgur.com/Jj5JINC.png)
* dimensions(x, y, time); variables(temp, precip); coords(lat, long); attributes
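
As a minimal sketch of the four properties listed above, the following builds a small `DataArray` by hand and inspects each one. The values, coordinate labels, and attribute names are invented for the example.

```python
import numpy as np
import xarray as xr

# A 2 x 3 toy array; all numbers and labels are illustrative only.
da = xr.DataArray(
    data=np.array([[271.5, 272.0, 272.5], [280.0, 281.0, 282.0]]),
    dims=("lat", "lon"),
    coords={"lat": [-10.0, 10.0], "lon": [0.0, 90.0, 180.0]},
    attrs={"units": "K", "long_name": "toy sea surface temperature"},
    name="SST",
)

print(type(da.data))      # the underlying N-dimensional array
print(da.dims)            # dimension name for each axis
print(list(da.coords))    # names of the coordinate arrays
print(da.attrs["units"])  # metadata lookup
```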

### `Dataset`
* xarray's multi-dimensional equivalent of a Pandas `DataFrame`
* dict-like container of DataArray objects with aligned dimensions
* Datasets have these key properties:
  * `dims`: dictionary mapping from dimension names to the fixed length of each dimension,
  * `data_vars`: dict-like container of `DataArrays` corresponding to data variables,
  * `coords`: dictionary-like container of `DataArrays` intended to label points used in `data_vars`, and
  * `attrs`: ordered dictionary holding metadata
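
A `Dataset` can also be assembled by hand, which makes the "aligned dimensions" idea concrete: variables may use different subsets of the dimensions, but a dimension name always has one length. The variable names and values below are hypothetical. The sketch reads the name-to-length mapping through `sizes`, which exposes the same information as the `dims` property described above.

```python
import numpy as np
import xarray as xr

lat = [-10.0, 10.0]
lon = [0.0, 90.0, 180.0]

# Three toy variables sharing the same "lat" and "lon" dimensions.
toy_ds = xr.Dataset(
    data_vars={
        "temp": (("lat", "lon"), np.full((2, 3), 280.0)),
        "precip": (("lat", "lon"), np.zeros((2, 3))),
        "gw": (("lat",), np.array([0.5, 0.5])),  # depends on lat only
    },
    coords={"lat": lat, "lon": lon},
    attrs={"title": "toy dataset"},
)

sizes = dict(toy_ds.sizes)       # dimension name -> length
var_names = list(toy_ds.data_vars)

print(sizes)
print(var_names)
print(toy_ds["gw"].dims)
```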

## Importing the xarray library

In [1]:
import xarray as xr

## Open the dataset

In [2]:
%time ds = xr.open_dataset('/home/abanihi/Documents/climate-data/ERM/t85.an.sfc/e4moda.an.sfc.t85.sst.1957-2002.nc')

CPU times: user 20 ms, sys: 16 ms, total: 36 ms
Wall time: 475 ms


- You will notice that this step ran very quickly. That is because `open_dataset` does not read the array values into memory at this point; it only scans the file's metadata (dimensions, coordinates, and attributes).
- This is called lazy loading: the values are read from disk only when they are actually needed.
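
Lazy loading can be tried out on a throwaway file. This sketch writes a tiny netCDF file to a temporary directory (the file name `toy_sst.nc` and its contents are made up), reopens it, and reads the values explicitly with `.load()`. It assumes a netCDF backend such as netCDF4 or SciPy is installed alongside xarray.

```python
import os
import tempfile

import numpy as np
import xarray as xr

toy = xr.Dataset(
    {"SST": (("lat", "lon"), np.arange(6.0).reshape(2, 3))},
    coords={"lat": [-10.0, 10.0], "lon": [0.0, 90.0, 180.0]},
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "toy_sst.nc")  # hypothetical file name
    toy.to_netcdf(path)

    # Opening the file gives access to the metadata straight away.
    with xr.open_dataset(path) as ds_lazy:
        sizes = dict(ds_lazy.sizes)

        # .load() explicitly reads the values into memory, so they remain
        # usable after the file has been closed.
        sst_loaded = ds_lazy["SST"].load()

print(sizes)
print(float(sst_loaded.sum()))
```

Calling `.load()` (or accessing `.values`) is the point at which the read actually happens; for a file as large as the one used in this notebook, that step would take noticeably longer than opening the file.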

## `Dataset` Properties

In [3]:
ds

<xarray.Dataset>
Dimensions:     (lat: 128, lon: 256, time: 540)
Coordinates:
  * time        (time) datetime64[ns] 1957-09-01 1957-10-01 1957-11-01 ...
  * lat         (lat) float32 -88.9277 -87.5387 -86.1415 -84.7424 -83.3426 ...
  * lon         (lon) float32 0.0 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...
Data variables:
    gw          (lat) float32 0.000449381 0.00104581 0.0016425 0.00223829 ...
    date        (time) int32 19570901 19571001 19571101 19571201 19580101 ...
    datesec     (time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    yyyymmddhh  (time) int32 1957090100 1957100100 1957110100 1957120100 ...
    SST         (time, lat, lon) float64 nan nan nan nan nan nan nan nan nan ...
Attributes:
    title:                     \nERA40 T85 Surface Analysis: created at NCAR
    temporal_span:             \nThe entire ERA40 archive spans 45 years: Sep...
    source_original:           \nEuropean Center for Medium-Range Weather For...
    story:                 

In [4]:
# coordinates
ds.coords

Coordinates:
  * time     (time) datetime64[ns] 1957-09-01 1957-10-01 1957-11-01 ...
  * lat      (lat) float32 -88.9277 -87.5387 -86.1415 -84.7424 -83.3426 ...
  * lon      (lon) float32 0.0 1.406 2.812 4.218 5.624 7.03 8.436 9.842 ...

In [5]:
# attributes
ds.attrs

OrderedDict([('title', '\nERA40 T85 Surface Analysis: created at NCAR'),
             ('temporal_span',
              '\nThe entire ERA40 archive spans 45 years: September 1957 - August 2002.'),
             ('source_original',
              '\nEuropean Center for Medium-Range Weather Forecasts - Reading   \n'),
             ('story',
              "\nThis dataset is a netCDF version of ds126.0 which is archived  \nin GRIB format. The original dataset was implemented and       \ncomputed by NCAR's Data Support Section (DSS), and forms an    \nessential part of efforts undertaken in late 2004, early 2005, \nto produce an archive of selected segments of ERA-40 on a      \nstandard transformation grid. In this case, 47 ERA-40 monthly  \nmean surface and single level analysis (*instantaneous*)       \nvariables were transformed from a reduced N80 Gaussian grid to \na 256x128 regular Gaussian grid. All fields were transformed using\nroutines from the ECMWF EMOS library, including 10 meter w

In [6]:
# data variables
ds.data_vars

Data variables:
    gw          (lat) float32 0.000449381 0.00104581 0.0016425 0.00223829 ...
    date        (time) int32 19570901 19571001 19571101 19571201 19580101 ...
    datesec     (time) int32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
    yyyymmddhh  (time) int32 1957090100 1957100100 1957110100 1957120100 ...
    SST         (time, lat, lon) float64 nan nan nan nan nan nan nan nan nan ...
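
Because `coords`, `attrs`, and `data_vars` are dict-like, they can be iterated over and indexed like ordinary dictionaries. The sketch below uses a small stand-in dataset (not the ERA40 file) so that it runs on its own; the variable names mirror the ones above, but the values are invented.

```python
import numpy as np
import xarray as xr

# Stand-in dataset with made-up values.
toy_ds = xr.Dataset(
    {
        "SST": (("time", "lat"), np.ones((3, 2))),
        "gw": (("lat",), np.array([0.5, 0.5])),
    },
    coords={"time": [0, 1, 2], "lat": [-10.0, 10.0]},
    attrs={"title": "toy dataset"},
)

# Map each data variable to the dimensions it is defined on.
summary = {name: var.dims for name, var in toy_ds.data_vars.items()}

# Membership tests and key lookups work as they do on a dict.
has_lat = "lat" in toy_ds.coords
title = toy_ds.attrs.get("title", "")

print(summary)
print(has_lat, title)
```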

## Extracting ```DataArrays``` from a ```Dataset```

In [8]:
sst = ds['SST']

Now, take a look at the contents of the sea surface temperature (`SST`) variable. Note that the associated coordinates and attributes are carried along with it. Also note that extracting the variable does not, by itself, read any data into memory.

In [9]:
sst

<xarray.DataArray 'SST' (time: 540, lat: 128, lon: 256)>
array([[[        nan,         nan, ...,         nan,         nan],
        [        nan,         nan, ...,         nan,         nan],
        ..., 
        [ 271.460205,  271.460205, ...,  271.460205,  271.460205],
        [ 271.460205,  271.460205, ...,  271.460205,  271.460205]],

       [[        nan,         nan, ...,         nan,         nan],
        [        nan,         nan, ...,         nan,         nan],
        ..., 
        [ 271.460693,  271.460693, ...,  271.460693,  271.460693],
        [ 271.460693,  271.460693, ...,  271.460693,  271.460693]],

       ..., 
       [[        nan,         nan, ...,         nan,         nan],
        [        nan,         nan, ...,         nan,         nan],
        ..., 
        [ 271.460205,  271.460205, ...,  271.460205,  271.460205],
        [ 271.460205,  271.460205, ...,  271.460205,  271.460205]],

       [[        nan,         nan, ...,         nan,         nan],
        [  
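
To show what "carried along" means in practice, here is a small sketch on a stand-in dataset (made-up values, with a hypothetical `units` attribute). It extracts a variable by key and by attribute access, then selects from it by position and by label.

```python
import numpy as np
import xarray as xr

# Stand-in dataset; the third tuple element attaches attributes to SST.
toy_ds = xr.Dataset(
    {"SST": (("time", "lat"), np.arange(6.0).reshape(3, 2), {"units": "K"})},
    coords={"time": [0, 1, 2], "lat": [-10.0, 10.0]},
)

sst_by_key = toy_ds["SST"]  # dictionary-style access
sst_by_attr = toy_ds.SST    # attribute-style access to the same variable

# The coordinates and attributes come along with the extracted DataArray.
print(list(sst_by_key.coords))
print(sst_by_key.attrs)

# Select by integer position (isel) or by coordinate label (sel).
first_step = sst_by_key.isel(time=0)
north = sst_by_key.sel(lat=10.0)
print(first_step.values)
print(north.values)
```

Dictionary-style access works for any variable name, while attribute-style access only works when the name is a valid Python identifier that does not clash with an existing `Dataset` attribute or method.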

## Key Points

- xarray is built on the netCDF data model
- xarray has two main data structures: `DataArray` and `Dataset`
- `DataArrays` store labeled, multi-dimensional arrays
- `Datasets` are the multi-dimensional equivalent of a Pandas `DataFrame`
