In [1]:
import numpy as np
import pandas as pd
import xarray as xr
xr.set_options(display_style="html")


<xarray.core.options.set_options at 0x10359abd0>

### Introduction

xarray is a Python library for working with labeled multi-dimensional arrays, that is, data with named dimensions and coordinates. It is particularly useful for data that vary over space and time.

A DataArray has four main components:

1. The **data** itself. In general, the data are varying/measured/dependent quantities.
2. The **dimensions** are the named dimensional axes. If we think of a traditional Cartesian 2D plane, the horizontal dimension/axis is "x" and the vertical dimension/axis is "y". These dimensions can be anything: county name instead of "x" and time of measurement instead of "y", or animal tag number and weight, and so on.
3. The **coordinates**. This, in my opinion, is where the power of xarray lies. By building in a coordinate system, you can compare different measurements, model outputs, etc over time and space. In general, coordinates are constant/fixed/independent quantities. There are **two types of coordinates**.
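    - Dimensional coordinates: coordinates that share their name with a dimension and are used primarily as a labeled index (like rows a, b, c), very similar to the pandas index. The data can always be indexed by position (row 0, 1, 2; column 0, 1, 2), but dimensional coordinates give the data an additional, label-based index that may be more sensible for your use.
    - Non-dimensional coordinates: coordinates whose name differs from the dimension(s) they lie along. For example, if your 2D data are stored on dimensions "x" and "y", the lat/lons of each point can be attached as non-dimensional coordinates; in 3D they could be lat/lon/time (or depth, or whatever is relevant to you). By default these are NOT used as an index.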
    
    
4. **Attributes** are stored metadata, e.g. units, attributions, etc
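
The four components listed above can be seen side by side in a tiny array. This is only a minimal sketch: the station names, elevations, and units are made up for illustration and are not part of this tutorial's data.

```python
import numpy as np
import xarray as xr

# Hypothetical example: temperatures at 2 stations over 3 days.
temps = xr.DataArray(
    np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),  # 1. the data
    dims=("station", "day"),                        # 2. the dimensions
    coords={
        "station": ["A", "B"],                      # 3a. dimensional coordinate
        "elevation": ("station", [120.0, 950.0]),   # 3b. non-dimensional coordinate
    },
    attrs={"units": "degC"},                        # 4. the attributes
)

print(temps.dims)           # names of the dimensions
print(list(temps.coords))   # names of all coordinates
print(list(temps.indexes))  # only the dimensional coordinate has an index
print(temps.attrs)          # stored metadata
```

Note that "day" is a dimension without any coordinate: dimensions do not need labels, they only need a name.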



### Let's initialize your first 2D DataArray.

The data are a random 2x3 array, `np.random.randn(2,3)`, with two dimensions labelled **x** and **y**. The x dimension is given a dimensional coordinate, `{'x':[10,20]}`.

In [34]:
FirstArray = xr.DataArray(np.random.randn(2,3),
                          dims=('x','y'),
                          coords={'x':[10,20]})

Let's take a look at our data

*Note - When you load your packages, include `xr.set_options(display_style="html")` to make it easy to view your data.*


In [35]:
FirstArray

You can click on the stacked pancake symbol just below xarray.DataArray to see the data in full, and to the right of **x** under coordinates to see the coordinates in full.
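
To see what the dimensional coordinate on **x** buys us, here is a small sketch comparing label-based selection (`.sel`) with position-based selection (`.isel`). It uses a fixed array named `arr` (a stand-in for the random array above) so the results are reproducible.

```python
import numpy as np
import xarray as xr

# Same layout as the first array, but with known values 0..5.
arr = xr.DataArray(np.arange(6).reshape(2, 3),
                   dims=("x", "y"),
                   coords={"x": [10, 20]})

by_label = arr.sel(x=20)        # the row whose x coordinate equals 20
by_position = arr.isel(x=1)     # the second row, counted from 0
first_column = arr.isel(y=0)    # y has no coordinate labels, so we index it by position

print(by_label.values)
print(first_column.values)
```

Both `by_label` and `by_position` pick out the same row; the label-based form just reads more naturally once the coordinate carries real meaning (a year, a station name, and so on).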

### Let's make our example more tangible

Here we create an 11-year record (1990 through 2000) of accumulation rates, in meters water equivalent per year (m.w.e./yr), on three imaginary glaciers named "BLUE", "WHITE", and "DUSTY".

In [37]:
acc = np.random.rand(11, 3)*4+1 #random accumulation (snowfall) rates between 1 and 5, shape (years, glaciers)
gls = ['BLUE','WHITE','DUSTY'] #names of the glaciers
yrs = np.linspace(1990,2000,11) #years of the record, 1990 to 2000 inclusive (np.linspace returns floats)

Now, we can create our DataArray.

We can set the coordinates and the dimensions at the same time by passing `coords` as a list of `(dimension name, values)` tuples, one per axis of the data.

In [39]:
record = xr.DataArray(acc,
                      coords=[('year', yrs),('glacier',gls)])

If we provide the coordinates as a dictionary instead, we can also add non-dimensional coordinates (here `lat`, which lies along the `glacier` dimension). If you do this, you have to define the dimensions explicitly with `dims`.

In [40]:
record = xr.DataArray(acc,
                      dims=['year','glacier'],
                      coords={'year': yrs,'glacier': gls,'lat': ('glacier', np.linspace(50,80,3))})
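
As a quick check of what the dictionary form produces, the sketch below rebuilds the same kind of record under a separate name (`rec`) and inspects it. Two things here are assumptions rather than part of the cells above: the data are seeded so the example is reproducible, and the years are built with `np.arange` so they are integers instead of the floats that `np.linspace` returns.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)           # seeded for reproducibility (assumption)
acc = rng.random((11, 3)) * 4 + 1        # accumulation rates between 1 and 5
gls = ["BLUE", "WHITE", "DUSTY"]
yrs = np.arange(1990, 2001)              # integer years 1990..2000 (assumption)

rec = xr.DataArray(acc,
                   dims=["year", "glacier"],
                   coords={"year": yrs,
                           "glacier": gls,
                           "lat": ("glacier", np.linspace(50, 80, 3))})

print(rec.dims)            # ('year', 'glacier')
print(list(rec.indexes))   # only 'year' and 'glacier' are indexed, 'lat' is not

# A non-dimensional coordinate can still be used to filter the data.
high_lat = rec.where(rec.lat > 60, drop=True)
print(high_lat.glacier.values)
```

Because `lat` is attached along `glacier`, it travels with the glaciers through operations like the filter above, even though it is not itself an index.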

We can add an attribute so that we know what units the accumulation rates are in


In [42]:
record.attrs['units'] = 'm.w.e./yr'

And name our DataArray

In [43]:
record.name = 'Accumulation rate at three glaciers'

Ok, let's take a look

In [44]:
record

### Great! Now let's practice accessing our data

In [45]:
record.name #name

'Accumulation rate at three glaciers'

In [46]:
record.coords #all coords

Coordinates:
  * year     (year) float64 1.99e+03 1.991e+03 1.992e+03 ... 1.999e+03 2e+03
  * glacier  (glacier) <U5 'BLUE' 'WHITE' 'DUSTY'
    lat      (glacier) float64 50.0 65.0 80.0

In [49]:
record.coords['year'] #one coord
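
A coordinate can be reached in several equivalent ways, and once it is an index it can be used to select data by label. The sketch below is self-contained and uses a stand-in array `rec` with seeded data and integer years (both assumptions made for reproducibility), not the `record` object itself.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
acc = rng.random((11, 3)) * 4 + 1
rec = xr.DataArray(acc,
                   dims=["year", "glacier"],
                   coords={"year": np.arange(1990, 2001),
                           "glacier": ["BLUE", "WHITE", "DUSTY"]})

# Three equivalent ways to get the 'year' coordinate.
a = rec.coords["year"]
b = rec["year"]
c = rec.year

# Select by label along one or both dimensions.
blue = rec.sel(glacier="BLUE")                        # all years for one glacier
one_value = rec.sel(year=1995, glacier="DUSTY").item()  # a single number

print(blue.shape)
print(one_value)
```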

In [50]:
record.data #data

array([[3.97059551, 3.23378393, 4.43189687],
       [3.13889853, 1.12846474, 3.8069183 ],
       [3.12481753, 1.2207355 , 3.37753112],
       [3.77999034, 4.50610982, 4.20493141],
       [4.10294848, 4.62495127, 3.07740751],
       [1.17846858, 4.30577895, 4.82061719],
       [1.08429015, 3.36794374, 2.25544492],
       [1.06317933, 1.31807451, 2.60316921],
       [4.11014339, 1.58063859, 3.14976428],
       [3.93503771, 4.19714628, 3.42113066],
       [1.56184794, 4.94853053, 3.73499334]])
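
The raw array returned by `.data` has lost its labels, so it is usually more convenient to compute on the DataArray itself and name the dimension to reduce over. A minimal sketch, again using a seeded stand-in array `rec` (an assumption, so the numbers will differ from the output above):

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
acc = rng.random((11, 3)) * 4 + 1
rec = xr.DataArray(acc,
                   dims=["year", "glacier"],
                   coords={"year": np.arange(1990, 2001),
                           "glacier": ["BLUE", "WHITE", "DUSTY"]})

raw = rec.data                      # plain NumPy array, no labels
mean_acc = rec.mean(dim="year")     # mean accumulation per glacier, labels kept

print(type(raw))
print(mean_acc.dims)
print(mean_acc.sel(glacier="WHITE").item())
```

Reducing over `dim="year"` is the labeled equivalent of `acc.mean(axis=0)`, without having to remember which axis is which.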

In [53]:
# plus you can add coordinates

record['fakeConstant'] = 42 #constant
record['fakeCoord'] = ('glacier', ['alaska','alps','greenland'])

record

In [56]:
# and delete coordinates

del record['fakeConstant']
del record['fakeCoord']

record
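
The assignments and `del` statements above modify the array in place. If you would rather leave the original untouched, `assign_coords` and `drop_vars` return new objects instead. The sketch below shows this on a small stand-in array `rec`; the coordinate name `region` is made up for illustration.

```python
import numpy as np
import xarray as xr

rec = xr.DataArray(np.ones((2, 3)),
                   dims=["year", "glacier"],
                   coords={"year": [1990, 1991],
                           "glacier": ["BLUE", "WHITE", "DUSTY"]})

# Returns a new DataArray with an extra non-dimensional coordinate.
with_region = rec.assign_coords(region=("glacier", ["alaska", "alps", "greenland"]))

# Returns a new DataArray with that coordinate removed again.
without_region = with_region.drop_vars("region")

print(list(rec.coords))             # original is unchanged
print(list(with_region.coords))
print(list(without_region.coords))
```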

### Datasets allow us to combine multiple sets of data/datatypes

![example dataset](https://xarray.pydata.org/en/stable/_images/dataset-diagram.png)

*from the xarray docs*

The above image conceptualizes the utility of xarray. Temperature and precipitation are each their own DataArray, with coordinates of latitude, longitude, and reference time. They are combined into a Dataset so that we can access the temperature and precipitation history at one location, or their distribution at one point in time.
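
The idea in the diagram can be sketched in a few lines. Everything in this example is hypothetical: the site names, years, and random values are placeholders, and a single `site` dimension stands in for latitude and longitude to keep it short.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
coords = {"time": np.arange(2000, 2005), "site": ["a", "b", "c"]}

# Two variables that share the same dimensions and coordinates.
temperature = xr.DataArray(rng.normal(10, 2, (5, 3)),
                           dims=("time", "site"), coords=coords)
precipitation = xr.DataArray(rng.random((5, 3)) * 100,
                             dims=("time", "site"), coords=coords)

# Combine them into one Dataset.
ds = xr.Dataset({"temperature": temperature, "precipitation": precipitation})

one_site = ds.sel(site="b")     # both histories at one location
one_time = ds.sel(time=2002)    # both distributions at one point in time

print(list(ds.data_vars))
print(one_site["temperature"].shape, one_time["precipitation"].shape)
```

A single `.sel` call on the Dataset slices every variable at once, which is the main convenience over keeping the two DataArrays separate.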

### Let's get started by making two datasets: one for temperature and one for precipitation