# Quick overview
Here are some quick examples of what you can do with xarray.DataArray objects. Everything is explained in much more detail in the rest of the documentation.
To begin, import numpy, pandas and xarray using their customary abbreviations:

In [123]:
import numpy as np

In [124]:
import pandas as pd

In [125]:
import xarray as xr

## Create a DataArray
You can make a DataArray from scratch by supplying data in the form of a numpy array or list, with optional dimensions and coordinates:

In [126]:
xr.DataArray(np.random.randn(2, 3))

<xarray.DataArray (dim_0: 2, dim_1: 3)>
array([[ 0.25664579,  0.73390571, -0.31375901],
       [-2.79265918, -0.61088269, -0.05074002]])
Coordinates:
  * dim_0    (dim_0) int64 0 1
  * dim_1    (dim_1) int64 0 1 2

In [127]:
data = xr.DataArray(np.random.randn(2, 3), [('x', ['a', 'b']), ('y', [-2, 0, 2])])

In [128]:
data

<xarray.DataArray (x: 2, y: 3)>
array([[ 1.19628083,  1.43208965, -0.13875906],
       [-2.39180514, -0.43126218,  0.27461308]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int32 -2 0 2
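The same array can also be built with explicit keyword arguments, which some readers find easier to scan. The following is a minimal sketch that uses fixed values instead of random ones so the two forms can be compared; the variable names are illustrative only.

```python
import numpy as np
import xarray as xr

values = np.arange(6.0).reshape(2, 3)

# Positional form used above: a list of (dimension name, labels) pairs.
positional = xr.DataArray(values, [('x', ['a', 'b']), ('y', [-2, 0, 2])])

# Keyword form: dimension names and coordinate labels passed separately.
keyword = xr.DataArray(
    values,
    dims=('x', 'y'),
    coords={'x': ['a', 'b'], 'y': [-2, 0, 2]},
)

# identical() compares values, dimensions, coordinates, name and attributes.
print(positional.identical(keyword))
```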

If you supply a pandas Series or DataFrame, metadata is copied directly:

In [129]:
xr.DataArray(pd.Series(range(3), index=list('abc'), name='foo'))

<xarray.DataArray 'foo' (dim_0: 3)>
array([0, 1, 2])
Coordinates:
  * dim_0    (dim_0) object 'a' 'b' 'c'

Here are the key properties of a DataArray:

In [130]:
data.values

array([[ 1.19628083,  1.43208965, -0.13875906],
       [-2.39180514, -0.43126218,  0.27461308]])

In [131]:
data.dims

('x', 'y')

In [132]:
data.coords

Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int32 -2 0 2

In [133]:
len(data.coords)

2

In [134]:
data.coords['x']

<xarray.DataArray 'x' (x: 2)>
array(['a', 'b'], 
      dtype='<U1')
Coordinates:
  * x        (x) <U1 'a' 'b'

In [135]:
data.attrs

OrderedDict()
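`attrs` starts out empty, but it behaves like an ordinary dictionary and can hold arbitrary metadata. A small sketch, in which the array name and the attribute keys and values are made up for illustration:

```python
import numpy as np
import xarray as xr

# Hypothetical array; the name and metadata below are illustrative only.
arr = xr.DataArray(np.zeros((2, 3)), dims=('x', 'y'), name='temperature')

# attrs is a plain mapping for user-defined metadata.
arr.attrs['units'] = 'kelvin'
arr.attrs['long_name'] = 'air temperature'

print(arr.name)
print(dict(arr.attrs))
```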

## Indexing
xarray supports four kinds of indexing: positional or label-based, each either by axis order or by dimension name. These operations are just as fast as in pandas, because we borrow pandas’ indexing machinery.

In [136]:
data[[0, 1]]

<xarray.DataArray (x: 2, y: 3)>
array([[ 1.19628083,  1.43208965, -0.13875906],
       [-2.39180514, -0.43126218,  0.27461308]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int32 -2 0 2

In [137]:
data.loc['a':'b']

<xarray.DataArray (x: 2, y: 3)>
array([[ 1.19628083,  1.43208965, -0.13875906],
       [-2.39180514, -0.43126218,  0.27461308]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int32 -2 0 2

In [138]:
data.loc

<xarray.core.dataarray._LocIndexer at 0x1e5c1765b70>

In [139]:
data.isel(x=slice(2))

<xarray.DataArray (x: 2, y: 3)>
array([[ 1.19628083,  1.43208965, -0.13875906],
       [-2.39180514, -0.43126218,  0.27461308]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int32 -2 0 2

In [140]:
data.sel(x=['a', 'b'])

<xarray.DataArray (x: 2, y: 3)>
array([[ 1.19628083,  1.43208965, -0.13875906],
       [-2.39180514, -0.43126218,  0.27461308]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int32 -2 0 2
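The examples above all return the whole array, which makes the four styles hard to tell apart. The sketch below selects a single element with each style, using a small array with fixed values (an assumption made for readability, not part of the original session):

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=('x', 'y'),
    coords={'x': ['a', 'b'], 'y': [-2, 0, 2]},
)

# 1. By axis order and integer position, like numpy.
by_position = arr[1, 2]
# 2. By axis order and label, like pandas .loc.
by_label = arr.loc['b', 2]
# 3. By dimension name and integer position.
by_name_and_position = arr.isel(x=1, y=2)
# 4. By dimension name and label.
by_name_and_label = arr.sel(x='b', y=2)

# All four expressions pick the same element.
picked = [int(v.values) for v in (by_position, by_label,
                                  by_name_and_position, by_name_and_label)]
print(picked)
```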

## Computation
Data arrays work very similarly to numpy ndarrays:

In [141]:
data

<xarray.DataArray (x: 2, y: 3)>
array([[ 1.19628083,  1.43208965, -0.13875906],
       [-2.39180514, -0.43126218,  0.27461308]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int32 -2 0 2

In [142]:
data + 10

<xarray.DataArray (x: 2, y: 3)>
array([[ 11.19628083,  11.43208965,   9.86124094],
       [  7.60819486,   9.56873782,  10.27461308]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int32 -2 0 2

In [143]:
np.sin(data)

<xarray.DataArray (x: 2, y: 3)>
array([[ 0.93068497,  0.99039564, -0.13831421],
       [-0.68148327, -0.41801775,  0.27117453]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int32 -2 0 2

In [144]:
data.T

<xarray.DataArray (y: 3, x: 2)>
array([[ 1.19628083, -2.39180514],
       [ 1.43208965, -0.43126218],
       [-0.13875906,  0.27461308]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int32 -2 0 2

In [145]:
data.sum()

<xarray.DataArray ()>
array(-0.05884281404722852)

However, aggregation operations can use dimension names instead of axis numbers:

In [146]:
data.mean(dim='x')

<xarray.DataArray (y: 3)>
array([-0.59776215,  0.50041374,  0.06792701])
Coordinates:
  * y        (y) int32 -2 0 2
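To make the difference from numpy concrete, the following sketch compares a reduction by dimension name with the equivalent reduction by axis number, using fixed illustrative values:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=('x', 'y'))

by_name = arr.mean(dim='x')        # reduce over the dimension called 'x'
by_axis = arr.values.mean(axis=0)  # numpy has to be told that 'x' is axis 0

same_result = np.allclose(by_name.values, by_axis)

# The named form gives the same answer after the axes are reordered.
same_after_transpose = np.allclose(arr.T.mean(dim='x').values, by_axis)

print(same_result, same_after_transpose)
```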

Arithmetic operations broadcast based on dimension name. This means you don’t need to insert dummy dimensions for alignment:

In [147]:
a = xr.DataArray(np.random.randn(3), [data.coords['y']])

In [148]:
b = xr.DataArray(np.random.randn(4), dims='z')

In [149]:
a

<xarray.DataArray (y: 3)>
array([ 0.10719215,  0.6387999 , -0.1267494 ])
Coordinates:
  * y        (y) int32 -2 0 2

In [150]:
b

<xarray.DataArray (z: 4)>
array([ 1.80160956,  0.16734233,  0.54868579, -0.40664812])
Coordinates:
  * z        (z) int64 0 1 2 3

In [151]:
a + b

<xarray.DataArray (y: 3, z: 4)>
array([[ 1.90880171,  0.27453448,  0.65587794, -0.29945597],
       [ 2.44040946,  0.80614222,  1.18748569,  0.23215177],
       [ 1.67486016,  0.04059293,  0.42193639, -0.53339752]])
Coordinates:
  * y        (y) int32 -2 0 2
  * z        (z) int64 0 1 2 3
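For comparison, here is a sketch of what the same operation requires in plain numpy, where the dummy dimensions have to be inserted by hand. The input values are fixed and illustrative:

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.array([1.0, 2.0, 3.0]), dims='y')
b = xr.DataArray(np.array([10.0, 20.0, 30.0, 40.0]), dims='z')

# xarray: 'y' and 'z' are different dimensions, so the result spans both.
named = a + b

# numpy: the same result needs explicit dummy axes on each operand.
manual = a.values[:, np.newaxis] + b.values[np.newaxis, :]

print(named.dims)
print(np.array_equal(named.values, manual))
```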

-----------------
Another broadcast example: `v2` has no `t` dimension, so it is added to each of the three `t` slices of `v1`:

In [152]:
v1 = xr.DataArray(np.random.rand(3, 2, 4), dims=['t', 'y', 'x'])

In [153]:
v2 = xr.DataArray(np.random.rand(2, 4), dims=['y', 'x'])

In [154]:
v1

<xarray.DataArray (t: 3, y: 2, x: 4)>
array([[[ 0.75505725,  0.70412875,  0.71951414,  0.41214218],
        [ 0.92641421,  0.91357002,  0.49075572,  0.21882196]],

       [[ 0.57530964,  0.46241613,  0.00148094,  0.44202443],
        [ 0.38101931,  0.90512515,  0.54204429,  0.83758081]],

       [[ 0.42336956,  0.48360626,  0.20541671,  0.97148341],
        [ 0.25445268,  0.33520223,  0.08730216,  0.96957316]]])
Coordinates:
  * t        (t) int64 0 1 2
  * y        (y) int64 0 1
  * x        (x) int64 0 1 2 3

In [155]:
v2

<xarray.DataArray (y: 2, x: 4)>
array([[ 0.05473322,  0.00177718,  0.07513416,  0.18226461],
       [ 0.90563027,  0.64230208,  0.90437014,  0.9735466 ]])
Coordinates:
  * y        (y) int64 0 1
  * x        (x) int64 0 1 2 3

In [156]:
v1 + v2

<xarray.DataArray (t: 3, y: 2, x: 4)>
array([[[ 0.80979048,  0.70590593,  0.79464831,  0.59440678],
        [ 1.83204449,  1.5558721 ,  1.39512586,  1.19236856]],

       [[ 0.63004286,  0.46419332,  0.0766151 ,  0.62428904],
        [ 1.28664958,  1.54742723,  1.44641442,  1.8111274 ]],

       [[ 0.47810278,  0.48538344,  0.28055088,  1.15374802],
        [ 1.16008296,  0.97750431,  0.9916723 ,  1.94311975]]])
Coordinates:
  * t        (t) int64 0 1 2
  * y        (y) int64 0 1
  * x        (x) int64 0 1 2 3

It also means that in most cases you do not need to worry about the order of dimensions:

In [157]:
data - data.T

<xarray.DataArray (x: 2, y: 3)>
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int32 -2 0 2

Operations also align based on index labels:

In [173]:
data[:-1]

<xarray.DataArray (x: 1, y: 3)>
array([[ 1.19628083,  1.43208965, -0.13875906]])
Coordinates:
  * x        (x) <U1 'a'
  * y        (y) int32 -2 0 2

In [177]:
data[:1]

<xarray.DataArray (x: 1, y: 3)>
array([[ 1.19628083,  1.43208965, -0.13875906]])
Coordinates:
  * x        (x) <U1 'a'
  * y        (y) int32 -2 0 2

In [176]:
data[:-1] - data[:1]

<xarray.DataArray (x: 1, y: 3)>
array([[ 0.,  0.,  0.]])
Coordinates:
  * x        (x) <U1 'a'
  * y        (y) int32 -2 0 2
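Alignment is easier to see when the two operands only partly overlap. The sketch below assumes xarray's default arithmetic behaviour, which keeps only the labels present in both operands; the arrays and their names are illustrative:

```python
import xarray as xr

left = xr.DataArray([1, 2, 3], dims='x', coords={'x': [0, 1, 2]})
right = xr.DataArray([10, 20, 30], dims='x', coords={'x': [1, 2, 3]})

# Values are matched by label, not by position; only labels 1 and 2
# appear in both operands, so only those survive in the result.
total = left + right

print(total.x.values)
print(total.values)
```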

## GroupBy

xarray supports grouped operations using a very similar API to pandas:

In [159]:
labels = xr.DataArray(['E', 'F', 'E'], [data.coords['y']], name='labels')

In [160]:
labels

<xarray.DataArray 'labels' (y: 3)>
array(['E', 'F', 'E'], 
      dtype='<U1')
Coordinates:
  * y        (y) int32 -2 0 2

In [185]:
data

<xarray.DataArray (x: 2, y: 3)>
array([[ 1.19628083,  1.43208965, -0.13875906],
       [-2.39180514, -0.43126218,  0.27461308]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int32 -2 0 2

In [184]:
data.groupby(labels).mean('y')

<xarray.DataArray (x: 2, labels: 2)>
array([[ 0.52876089,  1.43208965],
       [-1.05859603, -0.43126218]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * labels   (labels) object 'E' 'F'

In [186]:
data.groupby(labels).apply(lambda x: x - x.min())

<xarray.DataArray (x: 2, y: 3)>
array([[ 3.58808597,  1.86335183,  2.25304607],
       [ 0.        ,  0.        ,  2.66641822]])
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int32 -2 0 2
    labels   (y) <U1 'E' 'F' 'E'
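With random data the grouped results are hard to check by eye. The following sketch repeats the grouped mean on a small array with fixed, illustrative values so that each group average can be verified by hand:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.array([[1.0, 10.0, 3.0], [4.0, 20.0, 6.0]]),
    dims=('x', 'y'),
    coords={'x': ['a', 'b'], 'y': [-2, 0, 2]},
)
labels = xr.DataArray(['E', 'F', 'E'], dims='y',
                      coords={'y': [-2, 0, 2]}, name='labels')

# Columns y=-2 and y=2 belong to group 'E'; column y=0 belongs to 'F'.
means = arr.groupby(labels).mean('y')

group_e = means.sel(labels='E').values  # (1+3)/2 and (4+6)/2
group_f = means.sel(labels='F').values  # single column, unchanged
print(group_e, group_f)
```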

## Convert to pandas

A key feature of xarray is robust conversion to and from pandas objects:

In [187]:
data.to_series()

x  y 
a  -2    1.196281
    0    1.432090
    2   -0.138759
b  -2   -2.391805
    0   -0.431262
    2    0.274613
dtype: float64
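The conversion also works in the other direction. As a sketch, a series with a MultiIndex can be turned back into a DataArray with `DataArray.from_series`; the array here uses fixed illustrative values and a made-up name:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=('x', 'y'),
    coords={'x': ['a', 'b'], 'y': [-2, 0, 2]},
    name='foo',
)

series = arr.to_series()                     # MultiIndex with levels x, y
restored = xr.DataArray.from_series(series)  # index levels become dimensions

print(restored.dims)
print(np.array_equal(restored.values, arr.values))
```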

In [188]:
data.to_pandas()

y        -2         0         2
x
a  1.196281  1.432090 -0.138759
b -2.391805 -0.431262  0.274613


## Datasets and NetCDF

`xarray.Dataset` is a dict-like container of `DataArray` objects that share index labels and dimensions. It looks a lot like a netCDF file:

In [189]:
ds = data.to_dataset(name='foo')

In [190]:
ds

<xarray.Dataset>
Dimensions:  (x: 2, y: 3)
Coordinates:
  * x        (x) <U1 'a' 'b'
  * y        (y) int32 -2 0 2
Data variables:
    foo      (x, y) float64 1.196 1.432 -0.1388 -2.392 -0.4313 0.2746

Almost everything you can do with `DataArray` objects also works with `Dataset` objects, which is convenient if you prefer to work with multiple variables at once.
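As a brief sketch of working with several variables at once, the example below builds a two-variable dataset and applies a selection and a reduction to both. The second variable, `bar`, is a hypothetical addition for illustration:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=('x', 'y'),
    coords={'x': ['a', 'b'], 'y': [-2, 0, 2]},
)
# 'bar' is a made-up second variable sharing the same coordinates.
multi = xr.Dataset({'foo': arr, 'bar': arr * 2})

# Selection and reduction apply to every data variable at once.
row = multi.sel(x='a')
means = multi.mean(dim='x')

print(list(multi.data_vars))
print(row['bar'].values)
print(means['foo'].values)
```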

Datasets also let you easily read and write netCDF files:

In [191]:
ds.to_netcdf('example.nc')

In [192]:
xr.open_dataset('example.nc')

<xarray.Dataset>
Dimensions:  (x: 2, y: 3)
Coordinates:
  * x        (x) object 'a' 'b'
  * y        (y) int32 -2 0 2
Data variables:
    foo      (x, y) float64 1.196 1.432 -0.1388 -2.392 -0.4313 0.2746