In [1]:
import xarray as xr
import numpy as np

In [2]:
data = xr.DataArray(np.random.rand(5))

In [3]:
data

<xarray.DataArray (dim_0: 5)>
array([ 0.58109333,  0.38717359,  0.68459167,  0.95377795,  0.53054858])
Coordinates:
  * dim_0    (dim_0) int64 0 1 2 3 4

## Dimension labels

In [4]:
data1 = xr.DataArray(np.random.rand(4, 2), [('sample',['a', 'b', 'c', 'd']), ('size',['width', 'height'])])

In [5]:
data1.dtype

dtype('float64')

In [6]:
data1

<xarray.DataArray (sample: 4, size: 2)>
array([[ 0.00836294,  0.52655775],
       [ 0.54623053,  0.77450702],
       [ 0.70822589,  0.59344444],
       [ 0.21614525,  0.1698368 ]])
Coordinates:
  * sample   (sample) <U1 'a' 'b' 'c' 'd'
  * size     (size) <U6 'width' 'height'

In [7]:
data1.sum('sample')

<xarray.DataArray (size: 2)>
array([ 1.47896462,  2.06434601])
Coordinates:
  * size     (size) <U6 'width' 'height'

## Exercise

These two arrays contain the average monthly temperatures (in degrees Celsius) in Erlangen and Paris:

```
erlangen = [-0.5, 0.7, 4.4, 8.5, 13.3, 16.7, 18.2, 17.5, 13.7, 8.9, 4.0, 0.9]
paris = [3.3, 4.2, 7.8, 10.8, 14.3, 17.5, 19.4, 19.1, 16.4, 11.6, 7.2, 4.2]
```

Design a `DataArray` for storing these data. Calculate the average annual temperature for each location.
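
One possible layout, offered as a sketch rather than the intended solution, puts the locations on one dimension and the months on another. The dimension names `city` and `month` and the month numbering 1 to 12 are assumptions made for this example.

```python
import numpy as np
import xarray as xr

erlangen = [-0.5, 0.7, 4.4, 8.5, 13.3, 16.7, 18.2, 17.5, 13.7, 8.9, 4.0, 0.9]
paris = [3.3, 4.2, 7.8, 10.8, 14.3, 17.5, 19.4, 19.1, 16.4, 11.6, 7.2, 4.2]

# Dimension names 'city' and 'month' are assumptions chosen for this sketch.
temperature = xr.DataArray(
    np.array([erlangen, paris]),
    coords=[('city', ['Erlangen', 'Paris']), ('month', np.arange(1, 13))],
)

# Averaging over 'month' leaves one value per city.
annual_mean = temperature.mean('month')
```

Because the reduction is addressed by dimension name, the call stays the same if the array is later stored with the axes in the opposite order.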

## Indexing

In [8]:
data1[2]

<xarray.DataArray (size: 2)>
array([ 0.70822589,  0.59344444])
Coordinates:
    sample   <U1 'c'
  * size     (size) <U6 'width' 'height'

In [9]:
data1.loc['a']

<xarray.DataArray (size: 2)>
array([ 0.00836294,  0.52655775])
Coordinates:
    sample   <U1 'a'
  * size     (size) <U6 'width' 'height'

In [10]:
data1.sel(size='width')

<xarray.DataArray (sample: 4)>
array([ 0.00836294,  0.54623053,  0.70822589,  0.21614525])
Coordinates:
  * sample   (sample) <U1 'a' 'b' 'c' 'd'
    size     <U6 'width'
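
The three cells above mix positional and label-based access. As a minimal sketch, the same selections can be written with `isel` (by position) and `sel` (by label), which name the dimension explicitly. The array is rebuilt here with fresh random values so that the block runs on its own.

```python
import numpy as np
import xarray as xr

data1 = xr.DataArray(np.random.rand(4, 2),
                     [('sample', ['a', 'b', 'c', 'd']), ('size', ['width', 'height'])])

# Position-based: same result as data1[2], but the dimension is named.
by_position = data1.isel(sample=2)

# Label-based: same result as data1.loc['a'].
by_label = data1.sel(sample='a')

# Labels on several dimensions can be combined in one call.
single_value = data1.sel(sample='c', size='height')
```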

## Alignment

In [11]:
day2 = xr.DataArray(np.random.rand(4,2), [('sample',['b', 'c', 'd', 'e']), ('size', ['width', 'height'])])

In [12]:
data1 + day2

<xarray.DataArray (sample: 3, size: 2)>
array([[ 0.86884505,  0.8110718 ],
       [ 0.75982048,  0.60637582],
       [ 0.93107656,  0.97859498]])
Coordinates:
  * sample   (sample) object 'b' 'c' 'd'
  * size     (size) <U6 'width' 'height'
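
The sum above keeps only the samples present in both arrays (`'b'`, `'c'`, `'d'`), because arithmetic aligns on the intersection of the coordinate labels. The sketch below makes that step explicit with `xr.align` and shows the `join='outer'` alternative. The arrays are rebuilt with fresh random values, so the numbers will differ from the output above.

```python
import numpy as np
import xarray as xr

sizes = ('size', ['width', 'height'])
data1 = xr.DataArray(np.random.rand(4, 2), [('sample', ['a', 'b', 'c', 'd']), sizes])
day2 = xr.DataArray(np.random.rand(4, 2), [('sample', ['b', 'c', 'd', 'e']), sizes])

# join='inner' keeps only the shared labels, matching what `data1 + day2` does.
inner1, inner2 = xr.align(data1, day2, join='inner')

# join='outer' keeps every label and fills the gaps with NaN.
outer1, outer2 = xr.align(data1, day2, join='outer')
total = outer1 + outer2  # NaN wherever either array had no measurement
```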

## Broadcasting

In [13]:
units = xr.DataArray([0.001, 0.01, 1], [('unit', ['mm', 'cm', 'm'])])

In [14]:
data1 * units

<xarray.DataArray (sample: 4, size: 2, unit: 3)>
array([[[  8.36294376e-06,   8.36294376e-05,   8.36294376e-03],
        [  5.26557755e-04,   5.26557755e-03,   5.26557755e-01]],

       [[  5.46230529e-04,   5.46230529e-03,   5.46230529e-01],
        [  7.74507017e-04,   7.74507017e-03,   7.74507017e-01]],

       [[  7.08225894e-04,   7.08225894e-03,   7.08225894e-01],
        [  5.93444441e-04,   5.93444441e-03,   5.93444441e-01]],

       [[  2.16145252e-04,   2.16145252e-03,   2.16145252e-01],
        [  1.69836799e-04,   1.69836799e-03,   1.69836799e-01]]])
Coordinates:
  * sample   (sample) <U1 'a' 'b' 'c' 'd'
  * size     (size) <U6 'width' 'height'
  * unit     (unit) <U2 'mm' 'cm' 'm'

## Exercise

A researcher measured the Raman spectrum of an unknown sample and now wants to determine the substance and its concentration. Calibration data are available with Raman spectra of four different compounds at three different concentrations. Calculate the mean squared error between the sample spectrum and the calibration data for all compounds and concentrations.

```python
import pickle

# Load the calibration spectra from the pickle file.
with open('raman_data.pickle', 'rb') as fid:
    calibration = pickle.load(fid)

sample = xr.DataArray([[0, 10]], [('sample', ['X1042']),
                                  ('wavelength', [100, 300])])
```

**Hint**: To find the calibration sample with minimum error, you may convert the DataArray to pandas:

```python
err.to_dataframe(name='error')['error'].idxmin()
```
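
The layout of `raman_data.pickle` is not shown here, so the following is only a sketch of the approach under stated assumptions: the calibration data are taken to be a `DataArray` with dimensions `compound`, `concentration` and `wavelength`, and a small invented stand-in (two compounds instead of four) replaces the real file.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the pickled calibration data: the dimension names
# and all values below are assumptions made so that the sketch runs on its own.
calibration = xr.DataArray(
    np.array([[[0.0, 5.0], [0.0, 10.0], [0.0, 20.0]],
              [[1.0, 1.0], [2.0, 2.0], [4.0, 4.0]]]),
    [('compound', ['A', 'B']),
     ('concentration', [0.5, 1.0, 2.0]),
     ('wavelength', [100, 300])],
)

sample = xr.DataArray([[0, 10]], [('sample', ['X1042']),
                                  ('wavelength', [100, 300])])

# Broadcasting pairs the sample with every (compound, concentration);
# averaging the squared difference over 'wavelength' gives the error.
err = ((calibration - sample) ** 2).mean('wavelength')

# idxmin returns the index labels of the row with the smallest error.
best = err.to_dataframe(name='error')['error'].idxmin()
```

With these invented values the smallest error belongs to compound `A` at concentration `1.0`, whose spectrum matches the sample exactly.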

## Comparison

|     | pandas.DataFrame | xarray.DataArray | Structured NumPy array|
|-----|------------------|------------------|--------------|
|max. dimensions | 2 | 32 | 32 |
| non-homogeneous arrays | Yes | No | Yes |
|labelled dimensions | 2 | 32 | 1 |
| labelled coordinates | Yes | Yes | No |
| broadcasting | No | Yes | Yes |
| auto-alignment | Yes | Yes | No |
| groupby-split-combine | Yes | Yes | No |

# Other features

* `Dataset` -- key/value store; generalisation of `DataFrame` in `pandas` for N-dimensional data
* groupby (split/apply/combine)
* NetCDF I/O
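
The first two items can be illustrated with a minimal sketch. The variable names `width` and `height`, the `batch` coordinate and all values are invented for this example.

```python
import numpy as np
import xarray as xr

# A Dataset holds several named arrays that share coordinates.
# Variable names, the 'batch' coordinate and the values are invented here.
ds = xr.Dataset(
    {'width': ('sample', np.array([1.0, 2.0, 3.0, 4.0])),
     'height': ('sample', np.array([10.0, 20.0, 30.0, 40.0]))},
    coords={'sample': ['a', 'b', 'c', 'd'],
            'batch': ('sample', ['x', 'x', 'y', 'y'])},
)

# Split by the 'batch' coordinate, take the mean of each group,
# and combine the results into a new Dataset indexed by batch.
batch_mean = ds.groupby('batch').mean()
```

A `Dataset` like this can be written to disk with `ds.to_netcdf(path)`, which needs a NetCDF backend such as `netCDF4` or `scipy` to be installed.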