# `xarray` &mdash; arrays with labels

In [None]:
import xarray as xr
import numpy as np

## Dimension and coordinate labels

In [None]:
sample_array = np.random.rand(4, 2)
sample_array

xarray lets you name dimensions and label their coordinates (indices):

In [None]:
data1 = xr.DataArray(sample_array,
                     [('sample',['a', 'b', 'c', 'd']), 
                      ('size',['width', 'height'])])

In [None]:
data1

You can use dimension names, for example, for axis-based reductions:

In [None]:
data1.sum('sample') # same as sample_array.sum(0)
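As a minimal sketch of name-based reductions, the example below uses a small hand-made array (the values and the name `arr` are illustrative, not part of the tutorial data) and checks the result against the equivalent NumPy axis reduction:

```python
import numpy as np
import xarray as xr

# Illustrative values: 4 samples x 2 sizes
values = np.arange(8.0).reshape(4, 2)
arr = xr.DataArray(values,
                   [('sample', ['a', 'b', 'c', 'd']),
                    ('size', ['width', 'height'])])

# Reducing over 'sample' leaves only the 'size' dimension
by_size = arr.mean('sample')
print(by_size.dims)  # ('size',)

# Same numbers as the positional NumPy reduction
print(np.allclose(by_size.values, values.mean(axis=0)))

# Several dimensions can be reduced in one call
total = arr.sum(['sample', 'size'])
print(float(total))
```

Naming the dimension makes the reduction independent of the axis order, so the code keeps working if the array is transposed.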

A `DataArray` is homogeneous, meaning all of its elements share a single dtype:

In [None]:
data1.dtype

## Indexing

xarray supports three types of indexing:

* positional (like in NumPy)

In [None]:
data1[2, 1]

* by coordinate labels

In [None]:
data1.loc['a', 'width']

* by dimension name and coordinate label

In [None]:
data1.sel(size='width', sample='a')
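The three indexing styles can be compared side by side. The sketch below uses an illustrative array (`arr` and its values are made up for this example) and also shows `isel`, the positional counterpart of `sel`, plus a label-based slice:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.arange(8.0).reshape(4, 2),
                   [('sample', ['a', 'b', 'c', 'd']),
                    ('size', ['width', 'height'])])

positional = arr[0, 0]                         # by integer position
by_label = arr.loc['a', 'width']               # by coordinate label, in dimension order
by_name = arr.sel(sample='a', size='width')    # by dimension name and label
by_name_pos = arr.isel(sample=0, size=0)       # by dimension name and position

# All four select the same element
print(float(positional), float(by_label), float(by_name), float(by_name_pos))

# Label-based slices include both end points
subset = arr.sel(sample=slice('b', 'c'))
print(list(subset['sample'].values))
```

With `sel` and `isel` the order of the keyword arguments does not matter, because each index is tied to a dimension name.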

## Exercise

These two arrays contain average monthly temperatures (in degrees Celsius) in Erlangen and Paris:

```
erlangen = [-0.5, 0.7, 4.4, 8.5, 13.3, 16.7, 18.2, 17.5, 13.7, 8.9, 4.0, 0.9]
paris = [3.3, 4.2, 7.8, 10.8, 14.3, 17.5, 19.4, 19.1, 16.4, 11.6, 7.2, 4.2]
```

Design a `DataArray` for storing these data. Calculate the average annual temperature for each location.

## Alignment

Element-wise operations between two arrays are automatically aligned on labels; by default, only labels present in both arrays are kept:

In [None]:
day2 = xr.DataArray(np.random.rand(4,2), 
                    [('sample',['b', 'c', 'd', 'e']), 
                     ('size', ['width', 'height'])])

In [None]:
data1 - day2
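To make the effect of automatic alignment visible, here is a small sketch with deterministic, made-up values (the names `first` and `second` are illustrative). Only the labels shared by both arrays survive the subtraction:

```python
import xarray as xr

first = xr.DataArray([1.0, 2.0, 3.0], [('sample', ['a', 'b', 'c'])])
second = xr.DataArray([10.0, 20.0, 30.0], [('sample', ['b', 'c', 'd'])])

# Values are matched by label, not by position:
# 'b' -> 10 - 2, 'c' -> 20 - 3; 'a' and 'd' are dropped
diff = second - first
print(list(diff['sample'].values))
print(diff.values)
```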

We can also align DataArrays manually:

In [None]:
xr.align(data1, day2, join='outer')
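With `join='outer'` the aligned arrays cover the union of the labels, and positions that have no data are filled with NaN. A minimal sketch with illustrative values:

```python
import numpy as np
import xarray as xr

first = xr.DataArray([1.0, 2.0, 3.0], [('sample', ['a', 'b', 'c'])])
second = xr.DataArray([10.0, 20.0, 30.0], [('sample', ['b', 'c', 'd'])])

# xr.align returns one aligned array per input
first_outer, second_outer = xr.align(first, second, join='outer')

print(list(first_outer['sample'].values))
print(first_outer.values)   # NaN where 'first' had no label 'd'
print(second_outer.values)  # NaN where 'second' had no label 'a'
```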

Aligned DataArrays can be concatenated along a new dimension:

In [None]:
xr.concat(xr.align(data1, day2), dim='time')
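The new dimension can also carry labels of its own. The sketch below passes a `pandas.Index` as `dim`; the labels `day1` and `day2` and the array values are assumptions made up for illustration:

```python
import pandas as pd
import xarray as xr

first = xr.DataArray([1.0, 2.0, 3.0], [('sample', ['a', 'b', 'c'])])
second = xr.DataArray([10.0, 20.0, 30.0], [('sample', ['b', 'c', 'd'])])

# Default alignment keeps only the shared labels 'b' and 'c'
aligned = xr.align(first, second)

# The name of the index becomes the name of the new dimension
stacked = xr.concat(aligned, dim=pd.Index(['day1', 'day2'], name='day'))
print(stacked.dims)
print(float(stacked.sel(day='day2', sample='b')))
```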

## Broadcasting

Arithmetic operations broadcast based on dimension name. This means you don’t need to insert dummy dimensions for alignment:

In [None]:
units = xr.DataArray([0.001, 0.01, 1], [('unit', ['mm', 'cm', 'm'])])
data1 * units
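For comparison, the following sketch contrasts name-based broadcasting with the explicit axis insertion that plain NumPy requires. The arrays `lengths` and `factors` and their values are illustrative assumptions:

```python
import numpy as np
import xarray as xr

# Hypothetical lengths in metres and conversion factors to other units
lengths = xr.DataArray([1.0, 2.0], [('sample', ['a', 'b'])])
factors = xr.DataArray([1000.0, 100.0, 1.0], [('unit', ['mm', 'cm', 'm'])])

# The two arrays share no dimension, so the result is their outer product
converted = lengths * factors
print(converted.dims, converted.shape)

# NumPy needs a dummy axis to get the same result
manual = lengths.values[:, np.newaxis] * factors.values
print(np.allclose(converted.values, manual))
```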

## Interoperability 

`xarray` combines the best of two worlds: pandas `DataFrame`/`Series` objects and NumPy's `ndarray`.

### Pandas

In [None]:
data1.to_series()

In [None]:
data1.to_dataframe(name='dim')

A round trip is also possible. For example, to rank samples in terms of their width and height, you might use the following:

In [None]:
series = data1.to_series()
ranks = series.unstack().rank().stack() # pandas code
xr.DataArray.from_series(ranks)
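An alternative sketch of the same round trip ranks the samples with a pandas `groupby` on the `size` index level instead of `unstack`/`stack`. The array values below are made up so the expected ranks are easy to verify by hand:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.array([[3.0, 1.0],
                             [2.0, 5.0]]),
                   [('sample', ['a', 'b']),
                    ('size', ['height', 'width'])])

# to_series yields a Series with a (sample, size) MultiIndex
series = arr.to_series()

# pandas code: rank the samples separately for each size
ranks = series.groupby(level='size').rank()

# Back to xarray; the index levels become dimensions again
ranked = xr.DataArray.from_series(ranks)
print(ranked.dims)
print(float(ranked.sel(sample='a', size='height')))  # 3.0 > 2.0, so rank 2
```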

### NumPy arrays

In [None]:
a = np.asarray(data1)
a[0, 0] = 0  # `a` may share memory with data1, so data1 can change as well
print(data1)

In [None]:
data1.variable.data
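Because the converted array may share memory with the `DataArray`, it is safer to take an explicit copy before modifying it. A minimal sketch, with illustrative names and values:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.zeros((2, 2)),
                   [('sample', ['a', 'b']),
                    ('size', ['width', 'height'])])

# .values exposes the data as a NumPy array; .copy() makes it independent
independent = arr.values.copy()
independent[0, 0] = -1.0

# The DataArray is unaffected by changes to the copy
print(float(arr[0, 0]))
print(np.shares_memory(independent, arr.values))
```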

## Exercise

*Inspired by a data science [challenge](http://www.ramp.studio/events/drug_spectra) by C. Marini et al.*

A researcher measured a [Raman spectrum](https://en.wikipedia.org/wiki/Raman_spectroscopy) of an unknown sample and now wants to determine the substance and its concentration. The calibration data contain Raman spectra of four different compounds at three different concentrations. Find the calibration compound and concentration whose Raman spectrum is most similar to the sample. You may choose the similarity criterion (such as mean squared error or maximum deviation).

```python
import pandas as pd

# import calibration data from a file
df = pd.read_csv('raman_data.csv', index_col=[0, 1, 2])
calibration = xr.DataArray.from_series(df['Raman'])

sample = xr.DataArray([[0, 10]], [('sample', ['X1042']),
                                  ('wavelength', [100, 300])])
```

**Hint**: To find the calibration sample with minimum error, you may convert the DataArray to pandas:

```python
err.to_series().idxmin()
```
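As a small illustration of the hint, the sketch below builds a made-up two-dimensional error array (the compound names, concentrations and error values are hypothetical) and looks up the labels of its minimum. In current pandas, `idxmin` returns the index label, whereas `argmin` returns the integer position:

```python
import xarray as xr

# Hypothetical errors for 2 compounds x 2 concentrations
err = xr.DataArray([[0.9, 0.4],
                    [0.1, 0.7]],
                   [('compound', ['A', 'B']),
                    ('concentration', [1, 5])])

# The Series has a (compound, concentration) MultiIndex,
# so the label of the minimum is a tuple
best = err.to_series().idxmin()
print(best)
```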

## Comparison

|     | pandas.DataFrame | xarray.DataArray | Structured NumPy array|
|-----|------------------|------------------|--------------|
|max. dimensions | 2 | 32 | 32 |
| non-homogeneous arrays | Yes | No | Yes |
| dimensions with labels| 2 | 32 | 1 |
| labelled coordinates | Yes | Yes | No |
| broadcasting | No | Yes | Yes |
| auto-alignment | Yes | Yes | No |
| groupby-split-combine | Yes | Yes | No |

## Other features

* `Dataset` -- key/value store; generalisation of `DataFrame` in `pandas` for N-dimensional data
* groupby/split/combine
* NetCDF I/O
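The first two features can be sketched briefly. The example below stores two made-up variables in a `Dataset` and averages them per group; the variable names, the `season` coordinate and all numbers are assumptions for illustration only:

```python
import xarray as xr

months = [1, 2, 7, 8]

# A Dataset maps variable names to arrays that share dimensions
ds = xr.Dataset({
    'temperature': ('month', [1.0, 3.0, 10.0, 14.0]),
    'rain': ('month', [40.0, 60.0, 20.0, 30.0]),
}, coords={'month': months})

# Attach a grouping label to each month
ds = ds.assign_coords(season=('month', ['winter', 'winter', 'summer', 'summer']))

# Split by season, apply mean, and combine into a new Dataset
means = ds.groupby('season').mean()
print(float(means['temperature'].sel(season='winter')))
print(float(means['rain'].sel(season='summer')))
```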

## Further reading

* xarray docs, http://xarray.pydata.org/en/stable/