# 4. `xarray`

## Exercise

These two lists contain the average monthly temperatures (in degrees Celsius) in Erlangen and Paris:

```
erlangen = [-0.5, 0.7, 4.4, 8.5, 13.3, 16.7, 18.2, 17.5, 13.7, 8.9, 4.0, 0.9]
paris = [3.3, 4.2, 7.8, 10.8, 14.3, 17.5, 19.4, 19.1, 16.4, 11.6, 7.2, 4.2]
```

Design a `DataArray` to store these data, then calculate the average annual temperature for each location.

In [1]:
import xarray as xr
import numpy as np
erlangen = [-0.5, 0.7, 4.4, 8.5, 13.3, 16.7, 18.2, 17.5, 13.7, 8.9, 4.0, 0.9]
paris = [3.3, 4.2, 7.8, 10.8, 14.3, 17.5, 19.4, 19.1, 16.4, 11.6, 7.2, 4.2]

months = np.arange(12) + 1
temperature = xr.DataArray([erlangen, paris], [('city', ['erlangen', 'paris']),
                                               ('month', months)])
temperature.mean('month')

<xarray.DataArray (city: 2)>
array([  8.85833333,  11.31666667])
Coordinates:
  * city     (city) <U8 'erlangen' 'paris'
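The same array can also be built by passing `dims` and `coords` as keyword arguments, which makes the role of each axis explicit. This is a sketch assuming a reasonably recent xarray; the `name` given to the array is an illustrative choice. Selecting by label with `.sel` then reads off a single city's annual mean.

```python
import numpy as np
import xarray as xr

erlangen = [-0.5, 0.7, 4.4, 8.5, 13.3, 16.7, 18.2, 17.5, 13.7, 8.9, 4.0, 0.9]
paris = [3.3, 4.2, 7.8, 10.8, 14.3, 17.5, 19.4, 19.1, 16.4, 11.6, 7.2, 4.2]

# Rows are cities, columns are months 1..12
temperature = xr.DataArray(
    [erlangen, paris],
    dims=('city', 'month'),
    coords={'city': ['erlangen', 'paris'], 'month': np.arange(1, 13)},
    name='temperature',
)

annual = temperature.mean('month')        # reduce over the month axis by name
print(annual.sel(city='paris').item())    # about 11.3167
```

Reducing by dimension name rather than by axis number keeps the code correct even if the order of `city` and `month` is later swapped.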

## Exercise

*Inspired by a data science [challenge](http://www.ramp.studio/events/drug_spectra) by C. Marini et al.*

A researcher measured a [Raman spectrum](https://en.wikipedia.org/wiki/Raman_spectroscopy) of an unknown sample and now wants to determine the substance and its concentration. He has calibration data with the Raman spectra of four different compounds, each at three different concentrations. Calculate the mean squared error between the sample and every calibration spectrum, and find the closest compound and concentration.
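Written out, one way to define the error for concentration $c$ and compound $k$ is the following, where $I$ denotes measured intensity and $W$ is the set of wavelengths present in both the sample and the calibration data (with xarray's default inner alignment, only those wavelengths take part in the arithmetic):

$$
\mathrm{MSE}(c, k) = \frac{1}{|W|} \sum_{w \in W} \big( I_{c,k}(w) - I_{\text{sample}}(w) \big)^2
$$

Dividing by $|W|$ does not change which $(c, k)$ pair is smallest, so summing instead of averaging yields the same answer.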

```python
import pandas as pd
import xarray as xr
df = pd.read_csv('raman_data.csv', index_col=[0, 1, 2])
calibration = xr.DataArray.from_series(df['Raman'])

sample = xr.DataArray([[0, 10]], [('sample', ['X1042']),
                                  ('wavelength', [100, 300])])
```

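**Hint**: To find the calibration spectrum with the minimum error, you may convert the error `DataArray` to a pandas `Series` and use `idxmin`, which returns the index label (here a tuple of coordinates) of the smallest value. In current pandas, `Series.argmin` returns an integer position rather than a label.

```python
err.to_series().idxmin()
```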

In [2]:
# Code used to generate the calibration data
import xarray as xr
import numpy as np

spectra = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]])
concentrations = np.array([1., 10., 100.])
data = spectra * concentrations[:, None, None]
dataarray = xr.DataArray(data, [
                               ('concentration', [1, 2, 5]),
                               ('compound', ['A', 'B', 'C', 'D']),
                               ('wavelength', [100, 200, 300])
                               ],
                              name='Raman')

dataarray.to_dataframe().to_csv('raman_data.csv')

In [3]:
# Solution 

import pandas as pd
df = pd.read_csv('raman_data.csv', index_col=[0, 1, 2])
calibration = xr.DataArray.from_series(df['Raman'])

sample = xr.DataArray([[0, 10]], [('sample', ['X1042']),
                                  ('wavelength', [100, 300])])

# Subtraction aligns on the shared 'wavelength' coordinate and broadcasts the rest
err = ((calibration - sample)**2).mean('wavelength')
err.to_series().idxmin()

(2, 'C', 'X1042')
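To experiment without the CSV round trip, the calibration array can be rebuilt directly in memory from the same arrays the generator cell uses. This sketch assumes recent xarray and pandas; it averages over wavelengths, matching the metric named in the exercise, and uses `idxmin` to get a label rather than a position.

```python
import numpy as np
import xarray as xr

# Same synthetic calibration set as the generator cell:
# 3 concentrations x 4 compounds x 3 wavelengths
spectra = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]])
concentrations = np.array([1., 10., 100.])
calibration = xr.DataArray(
    spectra * concentrations[:, None, None],
    dims=('concentration', 'compound', 'wavelength'),
    coords={'concentration': [1, 2, 5],
            'compound': ['A', 'B', 'C', 'D'],
            'wavelength': [100, 200, 300]},
)

sample = xr.DataArray([[0, 10]], dims=('sample', 'wavelength'),
                      coords={'sample': ['X1042'], 'wavelength': [100, 300]})

# Subtraction aligns on 'wavelength' (inner join: only 100 and 300 survive)
# and broadcasts over concentration, compound and sample.
err = ((calibration - sample) ** 2).mean('wavelength')

best = err.to_series().idxmin()   # label tuple (concentration, compound, sample)
print(best)
```

With this data, compounds C and D both reach zero error at concentration 2, because they differ only at wavelength 200, which the sample does not contain; `idxmin` reports the first minimum it encounters, which is C.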

# 5. Array interface

## Exercise

Original exercise by Stefan van der Walt and Juan Nunez-Iglesias.

An author of a foreign package (included with the exercises as
`problems/mutable_str.py`) provides a string class that
allocates its own memory:

```ipython
In [1]: from mutable_str import MutableString
In [2]: s = MutableString('abcde')
In [3]: print(s)
abcde
```

You'd like to view these mutable strings (*mutable* meaning they can be modified in place)
as ndarrays in order to manipulate the underlying memory.

Add an `__array_interface__` dictionary attribute to `s`, then convert `s` to an
ndarray. Numerically add 2 to every element of the array (use the in-place operator `+=`).

Then print the original string to ensure that its value was modified.

> **Hint:** Documentation for NumPy's ``__array_interface__``
  may be found [in the online docs](http://docs.scipy.org/doc/numpy/reference/arrays.interface.html).
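One way to see which keys NumPy expects is to inspect the `__array_interface__` of an ordinary array that, like the string, stores one unsigned byte per element. This is only an inspection aid; the pointer value differs from run to run.

```python
import numpy as np

# A writable array with the same memory layout as five ASCII characters
codes = np.frombuffer(b'abcde', dtype=np.uint8).copy()
iface = codes.__array_interface__

print(iface['typestr'])   # |u1
print(iface['shape'])     # (5,)
print(iface['version'])   # 3
print(iface['data'])      # (<address>, False): pointer and read-only flag
```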

Here's a skeleton outline:

```python
import numpy as np
from mutable_str import MutableString

s = MutableString('abcde')

# --- EDIT THIS SECTION ---

# Create an array interface to this foreign object
s.__array_interface__ = {'version': 3,   # interface protocol version
                         'data': XXX,    # (pointer, read-only flag)
                         'shape': XXX,
                         'typestr': XXX, # type code for one unsigned byte
                         }

# --- EDIT THIS SECTION ---

print('String before converting to array:', s)
sa = np.asarray(s)

print('String after converting to array:', sa)

sa += 2
print('String after adding "2" to array:', s)
```

In [4]:
import numpy as np
from mutable_str import MutableString

s = MutableString('abcde')

# --- EDIT THIS SECTION ---

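# Create an array interface to this foreign object
s.__array_interface__ = {'version': 3,                  # interface protocol version
                         'data': (s.data_ptr, False),   # (pointer, read-only flag)
                         'shape': (len(s),),            # one element per character
                         'typestr': '|u1',              # unsigned 1-byte integer; '|' = byte order not applicable
                         }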

# --- EDIT THIS SECTION ---

print('String before converting to array:', s)
sa = np.asarray(s)

print('String after converting to array:', sa)

sa += 2
print('String after adding "2" to array:', s)

String before converting to array: abcde 
String after converting to array: [ 97  98  99 100 101]
String after adding "2" to array: cdefg 
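If `mutable_str.py` is not at hand, the same mechanism can be tried with a stand-in object. In this sketch `ByteBuffer` is a hypothetical class (not part of the exercise package) that owns a C buffer allocated with `ctypes` and exposes it through an `__array_interface__` property, so the dictionary is computed on each access instead of being attached to an instance.

```python
import ctypes
import numpy as np


class ByteBuffer:
    """Hypothetical stand-in for MutableString: owns a raw C byte buffer."""

    def __init__(self, text):
        # create_string_buffer allocates len(text) + 1 bytes (trailing NUL)
        self._buf = ctypes.create_string_buffer(text.encode('ascii'))
        self._len = len(text)

    @property
    def __array_interface__(self):
        return {
            'version': 3,                                  # interface protocol version
            'data': (ctypes.addressof(self._buf), False),  # (pointer, read-only flag)
            'shape': (self._len,),                         # one element per character
            'typestr': '|u1',                              # unsigned 1-byte integer
        }

    def __str__(self):
        # .value stops at the first NUL byte
        return self._buf.value.decode('ascii')


s = ByteBuffer('abcde')
sa = np.asarray(s)   # no copy: the array views the ctypes buffer
sa += 2              # modifies the buffer owned by s
print(s)             # cdefg
```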


# 6. Ufuncs

## Exercise

*Exercise from [100 NumPy exercises](https://github.com/rougier/numpy-100/blob/master/100%20Numpy%20exercises.md)*:

Compute `((A+B)*(-A/2))` in place (without a copy):

```
A = np.ones(3)*1
B = np.ones(3)*2
```

**Hint**: Use the `out` argument of ufuncs.

In [5]:
A = np.ones(3)*1
B = np.ones(3)*2

np.add(A, B, out=B)
np.divide(A,2,out=A)
np.negative(A,out=A)
np.multiply(A,B,out=A)

array([-1.5, -1.5, -1.5])
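Since `-A/2` equals `A` multiplied by `-0.5`, the divide-and-negate pair can be folded into a single call. This variant is a sketch, not the reference solution; it also checks that the object returned by the last ufunc is `A` itself.

```python
import numpy as np

A = np.ones(3) * 1
B = np.ones(3) * 2

np.add(A, B, out=B)                  # B <- A + B
np.multiply(A, -0.5, out=A)          # A <- -A / 2 (divide and negate in one pass)
result = np.multiply(A, B, out=A)    # A <- A * B

# A ufunc called with out= writes into that array and returns it,
# so no new output array is allocated.
print(result is A)   # True
print(A)             # [-1.5 -1.5 -1.5]
```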

## Exercise (`np.einsum`)

*Exercise from [100 NumPy exercises](https://github.com/rougier/numpy-100)*

Use `np.einsum` to calculate the **diagonal of the dot product** of two matrices, i.e. `np.diag(np.dot(A, B))`, without computing the full product.

```
A = np.arange(6).reshape(3, 2)
B = np.ones((2, 3))
np.einsum('your signature goes here', A, B)
```

Then, test your solution on stacked arrays:

```
A = np.arange(12).reshape(2, 3, 2)
B = np.ones((2, 3))
```

In [6]:
A = np.arange(12).reshape(2, 3, 2)
B = np.ones((2, 3))
np.einsum('...ij,...ji->...i', A, B)

array([[  1.,   5.,   9.],
       [ 13.,  17.,  21.]])
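For the plain 2-D case, the signature simply drops the ellipsis. The sketch below checks it against `np.diag(np.dot(A, B))` and checks the stacked version against a per-matrix loop; `A3` is just a renamed copy of the stacked `A`, so both cases fit in one script.

```python
import numpy as np

A = np.arange(6).reshape(3, 2)
B = np.ones((2, 3))

# 'ij,ji->i': for each row i, sum over j of A[i, j] * B[j, i];
# only the diagonal entries of A @ B are computed.
diag = np.einsum('ij,ji->i', A, B)
print(diag)  # [1. 5. 9.]
assert np.allclose(diag, np.diag(np.dot(A, B)))

# The ellipsis broadcasts over leading (stacked) dimensions.
A3 = np.arange(12).reshape(2, 3, 2)
stacked = np.einsum('...ij,...ji->...i', A3, B)
expected = np.array([np.diag(a @ B) for a in A3])
assert np.allclose(stacked, expected)
```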


# 7. Extending NumPy

## Exercise

Turn the following Cython function, which computes the logit `log(p / (1 - p))`, into a ufunc, following the example above.

```cython

cdef extern from "math.h":
    double log "log" (double) nogil
    
import cython

@cython.cdivision(True)
cdef double logit_double(double p) nogil:
    p = p/(1-p)
    p = log(p)
    return p
```

In [7]:
%load_ext cython

In [8]:
%%cython

# The elementwise function

cdef extern from "math.h":
    double log "log" (double) nogil
    
import cython

@cython.cdivision(True)
cdef double logit_double(double p) nogil:
    p = p/(1-p)
    p = log(p)
    return p


# Required module initialization
# ------------------------------

cimport numpy as np
np.import_array()
np.import_ufunc()

# The actual ufunc declaration
# ----------------------------

cdef np.PyUFuncGenericFunction loop_func[1]
cdef char input_output_types[2]
cdef void *elementwise_funcs[1]

loop_func[0] = np.PyUFunc_d_d # NumPy's generic loop for a double -> double elementwise function

input_output_types[0] = np.NPY_DOUBLE
input_output_types[1] = np.NPY_DOUBLE


elementwise_funcs[0] = <void*>logit_double

logit = np.PyUFunc_FromFuncAndData(
    loop_func,
    elementwise_funcs,
    input_output_types,
    1, # number of type-specific loops provided (ntypes)
    1, # number of input args
    1, # number of output args
    0, # identity (0 is PyUFunc_Zero); only used by reductions, which need two inputs
    "logit", # function name
    "computes logit", # docstring
    0 # unused
    )


In [9]:
import numpy as np
a = np.array([0.9, 0.3], dtype=np.double)
logit(a)

array([ 2.19722458, -0.84729786])
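A quick way to sanity-check the compiled ufunc is to compare it with the same formula written using NumPy's built-in ufuncs. For contrast, `np.frompyfunc` also produces a genuine ufunc object, but from a Python callable: it calls back into Python for every element and returns an object array, which is exactly the overhead the Cython version avoids.

```python
import math
import numpy as np

a = np.array([0.9, 0.3], dtype=np.double)

# Reference result built from existing NumPy ufuncs (allocates temporaries)
reference = np.log(a / (1 - a))

# A ufunc wrapping a Python lambda: 1 input, 1 output, object-dtype result
logit_py = np.frompyfunc(lambda p: math.log(p / (1 - p)), 1, 1)
result = logit_py(a).astype(np.double)

print(isinstance(logit_py, np.ufunc))   # True
print(np.allclose(result, reference))   # True
```

In the notebook session above, `np.allclose(logit(a), reference)` should likewise confirm that the Cython ufunc agrees with the reference.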