# **--- Data manipulation with Xarray ---**
---

In this tutorial we're going to learn how to manipulate neurophysiological data using Xarray. In detail:
- Data transformation (NumPy to Xarray and conversely)
- Data selection (time, space, multi-indexing etc.)
- Save and load the data
- Data plotting

<div class="alert alert-success"><p>

Link to Xarray [documentation](http://xarray.pydata.org/en/stable/index.html)
</p></div>

In [None]:
import os

import numpy as np
import xarray as xr
import pandas as pd

import matplotlib.pyplot as plt

---
# **--- ROOT PATH ---**

<div class="alert alert-info"><p>

Define the path where the data are located!
</p></div>

In [None]:
ROOT = '/run/media/etienne/DATA/Toolbox/BraiNets/CookingFrites/dataset/'

---
# **1 - Data transformation**

## 1.1 From NumPy array to Xarray

In [None]:
# properties for the simulated data
n_trials = 10
n_channels = 5
n_times = 500

# generate coordinates
conditions = ['Stimulus 0'] * 5 + ['Stimulus 1'] * 5
channels = [f"ch_{k}" for k in range(n_channels)]
times = (np.arange(n_times) - 100) / 256.

# create the (random) data
data_np = np.random.rand(n_trials, n_channels, n_times)

# create the DataArray
data_xr = xr.DataArray(
    data_np, dims=('conditions', 'channels', 'times'),
    coords=(conditions, channels, times)
)

# data_xr

## 1.2 From Xarray to NumPy array

In [None]:
# data_xr.data  # that's it

## 1.3 Get coordinates

In [None]:
# data_xr['times']  # or data_xr['times'].data
# data_xr['channels']
# data_xr['conditions']
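The round trip between NumPy and Xarray from sections 1.1 to 1.3 can be sketched on a tiny array. This is only an illustration: the names `arr`, `da`, `back`, `time_vec` and `chan_names` are made up for the example and are not part of the tutorial dataset.

```python
import numpy as np
import xarray as xr

# small synthetic array: 2 trials x 3 channels x 4 time points
arr = np.arange(24, dtype=float).reshape(2, 3, 4)

# wrap the NumPy array, here with coordinates passed as a dictionary
da = xr.DataArray(
    arr,
    dims=("trials", "channels", "times"),
    coords={
        "trials": ["a", "b"],
        "channels": ["ch_0", "ch_1", "ch_2"],
        "times": np.array([0.0, 0.1, 0.2, 0.3]),
    },
)

# back to NumPy: the underlying array
back = da.data

# coordinates are DataArrays too, so `.data` also gives NumPy arrays
time_vec = da["times"].data
chan_names = list(da["channels"].data)
```

Passing `coords` as a dictionary instead of a tuple makes it explicit which coordinate goes with which dimension.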

## 1.4 Add attributes, name etc.

In [None]:
data_xr.name = 'GDR tuto'
data_xr.attrs = {
    "sampling frequency": 256.,  # same rate as the one used to build `times`
    "subject": 0,
    "info": "subject was distracted at t = 1s"
}
# data_xr

## 1.5 Dataset creation (Bonus)

In [None]:
n_subjects = 5

dt = {}
for n_s in range(n_subjects):
    # create a random xarray the same way as before
    data_s = xr.DataArray(
        np.random.rand(n_trials, n_channels, n_times),
        dims=('conditions', 'channels', 'times'),
        coords=(conditions, channels, times),
        name=f"HGA-{n_s}"
    )
    
    # fill a dictionary
    dt[f"subject-{n_s}"] = data_s

# dataset creation
dt = xr.Dataset(dt)
# dt
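Once a `Dataset` exists, each subject can be pulled out by name, or all subjects can be stacked along a new dimension. The sketch below uses a small hypothetical dataset (`ds`, 3 subjects, unique trial labels) rather than the `dt` variable above.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
dims = ("trials", "channels", "times")
coords = {
    "trials": np.arange(4),
    "channels": ["ch_0", "ch_1"],
    "times": np.arange(6) / 256.0,
}

# one DataArray per subject, stored under a "subject-k" key
ds = xr.Dataset({
    f"subject-{k}": xr.DataArray(rng.random((4, 2, 6)), dims=dims, coords=coords)
    for k in range(3)
})

# get a single subject back as a DataArray
first = ds["subject-0"]

# list the variables stored in the dataset
names = list(ds.data_vars)

# stack all subjects along a new "subject" dimension, then average over it
stacked = ds.to_array(dim="subject")
grand_mean = stacked.mean("subject")
```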

---
# **2 - Saving and loading**

## 2.1 Saving a `DataArray`

In [None]:
# nothing hard here !
# data_xr.to_netcdf("save_dataarray.nc")

## 2.2 Loading a `DataArray`

In [None]:
# not much harder
# xr.load_dataarray("save_dataarray.nc")

## 2.3 Saving and loading a `Dataset`

In [None]:
# saving : "this is how we do it"
# dt.to_netcdf("my_dataset.nc")

# loading : "this is how we do it"
# xr.load_dataset("my_dataset.nc")

---
# **3 - Data selection**

<div class="alert alert-success"><p>

Full tutorial here : [Indexing and selecting](http://xarray.pydata.org/en/stable/user-guide/indexing.html)
</p></div>


## 3.0 Load the data of a single subject

In [None]:
###############################################################################
subject_nb = 2
###############################################################################

# load the high-gamma activity
file_hga = os.path.join(ROOT, 'hga', f'hga_s-{subject_nb}.nc')
hga = xr.load_dataarray(file_hga)

# load the name of the brain regions
file_anat = os.path.join(ROOT, 'anat', f'anat_s-{subject_nb}.xlsx')
anat = pd.read_excel(file_anat)

# load the behavior
file_beh = os.path.join(ROOT, 'beh', f'beh_s-{subject_nb}.xlsx')
beh = pd.read_excel(file_beh)
# label each trial with its outcome (the 'valence' column of the behavior)
hga['trials'] = list(beh['valence'])

# hga
# anat
# beh

## 3.1 Temporal selection

In [None]:
# select a temporal section
# hga.sel(times=slice(0., 1.))

# select the data at a specific time point
# hga.sel(times=0.)

# select a time point that is not an exact sample (oops, KeyError unless a method is given)
# hga.sel(times=0.1)  # method='nearest'
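The three cases above can be reproduced without the dataset. The sketch below rebuilds the time vector of section 1.1 on random data; `da`, `window`, `at_zero` and `nearest` are names chosen for the example.

```python
import numpy as np
import xarray as xr

# same time vector as in section 1.1: 500 samples at 256 Hz, 0 s at sample 100
times = (np.arange(500) - 100) / 256.0
da = xr.DataArray(
    np.random.rand(3, 500),
    dims=("trials", "times"),
    coords={"trials": np.arange(3), "times": times},
)

# label-based slice: both bounds are included
window = da.sel(times=slice(0.0, 1.0))

# 0.0 s is an exact sample, so the selection works directly
at_zero = da.sel(times=0.0)

# 0.1 s is not a multiple of 1/256 s, so an exact lookup fails
try:
    da.sel(times=0.1)
    found = True
except KeyError:
    found = False

# method='nearest' picks the closest existing sample instead
nearest = da.sel(times=0.1, method="nearest")
```

Note that, unlike NumPy slicing, `slice(0., 1.)` in `.sel` keeps the sample at 1 s.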

## 3.2 Spatial selection

In [None]:
# select the data coming from a single channel
# hga.sel(channels='O4-O3')

# select the data coming from multiple channels
# hga.sel(channels=['O4-O3', "F'8-F'7"])

## 3.3 Select condition

In [None]:
# select the trials leading to a "-1€" outcome
# hga.sel(trials='-1€')

# select the trials leading to a "-1€" or a "+1€" outcome
# hga.sel(trials=['-1€', '+1€'])  # oops: fails because the trial labels are not unique

# workaround: build a boolean mask over the trials
# outcomes = hga['trials'].data
# is_1 = np.logical_or(outcomes == '-1€', outcomes == '+1€')
# hga.sel(trials=is_1)
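The same workaround can be tried on a toy array with repeated trial labels. This is a sketch on made-up data (`outcomes`, `da`); it uses `np.isin` to build the mask and `isel` to apply it, which is one possible variant of the patch above.

```python
import numpy as np
import xarray as xr

# six trials, labels are repeated (as with the outcomes of the task)
outcomes = np.array(["-1€", "+1€", "+0€", "-1€", "+1€", "-0€"])
da = xr.DataArray(
    np.arange(12.0).reshape(6, 2),
    dims=("trials", "times"),
    coords={"trials": outcomes, "times": [0.0, 0.1]},
)

# a single repeated label can be selected directly
only_loss = da.sel(trials="-1€")

# for several labels, build a boolean mask and index with it
keep = np.isin(da["trials"].data, ["-1€", "+1€"])
win_or_loss = da.isel(trials=keep)
```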

## 3.4 Multi-selection

In [None]:
# hga.sel(
#     times=slice(0., 1.),
#     channels=['O4-O3', "F'8-F'7"],
#     trials='-1€'
# )

## 3.5 Multi-indexing (advanced)
### 3.5.1 Define the multi-index

In [None]:
# let's start by making a copy of the hga
hga_c = hga.copy()

# get contact names and brain region names
contacts = hga_c['channels'].data
parcels = list(anat['roi'])

# rename the spatial dimension of the hga
hga_c = hga_c.rename(channels='spatial')

# build the multi-index
midx = pd.MultiIndex.from_arrays(
    (contacts, parcels), names=('channels', 'roi')
)
# midx

# replace in the spatial dimension
hga_c['spatial'] = midx
# hga_c

### 3.5.2 Data selection using multi-indexing

In [None]:
# select all the contacts in the dlPFC
# hga_c.sel(roi='dlPFC')
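A multi-index can also be built from coordinates that already live on the spatial dimension, using `set_index`. The sketch below is an alternative to the `pd.MultiIndex` assignment of section 3.5.1, shown on made-up contact and region names.

```python
import numpy as np
import xarray as xr

# hypothetical contact names and the brain region of each contact
contacts = ["c1", "c2", "c3", "c4"]
parcels = ["dlPFC", "aINS", "dlPFC", "vmPFC"]

# both names are attached to the same 'spatial' dimension
da = xr.DataArray(
    np.random.rand(3, 4, 5),
    dims=("trials", "spatial", "times"),
    coords={
        "trials": np.arange(3),
        "channels": ("spatial", contacts),
        "roi": ("spatial", parcels),
        "times": np.arange(5) / 256.0,
    },
)

# combine the two coordinates into a multi-index on 'spatial'
da_midx = da.set_index(spatial=["channels", "roi"])

# select every contact located in the dlPFC
dlpfc = da_midx.sel(roi="dlPFC")
```

Here two of the four contacts belong to the dlPFC, so the selection keeps two entries along the spatial axis.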

---
# **4 - Operations on Xarray**

## 4.1 Classical _min_, _max_ and _mean_

In [None]:
# compute the mean of the hga across channels
# hga.mean('channels')

# compute the mean of the hga across channels and trials !
# hga_m = hga.mean(['channels', 'trials'])
# hga_m

# on this mean, get the minimum value of hga across all time points
# hga_m.min('times')

# on this mean, get the maximum value of the hga between [0, 1] seconds
# hga_m.sel(times=slice(0., 1.)).max('times')
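Beyond the value of the maximum, it is often useful to know when it occurs. A minimal sketch on random data, assuming the same time vector as in section 1.1 and using `idxmax` to get the coordinate of the peak:

```python
import numpy as np
import xarray as xr

times = (np.arange(500) - 100) / 256.0
da = xr.DataArray(
    np.random.rand(4, 3, 500),
    dims=("trials", "channels", "times"),
    coords={"times": times},
)

# mean across channels and trials: only the time dimension remains
m = da.mean(["channels", "trials"])

# restrict to [0, 1] s, then get the peak value and the time of the peak
m_win = m.sel(times=slice(0.0, 1.0))
peak = m_win.max("times")
t_peak = m_win.idxmax("times")
```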

## 4.2 The `groupby` miracle

In [None]:
# group by condition type and take the mean hga
# hga.groupby('trials').mean('trials')

# same, but also group by brain region !
# hga_c = hga.copy().rename(channels='parcels')
# hga_c['parcels'] = parcels
# hga_c.groupby('trials').mean('trials').groupby('parcels').mean('parcels')
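To see what the chained `groupby` does, here is a sketch on a tiny array whose values can be checked by hand. The outcome and parcel labels are made up for the example.

```python
import numpy as np
import xarray as xr

# 4 trials (2 outcomes), 3 contacts (2 brain regions), 2 time points
outcomes = ["-1€", "+1€", "-1€", "+1€"]
parcels = ["dlPFC", "aINS", "dlPFC"]
data = np.arange(4 * 3 * 2, dtype=float).reshape(4, 3, 2)

da = xr.DataArray(
    data,
    dims=("trials", "parcels", "times"),
    coords={"trials": outcomes, "parcels": parcels, "times": [0.0, 0.1]},
)

# one mean per outcome: the repeated trial labels collapse to unique ones
per_outcome = da.groupby("trials").mean("trials")

# then one mean per brain region
per_both = per_outcome.groupby("parcels").mean("parcels")

# after grouping the labels are unique, so they can be selected directly
value = float(per_both.sel(trials="-1€", parcels="dlPFC", times=0.0))
```

At time 0, the "-1€" trials in the dlPFC contacts hold the values 0, 4, 12 and 16, so `value` is their mean, 8.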

---
# **5 - Data plotting**

## 5.1 Simple lineplot

In [None]:
# plot the first trial of the first channel
# hga.isel(trials=0, channels=0).plot(x='times');

# plot the mean over channels and trials
# hga.mean(['trials', 'channels']).plot(x='times');

## 5.2 Combining _groupby_ and _plot_

In [None]:
# plot the hga per outcome of the first contact
# hga.isel(channels=0).groupby('trials').mean('trials').plot(x='times', hue='trials');

# plot the mean hga per outcome and per brain region
# hga_c = hga.copy().rename(channels='parcels')
# hga_c['parcels'] = parcels
# hga_m = hga_c.groupby('trials').mean('trials').groupby('parcels').mean('parcels')
# hga_m.plot(x='times', hue='trials', col='parcels')  # , col_wrap=2
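The `groupby` plus `plot` pattern can be tried on random data before using the real recordings. This sketch uses made-up outcomes and calls `.plot.line` explicitly, with an `ax` argument so that extra elements (vertical line, axis label) can be added to the same axes. The y-axis label is an assumption about the units.

```python
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt

# 10 random trials, alternating between two hypothetical outcomes
times = (np.arange(200) - 50) / 256.0
outcomes = ["-1€", "+1€"] * 5
da = xr.DataArray(
    np.random.randn(10, 200),
    dims=("trials", "times"),
    coords={"trials": outcomes, "times": times},
)

# mean time course per outcome
per_outcome = da.groupby("trials").mean("trials")

# one line per outcome, drawn on an axes we control
fig, ax = plt.subplots()
lines = per_outcome.plot.line(x="times", hue="trials", ax=ax)
ax.axvline(0.0, color="black")
ax.set_ylabel("HGA (a.u.)")
```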

## 5.3 Heatmap (bonus)

In [None]:
# plot the single trial activity of the first channel
# hga_c = hga.copy()
# hga_c['trials'] = np.arange(len(hga_c['trials']))
# hga_c.isel(channels=0).plot(x='times', y='trials', vmin=-10, vmax=10, cmap='RdBu_r')
# plt.axvline(0., color='black')

# same, but on the mean inside the dlPFC
# hga_c = hga.copy().rename(channels='parcels')
# hga_c['parcels'] = parcels
# hga_c = hga_c.groupby('parcels').mean('parcels')
# hga_c['trials'] = np.arange(len(hga_c['trials']))
# hga_c.sel(parcels='dlPFC').plot(x='times', y='trials', vmin=-5, vmax=5, cmap='RdBu_r')
# plt.axvline(0., color='black')

---
# **---- Test yourself ! ----**

## **1. Load fresh data !**

<div class="alert alert-warning"><p>

**[Instructions]** Load the data, behavior and anatomy of subject #7
</p></div>

In [None]:
# write your answer

## **2. Data manipulation**

### 2.1 Multi-items selection

<div class="alert alert-warning"><p>

**[Instructions]**

Select the high-gamma activity when the subject received an outcome `+0€` for the channel `"Q'2-Q'1"` and select only the time points between `[0., 1]s`
</p></div>


In [None]:
# write your answer

### 2.2 Mean over time

<div class="alert alert-warning"><p>

**[Instructions]**

Select the hga for the channel `"Q'2-Q'1"` and take the mean across trials and across the temporal period between `[0., 1]s`
</p></div>

In [None]:
# write your answer

### 2.3 Group by outcome

<div class="alert alert-warning"><p>

**[Instructions]**

Group the data by `outcome` and take the `mean` per outcome
</p></div>

In [None]:
# write your answer

### 2.4 Set the name of the brain regions

<div class="alert alert-warning"><p>

**[Instructions]**    
- Get the list of brain region names associated with each contact
- Rename the `channels` dimension of the hga `DataArray` to `parcels`. Put the result in a new `DataArray` variable named `hga_roi`
- Replace the channel names with the names of the brain regions for the dimension `parcels`
</p></div>

In [None]:
# write your answer

### 2.5 Group by parcel name

<div class="alert alert-warning"><p>

**[Instructions]**    
On the variable `hga_roi`, group the hga by the name of the brain regions and also take the mean per brain region
</p></div>

In [None]:
# write your answer

### 2.6 Group by outcome and parcels

<div class="alert alert-warning"><p>

**[Instructions]**    
Using the variable `hga_roi` :
- First, group by outcome and take the mean per outcome. Place the result in a variable `hga_outc`
- On `hga_outc`, group by brain regions and take the mean per brain region. Place the result in a variable called `hga_outcr`
</p></div>

In [None]:
# write your answer

## **3. Plotting**
### 3.1 Single time-series

<div class="alert alert-warning"><p>

**[Instructions]**    
On the variable `hga_outcr`, plot the mean activity across outcomes for the brain region _anterior insula_ (`aINS`)
</p></div>

In [None]:
# write your answer

### 3.2 Plot all outcomes

<div class="alert alert-warning"><p>

**[Instructions]**    
On the variable `hga_outcr`, plot the hga of each outcome (i.e. 4 superimposed lines where each color describes the hga of a single outcome), only for the anterior insula.
    
_help : hga.plot(x="...", hue="...")_
</p></div>

In [None]:
# write your answer

### 3.3 Plot all outcomes for all of the brain regions

<div class="alert alert-warning"><p>

**[Instructions]**    
Same as above: on the variable `hga_outcr`, plot the hga of each outcome (i.e. 4 superimposed lines where each color describes the hga of a single outcome), but this time dedicate each column (or row, as you prefer!) to a single brain region
    
_help : hga.plot(x="...", hue="...", col="...")_
</p></div>

In [None]:
# write your answer