# Instructions

First, make sure you have followed the steps sent by email:

_To make the most of the workshop, you should bring your laptop with Anaconda installed (see [anaconda](https://www.anaconda.com/products/distribution) for installation instructions). Once conda is installed, you can already download MNE-Python (https://mne.tools/) to save time during the set-up. The recommended way is to use a dedicated conda environment as follows (see [conda managing environments](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html)):_
```
$ conda create --override-channels --channel=conda-forge --name=hmp mne
```

Then activate the newly created environment in a terminal (Linux/macOS) or in the Anaconda Prompt (Windows), and install HMP along with a pinned Matplotlib version:
```
conda activate hmp
pip install hmp
pip install matplotlib==3.7.1  # pinned because of an issue with non-interactive plotting and MNE
```

If everything installed without errors, you can download the tutorials folder from the associated GitHub page: https://github.com/GWeindel/hsmm_mvpy/tree/main/tutorials
If you had trouble during the installation, contact us.

Using the terminal/prompt, navigate to the folder where you downloaded the repository and launch a Jupyter Notebook session (alternatively, you can first launch the following command and then navigate to the folder from the Jupyter interface):

```
jupyter notebook
```

You should see the contents of the folder you downloaded; during the workshop we will be using the notebooks in the /workshop folder.
After this, execute the cells of this notebook up to the Data Format section and wait for the workshop to resume.

# Practical, methodological and theoretical grounds

## Data 

### Simulation

In [None]:
## Importing these packages is specific for this simulation case
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import gamma

## Importing HMP
import hsmm_mvpy as hmp
from hsmm_mvpy import simulations

In [None]:
cpus = 2 # For multiprocessing, usually a good idea to use multiple CPUs as long as you have enough RAM

n_trials = 20 #Number of trials to simulate

##### Here we define the sources of the brain activity (event) for each trial
n_sources = 4
frequency = 10. # Frequency of the event defining its duration: a half-sine of 10 Hz lasts 50 ms
amplitude = .5e-6 # Amplitude of the event in Am (0.5e-6 Am = 500 nAm), defining the signal-to-noise ratio
shape = 2 # Shape of the gamma distribution
means = np.array([60, 150, 200, 100, 80])/shape # Mean stage durations in ms, divided by the shape to obtain the gamma scale
names = simulations.available_sources()[:n_sources+1] # One source per stage: n_sources events plus the response (see atlas when calling simulations.available_sources())

sources = []
for name, scale in zip(names, means): # One source = one frequency, one amplitude and a given by-trial variability distribution
    sources.append([name, frequency, amplitude, gamma(shape, scale=scale)])

# Function used to generate the data
file = simulations.simulate(sources, n_trials, cpus, 'dataset_raw', location=25, overwrite=False)
#Recovering sampling frequency of the simulated dataset
sfreq = simulations.simulation_sfreq()
#load electrode position, specific to the simulations
positions = simulations.simulation_positions()

The cell above simulates a realistic EEG dataset for a single participant.
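The simulation draws each stage duration from a gamma distribution whose mean is `shape * scale`, which is why the mean durations are divided by `shape` in the simulation cell. A minimal SciPy sketch of that relation, treating stages as independent and ignoring the `location=25` argument of `simulations.simulate` (its exact effect is not shown here):

```python
import numpy as np
from scipy.stats import gamma

shape = 2
mean_durations = np.array([60, 150, 200, 100, 80])  # same values as in the simulation cell (ms)
scales = mean_durations / shape                      # gamma mean = shape * scale

rng = np.random.default_rng(0)
n_trials = 10_000
# One row per simulated trial, one column per stage
durations = gamma.rvs(shape, scale=scales, size=(n_trials, len(scales)), random_state=rng)
rts = durations.sum(axis=1)  # without any offset, RT = sum of the stage durations

print(gamma.mean(shape, scale=scales))  # exactly the mean durations above
print(durations.mean(axis=0))           # close to the mean durations
print(rts.mean())                       # close to 590
```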

In [None]:
#Recovering the events to epoch the data (in the number of trials defined above)
generating_events = np.load(file[1])
resp_trigger = int(np.max(np.unique(generating_events[:,2])))#Resp trigger is the last source in each trial
event_id = {'stimulus':1}#trigger 1 = stimulus
resp_id = {'response':resp_trigger}
# All generating events are kept (stimulus, intermediate sources and response);
# uncomment the filter to retain only the stimulus and response triggers
events = generating_events#[(generating_events[:,2] == 1) | (generating_events[:,2] == resp_trigger)]

#Visualising the raw simulated EEG data
import mne
raw = mne.io.read_raw_fif(file[0], preload=False, verbose=False)
raw.pick_types(eeg=True).plot(scalings=dict(eeg=1e-5), events=events, block=True);

### Data Format 
Now we read the data into an xarray Dataset.

In [None]:
# Reading the data
eeg_data = hmp.utils.read_mne_data(file[0], event_id=event_id, resp_id=resp_id, sfreq=sfreq, 
            events_provided=events, verbose=False)
eeg_data

The previous function epochs the data, cuts each epoch at the trial's RT, and stores any additional information alongside it (here 'event_name').
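Conceptually, this epoching keeps for each trial only the samples between the stimulus and the response, and pads shorter trials with NaN so that all epochs fit in one array. A plain NumPy sketch of that idea; `epoch_until_response` is a hypothetical helper, not part of HMP, and it stops just before the response sample (whether `read_mne_data` includes that sample is not assumed here):

```python
import numpy as np

def epoch_until_response(continuous, stim_samples, resp_samples):
    """Hypothetical helper: cut (channels, times) data into NaN-padded epochs.

    Returns an array of shape (n_epochs, n_channels, longest_epoch) where each
    epoch runs from its stimulus sample up to (not including) its response sample.
    """
    stim_samples = np.asarray(stim_samples)
    lengths = np.asarray(resp_samples) - stim_samples
    epochs = np.full((len(lengths), continuous.shape[0], lengths.max()), np.nan)
    for i, (start, n) in enumerate(zip(stim_samples, lengths)):
        epochs[i, :, :n] = continuous[:, start:start + n]
    return epochs

rng = np.random.default_rng(1)
continuous = rng.standard_normal((3, 1000))  # 3 channels, 1000 samples of toy data
epochs = epoch_until_response(continuous, stim_samples=[100, 400, 700], resp_samples=[250, 480, 900])
print(epochs.shape)                   # (3, 3, 200)
print(np.isnan(epochs[1, 0]).sum())   # 120: the 80-sample trial is padded to 200
```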


In [None]:
#example of usage of xarray
print(eeg_data)
eeg_data.sel(channels=['EEG 001','EEG 002','EEG 003'], samples=range(400))\
    .data.groupby('samples').mean(['participant','epochs']).plot.line(hue='channels');

Transformation of the data:
- Standardise the individual variances (disabled below with `apply_standard=False`)
- Apply a PCA
- Z-score the data
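The three steps can be illustrated on a plain samples-by-channels array. This NumPy sketch is an assumption about the general recipe, not the code of `hmp.utils.transform_data`, whose defaults (number of components, per-trial versus global z-scoring) may differ; `transform_sketch` is a hypothetical name:

```python
import numpy as np

def transform_sketch(data, n_comp, standardise=False):
    """data: (n_samples, n_channels) with all trials' samples stacked, NaN padding removed."""
    if standardise:
        # Step 1: rescale to unit overall variance (done per participant when pooling several)
        data = data / data.std()
    centered = data - data.mean(axis=0)
    # Step 2: PCA via the eigendecomposition of the channel covariance matrix
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_comp]  # keep the largest-variance components
    components = centered @ eigvecs[:, order]
    # Step 3: z-score each component (here across all samples)
    return (components - components.mean(axis=0)) / components.std(axis=0)

rng = np.random.default_rng(3)
# 8 toy channels driven by 3 latent sources plus a little noise
data = rng.standard_normal((2000, 3)) @ rng.standard_normal((3, 8)) + 0.1 * rng.standard_normal((2000, 8))
pcs = transform_sketch(data, n_comp=3)
print(pcs.shape)  # (2000, 3)
```

Because the components are projections onto eigenvectors of the covariance matrix, they are mutually uncorrelated, and z-scoring keeps them that way.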

In [None]:
hmp_data = hmp.utils.transform_data(eeg_data, apply_standard=False)

In [None]:
print(hmp_data)

In [None]:
hmp.visu.plot_components_sensor(hmp_data, positions)

### HMP and implementation assumptions

The model follows Anderson, Zhang, Borst & Walsh (2016).

In [None]:
init = hmp.models.hmp(data=hmp_data, eeg_data=eeg_data, sfreq=eeg_data.sfreq,
                      event_width=50, distribution='gamma', shape=2, location=25)

Template of a 50 ms bump (i.e., a 10 Hz half-sine):
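The template is a single half period of a sine wave whose duration equals `event_width`. A minimal NumPy sketch, assuming samples are taken at bin centres and the template is scaled to unit energy (both are assumptions; HMP's exact sampling and normalisation may differ). `half_sine_template` is a hypothetical helper and `sfreq=100` is only an example value:

```python
import numpy as np

def half_sine_template(event_width_ms, sfreq):
    """Half period of a sine lasting event_width_ms, sampled at sfreq (Hz)."""
    n_samples = int(round(event_width_ms / 1000 * sfreq))
    t = (np.arange(n_samples) + 0.5) / sfreq    # bin centres, in seconds
    freq = 1000 / (2 * event_width_ms)          # 50 ms half period -> 10 Hz
    template = np.sin(2 * np.pi * freq * t)
    return template / np.linalg.norm(template)  # unit energy: sum of squares = 1

template = half_sine_template(50, sfreq=100)
print(len(template))  # 5 samples for 50 ms at 100 Hz
```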

In [None]:
plt.plot(init.template, 'x')
plt.ylabel('Normalized value')
plt.xlabel('Samples NOT time');

When calling `hmp.models.hmp`, the function automatically cross-correlates the data with this template.
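Cross-correlating a component with the template gives, at each sample, how well a bump centred on that sample matches the signal, so peaks mark candidate event locations. A NumPy sketch of that operation with `np.correlate` (HMP's own implementation may treat edges and alignment differently; `crosscorrelate` is a hypothetical name and the template is rebuilt inline so the block runs on its own):

```python
import numpy as np

# Toy bump template: 5-sample half-sine with unit energy
template = np.sin(np.pi * (np.arange(5) + 0.5) / 5)
template /= np.linalg.norm(template)

def crosscorrelate(trial, template):
    """trial: (n_samples, n_components); returns an array of the same shape."""
    out = np.empty(trial.shape)
    for comp in range(trial.shape[1]):
        # mode='same' keeps the trial length; with an odd-length template,
        # out[i] compares the template with the samples centred on i
        out[:, comp] = np.correlate(trial[:, comp], template, mode='same')
    return out

trial = np.zeros((100, 1))
trial[40:45, 0] = template       # bump spanning samples 40-44, centred on 42
xcorr = crosscorrelate(trial, template)
print(xcorr[:, 0].argmax())      # 42
```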

In [None]:
plt.plot(init.data_matrix[:,0,:]);

number_of_sources = len(np.unique(generating_events[:,2])[1:])#one trigger = one source
#Recover the actual time of the simulated events
random_source_times = np.reshape(np.ediff1d(generating_events[:,0],to_begin=0)[generating_events[:,2] > 1], \
           (n_trials, number_of_sources))
plt.vlines(random_source_times[0,:-1].cumsum()-1, -3, 3, 'k');#overlaying the simulated stage transition times

### HMP parameters

For each stage, the parameters (shape and scale) of the gamma distribution of stage durations:

In [None]:
true_pars = np.reshape(np.concatenate([
    np.repeat(init.shape, np.shape(random_source_times)[1]), 
    np.mean(random_source_times/init.shape,axis=0)]),
                       [2,np.shape(random_source_times)[1]]).T

T = 350 # Time range to display, in samples
t = np.linspace(0, T, 1001)
for stage, (stage_shape, stage_scale) in enumerate(true_pars):
    plt.plot(t, gamma.pdf(t, stage_shape, scale=stage_scale), label=f'Stage {stage}')
plt.xlabel('Time (samples)')
plt.legend();

In [None]:
true_pars

For each event, the magnitudes: the contribution of each principal component (and, through the PCA weights, of each electrode) to the event.
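The indexing in the next cell appears to assume that `init.events` stacks all trials along one sample axis, with `init.starts` giving the first sample of each trial; a magnitude is then the average of the cross-correlated components at the event sample of every trial. A toy NumPy version of that bookkeeping (all array sizes and event positions are made-up values):

```python
import numpy as np

rng = np.random.default_rng(2)
lengths = np.array([80, 120, 100])                       # samples per trial (toy values)
starts = np.concatenate([[0], np.cumsum(lengths)[:-1]])  # first sample of each trial: 0, 80, 200
events = rng.standard_normal((lengths.sum(), 3))         # stacked (samples, components)

event_sample = np.array([30, 50, 40])  # event position within each trial (toy values)
idx = starts + event_sample            # positions in the stacked array: 30, 130, 240
magnitudes = events[idx].mean(axis=0)  # one magnitude per component
print(magnitudes.shape)  # (3,)
```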

In [None]:
sample_times = np.zeros((init.n_trials, n_sources), dtype=int)
for event in range(n_sources):
    sample_times[:,event] = init.starts+np.sum(random_source_times[:,:event+1], axis=1)-1
true_magnitudes = np.mean(init.events[sample_times[:,:]], axis=0)

In [None]:
true_magnitudes

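Together, these two sets of parameters fully specify an HMP. Here we pass the true values with `maximization=False`, so the model is evaluated at these parameters rather than estimating them.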

In [None]:
estimates = init.fit_single(n_sources, parameters = true_pars, magnitudes=true_magnitudes, maximization=False)

hmp.visu.plot_topo_timecourse(eeg_data, estimates, positions, init, magnify=1, sensors=True, 
        times_to_display = np.mean(np.cumsum(random_source_times,axis=1),axis=0))

To estimate this set of parameters (magnitudes and gamma scales), we use the expectation-maximization algorithm on the event probabilities obtained through the Baum-Welch algorithm.
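In the maximization step, each parameter is re-estimated from the event probabilities computed in the expectation step. A toy sketch for a single event, not HMP's implementation: the forward-backward pass that produces `probs`, multiple events and the location offset are all left out, and `m_step_single_event` is a hypothetical name. With the gamma shape fixed, the maximum-likelihood scale is the mean duration divided by the shape, and the magnitudes are probability-weighted averages of the data:

```python
import numpy as np

def m_step_single_event(probs, data, shape):
    """Toy M-step for one event.

    probs: (n_trials, n_samples) probability of the event at each sample,
           each row summing to 1 (output of an E-step, not computed here).
    data:  (n_trials, n_samples, n_comp) cross-correlated components.
    """
    samples = np.arange(probs.shape[1])
    expected_time = probs @ samples           # expected event sample in each trial
    scale = expected_time.mean() / shape      # fixed-shape gamma MLE: mean / shape
    # Probability-weighted average of the data, then averaged over trials
    magnitudes = np.einsum('ts,tsc->c', probs, data) / probs.shape[0]
    return scale, magnitudes

rng = np.random.default_rng(4)
data = rng.standard_normal((3, 60, 2))
probs = np.zeros((3, 60))
probs[[0, 1, 2], [20, 40, 30]] = 1.0          # certain event locations, for illustration
scale, magnitudes = m_step_single_event(probs, data, shape=2)
print(scale)  # 15.0: mean event time of 30 samples divided by shape 2
```

In the full algorithm the probabilities are spread over many samples, and the E-step and M-step alternate until the log-likelihood stops increasing.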

In [None]:
estimates = init.fit_single(n_sources)

hmp.visu.plot_topo_timecourse(eeg_data, estimates, positions, init, magnify=1, sensors=True, 
        times_to_display = np.mean(np.cumsum(random_source_times,axis=1),axis=0))

In [None]:
plt.plot(estimates.traces)
plt.ylabel('Log-likelihood')
plt.xlabel('EM iteration');
