## The Dataset

The dataset contains EEG data sampled at 500 Hz, with 1000 trials in each of two conditions.

In [None]:
# load libraries
import os
import scipy as sp
import numpy as np
from numpy import sqrt
from scipy.io import loadmat
import matplotlib.pyplot as plt

%matplotlib inline 

Load the data that we will work on. The data come in two `.mat` files. The first file is a 1000 by 1000 matrix that holds both conditions ```(trials x time)```: the first 500 columns (time points) belong to the first condition and the last 500 columns belong to the second condition. The second file is a 1 by 500 matrix that holds the time axis.

In [None]:
# Define paths:
root_dir = '/Users/christinadelta/Desktop/intro_to_eeg_analyses/'
mat_dir = os.path.join(root_dir, 'data', 'my_matfiles')

In [None]:
# define file-specific paths:
eegcond_dir = os.path.join(mat_dir, 'eeg_allcond.mat') # path for the eeg data file
eegtimes_dir = os.path.join(mat_dir, 'eeg_times.mat')  # path for the time-points file

# load the data files using the scipy.io.loadmat() function
eeg_conditions = loadmat(eegcond_dir)['eeg_allcond'] # a 1000 x 1000 matrix from a .mat file
eeg_times = loadmat(eegtimes_dir)['eeg_times'][0] # a 1 x 500 matrix from a .mat file; [0] keeps the single row as a 1-D array

# look at the data
print(eeg_conditions.shape)
print(eeg_times.shape)

The ```eeg_conditions``` file contains both conditions in a 1000 by 1000 matrix:
* columns 1:500 belong to condition A
* columns 501:1000 belong to condition B

Note that the ```eeg_times``` matrix corresponds to a recording of one second. This will become clearer when we plot it.
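
The link between the sampling rate and the one-second time axis can be checked with a short sketch. It builds a synthetic time vector instead of reading the real ```eeg_times``` file, and it assumes the epoch starts at 0 s, which may not hold for the actual recording. The name `times_demo` is made up for this illustration.

```python
import numpy as np

fs = 500          # sampling rate in Hz (from the dataset description)
n_samples = 500   # time points per condition

# Hypothetical time axis: assumes the epoch starts at 0 s
times_demo = np.arange(n_samples) / fs

dt = 1 / fs                 # spacing between neighbouring samples, in seconds
duration = n_samples / fs   # length of one epoch, in seconds

print(dt)        # 0.002
print(duration)  # 1.0
```

So 500 samples at 500 Hz cover one second, with one sample every 2 ms.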

Split the matrix into two matrices, one containing the EEG data for condition A and the other containing the EEG data for condition B:

In [None]:
eeg_a = eeg_conditions[:, 0:500] # the EEG data for condition A
eeg_b = eeg_conditions[:, 500:1000] # the EEG data for condition B

# look at their shapes. If they are both 1000 x 500 then everything is alright
# (print both, because a notebook cell only displays its last bare expression)
print(eeg_a.shape)
print(eeg_b.shape)
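
As an alternative to slicing by hand, the same split can be sketched with `np.hsplit`, which cuts a matrix into equal column blocks. The example runs on random numbers standing in for ```eeg_conditions```, so the names `demo`, `demo_a` and `demo_b` are placeholders, not part of the dataset.

```python
import numpy as np

# Synthetic stand-in for eeg_conditions (trials x time), NOT the real data
rng = np.random.default_rng(0)
demo = rng.normal(size=(1000, 1000))

# Split the columns into two equal halves: condition A first, then condition B
demo_a, demo_b = np.hsplit(demo, 2)

# Fail loudly if the shapes are not what we expect
assert demo_a.shape == (1000, 500)
assert demo_b.shape == (1000, 500)

# The halves match the explicit slices
print(np.array_equal(demo_a, demo[:, :500]))   # True
print(np.array_equal(demo_b, demo[:, 500:]))   # True
```

Explicit slices are easier to read in a tutorial, while `np.hsplit` avoids typing the column indices twice.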

The rows of the matrices correspond to the 1000 trials of this dataset. Let's compute the mean and SD of the signal across trials. For the sake of this example, we will only compute the descriptive statistics for condition A:

In [None]:
# compute the average ERP 
trials = len(eeg_a)
mean_a = eeg_a.mean(0) # axis 0 runs along the rows (trials), so this averages across trials at each time point
sd_a = eeg_a.std(0) # compute the sd across trials
se_a = sd_a / sqrt(trials) # compute standard error of the mean
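
One detail worth knowing: NumPy's `.std()` divides by N by default (`ddof=0`), whereas the sample SD divides by N - 1 (`ddof=1`). The sketch below compares the two on synthetic data and cross-checks the standard error against `scipy.stats.sem`, which uses `ddof=1` by default. The array `demo_a` is random noise standing in for ```eeg_a```.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for eeg_a (trials x time), NOT the real data
rng = np.random.default_rng(1)
demo_a = rng.normal(size=(1000, 500))

n_trials = demo_a.shape[0]
sd_pop = demo_a.std(axis=0)              # ddof=0, NumPy's default
sd_sample = demo_a.std(axis=0, ddof=1)   # ddof=1, sample SD

se_manual = sd_sample / np.sqrt(n_trials)
se_scipy = stats.sem(demo_a, axis=0)     # ddof=1 by default

print(np.allclose(se_manual, se_scipy))            # True
print(np.max(np.abs(sd_sample - sd_pop)) < 1e-2)   # True
```

With 1000 trials the two SD estimates are nearly identical, so the choice matters little here. It matters more when there are only a few trials.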

Great! We computed the ERP for condition A. Now let's visualise it. Plot the ERP for condition A and mark an approximate 95% confidence interval (CI), the mean ± 2 standard errors, as dotted lines around the signal:

In [None]:
plt.plot(eeg_times, mean_a, 'k', lw=3) # ERP
plt.plot(eeg_times, mean_a + 2 * se_a, 'k:', lw=1) # upper CI
plt.plot(eeg_times, mean_a - 2 * se_a, 'k:', lw=1) # lower CI
plt.xlabel('Time in seconds')
plt.ylabel(r'Voltage [$\mu$V]') # raw string, so that \m is not read as an escape sequence
plt.title('ERP of condition A')

# save the plot before calling plt.show(): with the inline backend, show() closes
# the figure, and a later savefig() would typically write a blank image
save_file = os.path.join(root_dir, 'figures', 'ERP_condA.png')
plt.savefig(save_file, facecolor='w', edgecolor='w')
plt.show()
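
If a shaded band is preferred over dotted lines, `fill_between` is one way to draw it. This is only a sketch on synthetic data: `times_demo` and `demo_a` are placeholders for ```eeg_times``` and ```eeg_a```, and the 5 Hz sine is an arbitrary choice to give the average some visible shape.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-ins for eeg_times and eeg_a, NOT the real data
rng = np.random.default_rng(2)
times_demo = np.arange(500) / 500
demo_a = np.sin(2 * np.pi * 5 * times_demo) + rng.normal(size=(1000, 500))

mean_demo = demo_a.mean(axis=0)
se_demo = demo_a.std(axis=0) / np.sqrt(demo_a.shape[0])

fig, ax = plt.subplots()
ax.plot(times_demo, mean_demo, 'k', lw=2)      # average across trials
ax.fill_between(times_demo,
                mean_demo - 2 * se_demo,       # lower edge of the band
                mean_demo + 2 * se_demo,       # upper edge of the band
                color='k', alpha=0.2)          # semi-transparent shading
ax.set_xlabel('Time in seconds')
ax.set_ylabel(r'Voltage [$\mu$V]')
ax.set_title('Average with shaded mean ± 2 SE (synthetic data)')
```

The `alpha` value controls how transparent the band is, so the average line stays visible on top of it.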

### The EEG signal 

The voltage recorded from the surface of the scalp (the electroencephalogram, EEG) provides very useful information about the temporal dynamics of neural activity at the millisecond scale. One of the main reasons we run EEG analyses is to investigate **neural oscillations**, or **rhythmic activity**, which is observed in different frequency bands:

* **Delta activity is observed at 1-4Hz** and is linked to a broad variety of perceptual, sensorimotor, and cognitive operations. Delta rhythms are very commonly associated with stage 3 of NREM sleep, also known as slow-wave sleep (SWS), and help to characterise the depth of sleep.
* **Theta activity is observed at 5-8Hz** and underlies various aspects of cognition and behavior, including learning, memory, and spatial navigation in many animals. There are two types of theta activity: the *hippocampal theta rhythm*, a strong oscillation that can be observed in the hippocampus and other brain structures, and the *cortical theta rhythm*, the low-frequency components of scalp EEG, usually recorded from humans.
* **Alpha activity is observed at 8-12Hz**, likely originating from the synchronous and coherent (in phase or constructive) electrical activity of thalamic pacemaker cells in humans. Alpha is the most studied rhythm and is thought to have at least two different forms, which may have different functions in the wake-sleep cycle:
    * The most widely researched is during the relaxed mental state, where the subject is at rest with eyes closed, but is not tired or asleep. This alpha activity is centered in the occipital lobe, although there has been speculation that it has a thalamic origin.
    * The second form of alpha wave activity occurs during REM sleep. As opposed to the awake form of alpha activity, this form is located in a frontal-central location in the brain. The purpose of alpha activity during REM sleep is not yet fully understood. There are arguments both for the view that alpha patterns are a normal part of REM sleep and for the notion that they indicate a semi-arousal period.
* **Beta activity is observed at 13-30Hz**. Beta waves are associated with normal waking consciousness and can be split into three sub-bands (band edges differ slightly between sources, which is why the ranges below start at 12.5 Hz):
    * Low Beta Waves (12.5–16 Hz) or **Beta 1 power**
    * Beta Waves (16.5–20 Hz) or **Beta 2 power**
    * High Beta Waves (20.5–29 Hz) or **Beta 3 power**
* **Gamma activity is observed at 30-200Hz**. The exact limits vary between sources: in humans, a gamma rhythm is often described as a pattern of neural oscillation with a frequency between 25 and 140 Hz, with the 40 Hz point being of particular interest. Gamma rhythms are correlated with large-scale brain network activity and cognitive phenomena such as working memory, attention, and perceptual grouping.
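
To make the idea of frequency bands concrete, here is a minimal sketch that assigns a synthetic 10 Hz oscillation to one of the bands listed above, using NumPy's FFT. It is not applied to the real dataset. The signal, the `bands` dictionary and the choice of summed FFT power as the measure are all assumptions made for illustration.

```python
import numpy as np

fs = 500                              # sampling rate in Hz (from the dataset description)
t = np.arange(fs) / fs                # one second of samples
signal = np.sin(2 * np.pi * 10 * t)   # synthetic 10 Hz oscillation

# Band edges in Hz, copied from the list above
bands = {'delta': (1, 4), 'theta': (5, 8), 'alpha': (8, 12),
         'beta': (13, 30), 'gamma': (30, 200)}

# Frequencies of the FFT bins: 0, 1, 2, ..., 250 Hz for a 1 s epoch at 500 Hz
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)
power = np.abs(np.fft.rfft(signal)) ** 2

# Sum the power of all bins that fall inside each band (edges included)
band_power = {name: power[(freqs >= lo) & (freqs <= hi)].sum()
              for name, (lo, hi) in bands.items()}

strongest = max(band_power, key=band_power.get)
print(strongest)   # alpha
```

With a one-second epoch the frequency resolution is 1 Hz, and the highest frequency that can be represented at a 500 Hz sampling rate is 250 Hz (the Nyquist frequency). Note that neighbouring bands share an edge in the list above (for example 8 Hz belongs to both theta and alpha), so a real analysis would need to decide how to treat those bins.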

### Now back to the dataset:

We can use the ```whos``` command (an IPython magic) to get more information about our variables.

In the experiment behind this EEG recording, participants were presented with squares of two different colours. They were asked to press the space bar whenever the square was red and to do nothing when the square was green. Note that the colour/response correspondence was counterbalanced across participants.

Here, condition A corresponds to the **response emitted** trials and condition B corresponds to the **response omitted** trials.

The dataset contains 1000 trials (rows of the matrices) for each condition at 500 time points (columns of the matrices). 



In [None]:
whos

To look at the total number of trials we can use either the ```.shape``` attribute or the ```len()``` function:

In [None]:
# the len function
nb_trials = len(eeg_a) 
print(nb_trials)

# or
# the shape attribute
nb_trials = eeg_a.shape[0]
print(nb_trials)

The ```.shape``` attribute of an array is a tuple with the size of each dimension. By adding [0] we explicitly ask for the number of rows, that is, the first value of the tuple.
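
A small array makes the relationship between `len()` and `.shape` easier to see. The array `demo` below is a made-up stand-in with 4 "trials" and 3 "time points".

```python
import numpy as np

demo = np.zeros((4, 3))   # small stand-in: 4 rows x 3 columns

print(demo.shape)      # (4, 3)
print(demo.shape[0])   # 4
print(demo.shape[1])   # 3
print(len(demo))       # 4
print(demo.size)       # 12
```

`len()` only ever reports the size of the first dimension, while `.shape` gives all of them and `.size` gives the total number of elements.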

Another useful feature of Python is that we can assign two variables at once:

In [None]:
nb_trials, nb_tps = eeg_a.shape
print(nb_trials, nb_tps)

Do you understand why? If not, look [here](https://note.nkmk.me/en/python-tuple-list-unpack/) for a clear explanation.
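
In short, this works because the right-hand side is a tuple and Python hands out its elements to the names on the left, one by one. The sketch below uses a plain tuple, `shape_demo`, as a stand-in for what ```eeg_a.shape``` is expected to hold.

```python
shape_demo = (1000, 500)       # stand-in for the expected value of eeg_a.shape

n_rows, n_cols = shape_demo    # one name per element of the tuple
print(n_rows, n_cols)          # 1000 500

# The number of names must match the number of elements
try:
    a, b, c = shape_demo       # three names, but only two values
except ValueError as err:
    message = str(err)
    print(message)
```

The second part shows the typical failure: asking for three values from a two-element tuple raises a `ValueError`.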