# Power Spectrum Analysis

In short, Power Spectrum Analysis or Power Spectral Density (PSD) is one of the standard methods used to quantify the EEG signal. **Power spectrum** reflects the frequency content (related to the activity bands introduced in the 2nd notebook), or the distribution of signal power over frequency. 

In this notebook you will be introduced to a few fundamental aspects of analysing EEG (or MEG) data. We will walk step by step through more efficient ways of conducting visual inspection of the EEG, focus on summary statistics (mean, variance, sd) and on autocovariance, and learn about the spectral density (or power spectrum), the Fourier Transform and more.

In [None]:
# Load the libraries
import os
import scipy as sp
import numpy as np
from pylab import *
from numpy import sqrt, where
from numpy.fft import fft, rfft
from scipy.signal import spectrogram 
from scipy.io import loadmat
import matplotlib.pyplot as plt

%matplotlib inline 

## Quick overview of the notebook

In the next block of code, we will quickly run an analysis to introduce you to the core concepts of this notebook. Before moving to the more detailed parts that come next, please play around with the code of the overview and make sure you understand all of its aspects. In my experience, understanding the EEG signal can be difficult and will probably require effort, but if you spend some time in the beginning to comprehend the fundamental aspects, things will be much easier when the time comes to analyse your own data.

In [None]:
# Define paths:
root_dir = '/Users/christinadelta/Desktop/intro_to_eeg_analyses/'
mat_dir = os.path.join(root_dir, 'data', 'my_matfiles')

# define file-specific paths:
eegfile = os.path.join(mat_dir, 'eeg.mat') # path for the eeg data file
eegtimesfiles = os.path.join(mat_dir, 'eegtime.mat')  # path for the eeg time-points file

# load the data
eeg = loadmat(eegfile)['eeg'].reshape(-1)
eeg_times = loadmat(eegtimesfiles)['time'][0]

ti = eeg_times[1] - eeg_times[0] # define the interval between each time point
eeglen = eeg.shape[0] # length of data matrix
total_time = eeglen * ti # duration of the data

# now moving to the more complicated functions:
ft_eeg = fft(eeg - eeg.mean()) # run fourier transform of the eeg
ft_eeg.shape

spectrum_eeg = 2 * ti ** 2 / total_time * (ft_eeg * ft_eeg.conj()) # calculate eeg spectrum 
spectrum_eeg = spectrum_eeg[:int(len(eeg) / 2)] # remove negative frequencies-values

fres = 1 / total_time.max() # frequency resolution
fn = 1 / ti / 2 # nyquist frequency 
xx = np.arange(0, fn,fres) # this is how the frequency will be plot in the x-axis

# make a plot 
plt.plot(xx, spectrum_eeg.real) # plot power spectrum vs frequency 
plt.xlim([0, 100]) # this is the frequency range
plt.xlabel('frequency in Hz')
plt.ylabel(r'Power [$\mu V^2$/Hz]') # raw string so that \mu is not treated as an escape sequence

Play with the code above and try to understand it.
* Do you understand how the data is loaded?
* Do you understand what the ```.reshape()``` function does?
* Do you understand how we compute the power spectrum and the rest? 
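
If the spectrum computation feels opaque, it may help to run the same recipe on a signal whose answer we already know. The sketch below is only an illustration: it uses a synthetic 10 Hz sine wave (an assumption, not the EEG data) sampled at 1000 Hz for 2 sec, and the names `signal`, `freqs` and `peak_freq` are made up for this example.

```python
import numpy as np

# Synthetic stand-in for the recording (an assumption, not the notebook's data):
# 2 s sampled at 1000 Hz, containing a 10 Hz sine wave of amplitude 1.
ti = 0.001                            # sampling interval in seconds
times = np.arange(2000) * ti          # 2000 time points: 0.000 ... 1.999 s
signal = np.sin(2 * np.pi * 10 * times)

n = signal.shape[0]                   # number of samples
total_time = n * ti                   # duration of the signal in seconds

# Same recipe as in the overview cell
ft = np.fft.fft(signal - signal.mean())
spectrum = (2 * ti ** 2 / total_time * (ft * ft.conj())).real
spectrum = spectrum[: n // 2]         # keep the non-negative frequencies only

fres = 1 / total_time                 # frequency resolution in Hz
freqs = np.arange(n // 2) * fres      # one frequency value per spectrum bin

peak_freq = freqs[np.argmax(spectrum)]   # frequency with the most power
total_power = np.sum(spectrum) * fres    # area under the spectrum

print(peak_freq, total_power)
```

On this synthetic signal the peak lands at 10 Hz, and the area under the spectrum comes out close to 0.5, which is the variance of a unit-amplitude sine wave. That is what the scaling factor `2 * ti ** 2 / total_time` is for: with it, the spectrum summed over frequency matches the variance of the signal.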

In the next blocks we will see all of the above in detail, step by step.

## Introduction

EEG data provide a measure of brain voltage activity with very high temporal resolution (on the order of milliseconds) but poor spatial resolution (around $10cm^{2}$ of cortex).

Here we will learn how to analyse EEG data to determine what rhythmic activity is present. Along the way we will learn about important techniques to characterise rhythms in data. You will be introduced to the **Fourier Transform (FT)** and **Power Spectral Density (PSD)**, and to several other methods associated with these techniques.

The dataset we are working with is the same as in the previous notebook. We will use data from one subject again, only here I added more data. This dataset contains 2 sec of recording (instead of the one sec that we had in the previous notebook). The data is saved in the mat file called ```eeg.mat```. If you wonder how I saved the EEG signal in a mat file, ask me directly.
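
To get a feel for what a mat file looks like from the Python side, here is a small round trip with `scipy.io.savemat` and `scipy.io.loadmat`. This is a generic sketch with random numbers and a temporary file, not necessarily how `eeg.mat` was produced; the variable name `'eeg'` and the file name `example.mat` are chosen only for illustration.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Random numbers standing in for a 1-D EEG trace (illustration only)
rng = np.random.default_rng(0)
signal = rng.standard_normal(2000)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.mat')   # hypothetical file name
    savemat(path, {'eeg': signal})                # store the vector under the key 'eeg'
    contents = loadmat(path)                      # dict: variable names -> arrays

loaded = contents['eeg']      # MATLAB arrays are at least 2-D, so this is a row vector
flat = loaded.reshape(-1)     # flatten back to a 1-D array

print(loaded.shape, flat.shape)
```

Here `loaded` has shape `(1, 2000)` while `flat` has shape `(2000,)`. This is why the loading code in this notebook follows `loadmat(...)` with `.reshape(-1)` or with indexing such as `[0]`.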

### What are we going to do in this notebook? 

We will analyse the 2 seconds of EEG data by characterising the observed activity (in terms of mean, variance and sd). There are different ways to do so. We will focus on working with the **Fourier Transform**. You will first learn how to compute the **FT** and the associated spectrum. This technique provides a nice way to assess rhythmic structure in time series data.

In a nutshell, these are the steps we are going to follow:
* Initial data visualisation 
* Computation of mean, sd, variance and autocovariance
* Introduction to the power spectrum and the spectrogram

### Initial data visualisation
Before running any type of analysis on your data, run a visual inspection to make sure that everything is fine:

In [None]:
# Define paths:
root_dir = '/Users/christinadelta/Desktop/intro_to_eeg_analyses/'
mat_dir = os.path.join(root_dir, 'data', 'my_matfiles')
eegfile = os.path.join(mat_dir, 'eeg_2.mat') # path for the eeg data file
eegtimesfiles = os.path.join(mat_dir, 'tp.mat')  # path for the eeg time-points file

# load the data
eeg = loadmat(eegfile)['eeg_2'][:, 0]
tp = loadmat(eegtimesfiles)['time'][0]

# plot the signal 
plt.plot(tp, eeg)
plt.xlabel('Time in seconds')
plt.ylabel(r'Voltage [$\mu$V]') # raw string so that \mu is not treated as an escape sequence

Take a look at the graph. What can you tell about the EEG signal?

Probably your first thought is that the activity is very rhythmic, meaning that the EEG data goes up and down periodically in time. This is called dominant rhythmic activity. This is a fine *qualitative observation*, but we need to go beyond it and make it *quantitative*.

We can approximate the frequency of this rhythmic activity by counting the number of oscillations that occur in a 1 second interval. To do that, we count the total number of maxima (or peaks) in our data and divide by the duration of the data in seconds (i.e. total number of maxima divided by 2).
This may look quite convenient here because we have such a short time series (only 2 sec) to play with, but in the real world your EEG data will be much longer, and counting so many maxima over an extended interval is error-prone. What we can do instead is count the number of maxima in a smaller interval (e.g. the first 0.2 sec) and then multiply by 5 to estimate the number of peaks per second.
For example, if we have 12 peaks in 0.2 sec and we multiply this by 5 we have: $12*5 = 60$. This means 60 peaks per second, or 60 cycles per second, or 60 Hz.

If you count the maxima in the first 0.2 sec of our data, you will see that we have 12 peaks and $12*5 = 60$, which corresponds to 60 Hz, a frequency in the gamma band. This is pretty nice because high frequency oscillations in the range of 40-60 Hz (the gamma band oscillations) are thought to be associated with cognitive processing in the brain. However, we shouldn't draw conclusions just yet. In normal rhythmic brain activity we would expect the rhythm to be spread over a range of neighboring frequencies. The rhythmic activity we see here is very regular and concentrated around 60 Hz, which is also the frequency of mains electricity in some countries, so we conclude that this is probably electrical noise.
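
The peak-counting estimate can also be done in code rather than by eye. The sketch below uses `scipy.signal.find_peaks` on a synthetic 60 Hz sine wave (an assumption standing in for the EEG trace, sampled at 1000 Hz); the names `segment`, `n_peaks` and `estimated_freq` are made up for this example. On real, noisy data the raw peak count is much less reliable.

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic stand-in (an assumption): a 60 Hz sine wave sampled at 1000 Hz for 2 s
fs = 1000                          # sampling frequency in Hz
t = np.arange(2 * fs) / fs         # time axis in seconds
x = np.sin(2 * np.pi * 60 * t)

# Count the local maxima in the first 0.2 s
window = 0.2                       # length of the counting interval in seconds
segment = x[t < window]
peaks, _ = find_peaks(segment)     # indices of the local maxima
n_peaks = len(peaks)

# Scale the count to peaks per second (same as multiplying by 5 for 0.2 s)
estimated_freq = n_peaks / window

print(n_peaks, estimated_freq)
```

On this synthetic signal the count is 12 peaks, which gives an estimate of about 60 Hz, the same arithmetic as in the text.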

So, what can we do when we realise that our rhythmic activity is dominated by electrical noise? Should we abandon this dataset and move on to analyse another? Well, this is not advised. What we can do instead is keep analysing this dataset. Don't forget that when analysing EEG signal, only 25% of the information there is task related; the remaining 75% is noise + irrelevant brain activity. Besides, in future tutorials you will see that there are very powerful algorithms dedicated to cleaning the signal for us.

Back to our noisy data. As decided, we continue working on it because there is probably *some* information hiding in the signal background. 

### Sampling  
When we plot the data, the voltage trace looks like a continuous line. However, this is NOT the case. If we pick one interval of this trace and zoom in, we find that the data consist of discrete points. Although the true brain signal may indeed evolve continuously in time, we do not observe this true signal in the EEG recording. Instead, what we collect is a discrete sampling of this signal in time. Think of the EEG data as measurements taken at time points that lie very close to one another. The recording device determines how close these time points are, and the plotting function draws lines connecting them so that the signal appears continuous. Measuring the signal at these regular time points is called "sampling", and the number of samples per second is the sampling frequency. Our data were sampled at 1000 Hz, which corresponds to 1 sample every 1 millisecond.

So, we observe not the true signal but discrete samples of the signal in time. Those points are so close to each other that the signal appears as a continuum in time. 

Let's visualise it:

In [None]:
plt.plot(tp[:25], eeg[:25], 'o-') # plot the first 25 time points of the eeg
plt.xlabel('Time in seconds')
plt.ylabel(r'Voltage [$\mu$V]') # raw string so that \mu is not treated as an escape sequence
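
The link between the spacing of the time points and the sampling frequency can be checked directly from a time axis. The sketch below builds a hypothetical time vector that matches the description above (2 sec at 1000 Hz) instead of loading `tp.mat`; the names `dt`, `fs_est` and `nyquist` are made up for this example.

```python
import numpy as np

# Hypothetical time axis matching the description above: 2 s sampled at 1000 Hz
fs = 1000                        # sampling frequency in Hz
tp = np.arange(2 * fs) / fs      # 0.000, 0.001, ..., 1.999 seconds

dt = tp[1] - tp[0]               # sampling interval in seconds
fs_est = 1 / dt                  # sampling frequency recovered from the time axis
nyquist = fs_est / 2             # Nyquist frequency, half the sampling frequency

# The samples should be evenly spaced; this check guards against gaps in the time axis
evenly_spaced = np.allclose(np.diff(tp), dt)

print(dt, fs_est, nyquist, evenly_spaced)
```

With a real recording you would replace the synthetic `tp` with the loaded time vector. A sampling interval of 1 millisecond gives back a sampling frequency of 1000 Hz and a Nyquist frequency of 500 Hz.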