# Preparing the data for analysis

The following script gives an overview of the basic steps applied to prepare the data for analysis.
This is called *pre-processing*, and it essentially consists of cleaning the data.
The steps applied here are:
- Downsampling
- Filtering 
- Re-referencing
- Detecting noisy electrodes


In [None]:
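import os
import matplotlib.pyplot as plt
import numpy as np
import mne
import ipympl          # Provides the interactive "widget" backend used in the next cell.
import pandas as pd

# Magic command that lets us interact with figures (here, in separate Qt windows).
# Magic commands only work in Jupyter/IPython, not in a plain Python script.
%matplotlib qt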

In [None]:
'''
    Load a BDF dataset. The file should be in the same directory as this notebook.
    We read the BDF file with MNE's input/output module (mne.io),
    which returns a Raw object.
    Then we plot the raw data in an interactive browser to inspect the channels.
'''
%matplotlib widget
fname = 'sub-001_eeg_sub-001_task-think1_eeg.bdf'
rawIn = mne.io.read_raw_bdf(fname, preload=True)

scale_dict = dict(mag=1e-12, grad=4e-11, eeg=20e-6, eog=150e-6, ecg=5e-4,
     emg=1e-3, ref_meg=1e-12, misc=1e-3, stim=1,
     resp=1, chpi=1e-4)
mne.viz.plot_raw(rawIn, duration=5.0, scalings=scale_dict, remove_dc=True)

## 1. Downsampling

The sampling frequency (or sampling rate) is the number of times per second that the acquisition system samples the continuous EEG.
So, given a sampling frequency of 1024 Hz, how often does the system sample the signal? Every ______ seconds.

The sampling rate affects the analyses that we can carry out on the EEG.
For example, if we are interested in studying EEG activity around 80 Hz, the sampling frequency needs to be **at least** twice this frequency of interest (i.e. at least 160 Hz): this is the **Nyquist rule**.
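
To see the Nyquist rule in action, here is a minimal numpy sketch (not part of the original pipeline) that samples a pure 80 Hz sine wave at two illustrative rates and reports the frequency of the largest peak in its spectrum. The helper `dominant_frequency()` is hypothetical. Sampled at 250 Hz, the peak appears at 80 Hz; sampled at 100 Hz (less than twice 80 Hz), the same wave masquerades as a 20 Hz oscillation, an effect called *aliasing*.

```python
import numpy as np

def dominant_frequency(signal_hz, sfreq, duration=2.0):
    """Sample a pure sine of frequency signal_hz at sfreq (both in Hz) and
    return the frequency of the largest peak in its amplitude spectrum."""
    n = int(round(duration * sfreq))          # number of samples
    t = np.arange(n) / sfreq                  # sample times in seconds
    x = np.sin(2 * np.pi * signal_hz * t)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(n, d=1.0 / sfreq)
    return freqs[np.argmax(spectrum)]

print(dominant_frequency(80, 250))   # above 2 x 80 Hz: the peak is at 80 Hz
print(dominant_frequency(80, 100))   # below 2 x 80 Hz: the peak is aliased to 20 Hz
```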

However, a higher sampling frequency also means a greater volume of data, which can lengthen computing times when we analyse our data.
Generally, in EEG analysis, we are interested in activity in the 0.1 Hz to 80 Hz frequency band. This means that we do not necessarily need a sampling frequency as high as 1024 Hz; a sampling frequency of 512 Hz or 250 Hz is sufficient to capture the characteristics of the EEG that interest us.
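
A back-of-the-envelope sketch of how the sampling rate drives data volume. It assumes a hypothetical 64-channel, 10-minute recording held in memory as 64-bit floats (8 bytes per sample, as MNE uses for preloaded data); the helper `recording_size_mb()` is illustrative only.

```python
def recording_size_mb(n_channels, sfreq, duration_s, bytes_per_sample=8):
    """Approximate memory footprint (in MB) of a continuous recording."""
    n_samples = int(sfreq * duration_s)           # samples per channel
    return n_channels * n_samples * bytes_per_sample / 1e6

# Hypothetical recording: 64 channels, 10 minutes (600 s).
for sfreq in (1024, 512, 250):
    print(f"{sfreq:>4} Hz: {recording_size_mb(64, sfreq, 600):6.1f} MB")
```

The size scales linearly with the sampling rate, so halving the rate halves the memory (and roughly the computing time of most per-sample operations).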

To reduce the rate at which our EEG is sampled, we can **resample** or **downsample** our data.
- How does resampling change the EEG signal?
- What other variable is automatically changed when we resample the EEG data?

In [None]:
srate = rawIn.info['sfreq']                         # Original sampling frequency (Hz), stored in the Raw object's info.
rsamp = srate / 2                                   # Downsample to half of the original sampling frequency.
rawIn_rs = rawIn.copy().resample(sfreq=rsamp)       # Create a copy of rawIn and apply downsampling to this copy.
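
Resampling is not the same as simply keeping every other sample: content above the new Nyquist frequency has to be removed first, otherwise it aliases into the downsampled data. The numpy sketch below illustrates this with a hypothetical `fft_downsample()` helper that truncates the spectrum (a simplified FFT-based approach, not MNE's actual implementation). The synthetic signal, sampled at 1024 Hz, contains a 10 Hz component and a 400 Hz component; after halving the rate, the new Nyquist frequency is 256 Hz.

```python
import numpy as np

def fft_downsample(x, factor):
    """Downsample a real 1-D signal by an integer factor by discarding all
    frequencies above the new Nyquist frequency and inverting at the new length."""
    n = x.size
    m = n // factor
    spectrum = np.fft.rfft(x)
    kept = spectrum[: m // 2 + 1]              # keep only 0 Hz .. new Nyquist
    return np.fft.irfft(kept, n=m) * (m / n)   # rescale so amplitudes are preserved

sfreq = 1024.0
t = np.arange(2048) / sfreq                    # 2 s of signal
slow = np.sin(2 * np.pi * 10 * t)              # 10 Hz: below the new Nyquist
fast = 0.5 * np.sin(2 * np.pi * 400 * t)       # 400 Hz: above the new Nyquist
x = slow + fast

naive = x[::2]                                 # keep every other sample, no filtering
filtered = fft_downsample(x, 2)
slow_rs = slow[::2]                            # the 10 Hz part alone, at 512 Hz

print(np.abs(naive - slow_rs).max())           # large: 400 Hz has aliased into the data
print(np.abs(filtered - slow_rs).max())        # tiny: 400 Hz was removed before downsampling
```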

## 2. Filtering

In EEG, we generally filter to remove high-frequency artifacts and low-frequency drifts.
We can filter our time-domain data, i.e. the continuous EEG.
We can also filter across electrodes (the spatial domain) using spatial filters.

We begin by filtering our time-domain data:
- we apply a high-pass filter to remove low-frequency drifts;
- we apply a low-pass filter to remove high-frequency artifacts.
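
With `fir_design='firwin'`, MNE builds a windowed-sinc FIR filter (Hamming window by default) and chooses the filter length and transition band automatically. The numpy sketch below is a hand-rolled, simplified member of that family of filters, not MNE's code: the helpers are hypothetical and the tap counts are arbitrary. It checks how much a 40 Hz low-pass and a 1 Hz high-pass let through at a few frequencies, for data sampled at 250 Hz. A 1 Hz high-pass is used because a 0.1 Hz high-pass needs a much longer filter. The high-pass is built by *spectral inversion*: a unit impulse minus a low-pass.

```python
import numpy as np

def lowpass_fir(cutoff_hz, sfreq, numtaps):
    """Windowed-sinc low-pass FIR with a Hamming window (numtaps must be odd)."""
    n = np.arange(numtaps) - (numtaps - 1) / 2    # tap indices centred on 0
    fc = cutoff_hz / sfreq                        # cutoff as a fraction of sfreq
    h = 2 * fc * np.sinc(2 * fc * n) * np.hamming(numtaps)
    return h / h.sum()                            # gain of exactly 1 at 0 Hz

def highpass_fir(cutoff_hz, sfreq, numtaps):
    """High-pass by spectral inversion: a unit impulse minus a low-pass."""
    h = -lowpass_fir(cutoff_hz, sfreq, numtaps)
    h[(numtaps - 1) // 2] += 1.0
    return h

def gain_at(h, freq_hz, sfreq, nfft=2**16):
    """Magnitude of the filter's frequency response at the bin nearest freq_hz."""
    response = np.abs(np.fft.rfft(h, n=nfft))
    freqs = np.fft.rfftfreq(nfft, d=1 / sfreq)
    return response[np.argmin(np.abs(freqs - freq_hz))]

sfreq = 250.0
lp = lowpass_fir(40.0, sfreq, numtaps=101)    # removes high-frequency artifacts
hp = highpass_fir(1.0, sfreq, numtaps=1001)   # removes slow drifts

print(f"low-pass  gain at 10 Hz:   {gain_at(lp, 10, sfreq):.3f}")
print(f"low-pass  gain at 60 Hz:   {gain_at(lp, 60, sfreq):.3f}")
print(f"high-pass gain at 0.05 Hz: {gain_at(hp, 0.05, sfreq):.3f}")
print(f"high-pass gain at 10 Hz:   {gain_at(hp, 10, sfreq):.3f}")
```

A gain close to 1 means that frequency passes through unchanged; a gain close to 0 means it is removed.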

In [None]:
## Filter the EEG Signal.
#  High-pass filter with limit of 0.1Hz. 
#  Note that we create a copy of the original rawIn object before filtering.

rawIn_hifilt = rawIn.copy().filter(0.1, None, fir_design='firwin')

In [None]:
## Filter the EEG Signal
#  Low-pass filter with a limit of 40Hz.
#  We filter a copy of the high-pass filtered data (rawIn_hifilt), so the result
#  is band-pass filtered between 0.1Hz and 40Hz.

rawIn_lofilt = rawIn_hifilt.copy().filter(None, 40, fir_design='firwin')

## 3. Re-referencing

EEG potentials (in microvolts) are always measured relative to the potential at another point, called the reference.

This means that the activity at each channel is interpreted relative to the potential at the reference. Common choices of reference are:
- the mean activity of all electrodes (the *average reference*);
- the average of the two mastoids (these reference channels are generally labelled Ref1, Ref2 or EXG1, EXG2).

The current dataset does not include the external (EXG) channels, so we will apply an average reference.
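
Re-referencing is a subtraction performed at every time point: each channel minus the reference signal. The numpy sketch below uses a hypothetical `reference_to()` helper and a small made-up array (channel names and values are illustrative). Passing all channel names gives the average reference; passing one name (e.g. Cz) gives a single-channel reference; passing two names (e.g. the mastoids) gives their average.

```python
import numpy as np

# Hypothetical data: 4 channels x 5 time points, in microvolts.
ch_names = ['Fz', 'Cz', 'Pz', 'Oz']
data = np.array([
    [10.0, 12.0,  9.0, 11.0, 10.0],
    [ 5.0,  6.0,  4.0,  5.0,  6.0],
    [ 8.0,  7.0,  9.0,  8.0,  7.0],
    [ 3.0,  2.0,  4.0,  3.0,  2.0],
])

def reference_to(x, names, ref_names):
    """Subtract the mean of the listed reference channels from every channel."""
    idx = [names.index(name) for name in ref_names]
    return x - x[idx].mean(axis=0, keepdims=True)

avg_ref = reference_to(data, ch_names, ch_names)   # average reference
cz_ref = reference_to(data, ch_names, ['Cz'])      # Cz as the reference

print(avg_ref.sum(axis=0))    # average-referenced channels sum to zero at each time point
print(cz_ref[1])              # the reference channel (Cz) itself becomes flat zero
```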

Bad channels and the VEOG (vertical EOG) channel should not contribute to the average reference: their large non-neural signals would otherwise be spread into every other channel.
We use the *pick_types()* method to exclude these channels before applying the average reference.
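
Why exclude bad channels? In the numpy sketch below (simulated, hypothetical values), four clean channels are accompanied by one bad channel dominated by a large, slow artifact. If the bad channel is included in the mean, one fifth of its artifact is subtracted from every clean channel; computing the mean over the good channels only leaves them untouched.

```python
import numpy as np

sfreq = 250.0                                   # hypothetical sampling rate (Hz)
t = np.arange(500) / sfreq                      # 2 s of data
# Four clean channels with small oscillations (arbitrary units).
clean = np.array([np.sin(2 * np.pi * f * t) for f in (6.0, 8.0, 10.0, 12.0)])
# One bad channel dominated by a large, slow artifact.
bad_channel = 200.0 * np.sin(2 * np.pi * 0.5 * t)
data = np.vstack([clean, bad_channel])
good = [0, 1, 2, 3]                             # indices of the good channels

# Baseline: the clean channels average-referenced on their own.
expected = clean - clean.mean(axis=0)

# (a) Average over ALL channels, bad one included.
ref_all = data[good] - data.mean(axis=0)
# (b) Average over the good channels only.
ref_good = data[good] - data[good].mean(axis=0)

print(np.abs(ref_all - expected).max())    # large: the artifact leaked into the good channels
print(np.abs(ref_good - expected).max())   # essentially zero: good channels are unaffected
```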

<a href="https://predictablynoisy.com/mne-python/generated/mne.set_eeg_reference.html"> Link to MNE page on **mne.set_eeg_reference()**</a>

In [None]:
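'''
    Keep only EEG channels and drop those marked as bad (rawIn_lofilt.info['bads']).
    Misc and stim channels are dropped because pick_types() keeps only types set to True.
    EOG channels are dropped only if their type is 'eog' (e.g. set via the eog=
    argument of read_raw_bdf() or via set_channel_types()).
'''
rawIn_ref = rawIn_lofilt.copy().pick_types(eeg=True, exclude='bads').set_eeg_reference(ref_channels='average')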

## 4. Detecting noisy electrodes

Different approaches can be taken to identify the electrodes to reject from further analysis:
- Manual detection, i.e. visual inspection and manual annotation of the data.
- Inspection of the power spectrum of each channel to detect outliers.
- Automatic outlier detection based on measures such as signal amplitude, the predictability of the signal, or the presence of energy in certain frequency bands.
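
As one example of the automatic, amplitude-based approach, the sketch below flags channels whose standard deviation is an outlier relative to the other channels, using a robust "modified z-score" built from the median and the median absolute deviation (MAD). The helper `flag_noisy_channels()` is hypothetical, and the 0.6745 constant and 3.5 threshold follow a common convention for this score; they are assumptions to tune, not values from this notebook. The data are simulated. With MNE, the same function could be applied to `rawIn.get_data()` and `rawIn.ch_names`, and any flagged names appended to `rawIn.info['bads']`.

```python
import numpy as np

def flag_noisy_channels(data, ch_names, threshold=3.5):
    """Return the names of channels whose standard deviation is an outlier.

    Uses the modified z-score 0.6745 * (sd - median) / MAD, computed over the
    per-channel standard deviations (data has shape n_channels x n_times)."""
    sd = data.std(axis=1)                        # one value per channel
    med = np.median(sd)
    mad = np.median(np.abs(sd - med))
    if mad == 0:                                 # all channels identical: nothing to flag
        return []
    robust_z = 0.6745 * (sd - med) / mad
    return [name for name, z in zip(ch_names, robust_z) if abs(z) > threshold]

# Simulated data: 16 channels of noise (in volts) with slightly different
# amplitudes per channel; the fourth channel (EEG04) is made 5 times noisier.
rng = np.random.default_rng(0)
n_channels, n_times = 16, 10_000
scales = np.linspace(0.9, 1.1, n_channels)[:, np.newaxis]
data = rng.standard_normal((n_channels, n_times)) * scales * 10e-6
data[3] *= 5
ch_names = [f"EEG{i:02d}" for i in range(1, n_channels + 1)]

print(flag_noisy_channels(data, ch_names))
```

The median and MAD are used instead of the mean and standard deviation so that the noisy channel itself does not inflate the yardstick it is measured against.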

## Task 1:

Select 4 midline electrodes and compare them as follows:
- Before and after band-pass filtering
- Before and after average referencing
- Using the average reference versus using Cz as the reference.