# LEARNING REPRESENTATIONS FROM EEG WITH DEEP RECURRENT-CONVOLUTIONAL NEURAL NETWORKS

## by Pouya Bashivan (2016)

### Challenge:
One of the challenges in modeling cognitive events from EEG data is finding representations that are invariant to inter- and intra-subject differences, as well as to inherent noise associated with EEG data collection.

### Overview:
First, EEG activities are transformed into a sequence of topology-preserving multi-spectral images, as opposed to standard EEG analysis techniques that ignore such spatial information.

Next, a deep recurrent-convolutional network is used to learn representations from the sequence of images.

<img src="presentation_files/diagram.png">

EEG includes multiple time series corresponding to measurements across different spatial locations over the cortex.

The Fast Fourier Transform (FFT) is performed on the time series for each trial to estimate the power spectrum of the signal. 

Bashivan's paper uses a dataset from a working-memory experiment, so it focuses on the three EEG frequency bands related to memory operations: theta (4-7 Hz), alpha (8-13 Hz), and beta (13-30 Hz).

### What is usually done:
The sum of squared absolute values of the FFT within each of the three frequency bands is computed and used as a separate measurement for each electrode. The standard approach in EEG data analysis is then to aggregate the spectral measurements from all electrodes into a single feature vector.
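
The band-power features described above can be sketched with NumPy as follows. This is a minimal illustration, not the paper's code: the function name `band_power_features`, the 100 Hz sampling rate, and the synthetic two-electrode trial are all assumptions made for the example.

```python
import numpy as np

# Frequency bands named in the text (Hz). Alpha and beta share the 13 Hz edge,
# so a component at exactly 13 Hz is counted in both bands in this sketch.
BANDS = {"theta": (4, 7), "alpha": (8, 13), "beta": (13, 30)}

def band_power_features(trial, sampling_rate):
    """Return one flat feature vector of band powers for a single trial.

    trial: array of shape (n_electrodes, n_samples)
    sampling_rate: samples per second
    """
    spectrum = np.fft.rfft(trial, axis=1)
    freqs = np.fft.rfftfreq(trial.shape[1], d=1.0 / sampling_rate)
    power = np.abs(spectrum) ** 2  # squared absolute values per frequency bin
    per_band = []
    for low, high in BANDS.values():
        mask = (freqs >= low) & (freqs <= high)
        per_band.append(power[:, mask].sum(axis=1))
    # (n_electrodes, 3) flattened electrode by electrode into one vector.
    return np.stack(per_band, axis=1).ravel()

# Synthetic 3.5 s trial: electrode 0 carries a 10 Hz sine, electrode 1 a 20 Hz sine.
t = np.arange(350) / 100.0
trial = np.vstack([np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 20 * t)])
features = band_power_features(trial, sampling_rate=100)
print(features.reshape(2, 3))  # rows: electrodes, columns: theta, alpha, beta
```

In this synthetic trial the first electrode's power falls in the alpha column and the second electrode's in the beta column, which is the kind of per-electrode measurement the feature vector aggregates.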

### The proposed method:
The measurements are transformed into a 2D image to preserve the spatial dimensions, and multiple color channels are used to represent the spectral dimension.
To map the 3D electrode locations onto a 2D surface, the Azimuthal Equidistant Projection (<strong>Polar Projection</strong>) is used. It is commonly used to represent the Earth on a 2D map.

Note:
A drawback of this method is that the distances between the points on the map are only preserved with respect to a single point (the center point) and therefore the relative distances between all pairs of electrodes will not be exactly preserved.
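
The projection and its drawback can be illustrated with a short sketch. It assumes the head is a sphere centered at the origin with the projection center on the +z axis; the function name `azimuthal_equidistant` and the example coordinates are hypothetical, not taken from the paper.

```python
import math

def azimuthal_equidistant(x, y, z):
    """Project a 3D point on a sphere onto a 2D plane.

    The sphere is centred at the origin and the projection centre is the
    point on the +z axis, which maps to (0, 0).
    """
    radius = math.sqrt(x * x + y * y + z * z)
    # Angle measured down from the +z axis; clamped to guard against rounding.
    polar_angle = math.acos(max(-1.0, min(1.0, z / radius)))
    azimuth = math.atan2(y, x)
    # Distance from the map centre equals the arc length on the sphere.
    rho = radius * polar_angle
    return rho * math.cos(azimuth), rho * math.sin(azimuth)

# Two points on the "equator" of a unit sphere, a quarter circle apart.
a = azimuthal_equidistant(1.0, 0.0, 0.0)
b = azimuthal_equidistant(0.0, 1.0, 0.0)
arc_distance = math.pi / 2  # true distance along the sphere surface
map_distance = math.hypot(a[0] - b[0], a[1] - b[1])
print(arc_distance, map_distance)
```

Both points sit at the correct distance from the center of the map, but the distance between them on the map is larger than the arc between them on the sphere, which is the distortion described in the note.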

<img src="presentation_files/PPdiagram.png">
Above: the Polar Projection applied to the electrode locations.

Bashivan writes in the paper:
The approach is <strong>general enough</strong> to be used in any EEG-based classification task, and the specific problem of mental load classification presented later only serves as an example demonstrating potential advantages of the proposed approach.

### Architecture of the recurrent-convolutional neural network

The architecture mimics the VGG network used for ImageNet classification, and it was applied in 2 approaches:
- Single frame approach. A single image is generated over the complete trial duration.
- Multi frame approach. Each trial is split into 0.5 second windows and an image is generated for each window, giving 7 frames per trial.
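
The windowing step of the multi frame approach can be sketched as below. The helper name `split_into_windows`, the 100 Hz sampling rate, and the assumption that windows do not overlap are choices made for this example (non-overlapping windows are consistent with 3.5 s / 0.5 s = 7 frames).

```python
import numpy as np

def split_into_windows(trial, sampling_rate, window_seconds=0.5):
    """Cut a (n_electrodes, n_samples) trial into non-overlapping windows."""
    window_len = int(round(window_seconds * sampling_rate))
    n_windows = trial.shape[1] // window_len
    # Drop any trailing samples that do not fill a whole window.
    usable = trial[:, : n_windows * window_len]
    # Result shape: (n_windows, n_electrodes, window_len)
    return usable.reshape(trial.shape[0], n_windows, window_len).transpose(1, 0, 2)

# A 3.5 s trial with 2 electrodes; values are just sample counters.
trial = np.arange(2 * 350).reshape(2, 350)
windows = split_into_windows(trial, sampling_rate=100)
print(windows.shape)
```

Each window would then go through the same FFT and image-generation steps as the single frame case, producing one image per window.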

### Single - Frame approach
A single EEG image is generated by applying the FFT over the whole trial duration (3.5 seconds).
4 ConvNet configurations were evaluated:
<img src="presentation_files/single-frame.png">

*For more information on Maxpool press down

<strong>Maxpool</strong> - For each of the regions represented by the filter, we will take the max of that region and create a new, output matrix where each element is the max of a region in the original input.

<strong>Softmax</strong> - a way of forcing the outputs of a neural network to sum to 1 so that they can represent a probability distribution over discrete, mutually exclusive alternatives.
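
As a small illustration of the softmax definition, here is a sketch in NumPy. The logit values are made up for the example; subtracting the maximum is a common numerical-stability step and does not change the result.

```python
import numpy as np

def softmax(logits):
    """Map raw network outputs to probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical raw outputs for four mutually exclusive classes.
probs = softmax(np.array([2.0, 1.0, 0.1, -1.0]))
print(probs, probs.sum())
```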

## Maxpool 
A sample-based discretization process. The objective is to down-sample an input representation (image, hidden-layer output matrix, etc.), reducing its dimensionality and allowing for assumptions to be made about features contained in the sub-regions binned.

This is done in part to reduce over-fitting by providing an abstracted form of the representation. It also reduces the computational cost by reducing the number of parameters to learn, and provides basic translation invariance to the internal representation.

Max pooling is done by applying a max filter to (usually) non-overlapping subregions of the initial representation.
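
Max pooling over non-overlapping subregions can be sketched in a few lines. The function name `max_pool_2d` and the 4x4 input are illustrative only; the sketch handles a single 2D array without channels.

```python
import numpy as np

def max_pool_2d(image, pool=2):
    """Max pooling over non-overlapping pool x pool regions of a 2D array."""
    h, w = image.shape
    h_out, w_out = h // pool, w // pool
    # Ignore rows/columns that do not fill a complete region.
    trimmed = image[: h_out * pool, : w_out * pool]
    blocks = trimmed.reshape(h_out, pool, w_out, pool)
    return blocks.max(axis=(1, 3))

image = np.array([
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
])
pooled = max_pool_2d(image)
print(pooled)  # [[6 8]
               #  [3 4]]
```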

### Multi - Frame approach
The best-performing ConvNet architecture from the single frame approach was applied to each frame.
Once again, 4 approaches were evaluated:
<img src="presentation_files/multi-frame.png">

<strong>Max-pooling:</strong> performs max-pooling over ConvNet outputs across time frames. While representations found from this model preserve spatial location, they are nonetheless order invariant.
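
The order invariance of max-pooling across frames can be checked directly. In this sketch the per-frame ConvNet outputs are replaced by random numbers, and the feature size of 4 is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for ConvNet outputs: 7 frames, 4 hypothetical features per frame.
frame_features = rng.normal(size=(7, 4))

pooled = frame_features.max(axis=0)  # max over the time axis

# Shuffling the frame order does not change the pooled representation.
shuffled = frame_features[rng.permutation(7)]
pooled_shuffled = shuffled.max(axis=0)
print(np.array_equal(pooled, pooled_shuffled))
```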

<strong>Temporal convolution:</strong> applies a 1-D convolution to ConvNet outputs across time frames. We evaluated two models consisting of 16 and 32 kernels of size 3 with stride of 1 frame. Kernels capture distinct temporal patterns across multiple frames.
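
A 1-D convolution across time frames can be sketched as follows. The function name `temporal_conv`, the feature size of 4, the all-ones inputs, and the absence of padding are assumptions of this example; the source only states the kernel count, kernel size, and stride.

```python
import numpy as np

def temporal_conv(frame_features, kernels, stride=1):
    """Slide each kernel over the time axis of per-frame feature vectors.

    frame_features: (n_frames, n_features)
    kernels: (n_kernels, kernel_size, n_features)
    Returns an array of shape (n_output_steps, n_kernels); no padding is used.
    """
    n_frames = frame_features.shape[0]
    n_kernels, kernel_size, _ = kernels.shape
    n_out = (n_frames - kernel_size) // stride + 1
    out = np.empty((n_out, n_kernels))
    for t in range(n_out):
        window = frame_features[t * stride : t * stride + kernel_size]
        # Each kernel responds to one temporal pattern spanning kernel_size frames.
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out

frames = np.ones((7, 4))       # 7 frames, 4 hypothetical features each
kernels = np.ones((16, 3, 4))  # 16 kernels of size 3
responses = temporal_conv(frames, kernels)
print(responses.shape)
```

With 7 frames, kernels of size 3, and a stride of 1 frame, each kernel produces 7 - 3 + 1 = 5 outputs when no padding is applied.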

<strong>Long Short-Term Memory (LSTM):</strong> LSTM networks are a special kind of recurrent neural network (RNN), capable of learning long-term dependencies.

### Baseline Methods:

The approach was compared against various classifiers commonly used in the field, including Support-Vector Machines (SVM), Random Forest, sparse Logistic Regression, and Deep Belief Networks (DBN).

### Dataset:
An EEG dataset acquired during a working memory experiment was used.

During the experiment, an array of English characters was shown for 0.5 second (SET) and participants were instructed to memorize the characters. A TEST character was shown three seconds later and participants indicated whether the test character was among the first array (SET) or not by pressing a button. Each participant repeated the experiment 240 times.

The number of characters in the SET for each trial was randomly chosen to be 2, 4, 6, or 8. The number of characters in the SET determines the amount of cognitive load induced on the participant, since more mental resources are required to retain the information as the number of characters increases.

<strong>The classification task is to recognize the load level corresponding to set size (number of characters presented to the subject) from EEG recordings.</strong> Four distinct classes corresponding to load 1-4 are defined and the 2670 samples collected from 13 subjects are assigned to these four categories.


Continuous EEG was sliced offline to equal lengths of 3.5 seconds corresponding to each trial. A total of 3120 trials were recorded. 

Only data corresponding to correctly responded trials were included in the data set which reduced the data set size to 2670 trials. 

To evaluate the performance of each classifier we followed the <strong>leave-subject-out cross validation approach</strong>. In each of the 13 folds, all trials belonging to one of the subjects were used as the test set. A number of samples equal to the size of the test set was then randomly drawn from the rest of the data to form the validation set, and the remaining samples were used as the training set.
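
The fold construction described above can be sketched as a generator of index sets. The function name `leave_subject_out_folds`, the fixed random seed, and the toy data with 10 trials per subject are assumptions for the example, not details from the paper.

```python
import numpy as np

def leave_subject_out_folds(subject_ids, seed=0):
    """Yield (subject, train_idx, valid_idx, test_idx) for each held-out subject."""
    subject_ids = np.asarray(subject_ids)
    rng = np.random.default_rng(seed)
    for subject in np.unique(subject_ids):
        test_idx = np.flatnonzero(subject_ids == subject)
        # Shuffle the remaining trials, then carve off a validation set
        # with as many samples as the test set.
        rest_idx = rng.permutation(np.flatnonzero(subject_ids != subject))
        valid_idx = rest_idx[: len(test_idx)]
        train_idx = rest_idx[len(test_idx):]
        yield subject, train_idx, valid_idx, test_idx

# Toy data: 13 subjects with 10 trials each.
subject_ids = np.repeat(np.arange(13), 10)
folds = list(leave_subject_out_folds(subject_ids))
print(len(folds), [len(part) for part in folds[0][1:]])
```

Because the test subject's trials never appear in the training or validation indices, each fold measures how well a classifier transfers to an unseen person.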

### Results:

The experiment showed a slight improvement in classification error (~0.6%) when using the topology-preserving projection instead of a non-equidistant flattening projection. However, this observation may depend on the particular dataset and requires further exploration before any conclusion can be drawn.

### Single - Frame:
<img src="presentation_files/results-single.png">

### Multi - Frame:
<img src="presentation_files/results-multi.png">

# Deep learning with convolutional neural networks for brain mapping and decoding of movement-related information from the human EEG


## by Schirrmeister (2017)

Here, we studied deep ConvNets with a range of different architectures, designed for decoding imagined or executed movements from raw EEG.

Our results show that recent advances from the machine learning field, including batch normalization and exponential linear units, together with a cropped training strategy, boosted the deep ConvNets decoding performance, reaching or surpassing that of the widely-used filter bank common spatial patterns (FBCSP) decoding algorithm.

*for more information on FBCSP press down

Our novel methods for visualizing the learned features demonstrated that ConvNets indeed learned to use spectral power modulations in the alpha, beta and high gamma frequencies. These methods also proved useful as a technique for spatially mapping the learned features, revealing the topography of the causal contributions of features in different frequency bands to decoding the movement classes.
Our study thus shows how to design and train ConvNets to decode movement-related information from the raw EEG without handcrafted features and highlights the potential of deep ConvNets combined with advanced visualization techniques for EEG-based brain mapping.

FBCSP is a machine learning approach for processing EEG measurements in motor imagery-based BCI (Brain Computer Interfaces). FBCSP addresses the problem of selecting an appropriate operational frequency band for extracting discriminating CSP features. FBCSP employs a feature selection algorithm to select discriminative CSP (Common Spatial Pattern) features from a bank of multiple bandpass filters and spatial filters, and a classification algorithm to classify the selected features.


http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4634130

### Approach:
<strong>There are 4 approaches designed:</strong>
- Deep ConvNet: a more generic feature extraction method
- Shallow ConvNet: more specific feature extraction, somewhat similar to FBCSP
- Hybrid approach of both Deep and Shallow
- Residual ConvNet
    
<strong>Input representation:</strong> Generating images is not the best approach since EEG signals are assumed to approximate a linear superposition of spatially global voltage patterns caused by multiple dipolar current sources in the brain (Nunez and Srinivasan, 2006).

Unmixing of these global patterns using a number of spatial filters is therefore typically applied to the whole set of relevant electrodes as a basic step in many successful examples of EEG decoding (Ang et al., 2008; Blankertz et al., 2008; Rivet et al., 2009).

This is why <strong> raw EEG signals </strong> were used. The input is a 2D-array with the number of time steps as the width and the number of electrodes as the height. This approach also significantly reduced the input dimensionality compared with the “EEG-as-an-image” approach.

<img src="presentation_files/Schirrmeister_deep_conv_arch.png">

<img src="presentation_files/Schirrmeister_shallow_conv_arch.png">

### Design Choices for the ConvNet:
<img src="presentation_files/Schirrmeister_design_choices.png">

### Hybrid ConvNet

The Hybrid ConvNet simply fuses both networks after the final layer. The softmax layers were replaced by ELU layers with 60 filters (for Deep) and 40 filters (for Shallow). The resulting 100 feature maps were concatenated and used as the input for a new softmax classification layer. The hybrid ConvNet is retrained from scratch.
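
The fusion step can be sketched as a concatenation followed by a softmax layer. Everything beyond the 60 + 40 = 100 split is an assumption of this example: the function names `elu` and `hybrid_head`, the random inputs and weights, and the choice of 4 output classes.

```python
import numpy as np

def elu(x, alpha=1.0):
    """Exponential linear unit: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def hybrid_head(deep_out, shallow_out, weights, bias):
    """Concatenate both branches and classify with a new softmax layer."""
    fused = np.concatenate([elu(deep_out), elu(shallow_out)])
    logits = weights @ fused + bias
    exps = np.exp(logits - logits.max())
    return fused, exps / exps.sum()

rng = np.random.default_rng(0)
deep_out = rng.normal(size=60)     # stand-in for the 60 deep-branch outputs
shallow_out = rng.normal(size=40)  # stand-in for the 40 shallow-branch outputs
weights = rng.normal(size=(4, 100)) * 0.1  # hypothetical 4-class layer
bias = np.zeros(4)
fused, probs = hybrid_head(deep_out, shallow_out, weights, bias)
print(fused.shape, probs.sum())
```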

### Residual ConvNet

Residual networks (ResNets) recently won several benchmarks in the computer vision field. ResNets typically have a very large number of layers and we wanted to investigate whether similar networks with more layers also result in good performance in EEG decoding.

### Training:

2 approaches:
- Trial-wise training: uses the whole duration of the trial and is therefore similar to FBCSP
- Cropped training: a strategy used in object recognition in images

### Trial-wise

Both datasets had 4.5 second trials. This led to 288 training examples per subject for the BCI Competition Dataset and about 880 training examples per subject on the High-Gamma Dataset after their respective train-test split.

### Cropped training
Crops of about 2 seconds were used as input. Since our crops are smaller than the trials, the ConvNet input size is also smaller.
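
Cropping can be sketched as a sliding window over each trial. The 250 Hz sampling rate, the stride of 125 samples, the 3-electrode dummy trial, and the choice to give every crop its trial's label are assumptions of this example; the helper names are hypothetical.

```python
import numpy as np

def crop_starts(n_trial_samples, n_crop_samples, stride=1):
    """Start indices of every crop that fits entirely inside the trial."""
    return list(range(0, n_trial_samples - n_crop_samples + 1, stride))

def make_crops(trial, label, n_crop_samples, stride=1):
    """Cut a (n_electrodes, n_samples) trial into crops sharing one label."""
    starts = crop_starts(trial.shape[1], n_crop_samples, stride)
    crops = np.stack([trial[:, s : s + n_crop_samples] for s in starts])
    labels = np.full(len(starts), label)
    return crops, labels

SAMPLING_RATE = 250  # assumed for this sketch
trial = np.zeros((3, int(4.5 * SAMPLING_RATE)))  # one 4.5 s trial, 3 electrodes
crops, labels = make_crops(trial, label=2, n_crop_samples=2 * SAMPLING_RATE, stride=125)
print(crops.shape, len(crop_starts(trial.shape[1], 2 * SAMPLING_RATE)))
```

Under these assumptions a single trial yields several training examples, and with a stride of one sample it yields hundreds, which is how cropping enlarges the training set compared with trial-wise training.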

### Datasets used:
A smaller public dataset (BCI Competition IV dataset 2a) was used for comparing to previously published accuracies, and a larger new dataset acquired in our lab was used for evaluating the decoding methods with a larger number of training trials (approx. 880 trials per subject, compared to 288 trials in the public set).

The recordings capture brain activity during movements of the hands and feet, and at rest.

### Results:
<img src="presentation_files/Schirrmeister_results.png">

### Comparison of 2 papers:

| Study:               | Decoding problem:                        | Input:                                   | Network Architecture:                    | Baseline Methods:                        | Results:                             |
| -------------------- | ---------------------------------------- | ---------------------------------------- | ---------------------------------------- | ---------------------------------------- | ------------------------------------ |
| Schirrmeister (2017) | Imagined and executed movement classes   | Time, 0-125 Hz. The inputs are 2D arrays of raw EEG (electrodes × time steps). | 4 approaches: deep, shallow, hybrid and residual ConvNets for raw EEG data | FBCSP                                    | Comparable to or better than the baseline |
| Bashivan (2016)      | Cognitive load (number of characters to memorize) | Frequency, mean power for 4-7 Hz, 8-13 Hz, 13-30 Hz. The inputs are interpolated images. | 2 approaches (single frame and multi frame) with 4 variants each, based on convolutional networks combined with LSTM | SVM, Random Forest, sparse Logistic Regression and Deep Belief Networks | Better results than the baseline methods |