# Overview Results

We compare three models:

* Filter bank common spatial patterns (FBCSP)
* Shallow square net (aka raw net)
* Deep convnet (5 layers/6 layers)

We have three datasets:

* [BCI Competition IV dataset 2a](http://www.bbci.de/competition/iv/#dataset2a) (motor imagery, 4 class)
* Our 4 sec movement set (motor execution, 4 class)
* [Kaggle Grasp Lift Set](https://www.kaggle.com/c/grasp-and-lift-eeg-detection/data) (motor execution, 6 class, multilabel)

## Models

### Filter bank common spatial patterns

Filter bank common spatial patterns (FBCSP) is the same as in the master thesis, except that I now use overlapping filter bands.

#### Filter bands 

#### BCI Competition

Bands are 6 Hz wide with 3 Hz overlap at low frequencies, and 8 Hz wide with 4 Hz overlap at high frequencies.

The border between low and high frequencies is at 22 Hz here.

|Band|1|2|3|4|5|6|7|8|9|
|-|-|-|-|-|-|-|-|-|-|
|**low** (Hz)|4|7|10|13|16|22|26|30|34|
|**high** (Hz)|10|13|16|19|22|30|34|38|42|


<div style="color:darkgreen"><br/>*No overlap at 22, does it matter? Could reprogram and rerun, CSP is very fast after all...*</div>
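The band layout described above can be generated programmatically. The following is a minimal sketch, not the code actually used for the experiments: the function name `make_filterbank` and its parameters are hypothetical, and it assumes that the low bands end exactly at the border and the high bands start exactly at it (which is what produces the missing overlap at the border).

```python
def make_filterbank(low_start, border, high_stop,
                    low_width=6, low_overlap=3,
                    high_width=8, high_overlap=4):
    """Return a list of (low, high) band edges in Hz.

    Hypothetical helper: narrow overlapping bands below `border`,
    wider overlapping bands from `border` up to `high_stop`.
    """
    bands = []
    # Low-frequency bands: the last one must end at or below the border.
    start = low_start
    while start + low_width <= border:
        bands.append((start, start + low_width))
        start += low_width - low_overlap
    # High-frequency bands: the first one starts at the border.
    start = border
    while start + high_width <= high_stop:
        bands.append((start, start + high_width))
        start += high_width - high_overlap
    return bands


# Bands for the BCI Competition data: 4-42 Hz with the border at 22 Hz.
bci_bands = make_filterbank(low_start=4, border=22, high_stop=42)
for low, high in bci_bands:
    print(f"{low:2d}-{high:2d} Hz")
```

With these arguments the sketch reproduces the nine bands of the BCI Competition table, including the jump from 16-22 Hz to 22-30 Hz at the border.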

#### Our data

The border between low and high frequencies is at 34 Hz here.

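|Band|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|
|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
|**low** (Hz)|4|7|10|13|16|19|22|25|28|34|38|42|46|50|54|58|62|66|70|74|78|82|86|
|**high** (Hz)|10|13|16|19|22|25|28|31|34|42|46|50|54|58|62|66|70|74|78|82|86|90|94|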




<div style="color:darkgreen">*Similarly, no overlap at 34*</div>



### Shallow Square Net

Same as in the master thesis, plus batch norm after the convolution.

The first filter size and the pool size/stride depend on the sampling rate (250 Hz for the BCI Competition, 500 Hz for the others).
The second filter size depends on the number of sensors (22 for the BCI Competition, 32 for Kaggle, 45 for ours, where we use only the C sensors).

The "filter size" of the dense layer determines the size of the input window: 30 for samplewise trained models, 61 for trialwise trained models. The input windows are about 2 and 4 seconds, respectively. The input window is smaller for samplewise trained models because otherwise a large part of it would lie before the trial start, where there should be no class-discriminative information.

The output nonlinearity is softmax for the BCI Competition and our set, and sigmoid for the Kaggle set (multilabel).

|Layer #|Layer|Filter Size|Filter Stride|Nonlinearity/Pooling Mode|
|-|-|-|-|-|
|1|Conv|(25/50)x1|1x1|Identity|
|2|Conv|1x(22/32/45)|1x1|Identity|
|3|Batch Norm|||Square|
|4|Pool|(75/150)x1|(15/30)x1|Mean->Log|
|5|Dense|(30/61)x1|1x1|Softmax/Sigmoid|

We use dropout with $p=\frac{1}{2}$ before the dense layer.
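The "about 2 and 4 seconds" input windows can be checked from the layer sizes in the table. The sketch below uses the standard receptive-field recursion for stacked strided layers; the helper names `receptive_field` and `shallow_layers` are hypothetical, and it assumes that the dense layer is applied with stride 1 over the pooled outputs and that the spatial convolution and batch norm do not change the temporal extent.

```python
def receptive_field(layers):
    """Number of input samples seen by one output unit.

    `layers` is a list of (kernel_length, stride) pairs along the time axis.
    """
    size, jump = 1, 1
    for kernel, stride in layers:
        size += (kernel - 1) * jump  # each layer widens the window
        jump *= stride               # strides multiply the step between outputs
    return size


def shallow_layers(sampling_rate, dense_length):
    """Temporal layers of the shallow net; sizes scale with the sampling rate."""
    scale = sampling_rate // 250  # 1 for 250 Hz, 2 for 500 Hz
    return [
        (25 * scale, 1),           # temporal convolution
        (75 * scale, 15 * scale),  # mean pooling
        (dense_length, 1),         # dense layer, applied as a convolution
    ]


for rate in (250, 500):
    for name, dense_length in (("samplewise", 30), ("trialwise", 61)):
        samples = receptive_field(shallow_layers(rate, dense_length))
        print(f"{rate} Hz, {name}: {samples} samples = {samples / rate:.2f} s")
```

Under these assumptions the windows come out at 534 and 999 samples at 250 Hz, and 1069 and 1999 samples at 500 Hz, i.e. roughly 2 and 4 seconds in both cases.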

### Deep 4 Net

The 4 refers to the number of convolutional layers.

|Layer #|Layer|Filter Size|Filter Stride|Nonlinearity/Pooling Mode|
|-|-|-|-|-|
|1|Conv|(10/20/30)x1|1x1|Identity|
|2|Conv|1x(22/32/45)|1x1|Identity|
|3|Batch Norm|||ELU|
|4|Pool|3x1|3x1|Max|
|5|Conv|(10/20/30)x1|1x1|Identity|
|6|Batch Norm|||ELU|
|7|Pool|3x1|3x1|Max|
|8|Conv|(10/20/30)x1|1x1|Identity|
|9|Batch Norm|||ELU|
|10|Pool|3x1|3x1|Max|
|11|Conv|(10/20/30)x1|1x1|Identity|
|12|Batch Norm|||ELU|
|13|Pool|3x1|3x1|Max|
|14|Dense|(2/4/8/15)x1|1x1|Softmax/Sigmoid|

The filter size is 10 for the BCI Competition, 20 for ours, and 30 for Kaggle.

The number of filters is 40 for all layers.

The length of the final dense layer filter depends on the sampling rate _and_ on whether training is trialwise or samplewise.

ELU is [exponential linear unit](http://arxiv.org/abs/1511.07289).
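For reference, the exponential linear unit is defined as follows, where $\alpha > 0$ is a hyperparameter (the value used in these experiments is not stated here):

$$
\mathrm{ELU}(x) =
\begin{cases}
x & \text{if } x > 0 \\
\alpha \, (e^{x} - 1) & \text{if } x \le 0
\end{cases}
$$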


We use dropout with $p=\frac{1}{2}$ before all convolutional layers and before the dense layer.

<div style="color:darkgreen"><br/>*2/4/8/**15** is probably confusing and has no real reason, should i rerun with 16? :)*</div>


<div style="color:darkgreen">*Decreasing filter length in later layers might make sense? Probably not worth trying now?*</div>

### Deep 5 Net

Same as deep 4, except with one more conv/batch norm/pool block. Used on the Kaggle data. Combined with filter size 30, this results in a much larger input window (3700 samples instead of 500 (BCI Competition) or 1000 (our set)).
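As a rough check of these window sizes, assume each block is a temporal convolution of length $k$ (stride 1) followed by max pooling of size 3 and stride 3. The symbols $k$, $n$, $d$ and $R_n$ are introduced here only for illustration. The number of input samples seen by one unit after $n$ blocks is then

$$
R_n = 1 + \sum_{i=0}^{n-1} 3^{i} \, \bigl[(k-1) + (3-1)\bigr] = 1 + (k+1) \, \frac{3^{n}-1}{2}
$$

For deep 5 with $k=30$ and $n=5$ this gives $R_5 = 1 + 31 \cdot 121 = 3752$. For deep 4, $R_4 = 1 + 40\,(k+1)$, i.e. 441 for $k=10$ and 841 for $k=20$. A final dense filter of length $d$ adds another $(d-1) \cdot 3^{n}$ samples, which brings these values close to the approximate window sizes quoted above.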

## Results

### BCI Competition

First, a short explanation. Originally, the BCI Competition was evaluated samplewise (kappa score). In the paper we compare to (http://www.eurasip.org/Proceedings/Eusipco/Eusipco2015/papers/1570104275.pdf), they evaluate trialwise. However, they only use a 0.5-2.5 s window of the 4 s trial, because the winner of the original competition used this window. I think this actually decreases accuracy in the trialwise evaluation and, of course, limits the data available to train on. So I first validated our CSP implementation with the 0.5-2.5 s window and then proceeded with the full window (0.5-4 s for CSP).



|Model|Training|Window|Highpass|Accuracy (%)|
|-|-|-|-|-|
|Their FBCSP|trial|0.5-2.5s||67.0|
|Their CSP+CNN|trial|0-3s||70.6|
|Our FBCSP|trial|0.5-2.5s||67.1|
|Our FBCSP|trial|0.5-4s||68.2|
|square net|trial|0-4s|no|69.8|
|square net|trial|0-4s|4 Hz|68.4|
|square net|sample|0-4s|no|71.5|
|square net|sample|0-4s|4 Hz|69.9|
|deep 4 net|trial|0-4s|no|64.9|
|deep 4 net|trial|0-4s|4 Hz|54.4|
|deep 4 net|sample|0-4s|no|65.6|
|deep 4 net|sample|0-4s|4 Hz|61.7|

With our filter bands, CSP only uses frequencies above 4 Hz, so no highpass is necessary. The data is known to be affected by EOG artefacts (that was the point of the competition), and one should avoid using them for classification. The highpassed results could therefore be more reliable.



<div style="color:darkgreen"><br/>
*I also have samplewise accuracies for the samplewise trained models, are they important? I only have them on the samples that were predicted, from 2 sec after trial start to trial end. Trialwise prediction is average of these... In general if we want to compare to samplewise kappa scores from original competition, retraining with different input windows may be necessary...*
</div>
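The trialwise prediction of a samplewise trained model, as mentioned in the note above, can be sketched as follows. This is an illustration only: the function name `trial_prediction` and the toy numbers are hypothetical, and it assumes that the averaged quantity is the per-sample class probability.

```python
def trial_prediction(sample_probs):
    """Average per-sample class probabilities over one trial.

    `sample_probs` holds one row of class probabilities per predicted sample.
    Returns the mean probabilities and the index of the predicted class.
    """
    n_samples = len(sample_probs)
    n_classes = len(sample_probs[0])
    mean_probs = [
        sum(row[c] for row in sample_probs) / n_samples
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: mean_probs[c])
    return mean_probs, predicted


# Toy trial with three predicted samples and four classes (made-up numbers).
toy_probs = [
    [0.1, 0.6, 0.2, 0.1],
    [0.4, 0.3, 0.2, 0.1],
    [0.2, 0.5, 0.2, 0.1],
]
mean_probs, predicted = trial_prediction(toy_probs)
print(mean_probs, predicted)
```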

<div style="color:darkgreen"><br/>
*All the data is uncleaned. I tried cleaning only the training data for CSP, but this has led to slightly worse overall results (though not consistent, better for some subjects, worse for others.. amount of rejected trials was typically between 5-20%, though ~50% for one subject)*
</div>


<div style="color:darkgreen"><br/>
*I had 74% before for the shallow net, both without and with the 4 Hz highpass, when resampling to 150 Hz (trial trained). Should I report this? For now, to have fewer results overall, I used no resampling for all models and all datasets.*
</div>


### Our Data

|Model|Training|Window|Highpass|Accuracy (%)|
|-|-|-|-|-|
|Our FBCSP|trial|0.5-4s||89.7|
|square net|trial|0-4s|no|89.3|
|square net|trial|0-4s|4 Hz|92.1|
|square net|sample|0-4s|no|90.3|
|square net|sample|0-4s|4 Hz|93.1|
|deep 4 net|trial|0-4s|no|90.0|
|deep 4 net|trial|0-4s|4 Hz|84.0|
|deep 4 net|sample|0-4s|no|90.6|
|deep 4 net|sample|0-4s|4 Hz|86.7|

### Kaggle

|Model|Training|Window before event|Highpass|Valid AUC (%)|Kaggle AUC (%)|
|-|-|-|-|-|-|
|Winner Ensemble|||||98.1|
|[Recurrent Convolutional Singlemodel](https://github.com/stupiding/kaggle_EEG)|sample|7.5s|no||97.7|
|square net|sample|7.5s|no|89.1||
|deep 4 net|sample|7.5s|no|96.7|96.8|

AUC means area under the [ROC curve](https://en.wikipedia.org/wiki/Receiver_operating_characteristic).
Valid AUC refers to our own test set, Kaggle AUC to the value obtained by submitting to Kaggle (mean over public and private leaderboard).
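To make the metric concrete, here is a minimal sketch of how an AUC can be computed for a multilabel problem. It is not the evaluation code used for these results: the function names and toy data are hypothetical, and it assumes that each label column is scored separately and that the column AUCs are then averaged.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive sample outscores a negative one.

    Ties count as half a win. Uses all positive/negative pairs, so it is
    only suitable for small illustrative inputs.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_columnwise_auc(label_rows, score_rows):
    """Average the AUC over the label columns of a multilabel problem."""
    n_columns = len(label_rows[0])
    aucs = [
        roc_auc([row[c] for row in label_rows], [row[c] for row in score_rows])
        for c in range(n_columns)
    ]
    return sum(aucs) / n_columns


# Toy example with four samples and two labels (made-up numbers).
toy_labels = [[0, 1], [0, 0], [1, 1], [1, 0]]
toy_scores = [[0.1, 0.9], [0.4, 0.2], [0.35, 0.6], [0.8, 0.7]]
print(mean_columnwise_auc(toy_labels, toy_scores))
```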

The deep 4 net would have placed 14th on the Kaggle private leaderboard.


Training is always on all subjects together. The only things I have changed from before are exponential linear units instead of leaky ReLU units, and batch norm done correctly now (before the nonlinearity, not after).

<div style="color:darkgreen"><br/>
*I had 97.0% before with a bit more complicated architecture. Probably could also be improved further, guess it is not important?*
</div>


