# Training a Simple CNN with EEG Data using fastai

This is a test project I am using to learn fastai v1 for structured data. I am using data from a past Kaggle competition to train a model that detects specific events in EEG brainwave data. Those events could then trigger gestures in a prosthetic device, for example. My goal is to get perfect or near-perfect predictions on the validation data. More information on the contest and dataset is available [here](https://www.kaggle.com/c/grasp-and-lift-eeg-detection/).

## Download data

You will need to install kaggle-cli (via pip or another method).

In [None]:
!kg download -u [username] -p [password] -c grasp-and-lift-eeg-detection -f train.zip

In [None]:
!unzip train.zip -d data/train/

## Setup data directories

I have a train folder and a valid folder. I chose to hold out the final subject (12) from the dataset for validation.
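
One way to set up that split is sketched below. It assumes all CSV files were extracted into a single training folder and follow the `subj<N>_series<M>_data.csv` / `subj<N>_series<M>_events.csv` naming used later in this notebook; the function name `move_subject_to_valid` is made up for illustration.

```python
import shutil
from pathlib import Path


def move_subject_to_valid(train_dir, valid_dir, subject=12):
    """Move every CSV for one subject from train_dir to valid_dir."""
    train_dir, valid_dir = Path(train_dir), Path(valid_dir)
    valid_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # The '_series' part of the pattern keeps subject 1 from also matching subject 12
    for csv_path in sorted(train_dir.glob(f'subj{subject}_series*.csv')):
        shutil.move(str(csv_path), str(valid_dir / csv_path.name))
        moved.append(csv_path.name)
    return moved
```

Calling `move_subject_to_valid('./data/train', './data/valid')` would move the subject 12 files and return their names.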

## Making sure PyTorch is recognizing my GPU

In [1]:
import torch
torch.cuda.current_device()

0

In [2]:
torch.cuda.get_device_name(0)

'GeForce GTX 1080'
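
Note that `torch.cuda.current_device()` raises an error on a machine without CUDA, so a slightly more defensive check can be handy. The helper below is a minimal sketch (the name `describe_device` is hypothetical) that first asks `torch.cuda.is_available()` and falls back to a plain message otherwise.

```python
import importlib.util


def describe_device():
    """Return a short description of the device PyTorch would use."""
    if importlib.util.find_spec('torch') is None:
        return 'torch is not installed'
    import torch
    # is_available() is safe to call even when no GPU or driver is present
    if not torch.cuda.is_available():
        return 'cpu (CUDA not available)'
    index = torch.cuda.current_device()
    return f'cuda:{index} ({torch.cuda.get_device_name(index)})'


print(describe_device())
```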

## Data cleaning and organization

Basic imports and specifying data path

In [3]:
from fastai.tabular import * 
import pandas as pd
import numpy as np

path = './data/train'

The data is divided into several CSV files, so first we merge everything. We specify dtypes because this is quite a bit of data and we run into memory issues otherwise (I have 32GB of DDR4 and it wasn't enough before specifying dtypes). The *df* variable holds all our independent variables (all continuous), and the *labels* variable holds our dependent variables, which come from the **events** CSV files. We'll merge these into one dataframe further down.

In [None]:
subjects = 12
series = 9
df = None
labels = None
types1 = {
    'id': 'str', 'Fp1': 'int16', 'Fp2': 'int16', 'F7': 'int16', 'F3': 'int16', 'Fz': 'int16',
    'F4': 'int16', 'F8': 'int16', 'FC5': 'int16', 'FC1': 'int16', 'FC2': 'int16', 'FC6': 'int16',
    'T7': 'int16', 'C3': 'int16', 'Cz': 'int16', 'C4': 'int16', 'T8': 'int16', 'TP9': 'int16',
    'CP5': 'int16', 'CP1': 'int16', 'CP2': 'int16', 'CP6': 'int16', 'TP10': 'int16', 'P7': 'int16',
    'P3': 'int16', 'Pz': 'int16', 'P4': 'int16', 'P8': 'int16', 'PO9': 'int16', 'O1': 'int16',
    'Oz': 'int16', 'O2': 'int16', 'PO10': 'int16'
}
types2 = {
    'id': 'str', 'HandStart': 'int8', 'FirstDigitTouch': 'int8', 'BothStartLoadPhase': 'int8',
    'LiftOff': 'int8', 'Replace': 'int8', 'BothReleased': 'int8'
}

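# range() excludes its upper bound, so this reads subjects 1-11 and series 1-8;
# subject 12 is held out for validation.
data_frames, label_frames = [], []
for sj in range(1, subjects):
    for sr in range(1, series):
        data_frames.append(pd.read_csv(f'{path}/subj{sj}_series{sr}_data.csv', dtype=types1))
        label_frames.append(pd.read_csv(f'{path}/subj{sj}_series{sr}_events.csv', dtype=types2))
# Concatenate once at the end; appending inside the loop copies the whole frame each time
df = pd.concat(data_frames, ignore_index=True)
labels = pd.concat(label_frames, ignore_index=True)
df.tail()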
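
To see why the dtypes matter, a back-of-the-envelope estimate helps. pandas reads integer columns as int64 by default (8 bytes per value), while int16 needs 2 bytes. The sketch below compares the two for the 32 channel columns; the row count is a hypothetical placeholder rather than a figure taken from the dataset, and the string id column is not included in the estimate.

```python
# Bytes needed to store one value of each dtype
BYTES_PER_VALUE = {'int64': 8, 'int16': 2}


def channel_memory_gib(n_rows, n_channels=32, dtype='int64'):
    """Approximate memory in GiB for the integer channel columns alone."""
    return n_rows * n_channels * BYTES_PER_VALUE[dtype] / 2**30


n_rows = 10_000_000  # hypothetical row count, for illustration only
for dtype in ('int64', 'int16'):
    print(f'{dtype}: {channel_memory_gib(n_rows, dtype=dtype):.2f} GiB')
```

Whatever the true row count is, the int16 columns take a quarter of the space of their int64 equivalents.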

Here we merge our training variables with the dependent variables and print the head to verify.

In [None]:
merge = pd.merge(df, labels, on='id')
merge.head()
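
Every row in a data file should have exactly one matching row in the events file, and the merge itself can be asked to check that. The sketch below uses tiny made-up frames in place of the real `df` and `labels`. Passing `validate='one_to_one'` makes `pd.merge` raise an error if an id is duplicated on either side, and the length comparison catches ids that appear in only one frame.

```python
import pandas as pd

# Tiny made-up stand-ins for the real `df` and `labels` frames
toy_df = pd.DataFrame({'id': ['subj1_series1_0', 'subj1_series1_1'], 'Fp1': [-31, -29]})
toy_labels = pd.DataFrame({'id': ['subj1_series1_0', 'subj1_series1_1'], 'HandStart': [0, 0]})

# validate='one_to_one' raises a MergeError when ids are duplicated on either side
toy_merge = pd.merge(toy_df, toy_labels, on='id', validate='one_to_one')

# The default inner merge silently drops unmatched ids, so compare row counts too
assert len(toy_merge) == len(toy_df) == len(toy_labels)
print(toy_merge)
```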

Save the merged frame to a feather file so that later sessions can start from this point. The data looks the same after reading it back from the file.

In [None]:
merge.to_feather(f'{path}/merge.feather')

In [4]:
import feather
merge = feather.read_dataframe(f'{path}/merge.feather')
merge.head()

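| | id | Fp1 | Fp2 | F7 | F3 | Fz | F4 | F8 | FC5 | FC1 | ... | O1 | Oz | O2 | PO10 | HandStart | FirstDigitTouch | BothStartLoadPhase | LiftOff | Replace | BothReleased |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | subj1_series1_0 | -31 | 363 | 211 | 121 | 211 | 15 | 717 | 279 | 35 | ... | 459 | 173 | 120 | 704 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | subj1_series1_1 | -29 | 342 | 216 | 123 | 222 | 200 | 595 | 329 | 43 | ... | 409 | 141 | 83 | 737 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | subj1_series1_2 | -172 | 278 | 105 | 93 | 222 | 511 | 471 | 280 | 12 | ... | 440 | 141 | 62 | 677 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | subj1_series1_3 | -272 | 263 | -52 | 99 | 208 | 511 | 428 | 261 | 27 | ... | 437 | 139 | 58 | 592 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | subj1_series1_4 | -265 | 213 | -67 | 99 | 155 | 380 | 476 | 353 | 32 | ... | 446 | 171 | 67 | 581 | 0 | 0 | 0 | 0 | 0 | 0 |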


Specifying continuous and dependent variables. This is a multi-label classification problem, so we need multiple dependent variables, one per event. All other variables are continuous.

In [5]:
cont_vars = [
    'Fp1', 'Fp2', 'F7', 'F3', 'Fz', 'F4', 'F8', 'FC5', 'FC1', 'FC2',
    'FC6', 'T7', 'C3', 'Cz', 'C4', 'T8', 'TP9', 'CP5', 'CP1', 'CP2',
    'CP6', 'TP10', 'P7', 'P3', 'Pz', 'P4', 'P8', 'PO9', 'O1', 'Oz',
    'O2', 'PO10'
]
dept_vars = [
    'HandStart', 'FirstDigitTouch', 'LiftOff', 'Replace', 'BothReleased'
]
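
With multi-label targets, any number of the labels can be active in a given row, so a metric that picks a single best class per row does not fit well. One common alternative is to score each label independently against a threshold. fastai v1 ships a metric in this spirit, `accuracy_thresh`; the pure-Python version below is only an illustration with made-up numbers, not the library's implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def thresholded_accuracy(logits, targets, thresh=0.5):
    """Fraction of (row, label) pairs where sigmoid(logit) > thresh agrees with the 0/1 target."""
    hits = total = 0
    for row_logits, row_targets in zip(logits, targets):
        for logit, target in zip(row_logits, row_targets):
            hits += int((sigmoid(logit) > thresh) == bool(target))
            total += 1
    return hits / total


# Made-up raw model scores for two rows and the five labels in dept_vars
logits = [[2.0, -1.0, -3.0, -2.0, -4.0],
          [1.5, 0.5, -2.0, -1.0, -3.0]]
targets = [[1, 0, 0, 0, 0],
           [1, 0, 0, 0, 0]]
print(thresholded_accuracy(logits, targets))  # 0.9: one false positive out of ten label slots
```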

In [6]:
procs = [FillMissing, Categorify, Normalize]

test = TabularList.from_df(merge.iloc[800:1000].copy(), path=path, cont_names=cont_vars)
data = (TabularList.from_df(merge, path=path, cont_names=cont_vars, procs=procs)
                           .split_by_idx(list(range(800,1000)))
                           .label_from_df(cols=list(dept_vars))
                           .add_test(test)
                           .databunch())
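
The split above selects validation rows by fixed position (800 to 999). An alternative, sketched here under the assumption that every id follows the `subj<N>_series<M>_<frame>` pattern visible in the table output, is to derive the positions from the id column so that a whole subject is held out. The helper names `parse_id` and `indices_for_subject` are hypothetical.

```python
import re

ID_PATTERN = re.compile(r'subj(\d+)_series(\d+)_(\d+)')


def parse_id(row_id):
    """Split an id like 'subj1_series1_0' into (subject, series, frame) integers."""
    match = ID_PATTERN.fullmatch(row_id)
    if match is None:
        raise ValueError(f'unexpected id format: {row_id!r}')
    return tuple(int(part) for part in match.groups())


def indices_for_subject(ids, subject):
    """Row positions whose id belongs to the given subject."""
    return [i for i, row_id in enumerate(ids) if parse_id(row_id)[0] == subject]


# Made-up ids for illustration
ids = ['subj1_series1_0', 'subj1_series1_1', 'subj11_series8_0', 'subj11_series8_1']
print(indices_for_subject(ids, 11))  # [2, 3]
```

A list built this way from `merge['id']` could be passed to `split_by_idx`. On the full frame, a vectorized pandas string operation would likely be faster than this pure-Python loop.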

Here is where we are having issues. Training doesn't seem to happen. I'm not really seeing any GPU utilization, and eventually the process silently stops while the notebook stays stuck in the running state. I'm assuming the process has stopped because CPU utilization abruptly drops to 0%, but it holds on to a chunk of memory until I kill the kernel.

In [None]:
data.show_batch(rows=10)

In [None]:
from fastai.metrics import *

learn = tabular_learner(data, layers=[200,100], metrics=accuracy)
learn.fit(1, 1e-2)

epoch,train_loss,valid_loss,accuracy


In [None]:
# Pass only the model name; fastai v1 writes it to <path>/models/eeg-1.pth
learn.save('eeg-1')

In [None]:
path2 = './data/valid'

df2 = None
labels2 = None

for sr in range(1, series):
    c_df2 = pd.read_csv(f'{path2}/subj12_series{sr}_data.csv', dtype=types1)
    df2 = c_df2 if df2 is None else df2.append(c_df2, ignore_index=True)
    c_label2 = pd.read_csv(f'{path2}/subj12_series{sr}_events.csv', dtype=types2)
    labels2 = c_label2 if labels2 is None else labels2.append(c_label2, ignore_index=True)
df2.tail()

In [None]:
merge2 = pd.merge(df2, labels2, on='id')
merge2.head()

In [None]:
learn.predict(merge2.iloc[0])