# **EEG Model with Keras and Wandb**
This is a test project I am using to learn Keras for structured data. I am using data from a past Kaggle competition to train a model that can detect certain events in EEG brainwave data. Such events could then trigger gestures in, for example, a prosthetic device using BCI technology. My goal is to get perfect or near-perfect predictions on the testing data. You can find more info on the contest and dataset [here](https://www.kaggle.com/c/grasp-and-lift-eeg-detection/).

## **Install The Libraries**
First we install all necessary Python libraries with pip.

In [None]:
%pip install scikit-learn
%pip install --upgrade keras
%pip install --upgrade tensorflow[and-cuda]
%pip install --upgrade pandas
%pip install --upgrade numpy
%pip install wandb
%pip install kaggle

## **Kaggle Environment Setup**
You will need to upload your *kaggle.json* and set its permissions so that only you can read it.

In [None]:
!chmod 600 ../kaggle.json

Then we point the `KAGGLE_CONFIG_DIR` environment variable at the parent directory, which is where *kaggle.json* lives.

In [None]:
import os
os.environ['KAGGLE_CONFIG_DIR'] = '../'
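
If the download below fails or warns about credentials, it can help to confirm that the key file is where the configuration directory says it is and that it is not accessible to other users. The following is a minimal sketch for POSIX systems; `kaggle_key_is_private` is a hypothetical helper written for this notebook, not part of the Kaggle API, and the demo runs against a temporary directory rather than the real key.

```python
import os
import stat
import tempfile

def kaggle_key_is_private(config_dir):
    """Hypothetical helper: True if kaggle.json exists and only its owner can access it."""
    key_path = os.path.join(config_dir, 'kaggle.json')
    if not os.path.isfile(key_path):
        return False
    mode = stat.S_IMODE(os.stat(key_path).st_mode)
    # No permission bits may be set for group or others.
    return (mode & 0o077) == 0

# Demo on a throwaway file so the real key is never touched.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_key = os.path.join(demo_dir, 'kaggle.json')
    with open(demo_key, 'w') as f:
        f.write('{}')
    os.chmod(demo_key, 0o644)
    private_before = kaggle_key_is_private(demo_dir)
    os.chmod(demo_key, 0o600)
    private_after = kaggle_key_is_private(demo_dir)

print(private_before, private_after)
```

In this notebook the check would be called as `kaggle_key_is_private('../')`.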

Now we can download the data from the competition page, 

In [None]:
!kaggle competitions download grasp-and-lift-eeg-detection -p ../data/kaggle-eeg/ -f train.zip

and unzip it into the data directory.

In [None]:
!unzip ../data/kaggle-eeg/train.zip -d ../data/kaggle-eeg

## **Data Processing**
First let's import all the libraries we need.

In [None]:
import numpy as np
import pandas as pd

#### **Pandas**

Next we specify the path of our training data and the dtypes of our features and labels. Without explicit dtypes, pandas typically falls back to wider types such as `int64` and uses far more memory to store them.
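
To get a feel for the saving, here is a small NumPy sketch that compares the same block of values stored as `int64` and as `int16`. The row count is an arbitrary illustrative number, not taken from the dataset; the 32 columns correspond to the EEG channels.

```python
import numpy as np

n_rows = 100_000   # hypothetical row count, for illustration only
n_channels = 32    # number of EEG channel columns

as_int64 = np.zeros((n_rows, n_channels), dtype=np.int64)
as_int16 = as_int64.astype(np.int16)

print(f"int64: {as_int64.nbytes / 1e6:.1f} MB")
print(f"int16: {as_int16.nbytes / 1e6:.1f} MB")
```

The `int16` copy needs a quarter of the memory, since each value takes 2 bytes instead of 8.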

The training data is split into one file per series, with 12 test subjects and 8 series per subject. The files holding our labels carry an `events` suffix.
* The `load_one_series` function loads a specific series of a subject and merges it with the labels from the corresponding `events` csv.
* The `get_merged_series` function takes the nth series (the function parameter) from every subject and merges them into one dataframe using the `load_one_series` function. We're going to use this for our validation set.

In [None]:
path = '../data/kaggle-eeg/train'

feature_types = {
    'id': 'str', 'Fp1': 'int16', 'Fp2': 'int16', 'F7': 'int16', 'F3': 'int16', 'Fz': 'int16',
    'F4': 'int16', 'F8': 'int16', 'FC5': 'int16', 'FC1': 'int16', 'FC2': 'int16', 'FC6': 'int16',
    'T7': 'int16', 'C3': 'int16', 'Cz': 'int16', 'C4': 'int16', 'T8': 'int16', 'TP9': 'int16',
    'CP5': 'int16', 'CP1': 'int16', 'CP2': 'int16', 'CP6': 'int16', 'TP10': 'int16', 'P7': 'int16',
    'P3': 'int16', 'Pz': 'int16', 'P4': 'int16', 'P8': 'int16', 'PO9': 'int16', 'O1': 'int16',
    'Oz': 'int16', 'O2': 'int16', 'PO10': 'int16'
}

label_types = {
    'id': 'str', 'HandStart': 'int8', 'FirstDigitTouch': 'int8', 'LiftOff': 'int8', 
    'Replace': 'int8', 'BothReleased': 'int8', 'BothStartLoadPhase': 'int8'
}

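def load_one_series(sj, sr):
  df = pd.read_csv(f'{path}/subj{sj}_series{sr}_data.csv', dtype=feature_types, encoding='utf8')
  labels = pd.read_csv(f'{path}/subj{sj}_series{sr}_events.csv', dtype=label_types)
  # Keep the event columns only; the id column carries no label information.
  events = labels[[col for col in labels.columns if col.lower() != 'id']]
  # Label each row with the first event column equal to 1, or 'None' if no event is active.
  # This is vectorized, which is much faster than assigning labels row by row.
  has_event = events.eq(1).any(axis=1)
  df['Labels'] = np.where(has_event, events.idxmax(axis=1), 'None')
  return df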

def get_merged_series(sr):
  # Load series `sr` for each of the 12 subjects and stack them into one dataframe.
  lst = [load_one_series(sj, sr) for sj in range(1, 13)]
  return pd.concat(lst, ignore_index=True)
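
The labelling rule that `load_one_series` applies (the first event column equal to 1 wins, otherwise the row is labelled `'None'`) can be illustrated on a toy table. The dataframe below is made up for the example and only has two event columns.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of an events file: one row per sample, one 0/1 column per event.
toy_events = pd.DataFrame({
    'id': ['s_0', 's_1', 's_2', 's_3'],
    'HandStart': [0, 1, 1, 0],
    'FirstDigitTouch': [0, 0, 1, 0],
})
event_cols = [col for col in toy_events.columns if col != 'id']

# idxmax returns the first column holding the row maximum; rows without any
# active event are masked out and labelled 'None' instead.
has_event = toy_events[event_cols].eq(1).any(axis=1)
first_event = toy_events[event_cols].idxmax(axis=1)
toy_labels = np.where(has_event, first_event, 'None')

print(list(toy_labels))
```

Note that row `s_2` has two active events and keeps only the first one, so overlapping events are collapsed into a single label.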

Creating the validation set will take some time as these files are quite large. Make some coffee =). Once it's done, we can save it locally so it loads faster in the future, so you should only need to do this once.

In [None]:
valid = get_merged_series(8)

In [None]:
valid.to_csv('../data/kaggle-eeg/valid.csv', index=False, float_format='%.4f')

Here we load in our validation data.

In [None]:
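# keep_default_na=False stops recent pandas versions from reading the literal label 'None' back as a missing value.
valid = pd.read_csv('../data/kaggle-eeg/valid.csv', encoding='utf8', keep_default_na=False)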

Then we generate our first training set.

In [None]:
train = load_one_series(1, 1)

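The following cell removes the `id` column from our dataframes since we don't need it, encodes the string labels as integer category codes, and splits each dataframe into features and labels. `pd.Categorical` sorts its categories alphabetically, so as long as all seven labels are present the codes run from 0 (`BothReleased`) to 6 (`Replace`), with `None` at 5.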

In [None]:
train = train.drop(columns=['id'])
train['Labels'] = pd.Categorical(train['Labels'])
train['Labels'] = train['Labels'].astype('category').cat.codes
train_x = train.drop(columns=['Labels'])
train_y = train['Labels']

valid = valid.drop(columns=['id'])
valid['Labels'] = pd.Categorical(valid['Labels'])
valid['Labels'] = valid['Labels'].astype('category').cat.codes
valid_x = valid.drop(columns=['Labels'])
valid_y = valid['Labels']
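
The encoding step can be checked on a toy series. The sketch below shows that the inferred categories are sorted alphabetically, and that passing an explicit `categories` list keeps the label-to-code mapping fixed even when a label is missing from the data. The label order in `fixed_order` is a hypothetical choice for illustration.

```python
import pandas as pd

toy = pd.Series(['None', 'LiftOff', 'HandStart', 'None'])

# Categories are inferred from the data and sorted alphabetically.
inferred = pd.Categorical(toy)
print(list(inferred.categories))
print(list(inferred.codes))

# An explicit category list pins the mapping; 'Replace' keeps code 3 even
# though it never occurs in this toy series.
fixed_order = ['None', 'HandStart', 'LiftOff', 'Replace']  # hypothetical order
fixed = pd.Categorical(toy, categories=fixed_order)
print(list(fixed.codes))
```

With inferred categories, two dataframes only share the same codes if they contain the same set of labels, which is worth checking before training on several series.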

We can check here to make sure our data is organized as expected.

In [None]:
train_x.info()

In [None]:
valid_y.info()

## **Training**

### **Wandb Logging**
First we log in to Wandb so that we can log the training. `wandb.login()` prompts for the API key, which keeps the key itself out of the notebook.

In [None]:
import wandb
wandb.login()

Initialize Wandb and specify a project name to keep track of metrics.

In [None]:
import wandb
from wandb.keras import WandbCallback

wandb.init(project="kaggle-eeg-tf", config={"hyper": "parameter"})

### **Model**

In [None]:
import tensorflow as tf

from tensorflow.keras.layers import Dense, Dropout, InputLayer, Normalization
from tensorflow.keras.models import Sequential
from tensorflow.keras.regularizers import l2

Now we can create our Keras model for training.

In [None]:
model = Sequential(
  [
    InputLayer(input_shape=(train_x.shape[1],)),
    Normalization(),
    Dense(128, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.5),
    Dense(64, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.5),
    Dense(32, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.5),
    Dense(16, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.5),
    # 7 output logits: the six events plus the 'None' label produced by load_one_series
    Dense(7, activation='linear', kernel_regularizer=l2(0.01))
  ]
)

lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
  0.003,
  decay_steps=train.shape[0] / 5000 * 1000,
  decay_rate=1,
  staircase=False
)

model.compile(optimizer=tf.keras.optimizers.Adam(lr_schedule),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'],
              run_eagerly=True)

model.summary()
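
For intuition about the learning-rate schedule: with `staircase=False`, TensorFlow documents inverse time decay as `initial_learning_rate / (1 + decay_rate * step / decay_steps)`. Here is a plain-Python sketch of that formula. The `decay_steps` value of 1000 is a hypothetical round number for illustration; the notebook derives its own value from the training set size.

```python
def inverse_time_decay(step, initial_lr=0.003, decay_steps=1000.0, decay_rate=1.0):
    """Learning rate after `step` optimizer steps (continuous, non-staircase form)."""
    return initial_lr / (1.0 + decay_rate * step / decay_steps)

for step in (0, 500, 1000, 5000):
    print(step, inverse_time_decay(step))
```

With `decay_rate=1`, the learning rate halves after `decay_steps` steps and then keeps shrinking more slowly.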

### **Fitting**
Here we convert our training and validation dataframes into TensorFlow datasets.

In [None]:
train_ds = tf.data.Dataset.from_tensor_slices((train_x.values, train_y.values)).shuffle(10000).batch(1000)
valid_ds = tf.data.Dataset.from_tensor_slices((valid_x.values, valid_y.values)).batch(1000)
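
As a rough picture of what shuffling and batching do to the data, here is a NumPy-only sketch. It is a simplification and not how `tf.data` is implemented: it permutes the whole array at once, whereas `shuffle(10000)` draws from a 10,000-element buffer. `shuffled_batches` and the toy arrays are made up for the example.

```python
import numpy as np

def shuffled_batches(x, y, batch_size, seed=0):
    """Yield (features, labels) batches in a random order, keeping rows paired."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

toy_x = np.arange(2500 * 2).reshape(2500, 2)  # row i is [2i, 2i + 1]
toy_y = np.arange(2500)                       # label i belongs to row i

batch_sizes = [len(bx) for bx, _ in shuffled_batches(toy_x, toy_y, batch_size=1000)]
print(batch_sizes)
```

The last batch is smaller whenever the row count is not a multiple of the batch size, which matters when choosing `steps_per_epoch`.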

In [None]:
model.fit(train_ds, validation_data=valid_ds, epochs=10, steps_per_epoch=12, callbacks=[WandbCallback()])

In [None]:
# Continue training on the remaining series. Series 8 is held out as the
# validation set, so only series 1-7 of each of the 12 subjects are used here.
for sj in range(1, 13):
  for sr in range(1, 8):
    if sj == 1 and sr == 1:
      continue  # subject 1, series 1 was already used for the first fit above
    temp = load_one_series(sj, sr)
    temp = temp.drop(columns=['id'])
    temp['Labels'] = pd.Categorical(temp['Labels'])
    temp['Labels'] = temp['Labels'].astype('category').cat.codes
    temp_x = temp.drop(columns=['Labels'])
    temp_y = temp['Labels']
    temp_ds = tf.data.Dataset.from_tensor_slices((temp_x.values, temp_y.values)).shuffle(10000).batch(1000)
    model.fit(temp_ds, validation_data=valid_ds, epochs=10, steps_per_epoch=12, callbacks=[WandbCallback()])

## **Testing**
Now we download the testing data from the Kaggle competition and unzip it into the data directory.

In [None]:
!kaggle competitions download grasp-and-lift-eeg-detection -p ../data/kaggle-eeg/ -f test.zip

In [None]:
!unzip ../data/kaggle-eeg/test.zip -d ../data/kaggle-eeg

Here we load the sample submission from the Kaggle competition. This gives us a pre-made dataframe and we just need to update column values with predictions from our model. 

In [None]:
!kaggle competitions download grasp-and-lift-eeg-detection -p ../data/kaggle-eeg/ -f sample_submission.csv.zip

In [None]:
!unzip ../data/kaggle-eeg/sample_submission.csv.zip -d ../data/kaggle-eeg

In [None]:
sub = pd.read_csv('../data/kaggle-eeg/sample_submission.csv')

In [None]:
sub.head()

Here we merge the test series (series 9 and 10 for each of the 12 subjects) into one dataframe, to match the shape of the example submission on the competition page.

In [None]:
path = '../data/kaggle-eeg/test'

def get_merged_tests():
  # DataFrame.append was removed in pandas 2.0, so collect the frames and concatenate once.
  frames = []
  for sj in range(1, 13):
    for sr in range(9, 11):
      frames.append(pd.read_csv(f'{path}/subj{sj}_series{sr}_data.csv', dtype=feature_types))
  return pd.concat(frames, ignore_index=True)

In [None]:
tests = get_merged_tests()

In [None]:
tests = tests.drop(columns=['id'])
tests.head()

In [None]:
model.load_weights('model-best.h5')

In [None]:
out = tests.loc[[0], :]  
out.head()

In [None]:
# model.predict expects an array of feature rows, not a dict of columns.
np.argmax(model.predict(out.values), axis=-1)
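
`model.predict` returns raw logits here, because the final layer is linear and the loss is configured with `from_logits=True`. If class probabilities are wanted rather than just the arg-max, a softmax can be applied to the logits. Here is a minimal NumPy sketch, using made-up logits rather than real model output:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum keeps exp() numerically stable."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

toy_logits = np.array([[2.0, 0.5, -1.0],
                       [0.0, 0.0, 3.0]])  # hypothetical logits for two rows
probs = softmax(toy_logits)

print(probs.round(3))
print(np.argmax(probs, axis=-1))
```

Softmax preserves the ordering within each row, so taking the arg-max of the probabilities gives the same class as taking it directly on the logits.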

In [None]:
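# pd.Categorical sorted the training labels alphabetically, so code i maps to classes[i].
classes = ['BothReleased', 'BothStartLoadPhase', 'FirstDigitTouch', 'HandStart', 'LiftOff', 'None', 'Replace']
# Predict all rows in batches instead of calling model.predict once per row.
pred_codes = np.argmax(model.predict(tests.values, batch_size=1000), axis=-1)
pred_names = np.array(classes)[pred_codes]
print(f"Predicted {len(pred_names)} rows.")
for col in sub.columns:
    if col == 'id':
        continue  # keep the row identifiers intact
    # 1 where the predicted label matches this event column, 0 otherwise
    sub[col] = (pred_names == col).astype('int8')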

In [None]:
sub.to_csv('../data/kaggle-eeg/submission.csv', index=False)

In [None]:
!kaggle competitions submit grasp-and-lift-eeg-detection -f ../data/kaggle-eeg/submission.csv -m "Message"