# **EEG Model with Keras and Wandb**
This is a test project I am using to learn Keras for structured data. I am using data from a past Kaggle competition to train a model that can detect certain events in EEG brainwave data. With BCI technology, those events could then trigger gestures in a prosthetic device, for example. My goal is to get perfect or near-perfect predictions on the test data. More information on the contest and dataset is available [here](https://www.kaggle.com/c/grasp-and-lift-eeg-detection/).

## **Install The Libraries**
First we install all the necessary Python libraries with pip.

In [None]:
%pip install scikit-learn
%pip install --upgrade keras
%pip install --upgrade tensorflow[and-cuda]
%pip install --upgrade pandas
%pip install --upgrade numpy
%pip install wandb
%pip install kaggle

## **Kaggle Environment Setup**
You will need to upload your *kaggle.json* and set its permissions so that only you can read it.

In [None]:
!chmod 600 ../kaggle.json
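
To confirm that the permission change took effect, the file mode can be inspected from Python. This is a minimal sketch on a temporary file; `is_private` is a hypothetical helper, not part of the Kaggle tooling, and the check assumes a POSIX file system.

```python
import os
import stat
import tempfile

def is_private(path):
    """Return True if neither group nor others have any permission on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0

# Use a throwaway file so the sketch does not touch a real kaggle.json.
with tempfile.TemporaryDirectory() as config_dir:
    token_path = os.path.join(config_dir, 'kaggle.json')
    with open(token_path, 'w') as f:
        f.write('{}')
    os.chmod(token_path, 0o644)   # readable by everyone
    before = is_private(token_path)
    os.chmod(token_path, 0o600)   # same mode as the chmod cell above
    after = is_private(token_path)

print(before, after)
```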

Then we point the Kaggle configuration directory at the parent directory, where *kaggle.json* lives, by setting an environment variable.

In [None]:
import os
os.environ['KAGGLE_CONFIG_DIR'] = '../'

Now we can download the data from the competition page, 

In [None]:
!kaggle competitions download grasp-and-lift-eeg-detection -p ../data/kaggle-eeg/ -f train.zip

and unzip it into the data directory.

In [None]:
!unzip ../data/kaggle-eeg/train.zip -d ../data/kaggle-eeg

## **Data Analysis**
First let's import all the libraries we need.

In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf
import wandb

from wandb.keras import WandbCallback
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras.models import Sequential
from tensorflow.keras.regularizers import l2

Next we load some of the training data and check the first few rows.

In [None]:
data_path = '../data/kaggle-eeg/train'
features = pd.read_csv(f'{data_path}/subj1_series1_data.csv')
labels = pd.read_csv(f'{data_path}/subj1_series1_events.csv')
features = features.drop(columns=['id'])
labels = labels.drop(columns=['id'])
features.head()


In [None]:
labels.head()
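
Before training, it can help to check how often each event is active, and how many rows have no active event or several at once. The sketch below runs on a small synthetic frame: the column names are the six event classes used later in this notebook, but the values are made up for illustration.

```python
import pandas as pd

# Synthetic stand-in for one *_events.csv file (values are illustrative only).
toy_labels = pd.DataFrame({
    'HandStart':          [0, 1, 1, 0, 0, 0, 0, 0],
    'FirstDigitTouch':    [0, 0, 1, 1, 0, 0, 0, 0],
    'BothStartLoadPhase': [0, 0, 0, 1, 1, 0, 0, 0],
    'LiftOff':            [0, 0, 0, 0, 1, 0, 0, 0],
    'Replace':            [0, 0, 0, 0, 0, 0, 1, 0],
    'BothReleased':       [0, 0, 0, 0, 0, 0, 1, 0],
})

# Fraction of rows in which each event is active.
prevalence = toy_labels.mean()

# Share of rows with no active event, and with more than one active event.
active_per_row = toy_labels.sum(axis=1)
no_event_share = (active_per_row == 0).mean()
multi_event_share = (active_per_row > 1).mean()

print(prevalence)
print(f'no event: {no_event_share:.3f}, several events: {multi_event_share:.3f}')
```

Running the same three lines on the real `labels` frame shows whether the classes are balanced enough for plain accuracy to be informative.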

## **Training**

### **Wandb Logging**
First we log in to Wandb with our API key so that we can log the training run.

In [None]:
wandb.login()  # prompts for the API key, or reads it from the WANDB_API_KEY environment variable

Initialize Wandb and specify a project name to keep track of metrics.

In [None]:
wandb.init(
    project="kaggle-eeg-tf",
    config={
        "hyper": "parameter",
        "epochs": 10,  # placeholder value (an assumption); tune as needed
        "batch_size": 719350,
        "loss_function": "categorical_crossentropy",
        "architecture": "MLP",  # the model below is a stack of Dense layers
        "dataset": "kaggle-eeg"
    }
)

### **TF Data Loading**
Here we stream the training CSV files into TensorFlow datasets for training and validation.

In [None]:
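# Pair every *_data.csv file with its *_events.csv file by name rather than
# relying on the arbitrary order returned by os.listdir.
feature_files = sorted(
    f'{data_path}/{file}' for file in os.listdir(data_path) if file.endswith('_data.csv')
)
label_files = [file.replace('_data.csv', '_events.csv') for file in feature_files]

batch_size = 719350

def train_data_generator(feature_files, label_files, batch_size=1000):
  # Yield aligned (features, labels) arrays of at most batch_size rows each.
  for feature_file, label_file in zip(feature_files, label_files):
    feature_data = pd.read_csv(feature_file, chunksize=batch_size)
    label_data = pd.read_csv(label_file, chunksize=batch_size)
    for feature_chunk, label_chunk in zip(feature_data, label_data):
      feature_chunk = feature_chunk.drop(columns=['id'])
      label_chunk = label_chunk.drop(columns=['id'])
      # Cast to float32 so the arrays match the output signature below.
      yield feature_chunk.to_numpy(dtype='float32'), label_chunk.to_numpy(dtype='float32')

# 80% of 17983756 (presumably the total number of training rows);
# used by the learning rate schedule below.
count = 17983756 * 0.8

def make_dataset(feature_files, label_files):
  # Every element is already a batch of rows, so no further batching is applied.
  return tf.data.Dataset.from_generator(
      lambda: train_data_generator(feature_files, label_files, batch_size),
      output_signature=(
          tf.TensorSpec(shape=(None, 32), dtype=tf.float32),
          tf.TensorSpec(shape=(None, 6), dtype=tf.float32)
      )
  )

# Split by file: the first 80% of the files train, the rest validate.
# Dataset methods return new datasets, so their results have to be assigned.
split = int(len(feature_files) * 0.8)
# Shuffle the order of the batches with a small buffer (the size is an arbitrary choice).
train_ds = make_dataset(feature_files[:split], label_files[:split]).shuffle(8)
valid_ds = make_dataset(feature_files[split:], label_files[split:])
print(train_ds.element_spec)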
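
The core of the loading step is reading a feature file and its label file in lockstep, chunk by chunk. The sketch below shows that idea with pandas alone on two tiny in-memory CSVs. `paired_chunks` is a hypothetical helper, and the toy values are made up; only the `id` column convention is taken from the data above.

```python
import io
import pandas as pd

def paired_chunks(feature_source, label_source, chunk_rows):
    """Yield (features, labels) NumPy arrays read in lockstep from two CSV sources."""
    feature_reader = pd.read_csv(feature_source, chunksize=chunk_rows)
    label_reader = pd.read_csv(label_source, chunksize=chunk_rows)
    for feature_chunk, label_chunk in zip(feature_reader, label_reader):
        # The id columns must agree row for row, otherwise the pairing is wrong.
        assert feature_chunk['id'].tolist() == label_chunk['id'].tolist()
        yield (feature_chunk.drop(columns=['id']).to_numpy(),
               label_chunk.drop(columns=['id']).to_numpy())

# Tiny in-memory CSVs standing in for a *_data.csv / *_events.csv pair.
toy_features = 'id,Fp1,Fp2\ns1_0,-31,363\ns1_1,-29,342\ns1_2,-172,278\n'
toy_events = 'id,HandStart,LiftOff\ns1_0,0,0\ns1_1,1,0\ns1_2,1,0\n'

chunks = list(paired_chunks(io.StringIO(toy_features), io.StringIO(toy_events), chunk_rows=2))
for x, y in chunks:
    print(x.shape, y.shape)
```

The last chunk of a file is usually shorter than `chunk_rows`, which is why the dataset signature leaves the first dimension as `None`.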

### **Model**

Now we can create our Keras model for training.

In [None]:
model = Sequential(
  [
    BatchNormalization(input_shape=(features.shape[1],)),
    Dense(128, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.5),
    Dense(64, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.5),
    Dense(32, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.5),
    Dense(16, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.5),
    Dense(6)
  ], name='kaggle-eeg'
)

lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
  0.0001,
  decay_steps= count / batch_size * 1000,
  decay_rate=1,
  staircase=False
)

model.compile(optimizer=tf.keras.optimizers.Adam(lr_schedule),
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'],
              run_eagerly=True)

model.summary()
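
With `staircase=False`, `InverseTimeDecay` computes the learning rate as `initial_learning_rate / (1 + decay_rate * step / decay_steps)`. The plain-Python sketch below reproduces that formula with the values from this notebook, to show how quickly the rate falls. `inverse_time_decay` is a hypothetical helper written for illustration, not a Keras function.

```python
def inverse_time_decay(step, initial_lr, decay_steps, decay_rate):
    """Continuous (staircase=False) inverse time decay."""
    return initial_lr / (1 + decay_rate * step / decay_steps)

# Values taken from the cells above.
count = 17983756 * 0.8
batch_size = 719350
decay_steps = count / batch_size * 1000

for step in (0, 1000, 20000, 100000):
    print(step, inverse_time_decay(step, 0.0001, decay_steps, 1))
```

With these numbers `decay_steps` is roughly 20,000, so the learning rate drops to half of its initial value after about 20,000 optimizer steps.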

In [None]:
model.fit(train_ds, validation_data=valid_ds, epochs=wandb.config.epochs, callbacks=[WandbCallback()])

In [None]:
model.evaluate(valid_ds)
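
The competition scores submissions by mean column-wise AUC rather than accuracy, so it is worth tracking that metric on the validation data as well. The NumPy sketch below computes it by comparing every positive row with every negative row. The helper names and toy values are made up for illustration, and each column is assumed to contain both classes.

```python
import numpy as np

def column_auc(y_true, y_score):
    """AUC as the probability that a positive row outscores a negative row (ties count half)."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def mean_columnwise_auc(y_true, y_score):
    """Average the per-column AUC over all label columns."""
    return float(np.mean([column_auc(y_true[:, k], y_score[:, k])
                          for k in range(y_true.shape[1])]))

# Toy labels and scores for two event columns.
y_true = np.array([[0, 1], [0, 0], [1, 1], [1, 0]])
y_score = np.array([[0.1, 0.9], [0.4, 0.2], [0.35, 0.6], [0.8, 0.7]])
toy_auc = mean_columnwise_auc(y_true, y_score)
print(toy_auc)
```

The pairwise comparison needs memory proportional to positives times negatives, so it only suits small samples. On the full validation set, `sklearn.metrics.roc_auc_score` applied per column is the practical choice.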

## **Testing**
Now we download the test data from the Kaggle competition and unzip it into the data directory.

In [None]:
!kaggle competitions download grasp-and-lift-eeg-detection -p ../data/kaggle-eeg/ -f test.zip

In [None]:
!unzip ../data/kaggle-eeg/test.zip -d ../data/kaggle-eeg

Here we load the sample submission from the Kaggle competition. This gives us a pre-made dataframe and we just need to update column values with predictions from our model. 

In [None]:
!kaggle competitions download grasp-and-lift-eeg-detection -p ../data/kaggle-eeg/ -f sample_submission.csv.zip

In [None]:
!unzip ../data/kaggle-eeg/sample_submission.csv.zip -d ../data/kaggle-eeg

In [None]:
sub = pd.read_csv('../data/kaggle-eeg/sample_submission.csv')

In [None]:
sub.head()

Here we create a dataframe in the same shape as the example submission on the competition page.

In [None]:
path = '../data/kaggle-eeg/test'

def get_merged_tests():
  # Read series 9 and 10 for each of the 12 subjects, in submission order.
  frames = [
    pd.read_csv(f'{path}/subj{sj}_series{sr}_data.csv')
    for sj in range(1, 13)
    for sr in range(9, 11)
  ]
  # DataFrame.append was removed in pandas 2.0; concatenate once instead.
  return pd.concat(frames, ignore_index=True)

In [None]:
tests = get_merged_tests()

In [None]:
tests = tests.drop(columns=['id'])
tests.head()

In [None]:
model.load_weights('model-best.h5')

In [None]:
out = tests.loc[[0], :]  
out.head()

In [None]:
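# Use the label columns from training so the order matches the model's outputs.
classes = list(labels.columns)
# Predict all rows in one batched call and store the scores in the submission frame.
# This assumes the rows of tests are in the same order as sample_submission.csv.
sub[classes] = model.predict(tests.to_numpy(dtype='float32'), batch_size=4096)  # batch size is an arbitrary choice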
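
Because the last layer is `Dense(6)` without an activation and the loss is built with `from_logits=True`, `model.predict` returns raw logits rather than probabilities. If scores between 0 and 1 are wanted, the softmax that matches the categorical crossentropy loss can be applied afterwards. This is a minimal NumPy sketch with made-up logits; `softmax` here is a hand-written helper, not a Keras call.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Two toy rows of six logits, one per event class.
toy_logits = np.array([[2.0, 0.0, -1.0, 0.5, 0.0, -3.0],
                       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
probs = softmax(toy_logits)
print(probs.round(3))
```

Softmax forces each row to sum to one. If the six events are treated as independent outputs, a per-column sigmoid would be the alternative.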

In [None]:
sub.to_csv('../data/kaggle-eeg/submission.csv', index=False)

In [None]:
!kaggle competitions submit grasp-and-lift-eeg-detection -f ../data/kaggle-eeg/submission.csv -m "Message"