# **EEG Model with Keras and Wandb**
This is a test project I am using to learn Keras for structured data. I am using data from a past Kaggle competition to train a model that can detect certain events from EEG brainwave data. The events would then trigger certain gestures in a prosthetic device for example, using BCI technology. My goal is to get perfect/near perfect predictions on the testing data. You can get more info on the contest/dataset [here](https://www.kaggle.com/c/grasp-and-lift-eeg-detection/)

## **Install The Libraries**
First we install install all necessary Python libraries with pip.

In [None]:
%pip install scikit-learn
%pip install --upgrade keras
%pip install --upgrade tensorflow[and-cuda]
%pip install --upgrade pandas
%pip install --upgrade numpy
%pip install wandb
%pip install kaggle

## **Kaggle Environment Setup**
You will need to upload your *kaggle.json*, set the permissions so the file can be read.

In [None]:
!chmod 600 ../kaggle.json

Then we set the Kaggle configuration directory to our current working directory, as an environment variable.

In [1]:
import os
os.environ['KAGGLE_CONFIG_DIR'] = '../'

Now we can download the data from the competition page, 

In [None]:
!kaggle competitions download grasp-and-lift-eeg-detection -p ../data/kaggle-eeg/ -f train.zip

and unzip it into the data directory.

In [None]:
!unzip ../data/kaggle-eeg/train.zip -d ../data/kaggle-eeg

## **Data Analysis**
First let's import all the libraries we need.

In [2]:
import numpy as np
import pandas as pd

Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
        
  import pandas as pd


First we load some of the training data and check the first few rows.

In [5]:
data_path = '../data/kaggle-eeg/train'
features = pd.read_csv(f'{data_path}/subj1_series1_data.csv')
labels = pd.read_csv(f'{data_path}/subj1_series1_events.csv')
features.head()


Unnamed: 0,id,Fp1,Fp2,F7,F3,Fz,F4,F8,FC5,FC1,...,P7,P3,Pz,P4,P8,PO9,O1,Oz,O2,PO10
0,subj1_series1_0,-31,363,211,121,211,15,717,279,35,...,536,348,383,105,607,289,459,173,120,704
1,subj1_series1_1,-29,342,216,123,222,200,595,329,43,...,529,327,369,78,613,248,409,141,83,737
2,subj1_series1_2,-172,278,105,93,222,511,471,280,12,...,511,319,355,66,606,320,440,141,62,677
3,subj1_series1_3,-272,263,-52,99,208,511,428,261,27,...,521,336,356,71,568,339,437,139,58,592
4,subj1_series1_4,-265,213,-67,99,155,380,476,353,32,...,550,324,346,76,547,343,446,171,67,581


In [6]:
labels.head()

Unnamed: 0,id,HandStart,FirstDigitTouch,BothStartLoadPhase,LiftOff,Replace,BothReleased
0,subj1_series1_0,0,0,0,0,0,0
1,subj1_series1_1,0,0,0,0,0,0
2,subj1_series1_2,0,0,0,0,0,0
3,subj1_series1_3,0,0,0,0,0,0
4,subj1_series1_4,0,0,0,0,0,0


## **Training**

### **Wandb Logging**
First we're going to login to Wandb with our api key so that we can log the training. 

In [7]:
!wandb login d754544ba90d0be7ea7009afb39a9225330e6be9

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /home/rdugue/.netrc


Initialize Wandb and specify a project name to keep track of metrics

In [8]:
import wandb
from wandb.keras import WandbCallback

wandb.init(project="kaggle-eeg-tf", config={"hyper": "parameter"})

2024-02-23 11:14:02.089289: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-23 11:14:02.093541: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-23 11:14:02.610050: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-23 11:14:03.688638: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Failed to detect the name of this notebook, you can s

### **TF Data Loading**
Here we convert our training and validation data frames into tensor flow datasets. 

In [12]:
import tensorflow as tf

from tensorflow.keras.layers import Dense, Dropout, InputLayer, Normalization
from tensorflow.keras.models import Sequential
from tensorflow.keras.regularizers import l2

In [41]:
train_files = [f'{data_path}/{file}' for file in os.listdir(data_path)]
feature_files, label_files = [], []

for i in range(len(train_files)):
  if i % 2 == 0:
    feature_files.append(train_files[i])
  else:
    label_files.append(train_files[i])

def train_data_generator(feature_files, label_files, batch_size=1000):
  for i in range(len(feature_files)):
    with open(feature_files[i], 'r') as f1, open(label_files[i], 'r') as f2:
      feature_data = pd.read_csv(f1, encoding='utf8', chunksize=batch_size)
      label_data = pd.read_csv(f2, encoding='utf8', chunksize=batch_size)
      for feature_chunk, label_chunk in zip(feature_data, label_data):
        feature_chunk = feature_chunk.drop(['id'], axis=1)
        label_chunk = label_chunk.drop(['id'], axis=1)
        yield feature_chunk, label_chunk

ds = tf.data.Dataset.from_generator(
    train_data_generator,
    args=[feature_files, label_files],
    output_signature=(
        tf.TensorSpec(shape=(None, 32), dtype=tf.int8),
        tf.TensorSpec(shape=(None, 6), dtype=tf.int8)
    )
)

#ds = ds.padded_batch(1000, padded_shapes=([None, 32], [None, 6]))


ds.shuffle(17900000)
count = 17983756 * 0.8
batch_size = 719350
train_ds = ds.take(int(count * 0.8)).batch(batch_size)
valid_ds = ds.skip(int(count * 0.8))

### **Model**

Now we can create our Keras model for training.

In [42]:
model = Sequential(
  [
    InputLayer(input_shape=(features.shape[1],)),
    Normalization(),
    Dense(128, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.5),
    Dense(64, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.5),
    Dense(32, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.5),
    Dense(16, activation='relu', kernel_regularizer=l2(0.01)),
    Dropout(0.5),
    Dense(6, activation='linear', kernel_regularizer=l2(0.01))
  ]
)

lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
  0.003,
  decay_steps= count / batch_size * 1000,
  decay_rate=1,
  staircase=False
)

model.compile(optimizer=tf.keras.optimizers.Adam(lr_schedule),
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'],
              run_eagerly=True)

model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 normalization_1 (Normaliza  (None, 33)                67        
 tion)                                                           
                                                                 
 dense_5 (Dense)             (None, 128)               4352      
                                                                 
 dropout_4 (Dropout)         (None, 128)               0         
                                                                 
 dense_6 (Dense)             (None, 64)                8256      
                                                                 
 dropout_5 (Dropout)         (None, 64)                0         
                                                                 
 dense_7 (Dense)             (None, 32)                2080      
                                                      

### **Training Loop**

In [None]:
model.fit(train_ds, validation_data=valid_ds, epochs=10, steps_per_epoch=20, callbacks=[WandbCallback()])

## **Testing**
We're gonna download the testing data now from the Kaggle competition and unzip into the data directory.

In [None]:
!kaggle competitions download grasp-and-lift-eeg-detection -f test.zip

In [None]:
!unzip ../data/kaggle-eeg/test.zip -d ../data/kaggle-eeg

Here we load the sample submission from the Kaggle competition. This gives us a pre-made dataframe and we just need to update column values with predictions from our model. 

In [None]:
!kaggle competitions download grasp-and-lift-eeg-detection -f sample_submission.csv.zip

In [None]:
!unzip ../data/kaggle-eeg/sample_submission.csv.zip -d ../data/kaggle-eeg

In [None]:
sub = pd.read_csv('../data/kaggle-eeg/sample_submission.csv')

In [None]:
sub.head()

Here we create a dataframe in the same shape as the example submission on the competition page.

In [None]:
path = '../data/kaggle-eeg/test'

def get_merged_tests():
  tests = None
  for sj in range(1, 13):
    for sr in range(9, 11):
      c_tests = pd.read_csv(f'{path}/subj{sj}_series{sr}_data.csv', dtype=feature_types)
      tests = c_tests if tests is None else tests.append(c_tests, ignore_index=True)
  return tests

In [None]:
tests = get_merged_tests()

In [None]:
tests = tests.drop(columns=['id'])
tests.head()

In [None]:
model.load_weights('model-best.h5')

In [None]:
out = tests.loc[[0], :]  
out.head()

In [None]:
np.argmax(model.predict(out.to_dict()), axis=-1)

In [None]:
classes = ['HandStart', 'FirstDigitTouch', 'LiftOff', 'Replace', 'BothReleased', 'BothStartLoadPhase', 'None']
for id in range(tests.shape[0]):
    pred = classes[model.predict(tests.loc[id])[1]]
    nl = '\n'
    log = f"Current pred: {pred}. Rows left to predict: {len(tests.index) - id}...{nl}"
    print(log)
    for col in sub.keys():
      sub.at[id, col] = 1 if col == pred else 0

In [None]:
sub.to_csv('../data/kaggl-eeg/submission.csv', index=False)

In [None]:
!kaggle competitions submit grasp-and-lift-eeg-detection -f ../data/kaggle-eeg/submission.csv -m "Message"