# EEG Classification task
In this notebook we will see how to classify electroencephalography (EEG) signals, or brain waves, using deep learning algorithms. The data set consists of EEG signals recorded from different subjects while they were sleeping. In particular, we have to identify 6 different sleep stages from these brain waves.


## Data processing

First of all, we need to load the data set. For this project we can work with two different formats of the same data. Both are available [here](https://drive.google.com/drive/folders/13zpCyXK8pNuuF5Q62N2AbUvu2FuM-0tC?usp=sharing).

In [0]:
import numpy as np

In [None]:
import pickle

# Use context managers so that both files are closed after loading
with open('Data_Spectrograms.pkl', 'rb') as file_spectrogram:
    data_spectrogram = pickle.load(file_spectrogram)
with open('Data_Raw_signals.pkl', 'rb') as file_raw_signals:
    data_raw_signals = pickle.load(file_raw_signals)

After loading the data, let's take a closer look at these data sets. In both cases, the data set consists of EEG sequences of 3000 time steps each, coming from two electrode locations on the head (Fpz-Cz and Pz-Oz) and sampled at 100 Hz. That means that each sample contains two signals of 3000 time steps and that those time steps correspond to 30 seconds of recording. In more detail:



*   **Data_Raw_signals.pkl** contains the sequences and the corresponding labels as two arrays [sequences, labels].
*   **Data_Spectrograms.pkl** contains the spectrograms of the sequences and the corresponding labels as two arrays [spectrograms, labels].


As you can see, the data is loaded as lists, and each list contains two arrays. This is what we expected from the data set description.

In [5]:
print(type(data_spectrogram))
print(len(data_spectrogram))
print(type(data_raw_signals))
print(len(data_raw_signals))

<class 'list'>
2
<class 'list'>
2


Now, we separate the sequences and spectrograms from their respective labels.

In [0]:
sequences = data_raw_signals[0]
sequences_labels = data_raw_signals[1]
spectograms = data_spectrogram[0]
spectograms_labels = data_spectrogram[1]

In this way we have a much cleaner idea of our data:



*   For both types of data we have 15375 examples. Each example is one 30-second recording segment, so this number counts segments and not subjects
*   In the case of the raw signals, each example is made of 2 signals of 3000 time steps each
*   In the case of the spectrograms, the signals have already been processed: each of the two signals is represented by a 100 x 30 array computed from the same 30 seconds of recording sampled at 100 Hz

From now on, it is up to us which type of data we want to work with: we can choose either one of them or both. 

In [7]:
print(sequences.shape)
print(sequences_labels.shape)
print(spectograms.shape)
print(spectograms_labels.shape)

(15375, 2, 3000)
(15375,)
(15375, 2, 100, 30)
(15375,)
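
The raw shape can be checked against the description with a little arithmetic. The snippet below is only a sketch: `sampling_rate_hz` and `n_time_steps` are taken from the text above, and `example` is a hypothetical zero-filled stand-in for one entry of `sequences`.

```python
import numpy as np

sampling_rate_hz = 100   # from the data set description
n_time_steps = 3000      # last axis of the raw sequences

# Duration of one example in seconds
duration_s = n_time_steps / sampling_rate_hz
print(duration_s)        # 30.0

# Hypothetical stand-in for one example: 2 signals of 3000 time steps
example = np.zeros((2, n_time_steps))

# Time stamp (in seconds) of every time step of the example
time_axis_s = np.arange(example.shape[-1]) / sampling_rate_hz
print(time_axis_s[0], time_axis_s[-1])
```

The same `time_axis_s` can be used as the x axis when plotting one of the real signals, for example `sequences[0, 0]`.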


According to our task, we have to classify six different sleep stages.

In [8]:
print(len(np.unique(spectograms_labels)))
print(len(np.unique(sequences_labels)))

6
6
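
Knowing that there are six classes does not tell us how balanced they are. As a rough sketch, the hypothetical helper `class_distribution` below counts the examples per class. It is shown on toy labels, not on the real data.

```python
import numpy as np

def class_distribution(labels):
    """Return {class: (count, fraction)} for an array of integer labels."""
    classes, counts = np.unique(labels, return_counts=True)
    total = counts.sum()
    return {int(c): (int(n), n / total) for c, n in zip(classes, counts)}

# Toy labels for illustration only (not the real data)
toy_labels = np.array([0, 1, 1, 2, 1, 5, 5, 1])
for stage, (count, fraction) in class_distribution(toy_labels).items():
    print(f"class {stage}: {count} examples ({fraction:.1%})")
```

In the notebook the same call would be `class_distribution(sequences_labels)`. If one class dominates, plain accuracy should be read with some care.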


Let's change the shape of the sequences and the spectrograms so that the axis of size 2 comes last. This makes them easier to work with.

In [9]:
spectograms = spectograms.reshape(spectograms.shape[0], 100, 30, 2)
print(spectograms.shape)

(15375, 100, 30, 2)
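
One point is worth keeping in mind here: `reshape` changes the shape but does not move any axis, so the values keep their order in memory. If the goal is a true channels-last layout, `np.transpose` is the call that moves the axis of size 2 to the end. The toy array below is a small stand-in with the same axis order as the spectrograms and only illustrates the difference.

```python
import numpy as np

# Toy array with the same axis order as the spectrograms: (examples, channels, rows, columns)
toy = np.arange(1 * 2 * 3 * 4).reshape(1, 2, 3, 4)

reshaped = toy.reshape(1, 3, 4, 2)             # same memory order, new shape
transposed = np.transpose(toy, (0, 2, 3, 1))   # channel axis moved to the end

# Both results have the same shape...
print(reshaped.shape, transposed.shape)

# ...but only the transposed array keeps each channel together on the last axis
print(np.array_equal(transposed[..., 0], toy[:, 0]))     # True
print(np.array_equal(reshaped[..., 0], toy[:, 0]))       # False

# Reshaping back to the original shape undoes a reshape exactly
print(np.array_equal(reshaped.reshape(toy.shape), toy))  # True
```

The last line is why a reshape followed by a reshape back to the original shape is harmless. For the sequences, the analogous channels-last call would be `np.transpose(sequences, (0, 2, 1))`.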


Let's save our processed data

In [0]:
## SAVE SPECTOGRAM PROCESSED DATA
np.save('spectograms.npy', spectograms)
np.save("spectograms_labels.npy",spectograms_labels)

In [0]:
## SAVE SEQUENCES PROCESSED DATA
np.save('sequences.npy', sequences)
np.save("sequences_labels.npy",sequences_labels)

## Load data

Now you can avoid repeating this data processing every time. You just need to load the processed data.

In [0]:
import numpy as np
import pickle

In [10]:
spectograms = np.load("spectograms.npy", allow_pickle = True)
spectograms_labels = np.load("spectograms_labels.npy", allow_pickle = True)
spectograms.shape,spectograms_labels.shape

((15375, 100, 30, 2), (15375,))

In [11]:
sequences = np.load("sequences.npy", allow_pickle = True)
sequences_labels = np.load("sequences_labels.npy", allow_pickle = True)
sequences.shape,sequences_labels.shape

((15375, 3000, 2), (15375,))

## CNN 1D

The first model that we are going to try is a CNN 1D on the sequences data, where each example has a shape of (3000, 2).

In [12]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

import keras
from keras.utils import to_categorical
from keras.models import Sequential,Input,Model
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Conv1D, MaxPooling1D
# Both layers are also exposed directly by keras.layers
from keras.layers import BatchNormalization, LeakyReLU

Using TensorFlow backend.


In [0]:
sequences_labels = to_categorical(sequences_labels)
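
For reference, the one-hot encoding that `to_categorical` produces can be written in a few lines of NumPy. This is a sketch of the idea, not the Keras implementation, and `one_hot` is a hypothetical helper name.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer labels of shape (n,) into a float array of shape (n, num_classes)."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector of class i
    return np.eye(num_classes, dtype=np.float32)[labels]

encoded = one_hot([1, 5, 0], num_classes=6)
print(encoded)

# argmax along the class axis recovers the integer labels
print(encoded.argmax(axis=1))   # [1 5 0]
```

The `argmax` trick is useful later, whenever integer classes are needed again from the one-hot labels or from the softmax output of a model.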

In [0]:
X_train, X_test, y_train, y_test = train_test_split(sequences, sequences_labels, test_size=0.2)

In [15]:
X_train.shape

(12300, 3000, 2)
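
The split above is random and the same held-out part is later used as validation data during training. A possible refinement, which is not what this notebook does, is to fix the random seed, stratify by class, and keep a separate validation set. The sketch below uses toy arrays as stand-ins for the real data. The value `random_state=42` is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for the sequences and their integer labels (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 30, 2))
y = np.repeat(np.arange(6), 100)

# First hold out 20% as a test set, then 20% of the rest as a validation set.
# `stratify` keeps the class proportions the same in every part.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.2, stratify=y_rest, random_state=42)

print(X_train.shape, X_val.shape, X_test.shape)
```

With the one-hot labels of this notebook, the integer classes for `stratify` can be obtained with `sequences_labels.argmax(axis=1)`.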

Let's set up the batch size, the number of epochs and the number of classes. As you can see, here we use 1 epoch, but this is just for the sake of the example. Because we will use early stopping, I suggest setting it to 100 epochs.

In [0]:
batch_size = 64
epochs = 1  # for the example only; set to 100 for a real run
num_classes = 6
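
Early stopping is what makes a large number of epochs safe: training ends once the validation loss stops improving. The function below is a simplified sketch of the patience rule only. `stopping_epoch` is a hypothetical name, and the real Keras callback has further options (for example `min_delta` and `restore_best_weights`) that are not modelled here.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: reset the counter
            best = loss
            wait = 0
        else:
            # No improvement: stop after `patience` such epochs in a row
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses: the best value is reached at epoch 3
losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71]
print(stopping_epoch(losses, patience=3))   # 6
```

With a patience of 3 and the best loss at epoch 3, training stops at epoch 6, so the last 3 epochs did not improve the model.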

In [0]:
fashion_model = Sequential()

fashion_model.add(Conv1D(64, kernel_size=3,activation='relu',padding='same',input_shape=(3000,2)))
fashion_model.add(BatchNormalization())
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(MaxPooling1D(pool_size=2,padding='same'))

fashion_model.add(Dropout(0.2))
fashion_model.add(Conv1D(128, kernel_size=3, activation='relu',padding='same'))
fashion_model.add(BatchNormalization())
fashion_model.add(LeakyReLU(alpha=0.1))

fashion_model.add(Flatten())
fashion_model.add(Dense(64, activation='relu'))
fashion_model.add(LeakyReLU(alpha=0.1))  # no effect here: the ReLU output is already non-negative
fashion_model.add(Dropout(0.2))
fashion_model.add(Dense(num_classes, activation='softmax'))
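
To get a feeling for the size of this model, it helps to follow the length of the time axis through the layers. The arithmetic below is a sketch based on the layer settings above (`padding='same'`, one pooling layer with `pool_size=2`). `pooled_length` is a hypothetical helper.

```python
import math

def pooled_length(length, pool_size=2):
    """Output length of MaxPooling1D with padding='same' and stride equal to pool_size."""
    return math.ceil(length / pool_size)

time_steps = 3000
after_conv1 = time_steps                 # padding='same' keeps the length
after_pool = pooled_length(after_conv1)  # 1500
after_conv2 = after_pool                 # padding='same' again

flatten_size = after_conv2 * 128         # 128 filters in the second Conv1D
dense_params = flatten_size * 64 + 64    # weights + biases of Dense(64)

print(flatten_size)   # 192000
print(dense_params)   # 12288064
```

Almost all the parameters of the network sit in the first Dense layer. `fashion_model.summary()` prints the per-layer shapes and parameter counts, which can be compared with these numbers.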

In [0]:
fashion_model.compile(loss=keras.losses.categorical_crossentropy, 
                      optimizer=keras.optimizers.Adam(),
                      metrics=['accuracy'])

In [23]:
# Import the callback from the same package as the model, not from tensorflow.keras
from keras.callbacks import EarlyStopping

# Stop when val_loss has not improved for 10 consecutive epochs
es = EarlyStopping(monitor="val_loss", patience=10, mode="min", verbose=1)
train = fashion_model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1,
                          callbacks=[es], validation_data=(X_test, y_test))

Train on 12300 samples, validate on 3075 samples
Epoch 1/1


When I ran this model with 100 epochs, I was able to reach around 60% validation accuracy.

## CNN 1D + LSTM

Because LSTMs are supposed to work very well with sequential data, here we will try to combine one with a CNN.

In [0]:
import keras
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout
from keras.layers import Conv1D, MaxPooling1D, Activation, Flatten

model = Sequential()

# add 1-layer cnn
model.add(Conv1D(10, kernel_size=3, padding='same', input_shape=(3000,2)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(MaxPooling1D(padding='same'))

# add 1-layer lstm
model.add(LSTM(10, return_sequences=True, stateful=False))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(6, activation='softmax'))

model.compile(optimizer=keras.optimizers.Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
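
This model is much smaller than the previous one. As a sketch, the parameter counts can be worked out by hand with the standard formulas for these layer types. The three helper functions are hypothetical names, and the numbers assume the layer settings above, including the default `pool_size=2` of `MaxPooling1D`.

```python
def conv1d_params(kernel_size, in_channels, filters):
    # one kernel per filter, plus one bias per filter
    return kernel_size * in_channels * filters + filters

def lstm_params(input_dim, units):
    # four gates, each with input weights, recurrent weights and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(inputs, outputs):
    return inputs * outputs + outputs

time_steps_after_pool = 3000 // 2          # MaxPooling1D halves the 3000 time steps
flatten_size = time_steps_after_pool * 10  # the LSTM returns 10 values per time step

counts = {
    "conv1d": conv1d_params(kernel_size=3, in_channels=2, filters=10),
    "lstm": lstm_params(input_dim=10, units=10),
    "dense": dense_params(flatten_size, 6),
}
print(counts)
print(sum(counts.values()))   # 90916
```

The totals can be compared with the output of `model.summary()`.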

In [25]:
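# Import the callback from the same package as the model, not from tensorflow.keras
from keras.callbacks import EarlyStopping

# Stop when val_loss has not improved for 5 consecutive epochs
es = EarlyStopping(monitor="val_loss", patience=5, mode="min", verbose=1)
train = model.fit(X_train, y_train, batch_size=100, epochs=1, verbose=1,
                  callbacks=[es], validation_data=(X_test, y_test))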

Train on 12300 samples, validate on 3075 samples
Epoch 1/1


Also in this case, when I ran it with 100 epochs, I reached around 50% validation accuracy. It is definitely worse than the previous CNN 1D. Unfortunately, although there are guidelines about which model to choose and how to tune it, there is no clear answer, so the only solution is to keep trying different algorithms.

## 2D CNN

This model turned out to be the best among all the algorithms I tried with these data. The model you will see here is the result of several days of tuning: changing the kernel size, changing the number of units, adding or removing layers.

In [26]:
# back to the original shape
spectograms = spectograms.reshape(spectograms.shape[0], 2, 100, 30)
print(spectograms.shape)

(15375, 2, 100, 30)


In [0]:
from sklearn.model_selection import train_test_split

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D, LeakyReLU
from keras.utils import to_categorical

In [28]:
labels = to_categorical(spectograms_labels)
labels

array([[0., 1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       ...,
       [0., 0., 0., 0., 0., 1.],
       [0., 1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.]], dtype=float32)

In [0]:
X_train, X_test, y_train, y_test = train_test_split(spectograms, labels, test_size=0.2)

In [30]:
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(12300, 2, 100, 30)
(3075, 2, 100, 30)
(12300, 6)
(3075, 6)


In [0]:
batch_size = 64
epochs = 100
num_classes = 6

In [0]:
def build_model():
  # Ten blocks of Conv2D -> LeakyReLU -> MaxPooling2D -> Dropout.
  # The blocks are identical except for the number of filters.
  filters_per_block = [64, 64, 64, 64, 128, 128, 128, 128, 128, 256]

  fashion_model = Sequential()
  for i, filters in enumerate(filters_per_block):
    # Only the first layer needs the input shape
    extra = {'input_shape': (2, 100, 30)} if i == 0 else {}
    fashion_model.add(Conv2D(filters, kernel_size=(4,4), activation='linear', padding='same', **extra))
    fashion_model.add(LeakyReLU(alpha=0.1))
    fashion_model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))
    fashion_model.add(Dropout(0.1))

  fashion_model.add(Flatten())
  fashion_model.add(Dense(128, activation='linear'))
  fashion_model.add(LeakyReLU(alpha=0.1))

  fashion_model.add(Dense(num_classes, activation='softmax'))
  fashion_model.compile(optimizer=keras.optimizers.Adam(decay = 1e-15), loss='categorical_crossentropy', metrics=['accuracy'])

  return fashion_model

In [0]:
fashion_model = build_model()
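
Ten pooling layers are a lot for an input of this size, so it is worth checking what is left at the end. The sketch below assumes the Keras default `channels_last` data format, which this notebook does not change. Under that assumption the input shape (2, 100, 30) is read as height 2, width 100 and 30 channels, and each `MaxPooling2D` with `padding='same'` halves height and width, rounding up.

```python
import math

# With channels_last, the 30 in (2, 100, 30) is treated as the channel axis
height, width = 2, 100
sizes = [(height, width)]

for _ in range(10):   # ten MaxPooling2D((2, 2), padding='same') layers
    height, width = math.ceil(height / 2), math.ceil(width / 2)
    sizes.append((height, width))

print(sizes)

# The last Conv2D has 256 filters, so Flatten sees height * width * 256 values
flatten_size = height * width * 256
print(flatten_size)   # 256
```

The spatial size is already 1 x 1 after the seventh pooling layer, so the last blocks operate on a single position. `fashion_model.summary()` shows the actual shapes for comparison.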

In [35]:
# Import the callback from the same package as the model, not from tensorflow.keras
from keras.callbacks import EarlyStopping

# Stop when val_loss has not improved for 10 consecutive epochs
es = EarlyStopping(monitor="val_loss", patience=10, mode="min", verbose=1)
fashion_train = fashion_model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1,
                                  callbacks=[es], validation_data=(X_test, y_test))

Train on 12300 samples, validate on 3075 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 00025: early stopping


This time I ran it until early stopping halted the training. Clearly, here we reached a much higher validation accuracy than with the other models. I couldn't find anything better than this; however, I know that with these data a further 5% of validation accuracy is possible. I will leave this task to you.
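
When looking for those extra points, a single accuracy number hides which sleep stages are being confused. A possible starting point, sketched here on toy values and not on real results, is a confusion matrix built from the one-hot targets and the predicted probabilities. `confusion_matrix` below is a small hypothetical NumPy helper.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # Add 1 to the cell (true, predicted) for every example
    np.add.at(matrix, (true_labels, predicted_labels), 1)
    return matrix

# Toy one-hot targets and predicted probabilities (illustration only)
y_true = np.eye(3)[[0, 1, 2, 2, 1]]
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.6, 0.3],
                   [0.1, 0.2, 0.7],
                   [0.3, 0.4, 0.3]])

cm = confusion_matrix(y_true.argmax(axis=1), y_prob.argmax(axis=1), num_classes=3)
print(cm)

# Fraction of each true class that was predicted correctly
per_class_recall = cm.diagonal() / cm.sum(axis=1)
print(per_class_recall)
```

In the notebook, `y_prob` would come from `fashion_model.predict(X_test)` and `y_true` would be `y_test`, with `num_classes=6`.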