# Seizure Dataset with Deep Learning

### Imports
Imports to prepare for creating the Keras model.

In [1]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
from keras.layers import Dense, Dropout
from keras.models import Sequential
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from keras.optimizers import Adagrad
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold

Using TensorFlow backend.


### Data Loading and Cleaning
Loading in the data. Since it's in CSV form, we read it into a DataFrame using Pandas.

In [2]:
dataset = pd.read_csv('seizure_data.csv')

Before we start performing any of the analysis on the dataset, we have to clean it up and standardize it. First, let's look at the basics of the dataset itself:

In [3]:
dataset.shape
dataset.head()

Unnamed: 0.1,Unnamed: 0,X1,X2,X3,X4,X5,X6,X7,X8,X9,...,X170,X171,X172,X173,X174,X175,X176,X177,X178,y
0,X21.V1.791,135,190,229,223,192,125,55,-9,-33,...,-17,-15,-31,-77,-103,-127,-116,-83,-51,4
1,X15.V1.924,386,382,356,331,320,315,307,272,244,...,164,150,146,152,157,156,154,143,129,1
2,X8.V1.1,-32,-39,-47,-37,-32,-36,-57,-73,-85,...,57,64,48,19,-12,-30,-35,-35,-36,5
3,X16.V1.60,-105,-101,-96,-92,-89,-95,-102,-100,-87,...,-82,-81,-80,-77,-85,-77,-72,-69,-65,5
4,X20.V1.54,-9,-65,-98,-102,-78,-48,-16,0,-21,...,4,2,-12,-32,-41,-65,-83,-89,-73,5


From this first look at the DataFrame, we can see that some columns should not be used as features. 'Unnamed: 0' holds non-numerical segment identifiers (e.g. X21.V1.791), and the y column holds the labels, which we keep separate from our X features.

In [4]:
X = dataset.drop(columns=['Unnamed: 0', 'y'])
X.head()

Unnamed: 0,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,...,X169,X170,X171,X172,X173,X174,X175,X176,X177,X178
0,135,190,229,223,192,125,55,-9,-33,-38,...,8,-17,-15,-31,-77,-103,-127,-116,-83,-51
1,386,382,356,331,320,315,307,272,244,232,...,168,164,150,146,152,157,156,154,143,129
2,-32,-39,-47,-37,-32,-36,-57,-73,-85,-94,...,29,57,64,48,19,-12,-30,-35,-35,-36
3,-105,-101,-96,-92,-89,-95,-102,-100,-87,-79,...,-80,-82,-81,-80,-77,-85,-77,-72,-69,-65
4,-9,-65,-98,-102,-78,-48,-16,0,-21,-59,...,10,4,2,-12,-32,-41,-65,-83,-89,-73


We are left with 178 features over about 11,500 samples. However, we must still standardize the EEG values before we can use them in our neural network. We can do this with sklearn's StandardScaler, which rescales each feature to zero mean and unit variance.

In [5]:
scaler = StandardScaler()
X = scaler.fit_transform(X)
print(X)

[[ 0.88505134  1.20992878  1.46276429 ... -0.63414367 -0.43329036
  -0.23539922]
 [ 2.40057718  2.36619038  2.23944096 ...  1.02342937  0.95424076
   0.85653664]
 [-0.12328657 -0.16915405 -0.22513147 ... -0.13687176 -0.13859348
  -0.14440456]
 ...
 [ 0.1544592   0.10184476 -0.01720228 ...  0.0657205   0.07015014
   0.02545213]
 [-0.17159018 -0.08484331  0.00725997 ...  0.49546166  0.43852123
   0.40762968]
 [ 0.24502848  0.31262161  0.41088722 ...  0.0657205   0.08856869
   0.19530882]]
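For reference, StandardScaler's per-column computation can be sketched in a few lines of NumPy. The helper name `standardize` is ours, not part of scikit-learn; like StandardScaler, it uses the population standard deviation (ddof=0) and leaves constant columns unscaled.

```python
import numpy as np

def standardize(X):
    """Column-wise z-score: (x - mean) / std, using the population std (ddof=0)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)        # ddof=0, matching StandardScaler
    std[std == 0] = 1.0        # avoid dividing by zero for constant columns
    return (X - mean) / std

# Toy stand-in: the first three EEG columns of the first three rows shown above
toy = np.array([[135, 190, 229],
                [386, 382, 356],
                [-32, -39, -47]])
Z = standardize(toy)
print(Z.mean(axis=0))  # approximately 0 in every column
print(Z.std(axis=0))   # approximately 1 in every column
```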


X is now scaled, and has also been converted to a NumPy array in the process. We can now turn our attention to Y, which we must one-hot encode for our multi-class problem.

In [6]:
Y = dataset['y']
Y.head()

0    4
1    1
2    5
3    5
4    5
Name: y, dtype: int64

Since Y constitutes the labels that are being used for each of the inputs, we simply take the last column from the original dataset we read in. This leaves us with a vector of labels to be used alongside our X.

Now we can convert the labels for our multi-class problem. The first step is to map them to consecutive integer class indices with sklearn's LabelEncoder:

In [7]:
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
print(encoded_Y)

[3 0 4 ... 4 2 3]


Now we have a single vector of values corresponding to the classes we read in from our dataset. In our case, since everything was already represented as values from 1-5, the LabelEncoder shifted the values down to be from 0-4.
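For numeric labels, the same mapping can be reproduced with NumPy alone: `np.unique` returns the sorted distinct classes (what LabelEncoder keeps in `classes_`), and `return_inverse=True` gives each label's position in that sorted array. The toy labels below are the first five y values shown earlier plus two extra values so that all five classes appear.

```python
import numpy as np

labels = np.array([4, 1, 5, 5, 5, 2, 3])  # toy labels covering classes 1-5

# classes: sorted distinct labels; encoded: index of each label within classes
classes, encoded = np.unique(labels, return_inverse=True)
print(classes)  # [1 2 3 4 5]
print(encoded)  # [3 0 4 4 4 1 2]
```

Since the classes are already the consecutive integers 1 to 5, the encoding amounts to subtracting 1, which matches the [3 0 4 ...] output above.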

Next, we want to turn each of the individual class values, 0-4, into a one-hot vector: an array with a single 1 at the index of the class and 0 everywhere else. For example, class 3 is encoded as [0, 0, 0, 1, 0]. Keras has a built-in utility for doing this:

In [8]:
categorical_Y = np_utils.to_categorical(encoded_Y)
print(categorical_Y)

[[0. 0. 0. 1. 0.]
 [1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1.]
 ...
 [0. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]]
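The one-hot encoding itself is just a row lookup into an identity matrix. A minimal NumPy sketch, assuming five classes (`to_categorical` infers the count as the largest class id plus one when `num_classes` is not given):

```python
import numpy as np

encoded = np.array([3, 0, 4, 4, 2, 3])  # 0-based class ids, as produced by LabelEncoder
num_classes = 5

# Row i of the identity matrix has a single 1 at column i, so indexing
# np.eye with the class ids selects one one-hot row per sample.
one_hot = np.eye(num_classes)[encoded]
print(one_hot[0])  # [0. 0. 0. 1. 0.]
```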


### Model Definition
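Now that both our X and Y variables are ready to be used in a deep learning model, we can define our baseline model. Several hidden-layer widths were tried (40, 1024 and 1400 units); since each cell redefines baseline_model, the last definition, with 1400 units per hidden layer, is the one evaluated below: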

In [9]:
def baseline_model():
    model = Sequential()
    model.add(Dense(40, input_dim=178, activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(40, activation='sigmoid'))
    model.add(Dropout(0.4))
    model.add(Dense(5, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=Adagrad(), metrics=['accuracy'])
    return model

In [9]:
def baseline_model():
    model = Sequential()
    model.add(Dense(1024, input_dim=178, activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(1024, activation='sigmoid'))
    model.add(Dropout(0.4))
    model.add(Dense(5, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=Adagrad(), metrics=['accuracy'])
    return model

In [12]:
def baseline_model():
    model = Sequential()
    model.add(Dense(1400, input_dim=178, activation='relu'))
    model.add(Dropout(0.25))
    model.add(Dense(1400, activation='sigmoid'))
    model.add(Dropout(0.4))
    model.add(Dense(5, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=Adagrad(), metrics=['accuracy'])
    return model
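The three definitions above differ only in hidden-layer width, and counting trainable parameters shows how much that changes model capacity. Each Dense layer holds an input-by-output weight matrix plus one bias per unit; Dropout layers add no parameters. This plain-Python sketch uses helper names of our own; its totals should agree with what `model.summary()` reports for each variant.

```python
def dense_params(n_in, n_out):
    # weight matrix (n_in x n_out) plus one bias per output unit
    return n_in * n_out + n_out

def mlp_params(n_inputs, hidden, n_outputs):
    # chain the layer sizes and sum the parameters of each Dense layer
    sizes = [n_inputs] + list(hidden) + [n_outputs]
    return sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))

for width in (40, 1024, 1400):
    print(width, mlp_params(178, [width, width], 5))
# 40 9005
# 1024 1238021
# 1400 2219005
```

Going from 40 to 1400 units per hidden layer multiplies the parameter count by roughly 250.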

Next, we wrap the model in KerasClassifier, which makes it usable with scikit-learn utilities such as cross_val_score:

In [13]:
estimator = KerasClassifier(build_fn=baseline_model, epochs=50, batch_size=200, verbose=0)

After some testing, around 50 epochs with a batch size of 64 gave the best accuracy.

(For this notebook run, a larger batch size of 200 was used to speed up training.)
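Batch size mainly trades speed against the number of weight updates, because each epoch performs one update per batch. A back-of-the-envelope sketch, assuming 11,500 rows and the 5-fold split used below (so 9,200 training rows per fold):

```python
import math

n_samples = 11500                                 # assumed number of rows
n_splits = 5                                      # folds used for cross-validation
train_size = n_samples - n_samples // n_splits    # 9200 training rows per fold
epochs = 50

for batch_size in (64, 200):
    steps = math.ceil(train_size / batch_size)    # the last batch may be partial
    print(batch_size, steps, steps * epochs)      # updates per epoch, per 50 epochs
# 64 144 7200
# 200 46 2300
```

With a batch size of 200, each epoch runs about a third as many updates, which is why it is faster but may need more epochs to reach the same accuracy.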

## Testing and Accuracy
Now, to actually test our model, we use k-fold cross-validation to estimate its accuracy. KFold splits the data (not the model) into k subsets, or folds, of nearly equal size. The model is trained on every fold except one and tested on the held-out fold; this is repeated k times so that each fold serves as the test set exactly once. The reported accuracy is the average of the k per-fold accuracies. Here k = 5, and the data is shuffled before splitting.

In [14]:
kfold = KFold(n_splits=5, shuffle=True, random_state=3)
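The splitting itself can be sketched with NumPy: shuffle the row indices once, cut them into k near-equal folds, and let each fold take a turn as the test set. The helper `kfold_indices` is hypothetical, and its shuffle does not reproduce the exact order that `KFold(random_state=3)` produces; it only illustrates the mechanism.

```python
import numpy as np

def kfold_indices(n_samples, n_splits, seed=None):
    """Yield (train_idx, test_idx) pairs; each sample lands in exactly one test fold."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)          # shuffle once, like shuffle=True
    folds = np.array_split(indices, n_splits)     # near-equal folds
    for i, test_idx in enumerate(folds):
        # the training set is every fold except the held-out one
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx

splits = list(kfold_indices(11500, 5, seed=3))
print([len(test) for _, test in splits])  # [2300, 2300, 2300, 2300, 2300]
```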

Now to the accuracy. We obtain our results using scikit-learn's cross_val_score, which trains and scores a fresh copy of the estimator on each split:

In [None]:
results = cross_val_score(estimator, X, categorical_Y, cv=kfold)
print('Accuracy: {}'.format(results.mean()))
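cross_val_score fits the estimator on each training split, scores it on the held-out split and returns the per-fold scores, which we then average. One detail worth noting: in this notebook StandardScaler was fit on the full X before splitting, so each test fold's statistics slightly influence the preprocessing. The sketch below fits the scaling on the training fold only. To keep it self-contained and fast, a nearest-centroid classifier and synthetic data stand in for the Keras model and the EEG data; these are assumptions, not part of the original pipeline. In scikit-learn itself, wrapping the scaler and estimator in a `Pipeline` achieves the same per-fold fitting.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    # stand-in classifier: one mean vector per class
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def nearest_centroid_predict(model, X):
    classes, centroids = model
    # squared distance from every sample to every centroid
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

def cross_val_accuracy(X, y, n_splits=5, seed=3):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_splits)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # scaling statistics come from the training fold only
        mean = X[train_idx].mean(axis=0)
        std = X[train_idx].std(axis=0)
        std[std == 0] = 1.0
        X_train = (X[train_idx] - mean) / std
        X_test = (X[test_idx] - mean) / std
        model = nearest_centroid_fit(X_train, y[train_idx])
        scores.append(np.mean(nearest_centroid_predict(model, X_test) == y[test_idx]))
    return np.array(scores)

# Synthetic stand-in data: 3 well-separated classes, 4 features
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 50)
X = rng.normal(size=(150, 4)) + y[:, None] * 5.0
scores = cross_val_accuracy(X, y)
print('Accuracy: {}'.format(scores.mean()))
```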

Running the cross-validation above, our model reaches around 89% accuracy, which isn't bad for determining a patient's epileptic state from a sequence of EEG values.
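That figure comes from Keras's `accuracy` metric, and the model is compiled with `binary_crossentropy`. With that loss, Keras resolves `accuracy` to binary accuracy: each of the five outputs is thresholded at 0.5 and compared independently, and the matches are averaged. On one-hot targets with five classes, even a model that always outputs zeros scores 80%, so the reported number likely overstates how often the predicted class is correct. The toy labels below are made up to illustrate the gap. For single-label multi-class data, a softmax output layer with `categorical_crossentropy` is the more conventional setup, and under it `accuracy` means categorical accuracy (arg-max matches).

```python
import numpy as np

num_classes = 5
y_true = np.eye(num_classes)[[3, 0, 4, 4, 2]]   # toy one-hot targets
y_pred = np.zeros_like(y_true)                  # a useless model: all outputs 0

# Binary accuracy: threshold each output at 0.5, then average the
# element-wise matches over all samples and all five outputs.
binary_acc = np.mean((y_pred > 0.5) == (y_true > 0.5))

# Categorical accuracy: does the arg-max output match the true class?
categorical_acc = np.mean(y_pred.argmax(axis=1) == y_true.argmax(axis=1))

print(binary_acc)       # 0.8
print(categorical_acc)  # 0.2
```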