<h1 align="center"> Epileptic Seizure Detection </h1>
    
### Helpful Links
- [Main Dataset](https://repositori.upf.edu/handle/10230/42894?fbclid=IwAR2YStcqFsgFCvX1vLPOILH9kAq9F6ZFj0OogHwbuDaCSHnd_LVjgQpHzFA)
- [Processed Dataset](https://www.kaggle.com/datasets/harunshimanto/epileptic-seizure-recognition?select=Epileptic+Seizure+Recognition.csv)
- [Notebook using NN](https://www.kaggle.com/code/ashishshaji/lstm-for-epileptic-seizures-prediction)
- [Notebook using SVM](https://www.kaggle.com/code/yatindeshpande/seizure-prediction-using-svm)
- [Signal Processing info](https://ieeexplore.ieee.org/document/8412847?fbclid=IwAR0UVzEA7zDQWmOu3V9Mnf6iojQKpHHc6Uxdyc7HtrKTwJK0DE_K5u3YK3s)


### Dataset Description
The original dataset from the reference consists of 5 folders, each containing 100 files, with each file representing a single subject. Each file is a 23.6-second recording of brain activity, sampled into 4097 data points, where each data point is the value of the EEG recording at a different point in time. In total there are 500 individuals, each with 4097 data points spanning 23.6 seconds.

We use the publicly accessible EEG data from Bonn University, which comprises five sets (A, B, C, D, and E). Each set consists of 100 single EEG segments sampled at 173.6 Hz. The EEG signals were filtered using a band-pass filter and a smoothing method. The first two sets (A, B) were recorded from healthy people with eyes open and eyes closed, respectively. The other three sets were recorded from epileptic patients. Sets C and D are treated as non-seizure because their signals were captured during seizure-free intervals. For seizure detection, only set E is treated as epileptic seizure activity.
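As a quick consistency check on the figures quoted above, the recording length follows from the number of samples divided by the sampling rate. This is only a sketch of that arithmetic; the variable names are illustrative.

```python
# Values quoted in the dataset description above.
n_samples = 4097          # data points per recording
sampling_rate_hz = 173.6  # sampling rate in Hz

# Duration in seconds = number of samples / samples per second.
duration_s = n_samples / sampling_rate_hz
print(f"Recording length: {duration_s:.1f} s")
```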



### Folder Description:
The main dataset consists of plain-text (.txt) files organized into five folders:
- For each set (A-E) there is a ZIP-file containing 100 TXT-files. 
- Each TXT-file consists of 4096 samples of one EEG time series in ASCII code. 
- SET A in file Z.zip containing Z000.txt - Z100.txt
- SET B in file O.zip containing O000.txt - O100.txt
- SET C in file N.zip containing N000.txt - N100.txt 
- SET D in file F.zip containing F000.txt - F100.txt 
- SET E in file S.zip containing S000.txt - S100.txt
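The layout above could be read into memory roughly as follows. This is a minimal sketch, not the loader used for this project: it assumes the ZIP files have already been extracted, that each TXT file holds one numeric value per line, and that all files in a folder have the same length. The helper name `load_set` and any folder path passed to it are hypothetical.

```python
from pathlib import Path

import numpy as np


def load_set(folder):
    """Read every .txt file in `folder` into one 2-D array (one row per file)."""
    # Match the extension case-insensitively in case some files use ".TXT".
    files = sorted(p for p in Path(folder).iterdir() if p.suffix.lower() == ".txt")
    # np.loadtxt parses one ASCII value per line into a 1-D float array;
    # np.stack then requires all files to contain the same number of samples.
    return np.stack([np.loadtxt(f) for f in files])
```

Calling `load_set` on an extracted set folder would return an array with one row per recording, under the assumptions stated above.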



### Preprocessing
We divided every 4097-point recording into 23 chunks and shuffled them. Each chunk contains 178 data points covering 1 second, and each data point is the value of the EEG recording at a different point in time. This gives 23 x 500 = 11500 rows, each holding 178 data points (columns) for 1 second; the last column holds the label y in {1, 2, 3, 4, 5}.

The response variable is y in column 179; the explanatory variables are X1, X2, …, X178.
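The chunking step can be sketched with NumPy as below. Since 23 x 178 = 4094, three of the 4097 samples do not fit; the text does not say how they were handled, so dropping the trailing samples here is an assumption. The function name `split_recording` is hypothetical, and shuffling is left out of the sketch.

```python
import numpy as np

CHUNKS_PER_RECORDING = 23
POINTS_PER_CHUNK = 178


def split_recording(signal, label):
    """Turn one recording into 23 rows of 178 values plus a label column."""
    signal = np.asarray(signal, dtype=float)
    # Assumption: keep the first 23 * 178 = 4094 samples, drop the remainder.
    usable = CHUNKS_PER_RECORDING * POINTS_PER_CHUNK
    chunks = signal[:usable].reshape(CHUNKS_PER_RECORDING, POINTS_PER_CHUNK)
    # Append the class label as the last (179th) column of every row.
    labels = np.full((CHUNKS_PER_RECORDING, 1), label, dtype=float)
    return np.hstack([chunks, labels])


rows = split_recording(np.arange(4097), label=1)
print(rows.shape)  # (23, 179)
```

Stacking the output for all 500 recordings would yield the 11500 x 179 layout described above.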

y contains the category of the 178-dimensional input vector. Specifically y in {1, 2, 3, 4, 5}:

- 5: eyes open; the patient had their eyes open while the EEG signal was recorded
- 4: eyes closed; the patient had their eyes closed while the EEG signal was recorded
- 3: the tumor region in the brain was identified, and the EEG was recorded from the healthy brain area
- 2: the EEG was recorded from the area where the tumor was located
- 1: recording of seizure activity

All subjects in classes 2, 3, 4, and 5 did not have an epileptic seizure. Only subjects in class 1 had an epileptic seizure. Our motivation for creating this version of the data was to simplify access to it by providing a .csv version. Although there are 5 classes, most authors have done binary classification, namely class 1 (epileptic seizure) against the rest.
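The binary setup mentioned above (class 1 against the rest) can be sketched as a simple relabeling. The helper name `to_binary_labels` and the example values are illustrative only.

```python
import pandas as pd


def to_binary_labels(y):
    """Map class 1 (seizure) to 1 and classes 2-5 (non-seizure) to 0."""
    return (y == 1).astype(int)


example = pd.Series([1, 2, 3, 4, 5, 1], name="y")
binary = to_binary_labels(example)
print(binary.tolist())  # [1, 0, 0, 0, 0, 1]
```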


## Importing Libraries

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

## Loading and Checking the Dataset

```python
df = pd.read_csv('mergedFiles/allFilesMerged.csv')
df.shape
```

```
(11500, 179)
```
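With 179 columns (178 features plus `y`), a natural next step is to separate the features from the label. The sketch below uses a tiny synthetic frame as a stand-in for the real CSV; the helper name `split_features_labels` is hypothetical.

```python
import numpy as np
import pandas as pd


def split_features_labels(frame, label_col="y"):
    """Return (X, y): all feature columns, and the label column."""
    X = frame.drop(columns=[label_col])
    y = frame[label_col]
    return X, y


# Synthetic stand-in with the same column layout: X1..X178 plus y.
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.normal(size=(4, 178)),
    columns=[f"X{i}" for i in range(1, 179)],
)
demo["y"] = [1, 2, 3, 5]

X, y = split_features_labels(demo)
print(X.shape, y.shape)  # (4, 178) (4,)
```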

```python
df.sample(10)
```

|  | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | ... | X170 | X171 | X172 | X173 | X174 | X175 | X176 | X177 | X178 | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8512 | 15.389 | 8.501 | 1.4843 | -5.2451 | -11.295 | -16.325 | -20.062 | -22.326 | -23.032 | -22.198 | ... | 7.9756 | 13.304 | 17.744 | 21.233 | 23.757 | 25.337 | 26.035 | 25.937 | 25.158 | 4 |
| 886 | 14.492 | 39.348 | 64.574 | 88.882 | 111.04 | 129.92 | 144.56 | 154.19 | 158.27 | 156.53 | ... | 20.575 | 44.493 | 67.977 | 90.101 | 109.99 | 126.84 | 139.99 | 148.89 | 153.14 | 1 |
| 6761 | 26.485 | 25.476 | 25.231 | 25.898 | 27.598 | 30.421 | 34.416 | 39.593 | 45.909 | 53.274 | ... | -42.839 | -45.502 | -46.905 | -46.664 | -44.47 | -40.185 | -33.695 | -25.275 | -14.786 | 3 |
| 16 | 174.13 | 166.06 | 155.16 | 142.94 | 131.15 | 121.67 | 116.39 | 117.07 | 125.17 | 141.7 | ... | -332.59 | -326.72 | -318.08 | -303.88 | -281.67 | -249.65 | -206.77 | -152.92 | -88.944 | 1 |
| 10918 | -31.156 | -28.936 | -25.792 | -22.04 | -18.015 | -14.059 | -10.493 | -7.6055 | -5.627 | -4.7219 | ... | 14.145 | 13.334 | 12.766 | 12.463 | 12.439 | 12.698 | 13.233 | 14.028 | 15.062 | 5 |
| 7706 | -36.203 | -40.874 | -44.691 | -47.668 | -49.861 | -51.346 | -52.199 | -52.476 | -52.201 | -51.357 | ... | 36.484 | 34.886 | 32.477 | 29.516 | 26.273 | 23.001 | 19.921 | 17.201 | 14.946 | 4 |
| 9585 | 39.948 | 34.311 | 27.264 | 19.155 | 10.375 | 1.3299 | -7.5824 | -16.0 | -23.616 | -30.198 | ... | -9.2105 | -5.4128 | -1.2758 | 3.0948 | 7.5693 | 11.998 | 16.221 | 20.071 | 23.389 | 5 |
| 10448 | -9.4306 | -7.6591 | -5.6549 | -3.6368 | -1.8289 | -0.44121 | 0.35021 | 0.41819 | -0.30419 | -1.8181 | ... | 11.783 | 2.1149 | -7.1341 | -15.197 | -21.411 | -25.277 | -26.494 | -24.99 | -20.915 | 5 |
| 294 | -196.11 | -122.09 | -42.804 | 38.937 | 120.08 | 197.48 | 268.08 | 329.03 | 377.82 | 412.43 | ... | 412.55 | 474.46 | 519.98 | 546.66 | 552.8 | 537.55 | 500.96 | 444.01 | 368.58 | 1 |
| 3620 | -13.71 | -23.376 | -33.106 | -42.348 | -50.582 | -57.365 | -62.358 | -65.357 | -66.299 | -65.273 | ... | -33.094 | -23.676 | -13.978 | -4.4692 | 4.4062 | 12.254 | 18.75 | 23.661 | 26.854 | 2 |


```python
df.describe()
```

|  | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | ... | X170 | X171 | X172 | X173 | X174 | X175 | X176 | X177 | X178 | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 11500.0 | 11500.0 | 11500.0 | 11500.0 | 11500.0 | 11500.0 | 11500.0 | 11500.0 | 11500.0 | 11500.0 | ... | 11500.0 | 11500.0 | 11500.0 | 11500.0 | 11500.0 | 11500.0 | 11500.0 | 11500.0 | 11500.0 | 11500.0 |
| mean | -3.146926 | -2.49809 | -1.836539 | -1.190587 | -0.588348 | -0.051205 | 0.403718 | 0.766497 | 1.033416 | 1.207329 | ... | -4.101813 | -4.543863 | -4.858121 | -5.034152 | -5.068708 | -4.965043 | -4.734094 | -4.390273 | -3.957996 | 3.0 |
| std | 126.286788 | 126.343958 | 126.728157 | 127.268572 | 127.789612 | 128.147543 | 128.300556 | 128.273433 | 128.167036 | 128.103829 | ... | 125.790762 | 126.06886 | 126.078197 | 125.788959 | 125.267571 | 124.676354 | 124.238308 | 124.202558 | 124.787514 | 1.414275 |
| min | -1018.8 | -1140.4 | -1247.7 | -1336.7 | -1405.3 | -1452.1 | -1476.5 | -1478.5 | -1458.8 | -1418.7 | ... | -1145.7 | -1094.4 | -1010.1 | -935.09 | -863.44 | -800.71 | -867.57 | -910.89 | -963.19 | 1.0 |
| 25% | -31.4175 | -31.0955 | -31.20075 | -31.10025 | -30.89775 | -31.137 | -30.565 | -30.6265 | -30.483 | -30.46725 | ... | -32.53475 | -32.46825 | -32.78225 | -33.2895 | -32.5805 | -32.30375 | -31.696 | -32.00825 | -31.819 | 2.0 |
| 50% | -0.544975 | -0.547815 | -0.346575 | -0.215955 | -0.058026 | 0.080368 | 0.25345 | -0.015608 | 0.076126 | -0.24332 | ... | -1.7799 | -2.03785 | -2.03195 | -2.04485 | -1.65085 | -1.79195 | -1.5322 | -0.96027 | -0.88599 | 3.0 |
| 75% | 29.164 | 29.1835 | 29.32925 | 29.01225 | 28.96025 | 29.3255 | 29.78125 | 29.818 | 29.96375 | 29.94925 | ... | 27.47425 | 27.34675 | 27.426 | 27.2375 | 26.733 | 27.10525 | 27.365 | 27.924 | 28.427 | 4.0 |
| max | 1662.9 | 1598.9 | 1505.1 | 1425.2 | 1379.3 | 1551.9 | 1727.5 | 1875.4 | 1989.5 | 2065.4 | ... | 1613.3 | 1586.3 | 1525.8 | 1436.5 | 1433.8 | 1552.8 | 1637.2 | 1684.1 | 1692.3 | 5.0 |


### Checking Null Values

```python
df.isnull().sum()
```

```
X1      0
X2      0
X3      0
X4      0
X5      0
       ..
X175    0
X176    0
X177    0
X178    0
y       0
Length: 179, dtype: int64
```
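Because the per-column output is truncated, a single total can be easier to read, and the same pass can report how many rows each class has. The sketch below runs on a small made-up frame rather than the real data; `summarize_quality` is a hypothetical helper name.

```python
import pandas as pd


def summarize_quality(frame, label_col="y"):
    """Return the total count of missing cells and the rows per class."""
    # Sum the per-column null counts into one number for the whole frame.
    total_missing = int(frame.isnull().sum().sum())
    # Count rows per label value, ordered by label.
    class_counts = frame[label_col].value_counts().sort_index()
    return total_missing, class_counts


demo = pd.DataFrame({"X1": [0.5, 1.5, -2.0, 3.0], "y": [1, 1, 2, 5]})
total_missing, class_counts = summarize_quality(demo)
print(total_missing)           # 0
print(class_counts.to_dict())  # {1: 2, 2: 1, 5: 1}
```

Applied to `df`, a `total_missing` of 0 would agree with the per-column result shown above.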