## ML papers and their datasets

CHB-MIT (https://physionet.org/content/chbmit/1.0.0):
- Ben Messaoud & Chavéz, "Random Forest classifier for EEG-based seizure prediction" (arXiv 2021). Applies a random forest classifier; no code available. Feasibility: easy.
- Usman et al., "Epileptic seizure prediction using scalp EEG" (2021). No code available, but the libraries and implementation are described in the paper. Uses SVM and random forest on classically extracted features. Feasibility: easy to medium.
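Both CHB-MIT papers work on hand-crafted features rather than raw signals. A common feature of this kind is spectral band power per window. The sketch below computes it with a plain FFT periodogram. The band edges are conventional EEG bands and are an assumption; they are not necessarily the bands either paper uses.

```python
import numpy as np

# Conventional EEG bands in Hz (assumed; the papers may use other edges)
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 70)}

def band_powers(window, fs, bands=BANDS):
    """Mean periodogram power in each band for a 1-D signal window."""
    window = np.asarray(window, dtype=float)
    window = window - window.mean()                      # remove DC offset
    freqs = np.fft.rfftfreq(window.size, d=1.0 / fs)     # one-sided frequency axis
    psd = np.abs(np.fft.rfft(window)) ** 2 / window.size # unscaled periodogram
    feats = {}
    for name, (lo, hi) in bands.items():
        mask = (freqs >= lo) & (freqs < hi)
        feats[name] = float(psd[mask].mean()) if mask.any() else 0.0
    return feats

def feature_matrix(windows, fs):
    """Stack band powers of many windows into an (n_windows, n_bands) array."""
    return np.array([list(band_powers(w, fs).values()) for w in windows])
```

CHB-MIT recordings are sampled at 256 Hz, so `feature_matrix(windows, fs=256)` gives one row per window. Those rows could then be passed to a classifier such as scikit-learn's `RandomForestClassifier`.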

CRCNS (possibly https://crcns.org/data-sets/hc/hc-27):
- Agarwal et al., "Spatially Distributed Local Fields in the Hippocampus Encode Rat Position" (2014, Science). It is not entirely clear which CRCNS dataset the paper uses. The results are too theoretical to be assessed precisely, and neither code nor implementation details are available. Feasibility: very hard.


Kaggle dataset from the American Epilepsy Society Seizure Prediction Challenge (https://www.kaggle.com/competitions/seizure-prediction/data?select=Patient_2):
- Ben Messaoud & Chavéz, "Random Forest classifier for EEG-based seizure prediction" (arXiv 2021). Applies a random forest classifier; no code available. Feasibility: easy.
- Code written specifically for this challenge is also available on Kaggle.


## Dataset 1: Numa

In [1]:
import mat73

# Load LFP data from the Dancause Laboratory
data = mat73.loadmat(r'E:\Dancause_yellow_hd\export\lfp_data_20180612Y.mat')

streams = data["lfp_data2"]["streams"]
# Clock frequency (Hz) of the laboratory's recording system; change it if the
# reference rate should be the LFP sampling frequency instead
basefs = 4.8828125e3

# Ratio of clock frequency to each stream's sampling frequency (1 if they match)
for y in streams:
    fs = float(streams[y]['fs'])
    streams[y]['ratio'] = basefs / fs if fs != basefs else 1

meta = dict()
meta['ratio'] = streams['LFP1']['ratio']
meta['fs'] = streams['LFP1']['fs']

# Extract both LFP streams into separate arrays
lfp1_data = streams['LFP1']['data']
lfp2_data = streams['LFP2']['data']
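The cell extracts the two streams but does not write anything to disk. If each LFP stream should live in its own file, NumPy's `.npy` format is one simple option. This is a minimal sketch: the helper name, output directory, and file names are hypothetical.

```python
from pathlib import Path

import numpy as np

def save_streams(streams_by_name, out_dir, meta=None):
    """Write each stream to <out_dir>/<name>.npy and optional metadata to meta.npz."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, arr in streams_by_name.items():
        path = out / f"{name}.npy"
        np.save(str(path), np.asarray(arr))          # one file per stream
        paths[name] = path
    if meta is not None:
        # Scalars such as fs and ratio are stored as 0-d arrays
        np.savez(str(out / "meta.npz"), **{k: np.asarray(v) for k, v in meta.items()})
    return paths
```

In this notebook the call would be `save_streams({"lfp1": lfp1_data, "lfp2": lfp2_data}, "lfp_export", meta)`, and `np.load` reads each file back.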


In [2]:
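import neuroclean as nc

ncp = nc.NeuroClean(random_state=42)

# Clean each stream separately; the second argument is presumably the sampling
# rate. basefs assumes the LFP is sampled at the clock rate; if meta['ratio'] != 1,
# meta['fs'] may be the value to pass instead.
ncp.preprocess(lfp1_data, basefs)
lfp1_data = ncp.data
ncp.preprocess(lfp2_data, basefs)
lfp2_data = ncp.data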

Bandpassing: 128it [00:19, 6.61it/s]
NeuroClean processing:  25%|██▌       | 1/4 [00:19<00:58, 19.38s/it]
Power of components removed by DSS: 0.04
NeuroClean processing:  75%|███████▌  | 3/4 [1:42:17<34:05, 2045.93s/it]

TypeError: DBSCAN.fit() missing 1 required positional argument: 'X'
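The run stops at step 3 of 4 after roughly 1 h 42 min. A `TypeError` of this form usually has one of two causes. Either `fit` was called on the `DBSCAN` class itself, so Python binds the data to `self`, or it was called on an instance without any data. Where the call happens inside NeuroClean is not visible from this output. The stand-in class below is hypothetical (it is not scikit-learn's code) and only reproduces the mechanism.

```python
class Clusterer:
    """Hypothetical stand-in with an sklearn-like fit(self, X, y=None) signature."""
    def __init__(self, eps=0.5):
        self.eps = eps

    def fit(self, X, y=None):
        self.n_samples_ = len(X)
        return self

def error_message(call):
    """Run `call` and return the TypeError text, or None if it succeeds."""
    try:
        call()
    except TypeError as err:
        return str(err)
    return None

data = [[0.0], [0.1], [5.0]]
unbound = error_message(lambda: Clusterer.fit(data))   # data is bound to self, X missing
no_data = error_message(lambda: Clusterer().fit())     # instance, but nothing passed
working = Clusterer(eps=0.5).fit(data).n_samples_      # instantiate, then fit(X)
```

The working pattern is `DBSCAN(...).fit(X)`: instantiate the estimator first, then pass the data.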

## Dataset 2: Kaggle Competition

In [None]:
import os

# List every file in the local copy of the Kaggle dataset
for dirname, _, filenames in os.walk('D:\\important_documents\\KaggleSeizure'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
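The listing is also a quick way to check the file naming scheme. The sketch below groups the segment files by class. It assumes the `Patient_1_<kind>_segment_NNNN.mat` pattern used later in this notebook; treating the unlabeled files as a `test` kind is an assumption about the competition layout. The helper name is hypothetical.

```python
import re
from pathlib import Path

def segment_files(root, patient="Patient_1"):
    """Group <patient>_<kind>_segment_NNNN.mat files by kind, sorted by NNNN."""
    pattern = re.compile(rf"{re.escape(patient)}_(interictal|preictal|test)_segment_(\d+)\.mat")
    groups = {}
    for path in Path(root).rglob("*.mat"):
        m = pattern.fullmatch(path.name)
        if m:
            groups.setdefault(m.group(1), []).append((int(m.group(2)), path))
    # Sort numerically by segment number, then drop the numbers
    return {kind: [p for _, p in sorted(items)] for kind, items in groups.items()}
```

For example, `segment_files('D:/important_documents/KaggleSeizure')['preictal'][:18]` would return the first 18 preictal files in order.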

In [None]:
import random

import numpy as np
import scipy.io
from scipy.signal import spectrogram
import matplotlib.pyplot as plt

In [None]:
interictal_tst = 'D:/important_documents/KaggleSeizure/Patient_1/Patient_1/Patient_1_interictal_segment_0001.mat'
preictal_tst = 'D:/important_documents/KaggleSeizure/Patient_1/Patient_1/Patient_1_preictal_segment_0001.mat'
interictal_data = scipy.io.loadmat(interictal_tst)
preictal_data = scipy.io.loadmat(preictal_tst)

In [None]:
interictal_array = interictal_data['interictal_segment_1'][0][0][0]
preictal_array = preictal_data['preictal_segment_1'][0][0][0]
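`scipy.io.loadmat` returns a MATLAB struct as a (1, 1) structured array. The trailing `[0][0][0]` therefore selects the struct's first field by position, which here is assumed to be the signal matrix. Reading fields by name is less fragile. The sketch below assumes the field names `data` and `sampling_frequency`; check `rec.dtype.names` on a real file. The helper name is hypothetical.

```python
import numpy as np
import scipy.io

def load_segment(path):
    """Return (data, fs) from a segment .mat file holding one struct variable.

    Assumes fields 'data' (channels x samples) and 'sampling_frequency'.
    """
    mat = scipy.io.loadmat(path)
    keys = [k for k in mat if not k.startswith("__")]   # skip __header__ etc.
    if len(keys) != 1:
        raise ValueError(f"expected one struct variable, found {keys}")
    rec = mat[keys[0]][0, 0]                            # (1, 1) struct -> single record
    data = np.asarray(rec["data"])
    fs = float(np.squeeze(rec["sampling_frequency"]))   # stored as a 1x1 matrix
    return data, fs
```

Reading `fs` from the file also avoids hard-coding 5000 Hz in the cells below.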

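In [None]:
import neuroclean as nc

ncp = nc.NeuroClean(random_state=42)

# Pass the sampling rate as in Dataset 1; Patient_1 is treated as 5000 Hz
# throughout this notebook (same fs as the spectrograms below)
ncp.preprocess(interictal_array, 5000)
interictal_array = ncp.data
ncp.preprocess(preictal_array, 5000)
preictal_array = ncp.data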

In [None]:
# Plot the first two 1-second windows (5000 samples at 5000 Hz) of channel 0
for i in range(0, 10000, 5000):
    for label, arr in (('Interictal', interictal_array), ('Preictal', preictal_array)):
        print(label)
        secs = arr[0][i:i+5000]
        f, t, Sxx = spectrogram(secs, fs=5000, return_onesided=False)
        SS = np.log1p(Sxx)
        # aspect='auto' stretches the tall 256 x 22 image to a readable width
        plt.imshow(SS / np.max(SS), cmap='gray', aspect='auto')
        plt.show()

In [None]:
all_X = []
all_Y = []

# Local copy of the Kaggle data
base = 'D:/important_documents/KaggleSeizure/Patient_1/Patient_1'
types = ['Patient_1_interictal_segment', 'Patient_1_preictal_segment']

for i, typ in enumerate(types):          # label 0 = interictal, 1 = preictal
    # Use 18 files per class for a balanced dataset
    for j in range(18):
        fl = '{}/{}_{}.mat'.format(base, typ, str(j + 1).zfill(4))
        data = scipy.io.loadmat(fl)
        k = typ.replace('Patient_1_', '') + '_'
        d_array = data[k + str(j + 1)][0][0][0]
        # 10 minutes at 5000 Hz = 3,000,000 samples -> 600 one-second windows
        for m in range(0, 3000000, 5000):
            p_secs = d_array[0][m:m+5000]    # channel 0 only
            p_f, p_t, p_Sxx = spectrogram(p_secs, fs=5000, return_onesided=False)
            p_SS = np.log1p(p_Sxx)
            all_X.append(p_SS / np.max(p_SS))
            all_Y.append(i)
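The inner loop computes one spectrogram per second of data. Because `scipy.signal.spectrogram` operates along the last axis, the windows can instead be stacked into a 2-D array and transformed in a single call. This is a sketch of that vectorised variant; the function name is hypothetical. Like the loop, it uses only channel 0 and normalises each window by its own maximum.

```python
import numpy as np
from scipy.signal import spectrogram

def window_spectrograms(channel, fs=5000, win_samples=5000, n_windows=None):
    """Normalised log-spectrograms of non-overlapping windows of one channel.

    Returns an array of shape (n_windows, n_freqs, n_frames). Trailing samples
    that do not fill a whole window are dropped.
    """
    x = np.asarray(channel, dtype=float)
    max_windows = x.size // win_samples
    n = max_windows if n_windows is None else min(n_windows, max_windows)
    windows = x[: n * win_samples].reshape(n, win_samples)          # one row per window
    _, _, sxx = spectrogram(windows, fs=fs, return_onesided=False)  # along last axis
    ss = np.log1p(sxx)
    peak = ss.max(axis=(1, 2), keepdims=True)
    return ss / np.where(peak > 0, peak, 1.0)   # avoid 0/0 on all-zero windows
```

Inside the loop over files, `specs = window_spectrograms(d_array[0], n_windows=600)` followed by `all_X.extend(specs)` and `all_Y.extend([i] * len(specs))` would replace the per-window loop.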

In [None]:
# Shuffle windows and labels together (seeded for reproducibility)
random.seed(42)
dataset = list(zip(all_X, all_Y))
random.shuffle(dataset)
all_X, all_Y = zip(*dataset)
print(len(all_X))

In [None]:
# Train/test split: 21,600 windows in total (2 classes x 18 files x 600), the last 600 held out for testing
x_train = np.array(all_X[:21000])
y_train = np.array(all_Y[:21000])
x_test = np.array(all_X[21000:])
y_test = np.array(all_Y[21000:])
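Because the windows are shuffled before the split, 1-second windows cut from the same 10-minute file can end up in both the training and the test set. The test accuracy therefore probably overstates performance on unseen recordings. The sketch below splits by file instead. It assumes a hypothetical `all_G` list holding one file id per window, for example `i * 100 + j` appended next to `all_Y` in the dataset loop and shuffled together with it. It then holds out whole files within each class.

```python
import numpy as np

def group_split(groups, labels, test_fraction=0.2, seed=42):
    """Boolean train/test masks that keep each group (source file) on one side
    of the split, holding out about test_fraction of the groups per class."""
    groups = np.asarray(groups)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    test_mask = np.zeros(groups.shape[0], dtype=bool)
    for cls in np.unique(labels):
        cls_groups = np.unique(groups[labels == cls])   # files of this class
        rng.shuffle(cls_groups)
        n_test = max(1, int(round(test_fraction * cls_groups.size)))
        test_mask |= np.isin(groups, cls_groups[:n_test]) & (labels == cls)
    return ~test_mask, test_mask
```

Usage would be `train_mask, test_mask = group_split(all_G, all_Y)`, then `x_train = np.array(all_X)[train_mask]`, and likewise for the other arrays. With 18 files per class and `test_fraction=0.2`, about 4 files per class are held out.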

In [None]:
batch_size = 128
num_classes = 2
epochs = 30
img_rows, img_cols = 256, 22  # spectrogram shape: 256 two-sided frequency bins x 22 time frames
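The 256 x 22 shape follows from `scipy.signal.spectrogram`'s defaults on a 5000-sample window. The defaults are nperseg = 256 and noverlap = nperseg // 8 = 32, so the hop is 224 samples and there are (5000 - 32) // 224 = 22 frames. With `return_onesided=False`, all 256 FFT bins are kept. The small helper below reproduces this arithmetic; it is a sketch that mirrors scipy's defaults, not scipy's own code.

```python
def spectrogram_shape(n_samples, nperseg=256, noverlap=None, nfft=None, onesided=False):
    """Expected (n_freqs, n_frames) of scipy.signal.spectrogram for a 1-D input,
    assuming scipy's defaults: noverlap = nperseg // 8 and nfft = nperseg."""
    noverlap = nperseg // 8 if noverlap is None else noverlap
    nfft = nperseg if nfft is None else nfft
    step = nperseg - noverlap                    # hop between frames
    n_frames = (n_samples - noverlap) // step
    n_freqs = nfft // 2 + 1 if onesided else nfft
    return n_freqs, n_frames
```

With the default `return_onesided=True`, the images would be 129 x 22 instead, and `img_rows` would need to change accordingly.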

In [None]:
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

In [None]:
from tensorflow.keras.utils import to_categorical

y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)

In [None]:
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense, Input, GlobalAveragePooling2D, Multiply, Reshape
from tensorflow.keras import Model

# Channel attention block (squeeze-and-excitation style, not self-attention):
# each feature map is rescaled by a learned gate in (0, 1)
def attention_block(inputs, reduction=8):
    channels = inputs.shape[-1]
    attention = GlobalAveragePooling2D()(inputs)                      # squeeze: one value per channel
    attention = Dense(channels // reduction, activation='relu')(attention)
    attention = Dense(channels, activation='sigmoid')(attention)     # per-channel gates
    attention = Reshape((1, 1, channels))(attention)
    return Multiply()([inputs, attention])                           # rescale the input maps

# Input: one spectrogram window (frequency bins x time frames x 1 channel)
input_shape = (img_rows, img_cols, 1)

inputs = Input(shape=input_shape)
x = Conv2D(32, kernel_size=(5, 5), activation='relu')(inputs)
x = Conv2D(32, kernel_size=(3, 3), activation='relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Dropout(0.25)(x)

# Apply channel attention after the convolutional layers
x = attention_block(x)

# Flatten, Dense, Dropout, and output layers
x = Flatten()(x)
x = Dense(32, activation='relu')(x)
x = Dropout(0.5)(x)
# One-hot labels for two mutually exclusive classes: softmax with categorical cross-entropy
outputs = Dense(num_classes, activation='softmax')(x)

model = Model(inputs, outputs)

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
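The `attention_block` is a squeeze-and-excitation style channel gate. The NumPy sketch below performs the same forward pass, with random weights standing in for the trained Dense layers (an assumption for illustration). It shows that the block only rescales channels and leaves the feature-map shape unchanged. For a 256 x 22 input, the two valid convolutions give 252 x 18 and then 250 x 16, and the 2 x 2 pooling gives a 125 x 8 x 32 map. That is the shape used in the check below, with a bottleneck of 32 // 8 = 4 units.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def squeeze_excite(x, w1, b1, w2, b2):
    """Channel attention on a (batch, height, width, channels) array.

    Squeeze: average over height and width -> (batch, channels).
    Excite: Dense(channels // r, relu), then Dense(channels, sigmoid).
    Scale: multiply each channel of x by its gate.
    """
    s = x.mean(axis=(1, 2))                 # GlobalAveragePooling2D
    e = relu(s @ w1 + b1)                   # bottleneck Dense
    a = sigmoid(e @ w2 + b2)                # per-channel gates
    return x * a[:, None, None, :], a       # Reshape((1, 1, C)) + Multiply

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 125, 8, 32))
w1, b1 = rng.standard_normal((32, 4)), np.zeros(4)
w2, b2 = rng.standard_normal((4, 32)), np.zeros(32)
y, gates = squeeze_excite(x, w1, b1, w2, b2)
```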




In [None]:
# Train the model
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))

In [None]:
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
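Accuracy on a balanced test set does not show the trade-off between missed preictal windows and false alarms. The Kaggle challenge ranked submissions by the area under the ROC curve, so AUC on the predicted preictal probability may be the more comparable number. The sketch below is dependency-free and uses the rank (Mann-Whitney) formulation of AUC. The helper name is hypothetical.

```python
import numpy as np

def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outranks a random
    negative (Mann-Whitney U statistic), with ties counted as one half."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative")
    greater = (pos[:, None] > neg[None, :]).sum()   # pairwise comparisons
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)
```

Since `y_test` is one-hot after `to_categorical`, the call would be `roc_auc(y_test.argmax(axis=1), model.predict(x_test)[:, 1])`. The pairwise comparison needs O(n_pos x n_neg) memory, which is fine for 600 test windows.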