# The Plan

**1. Dataset Acquisition & Understanding**

- **Download the CHB-MIT EEG dataset:** Go to [https://physionet.org/content/chbmit/](https://physionet.org/content/chbmit/) and follow the instructions to download the dataset.
- **Familiarize yourself with the data:**
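  - Open the recordings with an EDF-aware tool such as MNE-Python. The `.edf` files are binary, so a text editor or pandas will not read them directly; the per-case `chbNN-summary.txt` files are plain text.
  - Identify the format of each file type (the recordings are EDF, the summaries are plain text).
  - Understand the channels (montage), the sampling rate, and the units of the signals.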
  - Explore the provided annotation files and seizure labels.
- **Choose environment:**
  - **MacBook Air M1:** Install Python (version 3.7 or above) and the necessary libraries (see **Step 4**) using tools like `pip` or Homebrew.
  - **Google Colab:** No installation needed, access Colab notebooks through [https://research.google.com/colaboratory/](https://research.google.com/colaboratory/) and import the libraries directly in the notebook.
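
To explore the seizure labels, the per-case summary file is a convenient starting point. The sketch below assumes the `chbNN-summary.txt` layout (`File Name:`, `Seizure Start Time:`, and `Seizure End Time:` lines, sometimes with a seizure index such as `Seizure 1 Start Time:`). The helper name `parse_summary` and the inline sample text are illustrative, so check both against your copy of the file.

```python
import re

# Patterns for the assumed summary layout; the seizure index is optional.
FILE_RE = re.compile(r"^File Name:\s*(\S+)")
START_RE = re.compile(r"^Seizure\s*(?:\d+\s*)?Start Time:\s*(\d+)\s*seconds")
END_RE = re.compile(r"^Seizure\s*(?:\d+\s*)?End Time:\s*(\d+)\s*seconds")


def parse_summary(text):
    """Map each .edf file name to a list of (start_s, end_s) seizure intervals."""
    seizures = {}
    current_file, start = None, None
    for line in text.splitlines():
        line = line.strip()
        file_match = FILE_RE.match(line)
        start_match = START_RE.match(line)
        end_match = END_RE.match(line)
        if file_match:
            current_file = file_match.group(1)
            seizures.setdefault(current_file, [])  # keep seizure-free files too
        elif start_match:
            start = int(start_match.group(1))
        elif end_match and current_file is not None and start is not None:
            seizures[current_file].append((start, int(end_match.group(1))))
            start = None
    return seizures


# Illustrative excerpt in the assumed format (not copied from the dataset).
sample = """\
File Name: chbXX_01.edf
Number of Seizures in File: 0

File Name: chbXX_02.edf
Number of Seizures in File: 1
Seizure Start Time: 120 seconds
Seizure End Time: 150 seconds
"""

labels = parse_summary(sample)
print(labels)
```

The times are elapsed seconds from the start of each `.edf` file, so they can be compared directly with epoch start times.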

**2. Preprocessing & Feature Extraction**

- **Import libraries:**

```python
import mne
import numpy as np
import matplotlib.pyplot as plt
```

- **Load data:** Use mne functions like `mne.io.read_raw_edf` to load the EEG data.

```python
raw = mne.io.read_raw_edf('chb01_01.edf', preload=False)  # Replace with your filename
```

- **Cleaning and Filtering:**
  - Apply basic filtering: `raw.notch_filter()` removes power-line noise (60 Hz for these US recordings) and `raw.filter()` applies a band-pass. Both need the data in memory, so load it with `preload=True` or call `raw.load_data()` first.
  - Perform visual inspection (e.g., plotting the data with `raw.plot()`) to identify and remove artifacts like muscle movement or equipment noise.
  - Learn about Independent Component Analysis (ICA) for more advanced artifact removal.
- **Resampling:**
  - If needed, resample the data to a consistent sampling rate using `raw.resample()`.
- **Segmentation:**
  - Use event markers or annotations to segment the data into relevant epochs (e.g., ictal and interictal periods) using `mne.Epochs`.
  - Consider different epoch lengths based on your chosen features and seizure type.
- **Feature Extraction:**
  - Implement functions to calculate desired time-domain features (e.g., mean, variance, peak-to-peak amplitude) and frequency-domain features (e.g., power spectral density using the FFT) using libraries like NumPy or SciPy.
  - Explore libraries like NeuroKit2 for specific EEG feature extraction functionalities.
  - Consider advanced features like connectivity metrics (coherence, phase lag) from the MNE ecosystem (the `mne-connectivity` package) as a later extension.
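
The time- and frequency-domain features described above can be sketched with NumPy alone. This is a minimal illustration, assuming epochs shaped `(n_epochs, n_channels, n_times)` and a known sampling rate. The function name `extract_features`, the band edges in `BANDS`, and the plain FFT periodogram used as the PSD estimate are illustrative choices, not a prescribed method.

```python
import numpy as np

# Conventional EEG band edges in Hz (an assumption; adjust to your study).
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}


def extract_features(epochs, sfreq):
    """Return an (n_epochs, n_channels * n_features) feature matrix.

    Per channel: mean, variance, peak-to-peak amplitude, then the mean
    periodogram power in each band of BANDS.
    """
    epochs = np.asarray(epochs, dtype=float)
    n_epochs, n_channels, n_times = epochs.shape

    # Time-domain features, each of shape (n_epochs, n_channels).
    feats = [epochs.mean(axis=-1), epochs.var(axis=-1), np.ptp(epochs, axis=-1)]

    # Periodogram: squared FFT magnitude, scaled by sfreq * n_times.
    freqs = np.fft.rfftfreq(n_times, d=1.0 / sfreq)
    psd = np.abs(np.fft.rfft(epochs, axis=-1)) ** 2 / (sfreq * n_times)

    for low, high in BANDS.values():
        mask = (freqs >= low) & (freqs < high)
        feats.append(psd[..., mask].mean(axis=-1))

    # Stack to (n_epochs, n_channels, n_features), then flatten per epoch.
    return np.stack(feats, axis=-1).reshape(n_epochs, -1)


# Synthetic check: a 10 Hz sine should put most band power in alpha.
sfreq = 256
t = np.arange(5 * sfreq) / sfreq
fake_epochs = np.sin(2 * np.pi * 10 * t)[None, None, :].repeat(2, axis=1)
X = extract_features(fake_epochs, sfreq)
print(X.shape)  # (1, 14): 2 channels x 7 features
```

For noisier real recordings, an averaged estimate such as Welch's method (`scipy.signal.welch`) is usually a steadier choice than a single periodogram.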

**3. Binary Classification**

- **Data splitting:** Use scikit-learn's `train_test_split` to split your labeled data (seizure vs. non-seizure epochs) into training (60%), validation (20%), and testing (20%) sets. Each call produces only two subsets, so a three-way split takes two calls.
- **Model selection:**
  - **Start simple:**
    - **Logistic Regression:**
      ```python
      from sklearn.linear_model import LogisticRegression
      model = LogisticRegression(solver='lbfgs')
      model.fit(X_train, y_train)  # X_train and y_train are your training data and labels
      ```
    - **Support Vector Machine (SVM):**
      ```python
      from sklearn.svm import SVC
      model = SVC(kernel='linear')
      model.fit(X_train, y_train)
      ```
  - **Explore neural networks if needed:**
    - **Convolutional Neural Networks (CNNs):** Learn about building and training CNNs for time-series data using PyTorch tutorials and examples.
    - **Long Short-Term Memory (LSTM) networks:** Explore LSTMs for capturing temporal dependencies in EEG data if applicable.
- **Training:** The scikit-learn models above are trained by their `fit()` call. For neural networks, use PyTorch: set up a training loop with a loss function (e.g., binary cross-entropy), an optimizer (e.g., Adam), and a number of training epochs.

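```python
# Example training loop: logistic regression written as a one-layer PyTorch model
import torch
from torch import nn

# Placeholder data so the loop runs; replace with your features and 0/1 labels
X_train = torch.randn(64, 10)
y_train = torch.randint(0, 2, (64, 1)).float()

# A linear layer followed by a sigmoid outputs probabilities, as BCELoss expects
model = nn.Sequential(nn.Linear(X_train.shape[1], 1), nn.Sigmoid())
criterion = nn.BCELoss()  # Binary cross-entropy loss
optimizer = torch.optim.Adam(model.parameters())
num_epochs = 20

for epoch in range(num_epochs):
    optimizer.zero_grad()                       # Clear gradients from the previous step
    loss = criterion(model(X_train), y_train)   # Forward pass on the full batch
    loss.backward()                             # Backpropagate
    optimizer.step()                            # Update the weights
```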
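
Returning to the data-splitting step: `train_test_split` returns only two subsets per call, so the 60/20/20 split takes two calls. A minimal sketch, assuming a feature matrix `X` and binary labels `y`; the synthetic arrays and the fixed `random_state` are placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 epochs, 14 features, 20% positive (seizure) labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 14))
y = np.array([0] * 80 + [1] * 20)

# First call: hold out 20% of all samples as the test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Second call: 25% of the remaining 80% equals 20% of the total (validation).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=0
)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```

Stratifying keeps the seizure/non-seizure ratio similar in each subset. Overlapping epochs cut from the same recording are correlated, so a random split can leak information between subsets; splitting by recording or by case is a stricter alternative.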

# Dataset

## Description of CHB-MIT EEG Dataset
### Dataset: Scalp EEG Recordings from Children with Intractable Seizures

This dataset contains scalp electroencephalography (EEG) recordings from **22 subjects** with intractable seizures. The data is grouped into **23 cases** because one subject was recorded twice.

**Key Points:**

* **Subjects:** 22 (5 males, ages 3-22; 17 females, ages 1.5-19)
* **Cases:** 23 (chb01 to chb23)
* **Sampling Frequency:** 256 Hz
* **Recordings per Case:** 9-42 continuous `.edf` files (most are 1 hour long; some cases use 2- or 4-hour files)
* **Seizures:** 198 total (182 in the original set of 23 cases)
* **File Types:**
    * `.edf`: Raw EEG data files (664 total)
    * `.seizures`: Annotations for seizure start and end times (for files containing seizures)

**Additional Notes:**

* Case `chb21` is from the same subject as `chb01`, but recorded 1.5 years later.
* Case `chb24` was added to the collection later and is not included in the `SUBJECT-INFO` file.

In [79]:
from glob import glob
import os
import mne
import numpy as np
import matplotlib.pyplot as plt

In [81]:
chb01_01 = glob('/Users/aaryaashokk/Documents/Coding/Projects/DataSets/chb01_01.edf')
chb01_03 = glob('/Users/aaryaashokk/Documents/Coding/Projects/DataSets/chb01_03.edf')
print(f"print: {chb01_01[0], chb01_03[0]}")

print: ('/Users/aaryaashokk/Documents/Coding/Projects/DataSets/chb01_01.edf', '/Users/aaryaashokk/Documents/Coding/Projects/DataSets/chb01_03.edf')


In [82]:
def read_edf(file):
    data = mne.io.read_raw_edf(file, preload=True) #? Read the EDF file into memory
    data.set_eeg_reference() #? Re-reference to the average of all channels (MNE's default)
    data.filter(l_freq=0.5, h_freq=45)  #? Band-pass 0.5-45 Hz to remove slow drift and high-frequency noise
    epoch = mne.make_fixed_length_epochs(data, duration=5, overlap=2) #? Cut into 5 s epochs; consecutive epochs overlap by 2 s
    data = epoch.get_data() #? NumPy array of shape (n_epochs, n_channels, n_times)
    return data
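
`read_edf` returns unlabeled windows, so a classifier still needs a 0/1 label per epoch. The sketch below shows one way to derive them, assuming the seizure intervals are known in seconds from the start of the file (for example from the case's summary file). The helper `label_epochs` and the rule that any overlap counts as ictal are illustrative choices.

```python
import numpy as np


def label_epochs(n_epochs, duration, overlap, seizures):
    """Return a 0/1 label per fixed-length epoch.

    An epoch is labeled 1 if it overlaps any (start_s, end_s) interval in
    `seizures`. Epoch k is assumed to span [k * step, k * step + duration),
    with step = duration - overlap, i.e. windows that start at the beginning
    of the recording and none of which were dropped.
    """
    step = duration - overlap
    starts = np.arange(n_epochs) * step
    ends = starts + duration
    labels = np.zeros(n_epochs, dtype=int)
    for sz_start, sz_end in seizures:
        labels[(starts < sz_end) & (ends > sz_start)] = 1
    return labels


# Hypothetical seizure from 100 s to 130 s in a recording cut into
# 5 s windows with 2 s overlap (the settings used in read_edf).
labels = label_epochs(n_epochs=1199, duration=5, overlap=2, seizures=[(100, 130)])
print(labels.sum(), np.flatnonzero(labels)[[0, -1]])
```

A stricter rule, such as requiring the whole epoch to lie inside the seizure, would give cleaner positives at the cost of fewer of them.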

In [83]:
%%capture
data_chb01_01 = read_edf(chb01_01[0])
data_chb01_03 = read_edf(chb01_03[0])

print(data_chb01_01.shape)
print(data_chb01_03.shape)

In [85]:
print(data_chb01_01.shape)
print(data_chb01_03.shape)

(1199, 23, 1280)
(1199, 23, 1280)
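
The shape `(1199, 23, 1280)` follows from the windowing parameters, assuming each file is a one-hour recording at 256 Hz with 23 channels: a 5 s window holds 5 × 256 = 1280 samples, and a 2 s overlap means consecutive windows start 3 s apart. A quick check of the arithmetic (the helper `n_fixed_windows` is illustrative):

```python
def n_fixed_windows(n_samples, window, step):
    """Number of full windows of `window` samples taken every `step` samples."""
    if n_samples < window:
        return 0
    return (n_samples - window) // step + 1


sfreq = 256                    # Hz, from the dataset description
n_samples = 3600 * sfreq       # assumed one-hour recording
window = 5 * sfreq             # 5 s epoch -> 1280 samples
step = (5 - 2) * sfreq         # 2 s overlap -> a new window every 3 s

print(n_fixed_windows(n_samples, window, step), window)  # 1199 1280
```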


In [118]:
chb01_15s = glob('/Users/aaryaashokk/Documents/Coding/Projects/DataSets/chb01_15.seizures.edf')
print(chb01_15s[0])
temp = mne.read_annotations(chb01_15s[0])
print(temp)

/Users/aaryaashokk/Documents/Coding/Projects/DataSets/chb01_15.seizures.edf
<Annotations | 0 segments>
