# Brain Solver Python Training Notebook

This is the training notebook. It uses the custom `brain_solver` package to train models on brain activity data. The data come from the official Kaggle competition, supplemented by additional datasets for training and evaluation.

## Data Sources

### Official:

- **HMS - Harmful Brain Activity Classification**
  - **Source:** [Kaggle Competition](https://www.kaggle.com/competitions/hms-harmful-brain-activity-classification)
  - **Description:** The competition task is to classify harmful brain activity. It provides the training and test data used in this notebook.

- **Brain-Spectrograms**
  - **Source:** [Kaggle Dataset](https://www.kaggle.com/datasets/cdeotte/brain-spectrograms)
  - **Description:** The `specs.npy` file contains all the spectrograms from the HMS competition, so they can be loaded at once instead of file by file.

### Additional:

- **Brain-EEG-Spectrograms**
  - **Source:** [Kaggle Dataset](https://www.kaggle.com/datasets/cdeotte/brain-eeg-spectrograms)
  - **Description:** The `EEG_Spectrograms` folder includes one NumPy file per EEG ID. Each array has shape `(128, 256, 4)`, representing (frequency, time, montage chain).

- **hms_efficientnetb0_pt_ckpts**
  - **Source:** [Kaggle Dataset](https://www.kaggle.com/datasets/crackle/hms-efficientnetb0-pt-ckpts)
  - **Description:** This dataset offers pre-trained checkpoints for EfficientNetB0 models, tailored for the HMS competition. It's intended for use in fine-tuning models on the specific task of harmful brain activity classification.
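
The array layout described for the `EEG_Spectrograms` files can be illustrated with a short sketch. It uses a synthetic array and a hypothetical file name rather than a real file from the dataset, so only the shape handling carries over.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for one per-EEG-ID file; the file name and the random
# values are hypothetical and not taken from the dataset.
eeg_spec = np.random.rand(128, 256, 4).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_eeg_id.npy")
    np.save(path, eeg_spec)
    loaded = np.load(path)

# Axes are (frequency, time, montage chain)
n_freq, n_time, n_chains = loaded.shape
print(f"frequency bins: {n_freq}, time steps: {n_time}, montage chains: {n_chains}")

# One 2-D spectrogram per montage chain
per_chain = [loaded[:, :, i] for i in range(n_chains)]
```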


In [None]:
# !pip install d2l --no-index --find-links=file:///kaggle/input/d2l-package/d2l/
# !pip install /kaggle/input/brain-solver/brain_solver-0.9.0-py3-none-any.whl

In [None]:
import os
import sys
import gc
import numpy as np
import pandas as pd
import torch
from torch.utils.data import DataLoader
import pytorch_lightning as pl
from brain_solver import (
    Helpers as hp,
    Trainer as tr,
    BrainModel as br,
    EEGDataset,
    Network,
)
from brain_solver import Wav2Vec2 as w2v
from brain_solver import Filters, FilterType
from transformers.utils import logging
from tqdm import tqdm

# Suppress warnings if desired
import warnings

warnings.filterwarnings("ignore")
logging.set_verbosity(logging.CRITICAL)

# Setup for CUDA device selection
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

# Config Class Summary

The `Config` class manages configurations for a brain activity classification project. It includes:

- **Data and Model Paths**: Centralizes paths for data (e.g., EEG, spectrograms) and model checkpoints.
- **Training Parameters**: Configures training details like epochs, batch size, and learning rate.
- **Feature Flags**: Toggles for using wavelets, spectrograms, and reading options.

Keeping these settings in one place makes them easy to adjust between experiments.
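
As a rough illustration of the three groups of settings listed above, here is a minimal sketch of a configuration object. `SketchConfig`, its default values, and the `train.csv` location are all hypothetical; the real `brain_solver.Config` may be organized differently.

```python
import os
from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in; not the actual brain_solver.Config."""

    # Data and model paths
    data_path: str
    output_path: str
    # Feature flags (names borrowed from the call in this notebook)
    USE_EEG_SPECTROGRAMS: bool = True
    USE_KAGGLE_SPECTROGRAMS: bool = True
    USE_PRETRAINED_MODEL: bool = False
    FINE_TUNE: bool = False
    # Training parameters (all defaults are assumptions)
    epochs: int = 4
    batch_size: int = 32
    learning_rate: float = 1e-3
    seed: int = 42

    @property
    def data_train_csv(self) -> str:
        # Assumed file layout; the real package may derive this differently
        return os.path.join(self.data_path, "train.csv")


sketch = SketchConfig("data/", "data/out/", FINE_TUNE=True)
print(sketch.data_train_csv)
```

Deriving file paths from a single root, as the property does here, means that switching between a local machine and Kaggle only requires changing the root path.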


In [None]:
from brain_solver import Config

full_path = "/home/osloup/NoodleNappers/data/"  # Luppo
# full_path = "C:/Users/tygof/Documents/Semester 8/MLiP/NoodleNappers/data/" # Tygo
# full_path = "C:/Users/dahbl/Documents/TrueDocs/Uni/Year 4/Semester 2/Machine Learning in Practice/brain/data/" # Dick
config = Config(
    full_path,
    full_path + "out/",
    USE_EEG_SPECTROGRAMS=True,
    USE_KAGGLE_SPECTROGRAMS=True,
    should_read_brain_spectograms=False,
    should_read_eeg_spectrogram_files=False,
    USE_PRETRAINED_MODEL=True,
    FINE_TUNE=True,
)

# Kaggle Pull
# full_path = "/kaggle/input/"
# config = Config(full_path, "/kaggle/working/", USE_EEG_SPECTROGRAMS=True, USE_KAGGLE_SPECTROGRAMS=True, should_read_brain_spectograms=False, should_read_eeg_spectrogram_files=False, USE_PRETRAINED_MODEL=False, FINE_TUNE=False)

sys.path.append(full_path + "kaggle-kl-div")
from kaggle_kl_div import score

In [None]:
# Create the output folder if it does not exist
os.makedirs(config.output_path, exist_ok=True)

# Initialize random environment
pl.seed_everything(config.seed, workers=True)

print(config.data_train_csv)

In [None]:
train_df: pd.DataFrame = hp.load_csv(config.data_train_csv)

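if train_df is None:
    # Raise an error rather than calling exit() inside a notebook
    raise RuntimeError("Failed to load the CSV file.")

EEG_IDS = train_df.eeg_id.unique()
TARGETS = train_df.columns[-6:]  # the last six columns hold the class targets
# Class name -> index, and the inverse mapping index -> class name
TARS = {"Seizure": 0, "LPD": 1, "GPD": 2, "LRDA": 3, "GRDA": 4, "Other": 5}
TARS_INV = {index: name for name, index in TARS.items()}
print("Train shape:", train_df.shape)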
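
The internals of `hp.preprocess_eeg_data` are not shown in this notebook. As a hypothetical illustration of how six target columns and the `TARS` / `TARS_INV` mappings can fit together, the sketch below normalizes made-up vote counts into probabilities and maps the largest one back to a class name. The vote values are invented for the example.

```python
import numpy as np

TARS = {"Seizure": 0, "LPD": 1, "GPD": 2, "LRDA": 3, "GRDA": 4, "Other": 5}
TARS_INV = {index: name for name, index in TARS.items()}

# Made-up vote counts for two rows, ordered as in TARS
votes = np.array(
    [
        [3, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 2, 1],
    ],
    dtype=float,
)

# Normalize each row so the six values sum to 1
probs = votes / votes.sum(axis=1, keepdims=True)

# Majority class per row, mapped back to its name
labels = [TARS_INV[int(i)] for i in probs.argmax(axis=1)]
print(labels)
```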

In [None]:
train_data_preprocessed = hp.preprocess_eeg_data(train_df, TARGETS)

In [None]:
train_data_preprocessed.head()

In [None]:
read_path_npy = config.data_w2v_specs

files_npy = os.listdir(read_path_npy)
print(f"There are {len(files_npy)} processed spectrogram npys")

In [None]:
spectrograms = hp.read_spectrograms(
    config.data_spectograms,
    config.path_to_brain_spectrograms_npy,
    config.should_read_brain_spectograms,
)

In [None]:
data_eeg_spectrograms = hp.read_eeg_spectrograms(
    train_data_preprocessed,
    config.path_to_eeg_spectrograms_folder,
    config.path_to_eeg_spectrograms_npy,
    config.should_read_eeg_spectrogram_files,
)

In [None]:
# specs_wav = w2v.wav2vec2(spectrograms)
# dataset2 = EEGDataset(
#     train_data_preprocessed, specs_wav, data_eeg_spectrograms, TARGETS
# )
# dataloader2 = DataLoader(dataset2, batch_size=32, shuffle=False)
# hp.plot_spectrograms(
#     dataloader2, train_data_preprocessed, ROWS=2, COLS=3, BATCHES=2
# )

In [None]:
print(
    f"Length of spectrograms: {len(spectrograms)}, Length of all EEGs: {len(data_eeg_spectrograms)}"
)

In [None]:
dataset = EEGDataset(
    train_data_preprocessed, spectrograms, data_eeg_spectrograms, TARGETS
)
dataloader = DataLoader(dataset, batch_size=32, shuffle=False)

In [None]:
hp.plot_spectrograms(dataloader, train_data_preprocessed, ROWS=2, COLS=3, BATCHES=2)

In [None]:
del dataset, dataloader
gc.collect()

In [None]:
all_oof, all_true, valid_loaders = br.cross_validate_eeg(
    config,
    device,
    train_data_preprocessed=train_data_preprocessed,
    spectrograms=spectrograms,
    data_eeg_spectograms=data_eeg_spectrograms,
    TARGETS=TARGETS,
    n_splits=5,
    batch_size_train=32,
    batch_size_valid=64,
    num_workers=3,
)
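
How `br.cross_validate_eeg` builds its folds is not shown here. The sketch below only illustrates the general idea behind out-of-fold (OOF) predictions with `n_splits=5`: each sample is predicted by a model that did not train on it. The data are random and the "model" is a placeholder that predicts the mean training target. Whether the package groups folds (for example by patient) is not stated in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classes, n_splits = 20, 6, 5

# Made-up targets: each row is a probability distribution over six classes
targets = rng.random((n_samples, n_classes))
targets /= targets.sum(axis=1, keepdims=True)

# Shuffle the indices once, then cut them into n_splits validation folds
folds = np.array_split(rng.permutation(n_samples), n_splits)

oof_preds = np.zeros_like(targets)
for valid_idx in folds:
    train_idx = np.setdiff1d(np.arange(n_samples), valid_idx)
    # Placeholder "model": predict the mean target of the training rows
    fold_prediction = targets[train_idx].mean(axis=0)
    oof_preds[valid_idx] = fold_prediction

# Each row now holds a prediction made without seeing that row
print(oof_preds.shape)
```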

In [None]:
all_oof, all_true = br.validate_model_across_folds(
    config, device, all_oof, all_true, valid_loaders
)

In [None]:
oof = pd.DataFrame(all_oof.copy())
oof["id"] = np.arange(len(oof))

true = pd.DataFrame(all_true.copy())
true["id"] = np.arange(len(true))

# Calculate the score
cv = score(solution=true, submission=oof, row_id_column_name="id")
print("CV Score KL-Div =", cv)
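
For intuition about the metric, here is a minimal NumPy sketch of a row-averaged KL divergence, KL(true || pred) = sum over classes of p * log(p / q). The function name `mean_kl_divergence` and the clipping constant are assumptions for illustration; the `kaggle_kl_div.score` function used above may handle clipping and normalization differently.

```python
import numpy as np


def mean_kl_divergence(true_probs, pred_probs, eps=1e-15):
    """Average KL(true || pred) over rows; the eps clipping is an assumption."""
    true_probs = np.asarray(true_probs, dtype=float)
    pred_probs = np.clip(np.asarray(pred_probs, dtype=float), eps, 1.0)
    # Classes with true probability 0 contribute 0 by convention
    safe_true = np.where(true_probs > 0, true_probs, 1.0)
    terms = np.where(
        true_probs > 0, true_probs * np.log(safe_true / pred_probs), 0.0
    )
    return float(terms.sum(axis=1).mean())


true_example = np.array([[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]])
perfect = mean_kl_divergence(true_example, true_example)
uniform = mean_kl_divergence(true_example, np.full((2, 3), 1 / 3))
print(perfect, uniform)
```

A perfect prediction gives a divergence of zero, and the value grows as the predicted distribution moves away from the true one, so lower CV scores are better.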

In [None]:
del data_eeg_spectrograms, spectrograms
gc.collect()