# Brain Solver Python Inference Notebook

This notebook uses the custom `brain_solver` package to analyze brain activity data. The data comes from the official Kaggle competition datasets, supplemented by additional datasets used for model training and evaluation.

This is the inference notebook.

**Authors: Luppo Sloup, Dick Blankvoort, Tygo Francissen (MLiP Group 9)**

## Data Sources

### Official:

- **HMS - Harmful Brain Activity Classification**
  - **Source:** [Kaggle Competition](https://www.kaggle.com/competitions/hms-harmful-brain-activity-classification)
  - **Description:** This competition focuses on classifying harmful brain activity. It includes a comprehensive dataset for training and testing models.

- **Brain-Spectrograms**
  - **Source:** [Kaggle Dataset](https://www.kaggle.com/datasets/cdeotte/brain-spectrograms)
  - **Description:** The `specs.npy` file contains all the spectrograms from the HMS competition, offering a detailed view of brain activity through visual representations.

### Additional:

- **Brain-EEG-Spectrograms**
  - **Source:** [Kaggle Dataset](https://www.kaggle.com/datasets/cdeotte/brain-eeg-spectrograms)
  - **Description:** The `EEG_Spectrograms` folder contains one NumPy file per EEG ID. Each array has shape (128, 256, 4), corresponding to (frequency, time, montage chain). The spectrograms were generated from the raw EEG data.

- **hms_efficientnetb0_pt_ckpts**
  - **Source:** [Kaggle Dataset](https://www.kaggle.com/datasets/crackle/hms-efficientnetb0-pt-ckpts)
  - **Description:** This dataset offers pre-trained checkpoints for EfficientNetB0 models, tailored for the HMS competition. It's intended for use in fine-tuning models on the specific task of harmful brain activity classification.
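
To make the EEG spectrogram layout described above concrete, the following sketch builds a synthetic array with the same (128, 256, 4) shape and slices it by montage chain. The random data and all variable names are placeholders, not real dataset contents; in practice an array would be loaded with `np.load` from a file in the `EEG_Spectrograms` folder.

```python
import numpy as np

# Synthetic stand-in for one per-EEG-ID array
# (assumption: a real file loads to this shape)
rng = np.random.default_rng(0)
eeg_spec = rng.random((128, 256, 4), dtype=np.float32)

n_freq, n_time, n_chains = eeg_spec.shape

# Spectrogram of the first montage chain: a 2D (frequency, time) image
first_chain = eeg_spec[:, :, 0]

# Stack the four chains vertically into one (512, 256) image.
# This is only one possible way to present the chains to an image model.
stacked = np.concatenate([eeg_spec[:, :, k] for k in range(n_chains)], axis=0)

print(first_chain.shape, stacked.shape)
```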

### Overview:

In addition to the data sources above, the following inputs are needed for this notebook:

<img src="images/overview_inference.png" alt="overview2" width="250"/>

In [None]:
# Install the packages required by the notebook. Run this cell only on Kaggle.
!pip install d2l --no-index --find-links=file:///kaggle/input/d2l-package/d2l/
!pip install /kaggle/input/brain-solver/brain_solver-1.0.0-py3-none-any.whl

In [None]:
# Imports for the notebook
import os, sys, gc, torch, warnings
import numpy as np, pandas as pd, pytorch_lightning as pl
from torch.utils.data import DataLoader
from transformers.utils import logging
from brain_solver import (
    Helpers as hp,
    EEGDataset,
    Config,
    Trainer as tr,
)

# Suppress warnings
warnings.filterwarnings("ignore")
logging.set_verbosity(logging.CRITICAL)

# Setup for CUDA device selection
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

## Config Class Summary

The `Config` class manages configurations for the brain activity classification project. It includes:

- **Data and Model Paths**: Centralizes paths for data (e.g., EEG, spectrograms) and model checkpoints.
- **Training Parameters**: Configures training details like epochs, batch size, and learning rate.
- **Feature Flags**: Toggles for using model settings, wavelets, spectrograms, and reading options.

We designed this class for easy adjustments to facilitate model development and experimentation.
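
As a rough illustration of this design, the sketch below models a configuration object with a dataclass. `SketchConfig` and all of its fields and defaults are hypothetical and are not the actual `brain_solver.Config` API; they only mirror the three groups listed above.

```python
from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Hypothetical stand-in for a project configuration (not brain_solver.Config)."""

    # Data and model paths
    data_path: str
    output_path: str
    # Training parameters (illustrative defaults, not the project's values)
    epochs: int = 4
    batch_size: int = 32
    learning_rate: float = 1e-3
    # Feature flags
    use_eeg_spectrograms: bool = True
    use_kaggle_spectrograms: bool = True

    @property
    def train_csv(self) -> str:
        # Derived paths stay consistent when data_path changes
        return self.data_path + "train.csv"


local_cfg = SketchConfig("", "out/")
kaggle_cfg = SketchConfig("/kaggle/input/", "/kaggle/working/", batch_size=64)
print(local_cfg.train_csv, kaggle_cfg.train_csv)
```

Keeping paths, training parameters, and flags in one object means that switching between a local run and a Kaggle run only requires changing the constructor arguments.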

In [1]:
# Optionally set a local path for the data
full_path = ""
config = Config(
    full_path,
    full_path + "out/",
    USE_EEG_SPECTROGRAMS=True,
    USE_KAGGLE_SPECTROGRAMS=True,
    should_read_brain_spectograms=False,
    should_read_eeg_spectrogram_files=False,
    USE_PRETRAINED_MODEL=False,
    FINE_TUNE=False,
)

# Paths for Kaggle. This second assignment overrides the local config above;
# comment out this block when running locally.
full_path = "/kaggle/input/"
config = Config(
    full_path,
    "/kaggle/working/",
    USE_EEG_SPECTROGRAMS=True,
    USE_KAGGLE_SPECTROGRAMS=True,
    should_read_brain_spectograms=False,
    should_read_eeg_spectrogram_files=False,
    USE_PRETRAINED_MODEL=False,
    FINE_TUNE=False,
)

# Load scoring function
sys.path.append(full_path + "kaggle-kl-div")
from kaggle_kl_div import score
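
The competition metric is a Kullback-Leibler divergence between the true vote distributions and the predicted probabilities, which is what the imported `score` function evaluates. The sketch below is not the `kaggle_kl_div` implementation; it is a minimal NumPy version, with a hypothetical function name and an assumed clipping constant, meant only to show what the metric measures.

```python
import numpy as np


def kl_divergence(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-15) -> float:
    """Mean row-wise KL divergence KL(y_true || y_pred)."""
    # Clip predictions away from zero so the logarithm stays finite
    y_pred = np.clip(y_pred, eps, 1.0)
    # Terms with y_true == 0 contribute 0 by convention
    safe_true = np.where(y_true > 0, y_true, 1.0)
    terms = np.where(y_true > 0, y_true * np.log(safe_true / y_pred), 0.0)
    return float(terms.sum(axis=1).mean())


# Synthetic example: two rows of target distributions over three classes
y_true = np.array([[0.5, 0.5, 0.0], [1.0, 0.0, 0.0]])
uniform = np.full_like(y_true, 1.0 / 3.0)

print(kl_divergence(y_true, y_true))   # perfect predictions
print(kl_divergence(y_true, uniform))  # uninformative predictions score worse
```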

In [None]:
# Create output folder if it does not exist
if not os.path.exists(config.output_path):
    os.makedirs(config.output_path)

# Initialize random environment
pl.seed_everything(config.seed, workers=True)

In [None]:
# Read the train CSV file
train_df: pd.DataFrame = hp.load_csv(config.data_train_csv)

if train_df is None:
    # Raise an error instead of calling exit(), so the failure is visible in the cell output
    raise RuntimeError("Failed to load the CSV file.")
else:
    EEG_IDS = train_df.eeg_id.unique()
    TARGETS = train_df.columns[-6:]
    TARS = {"Seizure": 0, "LPD": 1, "GPD": 2, "LRDA": 3, "GRDA": 4, "Other": 5}
    TARS_INV = {x: y for y, x in TARS.items()}
    print("Train shape:", train_df.shape)

In [None]:
# Read the test CSV file
test_df = pd.read_csv(config.data_test_csv)
print("Test shape:", test_df.shape)
test_df.head()

## Unprocessed

In [None]:
# Read the Kaggle spectrograms
spectrograms2 = hp.read_spectrograms(
    path=config.data_spectograms_test,
    data_path_train_on_brain_spectograms_dataset_specs=None,
    read_files=True,
)

# Rename the spectrogram ID column to the name used by the data loading code
test_df = test_df.rename({"spectrogram_id": "spec_id"}, axis=1)

In [None]:
# Read the EEG spectrograms
DISPLAY = 1
EEG_IDS2 = test_df.eeg_id.unique()
all_eegs2 = {}

print("Converting Test EEG to Spectrograms...")
print()
for i, eeg_id in enumerate(EEG_IDS2):
    # Create spectrogram from EEG parquet file
    img = hp.spectrogram_from_eeg(f"{config.data_eeg_test}{eeg_id}.parquet", i < DISPLAY, config.use_wavelet)
    all_eegs2[eeg_id] = img

In [None]:
# Run EfficientNet inference on the test data
preds = []
test_ds = EEGDataset(test_df, specs=spectrograms2, eeg_specs=all_eegs2, targets=TARGETS, mode="test")
test_loader = DataLoader(test_ds, shuffle=False, batch_size=64, num_workers=3)

for i in range(5):
    print("#" * 25)
    print(f"### Testing Fold {i+1}")

    # Decide the checkpoint source once and reuse the decision below
    use_saved_model = config.trained_model_path is None or config.FINE_TUNE

    if use_saved_model:
        ckpt_file = f"EffNet_version{config.VER}_fold{i+1}.pth"
        model = torch.load(config.full_path + "trained-model-effnet-mlip9/" + ckpt_file)
    else:
        ckpt_file = f"{config.trained_model_path}/EffNet_v{config.VER}_f{i}.ckpt"
        model = tr.load_from_checkpoint(
            ckpt_file,
            weight_file=config.trained_weight_file,
            use_kaggle_spectrograms=config.USE_KAGGLE_SPECTROGRAMS,
            use_eeg_spectrograms=config.USE_EEG_SPECTROGRAMS,
        )
    model = model.to(device).eval()
    fold_preds = []

    with torch.inference_mode():
        for test_batch in test_loader:
            test_batch = test_batch.to(device)
            pred = torch.softmax(model(test_batch), dim=1).cpu().numpy()
            fold_preds.append(pred)

            # Delete variables not needed to free up memory
            del test_batch, pred
            gc.collect()  # Manually collect garbage

            if device.type == "cuda":  # Optionally clear CUDA cache if using GPU
                torch.cuda.empty_cache()

        fold_preds = np.concatenate(fold_preds)

    preds.append(fold_preds)

    del model
    gc.collect()
    if device.type == "cuda":
        torch.cuda.empty_cache()

# Average the predictions over the folds, then print their shape and one example value
pred = np.mean(preds, axis=0)
print()
print("Test preds shape", pred.shape)
pred[0][2]
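
The ensembling step above averages the per-fold softmax outputs. The following sketch uses synthetic data, assuming five folds, three test rows, and six classes (the random logits are placeholders for real model outputs), to show that the averaged predictions are still valid probability distributions.

```python
import numpy as np

rng = np.random.default_rng(42)
n_folds, n_rows, n_classes = 5, 3, 6


def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row-wise maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# One (n_rows, n_classes) probability array per fold
fold_preds = [softmax(rng.normal(size=(n_rows, n_classes))) for _ in range(n_folds)]

# Averaging over the fold axis keeps each row summing to 1,
# because every fold's row already sums to 1
mean_pred = np.mean(fold_preds, axis=0)

print(mean_pred.shape)
print(mean_pred.sum(axis=1))
```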

In [None]:
# Create a submission file
sub = pd.DataFrame({"eeg_id": test_df.eeg_id.values})
sub[TARGETS] = pred
sub.to_csv("submission.csv", index=False)
print("Submission shape", sub.shape)
sub.head()

In [None]:
# Sanity check: each row of predictions should sum to 1
sub.iloc[:, -6:].sum(axis=1)
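
The visual check above can also be automated. The helper below is a hypothetical addition, not part of `brain_solver`; it tests whether every row sums to 1 within an assumed tolerance, and is demonstrated on synthetic arrays.

```python
import numpy as np


def check_rows_sum_to_one(probs: np.ndarray, atol: float = 1e-5) -> bool:
    """Return True if every row of `probs` sums to 1 within `atol`."""
    row_sums = np.asarray(probs, dtype=np.float64).sum(axis=1)
    return bool(np.allclose(row_sums, 1.0, rtol=0.0, atol=atol))


# Synthetic example: the second row of `bad` is deliberately not normalized
good = np.array([[0.2, 0.3, 0.5], [0.1, 0.1, 0.8]])
bad = np.array([[0.2, 0.3, 0.5], [0.4, 0.4, 0.4]])

print(check_rows_sum_to_one(good), check_rows_sum_to_one(bad))
```

In this notebook it could be applied to the prediction columns, for example `check_rows_sum_to_one(sub[TARGETS].to_numpy())`.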