Welcome to the Noodle Nappers project repository, dedicated to the 2024 Machine Learning in Practice (MLiP) course at Radboud University. This repository focuses solely on the Kaggle competition aimed at detecting and classifying seizures and other types of harmful brain activity using electroencephalography (EEG) signals. Our work here aims to contribute to the advancement of neurocritical care, epilepsy treatment, and drug development through improved EEG pattern classification accuracy.
Authors: Luppo Sloup, Dick Blankvoort, Tygo Francissen (MLiP Group 9)
The goal of this competition is to develop a model trained on EEG signals recorded from critically ill hospital patients. By accurately detecting and classifying seizures and other harmful brain activity, this project can aid doctors and brain researchers in providing faster and more accurate treatments, potentially unlocking transformative benefits for neurocritical care, epilepsy management, and drug development.
Our work includes multiple notebooks, files, and data sets, which are explained in more detail below, along with the structure of this GitHub repository.
This GitHub repository contains the four main notebooks of this project. They are self-explanatory and are briefly described in the project structure section below.
Many more notebooks were created over the course of the project, however. We list the most important ones from our Kaggle accounts below.
We utilized these notebooks to modify the EfficientNet B0 starter for training and inference:
Furthermore, we modified the HMS ensemble of EfficientNet and ResNet to fit our ensemble model with these notebooks:
As we added more models to the ensemble, we needed to modify and search through more notebooks:
- IIACT Ensamble Features Head Starter
- Wavenet Starter
- WaveNet Training
- WaveNet Inference
- Catboost Starter
Finally, we created a notebook that combines all models into an ensemble and produces the submission:
Note that this notebook has not been made public in order to keep our best-working code private, but it is available in this repository as ensemble-7-models.ipynb.
To be able to store augmented data, packages, and models, we created several data sets on Kaggle:
- Brain Solver: This data set contains the entire Python package; all code is pushed into it automatically via GitHub workflows. Almost 125 versions of the package are available.
- D2L Package: This data set contains the files necessary to properly import the d2l package, as it is not available in Kaggle's offline mode.
- Trained Model EfficientNet: This data set contains all versions of trained models for our modified EfficientNet notebook.
- Wav2vec/Filter EEGs and Spectograms: This data set contains the raw EEG data and spectrograms after being processed with wav2vec, filters, or a combination of both.
- Models Wav2Vec/Filter Training: This data set contains a wide range of models that were stored after being trained with the Wav2vec/Filter EEGs and Spectograms data set mentioned above.
- Catboost Model: This data set stores the trained models for CatBoost.
- Dilated WaveNet: This data set stores the trained models for WaveNet.
This is the core structure of our GitHub repository:
```
NoodleBrainActivityClassification/
├── brain_solver/
│   ├── brain_model.py
│   ├── config.py
│   ├── eeg_dataset.py
│   ├── filters.py
│   ├── helpers.py
│   ├── network.py
│   ├── trainer.py
│   └── wav2vec2.py
├── data/
├── images/
├── dataset-metadata.json
├── ensemble-7-models.ipynb
├── inference.ipynb
├── preprocessing.ipynb
├── pyproject.toml
├── README.md
├── requirements.txt
├── setup.py
└── training.ipynb
```

Below is a brief explanation of the most important files:
- brain_solver/brain_model.py: Model definition for the trained EfficientNet, including training functions and other model-specific operations.
- brain_solver/config.py: Configuration file to save variables and paths used throughout the project.
- brain_solver/eeg_dataset.py: DataLoader compatibility layer, providing a dataset class that enables efficient data handling and preprocessing for PyTorch models.
- brain_solver/filters.py: Implementation of preprocessing filters (low-pass, high-pass, band-pass, and band-stop) used for data preprocessing before feeding it into the model.
- brain_solver/helpers.py: A collection of miscellaneous functions that serve as a utility library for various tasks throughout the project.
- brain_solver/network.py: Network class that stores the structure of the model.
- brain_solver/trainer.py: Trainer class that encapsulates the training logic, used to manage the training process of models.
- brain_solver/wav2vec2.py: Class designed for data preprocessing, leveraging the Wav2Vec 2.0 model to process and transform data before model training or inference.
- data/: Input data files for the project.
- ensemble-7-models.ipynb: Notebook for ensembling 7 models into one final submission.
- inference.ipynb: Notebook for conducting inference using our trained EfficientNet model, part of our ensemble approach in the competition.
- preprocessing.ipynb: Notebook that can be run locally to preprocess and store raw EEG data and spectrograms.
- setup.py: Standard setup script to manage project dependencies and environment setup.
- training.ipynb: Notebook dedicated to the training process of our EfficientNet model, which is later utilized within our ensemble for the competition.
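To illustrate the kind of preprocessing implemented in brain_solver/filters.py, a band-pass filter can be sketched with SciPy as follows. This is a minimal sketch, not the actual package API: the function name, parameters, and the 200 Hz sampling rate are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_filter(eeg, low_hz=0.5, high_hz=40.0, fs=200.0, order=4):
    """Apply a zero-phase Butterworth band-pass filter to a 1-D EEG signal.

    fs is the sampling rate in Hz (assumed to be 200 Hz here);
    low_hz/high_hz bound the pass band.
    """
    nyquist = fs / 2.0
    # Normalize cutoff frequencies to the Nyquist frequency, as SciPy expects.
    b, a = butter(order, [low_hz / nyquist, high_hz / nyquist], btype="band")
    # filtfilt applies the filter forward and backward for zero phase shift.
    return filtfilt(b, a, eeg)

# Example: filter one second of synthetic signal at 200 Hz.
signal = np.random.randn(200)
filtered = bandpass_filter(signal)
print(filtered.shape)  # (200,)
```

A zero-phase filter is used here so that the filtering does not shift waveform features in time, which matters when EEG segments are aligned to labeled events.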
To reproduce our results and run our code, use Python 3.10 and install the requirements with pip:

```
python3.10 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

We tried to stick to these rules during our project:
- "Protected" master branch: Direct commits to the master branch are not recommended. The branch is not locked, but do not do it.
- Branching: Always create a new branch for your changes. Name your branch after the feature or fix you are working on, preferably linking to an issue number, e.g., `feature-12-add-new-filter` or `fix-15-resolve-this-really-annoying-bug`.
- Commit Messages:
  - Commits should be categorized using prefixes like `feat:`, `fix:`, `chore:`, `docs:`, `style:`, `refactor:`, `perf:`, and `test:`.
  - Use meaningful commit messages that clearly describe the change.
- Pull Requests (PRs):
  - PRs should have descriptive titles and should explain the purpose and content of the changes.
  - Each PR must have at least one review before merging.
  - After reviews and any necessary adjustments, the PR can be merged into the master branch.
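Putting the branching and commit conventions together, a typical contribution might look like the sketch below. The branch name, file, and commit message are hypothetical examples; the demo runs in a throwaway repository, whereas in practice you would work in a clone of NoodleBrainActivityClassification.

```shell
# Demo setup in a throwaway repository (skip this in a real clone).
cd "$(mktemp -d)" && git init -q demo && cd demo
git config user.email "you@example.com" && git config user.name "Your Name"
git commit -q --allow-empty -m "chore: initial commit"

# Create a feature branch named after the issue being addressed.
git checkout -q -b feature-12-add-new-filter

# ...make changes, then stage and commit with a categorized message.
mkdir -p brain_solver && touch brain_solver/filters.py
git add brain_solver/filters.py
git commit -q -m "feat: add band-stop filter"

# Push the branch and open a PR for review before merging into master:
#   git push -u origin feature-12-add-new-filter
```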