PharmaHacks 2026 — Challenge 3: AD vs CN EEG Classification

This project implements a leakage-safe machine learning pipeline to distinguish between Alzheimer’s Disease (AD) and Healthy Control (CN) subjects using resting-state EEG data.

🧠 Project Overview

The pipeline processes 19-channel EEG recordings to extract clinically relevant spectral and connectivity biomarkers. Unlike standard deep learning approaches that may overfit on small clinical samples, this project utilizes a biomarker-rich feature engineering strategy combined with regularized linear models to achieve high interpretability and robust generalization.

🛠 Technical Approach

Data Preprocessing

Signal Specs: 19-channel resting-state EEG (International 10-20 system).
Filtering: 0.5–45 Hz bandpass (4th order Butterworth).
Resampling: Downsampled from 500 Hz to 128 Hz for computational efficiency.
Artifact Rejection: Automatic rejection of windows with amplitudes exceeding ±500 µV or flatline signals.
Segmentation: 30-second sliding windows with a 15-second (50%) overlap.

Feature Engineering (240 Features)

The pipeline extracts features across five frequency bands (Delta, Theta, Alpha, Beta, Gamma) and five brain regions (Frontal, Temporal, Central, Parietal, Occipital):

Relative Band Power (RBP): Regional and global distribution of spectral energy.
Connectivity: Spectral Coherence (SCC) between all regional pairs (e.g., Frontal-Occipital coherence).
Signal Complexity: Hjorth Parameters (Activity, Mobility, and Complexity) and Spectral Entropy.
Clinical Ratios: Specialized biomarkers including the Slowing Index and Theta/Alpha ratios.

🔬 Validation Strategy

To prevent Data Leakage, this pipeline enforces a strict Subject-Wise Validation protocol:

StratifiedGroupKFold: Subject IDs are used as groups to ensure that windows from the same participant never appear in both training and validation folds simultaneously.
Subject-Level Inference: Probabilities from individual 30s windows are aggregated at the subject level to provide a single diagnostic prediction per patient.
Threshold Tuning: The decision threshold is optimized on the validation set to maximize the Balanced Accuracy, addressing the class imbalance between AD and CN samples.

📈 Performance & Findings

Best Model

The top-performing architecture was a Regularized Logistic Regression (ElasticNet) using the top 60 features selected via ANOVA F-value ranking.

Metric	Cross-Validation Score (Dev)
Balanced Accuracy	0.8215
F1 Score	0.8182
ROC-AUC	0.8154
Optimized Threshold	0.51

Key Findings

Feature-based vs. Deep Learning: The regularized XGBoost and Logistic Regression baselines consistently outperformed end-to-end models like EEGNet. On this clinical dataset, handcrafted spectral features proved more data-efficient than raw signal learning.
Spectral Slowing: Feature importance analysis confirmed "Spectral Slowing" as a primary biomarker; AD subjects exhibited significantly higher Global Slowing Indices and reduced Occipital Alpha power compared to controls.
Connectivity Deficits: Reduced coherence in the Alpha band between temporal and parietal regions was a high-ranking predictor for AD classification.

📂 Project Structure

pharmahacks.ipynb: The complete end-to-end pipeline (Preprocessing -> Feature Extraction -> Training -> Evaluation).
dev_cv_results_full.csv: Detailed metric breakdown for all tested models.
best_dev_model_full.pkl: The frozen, trained pipeline for inference.
presentation_figures/: Automatically generated visualizations including ROC curves, confusion matrices, and biomarker boxplots.

🚀 Usage

The notebook is designed for a Google Colab environment with data hosted on Google Drive.

Place .npy EEG files in /Train/AD/ and /Train/CN/.
Update the BASE_DIR in the config cell.
Run the pipeline to regenerate features or perform inference on the /testing/ folder.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
PharmaHacks2026_EEG_Presentation.pptx		PharmaHacks2026_EEG_Presentation.pptx
Pharmahacks2026_v1.ipynb		Pharmahacks2026_v1.ipynb
README.md		README.md
eeg_alzheimer_presentation.pptx		eeg_alzheimer_presentation.pptx
pharmahacks.ipynb		pharmahacks.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PharmaHacks 2026 — Challenge 3: AD vs CN EEG Classification

🧠 Project Overview

🛠 Technical Approach

Data Preprocessing

Feature Engineering (240 Features)

🔬 Validation Strategy

📈 Performance & Findings

Best Model

Key Findings

📂 Project Structure

🚀 Usage

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PharmaHacks 2026 — Challenge 3: AD vs CN EEG Classification

🧠 Project Overview

🛠 Technical Approach

Data Preprocessing

Feature Engineering (240 Features)

🔬 Validation Strategy

📈 Performance & Findings

Best Model

Key Findings

📂 Project Structure

🚀 Usage

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages