Model-Based Clustering of Multivariate EMA Time-Series Data

This repository contains a bachelor's thesis project exploring clustering approaches for multivariate Ecological Momentary Assessment (EMA) time-series data.

Overview

This project implements a pipeline for analyzing EMA time-series data through feature extraction, dimensionality reduction, and clustering. The goal is to identify distinct behavioral patterns in longitudinal self-report data.

Pipeline

1. Preprocessing: data cleaning, transformation, and structured/unstructured representations of the EMA time series
2. Feature Extraction: coefficient extraction using multiple regression models (Linear, Polynomial, Lasso, Ridge, XGBoost, Random Forest, SVR, MLP, VAR)
3. Dimensionality Reduction: principal component analysis (PCA) and partial least squares regression (PLSR) with cross-validation
4. Clustering: KMeans, KMedoids, Agglomerative Clustering, Gaussian Mixture Models, and dynamic time warping (DTW) based time-series clustering
5. Evaluation: Silhouette Score, Calinski-Harabasz Index, Davies-Bouldin Index, and BIC/AIC
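
The stages listed above can be sketched end to end on synthetic data. This is a minimal illustration under stated assumptions, not the code from the notebooks or from clustering.py: the simulated series, the helper names `simulate_series` and `lagged_ridge_coefficients`, the lag-1 Ridge regression used as the feature extractor, and all parameter values are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.mixture import GaussianMixture


def simulate_series(ar_strength, n_obs, n_vars, rng):
    """Hypothetical stand-in for one participant's EMA series (n_obs x n_vars)."""
    series = np.zeros((n_obs, n_vars))
    for t in range(1, n_obs):
        series[t] = ar_strength * series[t - 1] + rng.normal(size=n_vars)
    return series


def lagged_ridge_coefficients(series, alpha=1.0):
    """Feature extraction: regress every item at time t on all items at t-1."""
    model = Ridge(alpha=alpha).fit(series[:-1], series[1:])
    return model.coef_.ravel()  # flattened (n_vars * n_vars) coefficient vector


rng = np.random.default_rng(0)

# Two synthetic groups of 20 participants with different autoregressive strength.
# The group labels exist only because the data are simulated.
true_groups = np.repeat([0, 1], 20)
features = np.array([
    lagged_ridge_coefficients(simulate_series(0.7 if g == 0 else 0.1, 120, 3, rng))
    for g in true_groups
])

# Dimensionality reduction: project the coefficient vectors onto two components.
reduced = PCA(n_components=2).fit_transform(features)

# Clustering: KMeans with an assumed number of clusters.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)

# Evaluation: internal validity indices computed on the reduced features.
scores = {
    "silhouette": silhouette_score(reduced, labels),
    "calinski_harabasz": calinski_harabasz_score(reduced, labels),
    "davies_bouldin": davies_bouldin_score(reduced, labels),
}

# Model-based alternative: fit Gaussian mixtures and compare them by BIC.
bic_by_k = {
    k: GaussianMixture(n_components=k, random_state=0).fit(reduced).bic(reduced)
    for k in range(1, 5)
}
best_k = min(bic_by_k, key=bic_by_k.get)

print(scores)
print("GMM components with lowest BIC:", best_k)
```

When reading the output, higher Silhouette and Calinski-Harabasz values and lower Davies-Bouldin and BIC values indicate a better partition, so the indices should be compared across candidate cluster counts rather than interpreted in isolation.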

Repository Structure

├── Clustering.ipynb       # Main analysis notebook (feature extraction, reduction, clustering)
├── Preprocessing.ipynb    # Data preprocessing pipeline
├── clustering.py          # Reusable clustering utility functions
└── Data/
    ├── Data/              # Raw data (codebook + original datasets)
    ├── preprocessed/      # Cleaned and transformed datasets
    ├── coefficients/      # Extracted regression coefficients
    └── cluster/           # Clustering results

Tech Stack

Python 3
scikit-learn / scikit-learn-extra
tslearn (DTW clustering)
pandas / NumPy
matplotlib

Usage

Open the Jupyter notebooks in order:

1. Preprocessing.ipynb: run the preprocessing steps
2. Clustering.ipynb: run the full analysis pipeline (feature extraction → clustering)

The clustering.py module provides helper functions used by the notebooks.

License

This project is part of an academic thesis. Please cite appropriately if you use this work.
