DikaVer/bsc-thesis

Model-Based Clustering of Multivariate EMA Time-Series Data

Bachelor's thesis project exploring clustering approaches for multivariate Ecological Momentary Assessment (EMA) time-series data.

Overview

This project implements a pipeline for analyzing EMA time-series data through feature extraction, dimensionality reduction, and clustering. The goal is to identify distinct behavioral patterns in longitudinal self-report data.

Pipeline

  1. Preprocessing — Data cleaning, transformation, and structured/unstructured representations of EMA time series
  2. Feature Extraction — Coefficient extraction using multiple regression models (Linear, Polynomial, Lasso, Ridge, XGBoost, Random Forest, SVR, MLP, VAR)
  3. Dimensionality Reduction — PCA and PLSR with cross-validation
  4. Clustering — KMeans, KMedoids, Agglomerative Clustering, Gaussian Mixture Models, and DTW-based time-series clustering
  5. Evaluation — Silhouette Score, Calinski-Harabasz Index, Davies-Bouldin Index, BIC/AIC
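The five stages above can be sketched end to end on synthetic data. This is a minimal illustration, not the notebook code: the array shapes, the lag-1 Ridge regression used to obtain per-participant coefficients, the number of principal components, and the number of clusters are all assumptions chosen for the example, and `extract_coefficients` is a hypothetical helper name.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for EMA data (assumed shape):
# 40 participants, 60 time points, 4 self-report items.
n_participants, n_timepoints, n_items = 40, 60, 4
series = rng.normal(size=(n_participants, n_timepoints, n_items))


def extract_coefficients(participant_series, alpha=1.0):
    """Hypothetical helper: regress each item at time t on all items at
    time t-1 and return the flattened coefficient matrix."""
    lagged, current = participant_series[:-1], participant_series[1:]
    model = Ridge(alpha=alpha).fit(lagged, current)
    return model.coef_.ravel()  # length n_items * n_items


# Feature extraction: one coefficient vector per participant.
features = np.vstack([extract_coefficients(s) for s in series])

# Dimensionality reduction: standardize, then project with PCA.
features = StandardScaler().fit_transform(features)
reduced = PCA(n_components=3, random_state=0).fit_transform(features)

# Clustering: KMeans on the reduced coefficient space.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(reduced)

# Evaluation: internal validity indices named in the pipeline.
scores = {
    "silhouette": silhouette_score(reduced, labels),
    "calinski_harabasz": calinski_harabasz_score(reduced, labels),
    "davies_bouldin": davies_bouldin_score(reduced, labels),
}
print(scores)
```

Because the input here is pure noise, the scores only demonstrate the mechanics. Higher Silhouette and Calinski-Harabasz values and lower Davies-Bouldin values indicate better-separated clusters.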

Repository Structure

├── Clustering.ipynb       # Main analysis notebook (feature extraction, reduction, clustering)
├── Preprocessing.ipynb    # Data preprocessing pipeline
├── clustering.py          # Reusable clustering utility functions
└── Data/
    ├── Data/              # Raw data (codebook + original datasets)
    ├── preprocessed/      # Cleaned and transformed datasets
    ├── coefficients/      # Extracted regression coefficients
    └── cluster/           # Clustering results

Tech Stack

  • Python 3
  • scikit-learn / scikit-learn-extra
  • tslearn (DTW clustering)
  • pandas / NumPy
  • matplotlib

Usage

Open the Jupyter notebooks in order:

  1. Preprocessing.ipynb — Run preprocessing steps
  2. Clustering.ipynb — Run the full analysis pipeline (feature extraction → clustering)

The clustering.py module provides helper functions used by the notebooks.
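This README does not list the functions in clustering.py. As an illustration of the kind of helper such a module might contain, the sketch below selects the number of Gaussian mixture components by BIC, one of the evaluation criteria named in the pipeline. The function name `select_gmm_by_bic`, its parameters, and the synthetic data are hypothetical and are not taken from the repository.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def select_gmm_by_bic(features, candidate_ks=range(1, 6), random_state=0):
    """Hypothetical helper: fit one GMM per candidate component count
    and return the model with the lowest BIC, together with that BIC."""
    best_model, best_bic = None, np.inf
    for k in candidate_ks:
        model = GaussianMixture(
            n_components=k, n_init=3, random_state=random_state
        ).fit(features)
        bic = model.bic(features)
        if bic < best_bic:
            best_model, best_bic = model, bic
    return best_model, best_bic


# Synthetic example: two well-separated groups in two dimensions.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(loc=-5.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])

model, bic = select_gmm_by_bic(features)
labels = model.predict(features)
print(model.n_components, bic)
```

Lower BIC is better. Replacing `model.bic` with `model.aic` gives the AIC variant, which penalizes additional components less strongly.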

License

This project is part of an academic thesis. Please cite appropriately if you use this work.
