This repository contains the code for the paper "MMM - Clustering Multivariate Longitudinal Mixed-type Data".
This work introduces the Mixture of Mixed-Matrices (MMM) model for clustering multivariate longitudinal data with mixed variable types (continuous, ordinal, binary, nominal, count). The model assumes underlying latent continuous variables and uses matrix-variate normal distributions to handle temporal dependencies without conditional independence assumptions.
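To make the latent-variable idea concrete, here is a minimal base-R sketch, not the repository's implementation. It draws one variables-by-time matrix from a matrix-variate normal distribution and then discretises some rows into binary and ordinal observations. The function name `rmatnorm`, the AR(1)-style time covariance, and the cut points are all assumptions chosen for illustration; the link functions used in the paper may differ.

```r
# Draw one p x T matrix from a matrix-variate normal MN(M, U, V).
# U: p x p covariance across variables (rows).
# V: T x T covariance across time points (columns).
# rmatnorm is a hypothetical helper, not a function from this repository.
rmatnorm <- function(M, U, V) {
  p <- nrow(M)
  n_times <- ncol(M)
  Z <- matrix(rnorm(p * n_times), nrow = p, ncol = n_times)
  # chol() returns the upper factor R with t(R) %*% R equal to its argument
  M + t(chol(U)) %*% Z %*% chol(V)
}

set.seed(1)
p <- 3
n_times <- 5
M <- matrix(0, p, n_times)
U <- diag(p)                                            # assumed: independent variables
V <- 0.5 ^ abs(outer(seq_len(n_times), seq_len(n_times), "-"))  # assumed: AR(1)-type time dependence

latent <- rmatnorm(M, U, V)

# Map latent rows to observed types with illustrative (assumed) thresholds
observed <- data.frame(
  continuous = latent[1, ],
  binary     = as.integer(latent[2, ] > 0),
  ordinal    = cut(latent[3, ], breaks = c(-Inf, -0.5, 0.5, Inf), labels = FALSE)
)
```

Because the time covariance `V` is shared by every row of the latent matrix, observations of different types at different time points stay correlated, which is the dependence the model description refers to.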
Preprint: https://hal.science/hal-04807626v1 (HAL), https://arxiv.org/abs/2509.12166v1 (arXiv)
├── README.md
├── Real_data/                  # S&P 500 real data analysis
│   ├── Data.zip                # Compressed raw data - UNZIP BEFORE USE
│   ├── Fitting & Analysis/     # Model fitting and results analysis scripts
│   ├── Images/                 # Generated plots and visualizations
│   └── Results/                # Fitted model outputs (.RData)
├── renv/                       # renv package library (auto-managed)
├── renv.lock                   # Lockfile with exact package versions
├── SETUP.md                    # Detailed environment setup instructions
├── Simulations/                # Simulation studies
│   ├── Data/                   # Synthetic datasets and generation scripts
│   └── Results/                # Simulation outputs and performance metrics
└── Software_tools/             # Core algorithm implementations
    ├── EM_mixed_par.R          # Main MMM algorithm (MCMC-EM)
    ├── EM_MMN.R                # EM algorithm for Matrix-variate Normal mixtures
    └── PerfEval.R              # Performance evaluation functions
This project uses renv to ensure reproducible package dependencies. To set up the exact environment:
# 1. Install renv (if not already installed)
install.packages("renv")
# 2. Restore all dependencies
renv::restore()
For detailed setup instructions, troubleshooting, and system requirements, see SETUP.md.
The raw data in Real_data/ is shipped as a compressed archive, Data.zip. You must unzip it before running the analysis scripts.
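The archive can also be extracted from within R. The helper below is a small sketch: `unzip_data` is a hypothetical name, and extracting into the folder that holds the archive is an assumption, since the location the analysis scripts expect is not stated here.

```r
# Extract the raw-data archive; stops with a clear message if it is missing.
# unzip_data is a hypothetical helper, not part of the repository.
unzip_data <- function(zip_path = file.path("Real_data", "Data.zip"),
                       exdir = dirname(zip_path)) {
  if (!file.exists(zip_path)) {
    stop("Archive not found: ", zip_path,
         " (run this from the repository root)")
  }
  # utils::unzip() returns the extracted file paths invisibly
  utils::unzip(zip_path, exdir = exdir)
}
```

Calling `unzip_data()` from the repository root extracts the files next to Data.zip. Pass a different `exdir` if the scripts read the data from another location.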
The main MMM algorithm, an MCMC-EM procedure for fitting the Mixture of Mixed-Matrices model, is implemented in Software_tools/EM_mixed_par.R.
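The names of the functions defined in that file are not listed here. One way to discover them is to source the script into a separate environment and list what it defines. This is only a sketch: `list_script_functions` is a hypothetical helper, and sourcing runs any top-level code in the script, so the renv environment should be restored first.

```r
# Source an R script into its own environment and return the names of
# the functions it defines (hypothetical helper for exploration).
list_script_functions <- function(path) {
  env <- new.env()
  source(path, local = env)
  objs <- ls(env)
  is_fun <- vapply(objs, function(nm) is.function(get(nm, envir = env)),
                   logical(1))
  objs[is_fun]
}
```

For example, `list_script_functions(file.path("Software_tools", "EM_mixed_par.R"))`, run from the repository root, returns the function names without adding them to the global workspace.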
The S&P 500 dataset analysed in Real_data/ contains:
- LogReturns: Continuous (yearly log-returns)
- Grades: Ordinal (Underperform/Neutral/Buy from Bank of America)
- Dividends: Binary (dividend paid or not)
- Volume: Count (millions of shares traded)
- Period: 2019-2023 (330 companies × 5 years)
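The sketch below shows how yearly log-returns are computed and how a dataset of this shape could be held in a three-way array. The prices are made-up illustrative values, and the variables-by-years-by-companies layout is an assumption; the format stored in Data.zip may differ.

```r
# Hypothetical year-end prices for one company (illustrative values only)
prices <- c(`2018` = 100, `2019` = 110, `2020` = 99,
            `2021` = 120, `2022` = 108, `2023` = 130)

# Yearly log-return: log(P_t / P_{t-1}), giving one value per year 2019-2023
log_returns <- diff(log(prices))

# Assumed layout: 4 variables x 5 years x 330 companies
vars  <- c("LogReturns", "Grades", "Dividends", "Volume")
years <- as.character(2019:2023)
X <- array(NA_real_,
           dim = c(length(vars), length(years), 330),
           dimnames = list(vars, years, NULL))

# Fill the log-return row for the first company
X["LogReturns", , 1] <- log_returns
```

Each slice `X[, , i]` is then one 4 × 5 mixed-type matrix, the unit that the model clusters.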
The simulation studies in Simulations/ use the following settings:
- Mixed-type matrices: continuous, ordinal (5 levels), binary, count
- Sample sizes: N ∈ {100, 500, 1000}
- True clusters: K = 2
- Noise scenarios: 0%, 10%, 20%
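The settings above can be written out as a scenario grid. This sketch assumes the sample sizes and noise levels are fully crossed, which is not stated explicitly; the column names are illustrative.

```r
# Enumerate the simulation scenarios (assumes a full factorial design)
design <- expand.grid(
  N     = c(100, 500, 1000),   # sample sizes
  noise = c(0, 0.10, 0.20),    # noise proportion
  K     = 2                    # true number of clusters
)

n_scenarios <- nrow(design)    # 3 sample sizes x 3 noise levels
```

Under that assumption the grid has 9 rows, one per combination of sample size and noise level.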