For StepMix, please refer to this repository.
This is an R interface to StepMix, a Python package following the scikit-learn API for model-based clustering and generalized mixture modeling (latent class/profile analysis) of continuous and categorical data. StepMix handles missing values through Full Information Maximum Likelihood (FIML) and provides multiple stepwise Expectation-Maximization (EM) estimation methods based on pseudolikelihood theory. Additional features include support for covariates and distal outcomes, various simulation utilities, and non-parametric bootstrapping, which allows inference in semi-supervised and unsupervised settings.
If you find StepMix useful, please consider citing our arXiv preprint:
@article{morin2023stepmix,
title={StepMix: A Python Package for Pseudo-Likelihood Estimation of Generalized Mixture Models with External Variables},
author={Morin, Sacha and Legault, Robin and Bakk, Zsuzsa and Gigu{\`e}re, Charles-{\'E}douard and de la Sablonni{\`e}re, Roxane and Lacourse, {\'E}ric},
journal={arXiv preprint arXiv:2304.03853},
year={2023}
}
You can install StepMixR from CRAN within R using the function install.packages:
install.packages("stepmixr")
To install directly from GitHub, you need to have the devtools package
installed. Once it is installed, you can use the following syntax:
devtools::install_github("Labo-Lacourse/stepmixr")
- A notebook available on Google Colab gives a detailed tutorial based on the iris dataset. This notebook is an R adaptation of a similar Python notebook, which can be found here. The tutorial covers:
- Continuous LCA models;
- Binary LCA models;
- Categorical LCA models;
- Mixed LCA models (continuous and categorical data);
- Missing values.
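As an illustration of the last item, the sketch below blanks out some entries of the iris measurements and fits a model on the incomplete data. It assumes that stepmixr and its Python backend are installed, and that the "continuous_nan" measurement name used by the Python package is passed through unchanged by the R interface. The names X_na, mask, model_na, fit_na and pr_na are made up for this example.

```r
library(stepmixr)

# Copy the four numeric iris columns and set roughly 10% of the entries to NA.
set.seed(42)
X_na <- iris[, 1:4]
mask <- matrix(runif(nrow(X_na) * ncol(X_na)) < 0.10, nrow = nrow(X_na))
X_na[mask] <- NA

# Assumption: "continuous_nan" selects the Gaussian measurement model
# that tolerates missing values, as in the Python StepMix package.
model_na <- stepmix(n_components = 3, n_steps = 1, measurement = "continuous_nan")
fit_na <- fit(model_na, X_na)

# Predicted class for every row, including rows with missing entries.
pr_na <- predict(fit_na, X_na)
```

The notebook mentioned above remains the reference for how missing values are handled in practice; this is only a starting point.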
Here is a quick example from the R documentation.
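library(stepmixr)
# Three-class model with continuous indicators and two-step estimation
model1 <- stepmix(n_components = 3, n_steps = 2, measurement = "continuous")
X <- iris[, 1:4]  # the four numeric iris measurements
fit1 <- fit(model1, X)  # estimate the model
pr1 <- predict(fit1, X)  # predicted class for each row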
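The predicted classes can then be compared with the known species labels. This is a minimal sketch, assuming that stepmixr and its Python backend are installed and that predict returns one class label per row, as it does in the Python StepMix API. The model is refitted here so that the block runs on its own.

```r
library(stepmixr)

# Refit the quick example so this block is self-contained.
X <- iris[, 1:4]
model1 <- stepmix(n_components = 3, n_steps = 2, measurement = "continuous")
fit1 <- fit(model1, X)
pr1 <- predict(fit1, X)

# Cross-tabulate predicted classes against the true species.
# Class indices are arbitrary, so only the grouping matters, not the numbering.
print(table(predicted = as.vector(pr1), species = iris$Species))
```

Because the class numbering is arbitrary and estimation depends on random initialization, the table may differ between runs.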