Phase-Type Variational Autoencoder for Heavy-Tailed Data
A PyTorch implementation of a multivariate Phase-Type Variational Autoencoder (PH-VAE) for modeling positive-valued continuous and heavy-tailed data.
This work bridges:
- Deep generative modeling
- Applied probability
- Phase-Type distributions
- Heavy-tail statistical modeling
and explores their integration within modern latent-variable learning frameworks.
PH-VAE architecture with latent-conditioned Phase-Type decoder.
PH-VAE replaces the classical Gaussian decoder of a Variational Autoencoder with a latent-conditioned Phase-Type (PH) distribution. Instead of assuming a fixed parametric likelihood, the decoder learns flexible stochastic processes capable of modeling:
- Heavy-tailed behavior
- Skewed distributions
- Extreme quantiles
- Multivariate dependence through shared latent variables
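As a concrete reference point, a Phase-Type distribution with initial vector α and sub-generator matrix T has density f(x) = α exp(Tx) t, where t = -T1 is the vector of exit rates. The sketch below evaluates this density in two ways, with `torch.linalg.matrix_exp` and with a truncated uniformization series, and checks both against the closed-form Erlang-2 density. It is an illustration under these assumptions, not the repository's implementation: the function names, the fixed truncation length `n_terms`, and the example parameters are all hypothetical.

```python
import torch

def ph_density_expm(x, alpha, T):
    """f(x) = alpha @ expm(T * x) @ t, with exit-rate vector t = -T @ 1."""
    exit_rates = -T.sum(dim=1)
    # Batched matrix exponential: one (p, p) matrix per entry of x.
    expm = torch.linalg.matrix_exp(x[:, None, None] * T)
    return torch.einsum("i,nij,j->n", alpha, expm, exit_rates)

def ph_density_uniformization(x, alpha, T, n_terms=200):
    """Same density via a truncated uniformization series.

    expm(T x) = sum_k Poisson(k; lam * x) * P^k, with P = I + T / lam
    and lam >= max_i |T_ii|. The fixed truncation is a simplification.
    """
    lam = (-torch.diagonal(T)).max()
    P = torch.eye(T.shape[0], dtype=T.dtype) + T / lam
    exit_rates = -T.sum(dim=1)
    v = alpha.clone()                      # holds alpha @ P^k
    weight = torch.exp(-lam * x)           # Poisson weight for k = 0
    density = weight * (v @ exit_rates)
    for k in range(1, n_terms):
        v = v @ P
        weight = weight * lam * x / k      # Poisson weight for the next k
        density = density + weight * (v @ exit_rates)
    return density

# Erlang-2 with rate 2: two phases visited in sequence (hypothetical example).
rate = 2.0
alpha = torch.tensor([1.0, 0.0], dtype=torch.float64)
T = torch.tensor([[-rate, rate], [0.0, -rate]], dtype=torch.float64)
x = torch.tensor([0.5, 1.0, 3.0], dtype=torch.float64)

closed_form = rate**2 * x * torch.exp(-rate * x)   # known Erlang-2 density
via_expm = ph_density_expm(x, alpha, T)
via_unif = ph_density_uniformization(x, alpha, T)
print(torch.allclose(via_expm, closed_form), torch.allclose(via_unif, closed_form))
```

In a latent-conditioned decoder, `alpha` and `T` would be produced by a network from the latent variable rather than fixed as here. For large values of `lam * x` the leading Poisson weight underflows, so a practical uniformization routine would need a more careful truncation and scaling scheme than this one.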
The repository contains:
- A complete PyTorch implementation of PH-VAE
- Phase-Type likelihood evaluation utilities
- ELBO-based training
- PH sampling routines
- Exploratory notebooks on synthetic and real-world datasets
Key features:
- Multivariate PH-VAE implementation in PyTorch
- Latent-conditioned Phase-Type decoder
- Exact PH likelihood computation
- Matrix exponential + uniformization methods
- ELBO optimization with KL regularization
- Sampling utilities for Phase-Type distributions
- Heavy-tail modeling for positive-valued data
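To illustrate what sampling from a Phase-Type distribution involves: a draw is the time until absorption of a continuous-time Markov chain that starts in a phase chosen from α and moves according to the sub-generator T. The sketch below simulates that chain directly. It is a slow, loop-based illustration, not the repository's sampling utility; the name `sample_ph` and the Erlang-2 parameters are hypothetical, and it assumes α sums to one (no mass at zero).

```python
import torch

def sample_ph(alpha, T, n_samples, generator=None):
    """Draw absorption times of the Markov chain defined by (alpha, T)."""
    p = T.shape[0]
    rates = -torch.diagonal(T)                 # total rate of leaving each phase
    exit_rates = -T.sum(dim=1)                 # rate of jumping to absorption
    off_diag = T.clone().fill_diagonal_(0.0)
    # Row i: probabilities of the next state; column p means absorption.
    jump = torch.cat([off_diag, exit_rates[:, None]], dim=1) / rates[:, None]
    samples = torch.zeros(n_samples, dtype=T.dtype)
    for n in range(n_samples):
        state = torch.multinomial(alpha, 1, generator=generator).item()
        while state < p:
            # Exponential holding time in the current phase (inverse CDF).
            u = torch.rand(1, generator=generator, dtype=T.dtype)
            samples[n] += (-torch.log1p(-u) / rates[state]).item()
            state = torch.multinomial(jump[state], 1, generator=generator).item()
    return samples

gen = torch.Generator().manual_seed(0)
rate = 2.0
alpha = torch.tensor([1.0, 0.0], dtype=torch.float64)
T = torch.tensor([[-rate, rate], [0.0, -rate]], dtype=torch.float64)
draws = sample_ph(alpha, T, n_samples=2000, generator=gen)
print(draws.mean())   # the Erlang-2 mean is 2 / rate = 1.0, up to Monte Carlo error
```

A vectorized sampler that advances all chains in parallel would be preferable for large batches; the loop is kept here only because it mirrors the definition step by step.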
Python 3.10+ is recommended.
python -m venv ph_env
source ph_env/bin/activate
Install the project dependencies from the repository root:
pip install -r requirements.txt
If you plan to run the notebooks, you may also want:
pip install jupyter ipykernel
Open the repository in:
- Jupyter Notebook
- VS Code
- JupyterLab
and run one of the notebooks inside notebooks/.
Important: run notebooks from the repository root so imports resolve correctly.
Example imports:
from models import MultiDimPHVAE
from utils import *
If running from inside notebooks/, change the working directory to the repository root first. The first notebook cell does that.
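A minimal sketch of such a first cell, assuming the notebooks live in a folder named `notebooks/` directly under the repository root. The helper name `move_to_repo_root` is hypothetical, and the cell shipped with the notebooks may differ:

```python
import os
from pathlib import Path

def move_to_repo_root(notebooks_dirname="notebooks"):
    """If the current directory is the notebooks folder, step up to its parent."""
    cwd = Path.cwd()
    if cwd.name == notebooks_dirname:
        os.chdir(cwd.parent)
    return Path.cwd()

repo_root = move_to_repo_root()
print(repo_root)
```

The function does nothing when the working directory is already somewhere else, so rerunning the cell is harmless.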
from models import MultiDimPHVAE
from models.ph_vae.ph_vae_trainer import train_phvae

# Assumes `dim` (input dimensionality) and `train_loader` (training data loader) are already defined.
model = MultiDimPHVAE(
    input_dim=dim,
    latent_dim=10,
    n_phases=15,
)
train_phvae(
    model,
    train_loader,
    epochs=100,
    learning_rate=1e-3,
)
A checkpoint pretrained on Weibull data is available at:
models/saved/phvae_weibull.pt
It can be used for:
- Inference
- Sampling
- Evaluation
- Visualization
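Restoring weights from a checkpoint file could look like the sketch below. It assumes the file stores a plain `state_dict`, which is not confirmed here. If the checkpoint instead holds a full pickled model or a wrapper dictionary, the loading step would need to change. The helper `load_weights` is hypothetical, and the round trip uses a stand-in `nn.Linear` because the constructor arguments of the pretrained model are not documented in this README.

```python
import tempfile
from pathlib import Path

import torch
from torch import nn

def load_weights(model, path):
    """Load a saved state_dict into `model` and switch it to eval mode."""
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state)
    model.eval()
    return model

# Round trip with a stand-in module instead of the real PH-VAE.
with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "demo.pt"
    source = nn.Linear(4, 2)
    torch.save(source.state_dict(), ckpt)
    restored = load_weights(nn.Linear(4, 2), ckpt)

same = torch.equal(source.weight, restored.weight)
print(same)
```

For the pretrained model, the same helper would be called with a `MultiDimPHVAE` instance built with the hyperparameters used at training time and the path models/saved/phvae_weibull.pt.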
Potential applications:
- Financial risk modeling
- Insurance losses
- Reliability analysis
- Word-frequency modeling
- Rare-event generative modeling
@article{ziani2026phvae,
  title={Phase-Type Variational Autoencoders for Heavy-Tailed Data},
  author={Ziani, Abdelhakim and Horvath, Andras and Ballarini, Paolo},
  journal={arXiv preprint arXiv:2603.01800},
  year={2026}
}