Revive learns a continuous vector field of epigenetic aging from a large compendium of blood DNA methylation data. It then uses this learned model to design minimal, targeted CpG interventions for in silico rejuvenation.
The entire pipeline is implemented with streaming data loaders and cached intermediate steps, enabling it to run on standard CPU or GPU hardware.
Follow these three steps to download the data, set up the environment, and run the model.
The complete blood DNAm compendium is available on Zenodo. Download and extract it into your project directory.
- Zenodo: REVIVE-Flow Blood DNAm Aging Compendium
- DOI: 10.5281/zenodo.17118272
The expected directory structure is:
revive-flow/
├── revive_dataset/
│ ├── GSE51032/
│ │ ├── GSE51032_count_matrix.csv
│ │ └── GSE51032_meta.csv
│ ├── GSE61496/
│ │ ├── GSE61496_count_matrix.csv
│ │ └── GSE61496_meta.csv
│ └── ... (other studies)
└── ... (source code)
Each study requires a _count_matrix.csv file (CpGs × samples) and a _meta.csv file containing at least Age and Gender columns.
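Before training, it can help to confirm that every study folder matches this layout. The following is a minimal sketch, not part of the Revive-Flow codebase: the helper names `check_study` and `check_dataset` are hypothetical, and it assumes the metadata files are comma-delimited with the column names in the first row.

```python
import csv
from pathlib import Path

# Columns the README says every *_meta.csv must contain.
REQUIRED_COLUMNS = {"Age", "Gender"}


def check_study(study_dir: Path) -> list[str]:
    """Return a list of problems found in one study directory (empty if none)."""
    name = study_dir.name
    problems = []
    matrix = study_dir / f"{name}_count_matrix.csv"
    meta = study_dir / f"{name}_meta.csv"

    if not matrix.is_file():
        problems.append(f"{name}: missing {matrix.name}")

    if not meta.is_file():
        problems.append(f"{name}: missing {meta.name}")
    else:
        # Assumption: comma-delimited CSV with a header in the first row.
        with meta.open(newline="") as fh:
            header = next(csv.reader(fh), [])
        missing = REQUIRED_COLUMNS - set(header)
        if missing:
            problems.append(f"{name}: {meta.name} lacks columns {sorted(missing)}")
    return problems


def check_dataset(root: Path) -> list[str]:
    """Check every sub-directory of the dataset root, in sorted order."""
    problems = []
    for study_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        problems.extend(check_study(study_dir))
    return problems


if __name__ == "__main__":
    root = Path("revive_dataset")
    if root.is_dir():
        for line in check_dataset(root) or ["All studies look complete."]:
            print(line)
```

Only the header row of each metadata file is read, so the check stays fast even on large studies.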
This project uses uv for fast and reliable dependency management.
# Install uv if you haven't already
# pip install uv
# Sync the environment
uv sync
For GPU acceleration (recommended): install a CUDA-enabled version of PyTorch by following the official instructions at pytorch.org.
You can now train the model on the full compendium, holding out a specific study for testing. The following command reproduces the main results for the EPIC-Italy cohort (GSE51032).
# Run training, holding out GSE51032 and using a GPU
uv run revive/train \
--root_dir revive_dataset \
--outputs_dir outputs \
--holdout_study GSE51032 \
--device cuda
To verify that the model learns a true chronological signal, you can run the negative control experiment, in which sample ages are permuted.
uv run revive/train \
--root_dir revive_dataset \
--outputs_dir outputs_shuffled \
--holdout_study GSE51032 \
--neg_control_shuffled_age --neg_control_seed 1337
The train.sh script is a convenience wrapper that runs the training and evaluation pipeline across all studies in a leave-one-out fashion, for both the main model and the shuffled-age control.
# This will generate results for all holdout folds
./train.sh

| Argument | Description | Default |
|---|---|---|
| --root_dir | (Required) Path to the dataset directory (e.g., revive_dataset). | |
| --outputs_dir | Directory to save models, metrics, and results. | outputs |
| --holdout_study | GSE ID of the study to hold out for testing. Use "" to train on all data. | GSE51032 |
| --device | Compute device to use. | cuda |
| --latent_dim | Dimension of the PCA latent space. If <=0, auto-selects by explained variance. | 1024 |
| --delta_years | List of commanded rejuvenation years (Δa) to evaluate. | 2 5 7 10 |
| --lambda1_list | List of L1 sparsity penalties (λ₁) for the ADMM controller. | 0.0 0.001 0.003 0.01 |
| --ipca_batch, --admm_batch | Batch sizes for memory-efficient PCA and ADMM steps. | 512, 64 |
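The --latent_dim row mentions automatic selection by explained variance. One common way to do this is to keep the smallest number of components whose cumulative explained-variance ratio reaches a target. The sketch below illustrates that general idea only: the function names and the 0.95 threshold are assumptions, and the rule Revive-Flow actually applies may differ.

```python
from itertools import accumulate


def select_latent_dim(explained_variance_ratio, threshold=0.95):
    """Smallest k whose first k components reach `threshold` cumulative variance.

    `threshold=0.95` is an illustrative assumption, not a documented default.
    Falls back to all components if the threshold is never reached.
    """
    ratios = list(explained_variance_ratio)
    for k, total in enumerate(accumulate(ratios), start=1):
        if total >= threshold:
            return k
    return len(ratios)


def resolve_latent_dim(latent_dim, explained_variance_ratio, threshold=0.95):
    """Mimic the documented switch: a positive value is used as given,
    while a value <= 0 triggers automatic selection."""
    if latent_dim > 0:
        return latent_dim
    return select_latent_dim(explained_variance_ratio, threshold)


if __name__ == "__main__":
    # Toy per-component ratios, sorted from largest to smallest as PCA reports them.
    ratios = [0.5, 0.3, 0.15, 0.05]
    print(resolve_latent_dim(0, ratios, threshold=0.9))
    print(resolve_latent_dim(1024, ratios))
```

A fixed dimension gives reproducible model sizes across holdout folds, while the automatic rule adapts to how much variance each training set concentrates in its leading components.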
Results are organized by the holdout study. For a run with --outputs_dir outputs --holdout_study GSE51032, you will find:
outputs/
└── GSE51032/
├── edited_beta_d10_lam10.001.csv # Edited beta-values for Δa=10, λ₁=0.001
├── metrics_lodo_summary.csv # Summary statistics (R², slope) for the holdout
├── metrics_lodo_deltas.csv # Per-sample age predictions before/after edits
├── pca_model.npz # Saved PCA loadings and mean
└── ridge_judge_uvec.npz # Saved Ridge regressor model ("age judge")
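To get a first look at the metrics files, you can print their headers and a few rows. This is a minimal sketch using only the standard library. The helper `preview_csv` is hypothetical, and because the column names of these files are not documented here, the code makes no assumption about them.

```python
import csv
from itertools import islice
from pathlib import Path


def preview_csv(path, n_rows=3):
    """Return (header, first n_rows data rows) without assuming column names."""
    with Path(path).open(newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader, [])
        rows = list(islice(reader, n_rows))
    return header, rows


if __name__ == "__main__":
    # Paths follow the example run above: --outputs_dir outputs --holdout_study GSE51032
    out_dir = Path("outputs") / "GSE51032"
    for name in ("metrics_lodo_summary.csv", "metrics_lodo_deltas.csv"):
        path = out_dir / name
        if path.is_file():
            header, rows = preview_csv(path)
            print(name, header)
            for row in rows:
                print("  ", row)
```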
If you use Revive-Flow in your research, please cite:
Paper:
@article{litman2025reviveflow,
title = {{Revive-Flow}: A Foundation Model for Blood {DNAm} Aging},
author = {Litman, Elon and others},
journal = {bioRxiv},
year = {2025},
doi = {10.1101/2025.09.18.677241}
}
Dataset:
Litman, Elon, et al. (2025). REVIVE-Flow Blood DNAm Aging Compendium [Data set]. Zenodo. https://doi.org/10.5281/zenodo.17118272
