Revive-Flow: A Foundation Model for Blood DNAm Aging

Revive learns a continuous vector field of epigenetic aging from a large compendium of blood DNA methylation data. It then uses this learned model to design minimal, targeted CpG interventions for in silico rejuvenation.

The entire pipeline is implemented with streaming data loaders and cached intermediate steps, enabling it to run on standard CPU or GPU hardware.

Getting Started

Follow these three steps to download the data, set up the environment, and run the model.

1. Download the Data

The complete blood DNAm compendium is available on Zenodo. Download and extract it into your project directory.

Zenodo: REVIVE-Flow Blood DNAm Aging Compendium
DOI: 10.5281/zenodo.17118272

The expected directory structure is:

revive-flow/
├── revive_dataset/
│   ├── GSE51032/
│   │   ├── GSE51032_count_matrix.csv
│   │   └── GSE51032_meta.csv
│   ├── GSE61496/
│   │   ├── GSE61496_count_matrix.csv
│   │   └── GSE61496_meta.csv
│   └── ... (other studies)
└── ... (source code)

Each study directory must contain a `<GSE_ID>_count_matrix.csv` file (CpGs × samples) and a `<GSE_ID>_meta.csv` file with at least `Age` and `Gender` columns.
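Before training, it can help to confirm that the extracted compendium matches this layout. The following is a minimal sketch, not part of the Revive-Flow code base: the helper names `check_study` and `check_dataset` are hypothetical, and the check assumes the metadata header is a plain comma-separated row.

```python
import csv
from pathlib import Path

# Columns the README says every metadata file must provide.
REQUIRED_COLUMNS = {"Age", "Gender"}


def check_study(study_dir: Path) -> list[str]:
    """Return a list of problems found in one study directory (empty if none)."""
    gse = study_dir.name
    problems = []
    matrix = study_dir / f"{gse}_count_matrix.csv"
    meta = study_dir / f"{gse}_meta.csv"
    if not matrix.is_file():
        problems.append(f"missing {matrix.name}")
    if not meta.is_file():
        problems.append(f"missing {meta.name}")
    else:
        # Only the header row is read, so large files are never fully loaded.
        with meta.open(newline="") as handle:
            header = next(csv.reader(handle), [])
        missing = REQUIRED_COLUMNS - set(header)
        if missing:
            problems.append(f"{meta.name} lacks columns: {sorted(missing)}")
    return problems


def check_dataset(root_dir: str) -> dict[str, list[str]]:
    """Map each study directory under root_dir to its list of problems."""
    root = Path(root_dir)
    return {d.name: check_study(d) for d in sorted(root.iterdir()) if d.is_dir()}
```

Calling `check_dataset("revive_dataset")` returns an empty list for every study whose files are in place, which makes layout mistakes visible before a long training run starts.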

2. Set Up the Environment

This project uses uv for fast and reliable dependency management.

# Install uv if you haven't already
# pip install uv

# Sync the environment
uv sync

For GPU acceleration (recommended): Install a CUDA-enabled version of PyTorch by following the official instructions at pytorch.org.
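To check whether the installed PyTorch build can actually see a GPU, a small probe like the one below may be useful. It is a sketch only: `pick_device` is a hypothetical helper, and it assumes that `--device` accepts the standard PyTorch device strings `cuda` and `cpu`.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA-enabled PyTorch sees a GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the probe also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints `cpu` on a machine with a GPU, the installed PyTorch wheel is most likely a CPU-only build.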

3. Run the Model

You can now train the model on the full compendium, holding out a specific study for testing. The following command reproduces the main results for the EPIC-Italy cohort (GSE51032).

# Run training, holding out GSE51032 and using a GPU
uv run revive/train \
  --root_dir revive_dataset \
  --outputs_dir outputs \
  --holdout_study GSE51032 \
  --device cuda

Usage Examples

Negative Control (Shuffled Ages)

To verify that the model learns a true chronological signal, you can run the negative control experiment where sample ages are permuted.

uv run revive/train \
  --root_dir revive_dataset \
  --outputs_dir outputs_shuffled \
  --holdout_study GSE51032 \
  --neg_control_shuffled_age --neg_control_seed 1337
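Conceptually, the control breaks the link between each sample and its age while keeping the age distribution intact. The sketch below illustrates that idea with the standard library; it is not the repository's implementation, `shuffle_ages` is a hypothetical name, and the exact permutation produced by the real `--neg_control_seed` may differ.

```python
import random


def shuffle_ages(ages, seed=1337):
    """Return a permuted copy of ages; the input list is left unchanged."""
    rng = random.Random(seed)  # dedicated generator, so global state is untouched
    permuted = list(ages)
    rng.shuffle(permuted)
    return permuted


ages = [23, 35, 41, 58, 64, 72]
shuffled = shuffle_ages(ages)
# The same values remain, but they are no longer aligned with their samples.
print(sorted(shuffled) == sorted(ages))
```

A model trained on permuted ages should lose most of its predictive performance on the holdout study. If it does not, the pipeline is probably leaking information.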

Reproducing All Paper Experiments

The train.sh script is a convenience wrapper to run the training and evaluation pipeline across all studies in a leave-one-out fashion, for both the main model and the shuffled-age control.

# This will generate results for all holdout folds
./train.sh
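The contents of train.sh are not reproduced here, but a leave-one-study-out sweep can be sketched from the flags documented in this README. The helper `lodo_commands` below is hypothetical, and the output directory names simply reuse the ones from the examples above.

```python
from pathlib import Path


def lodo_commands(root_dir, outputs_dir="outputs",
                  shuffled_outputs_dir="outputs_shuffled", seed=1337):
    """Build one main and one shuffled-age command per study directory."""
    studies = sorted(p.name for p in Path(root_dir).iterdir() if p.is_dir())
    commands = []
    for study in studies:
        base = ["uv", "run", "revive/train",
                "--root_dir", str(root_dir),
                "--holdout_study", study]
        # Main model: train on all other studies, test on the held-out one.
        commands.append(base + ["--outputs_dir", outputs_dir])
        # Negative control: same fold, with permuted ages.
        commands.append(base + ["--outputs_dir", shuffled_outputs_dir,
                                "--neg_control_shuffled_age",
                                "--neg_control_seed", str(seed)])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to execute the folds one after another.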

Reference

Key Command-Line Arguments

| Argument | Description | Default |
| --- | --- | --- |
| `--root_dir` | (Required) Path to the dataset directory (e.g., `revive_dataset`). | |
| `--outputs_dir` | Directory to save models, metrics, and results. | `outputs` |
| `--holdout_study` | GSE ID of the study to hold out for testing. Use `""` to train on all data. | `GSE51032` |
| `--device` | Compute device to use. | `cuda` |
| `--latent_dim` | Dimension of the PCA latent space. If `<=0`, auto-selects by explained variance. | `1024` |
| `--delta_years` | List of commanded rejuvenation years (Δa) to evaluate. | `2 5 7 10` |
| `--lambda1_list` | List of L1 sparsity penalties (λ₁) for the ADMM controller. | `0.0 0.001 0.003 0.01` |
| `--ipca_batch`, `--admm_batch` | Batch sizes for memory-efficient PCA and ADMM steps. | `512`, `64` |
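The exact rule behind the automatic `--latent_dim` selection is not documented here. A common approach is to keep the smallest number of leading components whose cumulative explained-variance ratio reaches a target, as in the sketch below. Both the function name `select_latent_dim` and the 0.95 threshold are assumptions made for illustration.

```python
from itertools import accumulate


def select_latent_dim(explained_variance_ratio, threshold=0.95):
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    for k, cumulative in enumerate(accumulate(explained_variance_ratio), start=1):
        if cumulative >= threshold:
            return k
    # Threshold never reached: keep every available component.
    return len(explained_variance_ratio)


# Hypothetical per-component ratios, sorted from largest to smallest.
ratios = [0.6, 0.25, 0.1, 0.05]
print(select_latent_dim(ratios, threshold=0.9))
```

With these example ratios the first two components explain 85% of the variance, so a 0.9 target selects three components.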

Output Directory Structure

Results are organized by the holdout study. For a run with --outputs_dir outputs --holdout_study GSE51032, you will find:

outputs/
└── GSE51032/
    ├── edited_beta_d10_lam10.001.csv # Edited beta-values for Δa=10, λ₁=0.001
    ├── metrics_lodo_summary.csv      # Summary statistics (R², slope) for the holdout
    ├── metrics_lodo_deltas.csv       # Per-sample age predictions before/after edits
    ├── pca_model.npz                 # Saved PCA loadings and mean
    └── ridge_judge_uvec.npz          # Saved Ridge regressor model ("age judge")
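When collecting results across many runs, the Δa and λ₁ values can be recovered from the edited-beta file names. The sketch below assumes the naming pattern `edited_beta_d<Δa>_lam1<λ₁>.csv`, which is inferred from the single example above and may not hold for every setting. `parse_edited_name` is a hypothetical helper.

```python
import re

# Pattern inferred from "edited_beta_d10_lam10.001.csv" (Δa=10, λ₁=0.001).
EDITED_NAME = re.compile(
    r"^edited_beta_d(?P<delta>\d+(?:\.\d+)?)_lam1(?P<lam>\d+(?:\.\d+)?)\.csv$"
)


def parse_edited_name(filename):
    """Return (delta_years, lambda1) as floats, or None if the name does not match."""
    match = EDITED_NAME.match(filename)
    if match is None:
        return None
    return float(match.group("delta")), float(match.group("lam"))


print(parse_edited_name("edited_beta_d10_lam10.001.csv"))  # (10.0, 0.001)
```

Files that do not follow the pattern, such as `pca_model.npz`, return `None` and can be skipped.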

Citation

If you use Revive-Flow in your research, please cite:

Paper:

@article{litman2025reviveflow,
  title   = {{Revive-Flow}: A Foundation Model for Blood {DNAm} Aging},
  author  = {Litman, Elon and others},
  journal = {bioRxiv},
  year    = {2025},
  doi     = {10.1101/2025.09.18.677241}
}

Dataset:

Litman, Elon, et al. (2025). REVIVE-Flow Blood DNAm Aging Compendium [Data set]. Zenodo. https://doi.org/10.5281/zenodo.17118272
