BiostateAI/Revive


Revive-Flow: A Foundation Model for Blood DNAm Aging

Revive learns a continuous vector field of epigenetic aging from a large compendium of blood DNA methylation data. It then uses this learned model to design minimal, targeted CpG interventions for in silico rejuvenation.
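The reference section below mentions an ADMM controller with an L1 sparsity penalty (λ₁) and commanded rejuvenation of Δa years. The sketch below shows how an L1-penalized ADMM loop can yield a sparse CpG edit. It makes a strong simplifying assumption: the age judge is a single linear model, age(x) = w·x + b, with toy weights. The repository's ridge judge, PCA latent space, and any beta-value bounds are not reproduced here, and `admm_sparse_edit` is a hypothetical name. It solves min ½(w·δ + Δa)² + λ₁‖δ‖₁ with the split δ = z.

```python
from typing import List, Sequence


def soft_threshold(v: float, k: float) -> float:
    """Proximal operator of k*|.|: shrink v toward zero by k."""
    if v > k:
        return v - k
    if v < -k:
        return v + k
    return 0.0


def admm_sparse_edit(
    w: Sequence[float], delta_years: float, lambda1: float,
    rho: float = 1.0, n_iter: int = 200,
) -> List[float]:
    """Hypothetical sketch: sparse edit that lowers a LINEAR judge's age by delta_years.

    Minimizes 0.5*(w.d + delta_years)^2 + lambda1*||z||_1 subject to d = z
    (scaled-form ADMM). Beta-value bounds are NOT enforced here.
    """
    n = len(w)
    t = -delta_years                  # desired change in predicted age
    ww = sum(wi * wi for wi in w)     # ||w||^2
    z = [0.0] * n
    u = [0.0] * n                     # scaled dual variable
    for _ in range(n_iter):
        # d-update: solve (w w^T + rho I) d = t w + rho (z - u)
        # in closed form via the Sherman-Morrison identity.
        c = [t * wi + rho * (zi - ui) for wi, zi, ui in zip(w, z, u)]
        wc = sum(wi * ci for wi, ci in zip(w, c))
        d = [(ci - wi * wc / (rho + ww)) / rho for wi, ci in zip(w, c)]
        # z-update: elementwise soft-thresholding promotes sparse edits.
        z = [soft_threshold(di + ui, lambda1 / rho) for di, ui in zip(d, u)]
        # u-update: accumulate the constraint residual d - z.
        u = [ui + di - zi for ui, di, zi in zip(u, d, z)]
    return z


if __name__ == "__main__":
    w = [8.0, -5.0, 0.5, 0.1]  # toy per-CpG judge weights (years per unit beta)
    for lam in (0.0, 0.003, 0.01):
        edit = admm_sparse_edit(w, delta_years=5.0, lambda1=lam)
        shift = sum(wi * ei for wi, ei in zip(w, edit))
        print(lam, [round(e, 4) for e in edit], round(shift, 3))
```

Larger λ₁ trades some of the commanded age shift for fewer edited CpGs, which is the role a `--lambda1_list` sweep would play.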

The entire pipeline is implemented with streaming data loaders and cached intermediate steps, enabling it to run on standard CPU or GPU hardware.
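As an illustration of the streaming idea, the sketch below reads a CpG × sample count matrix in fixed-size row batches, so memory depends on the batch size rather than on the number of CpGs. The CSV layout (header row of sample IDs, CpG ID in the first column) and the missing-value spellings are assumptions; `iter_cpg_batches` and `streaming_sample_means` are hypothetical helpers, not the repository's loaders.

```python
import csv
import math
from typing import Iterable, Iterator, List, Tuple

MISSING = {"", "NA", "NaN", "nan"}  # assumed spellings of missing values


def iter_cpg_batches(
    lines: Iterable[str], batch_size: int = 512
) -> Iterator[Tuple[List[str], List[List[float]]]]:
    """Yield (cpg_ids, beta_rows) batches from a CpG x sample CSV.

    Assumed layout: a header row of sample IDs, then one row per CpG
    (CpG ID first, then one beta value per sample).
    """
    reader = csv.reader(lines)
    next(reader)  # skip the header row of sample IDs
    ids: List[str] = []
    rows: List[List[float]] = []
    for record in reader:
        ids.append(record[0])
        rows.append([math.nan if v in MISSING else float(v) for v in record[1:]])
        if len(rows) == batch_size:
            yield ids, rows
            ids, rows = [], []
    if rows:  # flush the final, possibly smaller, batch
        yield ids, rows


def streaming_sample_means(lines: Iterable[str], batch_size: int = 512) -> List[float]:
    """Per-sample mean beta computed one batch at a time (NaNs skipped)."""
    sums: List[float] = []
    counts: List[int] = []
    for _, rows in iter_cpg_batches(lines, batch_size):
        for row in rows:
            if not sums:
                sums, counts = [0.0] * len(row), [0] * len(row)
            for j, v in enumerate(row):
                if not math.isnan(v):
                    sums[j] += v
                    counts[j] += 1
    return [s / c if c else math.nan for s, c in zip(sums, counts)]
```

Usage: `with open("revive_dataset/GSE51032/GSE51032_count_matrix.csv", newline="") as fh: means = streaming_sample_means(fh)`.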

Figure: Revive-Flow Pipeline

Getting Started

Follow these three steps to download the data, set up the environment, and run the model.

1. Download the Data

The complete blood DNAm compendium is available on Zenodo (https://doi.org/10.5281/zenodo.17118272). Download and extract it into your project directory.

The expected directory structure is:

revive-flow/
├── revive_dataset/
│   ├── GSE51032/
│   │   ├── GSE51032_count_matrix.csv
│   │   └── GSE51032_meta.csv
│   ├── GSE61496/
│   │   ├── GSE61496_count_matrix.csv
│   │   └── GSE61496_meta.csv
│   └── ... (other studies)
└── ... (source code)

Each study folder must contain a <GSE_ID>_count_matrix.csv file (CpGs × samples: one row per CpG, one column per sample) and a <GSE_ID>_meta.csv file with at least Age and Gender columns.
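A quick pre-flight check of the layout above can save a failed run. The sketch below scans the dataset root, keeps study folders that contain both expected files, and reports missing metadata columns. The helper names are hypothetical; only the file-naming pattern and the Age/Gender requirement come from this README.

```python
import csv
from pathlib import Path
from typing import Dict, List, Tuple

REQUIRED_META_COLUMNS = ("Age", "Gender")


def find_studies(root_dir: str) -> Dict[str, Tuple[Path, Path]]:
    """Map each study ID to its (count_matrix, meta) CSV paths.

    Folders missing either file are skipped.
    """
    studies: Dict[str, Tuple[Path, Path]] = {}
    for study_dir in sorted(Path(root_dir).iterdir()):
        if not study_dir.is_dir():
            continue
        gse = study_dir.name
        counts = study_dir / f"{gse}_count_matrix.csv"
        meta = study_dir / f"{gse}_meta.csv"
        if counts.is_file() and meta.is_file():
            studies[gse] = (counts, meta)
    return studies


def missing_meta_columns(meta_path: Path) -> List[str]:
    """Return the required columns absent from the meta CSV header."""
    with open(meta_path, newline="") as fh:
        header = next(csv.reader(fh), [])
    return [col for col in REQUIRED_META_COLUMNS if col not in header]
```

For example, iterate over `find_studies("revive_dataset").items()` and stop if `missing_meta_columns(meta)` returns a non-empty list for any study.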

2. Set Up the Environment

This project uses uv for fast and reliable dependency management.

# Install uv if you haven't already
# pip install uv

# Sync the environment
uv sync

For GPU acceleration (recommended), install a CUDA-enabled build of PyTorch by following the official instructions at pytorch.org.
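Because `--device` defaults to `cuda`, it can help to confirm that PyTorch actually sees a GPU before launching a run. This hypothetical helper falls back to `cpu` when PyTorch is missing or CUDA is unavailable; it uses only `torch.cuda.is_available()`.

```python
def pick_device(preferred: str = "cuda") -> str:
    """Hypothetical helper: return "cuda" if requested and available, else "cpu"."""
    try:
        import torch  # imported lazily so the check works without PyTorch
    except ImportError:
        return "cpu"
    if preferred == "cuda" and torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    # Print the flag value to pass to revive/train.
    print(f"--device {pick_device()}")
```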

3. Run the Model

You can now train the model on the compendium while holding out one study for testing. The following command reproduces the main results for the EPIC-Italy cohort (GSE51032).

# Run training, holding out GSE51032 and using a GPU
uv run revive/train \
  --root_dir revive_dataset \
  --outputs_dir outputs \
  --holdout_study GSE51032 \
  --device cuda
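The holdout semantics described for `--holdout_study` (hold one study out for testing, or pass "" to train on all data) can be sketched as a simple split over study IDs. `split_studies` is a hypothetical illustration, not the trainer's code.

```python
from typing import List, Sequence, Tuple


def split_studies(
    study_ids: Sequence[str], holdout_study: str
) -> Tuple[List[str], List[str]]:
    """Hypothetical leave-one-study-out split: (train_ids, test_ids)."""
    if holdout_study == "":
        return list(study_ids), []  # "" means train on all data
    if holdout_study not in study_ids:
        raise ValueError(f"unknown holdout study: {holdout_study}")
    train = [s for s in study_ids if s != holdout_study]
    return train, [holdout_study]
```

Holding out a whole study, rather than random samples, tests whether the model transfers to a cohort and array batch it has never seen.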

Usage Examples

Negative Control (Shuffled Ages)

To verify that the model learns a true chronological signal, you can run the negative control experiment where sample ages are permuted.

uv run revive/train \
  --root_dir revive_dataset \
  --outputs_dir outputs_shuffled \
  --holdout_study GSE51032 \
  --neg_control_shuffled_age --neg_control_seed 1337
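Conceptually, the control keeps the same set of age labels but breaks their link to each sample's methylation profile. A minimal sketch with a seeded permutation follows; whether the repository shuffles globally or within each study is not stated here, so `shuffle_ages` is a hypothetical global version.

```python
import random
from typing import List, Sequence


def shuffle_ages(ages: Sequence[float], seed: int = 1337) -> List[float]:
    """Hypothetical: return a seeded permutation of ages (input left unchanged)."""
    permuted = list(ages)
    random.Random(seed).shuffle(permuted)  # local RNG, so global state is untouched
    return permuted
```

If the model still "predicts" age well on held-out data after this permutation, the signal would be suspect; a large drop in performance is the expected outcome.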

Reproducing All Paper Experiments

The train.sh script is a convenience wrapper to run the training and evaluation pipeline across all studies in a leave-one-out fashion, for both the main model and the shuffled-age control.

# This will generate results for all holdout folds
./train.sh
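The loop that train.sh performs can be approximated by generating one main run and one shuffled-age run per study, using only the flags shown above. This is an assumption about what the wrapper iterates over, not its actual contents; `loo_commands` is hypothetical.

```python
import shlex
from typing import List, Sequence


def loo_commands(study_ids: Sequence[str], root_dir: str = "revive_dataset") -> List[List[str]]:
    """Hypothetical: build main + shuffled-age commands for every holdout fold."""
    cmds: List[List[str]] = []
    for gse in study_ids:
        base = ["uv", "run", "revive/train", "--root_dir", root_dir, "--holdout_study", gse]
        cmds.append(base + ["--outputs_dir", "outputs"])
        cmds.append(base + [
            "--outputs_dir", "outputs_shuffled",
            "--neg_control_shuffled_age", "--neg_control_seed", "1337",
        ])
    return cmds


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run(cmd, check=True) to execute.
    for cmd in loo_commands(["GSE51032", "GSE61496"]):
        print(shlex.join(cmd))
```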

Reference

Key Command-Line Arguments

| Argument | Description | Default |
|---|---|---|
| --root_dir | (Required) Path to the dataset directory (e.g., revive_dataset). | none |
| --outputs_dir | Directory to save models, metrics, and results. | outputs |
| --holdout_study | GSE ID of the study to hold out for testing. Use "" to train on all data. | GSE51032 |
| --device | Compute device to use. | cuda |
| --latent_dim | Dimension of the PCA latent space. If ≤ 0, it is selected automatically by explained variance. | 1024 |
| --delta_years | List of commanded rejuvenation years (Δa) to evaluate. | 2 5 7 10 |
| --lambda1_list | List of L1 sparsity penalties (λ₁) for the ADMM controller. | 0.0 0.001 0.003 0.01 |
| --ipca_batch | Batch size for the memory-efficient PCA step. | 512 |
| --admm_batch | Batch size for the ADMM step. | 64 |
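One common rule for choosing a latent size "by explained variance" (the `--latent_dim ≤ 0` case) is to keep the smallest number of components whose cumulative explained-variance ratio reaches a target. The 0.95 target and the function name are assumptions; if the PCA step resembles scikit-learn's `IncrementalPCA`, the input would be its `explained_variance_ratio_` attribute.

```python
from typing import Sequence


def auto_latent_dim(explained_variance_ratio: Sequence[float], target: float = 0.95) -> int:
    """Hypothetical: smallest k whose top-k components explain >= target variance.

    Assumes ratios are sorted in descending order, as PCA returns them.
    """
    total = 0.0
    for k, ratio in enumerate(explained_variance_ratio, start=1):
        total += ratio
        if total >= target:
            return k
    return len(explained_variance_ratio)  # target unreachable: keep all components
```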

Output Directory Structure

Results are organized by the holdout study. For a run with --outputs_dir outputs --holdout_study GSE51032, you will find:

outputs/
└── GSE51032/
    ├── edited_beta_d10_lam10.001.csv # Edited beta-values for Δa=10, λ₁=0.001
    ├── metrics_lodo_summary.csv      # Summary statistics (R², slope) for the holdout
    ├── metrics_lodo_deltas.csv       # Per-sample age predictions before/after edits
    ├── pca_model.npz                 # Saved PCA loadings and mean
    └── ridge_judge_uvec.npz          # Saved Ridge regressor model ("age judge")
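The summary file reports R² and slope for the holdout. The exact definitions are not spelled out here. The sketch below assumes the slope is the least-squares fit of predicted on chronological age and that R² is the squared Pearson correlation of that fit; the repository may instead use 1 − SS_res/SS_tot against the identity line. `slope_and_r2` is hypothetical.

```python
from typing import Sequence, Tuple


def slope_and_r2(true_ages: Sequence[float], pred_ages: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical: OLS slope of pred ~ true and the fit's R^2 (squared Pearson r)."""
    n = len(true_ages)
    mx = sum(true_ages) / n
    my = sum(pred_ages) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(true_ages, pred_ages))
    sxx = sum((x - mx) ** 2 for x in true_ages)
    syy = sum((y - my) ** 2 for y in pred_ages)
    if sxx == 0 or syy == 0:
        raise ValueError("ages must vary to compute slope and R^2")
    return sxy / sxx, sxy * sxy / (sxx * syy)
```

A slope near 1 with high R² means predictions track chronological age without systematic compression toward the mean.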

Citation

If you use Revive-Flow in your research, please cite:

Paper:

@article{litman2025reviveflow,
  title   = {{Revive-Flow}: A Foundation Model for Blood {DNAm} Aging},
  author  = {Litman, Elon and others},
  journal = {bioRxiv},
  year    = {2025},
  doi     = {10.1101/2025.09.18.677241}
}

Dataset:

Litman, Elon, et al. (2025). REVIVE-Flow Blood DNAm Aging Compendium [Data set]. Zenodo. https://doi.org/10.5281/zenodo.17118272
