A continuous state-space diffusion model for sparse data.

PhilSid/sparse-data-diffusion

Sparse Data Diffusion

Sparse Data Diffusion (SDD) is a sparsity-aware diffusion model that explicitly models exact zeros by jointly diffusing discrete Sparsity Bits with continuous variables, enabling more realistic generation of inherently sparse data.

This work also won the Best Paper Award at the SimBioChem workshop at EurIPS.

This repo provides code for training DDPM, DDIM, and SDD with DDPM and DDIM sampling.
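
To make the idea of Sparsity Bits concrete, here is a minimal sketch of how a sparse vector could be paired with one bit per entry and how exact zeros could be restored after generation. It is an illustration only, not the repository's implementation: the function names, the +1/-1 bit encoding, and the threshold at 0 are all assumptions.

```python
# Illustrative sketch only; names, the +1/-1 encoding, and the threshold
# are assumptions, not the repository's actual code.

def encode_with_sparsity_bits(values):
    """Pair each value with a bit: +1.0 if nonzero, -1.0 if exactly zero."""
    return [(v, 1.0 if v != 0 else -1.0) for v in values]

def decode_with_sparsity_bits(pairs):
    """Force a value to an exact zero wherever its bit is not positive."""
    return [v if b > 0 else 0.0 for v, b in pairs]

x = [0.0, 2.5, 0.0, 0.7]

# A clean round trip restores the original vector.
restored = decode_with_sparsity_bits(encode_with_sparsity_bits(x))

# A generated sample is continuous in both channels: values that should be
# zero come out slightly off, and the bits are only approximately +1/-1.
noisy = [(0.05, -0.8), (2.4, 0.7), (-0.03, -1.1), (0.72, 1.4)]
x_hat = decode_with_sparsity_bits(noisy)  # zeros are exact again
```

Without the bit channel, the first and third entries of the noisy sample would stay at small nonzero values; thresholding the bits is what turns them back into exact zeros.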


Installation

1. Create a virtual environment

# Python 3.10 recommended
conda create -n sdd python=3.10 -y
conda activate sdd

2. Install dependencies

pip install -r requirements.txt

Training

For training, call main.py. Set --input_mode according to your data type; "calo_image", "scrna", and "image" are currently supported. --channels and --image_size must match the input (for scrna, use 1 channel and set --image_size to the number of genes). --init_dim and --dim_mults change the architecture. If you set the flag --with_sparsity_bits, SDD will be trained; otherwise, DDPM/DDIM.

python main.py --input_mode calo_image --channels 1 --image_size 32 --init_dim 256 --dim_mults 1 1 1 --output_path /your/path/to/output --data_path /path/to/calo_images --with_sparsity_bits
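
When launching several runs (for example an SDD run and a DDPM/DDIM baseline on the same data), it can help to build the command programmatically. The sketch below uses only the flags documented above. The helper name `build_train_command`, the gene count of 2000, and the data path are hypothetical placeholders, and the architecture values are simply copied from the calorimeter example rather than tuned for scRNA.

```python
import shlex

def build_train_command(input_mode, channels, image_size, init_dim, dim_mults,
                        output_path, data_path, with_sparsity_bits=True):
    """Assemble the main.py call as an argument list (hypothetical helper)."""
    cmd = [
        "python", "main.py",
        "--input_mode", input_mode,
        "--channels", str(channels),
        "--image_size", str(image_size),
        "--init_dim", str(init_dim),
        "--dim_mults", *[str(m) for m in dim_mults],
        "--output_path", output_path,
        "--data_path", data_path,
    ]
    if with_sparsity_bits:
        cmd.append("--with_sparsity_bits")  # train SDD instead of DDPM/DDIM
    return cmd

# scRNA: 1 channel, image_size = number of genes (2000 is a placeholder).
scrna_cmd = build_train_command(
    "scrna", 1, 2000, 256, [1, 1, 1],
    "/your/path/to/output", "/path/to/scrna_data",
)
# Same setup without the flag trains the DDPM/DDIM baseline.
baseline_cmd = build_train_command(
    "scrna", 1, 2000, 256, [1, 1, 1],
    "/your/path/to/output", "/path/to/scrna_data",
    with_sparsity_bits=False,
)

print(shlex.join(scrna_cmd))
print(shlex.join(baseline_cmd))
```

The resulting lists can be passed directly to `subprocess.run` from the repository root.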

Sampling

For sampling, call sample_main.py with the corresponding --args_file. Select the --milestones value, set --num_samples, and choose between DDIM and DDPM sampling with --use_ddim.

python sample_main.py --args_file /path/to/args.json --milestones 1 --num_samples 100 --use_ddim True --sampling_steps 1000 --time_difference 0.0

Evaluation

Calorimeter images

Call calculate_physics_metrics.py:

python calculate_physics_metrics.py --dataset_path /path/to/dataset --generated_path /path/to/generated --num_samples 10000

scRNA

Call calculate_scrna_metrics.py:

python calculate_scrna_metrics.py --dataset_path /path/to/dataset --generated_path /path/to/generated

Logging

By default, everything is logged to wandb. Make sure to log in with your API key.
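
If no API key is available (for example on a cluster node without internet access), one option is to fall back to wandb's offline mode through its standard environment variables. The sketch below assumes the training script does not override the wandb mode itself. `WANDB_API_KEY` and `WANDB_MODE` are wandb's own variables, while the helper name `configure_wandb_env` is hypothetical.

```python
import os

def configure_wandb_env(env):
    """Switch to offline logging when no API key is set (hypothetical helper)."""
    if not env.get("WANDB_API_KEY"):
        # Respect an explicit WANDB_MODE if the user already chose one.
        env.setdefault("WANDB_MODE", "offline")
    return env

# Work on a copy so the current process environment stays untouched; the
# result can be passed to subprocess.run(..., env=env) when launching main.py.
env = configure_wandb_env(dict(os.environ))
```

Runs recorded offline can be uploaded later with the `wandb sync` command.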

How to Cite

When using SDD in your research, please cite the relevant work:

@inproceedings{
    ostheimer2025sparse,
    title={Sparse Data Diffusion for Scientific Simulations in Biology and Physics},
    author={Phil Ostheimer and Mayank Nagda and Andriy Balinskyy and Jean Radig and Carl Herrmann and Stephan Mandt and Marius Kloft and Sophie Fellenz},
    booktitle={EurIPS 2025 Workshop on SIMBIOCHEM},
    year={2025},
    url={https://openreview.net/forum?id=O3OVn3NSRE}
}

License

This project is MIT licensed. See LICENSE.
