Sparse Data Diffusion (SDD) is a sparsity-aware diffusion model that explicitly models exact zeros by jointly diffusing discrete Sparsity Bits with continuous variables, enabling more realistic generation of inherently sparse data.
This work also won the Best Paper Award at the SimBioChem workshop at EurIPS.
This repo provides code for training SDD and standard DDPM/DDIM models, with both DDPM and DDIM sampling.
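To make the idea of Sparsity Bits concrete, here is a minimal NumPy sketch. It is not the code from this repo: the helper names `to_sparsity_bits` and `apply_sparsity_bits`, the 0/1 encoding, and the 0.5 threshold are all assumptions chosen for illustration. It shows how a binary mask can be derived from sparse data and how a generated, continuous-valued mask could be used to restore exact zeros.

```python
import numpy as np


def to_sparsity_bits(x):
    """Return 1.0 where x is non-zero and 0.0 where x is exactly zero."""
    return (x != 0).astype(np.float32)


def apply_sparsity_bits(values, bits, threshold=0.5):
    """Keep a value where its bit is at or above the threshold, else emit an exact zero.

    The threshold of 0.5 is an assumption for this sketch.
    """
    return np.where(bits >= threshold, values, 0.0)


# A sparse training example and the bits derived from it.
x = np.array([0.0, 1.5, 0.0, 0.2], dtype=np.float32)
bits = to_sparsity_bits(x)

# Hand-written stand-ins for model outputs: dense values plus noisy bit estimates.
generated_values = np.array([0.03, 1.4, -0.01, 0.25], dtype=np.float32)
generated_bits = np.array([0.1, 0.9, 0.2, 0.8], dtype=np.float32)

# Entries with a low bit estimate become exact zeros instead of small noise.
sample = apply_sparsity_bits(generated_values, generated_bits)
```

Without the mask, the small values 0.03 and -0.01 would remain in the output, so the sample would be dense rather than sparse.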
# Python 3.10 recommended
conda create -n sdd python=3.10 -y
conda activate sdd
pip install -r requirements.txt
For training, call main.py. Set --input_mode according to your data type; "calo_image", "scrna", and "image" are currently supported. --channels and --image_size must match the input: for scrna, use 1 channel and set --image_size to the number of genes. --init_dim and --dim_mults change the architecture. If you set the flag --with_sparsity_bits, SDD is trained; otherwise, DDPM/DDIM is trained.
python main.py --input_mode calo_image --channels 1 --image_size 32 --init_dim 256 --dim_mults 1 1 1 --output_path /your/path/to/output --data_path /path/to/calo_images --with_sparsity_bits
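Since --channels and --image_size have to match the input, it can help to read them off the data before launching a run. The sketch below is a hypothetical helper, not part of this repo. It assumes scRNA data is stored as an array of shape (num_cells, num_genes) and image-like data as (num_samples, channels, height, width) with square images; adjust it if your files use a different layout.

```python
import numpy as np


def suggest_shape_flags(data, input_mode):
    """Suggest values for --channels and --image_size from an in-memory array.

    The array layouts checked here are assumptions, not the repo's loader.
    """
    if input_mode == "scrna":
        # Assumed layout: (num_cells, num_genes).
        if data.ndim != 2:
            raise ValueError("expected a 2D array of shape (num_cells, num_genes)")
        return {"channels": 1, "image_size": int(data.shape[1])}
    if input_mode in ("image", "calo_image"):
        # Assumed layout: (num_samples, channels, height, width), square images.
        if data.ndim != 4 or data.shape[2] != data.shape[3]:
            raise ValueError("expected a 4D array with square images")
        return {"channels": int(data.shape[1]), "image_size": int(data.shape[2])}
    raise ValueError(f"unsupported input_mode: {input_mode}")


# Placeholder arrays standing in for real datasets.
scrna_flags = suggest_shape_flags(np.zeros((5, 2000)), "scrna")
calo_flags = suggest_shape_flags(np.zeros((5, 1, 32, 32)), "calo_image")
```

Under these assumptions, the calorimeter placeholder yields 1 channel and an image size of 32, which matches the values in the training command above.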
For sampling, call sample_main.py with the corresponding --args_file. Select the milestone with --milestones, set the number of samples with --num_samples, and choose between DDIM and DDPM sampling with --use_ddim.
python sample_main.py --args_file /path/to/args.json --milestones 1 --num_samples 100 --use_ddim True --sampling_steps 1000 --time_difference 0.0
Call calculate_physics_metrics.py:
python calculate_physics_metrics.py --dataset_path /path/to/dataset --generated_path /path/to/generated --num_samples 10000
Call calculate_scrna_metrics.py:
python calculate_scrna_metrics.py --dataset_path /path/to/dataset --generated_path /path/to/generated
By default, everything is logged to wandb. Make sure to log in with your API key first, for example by running wandb login.
When using SDD in your research, please cite the relevant work:
@inproceedings{
ostheimer2025sparse,
title={Sparse Data Diffusion for Scientific Simulations in Biology and Physics},
author={Phil Ostheimer and Mayank Nagda and Andriy Balinskyy and Jean Radig and Carl Herrmann and Stephan Mandt and Marius Kloft and Sophie Fellenz},
booktitle={EurIPS 2025 Workshop on SIMBIOCHEM},
year={2025},
url={https://openreview.net/forum?id=O3OVn3NSRE}
}
This project is MIT licensed. See LICENSE.