GitHub - LiuhanChen-github/VDiS: use Mamba for video generation

Scalable Diffusion Models with State Space Backbone (DiS)
Official PyTorch Implementation

This repo contains PyTorch model definitions, pre-trained weights, and training/sampling code for our paper exploring diffusion models with state space backbones (DiS). Our model treats all inputs, including the time, condition, and noisy image patches, as tokens and employs skip connections between shallow and deep layers. Unlike the original Mamba, which models text sequences, our SSM block processes the hidden-state sequence in both forward and backward directions.
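
As a rough illustration of the bidirectional processing described above, the following sketch runs a toy linear recurrence over a token sequence in both directions and sums the two results per position. The recurrence `h_t = a * h_{t-1} + b * x_t`, the scalar parameters `a` and `b`, the summation, and the function names are all assumptions chosen for readability; the actual DiS block relies on Mamba's selective scan over vector-valued tokens, which this sketch does not reproduce.

```python
from typing import List


def linear_scan(xs: List[float], a: float, b: float) -> List[float]:
    """Toy state-space recurrence: h_t = a * h_{t-1} + b * x_t, starting from h = 0."""
    h = 0.0
    out = []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out


def bidirectional_scan(xs: List[float], a: float = 0.5, b: float = 1.0) -> List[float]:
    """Scan left-to-right and right-to-left, then sum the two outputs per position."""
    forward = linear_scan(xs, a, b)
    # Reverse the input, scan it, then reverse the output so positions line up again.
    backward = linear_scan(xs[::-1], a, b)[::-1]
    return [f + g for f, g in zip(forward, backward)]


if __name__ == "__main__":
    tokens = [1.0, 0.0, 0.0, 2.0]  # hypothetical scalar "tokens"
    print(bidirectional_scan(tokens))
```

With a forward-only scan, the first position would never see the last token; in the bidirectional version every position receives information from both ends of the sequence.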

🪐 A PyTorch implementation of DiS
⚡️ Pre-trained checkpoints from the paper
💥 A sampling script for running pre-trained DiS
🛸 A DiS training script using PyTorch DDP

1. Environments

Python 3.10
- conda create -n your_env_name python=3.10
Requirements file
- pip install -r requirements.txt
Install causal_conv1d and mamba
- pip install -e causal_conv1d
- pip install -e mamba
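
After installation, a quick import check can confirm that the environment is usable. The sketch below is a hypothetical helper, not part of this repo; it assumes the packages expose the import names `torch`, `causal_conv1d`, and `mamba_ssm`, which are the names used by the upstream projects and may differ for the copies vendored here.

```python
import importlib.util
import sys


def check_environment(modules=("torch", "causal_conv1d", "mamba_ssm")):
    """Return {module_name: bool}, True when the module can be located for import."""
    # find_spec only locates a top-level module; it does not import it.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:2])))
    for name, found in check_environment().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```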

2. Training

We provide a training script for DiS in train.py. This script can be used to train unconditional and class-conditional DiS models, and it can be easily modified to support other types of conditioning.

To launch latent-space training of DiS-H/2 (512x512) on one node (the example uses 8 GPUs; set --nproc_per_node to your GPU count N):

torchrun --nnodes=1 --nproc_per_node=8 train.py \
--model DiS-H/2 \
--dataset-type imagenet \
--data-path imageNet1k \
--image-size 64 \
--task-type class-cond \
--num-classes 999

To launch pixel-space training of DiS-S/2 (32x32) on one node (the example uses 8 GPUs; set --nproc_per_node to your GPU count N):

torchrun --nnodes=1 --nproc_per_node=8 train.py \
--model DiS-S/2 \
--data-path cifar10_data \
--dataset-type cifar-10 \
--image-size 32 \
--task-type uncond
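
To get a feel for the sequence lengths these two configurations imply, the sketch below counts tokens for a given `--image-size`. It assumes that the `/2` suffix in the model name denotes a patch size of 2 (the DiT naming convention) and that one extra token is used for the time step and one for the class condition; both are assumptions about this codebase, and the helper names are hypothetical.

```python
def num_patch_tokens(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square input."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side * side


def sequence_length(image_size: int, patch_size: int, extra_tokens: int) -> int:
    """Patch tokens plus extra tokens (assumed: time, and optionally condition)."""
    return num_patch_tokens(image_size, patch_size) + extra_tokens


if __name__ == "__main__":
    # --image-size 64, class-cond: assumed time + class tokens
    print(sequence_length(64, 2, extra_tokens=2))
    # --image-size 32, uncond: assumed time token only
    print(sequence_length(32, 2, extra_tokens=1))
```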

There are several additional options; see train.py for details. The training scripts for all experiments in our work can be found in the script directory.

For convenience, the pre-trained DiS models can be downloaded directly here as well:

DiS Model	Image Resolution	FID-50K
DiS-H/2	256x256	2.10
DiS-H/2	512x512	2.88

3. Evaluation

We include a sample.py script which samples images from a DiS model. In addition, the test.py script supports evaluation with other metrics.

python sample.py \
--model DiS-L/2 \
--dataset-type imagenet \
--ckpt /path/to/model \
--image-size 256 \
--num-classes 1000 \
--cfg-scale 1.5
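
The `--cfg-scale` flag suggests classifier-free guidance. A common formulation, used for example in DiT, extrapolates from the unconditional noise prediction toward the conditional one. The sketch below shows that formula on plain Python lists; whether sample.py applies exactly this rule is an assumption, and `apply_cfg` is a hypothetical name.

```python
from typing import List


def apply_cfg(eps_cond: List[float], eps_uncond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance: eps = eps_uncond + scale * (eps_cond - eps_uncond)."""
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


if __name__ == "__main__":
    cond = [1.0, 2.0]    # hypothetical conditional noise prediction
    uncond = [0.0, 0.0]  # hypothetical unconditional noise prediction
    print(apply_cfg(cond, uncond, scale=1.5))
```

Under this formulation a scale of 1.0 returns the conditional prediction unchanged, and larger values push the sample further toward the condition.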

4. BibTeX

@article{FeiDiS2024,
  title={Scalable Diffusion Models with State Space Backbone},
  author={Zhengcong Fei and Mingyuan Fan and Changqian Yu and Junshi Huang},
  year={2024},
  journal={arXiv preprint},
}

5. Acknowledgments

The codebase is based on the awesome DiT, U-ViT, and Vim repos.
