Min-Max-Imagenet DiT

In the spirit of Keller Jordan's fastest CIFAR-10 training, I want to be the fastest diffusion trainer in the east. I'll track progress here. Currently very much a WIP.

Featuring:

DeepSpeed training of a Diffusion Transformer (DiT). Supports ZeRO stages 1, 2, and 3.
CPU-offloaded, skipped-EMA trick for Karras' post-hoc EMA analysis: the EMA is updated once every N steps instead of every step. You have to adjust beta_1 and beta_2 so that they properly account for the N-1 skipped steps. The saving code is included, of course.
Streaming Dataset support, specifically my quantized imagenet.int8, for insanely lightweight ImageNet training.
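
To make the beta adjustment concrete, here is a minimal sketch and not the code from this repo. It assumes a plain constant-decay EMA, where updating once every N steps with decay beta ** N shrinks the old average by the same factor as N per-step updates with decay beta. The names `skipped_beta` and `SkippedEMA` are hypothetical, and parameters are plain floats to keep the example dependency-free. Karras' post-hoc EMA uses a step-dependent decay, so the real adjustment may differ.

```python
def skipped_beta(beta: float, every_n: int) -> float:
    """Decay to apply when the EMA is updated only once every `every_n` steps.

    Assumption: constant per-step decay `beta`. N per-step updates multiply
    the old average by beta ** N, so one skipped update uses that factor.
    """
    return beta ** every_n


class SkippedEMA:
    """Hypothetical helper: keeps an EMA copy of parameters, updated every N steps."""

    def __init__(self, params: dict, beta: float, every_n: int):
        self.shadow = dict(params)  # EMA copy (in real training this would live on CPU)
        self.every_n = every_n
        self.beta_eff = skipped_beta(beta, every_n)

    def maybe_update(self, step: int, params: dict) -> bool:
        """Update the EMA only on steps divisible by `every_n`; return True if updated."""
        if step % self.every_n != 0:
            return False
        for name, value in params.items():
            self.shadow[name] = (
                self.beta_eff * self.shadow[name] + (1.0 - self.beta_eff) * value
            )
        return True


# Usage: EMA with per-step decay 0.9, updated every 2 steps.
ema = SkippedEMA({"w": 0.0}, beta=0.9, every_n=2)
ema.maybe_update(1, {"w": 1.0})  # skipped
ema.maybe_update(2, {"w": 1.0})  # applied with decay 0.9 ** 2
```

Skipping this way cuts the number of device-to-CPU copies by a factor of N, at the cost of the EMA seeing only every N-th set of weights.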

Dataset

Since this dataset is so small, you don't need to set up any massive remote data infrastructure: just point local_dir at your local copy and set remote_dir to None.

Running

For a single-node setup, just run:

run.sh

What's the goal here?

My goal is to reach an FID score of 30 in under 20 hours of training. I'll keep updating this README as I make progress.
