VAR: a new visual generation method elevates GPT-style models beyond diffusion🚀 & Scaling laws observed📈
This is the official PyTorch implementation of Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction.
We provide a demo website for you to play with VAR models and generate images interactively. Enjoy the fun of visual autoregressive modeling!
We also provide demo_sample.ipynb for you to see more technical details about VAR.
Visual Autoregressive Modeling (VAR) redefines the autoregressive learning on images as coarse-to-fine "next-scale prediction" or "next-resolution prediction", diverging from the standard raster-scan "next-token prediction".
For a deep dive into our analyses, discussions, and evaluations, check out our paper.
We provide VAR models for you to play with, which are on or can be downloaded from the following links:
model | reso. | FID | rel. cost | #params | HF weights🤗 |
---|---|---|---|---|---|
VAR-d16 | 256 | 3.55 | 0.4 | 310M | var_d16.pth |
VAR-d20 | 256 | 2.95 | 0.5 | 600M | var_d20.pth |
VAR-d24 | 256 | 2.33 | 0.6 | 1.0B | var_d24.pth |
VAR-d30 | 256 | 1.97 | 1 | 2.0B | var_d30.pth |
VAR-d30-re | 256 | 1.80 | 1 | 2.0B | var_d30.pth |
You can load these models to generate images via the codes in demo_sample.ipynb. Note: you need to download vae_ch160v4096z32.pth first.
This project is licensed under the MIT License - see the LICENSE file for details.
If our work assists your research, feel free to give us a star ⭐ or cite us using:
@Article{VAR,
title={Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction},
author={Keyu Tian and Yi Jiang and Zehuan Yuan and Bingyue Peng and Liwei Wang},
year={2024},
eprint={2404.02905},
archivePrefix={arXiv},
primaryClass={cs.CV}
}