WikiArt-Latent-Diffusion

A conditional denoising diffusion probabilistic model trained in latent space to generate paintings in the style of famous artists. The figure below shows an animation of the latent diffusion process.

Fig. 1. The animation of the latent diffusion process.

Generalization to Different Sizes

The model generalizes to different image sizes, as the generated examples below show.

Fig. 2. Generated painting in the style of Ivan Aivazovsky.

Fig. 3. Generated painting in the style of Ivan Aivazovsky.

Fig. 4. Generated painting in the style of Ivan Aivazovsky.

Fig. 5. Generated painting in the style of Martiros Saryan.

Fig. 6. Generated painting in the style of Camille Pissarro.

Fig. 7. Generated painting in the style of Pyotr Konchalovsky.

Fig. 8. Generated painting in the style of Pierre Auguste Renoir.

Repository structure:

config.py defines the model hyperparameters.
dataset.py contains the dataset class.
generate_features.py contains functions for preparing the dataset.
model.py contains the implementation of the latent UNet model.
pipeline.py implements the latent diffusion pipeline.
train.py trains the LatentUNet model on a single GPU.
evaluate.py evaluates a trained pipeline.
inference_example.ipynb is a notebook with inference examples for the developed pipeline.

Dataset

We used the WikiArt dataset, which contains 81444 pieces of visual art by various artists. All images were cropped and resized to 512x512 resolution. To convert the images into latent representations, we apply the pretrained VQ-VAE from the Stable Diffusion model implemented by StabilityAI.
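
The preprocessing arithmetic above can be sketched in a few lines. This is an illustration only, not the code from generate_features.py: the helper names are hypothetical, the crop is assumed to be a center crop, and the downsampling factor of 8 and 4 latent channels are the usual values for the Stable Diffusion autoencoder, assumed here rather than taken from this repository.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def latent_shape(image_size=512, downsample_factor=8, latent_channels=4):
    """Return (channels, height, width) of the latent for a square image.

    downsample_factor and latent_channels are assumptions (typical
    Stable Diffusion autoencoder values), not read from this repository.
    """
    if image_size % downsample_factor != 0:
        raise ValueError("image_size must be divisible by downsample_factor")
    side = image_size // downsample_factor
    return latent_channels, side, side


print(center_crop_box(800, 600))  # (100, 0, 700, 600)
print(latent_shape(512))          # (4, 64, 64)
```

Under these assumptions, a 512x512 image becomes a 4x64x64 latent, so the diffusion model works on 64 times fewer spatial positions than it would in pixel space.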

Diffusion Model

We adapted the 2D UNet model from the Hugging Face diffusers package by adding three embedding layers that control the painting style: artist name, genre name and style name. Each type of style embedding is passed through a PreNet module before it is added to the time embedding.
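
The conditioning scheme can be illustrated with a framework-free sketch. It is not the implementation from model.py: the PreNet is assumed here to be a single linear layer followed by ReLU, the embedding size and class counts are made up, and all function names are hypothetical. A real implementation would use learned PyTorch layers instead of random lists.

```python
import random

EMBED_DIM = 8  # hypothetical embedding size, chosen only for the example


def make_table(num_classes, dim, rng):
    """Embedding table: one random vector per class id."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_classes)]


def make_prenet(dim, rng):
    """Weights and bias of one linear layer (assumed PreNet structure)."""
    weights = [[rng.gauss(0.0, dim ** -0.5) for _ in range(dim)] for _ in range(dim)]
    bias = [0.0] * dim
    return weights, bias


def prenet_forward(prenet, vector):
    """Apply the linear layer, then ReLU."""
    weights, bias = prenet
    return [max(0.0, sum(w * x for w, x in zip(row, vector)) + b)
            for row, b in zip(weights, bias)]


def condition_time_embedding(time_emb, labels, tables, prenets):
    """Add each PreNet-transformed style embedding to the time embedding."""
    result = list(time_emb)  # copy, so the input is left untouched
    for name, class_id in labels.items():
        style = prenet_forward(prenets[name], tables[name][class_id])
        result = [r + s for r, s in zip(result, style)]
    return result


rng = random.Random(0)
class_counts = {"artist": 5, "genre": 3, "style": 4}  # hypothetical sizes
tables = {name: make_table(n, EMBED_DIM, rng) for name, n in class_counts.items()}
prenets = {name: make_prenet(EMBED_DIM, rng) for name in class_counts}

time_emb = [0.0] * EMBED_DIM  # stand-in for a real timestep embedding
cond = condition_time_embedding(
    time_emb, {"artist": 2, "genre": 0, "style": 3}, tables, prenets
)
print(len(cond))  # 8
```

Because the three style signals are summed into the same vector as the time embedding, the rest of the UNet needs no structural changes to receive them.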

The network is trained to predict the unscaled noise component using the Huber loss, which produced better results on this dataset than the L2 loss. During evaluation, the generated latent representations are decoded into images using the pretrained VQ-VAE.
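
To show how the two losses differ, here is a minimal pure-Python sketch of the standard Huber loss next to the L2 (mean squared error) loss. The delta value of 1.0 and the toy numbers are assumptions for the example; the threshold actually used in train.py is not stated in this README.

```python
def huber_loss(predicted, target, delta=1.0):
    """Mean Huber loss: quadratic for small errors, linear for large ones."""
    total = 0.0
    for p, t in zip(predicted, target):
        err = abs(p - t)
        if err <= delta:
            total += 0.5 * err * err
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(predicted)


def l2_loss(predicted, target):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


true_noise = [0.0, 0.0, 0.0, 0.0]
predicted_noise = [0.1, -0.2, 0.0, 5.0]  # the last value is an outlier

print(huber_loss(predicted_noise, true_noise))
print(l2_loss(predicted_noise, true_noise))
```

The outlier contributes linearly to the Huber loss but quadratically to the L2 loss, so single large errors dominate the Huber loss far less. This may be one reason it works better here, although the README does not give an explanation.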
