# [Getting Started with Diffusers for Text-to-Image](https://pyimagesearch.com/2024/01/22/getting-started-with-diffusers-for-text-to-image/)

Using the Diffusers library from Hugging Face, we'll generate images from text prompts.

Diffusion Models are generative models, meaning that they are used to generate data similar to the data on which they are trained. These models work by destroying training data through the successive addition of Gaussian noise and then learning to recover the data by reversing this noising process.

Diffusion probabilistic models are parameterized Markov chains trained to gradually denoise data. 
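
The noising process described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the Diffusers implementation: it assumes the linear noise schedule from the DDPM paper (1000 steps, variances from 1e-4 to 0.02), and `add_noise`, `betas`, and `alphas_cumprod` are names chosen here for illustration. It uses the closed form that jumps directly from a clean sample to its noised version at step `t`.

```python
import numpy as np

# Assumed hyperparameters: the linear schedule used in the DDPM paper.
num_steps = 1000
betas = np.linspace(1e-4, 0.02, num_steps)  # noise variance added at each step
alphas_cumprod = np.cumprod(1.0 - betas)    # cumulative signal-retention factor

def add_noise(x0, t, rng):
    """Sample x_t from q(x_t | x_0) in closed form and return it with the noise used."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alphas_cumprod[t]) * x0 + np.sqrt(1.0 - alphas_cumprod[t]) * noise
    return x_t, noise

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))  # stand-in for an image scaled to [-1, 1]

# The later the step, the less of the original sample survives.
for t in (0, 250, 999):
    x_t, _ = add_noise(x0, t, rng)
    corr = np.corrcoef(x0.ravel(), x_t.ravel())[0, 1]
    print(f"t={t:4d}  alpha_bar={alphas_cumprod[t]:.5f}  correlation with x0={corr:.3f}")
```

During training, the model sees `x_t` and `t` and is asked to predict `noise`; generation runs the chain in the opposite direction.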

<!-- During training, noise is first added to the images. The model then performs the reverse process, removing the added noise and reconstructing the image.
From this, the diffusion model learns to generate images (resembling those used in its training) starting from noise. -->

- What we'll use: The Diffusers library, developed by Hugging Face

In [2]:
# !pip install diffusers accelerate

## Imports

In [3]:
import tqdm
import torch
import PIL
import PIL.Image
import numpy as np
import diffusers
from PIL import Image
from diffusers import UNet2DModel
from diffusers import AutoPipelineForText2Image
from diffusers import DDPMPipeline
from diffusers import DDPMScheduler


## Diffusers

- Model: predicts the residual noise in a sample;
- Scheduler: a noise-scheduling algorithm (it defines how many diffusion steps are used for inference and how each step updates the sample);
- Pipeline: combines a model with a scheduler.

In [4]:
# AutoPipelineForText2Image automatically selects the appropriate pipeline class for the given checkpoint
pipeline = AutoPipelineForText2Image.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16").to("cuda")

# tests with
# stabilityai/stable-diffusion-xl-base-1.0
# kandinsky-community/kandinsky-2-2-decoder



In [None]:
image = pipeline("impressionist oil painting of pikachu, backlight, centered composition, masterpiece, photorealistic, 8k").images[0]

image


In [None]:
# prompt = "Astronaut in a 1700 New York, cold color palette, muted colors, detailed, 8k"

# image = pipeline(prompt = prompt, height=768, width=512).images[0]
# image

In [None]:
# optional parameters for tailoring the output: image size, number of denoising steps, and guidance scale

# image = pipeline(prompt,
#                  height=768,
#                  width=512,
#                  num_inference_steps=70,
#                  guidance_scale=10.5,
#                  ).images[0]
# image
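
Of the parameters above, `num_inference_steps` sets how many denoising steps the scheduler runs, and `guidance_scale` controls classifier-free guidance: at each step the noise is predicted once with the prompt and once without it, and the two predictions are combined. The sketch below shows only that combination, using random arrays as stand-ins for the two predictions; `apply_guidance` is a hypothetical helper name, not part of the Diffusers API.

```python
import numpy as np

def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Push the prediction away from the unconditional one, toward the prompt-conditioned one."""
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)

rng = np.random.default_rng(0)
noise_uncond = rng.standard_normal((4, 8, 8))  # stand-in: prediction without the prompt
noise_text = rng.standard_normal((4, 8, 8))    # stand-in: prediction with the prompt

for scale in (1.0, 7.5, 10.5):
    guided = apply_guidance(noise_uncond, noise_text, scale)
    distance = np.linalg.norm(guided - noise_uncond)
    print(f"guidance_scale={scale:5.1f}  distance from unconditional prediction={distance:.2f}")
```

A scale of 1 reproduces the prompt-conditioned prediction unchanged; larger values follow the prompt more strongly, usually at some cost in image diversity.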

## Denoising Diffusion Probabilistic Models (DDPM)

- Model
  - Instances of the model class are neural networks that take a noisy sample as well as a timestep as inputs to predict a less noisy output sample.

- Schedulers
  - Act as the algorithmic backbone that guides both training and inference: during training they dictate how noise is added to the samples, and during inference they take the model output (typically the noisy residual) and compute a slightly less noisy sample from it.
    - Unlike models, which have trainable weights, schedulers usually have no trainable parameters.
    - Although they do not inherit from `torch.nn.Module` like typical neural network models, schedulers are still instantiated from a configuration.
- [DDPM Paper](https://arxiv.org/abs/2006.11239)
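
The scheduler's inference-time role can be sketched without the library. The code below is a NumPy illustration of the sampling step from the DDPM paper (Algorithm 2, with the variance choice sigma_t^2 = beta_t), not the Diffusers implementation. It assumes the paper's linear schedule, and it replaces the trained model with a hypothetical "oracle" that knows the clean sample and returns the exact noise, so the result can be checked.

```python
import numpy as np

# Assumed hyperparameters: the linear schedule used in the DDPM paper.
num_steps = 1000
betas = np.linspace(1e-4, 0.02, num_steps)
alphas = 1.0 - betas
alphas_cumprod = np.cumprod(alphas)

def ddpm_step(x_t, predicted_noise, t, rng):
    """One reverse step x_t -> x_{t-1}: remove the predicted noise, then re-add a little fresh noise."""
    mean = (x_t - betas[t] / np.sqrt(1.0 - alphas_cumprod[t]) * predicted_noise) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added at the final step
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))  # stand-in for a clean image

def oracle_noise(x_t, t):
    """Stand-in for a trained model: the exact noise separating x_t from the known x0."""
    return (x_t - np.sqrt(alphas_cumprod[t]) * x0) / np.sqrt(1.0 - alphas_cumprod[t])

# Start from pure noise and walk the chain backwards.
x = rng.standard_normal(x0.shape)
for t in reversed(range(num_steps)):
    x = ddpm_step(x, oracle_noise(x, t), t, rng)

print("max abs error vs x0:", np.abs(x - x0).max())
```

With a perfect noise prediction the chain lands back on the clean sample; a trained network only approximates this prediction, which is why it produces new images resembling its training data instead of reproducing one exactly.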