<a href="https://colab.research.google.com/github/Praveen76/Introduction-to-Stable-Diffusion/blob/main/Introduction_to_StableDiffusion.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Learning Objectives

At the end of the experiment, you will be able to:

* understand the concept of stable diffusion
* implement the pre-trained model to generate images from the text prompt
* understand the different parameters to generate refined images

## Stable Diffusion

Stable Diffusion is a deep learning model for image generation, released in 2022. It is based on a particular type of diffusion model called `Latent Diffusion`, proposed in the paper [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/pdf/2112.10752.pdf). Diffusion models are generative models, designed to create new data similar to the data they were trained on. In the case of Stable Diffusion, the data is images, and the model is trained to remove random Gaussian noise from them.

The Stable Diffusion algorithm was developed by CompVis (the Computer Vision research group at the Ludwig Maximilian University of Munich) and sponsored primarily by the startup Stability AI.

The algorithm is based on ideas from DALL-E 2 (developed by OpenAI, which is also the creator of ChatGPT), Imagen (from Google), and other image generation models.

Types of image generation through diffusion models:

* **Unconditional image generation:** the model generates images without any additional condition like text or image. You will get images similar to those provided in the training set.

* The **generation of images conditioned** by text is known as `text-to-image`, or `text2img`. The prompt is converted into embeddings that are used to condition the model to generate an image from noise.

* The **generation of images based on another image** is known as `image-to-image`, or `img2img`. In addition to the text prompt, it allows sending an initial image to condition the generation of new images. This gives you more control over the final composition.

* **`Inpainting`** allows selecting a specific part of the image to change its class/concept, or even to remove it from the scene.


### Main components of Stable diffusion

Main components include:

1. Autoencoder (VAE)
2. U-Net
3. Text-encoder
4. CLIP (Contrastive Language–Image Pre-training)

The figure below shows a conceptual diagram of the Stable Diffusion architecture and its inference steps:

<br>
<center>
<img src = "https://datascienceimages.s3.eu-north-1.amazonaws.com/Introduction_to_StableDiffusion/stable-diffusion.webp?response-content-disposition=inline&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEHQaCmV1LW5vcnRoLTEiRzBFAiEAmOZQhT%2BYM40tIrgSCC0juQ3jYb2AAM%2F1Fbz6dD8McLICIHg%2Fvrzjk8lvADAhmWpZrYL9TF3PwvkzKpdWA8hZeRQoKu0CCM3%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEQABoMOTc1MDUwMDY4NjU4IgyeGfj2LlL0RNsVYPMqwQJM4d9h4CmJwEyTBC7qZEzc29kl0a%2FrmErIOHAztB%2FrvmsLwfZSviafkN0S0Rgnb9ZUlQLlOLgmJAh%2Boo5wqCAD6Vph5HKDP0UmiithH%2BlXOUIiDE51ykTUEA3w1iSSHMyOs7vDBsLp%2FMFpvpCG3V2XnlQ5Iqgo422ee763aDs%2FzuanCRZpfGDaLTo%2BgTWY0vQntjWJILYrZeOw0tpLpzZArag7KROagc0J8JjBvK0M%2Fz02ApRdrd4sspTDVFraDlGzFJekxjfCDEkf0gSQy7gutoZ2hujhd%2BonPc2Sob988wEY1BUTJWathwBoXphrbMZIS6LhUp1rTnDl8w7hv54XZ4ORo9w4oXVsZdG%2FDbp2dUX6yVHFY3mjMGdU6fEQ1j2AlzySCTm4T%2FzJpXst4CisnbG6opNd6Os2b78VkYfQnKAwtdPmsQY6swKV5d%2BjoEU9poWR0HUudhbWdbIOfv1xKG64htIRW3NpNgi0OonJi5zOPTw5aWoAvi5hbnA7akONMJtwKRTgl9t30qtpFA8DwvmNL%2BUt%2FNsayqUy%2BrKPtD%2FtfVYyu82iXyxz4EyCSXPTgcCREW61a0z%2F8HtRH76JayEUAc7PAypjS861xn7SFWDzldlYWE8H2bDvjOQdf82VBKm67cP5CPWaxKpPkpTJFAR82hOQeFordG1wctY1YO5AOZnqv8Wd%2B7Tz9rQYNYk9xUihZZHgrvYzMuQWKcj%2By2eMBR7uWxP5LR48KBnL6edR4ZCv5H5LYCGjIREzJAImRO9qbDptQLHW%2B943UkCxOSqNdCIQjgRoWe84M93lB6rTQjLowcqWYU9YJDY9V73Jfw8Qz3mx6HxtnhdV&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20240507T042806Z&X-Amz-SignedHeaders=host&X-Amz-Expires=300&X-Amz-Credential=ASIA6GBMDKKZK2QSB44A%2F20240507%2Feu-north-1%2Fs3%2Faws4_request&X-Amz-Signature=ccb1d3a0a3f6763b0b1aa2d4e21c7863ccee13dcb192f1574cf9208460f92f37" width=700px/>
</center>

* SD receives a latent seed and a text prompt as input.
* The seed is used to generate random representations of latent images of size 64x64.
* The text prompt is transformed into text embeddings of size 77x768 using the CLIP text encoder.
* The U-Net iteratively reduces noise from the random latent image representations while conditioning on the text embeddings.
* The U-Net output (noise residual) is used to compute a denoised latent image representation using a scheduler algorithm.
* The denoising process is repeated $x$ times to recover the best latent image representations.
* Once completed, the latent image representation is decoded by the decoder part of the VAE.
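
The sizes quoted in the steps above can be related to the output image size with a little arithmetic. The sketch below is only illustrative: it assumes that the VAE downsamples each spatial dimension by a factor of 8 and that the latent has 4 channels (neither number is stated above), and the helper names `latent_shape` and `text_embedding_shape` are hypothetical.

```python
# Tensor shapes involved in one Stable Diffusion v1 inference pass.
# Assumptions (not stated in the text above): the VAE downsamples each
# spatial dimension by 8 and the latent representation has 4 channels.

VAE_SCALE_FACTOR = 8   # assumed VAE downsampling factor
LATENT_CHANNELS = 4    # assumed number of latent channels
MAX_TOKENS = 77        # sequence length of the text embeddings
EMBEDDING_DIM = 768    # width of each text embedding vector


def latent_shape(height, width, batch_size=1):
    """Return the (batch, channels, h, w) shape of the latent tensor."""
    if height % VAE_SCALE_FACTOR or width % VAE_SCALE_FACTOR:
        raise ValueError("height and width must be multiples of 8")
    return (batch_size, LATENT_CHANNELS,
            height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)


def text_embedding_shape(batch_size=1):
    """Return the (batch, tokens, dim) shape of the text embeddings."""
    return (batch_size, MAX_TOKENS, EMBEDDING_DIM)


print(latent_shape(512, 512))   # (1, 4, 64, 64)
print(text_embedding_shape())   # (1, 77, 768)
```

Under these assumptions, a 512 x 512 image corresponds to the 64x64 latent mentioned above, which is why the U-Net works on a much smaller tensor than the final image.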


### Setup Steps:

## Installing the libraries

- [xformers](https://github.com/facebookresearch/xformers) for memory optimization



In [None]:
!pip install -q diffusers==0.11.1
!pip install -q accelerate transformers ftfy bitsandbytes==0.35.0 gradio natsort safetensors xformers


### Import required packages

In [None]:
from PIL import Image
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")

## Load Pipeline for Image generation

- With little effort, we can define a pipeline that uses the Stable Diffusion model, through the [StableDiffusionPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py).

- The checkpoint used here is the [`'CompVis/stable-diffusion-v1-4'`](https://huggingface.co/CompVis/stable-diffusion-v1-4)

In [None]:
import torch    #PyTorch
import jax
jax.random.KeyArray = jax.Array           # since jax.random.KeyArray is deprecated
from diffusers import StableDiffusionPipeline

In [None]:
# Load pipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)   # half precision to save GPU memory; torch.float32 is the full-precision alternative

In [None]:
# Connecting and using GPU
pipe = pipe.to('cuda')

In [None]:
# Memory optimization, especially useful on Colab
pipe.enable_attention_slicing()
pipe.enable_xformers_memory_efficient_attention()

## Generating the image

**Example 1**

In [None]:
# Creating the prompt

prompt = 'a tiger standing on grass land'

# e.g. 'a photograph of an astronaut riding a horse'

In [None]:
# Inference from pipeline

img = pipe(prompt).images[0]
print(type(img), img.size)
img

**Example 2**

In [None]:
prompt = 'a tiger sitting on tree'
img = pipe(prompt).images[0]
img

Saving the result

In [None]:
# Save your image as png file
img.save('result.png')

## Generating multiple images

Pass the `num_images_per_prompt` parameter in the pipeline call to generate several images for the same prompt.

In [None]:
# Function for plotting multiple images as one

def grid_img(imgs, rows=1, cols=3, scale=1):
    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    w, h = int(w * scale), int(h * scale)  # shrink when scale < 1, e.g. 0.75 * 512 = 384

    grid = Image.new('RGB', size=(cols * w, rows * h))  # blank canvas for the grid

    for i, img in enumerate(imgs):  # loop through the images
        # LANCZOS is a high-quality resampling filter (Image.ANTIALIAS was removed in Pillow 10)
        img = img.resize((w, h), Image.LANCZOS)
        grid.paste(img, box=(i % cols * w, i // cols * h))  # paste each image into its grid cell

    return grid

**Example 3**

In [None]:
num_imgs = 3
prompt = 'photograph of a royal enfield'
imgs = pipe(prompt, num_images_per_prompt= num_imgs).images
grid = grid_img(imgs, rows=1, cols=3, scale=0.75)
grid

## Getting the same image on every run

* Pass the **`generator` parameter** with a fixed `seed` value.
* The seed initializes the random number generator, which is used to create the initial latent noise.
* By setting the seed, we can reproduce the results.
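
As a toy illustration of why a fixed seed reproduces the result, the sketch below draws Gaussian noise with Python's standard `random` module. This is only a stand-in for `torch.Generator`; the helper name `initial_noise` is hypothetical and the values have nothing to do with real latents.

```python
import random


def initial_noise(seed, n=4):
    """Draw n Gaussian samples from a generator initialised with `seed`."""
    rng = random.Random(seed)  # stand-in for torch.Generator().manual_seed(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


noise_a = initial_noise(777)
noise_b = initial_noise(777)
noise_c = initial_noise(778)

print(noise_a == noise_b)  # True: same seed, same starting noise
print(noise_a == noise_c)  # False: a different seed gives different noise
```

Because the denoising process starts from this noise, an identical starting point (with the same prompt and parameters) leads to the same image.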



In [None]:
# Single image
seed = 777
generator = torch.Generator('cuda').manual_seed(seed)
img = pipe(prompt, generator=generator).images[0]
img

In [None]:
# Multiple images --> using the grid_img() function defined above

prompt = "photograph of a royal enfield"
seed = 777
generator = torch.Generator("cuda").manual_seed(seed)
imgs = pipe(prompt, num_images_per_prompt=num_imgs, generator=generator).images
grid = grid_img(imgs, rows=1, cols=3, scale=0.75)
grid

**Example 4**

In [None]:
prompt = "a photograph of an astronaut riding a horse"
generator = torch.Generator("cuda").manual_seed(seed)
imgs = pipe(prompt, num_images_per_prompt=num_imgs, generator=generator).images
grid = grid_img(imgs, rows=1, cols=3, scale=0.75)
grid

## Parameters

Besides `generator`, there are many other parameters that can be passed to the StableDiffusionPipeline call to improve the result.

For example:
- Inference steps (`num_inference_steps`)
- Guidance scale (`guidance_scale`)
- Image size (`height` and `width` dimensions)
- Negative prompt (`negative_prompt`)

#### **Inference steps (`num_inference_steps`)**

The number of inference steps, also known as denoising steps, is the number of steps used to turn the image from complete noise (initial state) into the result. More steps generally give better results, but the image takes longer to generate.

* Stable Diffusion works very well with a relatively small number of steps, so we recommend using the default value of 50 inference steps. Beyond a certain point, adding more steps no longer improves the image.
* For faster results, use a smaller number.
* The default number of steps varies according to the scheduler algorithm.
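
To make the trade-off concrete, here is a simplified sketch of how a scheduler might pick which timesteps to visit. It assumes 1000 training timesteps and plain even spacing; real schedulers can apply offsets or different spacing rules, and `inference_timesteps` is a hypothetical helper, not a diffusers function.

```python
def inference_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Evenly spaced timesteps, from most noisy to least noisy.

    Assumption: the model was trained with `num_train_timesteps` steps and
    the scheduler simply skips through them at a fixed stride.
    """
    step_ratio = num_train_timesteps // num_inference_steps
    return [i * step_ratio for i in range(num_inference_steps)][::-1]


for n in (10, 20, 50):
    ts = inference_timesteps(n)
    print(n, "steps:", ts[:3], "...", ts[-1])
```

Fewer inference steps mean larger jumps between timesteps, which is faster but gives the model fewer chances to refine the image.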

In [None]:
prompt = "a photograph of an astronaut riding a horse"
generator = torch.Generator("cuda").manual_seed(seed)
img = pipe(prompt, num_inference_steps=20, generator=generator).images[0]
img

Generating images with different numbers of inference steps, from 10 to 50:

In [None]:
plt.figure(figsize=(18,8))

for i in range(1, 6):
    n_steps = i * 10
    #print(n_steps)
    generator = torch.Generator('cuda').manual_seed(seed)
    img = pipe(prompt, num_inference_steps=n_steps, generator=generator).images[0]

    plt.subplot(1, 5, i)
    plt.title('num_inference_steps: {}'.format(n_steps))
    plt.imshow(img)
    plt.axis('off')

plt.show()

#### **Guidance scale (CFG)**

Classifier-free guidance (CFG, also known as guidance scale) is a way to increase adherence to the conditional prompt that guides the generation, as well as the overall quality of the image. It controls how much the prompt is taken into account when conditioning the diffusion process.

* **Smaller values:** the prompt is increasingly ignored. For example, if the value is set to 0, the image generation is unconditioned.
* **Higher values:** the returned images better represent the prompt.



Choosing the best value

* Values between 7 and 8.5 are generally good choices. The default value is 7.5

* In general, you can keep the value in the range from 5 to 9. Less realistic images will be returned if the values are too low or too high

* The best value depends on the desired results and the complexity of the prompt
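
The effect of the guidance scale can be sketched with the usual classifier-free guidance combination: the final noise prediction starts from the unconditional prediction and moves towards the text-conditioned one. The code below is a minimal illustration on plain Python lists; `apply_guidance` is a hypothetical helper and the numbers are made up. Note also that, as far as we know, the diffusers pipeline only switches guidance on for values above 1, so the behaviour at 0 shown here describes the formula rather than the library.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Combine unconditional and text-conditioned noise predictions."""
    return [u + guidance_scale * (t - u)
            for u, t in zip(noise_uncond, noise_text)]


noise_uncond = [0.0, 1.0]   # toy prediction without the prompt
noise_text = [1.0, 3.0]     # toy prediction with the prompt

print(apply_guidance(noise_uncond, noise_text, 0.0))   # [0.0, 1.0] -> prompt ignored
print(apply_guidance(noise_uncond, noise_text, 1.0))   # [1.0, 3.0] -> plain conditional
print(apply_guidance(noise_uncond, noise_text, 7.5))   # [7.5, 16.0] -> pushed towards the prompt
```

Values well above 1 extrapolate beyond the conditional prediction, which is why very high scales can produce less realistic images.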

In [None]:
prompt = "a photograph of an astronaut riding a horse"

generator = torch.Generator("cuda").manual_seed(seed)
img = pipe(prompt, guidance_scale=10, generator=generator).images[0]
img

Generating images with different CFG values, from 5 to 9:

In [None]:
plt.figure(figsize=(18,8))

for i in range(1, 6):
    n_guidance = i + 4
    generator = torch.Generator("cuda").manual_seed(seed)
    img = pipe(prompt, guidance_scale=n_guidance, generator=generator).images[0]

    plt.subplot(1,5,i)
    plt.title('guidance_scale: {}'.format(n_guidance))
    plt.imshow(img)
    plt.axis('off')

plt.show()

#### **Image size (dimensions)**

By default, the generated images are 512 x 512 pixels.

Recommendations in case you want other dimensions:

* Make sure the height and width are multiples of 8.
* Dimensions below 512 will result in lower quality images.
* Exceeding 512 in both directions (width and height) will repeat areas of the image ("global coherence" is lost).
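
If you start from an arbitrary target size, a small helper can snap it to a valid value before calling the pipeline. This is a sketch only: `snap_to_multiple` is a hypothetical name, and rounding down is a design choice made here so the result never exceeds the requested size.

```python
def snap_to_multiple(value, multiple=8):
    """Round `value` down to the nearest multiple (at least one multiple)."""
    return max(multiple, (value // multiple) * multiple)


for requested in (500, 512, 770):
    print(requested, "->", snap_to_multiple(requested))
# 500 -> 496
# 512 -> 512
# 770 -> 768
```

The snapped values can then be passed as `height` and `width` in the pipeline call.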

> **Landscape mode**


In [None]:
seed = 777
prompt = "photograph of a mountain landscape during sunset, stars in the sky"
generator = torch.Generator("cuda").manual_seed(seed)
h, w = 512, 768
img = pipe(prompt, height=h, width=w, generator=generator).images[0]
img

> **Portrait mode**



In [None]:
generator = torch.Generator("cuda").manual_seed(seed)
h, w = 768, 512
img = pipe(prompt, height=h, width=w, generator=generator).images[0]
img

#### **Negative prompt (`negative_prompt`)**

The negative prompt is an additional way of telling the model what you don't want in the image, what you'd like to avoid.
* It can be used to remove objects from the image or fix defects.
* It is optional in the first versions of Stable Diffusion; however, in the latest versions it has become important for generating quality images.
* Some images can only be generated using negative prompts.

During the text-to-image conditioning step, the prompt is converted into embedding vectors which feed the U-Net noise predictor.
* There are two sets of embedding vectors: one for the positive prompt and one for the negative prompt.
* Both the positive and negative prompts have 77 tokens.
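
The fixed length of 77 tokens is what lets the two prompts produce embeddings of the same shape, however long each prompt is. The sketch below illustrates the idea with a toy whitespace "tokenizer"; the marker strings, the use of the end marker as padding, and the helper name `pad_or_truncate` are all assumptions for illustration, not the real CLIP tokenizer.

```python
MAX_LENGTH = 77
START, END = "<start>", "<end>"   # hypothetical marker tokens


def pad_or_truncate(tokens, max_length=MAX_LENGTH):
    """Wrap tokens in start/end markers and force the length to max_length."""
    body = tokens[: max_length - 2]              # leave room for the two markers
    padded = [START] + body + [END]
    padded += [END] * (max_length - len(padded))  # assumed: pad with the end marker
    return padded


positive = pad_or_truncate("photograph of an old car".split())
negative = pad_or_truncate("bw photo".split())

print(len(positive), len(negative))   # 77 77
print(negative[:5])                   # ['<start>', 'bw', 'photo', '<end>', '<end>']
```

Since both sequences have the same length, the positive and negative embeddings can be processed together by the U-Net.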

In [None]:
num_images = 3

prompt = 'photograph of an old car'
neg_prompt = 'bw photo'                  # black and white photo

imgs = pipe(prompt, negative_prompt = neg_prompt, num_images_per_prompt= num_images).images

grid = grid_img(imgs, rows= 1, cols= 3, scale=0.75)
grid

## Other models

Stable Diffusion models (sometimes called checkpoint files) are pre-trained weights used to generate images, for general or specific purposes.
* The images that a model can generate depend on the data used to train it. For example, a model will not be able to generate an image of a cat if there are no cats in the training data.
* Conversely, if you train a model only with images of cats, it will only be able to generate cats.

* The official models released by Stability AI and its partners are called base models,  or general-purpose models.


* v1.4 - Released in August 2022 by Stability AI. It is considered the first Stable Diffusion model that was publicly available. [https://huggingface.co/CompVis/stable-diffusion-v-1-4-original]

* v1.5 - Released in October 2022 by Stability AI and Runway ML. It is based on its predecessor, with additional training. [https://huggingface.co/runwayml/stable-diffusion-v1-5]

* v2.0 - Released in November 2022 by Stability AI. In addition to 512 × 512 dimensions, a higher resolution version of 768 × 768 pixels is available.

* v2.1 - Released in December 2022 by Stability AI.

* Differences in v2:

  Stable Diffusion v1 uses OpenAI's CLIP for text embedding.

  Stable Diffusion v2 uses OpenCLIP for text embedding.

  Main reasons for the change: OpenCLIP is currently five times bigger. A larger text encoder model improves image quality, because it is able to better “understand” the prompt.

All are considered general-purpose models.


## Trying different versions

### **SD v1.5**

In [None]:
sd15 = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
sd15 = sd15.to('cuda')
sd15.enable_attention_slicing()
sd15.enable_xformers_memory_efficient_attention()

In [None]:
num_imgs = 3

prompt = "photograph of an old car"
neg_prompt = "bw photo"

imgs = sd15(prompt, negative_prompt=neg_prompt, num_images_per_prompt=num_imgs).images

grid = grid_img(imgs, rows=1, cols=3, scale=0.75)
grid

In [None]:
prompt = "photo of a futuristic city on another planet, realistic, full hd"
neg_prompt = 'buildings'

imgs = sd15(prompt, negative_prompt = neg_prompt, num_images_per_prompt=num_imgs).images

grid = grid_img(imgs, rows=1, cols=3, scale=0.75)
grid

### **SD v2.1**

**<font color="#C24641">
NOTE : To run SD v2.1 given below, first *Disconnect and delete the runtime* then connect to it again, otherwise the colab notebook will run out of space.</font>**

In [None]:
# Install dependencies
!pip install -q diffusers==0.11.1
!pip install -q accelerate transformers ftfy bitsandbytes==0.35.0 gradio natsort safetensors xformers

# Import required packages
from PIL import Image
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
import torch
import jax
jax.random.KeyArray = jax.Array
from diffusers import StableDiffusionPipeline
from IPython.display import clear_output
clear_output()

In [None]:
sd2 = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16)
sd2 = sd2.to("cuda")
sd2.enable_attention_slicing()
sd2.enable_xformers_memory_efficient_attention()

In [None]:
# Function for plotting multiple images as one

def grid_img(imgs, rows=1, cols=3, scale=1):
    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    w, h = int(w * scale), int(h * scale)  # shrink when scale < 1, e.g. 0.75 * 512 = 384

    grid = Image.new('RGB', size=(cols * w, rows * h))  # blank canvas for the grid

    for i, img in enumerate(imgs):  # loop through the images
        # LANCZOS is a high-quality resampling filter (Image.ANTIALIAS was removed in Pillow 10)
        img = img.resize((w, h), Image.LANCZOS)
        grid.paste(img, box=(i % cols * w, i // cols * h))  # paste each image into its grid cell

    return grid

In [None]:
num_imgs=3
prompt = "photo of a futuristic city on another planet, realistic, full hd"
neg_prompt = 'buildings'

imgs = sd2(prompt, negative_prompt=neg_prompt, num_images_per_prompt=num_imgs).images

grid = grid_img(imgs, rows=1, cols=3, scale=0.75)
grid

### Fine-tuned models with specific styles

* **Additional training** – Training a base model with an additional dataset. For example, you can train Stable Diffusion with an additional old car dataset to orient the aesthetics of the  cars to that specific type.

* [**Dreambooth**](https://dreambooth.github.io/) – Initially developed by Google, it is a technique for injecting custom subjects into the models. Due to its architecture, it is possible to achieve great results using only 3 to 5 custom images.

* **Textual inversion** (also called Embedding) – It injects a custom subject into the model  with just a few examples. A new keyword is created specifically for the new object and  the training is executed only in the embedding neural network.


Let's look at some of the fine-tuned models available:

* Modern Disney style- https://huggingface.co/nitrosocke/mo-di-diffusion

* Classic Disney Style - https://huggingface.co/nitrosocke/classic-anim-diffusion

* High resolution 3D animation - https://huggingface.co/nitrosocke/redshift-diffusion

* Futuristic images - https://huggingface.co/nitrosocke/Future-Diffusion

* Other animation styles:
  * https://huggingface.co/nitrosocke/Ghibli-Diffusion
  * https://huggingface.co/nitrosocke/spider-verse-diffusion
* More models: https://huggingface.co/models?other=stable-diffusion-diffusers



In [None]:
# Modern Disney style

modi = StableDiffusionPipeline.from_pretrained("nitrosocke/mo-di-diffusion", torch_dtype=torch.float16)
modi = modi.to("cuda")
modi.enable_attention_slicing()
modi.enable_xformers_memory_efficient_attention()

In [None]:
prompt = "a photograph of an astronaut riding a horse, modern disney style"

seed = 777
generator = torch.Generator("cuda").manual_seed(seed)

imgs = modi(prompt, generator=generator, num_images_per_prompt=num_imgs).images

grid = grid_img(imgs, rows=1, cols=3, scale=0.75)
grid

In [None]:
prompt = "panda, modern disney style"

generator = torch.Generator("cuda").manual_seed(seed)
imgs = modi(prompt, generator=generator, num_images_per_prompt=3).images

grid = grid_img(imgs, rows=1, cols=3, scale=0.5)
grid

In [None]:
prompt = ["albert einstein, modern disney style",
          "modern disney style rolls-royce in the desert, golden hour",
          "modern disney style helicopter"]

seed = 42
print("Seed: {}".format(seed))
generator = torch.Generator("cuda").manual_seed(seed)
imgs = modi(prompt, generator=generator).images

grid = grid_img(imgs, rows=1, cols=3, scale=0.75)
grid

## Changing the Scheduler (sampler)

The model is usually not trained to directly predict a slightly less noisy image, but rather to predict the '***noise residual***', which is the difference between a less noisy image and the input image or, similarly, the gradient between the two time steps.

For the denoising process, a specific ***noise scheduling algorithm*** is thus necessary to "wrap" the model: it defines how many diffusion steps are needed for inference as well as how to *compute* a less noisy image from the model's output. This is where the different ***schedulers*** of the diffusers library come into play.

Scheduler algorithms (also called samplers) calculate the predicted denoised image representation from the previous noisy representation and the predicted noise residual. In other words, the scheduler determines how the image is calculated. There are several different algorithms.
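
As one concrete instance of such a computation, the sketch below implements a single deterministic DDIM-style update on plain Python lists. It is illustrative only: `ddim_step` is a hypothetical helper rather than the diffusers implementation, and `alpha_t` and `alpha_prev` are assumed to be the cumulative signal-retention factors at the current and the next (less noisy) timestep.

```python
import math


def ddim_step(sample, noise_pred, alpha_t, alpha_prev):
    """One deterministic DDIM-style update (no added noise).

    sample:     noisy values at the current timestep
    noise_pred: noise residual predicted by the model
    alpha_t, alpha_prev: assumed cumulative alphas for the current and next step
    """
    result = []
    for x_t, eps in zip(sample, noise_pred):
        # Estimate the clean value implied by the predicted noise.
        x0_pred = (x_t - math.sqrt(1 - alpha_t) * eps) / math.sqrt(alpha_t)
        # Re-noise that estimate to the (lower) noise level of the next step.
        result.append(math.sqrt(alpha_prev) * x0_pred
                      + math.sqrt(1 - alpha_prev) * eps)
    return result


# Toy check: if the predicted noise is exactly right, stepping to
# alpha_prev = 1.0 (no noise left) recovers the clean values.
clean = [2.0, -1.0]
noise = [0.5, 0.3]
alpha_t = 0.5
noisy = [math.sqrt(alpha_t) * c + math.sqrt(1 - alpha_t) * n
         for c, n in zip(clean, noise)]

recovered = ddim_step(noisy, noise, alpha_t, alpha_prev=1.0)
print([round(v, 6) for v in recovered])   # [2.0, -1.0]
```

Other schedulers differ mainly in how they perform this update, for example by adding fresh noise at each step or by combining several past predictions.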

Some examples commonly used with Stable Diffusion :

* PNDM (default)
* DDIM Scheduler
* K-LMS Scheduler
* Euler Ancestral Discrete Scheduler (Euler A)
* DPM Scheduler

For technical details on theory and mathematics of the algorithms, refer to [this](https://arxiv.org/abs/2206.00364) paper: Elucidating the Design Space of Diffusion-Based Generative Models.

For comparison of different samplers, refer [here](https://www.artstation.com/blogs/kaddoura/pBPo/stable-diffusion-samplers).

To know more about available schedulers on HuggingFace, refer [here](https://huggingface.co/docs/diffusers/using-diffusers/schedulers#schedulers-summary).

Default is [PNDMScheduler](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_pndm.py).


In [None]:
# Trying with version 1.5
sd15 = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
sd15 = sd15.to("cuda")
sd15.enable_attention_slicing()
sd15.enable_xformers_memory_efficient_attention()

In [None]:
# Displaying the scheduler attributes
sd15.scheduler

In [None]:
seed = 555
prompt = "photo of a cute baby girl wearing sunglasses, on the beach, ocean in the background"
generator = torch.Generator('cuda').manual_seed(seed)
img = sd15(prompt, generator=generator).images[0]
img

In [None]:
# Listing the other schedulers compatible with the current one
sd15.scheduler.compatibles

In [None]:
# Checking the current config
sd15.scheduler.config

#### Changing the scheduler to `DDIMScheduler`

In [None]:
from diffusers import DDIMScheduler
sd15.scheduler = DDIMScheduler.from_config(sd15.scheduler.config)

In [None]:
generator = torch.Generator(device = 'cuda').manual_seed(seed)
img = sd15(prompt, generator=generator).images[0]
img

#### Changing the scheduler to `LMSDiscreteScheduler`

In [None]:
from diffusers import LMSDiscreteScheduler

sd15.scheduler = LMSDiscreteScheduler.from_config(sd15.scheduler.config)

generator = torch.Generator(device = 'cuda').manual_seed(seed)
img = sd15(prompt, num_inference_steps = 50, generator=generator).images[0]
# num_inference_steps = 60, try with different value
img

#### Changing the scheduler to `EulerAncestralDiscreteScheduler`

In [None]:
from diffusers import EulerAncestralDiscreteScheduler

sd15.scheduler = EulerAncestralDiscreteScheduler.from_config(sd15.scheduler.config)

generator = torch.Generator(device="cuda").manual_seed(seed)
img = sd15(prompt, generator=generator, num_inference_steps=60).images[0]
img

#### Changing the scheduler to `EulerDiscreteScheduler`

In [None]:
from diffusers import EulerDiscreteScheduler

sd15.scheduler = EulerDiscreteScheduler.from_config(sd15.scheduler.config)

generator = torch.Generator(device="cuda").manual_seed(seed)
img = sd15(prompt, generator=generator, num_inference_steps=50).images[0]
img

In all the above examples, try with negative prompt to improve the results.

### **NOTE**:
It is important to check the license of the models, especially if you intend to make some commercial use with the obtained results.

1. You can't use the model to deliberately produce nor share illegal or harmful outputs or content,
2. We claim no rights on the outputs you generate, you are free to use them and are accountable for their use which should not go against the provisions set in the license, and
3. You may re-distribute the weights and use the model commercially and/or as a service. If you do, please be aware you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M to all your users.
(License of v1.4 and v1.5: https://huggingface.co/spaces/CompVis/stable-diffusion-license)


### References:

1. [How diffusion models work: the math from scratch](https://theaisummer.com/diffusion-models/)

2. [The Illustrated Stable Diffusion](https://jalammar.github.io/illustrated-stable-diffusion/)

3. [Stable Diffusion using diffusers](https://www.cnblogs.com/flyingsir/p/17175876.html)

4. [Stable Diffusion Paper](https://arxiv.org/abs/2112.10752) - High-Resolution Image Synthesis with Latent Diffusion Models

5. [DreamBooth Paper](https://arxiv.org/pdf/2208.12242.pdf) - DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
6. [Imagen Paper](https://arxiv.org/pdf/2205.11487.pdf)
7. [Classifier-Free Diffusion Guidance Paper](https://arxiv.org/pdf/2207.12598.pdf)
8. [Generating images with Stable Diffusion](https://blog.paperspace.com/generating-images-with-stable-diffusion/)

9. [DreamBooth](https://huggingface.co/docs/diffusers/training/dreambooth)
10. [Stable Diffusion prompt: a definitive guide](https://stable-diffusion-art.com/prompt-guide/)
11. [Text-guided image-to-image generation](https://huggingface.co/docs/diffusers/using-diffusers/img2img)