<a href="https://colab.research.google.com/github/TariqLisse/SequentialArt_StableDiffusion/blob/main/stablediffusion.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install --upgrade diffusers accelerate transformers

Collecting diffusers
  Downloading diffusers-0.27.2-py3-none-any.whl (2.0 MB)
Collecting accelerate
  Downloading accelerate-0.29.2-py3-none-any.whl (297 kB)
Collecting transformers
  Downloading transformers-4.39.3-py3-none-any.whl (8.8 MB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch>=1.10.0->accelerate)
  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch>=1.10.0->accelerate)
  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch>=1

In [None]:
'''The first step is to use a text-to-image pipeline to generate an image from a text description.
A diffusion model takes a prompt (text description) and some random initial noise,
and iteratively removes the noise to construct an image.

The denoising process is guided by the prompt. Once it ends after a
predetermined number of time steps, the image representation is decoded into an image.

1. Load a checkpoint into the AutoPipelineForText2Image class, which automatically detects the appropriate
pipeline class to use based on the checkpoint:'''

from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

'''2. Pass a prompt to the pipeline to generate an image:

The height and width parameters set the size (in pixels) of the generated image.

The guidance_scale parameter controls how strongly the prompt influences image generation.
A lower value gives the model more "creativity" to generate images that are only loosely related to the prompt.
A higher value pushes the model to follow the prompt more closely,
but if the value is too high, artifacts can appear in the generated image.

Negative prompt: a negative prompt steers the model away from things you don't want it to generate.
It is commonly used to improve overall quality by suppressing poor image features such as "low resolution" or "bad details".
You can also use a negative prompt to remove or modify the content and style of an image.

A torch.Generator object makes a pipeline reproducible by setting a manual seed. With a fixed seed,
generating an image should return the same result each time instead of a new random image, so you can
generate batches of images and iteratively improve on an image produced from a known seed.
'''

generator = torch.Generator(device="cuda").manual_seed(30)

image = pipeline(
    prompt="a squirrel in the field in the style of picasso",
    height=768, width=512, guidance_scale=12.5,
    negative_prompt="bad anatomy, bad composition, ugly, abnormal, unrealistic, double, contorted, disfigured, malformed, amateur, extra, duplicate",
    generator=generator,
).images[0]
image.save("squirrel.png")
image  # last expression in the cell, so the notebook displays the image

Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]

  0%|          | 0/50 [00:00<?, ?it/s]
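
The effect of guidance_scale described above can be illustrated with the classifier-free guidance blend that Stable Diffusion pipelines commonly apply at each denoising step. The snippet below is a minimal sketch on plain Python lists, not the pipeline's actual tensor code: the helper name apply_guidance and the toy noise vectors are assumptions made for illustration. When a negative prompt is supplied, its prediction is generally used in place of the unconditional one, which is why the result is pushed away from it.

```python
from typing import List


def apply_guidance(
    noise_uncond: List[float],
    noise_text: List[float],
    guidance_scale: float,
) -> List[float]:
    """Blend two noise predictions the way classifier-free guidance does.

    noise_uncond: prediction without the prompt (or with the negative prompt).
    noise_text:   prediction conditioned on the prompt.
    The result starts at noise_uncond and moves along the direction
    (noise_text - noise_uncond), scaled by guidance_scale.
    """
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


# Toy values (hypothetical), standing in for real noise tensors.
noise_uncond = [0.0, 1.0, -1.0]
noise_text = [1.0, 1.0, 0.0]

for scale in (1.0, 7.5, 12.5):
    print(scale, apply_guidance(noise_uncond, noise_text, scale))
```

At a scale of 1.0 the blend simply returns the prompt-conditioned prediction. Larger values, such as the 12.5 used in the cell above, exaggerate the difference between the two predictions, which is consistent with the artifacts that can appear when the value is too high.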

In [None]:
'''Extending the pipeline to generate a new image
based on the saved image and a new prompt by using an Image2Image pipeline'''

import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import make_image_grid, load_image

pipeline = AutoPipelineForImage2Image.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipeline.enable_model_cpu_offload()

# prepare image
saved_image = "squirrel.png"
init_image = load_image(saved_image)

'''Strength is an important parameter to consider because it has a large impact on the generated image.
It controls how far the generated image may depart from the initial image.
A higher strength value gives the model more "creativity" to generate an image that differs from the initial image;
a strength value of 1.0 means the initial image is more or less ignored.

A lower strength value means the generated image stays closer to the initial image.

The strength and num_inference_steps parameters are related because strength determines how many noise steps are added.
For example, if num_inference_steps is 50 and strength is 0.8, then 40 (50 * 0.8)
steps of noise are added to the initial image, which is then denoised for 40 steps to produce the new image.

The guidance_scale parameter controls how closely the generated image aligns with the text prompt.
A higher guidance_scale value means the generated image is more aligned with the prompt.
A lower guidance_scale value gives the generated image more room to deviate from the prompt.

You can combine guidance_scale with strength for even more precise control over how expressive the model is.
For example, combine a high strength and a high guidance_scale for maximum creativity, or combine a low strength
and a low guidance_scale to generate an image that resembles the initial image but is not as strictly bound to the prompt.

A negative prompt conditions the model NOT to include things in an image, and can be used to improve image quality or modify an image.'''

prompt = "squirrel in the field in the style of picasso"
negative_prompt = "ugly, deformed, disfigured, poor details, bad anatomy"
generator = torch.Generator(device="cuda").manual_seed(30)

# Pass the prompt and the initial image to the pipeline
image = pipeline(
    prompt,
    negative_prompt=negative_prompt,
    image=init_image,
    strength=0.75,
    guidance_scale=12.5,
    generator=generator,
).images[0]
image.save("new_squirrel.png")
# Last expression in the cell, so the notebook displays the side-by-side grid
make_image_grid([init_image, image], rows=1, cols=2)



Loading pipeline components...:   0%|          | 0/7 [00:00<?, ?it/s]

  0%|          | 0/37 [00:00<?, ?it/s]
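
The relationship between strength and num_inference_steps can be checked with a little arithmetic. The helper below is a sketch, assuming the number of denoising steps actually run is the integer part of num_inference_steps * strength, capped at num_inference_steps. The function name img2img_denoising_steps is hypothetical and not part of diffusers.

```python
def img2img_denoising_steps(num_inference_steps: int, strength: float) -> int:
    """Estimate how many denoising steps an image-to-image run performs.

    Assumption: the count is int(num_inference_steps * strength),
    never exceeding num_inference_steps.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return min(int(num_inference_steps * strength), num_inference_steps)


# Compare a few strength values for a 50-step schedule.
for strength in (0.25, 0.75, 0.8, 1.0):
    print(strength, img2img_denoising_steps(50, strength))
```

With 50 steps and the strength of 0.75 used in the cell above, this gives 37, which agrees with the 0/37 total shown in the progress bar.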