## Play with Diffusion Models

This section is for people who haven't played with diffusion models before. Have fun!

Here are some helpful materials:

- Math ideas: [Diffusion Models](https://lilianweng.github.io/posts/2021-07-11-diffusion-models/)
- Hugging Face tutorial: [Using Diffusers](https://huggingface.co/docs/diffusers/using-diffusers/write_own_pipeline)
- Web UI: [A Flexible Platform](https://github.com/AUTOMATIC1111/stable-diffusion-webui)


In [None]:
# Required packages
import torch
from PIL import Image
# Pipelines for text2img and img2img
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

In [None]:
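# Run on the GPU (the pipelines below are loaded in float16, so a CUDA device is assumed)
device = "cuda"
# Set a seed for PyTorch so that results are easier to reproduce
torch.manual_seed(2024)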

In [None]:
# Load Stable Diffusion v1.5
model_id_or_path = "runwayml/stable-diffusion-v1-5"
# The safety checker is disabled here; remove safety_checker=None to turn it back on
pipe = StableDiffusionPipeline.from_pretrained(model_id_or_path, safety_checker=None, torch_dtype=torch.float16)
pipe.to(device);

### The first part is to create two base images for our analysis. Feel free to change the prompts as you prefer.

In [None]:
# Prompts for the two base images
prompt = ['a dog', 'a cat']
# Guidance scale for text2img, default: 7.5
# It controls how strongly the image is pushed towards the text prompt (encoded by CLIP)
# Feel free to play with guidance scales in [1, inf)
guidance_scale = 7.5
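
For intuition, classifier-free guidance is commonly written as `uncond + guidance_scale * (text - uncond)`, applied to the model's two noise predictions at every denoising step. The sketch below uses plain Python lists and toy numbers in place of real U-Net outputs; `apply_guidance` is a hypothetical helper for illustration, not a diffusers function.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Combine unconditional and text-conditioned noise predictions."""
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]

# Toy numbers standing in for the two noise predictions (assumption, not real model output)
noise_uncond = [0.0, 1.0, 2.0]
noise_text = [1.0, 1.0, 0.0]

# A scale of 1 returns the text-conditioned prediction unchanged
same_as_text = apply_guidance(noise_uncond, noise_text, 1.0)
# A larger scale exaggerates the difference between the two predictions
amplified = apply_guidance(noise_uncond, noise_text, 7.5)
```

Under this formula, a scale of 1 means "no extra guidance", which is one way to read the suggested lower bound of the range above.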

In [None]:
# Images generated by the pipeline,
# denoised step by step from pure Gaussian noise
images = pipe(prompt=prompt, guidance_scale=guidance_scale).images

In [None]:
# Save the images; feel free to comment this out if you want to use the images provided
images[0].save('dog+.jpg')
images[1].save('cat+.jpg')

In [None]:
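# Play with a negative guidance scale
# Note: diffusers appears to apply classifier-free guidance only when guidance_scale > 1,
# so a negative value may simply switch guidance off rather than push away from the prompt
images_negative = pipe(prompt=prompt, guidance_scale=-guidance_scale).images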

In [None]:
# Save the images; feel free to comment this out if you want to use the images provided
images_negative[0].save('dog-.jpg')
images_negative[1].save('cat-.jpg')

From the CLIP model's point of view, semantic opposites such as cat and dog are not captured: negating the guidance for "a dog" does not produce a cat. It seems that CLIP captures covariance rather than a strict word-embedding relationship.
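
One way to make "semantic opposite" concrete is cosine similarity: if two embeddings were strict opposites, their cosine similarity would be close to -1. The sketch below only shows the measurement itself, with toy vectors standing in for text embeddings; it does not load CLIP, and the vectors are assumptions chosen for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (hypothetical, not real CLIP embeddings)
v = [1.0, 2.0, 3.0]
opposite = [-x for x in v]    # a strict opposite: similarity close to -1
related = [1.0, 2.0, 2.5]     # a related concept: similarity close to +1

sim_opposite = cosine_similarity(v, opposite)
sim_related = cosine_similarity(v, related)
```

To test the claim above, one could replace the toy vectors with the CLIP text embeddings of "a dog" and "a cat" and see which of the two cases they resemble.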

### The second part is to do img2img variations on the base images

In [None]:
# Load the base images
dog_image = Image.open('dog+.jpg')
cat_image = Image.open('cat+.jpg')

In [None]:
dog_image

In [None]:
# Load img2img pipeline
torch.cuda.empty_cache()
pipe_img2img = StableDiffusionImg2ImgPipeline.from_pretrained(model_id_or_path, safety_checker=None, torch_dtype=torch.float16)
pipe_img2img.to(device);

In [None]:
# Set where we start removing noise.
# The value is between 0 and 1; if it is 1, the input image is ignored entirely.
# Default: 0.8
strength = 0.8
# Set how strongly we condition on the prompt
guidance_scale = 7.5
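
To see what `strength` does in practice: the img2img pipeline adds noise to the input image and then runs only part of the denoising schedule. The sketch below approximates how many steps are actually run; `denoising_steps` is a hypothetical helper written for this notebook, and the rounding is an assumption modelled on the diffusers img2img pipeline rather than a guaranteed match for every version.

```python
def denoising_steps(num_inference_steps, strength):
    """Approximate number of denoising steps actually run for a given strength."""
    return min(int(num_inference_steps * strength), num_inference_steps)

# With a 50-step schedule (the usual default), strength = 0.8 runs about 40 steps
steps_default = denoising_steps(50, 0.8)
# strength = 1.0 runs the full schedule, so the input image is effectively ignored
steps_full = denoising_steps(50, 1.0)
# strength = 0.0 runs no steps, so the input image is returned almost unchanged
steps_none = denoising_steps(50, 0.0)
```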

In [None]:
# Generate image variations for the dog
# + Image + Text
dog_images_pp = pipe_img2img(prompt='a dog', image=dog_image, strength=strength,
                             guidance_scale=guidance_scale, num_images_per_prompt=1).images[0]

In [None]:
# You can see that some random changes have been applied
# Feel free to change the value of strength to see what happens
dog_images_pp

In [None]:
# + Image - Text
dog_images_pn = pipe_img2img(prompt='', negative_prompt='a dog', image=dog_image, strength=strength,
                             guidance_scale=guidance_scale, num_images_per_prompt=1).images[0]

In [None]:
dog_images_pn

In [None]:
# Here we don't use the pipeline's negative_prompt argument.
# Instead, we offer a prompt that is "negative" from a human perspective.
dog_images_pn_human = pipe_img2img(prompt='a cat', image=dog_image, strength=strength,
                                   guidance_scale=guidance_scale, num_images_per_prompt=1).images[0]

In [None]:
dog_images_pn_human

In [None]:
# - Image + Text
# Use a new name (np = negative image, positive text) so the earlier dog_images_pn is not overwritten
dog_images_np = pipe_img2img(prompt='a dog', image=cat_image, strength=strength,
                             guidance_scale=guidance_scale, num_images_per_prompt=1).images[0]

In [None]:
dog_images_np

In [None]:
# - Image - Text
dog_images_nn = pipe_img2img(prompt='', negative_prompt='a dog', image=cat_image, strength=strength,
                             guidance_scale=guidance_scale, num_images_per_prompt=1).images[0]

In [None]:
dog_images_nn

### The third part is to add a ControlNet section to see how we can preserve particular information from the original image while still making random changes. Here, we take edges as an example.

In [None]:
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
import cv2
import numpy as np

In [None]:
# Get edges with Canny edge detection and use them as the condition
low_threshold = 100
high_threshold = 200

canny_edge = cv2.Canny(np.array(dog_image), low_threshold, high_threshold)
# Canny returns a single-channel map; repeat it to get a 3-channel image
canny_edge = canny_edge[:, :, None]
canny_edge = np.concatenate([canny_edge, canny_edge, canny_edge], axis=2)
canny_edge = Image.fromarray(canny_edge)
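
The two thresholds drive the hysteresis stage of Canny: gradients above the high threshold are kept as strong edges, gradients below the low threshold are dropped, and those in between are kept only if they connect to a strong edge. The sketch below is a simplified 1-D illustration of that rule on toy gradient magnitudes; `hysteresis_1d` is a hypothetical helper, not OpenCV's implementation, which works in 2-D and also performs smoothing and non-maximum suppression.

```python
def hysteresis_1d(magnitudes, low, high):
    """Keep strong pixels (> high) and weak pixels (> low) connected to a kept pixel."""
    n = len(magnitudes)
    keep = [m > high for m in magnitudes]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            if keep[i] or magnitudes[i] <= low:
                continue
            # A weak pixel survives only if a neighbour is already kept
            if (i > 0 and keep[i - 1]) or (i < n - 1 and keep[i + 1]):
                keep[i] = True
                changed = True
    return [255 if k else 0 for k in keep]

# Toy gradient magnitudes (assumption): one strong peak with weak neighbours, one isolated weak pixel
toy_magnitudes = [50, 150, 250, 150, 50, 150, 50]
edges = hysteresis_1d(toy_magnitudes, low=100, high=200)
```

Lowering `low_threshold` lets more weak edges attach to strong ones, while raising `high_threshold` reduces the number of strong edges that seed them.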

In [None]:
canny_edge

In [None]:
torch.cuda.empty_cache()

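controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe_canny = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, safety_checker=None, torch_dtype=torch.float16
)
# Use a faster scheduler so that 20 inference steps are enough
pipe_canny.scheduler = UniPCMultistepScheduler.from_config(pipe_canny.scheduler.config)
# Requires the xformers package; comment this line out if it is not installed
pipe_canny.enable_xformers_memory_efficient_attention()
# Requires the accelerate package; moves sub-models to the GPU only when they are needed
pipe_canny.enable_model_cpu_offload()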

In [None]:
# We offer both the Canny edges and the text
dog_images_canny = pipe_canny('a dog', image=canny_edge, num_inference_steps=20, num_images_per_prompt=1).images[0]

In [None]:
dog_images_canny

In [None]:
# Based on the edge information alone, let the model guess the content
dog_images_canny_guess = pipe_canny("", image=canny_edge, num_inference_steps=20, num_images_per_prompt=1, guess_mode=True, guidance_scale=3.0).images[0]

In [None]:
dog_images_canny_guess