## DDIM with similar noise

In this notebook, we run experiments with pairs of noise images that differ only slightly or only in a small region. We then examine how much, and where, the generated clear images differ and attempt to interpret the results.

As you will see, we often display images alternately and provide both **heatmaps** and **Euclidean distances** to help you precisely identify the degree and location of image differences.

## Setup

First, we initialize our model. Note that we have chosen a DDIM scheduler. This is crucial. Unlike Denoising Diffusion Probabilistic Models (DDPM), DDIM sampling as used here is deterministic: from one specific full noise image, it will always generate the same clear image. If we used a stochastic sampler such as DDPM, fresh noise would be added back into the image during the backward steps, so the results could vary considerably even with the same starting point.
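
The difference between deterministic and stochastic sampling can be illustrated with a toy reverse process on a single number. This is only a sketch under stated assumptions: `toy_eps_model` is a hypothetical stand-in for the U-Net, the cumulative alpha values are made up, and `eta` interpolates between DDIM-style sampling (`eta = 0`, no noise re-injected) and DDPM-like sampling (`eta = 1`).

```python
import math
import random

def toy_eps_model(x):
    # Hypothetical stand-in for the U-Net's noise prediction (not a trained model).
    return math.tanh(x)

def sample(x_start, eta, seed, abars=(0.01, 0.1, 0.4, 0.8, 0.999)):
    """Toy reverse process on a scalar; abars are cumulative alphas, noisy -> clean."""
    rng = random.Random(seed)
    x = x_start
    for abar_t, abar_prev in zip(abars[:-1], abars[1:]):
        eps = toy_eps_model(x)
        # clean sample implied by the current noise prediction
        x0 = (x - math.sqrt(1 - abar_t) * eps) / math.sqrt(abar_t)
        # standard deviation of the noise re-injected at this step (0 when eta = 0)
        sigma = eta * math.sqrt((1 - abar_prev) / (1 - abar_t) * (1 - abar_t / abar_prev))
        direction = math.sqrt(max(1 - abar_prev - sigma**2, 0.0)) * eps
        x = math.sqrt(abar_prev) * x0 + direction + sigma * rng.gauss(0, 1)
    return x

# same starting value, two different random seeds
deterministic = [sample(0.7, eta=0.0, seed=s) for s in (1, 2)]
stochastic = [sample(0.7, eta=1.0, seed=s) for s in (1, 2)]
print("eta=0 results identical:", deterministic[0] == deterministic[1])
print("eta=1 results identical:", stochastic[0] == stochastic[1])
```

With `eta = 0` the random seed has no influence on the result, which is the property the experiments in this notebook rely on.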

**Note on performance:** If the image generation process is too slow, you can decrease the number of sampling steps. This will reduce computation time but may also lower the quality of the generated images. To do this, choose a lower value in `ddim_scheduler.set_timesteps(50)` (see code cell 3).
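
To get a feeling for what a lower step count means, the following sketch lists which timesteps would be visited. It assumes 1000 training timesteps and an evenly spaced selection that starts near the noisy end; `leading_timesteps` is a hypothetical helper, and the scheduler's actual spacing depends on its configuration.

```python
def leading_timesteps(num_train_timesteps, num_inference_steps):
    """Evenly spaced timesteps from noisy to clean (an assumed spacing rule)."""
    step = num_train_timesteps // num_inference_steps
    return [i * step for i in range(num_inference_steps)][::-1]

for steps in (50, 25, 10):
    ts = leading_timesteps(1000, steps)
    # each timestep costs one U-Net forward pass per image
    print(steps, "steps ->", ts[:3], "...", ts[-1], "| U-Net calls for two images:", 2 * steps)
```

Halving the number of steps roughly halves the computation time, because the cost is dominated by the U-Net forward passes.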

In [None]:
from functions import *

In [None]:
import torch
import diffusers
from PIL import Image
from tqdm import tqdm
import os

In [None]:
# setup
model_id = "google/ddpm-bedroom-256" # "google/ddpm-celebahq-256"
model = diffusers.UNet2DModel.from_pretrained(model_id)
ddim_scheduler = diffusers.DDIMScheduler.from_pretrained(model_id)
# time steps for generation process, can be decreased to save computation time, images might be of lower quality
ddim_scheduler.set_timesteps(50)

In [None]:
# input preparation
image_size = model.config.sample_size # get image size
noise = torch.randn((1, 3, image_size, image_size)) # sample random noise

## Experiment 1

In this experiment, we will modify a small patch within a full noise image and then generate two distinct images from it. You can customize the patch's size and position by adjusting the `patch_size`, `patch_position_x`, and `patch_position_y` variables in `/src/config.yml`. As you run the code, pay close attention to the extent and, more importantly, **the location** of the differences in the generated images.

We begin by generating two noise images that differ only in a small region; these then serve as inputs to the image generation (denoising) process.
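
Before running the cell, it can help to know roughly which Euclidean distance to expect. The sketch below assumes unit-variance Gaussian noise and three colour channels; both function names are hypothetical. Because only the patch differs, the distance depends on the patch size and not on the image size.

```python
import math
import random

def expected_patch_distance(patch_size, channels=3):
    """Approximate Euclidean distance when a patch of N(0, 1) noise is resampled."""
    # each replaced value changes by (a - b) with a, b ~ N(0, 1), so E[(a - b)^2] = 2
    return math.sqrt(2 * channels * patch_size**2)

def simulated_patch_distance(patch_size, channels=3, seed=0):
    """Draw old and new patch values and measure the distance directly."""
    rng = random.Random(seed)
    n = channels * patch_size**2
    return math.sqrt(sum((rng.gauss(0, 1) - rng.gauss(0, 1)) ** 2 for _ in range(n)))

print(expected_patch_distance(32))   # sqrt(6144), about 78.4
print(simulated_patch_distance(32))  # a single random draw, close to the value above
```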

In [None]:
# let the user choose patch size and position
patch_size, patch_position_x, patch_position_y = user_input_patch()

# prepare input
noise = torch.randn((1, 3, image_size, image_size)) # sample random noise
noises = [noise.clone() for _ in range(2)] # duplicate noise
# replace a small patch in the second noise image with freshly sampled noise
y0, x0 = patch_position_y, patch_position_x
noises[1][:, :, y0:y0+patch_size, x0:x0+patch_size] = torch.randn((1, 3, patch_size, patch_size))
euclidean = torch.norm(noises[0] - noises[1]).item() # Euclidean distance between the two noise images
# heatmap: absolute difference averaged over the colour channels, repeated to 3 channels for display
heatmap = torch.abs(noises[0] - noises[1]).mean(dim=1, keepdim=True).repeat(1, 3, 1, 1)
show_table([[tensor_as_html(noises[0]), tensor_as_html(noises[1]), tensor_as_html(heatmap)], ["", f"Euclidean distance: {euclidean}", "Heatmap"]])

In [None]:
# display the images alternately
from time import sleep
for i in range(10):
    show_images(noises[i % 2])
    sleep(0.5)

Now, the image generation process is executed. Please examine the outputs closely. You will observe that altering a small patch in the input noise image affects not only that specific patch in the generated image but the entire image. This suggests that the model has learned long-range dependencies between pixels from the training dataset, so a local change in the noise can influence the image globally.
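
One way to put a number on "where" the differences are is to ask which fraction of the heatmap falls inside the modified patch. The helper below is a hypothetical sketch that works on a plain 2-D list so that it runs without the model; it is not part of the notebook's `functions` module.

```python
def patch_share(heatmap, x, y, size):
    """Fraction of the summed heatmap values that fall inside the patch.

    heatmap is a 2-D list of non-negative per-pixel differences (rows, then columns).
    """
    total = sum(sum(row) for row in heatmap)
    inside = sum(sum(row[x:x + size]) for row in heatmap[y:y + size])
    return inside / total if total > 0 else 0.0

# toy 4x4 heatmap: all of the difference sits in the 2x2 patch at (x=1, y=1)
local = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
# toy heatmap with the same difference everywhere
spread = [[1] * 4 for _ in range(4)]
print(patch_share(local, 1, 1, 2))   # 1.0
print(patch_share(spread, 1, 1, 2))  # 0.25
```

For the notebook's tensors, a 2-D list could be obtained with `torch.abs(images[0]-images[1]).mean(dim=1)[0].tolist()`. A share close to the patch's share of the image area, `patch_size**2 / image_size**2`, would indicate that the differences are spread over the whole image rather than concentrated in the patch.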

In [None]:
# output generation
images = list()
for current in noises:
    for t in tqdm(ddim_scheduler.timesteps):
        with torch.no_grad():
            predicted_noise = model(current, t).sample
            current = ddim_scheduler.step(predicted_noise, t, current).prev_sample
    images.append(current)

In [None]:
# show output
euclidean = torch.norm(images[0] - images[1]).item() # Euclidean distance between the two generated images
# heatmap: absolute difference averaged over the colour channels, repeated to 3 channels for display
heatmap = torch.abs(images[0] - images[1]).mean(dim=1, keepdim=True).repeat(1, 3, 1, 1)
show_table([[tensor_as_html(images[0]), tensor_as_html(images[1]), tensor_as_html(heatmap)], ["", f"Euclidean distance: {euclidean}", "Heatmap"]])

In [None]:
# save input and output
os.makedirs("../output", exist_ok=True) # make sure the output directory exists
for i in range(len(noises)):
    tensor_as_image(noises[i]).save(f"../output/similar_ddim_noise_{i}.png")
    tensor_as_image(images[i]).save(f"../output/similar_ddim_image_{i}.png")

## Experiment 2

In this experiment, we will modify the full noise image such that the Euclidean distance between the two input noise images is small. You can control the degree to which the noises differ by adjusting the `noise_scaling_factor` variable in `/src/config.yml`.

We will again begin by generating two noise images, but unlike the first experiment, they will differ slightly across the entire image rather than heavily in a small patch.
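
The code cell below blends the original noise with fresh noise using the weights `sqrt(1 - s^2)` and `s`, where `s` is the scaling factor. The following sketch checks empirically, on plain lists of Gaussian samples, that this blend keeps unit variance and that its correlation with the original noise is about `sqrt(1 - s^2)`. The function `mix` and the value `s = 0.3` are illustrative assumptions.

```python
import math
import random

def mix(noise, fresh, s):
    """Blend two noise samples with weights sqrt(1 - s^2) and s."""
    return [math.sqrt(1 - s**2) * a + s * b for a, b in zip(noise, fresh)]

rng = random.Random(0)
n = 50_000
noise = [rng.gauss(0, 1) for _ in range(n)]
fresh = [rng.gauss(0, 1) for _ in range(n)]
s = 0.3
mixed = mix(noise, fresh, s)

# both quantities assume zero-mean samples, so no mean is subtracted
variance = sum(v * v for v in mixed) / n                    # stays close to 1
correlation = sum(a * b for a, b in zip(noise, mixed)) / n  # close to sqrt(1 - s^2)
print(round(variance, 2), round(correlation, 2), round(math.sqrt(1 - s**2), 2))
```

Keeping the variance at 1 matters because the model expects its starting noise to look like a standard Gaussian sample; simply adding scaled noise would increase the variance slightly.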

In [None]:
# let the user choose a scaling factor
noise_scaling_factor = user_input_noise_scaling_factor()

# prepare input
noise = torch.randn((1, 3, image_size, image_size)) # sample random noise
noises = [noise.clone() for _ in range(2)] # duplicate noise
# blend fresh noise into the second image; the sqrt(1 - s^2) weight keeps the result at unit variance
noises[1] = ((1-noise_scaling_factor**2)**0.5) * noises[1] + noise_scaling_factor * torch.randn((1, 3, image_size, image_size))
euclidean = torch.norm(noises[0] - noises[1]).item() # Euclidean distance between the two noise images
# heatmap: absolute difference averaged over the colour channels, repeated to 3 channels for display
heatmap = torch.abs(noises[0] - noises[1]).mean(dim=1, keepdim=True).repeat(1, 3, 1, 1)
show_table([[tensor_as_html(noises[0]), tensor_as_html(noises[1]), tensor_as_html(heatmap)], ["", f"Euclidean distance: {euclidean}", "Heatmap"]])

Now, the actual image generation is executed. Please examine the outputs closely. As you will see, slightly changing the input noise leads to only small changes in the resulting clear image.

In [None]:
# output generation
images = list()
for current in noises:
    for t in tqdm(ddim_scheduler.timesteps):
        with torch.no_grad():
            predicted_noise = model(current, t).sample
            current = ddim_scheduler.step(predicted_noise, t, current).prev_sample
    images.append(current)

In [None]:
# show output
euclidean = torch.norm(images[0] - images[1]).item() # Euclidean distance between the two generated images
# heatmap: absolute difference averaged over the colour channels, repeated to 3 channels for display
heatmap = torch.abs(images[0] - images[1]).mean(dim=1, keepdim=True).repeat(1, 3, 1, 1)
show_table([[tensor_as_html(images[0]), tensor_as_html(images[1]), tensor_as_html(heatmap)], ["", f"Euclidean distance: {euclidean}", "Heatmap"]])

Now that you're familiar with the two experiments, you can perform your own by changing the relevant variables in `/src/config.yml`.

Some interesting research questions you might explore include:

* Is there a correlation between the Euclidean distance of the noise images and that of the clear images?
* Is the Euclidean distance of the noise images generally smaller or larger than that of the clear images? Try to explain this based on how the diffusion process works.
* Given a similar Euclidean distance of noise images in both experiments, is the result altered more significantly in the patch condition or the full noise condition?
* In Experiment 2: How much do you need to change the noise image to obtain substantially different results?
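
To compare the two experiments at a similar noise distance, as the third question asks, it helps to choose a scaling factor whose expected distance matches that of a given patch. The sketch below uses approximate closed forms for unit-variance Gaussian noise; the function names are hypothetical, and the image size of 256 is an assumption based on the model name.

```python
import math

def patch_distance(patch_size, channels=3):
    """Approximate noise distance in the patch experiment."""
    return math.sqrt(2 * channels * patch_size**2)

def scaling_distance(s, image_size, channels=3):
    """Approximate noise distance in the scaling experiment."""
    n = channels * image_size**2
    # per element: (sqrt(1 - s^2) - 1) * a + s * b has variance 2 - 2 * sqrt(1 - s^2)
    return math.sqrt(n * (2 - 2 * math.sqrt(1 - s**2)))

def scaling_factor_for(distance, image_size, channels=3):
    """Scaling factor whose approximate noise distance equals `distance`."""
    n = channels * image_size**2
    c = 1 - distance**2 / (2 * n)  # required correlation between the two noise images
    if not 0 <= c <= 1:
        raise ValueError("distance is out of range for this image size")
    return math.sqrt(1 - c**2)

s = scaling_factor_for(patch_distance(32), image_size=256)
print(round(s, 4))                                # about 0.1761
print(scaling_distance(s, 256), patch_distance(32))  # the two distances agree
```

Under these assumptions, a 32x32 patch corresponds to a scaling factor of roughly 0.18, which gives a starting point for a like-for-like comparison of the two conditions.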