<a href="https://colab.research.google.com/github/ZINGALOOME/DiffModel/blob/main/ImageGenerationPersonalProject.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
""" This google colab notebook is going to be dedicated towards an image generation project"""


A diffusion model works by taking an input $x_0$ and gradually adding Gaussian noise to it over $T$ steps. This is called the forward process.

After this, a neural network is trained to recover an image from the noise by reversing the noising process. Once we can model this reverse process, we can generate new data.

# Step 1: Forward diffusion

Diffusion models can be seen as latent variable models. Latent means we are referring to a hidden feature space, which makes them look similar to variational autoencoders.

They are formulated using a Markov chain of $T$ steps. In this context, the Markov chain just means that each step depends only on the previous step.

Take a data point $x_0$ sampled from the real data distribution $q(x)$, written $x_0 \sim q(x)$.

At each step of the Markov chain we add Gaussian noise with variance $\beta_t$ to $x_{t-1}$, producing a new latent variable $x_t$ with distribution $q(x_t \mid x_{t-1})$:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \mu_t = \sqrt{1 - \beta_t}\, x_{t-1},\ \Sigma_t = \beta_t I\right)$$

$q(x_t \mid x_{t-1})$ is still a normal distribution, defined by the mean $\mu_t$ and the covariance $\Sigma_t$, where $\mu_t = \sqrt{1 - \beta_t}\, x_{t-1}$ and $\Sigma_t = \beta_t I$. Here $I$ is the identity matrix, so $\Sigma_t$ is always a diagonal matrix of variances (here $\beta_t$).
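A single forward step can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the notebook's implementation: the function name `forward_step`, the three-value toy "image", and the choice of `beta_t = 1e-4` are all assumptions made for the example.

```python
import math
import random

def forward_step(x_prev, beta_t, rng):
    """Sample x_t ~ N(sqrt(1 - beta_t) * x_prev, beta_t * I), element by element."""
    scale = math.sqrt(1.0 - beta_t)   # shrinks the previous sample slightly
    std = math.sqrt(beta_t)           # standard deviation of the added noise
    return [scale * v + std * rng.gauss(0.0, 1.0) for v in x_prev]

rng = random.Random(0)                # seeded so the example is repeatable
x0 = [0.5, -0.25, 1.0]                # toy "image" with three pixel values (assumed)
x1 = forward_step(x0, beta_t=1e-4, rng=rng)
```

Because the covariance is diagonal, every pixel receives its own independent noise draw, which is why the step can be written as a simple elementwise operation.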

Thus we can go from the input data $x_0$ to $x_T$ in a tractable way. Mathematically, this is the posterior probability and is defined as:

$$q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1})$$

The symbol `:` in $q(x_{1:T})$ states that we apply $q$ repeatedly from timestep $1$ to $T$. The resulting sequence is also called the trajectory.

To avoid having to apply $q$ repeatedly for $T$ steps, we can use the reparameterization trick.
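To make the trajectory concrete, the sketch below applies the single-step transition once per timestep and records every intermediate value. It is only an illustration under stated assumptions: the data point is a single scalar, the noise level is a constant `beta = 0.02` over `T = 200` steps, and `sample_trajectory` is a name chosen for this example.

```python
import math
import random

def sample_trajectory(x0, betas, rng):
    """Return [x_1, ..., x_T] by applying q(x_t | x_{t-1}) once per beta."""
    trajectory = []
    x = x0
    for beta_t in betas:
        # One Markov step: the new value depends only on the previous one.
        x = math.sqrt(1.0 - beta_t) * x + math.sqrt(beta_t) * rng.gauss(0.0, 1.0)
        trajectory.append(x)
    return trajectory

T = 200
betas = [0.02] * T                    # assumed constant noise level
trajectory = sample_trajectory(x0=1.0, betas=betas, rng=random.Random(0))
```

The loop costs one noise draw per timestep, so reaching a late timestep this way means walking through every earlier one first.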

# Step 2: Tractable closed-form sampling at any timestep

Define $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$, and let $\epsilon_0, \dots, \epsilon_{t-2}, \epsilon_{t-1} \sim \mathcal{N}(0, I)$. Applying the reparameterization trick recursively gives

$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon_0$$

$$x_t \sim q(x_t \mid x_0) = \mathcal{N}\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t) I\right)$$
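The closed form can be checked by unrolling the recursion by hand. The following is a sketch of that derivation, writing $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t$ for the product of $\alpha_1, \dots, \alpha_t$. The symbol $\bar{\epsilon}_{t-2}$ is introduced here only to name the merged noise term.

$$
\begin{aligned}
x_t &= \sqrt{\alpha_t}\, x_{t-1} + \sqrt{1 - \alpha_t}\, \epsilon_{t-1} \\
&= \sqrt{\alpha_t \alpha_{t-1}}\, x_{t-2} + \sqrt{\alpha_t (1 - \alpha_{t-1})}\, \epsilon_{t-2} + \sqrt{1 - \alpha_t}\, \epsilon_{t-1} \\
&= \sqrt{\alpha_t \alpha_{t-1}}\, x_{t-2} + \sqrt{1 - \alpha_t \alpha_{t-1}}\, \bar{\epsilon}_{t-2} \\
&= \dots \\
&= \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon_0
\end{aligned}
$$

The third line uses the fact that the sum of two independent zero-mean Gaussians is again Gaussian, with the variances added: $\alpha_t (1 - \alpha_{t-1}) + (1 - \alpha_t) = 1 - \alpha_t \alpha_{t-1}$. Repeating the same merge down to $x_0$ leaves a single standard normal noise term, written $\epsilon_0$ in the last line.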

Since $\beta_t$ is a hyperparameter, we can precompute $\alpha_t$ and the cumulative product $\bar{\alpha}_t$ for all timesteps $t$ and sample $x_t$ in one go.
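A minimal sketch of this precompute-then-sample idea is shown below, again in plain Python. The helper names `cumulative_alphas` and `sample_xt`, the constant schedule of `0.01` over 100 steps, and the toy input are assumptions for illustration only.

```python
import math
import random

def cumulative_alphas(betas):
    """Return the running product of (1 - beta_s) for every timestep."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def sample_xt(x0, t, alpha_bars, rng):
    """Draw x_t directly from q(x_t | x_0) without visiting earlier timesteps."""
    a_bar = alpha_bars[t - 1]                     # t is 1-indexed, the list is 0-indexed
    eps = [rng.gauss(0.0, 1.0) for _ in x0]       # one noise draw per element
    return [math.sqrt(a_bar) * v + math.sqrt(1.0 - a_bar) * e
            for v, e in zip(x0, eps)]

betas = [0.01] * 100                              # assumed constant schedule
alpha_bars = cumulative_alphas(betas)             # computed once, reused for every sample
x_50 = sample_xt([0.5, -0.25, 1.0], t=50, alpha_bars=alpha_bars, rng=random.Random(0))
```

The cumulative products are computed a single time, after which any timestep costs one lookup and one noise draw per element.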

# Step 3: Variance Schedule

The variance parameter $\beta_t$ can be fixed to a constant or chosen as a schedule over the $T$ timesteps. The schedule can be linear, quadratic, cosine, etc. The original DDPM authors used a linear schedule increasing from $\beta_1 = 10^{-4}$ to $\beta_T = 0.02$.
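The three schedule shapes mentioned above could be written roughly as follows. This is a sketch, not a reference implementation: the quadratic variant interpolates linearly in the square root of beta, and the cosine variant follows the commonly used form with a small offset `s = 0.008` and a clipping value of `0.999`. Both of those choices, and all function names, are assumptions made for this example.

```python
import math

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced values from beta_start to beta_end."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + step * i for i in range(T)]

def quadratic_betas(T, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced in sqrt(beta), then squared (assumed definition)."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (T - 1)) ** 2 for i in range(T)]

def cosine_betas(T, s=0.008, max_beta=0.999):
    """Betas implied by a squared-cosine curve for the cumulative alpha product."""
    def f(t):
        return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2
    # beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}, clipped to avoid values near 1
    return [min(1.0 - f(t) / f(t - 1), max_beta) for t in range(1, T + 1)]

T = 1000
schedules = {
    "linear": linear_betas(T),
    "quadratic": quadratic_betas(T),
    "cosine": cosine_betas(T),
}
```

All three return one beta per timestep, so they are interchangeable wherever a list of betas is expected.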

# Step 4: Reverse diffusion

As $T$ approaches infinity, the latent $x_T$ approaches an isotropic Gaussian distribution. This means that if we manage to learn the reverse distribution $q(x_{t-1} \mid x_t)$, we can sample $x_T$ from $\mathcal{N}(0, I)$, run the reverse process, and acquire a sample from $q(x_0)$, generating a novel data point from the original data distribution. But how do we model the reverse process?
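The claim that the final latent is nearly pure noise can be checked numerically. The sketch below assumes the linear schedule quoted earlier (from `1e-4` to `0.02` over `T = 1000` steps) and computes how much of the original signal survives at the last timestep. The variable names are chosen for this example.

```python
import math

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]

# Product of (1 - beta_t) over all timesteps.
alpha_bar_T = 1.0
for beta in betas:
    alpha_bar_T *= 1.0 - beta

signal_scale = math.sqrt(alpha_bar_T)   # factor multiplying x_0 in q(x_T | x_0)
noise_variance = 1.0 - alpha_bar_T      # variance of the Gaussian noise in q(x_T | x_0)
```

With these settings the signal factor is very close to zero and the noise variance is very close to one, which is why starting the reverse process from a standard normal sample is a reasonable approximation.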

# Step 5: Approximating the reverse process with a neural network

In practical terms, we don't know $q(x_{t-1} \mid x_t)$. It is intractable, since estimating it requires computations involving the data distribution.

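Instead, we approximate $q(x_{t-1} \mid x_t)$ with a parameterized model $p_\theta$ (e.g. a neural network). Since $q(x_{t-1} \mid x_t)$ will also be Gaussian for small enough $\beta_t$, we can choose $p_\theta$ to be Gaussian and just parameterize the mean and variance:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$$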
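As a rough illustration of a Gaussian reverse step with a parameterized mean and variance, the sketch below draws one sample of the previous latent given the current one. Everything here is hypothetical: `toy_mean_model` is a placeholder standing in for a trained network, the variance is a fixed constant rather than a learned quantity, and no training is shown.

```python
import math
import random

def toy_mean_model(x_t, t):
    """Hypothetical stand-in for a trained network that predicts the mean."""
    return [0.9 * v for v in x_t]     # placeholder rule, not a learned function

def reverse_step(x_t, t, mean_model, variance, rng):
    """Sample x_{t-1} ~ N(mean_model(x_t, t), variance * I)."""
    mean = mean_model(x_t, t)
    std = math.sqrt(variance)
    return [m + std * rng.gauss(0.0, 1.0) for m in mean]

rng = random.Random(0)
x_t = [rng.gauss(0.0, 1.0) for _ in range(4)]     # start from Gaussian noise
x_prev = reverse_step(x_t, t=10, mean_model=toy_mean_model, variance=0.01, rng=rng)
```

In a real model the mean would come from the network's output for the given `x_t` and `t`, and the generation loop would call a step like this once per timestep, from `T` down to `1`.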

In [None]:
import torch

class Diffusion:

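  # Valid torch device strings are "cuda" and "cpu"; fall back to CPU when no GPU is present.
  def __init__(self, timesteps=1000, img_shape=(3, 128, 128),
               device="cuda" if torch.cuda.is_available() else "cpu"):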

    self.timesteps = timesteps
    self.img_shape = img_shape
    self.device = device
    self.initialise()

  def initialise(self):
    self.beta = self.get_betas()
    self.alpha = 1 - self.beta


    self.sqrt_beta = torch.sqrt(self.beta)
    self.sqrt_alpha = torch.sqrt(self.alpha)

    self.cumprod_alpha = torch.cumprod(self.alpha, dim=0)
    self.cumprod_beta = torch.cumprod(self.beta, dim=0)


    self.sqrt_alpha_cumulative = torch.sqrt(self.cumprod_alpha)
    self.sqrt_beta_cumulative = torch.sqrt(self.cumprod_beta)

    self.one_over_sqrt_alpha = 1/torch.sqrt(self.alpha)
    self.sqrt_one_minus_alpha_cumulative = torch.sqrt(1 - self.cumprod_alpha)


  def get_betas(self):

    scale = 1000/self.timesteps
    beta_start = scale * 1e-4
    beta_end = scale * 0.02

    return torch.linspace(
        beta_start,
        beta_end,
        self.timesteps,
        dtype=torch.float32,
        device=self.device,
    )

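def get(element: torch.Tensor, t: torch.Tensor):
  # Assumed helper (not defined in the original cell): pick the value at index t
  # for each batch item and reshape to (B, 1, 1, 1) so it broadcasts over images.
  return element.gather(-1, t).reshape(-1, 1, 1, 1)

def forward_diffusion(sd: Diffusion, x0: torch.Tensor, timesteps: torch.Tensor):
  eps = torch.randn_like(x0)  # Gaussian noise with the same shape as the batch
  mean = get(sd.sqrt_alpha_cumulative, t=timesteps) * x0
  std_dev = get(sd.sqrt_one_minus_alpha_cumulative, t=timesteps)
  sample = mean + std_dev * eps  # x_t = mean + std_dev * eps (a sum, not a product)
  return sample, eps  # returning eps as well is an assumption, for use as a training target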



