# Diffusion Models 

## Introduction and Formulation

Diffusion models are a class of **Generative AI** models that can generate high-resolution images. They work by gradually adding *Gaussian* noise to the data (forward diffusion process) and then learning to remove that noise (reverse diffusion process). They are related to VAEs (Variational Autoencoders): both are latent-variable models, but in a diffusion model the forward (encoding) process is fixed rather than learned.

<figure>
    <center><img src="img/diffusion.png" width="700" height="200">
    <figcaption>Fig: Forward and Backward Diffusion Process</figcaption></center>
</figure>

## Forward Diffusion

Let $x_0$ be the input image, sampled from the real data distribution $q(x)$ ($x_0 \sim q(x)$). At each step $t$ we add *Gaussian noise* with variance $\beta_t$ according to
$$ q(x_t|x_{t-1}) := \mathcal{N}(x_t ; \sqrt{1-\beta_t} x_{t-1} , \beta_t I) \quad ; \quad q(x_{1:T} | x_0) = \prod_{t=1}^{T}q(x_t|x_{t-1})$$ 

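Here $\beta_t \in (0, 1)$, and the factor $\sqrt{1-\beta_t}$ shrinks the contribution of the previous state $x_{t-1}$ as noise is added. If $x_{t-1}$ has unit variance, then $x_t$ has variance $(1-\beta_t) + \beta_t = 1$, so the scale of the signal does not grow along the chain.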
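
A single forward step can be sketched in NumPy as follows. This is a minimal illustration rather than code from this notebook: the function name `forward_step`, the value `beta_t=0.02`, and the random array standing in for an image are all assumptions. It draws from the Gaussian above by scaling the previous state and adding scaled standard normal noise.

```python
import numpy as np

def forward_step(x_prev, beta_t, rng):
    """Draw x_t ~ N(sqrt(1 - beta_t) * x_prev, beta_t * I)."""
    noise = rng.standard_normal(x_prev.shape)  # epsilon ~ N(0, I)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

rng = np.random.default_rng(0)
# Hypothetical stand-in for an image with pixel values scaled to [-1, 1].
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))
x1 = forward_step(x0, beta_t=0.02, rng=rng)  # 0.02 is an illustrative value
```

With `beta_t = 0` the step returns `x_prev` unchanged, and as `beta_t` approaches 1 the output approaches pure noise.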

Equivalently, $x_t = \sqrt{1-\beta_t}\,x_{t-1} + \sqrt{\beta_t}\,\epsilon_{t-1}$, where each $\epsilon_{t-1} \sim \mathcal{N}(0, I)$ is drawn independently. If we define $\alpha_t = 1-\beta_t$ and $\overline{\alpha}_t = \prod_{s=1}^{t}\alpha_s$, then
$$x_t = \sqrt{\alpha_t}x_{t-1} + \sqrt{1-\alpha_t}\epsilon_{t-1} = \sqrt{\alpha_t\alpha_{t-1}}x_{t-2} + \sqrt{\alpha_t}\sqrt{1-\alpha_{t-1}}\epsilon_{t-2} + \sqrt{1-\alpha_t}\epsilon_{t-1}$$ 
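The last two terms, $\sqrt{\alpha_t}\sqrt{1-\alpha_{t-1}}\epsilon_{t-2} + \sqrt{1-\alpha_t}\epsilon_{t-1}$, are a sum of two independent zero-mean Gaussians, so they form a single Gaussian whose variance is the sum of the two variances: $\alpha_t(1-\alpha_{t-1}) + (1-\alpha_t) = 1-\alpha_t\alpha_{t-1}$. The sum therefore has the same distribution as $\sqrt{1-\alpha_t \alpha_{t-1}}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$. Following the recursion down to $x_0$, we obtain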
$$x_t = \sqrt{\overline{\alpha}_t}x_0 + \sqrt{1-\overline{\alpha}_t}\epsilon_0$$

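That is, $q(x_t | x_0) = \mathcal{N}(x_t ; \sqrt{\overline{\alpha}_t}\,x_0, (1-\overline{\alpha}_t) I)$, so we can sample $x_t$ directly from $x_0$ in a single step with this **reparametrization trick**, instead of iterating through all $t$ intermediate steps.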
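
The closed-form sampling can be sketched as below. The linear schedule from `1e-4` to `0.02` over `T = 1000` steps is an assumption (a commonly used choice, not one specified in this text), and `q_sample` is a hypothetical helper name. The cumulative product implements $\overline{\alpha}_t$ with timesteps indexed from 1.

```python
import numpy as np

T = 1000
# Assumed linear noise schedule; betas[t - 1] holds beta_t for t = 1..T.
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
# alpha_bars[t - 1] = prod_{s=1}^{t} alpha_s
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot for a timestep t in 1..T.

    Returns the noisy sample together with the noise that was used.
    """
    a_bar = alpha_bars[t - 1]
    eps = rng.standard_normal(x0.shape)  # epsilon ~ N(0, I)
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return x_t, eps

rng = np.random.default_rng(0)
# Hypothetical stand-in for an image with pixel values scaled to [-1, 1].
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))
x_mid, _ = q_sample(x0, t=250, rng=rng)
x_last, _ = q_sample(x0, t=T, rng=rng)
```

Under this assumed schedule $\overline{\alpha}_T$ is close to zero, so `x_last` is almost pure Gaussian noise, while `x_mid` still retains part of the original signal.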

## Backward Diffusion

## Implementation

### Random Noise

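```python
from datasets import load_dataset

# Download the butterflies image dataset from the Hugging Face Hub.
# The dataset ships a "train" split, so that is the one requested here.
dataset = load_dataset("huggan/smithsonian_butterflies_subset", split="train")
dataset
```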
