# <center>Introduction to Image Generation</center>


***This notebook provides an introduction to Diffusion Models.***


While many approaches have been implemented for image generation, some of the more promising model families over time have been:

- **Variational Autoencoders (VAEs)**:
    - *Encode images to a compressed size, then decode them back to the original size while learning the distribution of the data itself.*

- **Generative Adversarial Networks (GANs)**:
    - *Pit two neural networks against each other: One neural network (**The generator**) creates images and the other neural network (**The discriminator**) predicts if the image is real or fake.*
    - *Over time, the discriminator gets better at distinguishing between real and fake, and the generator gets better at creating real-looking fakes.*

- **Autoregressive Models**:
    - *Generate images by treating an image as a sequence of pixels.*
    - The modern approach to autoregressive models draws much of its inspiration from how LLMs handle text.

One of the newer generative model families is <span style="font-size: 1.05em;">**Diffusion Models**</span>.

Diffusion Models draw inspiration from physics, specifically thermodynamics, and have shown promise across a number of different use cases.

- **Unconditioned Diffusion Models**:
    - Models with no additional input or instruction can be trained on images of a specific thing to generate new images of that thing.
        - Ex: *Human face synthesis*, *Super-resolution (increasing image quality)*

- **Conditioned Diffusion Models**:
    - Give us things like text-to-image, generating an image from a text prompt.
        - Ex: "*Mona Lisa with cat face*"
    - Image editing or customizing an image with a text prompt.
        - Ex: "*Remove the woman from the image*"
    - Text-guided image to image.
        - Ex: "*Disco dancer with colorful lights*"

### <center>**Diffusion Model: What is it?**</center>

It works quite differently from the other image generation approaches:
- Destroy structure in a data distribution through an **iterative forward diffusion process**.
- **Learn a reverse diffusion process** that restores structure in data.

![image.png](attachment:image.png)

### **Denoising Diffusion Probabilistic Models (DDPM)**

The goal is that, by training to denoise, the model will be able to take in *Pure Noise* and synthesize a novel image from it.

I know there's a bit of math notation, so let's break it down a little bit.

#### **Diffusion Process adds noise to images**

We start with a large dataset of images; let's take a single image, shown here on the right. We start the ***forward diffusion*** process to go from $X_0$ (the initial image) to $X_1$ (the initial image with a little bit of noise).

We can do this over and over again, iteratively adding more and more noise to the image.

This distribution, which we call **q**, depends only on the previous step. We can apply it over and over, adding more noise each time, and, ideally, once we do this for a high enough T, we reach a state of **Pure Noise**. (*The initial research paper implemented this with T=1000*)
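
The forward process can be sketched in a few lines of NumPy. This is a minimal illustration, not the notebook's own implementation: the Gaussian form of **q**, the linear noise schedule, the values `beta_start=1e-4` and `beta_end=0.02`, and all function names are assumptions introduced here.

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Noise variance added at each step (assumed linear schedule)."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_step(x_prev, beta_t, rng):
    """One step of q: shrink the signal slightly, then add Gaussian noise."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise

def forward_diffusion(x0, betas, rng):
    """Apply the forward step once per beta and return the final sample X_T."""
    x = x0
    for beta_t in betas:
        x = forward_step(x, beta_t, rng)
    return x

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for an image scaled to [-1, 1]
betas = linear_beta_schedule()            # T = 1000 steps
x_T = forward_diffusion(x0, betas, rng)

# Fraction of the original image that survives in X_T under this schedule.
signal_left = float(np.prod(np.sqrt(1.0 - betas)))
```

Each step depends only on the previous sample, and with this schedule `signal_left` is well below 1%, which is why $X_T$ can be treated as pure noise.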


![image.png](attachment:image.png)

Now, we want to do this in **reverse**. 

**How do we go from $X_t$ (noisy image) to $X_{t-1}$ (slightly less noisy image)?**

We train a machine learning model that takes the noisy image and the time step t as input and predicts the noise.

### **DDPM Training**

We can visualize a training step of this model as:

- We have our initial image $X_0$ and we sample a time step t to create a noisy image.
- Then, we train a denoising model to predict the noise. 
    - This model is trained to minimize the difference between the predicted noise and the actual noise added to the image.
    > In other words, this model is able to remove noise from real images.

![image.png](attachment:image.png)
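
A single training step might look like the sketch below. It assumes Gaussian noise with a linear schedule, uses the standard shortcut of jumping from $X_0$ directly to the noisy image at step t, and measures the difference with mean squared error. `predict_noise` is a hypothetical stand-in for the denoising network, and every name here is illustrative rather than taken from a specific library.

```python
import numpy as np

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative fraction of signal variance kept after each step (assumed schedule)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bars, rng):
    """Jump straight from X_0 to the noisy image at step t; also return the noise used."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return x_t, noise

def training_loss(predict_noise, x0, alpha_bars, rng):
    """Sample a time step, noise the image, and score the noise prediction with MSE."""
    t = int(rng.integers(0, len(alpha_bars)))
    x_t, noise = add_noise(x0, t, alpha_bars, rng)
    predicted = predict_noise(x_t, t)
    return float(np.mean((predicted - noise) ** 2))

def zero_predictor(x_t, t):
    """Hypothetical untrained model: always predicts 'no noise'."""
    return np.zeros_like(x_t)

rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for a training image
loss = training_loss(zero_predictor, x0, alpha_bars, rng)
```

In a real setup, `predict_noise` would be a neural network and `loss` would be minimized with gradient descent over many images and time steps.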

### **DDPM Generation**

- To generate an image, we can start with **Pure Noise** and send it through our denoising model.
- We can then take the predicted noise and subtract it from the noisy input. If we do this iteratively, over and over, we end up with a generated image.
    > **Another way to look at this**: *The model is able to learn the real data distribution of images it has seen and sample from that learned distribution to create new, novel images.*
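
The generation loop can be sketched as follows. This is a hedged illustration under the same assumed Gaussian, linear-schedule setup, using the commonly described DDPM-style update (remove a scaled share of the predicted noise, then add a little fresh noise on every step except the last). Because no trained network is available here, `make_single_image_oracle` is a hypothetical stand-in that returns the exact noise for a "dataset" containing one image.

```python
import numpy as np

def make_single_image_oracle(target, alpha_bars):
    """Hypothetical stand-in for a trained model: exact noise when the data is one image."""
    def predict_noise(x_t, t):
        return (x_t - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])
    return predict_noise

def sample(predict_noise, shape, betas, rng):
    """Start from pure noise and denoise step by step, from the last step down to 0."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)  # pure noise
    for t in reversed(range(len(betas))):
        eps = predict_noise(x, t)
        # Remove the predicted noise's share, then rescale to the previous step.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # Fresh noise keeps sampling stochastic; none is added on the final step.
        z = rng.standard_normal(shape) if t > 0 else np.zeros(shape)
        x = mean + np.sqrt(betas[t]) * z
    return x

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)              # assumed linear schedule
target = rng.uniform(-1.0, 1.0, size=(8, 8))       # the single "training image"
oracle = make_single_image_oracle(target, np.cumprod(1.0 - betas))
generated = sample(oracle, target.shape, betas, rng)
```

With this oracle the loop recovers `target`, since that is the only image in the assumed data distribution. A network trained on many images would instead produce new samples from the distribution it learned.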


## **Recent Advancements**

- There have been many advancements in this space in just a few years. Many exciting new technologies on Vertex AI for image generation are underpinned by diffusion models, and lots of work has been done to generate images *faster* and with *more control*.

- We have also seen wonderful results combining the power of diffusion models with the power of LLMs for incredible context-aware, photorealistic image generation.
    - One great example of this: Imagen from Google Research.
        - Although it is more complicated than what we discussed here, at its core it's a composition of an LLM and a few diffusion-based models.

![image.png](attachment:image.png)