# Hugging Face Diffusers

The Hugging Face Diffusers library is designed to be a user-friendly and flexible toolbox for building diffusion systems tailored to your use case. At the core of the toolbox are models and schedulers.

The [DiffusionPipeline](https://huggingface.co/docs/diffusers/main/en/api/pipelines/overview#diffusers.DiffusionPipeline) bundles these components together for convenience, so inference takes only a few lines of code.

We will also unbundle the pipeline and use the model and scheduler separately, which is how you create new diffusion systems.

We will use the google/ddpm-cat-256 model, which is only about 450 MB.

### Contents
0. Installs and imports
1. Use pipeline
2. Break down the denoising process


CUDA: If you have an NVIDIA GPU, you can speed up inference by using NVIDIA's CUDA. See https://developer.nvidia.com/cuda-toolkit. For now I assume we do not use it, so I have commented out the CUDA code.
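
If you are unsure whether CUDA is available, one option is to choose the device at runtime instead of editing the commented-out `.to("cuda")` calls by hand. The sketch below is only an assumption about how you might wire this up: `pick_device` is a hypothetical helper, not part of diffusers, and it falls back to the CPU when PyTorch is not installed yet (the install cell comes later in this notebook).

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed yet, so default to the CPU
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

You could then write `.to(device)` wherever this notebook has a commented-out `.to("cuda")`.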

Acknowledgements: This notebook is based on a Hugging Face tutorial that can be found here: https://huggingface.co/docs/diffusers/using-diffusers/write_own_pipeline

## 0. Installs and imports

In [None]:
!pip install diffusers --upgrade

In [None]:
!pip install torch --upgrade

In [None]:
import diffusers
diffusers.__version__

In [None]:
import torch
torch.__version__

## 1. Use pipeline to download model & run inference
We will download a model called ddpm-cat-256.
It was released by the authors of the DDPM paper ("Denoising Diffusion Probabilistic Models").
At about 450 MB, it is very small for a diffusion model.

Check the model here: https://huggingface.co/google/ddpm-cat-256 

A pipeline is a quick and easy way to run a model for inference, requiring no more than four lines of code to generate an image:

In [None]:
from diffusers import DDPMPipeline

# Download the model from Hugging Face.
# The CUDA part is commented out, as I assume most of you have not enabled CUDA.
ddpm = DDPMPipeline.from_pretrained("google/ddpm-cat-256")  # .to("cuda")

# Run inference and display the image
image = ddpm(num_inference_steps=25).images[0]
image

That was easy, but how did the pipeline do that? Let's break down the pipeline and take a look at what's happening under the hood.

In the example above, the pipeline contains a [UNet2DModel](https://huggingface.co/docs/diffusers/main/en/api/models/unet2d#diffusers.UNet2DModel) model and a [DDPMScheduler](https://huggingface.co/docs/diffusers/main/en/api/schedulers/ddpm#diffusers.DDPMScheduler). The pipeline denoises an image by taking random noise the size of the desired output and passing it through the model several times. At each timestep, the model predicts the *noise residual* and the scheduler uses it to predict a less noisy image. The pipeline repeats this process until it reaches the end of the specified number of inference steps.


## 2. Break down the denoising process

To recreate the pipeline with the model and scheduler separately, let's write our own denoising process.

1. Load the model and scheduler:

In [16]:
from diffusers import DDPMScheduler, UNet2DModel

# Load the scheduler and the UNet from the same checkpoint the pipeline used
scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256")
model = UNet2DModel.from_pretrained("google/ddpm-cat-256")  # .to("cuda")

2. Set the number of timesteps to run the denoising process for:

In [17]:
scheduler.set_timesteps(50)

3. Setting the scheduler timesteps creates a tensor with evenly spaced elements in it, 50 in this example. Each element corresponds to a timestep at which the model denoises an image. When you create the denoising loop later, you'll iterate over this tensor to denoise an image:

In [None]:
scheduler.timesteps
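
To make "evenly spaced" concrete, here is a minimal sketch in plain Python. It assumes the scheduler was trained with 1000 timesteps and that inference steps are spaced by a fixed stride counted up from 0. Both are assumptions about this checkpoint's configuration, and `spaced_timesteps` is a hypothetical helper, not a diffusers function. You can compare its output with `scheduler.timesteps` above.

```python
def spaced_timesteps(num_train_timesteps, num_inference_steps):
    """Return evenly spaced timesteps in descending order (noisiest first)."""
    stride = num_train_timesteps // num_inference_steps
    ascending = [i * stride for i in range(num_inference_steps)]
    return ascending[::-1]


# Assumed values: 1000 training timesteps, 50 inference steps
timesteps = spaced_timesteps(1000, 50)
print(timesteps[:3], "...", timesteps[-3:])  # [980, 960, 940] ... [40, 20, 0]
```

The order is descending because denoising starts at the noisiest timestep and works back towards timestep 0.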

4. Create some random noise with the same shape as the desired output:

In [19]:
import torch

sample_size = model.config.sample_size
noise = torch.randn((1, 3, sample_size, sample_size))#.to("cuda")

5. Now write a loop to iterate over the timesteps. At each timestep, the model does a [UNet2DModel.forward()](https://huggingface.co/docs/diffusers/main/en/api/models/unet2d#diffusers.UNet2DModel.forward) pass and returns the noisy residual. The scheduler's [step()](https://huggingface.co/docs/diffusers/main/en/api/schedulers/ddpm#diffusers.DDPMScheduler.step) method takes the noisy residual, timestep, and input and it predicts the image at the previous timestep. This output becomes the next input to the model in the denoising loop, and it'll repeat until it reaches the end of the `timesteps` array.

In [20]:
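# Start from pure Gaussian noise (note: this name shadows Python's built-in input())
input = noise

for t in scheduler.timesteps:
    # Predict the noisy residual; gradients are not needed for inference
    with torch.no_grad():
        noisy_residual = model(input, t).sample
    # Let the scheduler compute the slightly less noisy sample for the previous timestep
    previous_noisy_sample = scheduler.step(noisy_residual, t, input).prev_sample
    input = previous_noisy_sample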

This is the entire denoising process, and you can use this same pattern to write any diffusion system.
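
To see what a scheduler step computes, the sketch below runs the same loop on a single number instead of an image, using only the Python standard library. It is a toy under stated assumptions, not the diffusers implementation: it uses the basic one-step DDPM sampling update with a linear beta schedule (1000 steps from 1e-4 to 0.02, with variance equal to beta), it visits every timestep instead of a 50-step subset, and it replaces the UNet with a hypothetical `oracle_noise` function that already knows the clean value `x0`. All function names here are made up for illustration.

```python
import math
import random


def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule plus the derived alpha and cumulative alpha values."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def oracle_noise(x_t, t, x0, alpha_bars):
    """Stand-in for the UNet: the exact noise, given that x0 is known."""
    return (x_t - math.sqrt(alpha_bars[t]) * x0) / math.sqrt(1.0 - alpha_bars[t])


def ddpm_step(x_t, t, eps, betas, alphas, alpha_bars, rng):
    """One reverse step: remove the predicted noise, then add fresh noise."""
    mean = (x_t - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps) / math.sqrt(alphas[t])
    if t == 0:
        return mean  # no noise is added at the final step
    return mean + math.sqrt(betas[t]) * rng.gauss(0.0, 1.0)


def toy_denoise(x0=0.5, seed=0):
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule()
    x = rng.gauss(0.0, 1.0)  # start from pure noise
    for t in reversed(range(len(betas))):
        eps = oracle_noise(x, t, x0, alpha_bars)
        x = ddpm_step(x, t, eps, betas, alphas, alpha_bars, rng)
    return x


result = toy_denoise()
print(result)  # very close to 0.5, because the oracle's noise prediction is exact
```

The structure matches the loop above: predict the noise, hand it to a step function, and feed the result back in. A real scheduler's `step()` additionally handles skipped timesteps, clipping, and other variance options.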

6. The last step is to convert the denoised output into an image:

In [None]:
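from PIL import Image

# Rescale from [-1, 1] to [0, 1]
image = (input / 2 + 0.5).clamp(0, 1)
# Reorder (batch, channels, height, width) to (batch, height, width, channels) and take the first image
image = image.cpu().permute(0, 2, 3, 1).numpy()[0]
# Convert to 8-bit pixel values and build a PIL image
image = Image.fromarray((image * 255).round().astype("uint8"))
image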
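
The rescaling and 8-bit conversion in the cell above are easier to follow on single values. The sketch below mirrors that arithmetic in plain Python. `to_pixel` is a hypothetical helper used only for illustration, and it assumes the model output is meant to lie in [-1, 1].

```python
def to_pixel(value):
    """Map one model output value to an 8-bit pixel intensity."""
    scaled = value / 2 + 0.5              # shift [-1, 1] to [0, 1]
    clamped = min(max(scaled, 0.0), 1.0)  # cut off anything outside that range
    return int(round(clamped * 255))      # stretch to [0, 255]


for value in (-1.0, 0.5, 1.0, 3.0):
    print(value, "->", to_pixel(value))
```

Values outside [-1, 1], such as 3.0 here, are clamped to the nearest end of the range instead of wrapping around.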

That's it! You will notice that the image quality is mediocre, which is to be expected from a model as small as "google/ddpm-cat-256" (only about 450 MB). A bigger model can produce higher-quality images.

## Exercise: answer the questions

1. Why is using pipelines convenient?
2. What is a UNet?
3. Describe the denoising process in your own words.

