
## Overview

This project explores how diffusion models, a class of generative models, work and how they can be applied to image generation, specifically using the CelebA dataset. A significant focus is on understanding U-Net architectures and their role in diffusion models.

## Objectives

1. Understanding U-Net and Diffusion Models:
   - Begin with a comprehensive exploration of U-Net architectures, referencing the U-Net Paper: https://arxiv.org/abs/1505.04597.
   - Dive into the theory and mechanics of diffusion models, using resources like the Score-Based Generative Modeling Paper: https://arxiv.org/abs/2011.13456 and relevant blog posts.
   

2. Dataset Preparation:
   - Utilize the CelebA dataset, ensuring appropriate preprocessing for diffusion models.


3. Implementing Diffusion Models:
   - Implement a diffusion model suitable for generating images from the CelebA dataset. Focus on integrating U-Net as a component of the diffusion model.
   - Utilize tutorials and resources such as Denoising Diffusion Probabilistic Models: https://arxiv.org/abs/2006.11239 and the Tutorial on Diffusion Model: https://github.com/d9w/gen_models/blob/main/Score_Based_Generative_Modeling.ipynb.


4. Generating Images with Diffusion Models:
   - Apply the implemented diffusion model to generate images from the CelebA dataset. Evaluate the quality and fidelity of the generated images.


5. Documentation:
   - Create a detailed README file outlining the project's goals, methods, and key findings.
   - Write a `REPORT.MD` that provides an in-depth explanation of U-Net architectures and diffusion models, including their mathematical and practical aspects in image generation.


6. Analysis and Results:
   - Analyze the performance of the diffusion model in generating images. Discuss any challenges encountered and how they were addressed.


## Expected Deliverables

1. Codebase: Complete Python code for implementing and training the diffusion model with U-Net architecture.
2. Generated Images: A collection of images generated from the CelebA dataset using the diffusion model.
3. Documentation:
   - `README.md`: A concise document summarizing the project’s objectives, methodology, and outcomes.
   - `REPORT.MD`: A comprehensive report detailing the theoretical background and practical implementation of U-Net and diffusion models in image generation.

## Resources and References

- U-Net Paper: https://arxiv.org/abs/1505.04597
- Tutorial on Diffusion Model: https://github.com/d9w/gen_models/blob/main/Score_Based_Generative_Modeling.ipynb
- Score-Based Generative Modeling through Stochastic Differential Equations: https://arxiv.org/abs/2011.13456
- Denoising Diffusion Probabilistic Models: https://arxiv.org/abs/2006.11239
- Blog posts on diffusion models: Post 1: https://yang-song.net/blog/2021/score/, Post 2: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/


## 1. Understanding U-Net and Diffusion Models:

Begin with a comprehensive exploration of U-Net architectures, referencing the U-Net Paper: https://arxiv.org/abs/1505.04597.
Dive into the theory and mechanics of diffusion models, using resources like the Score-Based Generative Modeling Paper: https://arxiv.org/abs/2011.13456 and relevant blog posts.

### What are U-Net architectures

The U-Net architecture was introduced by Ronneberger et al. (2015) for biomedical image segmentation and has since been adopted for many other tasks. It is particularly useful for image segmentation, super-resolution, and diffusion models. It is called U-Net because of the U shape of its architecture.

In the original paper, the architecture is presented as follows:

<div>
<img src="attachment:U-Net%20paper.png" width="500"/>
</div>


This network is a convolutional neural network that can be divided into two phases: a contracting path and an expansive path. It is important to point out that the two paths are more or less symmetric.
    
The contracting path (left side of the previous image), also known as the encoder, is the "descending" part. During this phase, the model extracts information about what is present in the image (its context), at the expense of spatial resolution. It follows the typical architecture of a convolutional neural network. Each stage is made up of repeated 3x3 convolutional layers (two per stage in the original paper), each followed by a ReLU activation function applied element-wise to the features. Between stages, a 2x2 max pooling operation downsamples the features. Max pooling halves the spatial dimensions of the features, so to compensate the number of channels is doubled after each downsampling.
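
To make the shape bookkeeping of the contracting path concrete, here is a minimal pure-Python sketch that only tracks feature-map sizes and channel counts; it does not compute any convolution. The function names `conv3x3` and `trace_encoder` are hypothetical helpers made up for this illustration. The defaults (572x572 input, 64 channels in the first stage, four stages, unpadded convolutions) are taken from the figure in the original paper and are assumptions, not requirements of this project.

```python
def conv3x3(size, padding=0):
    """Spatial size after one 3x3 convolution with stride 1."""
    return size + 2 * padding - 2


def trace_encoder(size=572, channels=64, stages=4, convs_per_stage=2, padding=0):
    """Return (stage, channels, size_before_pool, size_after_pool) per encoder stage."""
    trace = []
    for stage in range(1, stages + 1):
        # Repeated 3x3 convolutions (each followed by a ReLU, which keeps the shape).
        for _ in range(convs_per_stage):
            size = conv3x3(size, padding)
        # 2x2 max pooling with stride 2 halves height and width.
        pooled = size // 2
        trace.append((stage, channels, size, pooled))
        # The next stage works on the pooled features with twice the channels.
        size, channels = pooled, channels * 2
    return trace


for stage, ch, before, after in trace_encoder():
    print(f"stage {stage}: {ch:4d} channels, {before}x{before} -> pool -> {after}x{after}")
```

With `padding=1` the 3x3 convolutions keep the spatial size, so only the pooling changes it; this is a common choice when the output should have the same resolution as the input, as in image generation.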

The expansive path (right side of the previous image), also known as the decoder, is the "ascending" part after the contraction point of the model. Its objective is to reconstruct an output at the input's resolution using the information extracted during the contracting path. Instead of downsampling with max pooling, each decoder stage first upsamples the current set of features and applies a 2x2 convolutional layer that halves the number of channels. It then applies repeated 3x3 convolutional layers, each followed by a ReLU activation function. The upsampling operation restores the spatial resolution of the features that was lost during the contracting phase.
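
The same kind of shape bookkeeping can be sketched for the expansive path. This is a pure-Python illustration that tracks sizes and channel counts only; `trace_decoder` is a hypothetical helper name. The defaults (starting from 28x28 features with 1024 channels, four stages, unpadded convolutions) are assumptions taken from the figure in the original paper.

```python
def trace_decoder(size=28, channels=1024, stages=4, convs_per_stage=2, padding=0):
    """Return (stage, channels_out, size_after_upsampling, size_after_convs) per decoder stage."""
    trace = []
    for stage in range(1, stages + 1):
        # Upsampling doubles height and width.
        upsampled = size * 2
        # The 2x2 convolution that follows halves the number of channels.
        # (Concatenating the encoder features temporarily doubles them again;
        # the first 3x3 convolution brings the count back to this value.)
        channels = channels // 2
        size = upsampled
        # Repeated 3x3 convolutions with stride 1, each followed by a ReLU.
        for _ in range(convs_per_stage):
            size = size + 2 * padding - 2
        trace.append((stage, channels, upsampled, size))
    return trace


for stage, ch, up, out in trace_decoder():
    print(f"stage {stage}: upsample to {up}x{up}, {ch:4d} channels, convs -> {out}x{out}")
```

With the defaults, the last stage ends at 388x388 with 64 channels, which is smaller than the 572x572 input because unpadded convolutions trim the borders. With `padding=1`, each stage simply doubles the resolution.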

Besides the encoder and decoder, there are two other important elements in the U-Net architecture: the bottleneck and the connecting paths.

The connecting paths (skip connections) take a copy of the features from each encoder stage and concatenate them onto the features of the corresponding, symmetrically placed decoder stage.
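
As a toy illustration of what a connecting path does, the sketch below concatenates encoder and decoder features along the channel axis, using plain nested lists instead of tensors. The helpers `center_crop` and `skip_concat` are hypothetical names made up for this example. Two assumptions are worth flagging: the center crop mirrors the original paper, where unpadded convolutions make the encoder features slightly larger than the decoder features (with padded convolutions the crop changes nothing), and placing the encoder channels first is an arbitrary choice.

```python
def center_crop(feature_map, target):
    """Crop a square 2D feature map (a list of rows) to target x target around its center."""
    offset = (len(feature_map) - target) // 2
    return [row[offset:offset + target] for row in feature_map[offset:offset + target]]


def skip_concat(encoder_channels, decoder_channels):
    """Concatenate encoder and decoder features along the channel axis.

    Both arguments are lists of 2D feature maps, one per channel. Encoder maps
    are center-cropped to the decoder's spatial size before concatenation.
    """
    target = len(decoder_channels[0])
    cropped = [center_crop(fm, target) for fm in encoder_channels]
    return cropped + decoder_channels  # assumption: encoder channels come first


# Toy example: 2 encoder channels of size 4x4 and 2 decoder channels of size 2x2.
encoder = [[[c * 100 + r * 4 + col for col in range(4)] for r in range(4)] for c in range(2)]
decoder = [[[-1, -1], [-1, -1]] for _ in range(2)]

merged = skip_concat(encoder, decoder)
print(len(merged), "channels of size", len(merged[0]), "x", len(merged[0][0]))
```

The result has as many channels as the encoder and decoder features combined, which is why the 3x3 convolutions that follow in the decoder see twice as many input channels as the upsampling step produced.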

The bottleneck is where the encoder switches into the decoder. First, we downsample the features with a 2x2 max pooling operation. Then we pass them through the repeated 3x3 convolutional layers followed by a ReLU activation function and we double the channels. Finally, we upsample them again to their previous resolution.

Here is a simplified U-Net drawing that I made to make the distinction between these four elements clearer:





<div>
<img src="attachment:U-Net-parts.png" width="500"/>
</div>

### Theory and Mechanics of Diffusion Models

Dive into the theory and mechanics of diffusion models, using resources like the Score-Based Generative Modeling Paper: https://arxiv.org/abs/2011.13456 and relevant blog posts.
