### PSA:

This notebook has a laid-back tone so that whoever works through it not only understands what is happening but also hears my personal thoughts on the paper. Think of it as a semi-live commentary that I give while reading the paper and coding it up. I will not cover every single detail, since that would be too distracting, but do expect a few out-of-context comments here and there.

# U-Nets

U-Nets were among the first segmentation architectures to follow fully convolutional networks, and they helped pioneer deep-learning-based image segmentation. Initially created for biomedical image segmentation, the U-Net has become a staple of computer vision and has found many uses over the years, ranging from autonomous vehicles and (the obvious) biomedical segmentation to diffusion frameworks like DALL-E and Midjourney.

This model uses an **encoder-decoder** architecture together with **skip connections** that pass high-resolution feature maps from the encoder directly to the decoder, which is what gives its segmentation maps such sharp boundaries. I really want to see this in action, which is why I will do what I usually do with deep learning networks: break it open and see how it behaves between layers.
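A quick sketch of how a U-Net skip connection differs from a ResNet-style residual connection. In the paper, the encoder feature map is center-cropped to the decoder's size and then *concatenated* along the channel axis, whereas a residual connection *adds* tensors element-wise. The tensor shapes and the `center_crop` helper below are hypothetical, chosen only for illustration.

```python
import torch

def center_crop(features, target_h, target_w):
    """Crop the centre of an (N, C, H, W) tensor to (target_h, target_w)."""
    _, _, h, w = features.shape
    top = (h - target_h) // 2
    left = (w - target_w) // 2
    return features[:, :, top:top + target_h, left:left + target_w]

# Hypothetical feature maps (batch=1, 64 channels). The encoder map is larger
# because the original U-Net uses unpadded convolutions.
encoder_features = torch.randn(1, 64, 40, 40)
decoder_features = torch.randn(1, 64, 32, 32)

cropped = center_crop(encoder_features, 32, 32)

# U-Net skip connection: concatenate along the channel dimension (channels add up).
unet_skip = torch.cat([cropped, decoder_features], dim=1)

# Residual connection, for contrast: element-wise sum (channel count unchanged).
residual = cropped + decoder_features

print(unet_skip.shape)  # torch.Size([1, 128, 32, 32])
print(residual.shape)   # torch.Size([1, 64, 32, 32])
```

Concatenation keeps the encoder's fine spatial detail as separate channels, and the convolutions that follow decide how to combine it with the decoder's coarser context.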

In this notebook I will look at how this model works, following the paper as I write the code. The focus here is segmenting images; down the line, I will explore the generative properties of this model in a separate project. In keeping with the theme of the original paper, I will experiment with the [DRIVE](https://drive.grand-challenge.org/DRIVE/) dataset, which is openly accessible (you will need to sign up at the link above to download it).

Unzip the training and test datasets from the link and add them into the `/data` folder for this code to work.
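Before going further, it can help to confirm that the files landed where the code expects them. The helper below is a hypothetical sketch: it assumes the `/data` folder sits next to the notebook (so it is reached via the relative path `data`) and that the unzipped archives keep DRIVE's `training/images` and `test/images` sub-folders with `.tif` images. DRIVE is commonly split into 20 training and 20 test images, so those are the counts to look for.

```python
from pathlib import Path

def list_drive_images(root="data"):
    """Return the sorted .tif image paths found in each DRIVE split under `root`.

    Assumes (hypothetically) that the unzipped archives keep DRIVE's
    `training/images` and `test/images` sub-folders.
    """
    root = Path(root)
    found = {}
    for split in ("training", "test"):
        image_dir = root / split / "images"
        # Missing folders yield an empty list instead of raising an error.
        found[split] = sorted(image_dir.glob("*.tif")) if image_dir.is_dir() else []
    return found

for split, paths in list_drive_images().items():
    print(f"{split}: {len(paths)} images")
```

If either split reports 0 images, the archives were most likely extracted one folder level too deep or too shallow.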

Now, let's finally get to it! We load up the packages we will need for the project.

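```python
import numpy as np
import gc                                  # garbage collector, handy for freeing memory between runs
import torch
import torch.nn as nn
from torch.utils.data import Dataset       # base class for a custom dataset wrapper
from torchvision import transforms         # image preprocessing utilities
from matplotlib import pyplot as plt       # plotting images and feature maps
```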

From the paper by [Ronneberger et al. (2015)](https://arxiv.org/pdf/1505.04597) we learn that CNNs have historically excelled at classification tasks. However, in use cases like biomedical imaging we might need to:
- Localize structures in our image, i.e. assign a class label to every pixel rather than a single label to the whole image.
- Cope with a small dataset, since thousands of annotated training images are usually out of reach.

The U-Net builds on the fully convolutional network so that it works with very few training images (leaning heavily on data augmentation) and still yields precise segmentations.
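To make "fully convolutional" concrete, here is a deliberately tiny, hypothetical network (`TinyFCN` is not from the paper). Because it has no fully connected layers, it accepts inputs of any spatial size and returns one score per class for every pixel; a final 1x1 convolution maps features to class scores, as the U-Net's last layer does. Unlike the paper, this sketch uses `padding=1` so the output has the same height and width as the input.

```python
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    """Hypothetical minimal fully convolutional network: no Linear layers."""

    def __init__(self, in_channels=3, num_classes=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # 1x1 convolution: per-pixel class scores.
        self.head = nn.Conv2d(16, num_classes, kernel_size=1)

    def forward(self, x):
        return self.head(self.body(x))

model = TinyFCN()
for size in [(64, 64), (100, 80)]:
    logits = model(torch.randn(1, 3, *size))  # (N, num_classes, H, W)
    label_map = logits.argmax(dim=1)          # (N, H, W): one label per pixel
    print(logits.shape, label_map.shape)
```

The same weights handle both input sizes, and `argmax` over the class dimension turns the scores into a segmentation map.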

**Personal Note:** This paper has a lot of exposition describing the efforts researchers made to get to this point. I personally really like this approach, and it makes things very easy to follow. I will, however, be avoiding minutiae from this point forward unless they are relevant to our use case.

## U-Net Architecture

The most commonly used diagram of the U-Net comes directly from the paper:

![U-Net architecture diagram from Ronneberger et al. (2015)](images/image.png)
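The feature-map sizes in the diagram follow from simple arithmetic: the paper uses unpadded 3x3 convolutions (each trims 1 pixel from every border, so 2 in total), 2x2 max pooling (halves the size) and 2x2 up-convolutions (double it). The function below is a small sketch that traces these sizes; its name and the dictionary keys are my own, not the paper's. It also checks the paper's requirement that every max pooling is applied to an even size.

```python
def unet_feature_sizes(input_size=572, depth=4):
    """Trace the spatial size (height = width) through the original, unpadded U-Net."""
    sizes = {"input": input_size}
    size = input_size
    # Contracting path: two valid 3x3 convs, then 2x2 max pool.
    for level in range(1, depth + 1):
        size -= 4
        sizes[f"enc{level}"] = size
        if size % 2:
            raise ValueError(f"odd size {size} before pooling at level {level}")
        size //= 2
    # Bottom of the "U": two more valid 3x3 convs, no pooling.
    size -= 4
    sizes["bottleneck"] = size
    # Expansive path: 2x2 up-convolution (x2), then two valid 3x3 convs.
    for level in range(depth, 0, -1):
        size = size * 2 - 4
        sizes[f"dec{level}"] = size
    sizes["output"] = size
    return sizes

print(unet_feature_sizes(572))
```

For the paper's 572x572 input tile this gives encoder sizes 568, 280, 136 and 64, a 28x28 bottleneck and a 388x388 output map, matching the numbers in the diagram.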

Here I notice a few main patterns that will help me code up this model.
- The first pattern is
  $$2 \times (\text{conv}_{3\times3} + \text{ReLU}) + \text{maxpool}_{2\times2}$$
  This is repeated four times to bring the dimensions of the original image down from