In [0]:
from IPython.display import HTML
HTML("""
<style>
@import "https://cdn.jsdelivr.net/npm/bulma@0.9.4/css/bulma.min.css";
</style>
""")

# Image colorization network
<span style="color:red; font-size:20px;">Note that this exercise does not appear explicitly in the exam questions; however, you may use it as part of the architecture.</span>
The objective of this exercise is to implement a Unet CNN model for image colorization. Image colorization is an active field of research, so do not expect to obtain optimal results. Achieving perfect results is not the primary goal of this exercise. Instead, it is more important to reflect on the challenges and potential improvements in the various tasks.
<article class="message is-danger">
  <div class="message-header">Important</div>
  <div class="message-body">

  Image generation/colorization is more memory and computation intensive than previous tasks and is also harder to train. While it is still possible to run the training for the current assignment on a CPU (an epoch will take around 20 min), it is recommended to use Google Colab, where an epoch takes around 6-7 minutes to complete. Only attempt this exercise if you have plenty of time to complete the training.


  </div>
</article>
The following cell loads the libraries.


In [0]:
### required libraries
import os
import torch
import torchvision as tv
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import torchvision.transforms as transforms
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
# Add additional import to ignore warnings
import warnings
import torch.nn as nn

## Data and data loader
For this exercise you will use the Imagenette dataset, which is a subset of Imagenet, containing 10 classes (tench, English springer, cassette player, chain saw, church, French horn, garbage truck, gas pump, golf ball, parachute).
The labels are not needed, as the task is to learn a mapping from grayscale to color images.
The dataloader is already created for you below and it incrementally loads the dataset in batches to lessen memory requirements, returning the original and grayscale-converted images.


In [0]:
# Define the dataset class
class CustomImageDataset(Dataset):
    def __init__(self, root_dir, transform=None):
        self.root_dir = Path(root_dir)
        self.transform = transform
        # Recursively find all image files in root_dir
        self.img_files = [p for p in self.root_dir.rglob('*') if p.is_file() and not p.name.startswith('.')]

    def __len__(self):
        return len(self.img_files)

    def __getitem__(self, idx):
        img_path = self.img_files[idx]
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            image = Image.open(img_path).convert('RGB')

        if self.transform:
            image = self.transform(image)

        # Convert the transformed image to grayscale
        image_gray = rgb_to_grayscale(image)

        return image_gray, image
    
def rgb_to_grayscale(img_tensor):
    # Split the tensor into its R, G and B channels, keeping the channel dimension
    r, g, b = img_tensor[0:1, :, :], img_tensor[1:2, :, :], img_tensor[2:3, :, :]
    # Weighted sum of the channels (luma weights)
    grayscale = 0.2989 * r + 0.5870 * g + 0.1140 * b
    # Repeat the grayscale image across 3 channels to maintain the original tensor shape
    return grayscale.repeat(3, 1, 1)
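
As a small illustration of the conversion above, the sketch below applies the same weights to three hand-made pixels (pure red, pure green, pure blue). The helper name `to_gray_3ch` and the toy tensor are assumptions made for this example only; they are not part of the exercise code.

```python
import torch

def to_gray_3ch(img):
    """Luma-weighted grayscale of a (3, H, W) tensor, repeated over 3 channels."""
    r, g, b = img[0:1], img[1:2], img[2:3]
    gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
    return gray.repeat(3, 1, 1)

# One row of three pixels: pure red, pure green, pure blue (shape 3 x 1 x 3)
pixels = torch.tensor([[[1.0, 0.0, 0.0]],   # red channel
                       [[0.0, 1.0, 0.0]],   # green channel
                       [[0.0, 0.0, 1.0]]])  # blue channel
gray = to_gray_3ch(pixels)
print(gray.shape)  # torch.Size([3, 1, 3])
print(gray[0])     # tensor([[0.2989, 0.5870, 0.1140]])
```

All three output channels are identical, and the green pixel ends up brightest because green carries the largest weight. The network therefore receives no color information, only intensity.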

The cell below configures the dataloader for both training and validation data. 


In [0]:
# Define the transform
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor()
])

# Create the datasets
train_dataset = CustomImageDataset(root_dir='imagenette2-160/train', transform=transform)
val_dataset = CustomImageDataset(root_dir='imagenette2-160/val', transform=transform)

# Create the DataLoaders
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=4)

for i,(gray,org) in enumerate(train_loader):
    print("Grayscale images (the intensity is repeated across 3 channels to retain shape of original image)",gray.shape)
    print("Original data ",org.shape)
    
    if i>5:
        break

<article class="message is-info">
  <div class="message-header">Info</div>
  <div class="message-body">

  If you struggle with running the training, the first thing to try is decreasing the batch size.


  </div>
</article>
The cell below contains plotting functions for visualizing the original and grayscale images.


In [0]:
def show_pairs(dataloader, n_pairs=4,model=None,device='cpu'):
    for grayscale_images,original_images in dataloader:
        # Show the first 'n_pairs' images from the batch
        if model is None:
            fig, axs = plt.subplots(n_pairs, 2, figsize=(6, 3 * n_pairs))

            for i in range(n_pairs):
                # Original Image
                original_img = np.transpose(original_images[i].numpy(), (1, 2, 0))
                axs[i, 0].imshow(original_img)
                axs[i, 0].set_title('Original Image')
                axs[i, 0].axis('off')

                # Grayscale (or colorized) Image
                grayscale_img = np.transpose(grayscale_images[i].numpy(), (1, 2, 0))
                axs[i, 1].imshow(grayscale_img, cmap='gray' if grayscale_img.shape[2] == 1 else None)
                axs[i, 1].set_title('Transformed Image')
                axs[i, 1].axis('off')

            plt.tight_layout()
            plt.show()
        else:
            fig, axs = plt.subplots(n_pairs, 3, figsize=(6, 3 * n_pairs))
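            model.to(device)
            # Run the model on its own device without tracking gradients, then move the result back to the CPU for plotting
            with torch.no_grad():
                colored_images = model(grayscale_images.to(device)).cpu().numpy()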
            for i in range(n_pairs):

                # Grayscale input image
                transformed_img = np.transpose(grayscale_images[i].numpy(), (1, 2, 0))
                axs[i, 0].imshow(transformed_img, cmap='gray' if transformed_img.shape[2] == 1 else None)
                axs[i, 0].set_title('Transformed Image')
                axs[i, 0].axis('off')
                # Original Image
                original_img = np.transpose(original_images[i].numpy(), (1, 2, 0))
                axs[i, 1].imshow(original_img)
                axs[i, 1].set_title('Original Image')
                axs[i, 1].axis('off')

                # Colorized image predicted by the network
                colored_img = np.transpose(colored_images[i], (1, 2, 0))
                axs[i, 2].imshow(colored_img)
                axs[i, 2].set_title('NN Colorized Image')
                axs[i, 2].axis('off')

            plt.tight_layout()
            plt.show() 
        break  # Only show the first batch

In [0]:
# Show a batch of images
show_pairs(train_loader)

<article class="message task"><a class="anchor" id="run"></a>
    <div class="message-header">
        <span>Task 1: Setup</span>
        <span class="has-text-right">
          <i class="bi bi-stoplights easy"></i>
        </span>
    </div>
<div class="message-body">


Run all of the cells above. 


</div></article>

The following sections cover the functions, classes, and models you need to implement and train.
## Unet
[Unet](https://arxiv.org/abs/1505.04597) is one of the most popular and well-performing architectures for image translation (segmentation, colorization, restoration, etc.) tasks. It consists of an encoder (compression) and a decoder (decompression). In addition, each encoder layer is linked to its corresponding decoder layer by a skip connection.
This task guides you through the implementation of a standard Unet architecture.
**Unet building blocks**
Each block in Unet handles a specific task for the network. 
### 1. Implementing the DoubleConv Block:
A Double Convolution Block refers to a sequence of two consecutive convolutional layers within a neural network architecture. The double convolution block helps the model capture both low-level and high-level features through the two convolutional layers, while batch normalization and ReLU activation contribute to the stability and non-linearity of the network.
<article class="message task"><a class="anchor" id="convblock"></a>
    <div class="message-header">
        <span>Task 2: Unet</span>
        <span class="has-text-right">
          <i class="bi bi-code"></i><i class="bi bi-stoplights medium"></i>
        </span>
    </div>
<div class="message-body">


Create a Double Convolution Block, known as `DoubleConv` in the U-Net architecture. It performs the following tasks sequentially:
1. **Convolution**: Applies a 2D convolution to the input tensor with a certain number of input channels (`in_channels`) and a specified number of output channels (`out_channels`). The kernel size is set to 3x3, and padding is applied to maintain the spatial dimensions of the input.

2. **Batch Normalization**: Normalize the output of the convolution layer with `nn.BatchNorm2d`.

3. **ReLU Activation**: Apply a non-linear ReLU activation function (`nn.ReLU`) in place to introduce non-linearity into the model.

4. **Double up**: Repeat steps 1-3: another convolution is applied, followed by batch normalization and ReLU activation. Use the sequential container `nn.Sequential` to stack the layers in the correct order for the forward pass.




</div></article>



In [0]:
class DoubleConv(nn.Module):
    """(convolution => [BN] => ReLU) * 2"""

    def __init__(self, in_channels, out_channels,kernel_size=3):
        super().__init__()
        self.double_conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=(1,1)),
            ...
        )

    def forward(self, x):
        return self.double_conv(x)
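
One possible way to fill in the placeholder is sketched below. It follows the four steps of the task literally. The class name `DoubleConvSketch` is an assumption chosen so it does not clash with your own `DoubleConv`, and the sketch leaves out the `kernel_size` argument of the template.

```python
import torch
import torch.nn as nn

class DoubleConvSketch(nn.Module):
    """(convolution => BatchNorm => ReLU) * 2. Illustrative name, not part of the exercise code."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.double_conv = nn.Sequential(
            # 3x3 convolution; padding=1 keeps height and width unchanged
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            # Second round: channels stay at out_channels
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.double_conv(x)

block = DoubleConvSketch(3, 16)
out = block(torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 16, 32, 32])
```

Only the channel count changes; the spatial size is preserved, which is what later allows feature maps to be concatenated in the skip connections.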

### 2. Downsampling Block:
The downsampling block:
- Applies a max pooling layer with a kernel size of 2 to downsample the input tensor.
- Uses the `DoubleConv` block to process the feature map after downsampling.

### 3. Upsampling Block:
The upsampling block:
- Applies bilinear upsampling to increase the spatial dimensions of the feature maps by a scale factor of 2.
- Concatenates the upsampled output with the corresponding feature map from the downsampling path (skip connection).
- Uses the `DoubleConv` block to process the concatenated feature map.

The downsampling and upsampling blocks are an important part of the Unet. However, to make the exercise more manageable, they are given in the code below.
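
To make the effect of the two blocks on tensor shapes concrete, here is a minimal sketch using only the pooling, upsampling and concatenation operations (without the convolutions). The tensor sizes are arbitrary values picked for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 56, 56)               # feature map on the encoder side

pooled = nn.MaxPool2d(2)(x)                  # downsampling halves height and width
upsampled = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)(pooled)
merged = torch.cat([x, upsampled], dim=1)    # skip connection: stack along the channel axis

print(pooled.shape)     # torch.Size([1, 16, 28, 28])
print(upsampled.shape)  # torch.Size([1, 16, 56, 56])
print(merged.shape)     # torch.Size([1, 32, 56, 56])
```

The concatenation adds the channel counts of both inputs, so the convolution that follows a skip connection must accept the sum of the two.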
### 4. Assembling the U-Net Architecture:
<article class="message task"><a class="anchor" id="create_unet"></a>
    <div class="message-header">
        <span>Task 3: Unet</span>
        <span class="has-text-right">
          <i class="bi bi-code"></i><i class="bi bi-stoplights medium"></i>
        </span>
    </div>
<div class="message-body">


Implement the entire U-Net architecture by combining the initial convolution (`DoubleConv`), the downsampling blocks (`Down`), and the upsampling blocks (`Up`) in the network's forward pass, in the following sequence:
1. **Initial Double Convolution**: The network begins with an initial double convolution applied to the input images.

2. **Downsampling Path**: This is followed by a series of downsampling blocks that apply max pooling and double convolutions, reducing the spatial dimensions and increasing the depth of feature maps.

3. **Bottleneck**: After the last downsampling block, the feature map reaches the bottleneck, which is the lowest resolution with the highest feature depth.

4. **Upsampling Path**: Then, the network expands through a series of upsampling blocks that upsample the feature maps and combine them with the corresponding feature maps from the downsampling path through skip connections.

5. **Final Convolution**: At the top of the upsampling path, the network applies a final convolution to map the deep feature maps to the desired number of output channels (here the 3 color channels).




</div></article>



In [0]:
#Downsampling
class Down(nn.Module):
    """Downscaling with maxpool then double conv"""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.maxpool_conv = nn.Sequential(
            nn.MaxPool2d(2),
            DoubleConv(in_channels, out_channels)
        )

    def forward(self, x):
        return self.maxpool_conv(x)

#Upsampling    
class Up(nn.Module):
    """Upscaling then double conv"""

    def __init__(self, in_channels, out_channels):
        super().__init__()


        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)

        # DoubleConv takes (in_channels, out_channels, kernel_size); keep the default 3x3 kernel
        self.conv = DoubleConv(in_channels, out_channels)

    def forward(self, x1, x2):
        x1 = self.up(x1)
        # inputs are NCHW: pad x1 so its height and width match x2
        diffY = x2.size()[2] - x1.size()[2]
        diffX = x2.size()[3] - x1.size()[3]

        x1 = nn.functional.pad(x1, [diffX // 2, diffX - diffX // 2,
                                    diffY // 2, diffY - diffY // 2])
        x = torch.cat([x2, x1], dim=1)
        return self.conv(x)

class UNet(nn.Module):
    def __init__(self, n_channels, n_classes, bilinear=True,start_dim=16):
        super(UNet, self).__init__()
        self.n_channels = n_channels
        self.n_classes = n_classes
        self.bilinear = bilinear
        
        # Initialize the different building blocks according to the following structure:
        # start: Dim
        # Down: increase channels
        # Down:
        # Down:
        # Down:
        # Up: decrease channels
        # Up:
        # Up:
        # Up:
        # end

    def forward(self, x):
        # write code for forward pass
        raise NotImplementedError
<article class="message task"><a class="anchor" id="train_epoch"></a>
    <div class="message-header">
        <span>Task 4: Train one epoch function</span>
        <span class="has-text-right">
          <i class="bi bi-code"></i><i class="bi bi-stoplights medium"></i>
        </span>
    </div>
<div class="message-body">


Implement the function `train_epoch` that performs one epoch of training for a neural network (in this case Unet). The training follows the gradient descent optimization from the previous week and consists of the following steps:
1. **Set Model to Training Mode**: Activate training mode to enable layers like Dropout and BatchNorm to behave correctly during the training.

2. **Initialize Running Loss**: Set up a variable to accumulate the loss over all batches within the epoch.

3. **Iterate Over DataLoader**: Loop through batches of data provided by the DataLoader, which yields pairs of grayscale input images and original target images.

4. **Transfer Data to Device**: Move input and target tensors to the configured computing device (CPU or GPU) to match the device where the model is located.

5. **Execute Forward Pass**: Feed the input batch through the model to obtain predicted outputs.

6. **Compute Loss**: Use the criterion to calculate the loss by comparing the model predictions with the actual target values.

7. **Zero Gradients**: Clear old gradients; otherwise, they will accumulate with gradients of the current batch.

8. **Perform Backward Pass**: Backpropagate the loss by computing its gradient with respect to each model parameter.

9. **Update Model Parameters**: Adjust the model weights by performing a single optimization step using the gradients calculated during backpropagation.

10. **Aggregate Loss**: Add the loss (multiplied by the batch size) to the running total to keep track of the total loss for the epoch.

11. **Calculate Average Loss**: At the end of the epoch, divide the running loss by the number of samples in the dataset to get the average loss.

12. **Print Epoch Loss**: Output the average loss for the epoch to monitor training progress.

13. **Return Updated Model**




</div></article>



In [0]:
def train_epoch(model, dataloader, criterion, optimizer, device):
    """
    Run one training epoch.

    Parameters:
    - model (torch.nn.Module): The neural network model to be trained.
    - dataloader (torch.utils.data.DataLoader): DataLoader for the training dataset.
    - criterion (torch.nn.modules.loss): The loss function.
    - optimizer (torch.optim.Optimizer): The optimization algorithm.
    - device (torch.device): The device to train on ('cuda' or 'cpu').

    Returns:
    - model (torch.nn.Module): The trained model.
    """
    # write code for the training epoch following the steps listed in Task 4
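
The thirteen steps from Task 4 can be sketched as follows. The function name `train_epoch_sketch` and the toy data and model at the bottom are assumptions for illustration; they only show that the loop runs end to end and say nothing about colorization quality.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_epoch_sketch(model, dataloader, criterion, optimizer, device):
    model.train()                                        # 1. training mode
    running_loss = 0.0                                   # 2. running loss
    for inputs, targets in dataloader:                   # 3. iterate over batches
        inputs = inputs.to(device)                       # 4. move data to the device
        targets = targets.to(device)
        outputs = model(inputs)                          # 5. forward pass
        loss = criterion(outputs, targets)               # 6. compute loss
        optimizer.zero_grad()                            # 7. clear old gradients
        loss.backward()                                  # 8. backward pass
        optimizer.step()                                 # 9. update parameters
        running_loss += loss.item() * inputs.size(0)     # 10. aggregate loss
    epoch_loss = running_loss / len(dataloader.dataset)  # 11. average over samples
    print(f"Epoch loss: {epoch_loss:.4f}")               # 12. report
    return model                                         # 13. return the model

# Toy check on random data, with a single 1x1 convolution as a stand-in model
torch.manual_seed(0)
toy_data = TensorDataset(torch.rand(8, 3, 16, 16), torch.rand(8, 3, 16, 16))
toy_loader = DataLoader(toy_data, batch_size=4)
toy_model = nn.Conv2d(3, 3, kernel_size=1)
toy_optimizer = torch.optim.AdamW(toy_model.parameters(), lr=1e-3)
toy_model = train_epoch_sketch(toy_model, toy_loader, nn.MSELoss(),
                               toy_optimizer, torch.device("cpu"))
```

Multiplying the batch loss by the batch size before summing keeps the epoch average correct even when the last batch is smaller than the others.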

If the `UNet` model class and `train_epoch` are implemented correctly, the cells below can be executed to train the Unet `model`.


In [0]:
# Example usage:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = UNet(n_channels=3, n_classes=3,start_dim=64).to(device) ##
criterion = torch.nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.0001)

In [0]:
# Call the function to run a single training epoch
model = train_epoch(model, train_loader, criterion, optimizer, device)

In [0]:
num_epochs = 10 
for epoch in range(num_epochs):
    train_epoch(model, train_loader, criterion, optimizer, device)
    print(f'showing samples from epoch number {epoch}')
    show_pairs(dataloader=train_loader,n_pairs=2,model=model)
    show_pairs(dataloader=val_loader,n_pairs=2,model=model)
    model.to(device)
    print('------------------------------------------')

<article class="message task"><a class="anchor" id="reflections"></a>
    <div class="message-header">
        <span>Task 5: Reflection</span>
        <span class="has-text-right">
          <i class="bi bi-lightbulb-fill"></i><i class="bi bi-stoplights medium"></i>
        </span>
    </div>
<div class="message-body">


After completing a training routine, consider the following questions:
1. How does the U-Net architecture, particularly its use of skip connections, specifically aid in the task of image colorization? Reflect on how this architecture might perform differently compared to others you are familiar with for the same task.

2. List at least 3 significant limitations of the current model implementation.

3. Provide suggestions for addressing each of the limitations:
    - Keywords: Loss function, color spaces, hyperparameters, pretraining.


4. Why is the task of colorization convenient from the perspective of data annotations?

5. Could colorization serve as a good pretraining task?
    - What must a model know about an image to color it?





</div></article>

