# CV Project 3 - Image Colourization on pets dataset

Prepared by:
 - Marianna Myszkowska 156041
 - Jakub Liszyński 156060

## Data set
The Oxford-IIIT Pet Dataset is a comprehensive collection of images featuring 37 distinct pet breeds, including both cats and dogs, with approximately 200 images per breed. The dataset offers significant diversity in terms of scale, pose, and lighting conditions, making it valuable for various computer vision tasks.

### Key Features:
- **Breed Annotations:** Each image is labeled with its respective breed, facilitating classification tasks.
- **Head Region of Interest (ROI):** Annotations include bounding boxes around the pet's head, aiding in localization studies.
- **Pixel-Level Trimap Segmentation:** Detailed annotations provide pixel-wise segmentation, distinguishing between the pet (foreground), the background, and an ambiguous border region, which is particularly useful for segmentation tasks.

## Sample Images:
Here are a few examples from the dataset:


![card](raport_sources\dataset-card.png) 


To use this dataset in our project, we first convert the images to grayscale. The original color images serve as the reference when colorizing.

![colored](raport_sources\Abyssinian_1.jpg) 
![gray](raport_sources\Abyssinian_1_gray.jpg) 



```python
import os

from PIL import Image


def convert_images_to_grayscale(input_folder, output_folder):
    """Save a grayscale copy of every image in input_folder to output_folder."""
    os.makedirs(output_folder, exist_ok=True)
    filenames = os.listdir(input_folder)
    if not filenames:
        raise ValueError('Input folder is empty. Please ensure it contains images.')
    for filename in filenames:
        # Match extensions case-insensitively so files like "cat.JPG" are not skipped.
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            img_path = os.path.join(input_folder, filename)
            # 'L' is Pillow's 8-bit single-channel (luminance) mode.
            img = Image.open(img_path).convert('L')
            name, ext = os.path.splitext(filename)
            gray_img_path = os.path.join(output_folder, f"{name}_gray{ext}")
            img.save(gray_img_path)
            print(f"Converted {filename} to grayscale and saved as {gray_img_path}")
```



For more information and to access the dataset, please visit the [official website](https://www.robots.ox.ac.uk/~vgg/data/pets/).

## The problem - Image colorization
Image colorization is the process of adding plausible color information to grayscale images or videos. This task is inherently challenging due to its ill-posed nature; a single grayscale image can correspond to multiple valid color interpretations. Consequently, the model must infer and predict realistic colors for each pixel, often relying on learned patterns and contextual cues. 

Traditionally, colorization was performed manually by artists, which was time-consuming and required significant expertise. With advancements in deep learning, automated approaches have been developed to tackle this problem. These methods typically involve training convolutional neural networks (CNNs) on large datasets of color images. During training, the models learn to map grayscale inputs to their corresponding color outputs, capturing semantic and contextual information to produce realistic colorizations.

Despite these advancements, challenges remain. The ambiguity of the task means that multiple color outputs can be correct for a single grayscale input. For instance, a grayscale image of a car could be red, blue, or any other color, and all would be plausible. Addressing this uncertainty is a key focus in current research, with approaches exploring probabilistic models and user-guided colorization to refine results. 
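
The ambiguity can be made concrete with a small sketch. It assumes the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) that Pillow documents for its `convert('L')` mode; the helper name `luma_601` is ours, and Pillow's internal rounding may differ from this version by one gray level.

```python
def luma_601(r, g, b):
    """Approximate 8-bit gray level of an RGB pixel using BT.601 luma weights."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


# Two clearly different colors: pure red and a mid green.
red = (255, 0, 0)
green = (0, 130, 0)

gray_red = luma_601(*red)
gray_green = luma_601(*green)

# Both pixels collapse to the same gray level, so a model that only sees the
# grayscale value cannot tell which color was in the original image.
print(gray_red, gray_green)
```

This is why a colorization model has to rely on context (texture, shape, surrounding pixels) rather than on the gray value of a single pixel.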

In summary, image colorization is a complex and underdetermined problem that seeks to enrich grayscale images by predicting and applying appropriate colors, leveraging machine learning techniques to achieve this goal.

## Used architectures

## 1. TensorFlow approach

The architecture used for the image colorization task is a Convolutional Neural Network (CNN) designed to map grayscale images to their corresponding color images. The model consists of several layers, each with a specific function:

1. **Input Layer:**
   - **Shape:** (256, 256, 1)
   - **Description:** This layer accepts grayscale images of size 256x256 pixels with a single channel.

2. **Convolutional Layer 1:**
   - **Filters:** 64
   - **Kernel Size:** (3, 3)
   - **Activation:** ReLU
   - **Padding:** Same
   - **Description:** This layer applies 64 convolutional filters to the input image, each of size 3x3, and uses the ReLU activation function to introduce non-linearity. The 'same' padding ensures that the output has the same spatial dimensions as the input.

3. **UpSampling Layer 1:**
   - **Size:** (2, 2)
   - **Description:** This layer upsamples the input by a factor of 2, effectively doubling the spatial dimensions of the feature maps.

4. **Convolutional Layer 2:**
   - **Filters:** 32
   - **Kernel Size:** (3, 3)
   - **Activation:** ReLU
   - **Padding:** Same
   - **Description:** Similar to the first convolutional layer, but with 32 filters.

5. **UpSampling Layer 2:**
   - **Size:** (2, 2)
   - **Description:** This layer upsamples the input by a factor of 2 again.

6. **Convolutional Layer 3:**
   - **Filters:** 16
   - **Kernel Size:** (3, 3)
   - **Activation:** ReLU
   - **Padding:** Same
   - **Description:** Similar to the previous convolutional layers, but with 16 filters.

7. **Output Convolutional Layer:**
   - **Filters:** 3
   - **Kernel Size:** (3, 3)
   - **Activation:** Sigmoid
   - **Padding:** Same
   - **Description:** This layer produces the final output with 3 channels (representing the RGB color channels) and uses the sigmoid activation function to ensure the output values are between 0 and 1.

### Model Diagram
Below is a diagram representing the architecture of the model:

```
Input (256, 256, 1) 
    ↓
Conv2D (64 filters, 3x3, ReLU, same padding) 
    ↓
UpSampling2D (2x2) 
    ↓
Conv2D (32 filters, 3x3, ReLU, same padding) 
    ↓
UpSampling2D (2x2) 
    ↓
Conv2D (16 filters, 3x3, ReLU, same padding) 
    ↓
Conv2D (3 filters, 3x3, Sigmoid, same padding) 
    ↓
Output (256, 256, 3) 
```
![model](raport_sources\tensorflow.png)
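
As a quick sanity check, the layer listing in this section can be traced by hand. The sketch below is plain Python rather than TensorFlow code; `trace_shapes` and the `layers` list are illustrative names of ours, and it assumes stride 1 with 'same' padding for every convolution, as stated in the listing. Traced literally, the two 2x upsampling steps enlarge a 256x256 input to 1024x1024, so a 256x256 output would require downsampling steps (or a smaller input) that the listing does not show.

```python
def trace_shapes(input_shape, layers):
    """Return the (height, width, channels) shape after each layer."""
    h, w, c = input_shape
    shapes = [(h, w, c)]
    for kind, arg in layers:
        if kind == "conv":        # 'same' padding, stride 1: only channels change
            c = arg
        elif kind == "upsample":  # spatial size is multiplied by the factor
            h, w = h * arg, w * arg
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        shapes.append((h, w, c))
    return shapes


# Layers exactly as listed in this section.
layers = [
    ("conv", 64), ("upsample", 2),
    ("conv", 32), ("upsample", 2),
    ("conv", 16), ("conv", 3),
]

shapes = trace_shapes((256, 256, 1), layers)
for shape in shapes:
    print(shape)
```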

## 2. PyTorch approach
We also tried a PyTorch implementation: a neural network with an encoder-decoder structure that predicts colors from grayscale inputs. We prepared a dataset of paired grayscale and color images and briefly tested the model using mixed-precision training. We abandoned this approach after the initial trials because training was extremely slow even on a GPU, so the training process was never completed.

````PYTHON
import torch.nn as nn


# Model class: encoder-decoder colorization network
class ColorizationNet(nn.Module):
    def __init__(self):
        super(ColorizationNet, self).__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, padding=1),  # Input: 1x256x256
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),  # Downsample: 128x128x128
            nn.ReLU(),
            nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1),  # Downsample: 256x64x64
            nn.ReLU()
        )
        self.middle = nn.Sequential(
            nn.Conv2d(256, 512, kernel_size=3, padding=1),  # Bottleneck
            nn.ReLU()
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(512, 256, kernel_size=3, stride=2, padding=1, output_padding=1),  # Upsample: 256x128x128
            nn.ReLU(),
            nn.ConvTranspose2d(256, 128, kernel_size=3, stride=2, padding=1, output_padding=1),  # Upsample: 128x256x256
            nn.ReLU(),
            nn.Conv2d(128, 64, kernel_size=3, padding=1),  
            nn.ReLU(),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),  # Output: 3x256x256
            nn.Sigmoid()
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.middle(x)
        x = self.decoder(x)
        return x
````
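
The shape comments in the code above follow from the standard output-size formulas for `nn.Conv2d` and `nn.ConvTranspose2d` (with dilation 1). A minimal sketch in plain Python, with helper names of our own choosing:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution along one spatial axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Output size of a transposed convolution along one spatial axis (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


size = 256
size = conv_out(size, kernel=3, padding=1)             # 256, first encoder conv
size = conv_out(size, kernel=3, stride=2, padding=1)   # 128
size = conv_out(size, kernel=3, stride=2, padding=1)   # 64
bottleneck = conv_out(size, kernel=3, padding=1)       # 64
size = conv_transpose_out(bottleneck, 3, stride=2, padding=1, output_padding=1)  # 128
size = conv_transpose_out(size, 3, stride=2, padding=1, output_padding=1)        # 256
print(bottleneck, size)
```

The `output_padding=1` argument is what makes each transposed convolution exactly double the spatial size; without it the decoder would produce 127 and then 253 pixels per side.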

## Model analysis
model analysis: size in memory, number of parameters,  
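
Parameter counts can be derived directly from the layer definitions in this report. The sketch below assumes every layer has a bias term and that weights are stored as 32-bit floats; the helper name `conv_params` is ours. The figures cover weights only (not activations, gradients, or optimizer state), and they describe the layers as listed here rather than measurements taken from the trained models.

```python
def conv_params(in_ch, out_ch, kernel=3):
    """Weights plus biases of a 2D (or transposed 2D) convolution layer."""
    return in_ch * out_ch * kernel * kernel + out_ch


# (in_channels, out_channels) for each 3x3 layer, as listed in the report.
tensorflow_layers = [(1, 64), (64, 32), (32, 16), (16, 3)]
pytorch_layers = [
    (1, 64), (64, 128), (128, 256),   # encoder
    (256, 512),                       # bottleneck
    (512, 256), (256, 128),           # transposed convolutions
    (128, 64), (64, 3),               # final convolutions
]

for name, layers in [("TensorFlow", tensorflow_layers), ("PyTorch", pytorch_layers)]:
    params = sum(conv_params(i, o) for i, o in layers)
    size_mb = params * 4 / 1e6  # 4 bytes per float32 weight
    print(f"{name}: {params:,} parameters, about {size_mb:.2f} MB")
```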

## Training 
description of the training and the required commands to run it


## Evaluation
description of used metrics, loss, and evaluation

## Plots

plots: training and validation loss, metrics

## Hyperparameters
### Optimizers
#### 1. Adam Optimizer:

- Combines the benefits of Adaptive Gradient Algorithm (AdaGrad) and Root Mean Square Propagation (RMSprop).
- Adjusts the learning rate individually for each parameter, making it highly effective and widely used.
- Suitable for models with sparse gradients or when fine-tuning, often leading to fast convergence.
- Tested with a learning rate of 0.001, showing stable training behavior.

![OPTIMIZER PLOT](raport_sources\adam.png)
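
For reference, a single Adam update can be sketched in plain Python for one scalar parameter. This illustrates the update rule only and is not the framework implementation used in our experiments; the `beta1`, `beta2`, and `eps` values are commonly used defaults and are assumptions here, while the learning rate of 0.001 matches the tested setting.

```python
import math


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# Toy objective f(w) = w^2 with gradient 2w, starting from w = 1.0.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    w, m, v = adam_step(w, grad=2 * w, m=m, v=v, t=t)
print(w)
```

Because the gradient is divided by the square root of its own running average, the very first step has a magnitude close to `lr` regardless of how large the raw gradient is.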

#### 2. SGD (Stochastic Gradient Descent):

- A classic optimizer that updates weights based on the gradient of the loss function.
- Momentum was added (0.9) to help accelerate convergence and overcome local minima.
- Generally slower compared to adaptive optimizers but can yield better generalization with proper tuning.
- Learning rate set to 0.001; its simplicity makes it sensitive to such settings.

![OPTIMIZER PLOT](raport_sources\sgd.png)
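
The effect of momentum can be sketched in plain Python for one scalar parameter. This is an illustration under our own naming (`sgd_momentum_step`), using the tested values `lr=0.001` and `momentum=0.9`; frameworks differ in where exactly the learning rate enters the velocity update, so treat this as one common formulation rather than the exact code path used in training.

```python
def sgd_momentum_step(w, grad, velocity, lr=0.001, momentum=0.9):
    """Apply one SGD-with-momentum update to a scalar parameter."""
    velocity = momentum * velocity + grad   # accumulate past gradients
    w = w - lr * velocity
    return w, velocity


# With a constant gradient of 1.0 the effective step grows as velocity builds up.
w, velocity = 0.0, 0.0
steps = []
for _ in range(3):
    previous = w
    w, velocity = sgd_momentum_step(w, grad=1.0, velocity=velocity)
    steps.append(previous - w)
print(steps)
```

With a constant gradient the step size approaches `lr / (1 - momentum)`, which is ten times the plain learning rate for a momentum of 0.9.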


#### 3. RMSprop:

- A popular adaptive learning rate optimizer, especially for recurrent neural networks (RNNs).
- Divides the learning rate by an exponentially decaying average of squared gradients, which helps balance learning across parameters.
- Works well in scenarios with non-stationary objectives or noisy gradients.
- Like Adam, it was tested with a learning rate of 0.001 and proved effective in smoothing the optimization process.

![OPTIMIZER PLOT](raport_sources\rms.png)
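
The per-parameter scaling can be sketched in plain Python for one scalar parameter. The function name `rmsprop_step` and the `rho` and `eps` values are assumptions chosen for illustration (defaults vary between frameworks); only the learning rate of 0.001 comes from our experiments.

```python
import math


def rmsprop_step(w, grad, sq_avg, lr=0.001, rho=0.9, eps=1e-7):
    """Apply one RMSprop update to a scalar parameter."""
    sq_avg = rho * sq_avg + (1 - rho) * grad ** 2   # decaying average of squared gradients
    w = w - lr * grad / (math.sqrt(sq_avg) + eps)   # per-parameter scaled step
    return w, sq_avg


# Two parameters with very different gradient magnitudes take similar-sized steps.
w_small, s_small = rmsprop_step(1.0, grad=0.01, sq_avg=0.0)
w_large, s_large = rmsprop_step(1.0, grad=100.0, sq_avg=0.0)
print(1.0 - w_small, 1.0 - w_large)
```

Dividing by the running root-mean-square of the gradient is what keeps the two step sizes nearly equal even though the raw gradients differ by four orders of magnitude.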




Each optimizer was trained for 3 epochs on the same model and dataset, with a reduced number of training steps for quicker evaluation. Validation loss was tracked to compare their performance. The results highlight the strengths and trade-offs of each optimizer.





![OPTIMIZER PLOT](raport_sources\OPTIMIZER.png)






The original image:


![original](raport_sources\Abyssinian_2.jpg)


## Models
comparison of models


## Libraries
list of libraries and tools used can be a requirements.txt file


## Runtime environment
a description of the runtime environment

## Training and inference time

## Bibliography 
preparation of a bibliography - the bibliography should contain references to the data set (preferably the article in which the collection was presented) and all scientific works and studies, including websites with tips on the solution.


## Points
| **Task**                                         | **Status**      | **Points** |
|--------------------------------------------------|-----------------|------------|
| **Problem: Colorization**                        |  In progress    | 1          |
| **Model: Pre-trained model (different problem)** |  In progress    | 1          |
| **Data Augmentation**                            |  In progress    | 1          |
| **Cross-Validation**                             |  Not Attempted  | 1          |
| **Testing Optimizers (at least 3)**              |  Not Attempted  | 1          |
| **Testing Loss Functions (at least 3)**          |  Not Attempted  | 1          |
| **Dataset Requirements (at least 1000 images)**  |  Completed      | 0 (default requirement) |
| **Metrics (at least 2)**                         |  Not Attempted  | 0 (default requirement) |
| **Report (descriptions, diagrams, etc.)**        |  In progress    | 0          |
| **Visualization Tools (e.g., TensorBoard)**      |  Not Attempted  | 1          |

### **Points Summary**
- **Problem**: 1 
- **Model**: 1 
- **Additional Points (Training, Dataset, Tools)**: 5 
- **Total Points**: **7**




[Link to Git](https://github.com/Strajkerr/CV_Image_Colourization)