# Monet-Photo Image Translation Pipeline

## Introduction

The Monet-Photo Image Translation Pipeline is a PyTorch notebook that uses a Generative Adversarial Network (GAN) to translate between photographs and Monet-style paintings. A Generator produces the translated image, and a Discriminator learns to tell generated images from real ones. The goal is to move images between the realistic domain of photographs and the impressionistic style of Monet's artwork, with applications in digital art and image processing.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        if not filename.endswith('.jpg'):  # Skip files with '.jpg' extension
            print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# model path: /kaggle/input/0731a/pytorch/default/1/view

In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torchvision
from torch.utils.data import Dataset, DataLoader, random_split
from torchvision.utils import make_grid
from PIL import Image

from tqdm import tqdm
import shutil

from pathlib import Path

## Methodology

### Data Preparation

The pipeline begins with the `MonetPhoto` dataset class, which inherits from PyTorch's `Dataset` class. This class loads Monet paintings and photographs and returns one of each per sample. The two image sets are not aligned, so the pairing is by index only, and the shorter set wraps around. Key steps include:

1. **Initialization**: The dataset is initialized with paths to Monet and photo image directories. It lists all image filenames in these directories and applies optional transformations such as resizing, normalization, and conversion to tensor format.
2. **Data Splitting**: The dataset is split into training and testing subsets using an 80-20 random split so that losses can also be measured on held-out data.
3. **Data Loading**: PyTorch's `DataLoader` is utilized to handle batching, shuffling, and parallel data loading, facilitating efficient training and evaluation.
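
To make the pairing and split arithmetic concrete, here is a minimal sketch in plain Python. The image counts are assumed for illustration only, and `paired_indices` and `split_sizes` are hypothetical helpers that mirror the modulo indexing and the 80-20 size calculation; they are not part of the notebook.

```python
def paired_indices(idx, n_monet, n_photo):
    """Map a dataset index to (monet_index, photo_index) with wrap-around."""
    return idx % n_monet, idx % n_photo


def split_sizes(n_samples, train_fraction=0.8):
    """Return (train_size, test_size); the two sizes always sum to n_samples."""
    train_size = int(n_samples * train_fraction)
    return train_size, n_samples - train_size


n_monet, n_photo = 300, 7038         # assumed counts, for illustration only
n_samples = max(n_monet, n_photo)    # dataset length, as defined by __len__

print(paired_indices(0, n_monet, n_photo))    # (0, 0)
print(paired_indices(300, n_monet, n_photo))  # (0, 300): the Monet index wraps
print(split_sizes(n_samples))                 # (5630, 1408)
```

Because the smaller set wraps around, each painting is reused many times per epoch while each photo is seen once.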

### Model Architecture

The pipeline employs two primary neural network architectures:

1. **Generator**:
    - **Structure**: The `Generator` class implements a U-Net architecture, characterized by a series of downsampling and upsampling blocks with skip connections. This design allows the model to capture both global and local features, essential for high-quality image translation.
    - **Components**:
        - **Downsampling Path**: Consists of multiple `Block` instances that progressively reduce the spatial dimensions while increasing feature depth.
        - **Bottleneck**: Acts as a bridge between the downsampling and upsampling paths, capturing the most abstract features.
        - **Upsampling Path**: Mirrors the downsampling path with `Block` instances that restore the spatial dimensions, incorporating skip connections from corresponding downsampling layers to preserve spatial information.
        - **Final Layer**: Uses a transposed convolution followed by a Tanh activation to generate the output image with pixel values scaled between -1 and 1.

2. **Discriminator**:
    - **Structure**: The `Discriminator` class is a convolutional neural network designed to distinguish between real and generated (fake) images. It utilizes a series of `CNNBlock` instances to extract hierarchical features from input images.
    - **Components**:
        - **Initial Layer**: A convolutional layer with a LeakyReLU activation to process the input image.
        - **Convolutional Blocks**: Multiple `CNNBlock` instances that downsample the image and extract complex features.
        - **Final Layer**: A convolutional layer that outputs a single-channel feature map indicating the authenticity of the input image.
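
The spatial sizes along the Generator's downsampling and upsampling paths can be checked by hand. The sketch below assumes a 256x256 input and the layer settings used in the Generator cell of this notebook (kernel size 4, stride 2, padding 1 throughout). `conv_out` and `deconv_out` are hypothetical helpers that apply the standard output-size formulas for `Conv2d` and `ConvTranspose2d`.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a Conv2d layer."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a ConvTranspose2d layer (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


size = 256                 # assumed input resolution
down_sizes = []
for _ in range(8):         # initial layer, six Blocks, bottleneck
    size = conv_out(size)
    down_sizes.append(size)

up_sizes = []
for _ in range(8):         # seven Blocks plus the final layer
    size = deconv_out(size)
    up_sizes.append(size)

print(down_sizes)  # [128, 64, 32, 16, 8, 4, 2, 1]
print(up_sizes)    # [2, 4, 8, 16, 32, 64, 128, 256]
```

Each upsampling stage produces a feature map with the same spatial size as one of the downsampling stages, which is what allows the skip connections to concatenate them along the channel dimension.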

### Training Procedure

The `Trainer` class orchestrates the training process of the GAN, encompassing the following steps:

1. **Initialization**: Sets up data loaders, models, training parameters (e.g., number of epochs, learning rates), and device configuration (CPU or GPU).
2. **Optimizer Setup**: Initializes Adam optimizers for both the Generator and Discriminator with specified learning rates and beta parameters.
3. **Training Loop**:
    - **Epoch Iteration**: For each epoch, the model undergoes training and validation phases.
    - **Training Phase**:
        - Iterates over the training data, performing forward and backward passes.
        - Calculates Generator and Discriminator losses using Mean Squared Error (MSE) loss functions.
        - Updates model weights based on the computed gradients.
    - **Validation Phase**:
        - Evaluates the model on the validation dataset without updating weights.
        - Computes validation losses to monitor model performance and prevent overfitting.
    - **Checkpointing**: Saves the model's state whenever the current validation generator loss is lower than the best loss recorded so far, so the saved checkpoint corresponds to the best epoch by that measure.
    - **Logging**: Records training and validation metrics for analysis and visualization.
4. **Sample Generation**: After each epoch, the Generator translates a fixed photo (the first image of a batch loaded during data preparation), and the result is plotted next to the original to visualize translation quality.
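
The least-squares losses used in the training phase can be worked through on a toy example. The sketch below uses made-up discriminator scores (an assumption for illustration) in plain Python. `mse` is a hypothetical helper standing in for `torch.nn.functional.mse_loss` with its default mean reduction.

```python
def mse(values, target):
    """Mean squared error between a list of scores and a constant target."""
    return sum((v - target) ** 2 for v in values) / len(values)


real_scores = [0.9, 0.8, 1.1, 1.0]   # made-up discriminator outputs on real images
fake_scores = [0.2, 0.1, 0.4, 0.3]   # made-up discriminator outputs on generated images

# Discriminator: push real scores toward 1 and fake scores toward 0
d_loss = mse(real_scores, 1.0) + mse(fake_scores, 0.0)
# Generator: push the discriminator's scores on fake images toward 1
g_loss = mse(fake_scores, 1.0)

print(f"d_loss = {d_loss:.3f}")  # d_loss = 0.090
print(f"g_loss = {g_loss:.3f}")  # g_loss = 0.575
```

In this example the discriminator separates the two sets well, so its loss is small while the generator's loss is large. The two objectives pull the fake scores in opposite directions.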

### Visualization

The pipeline includes visualization capabilities to assess the Generator's performance. By generating fake images and comparing them with real photos, users can qualitatively evaluate the effectiveness of the image translation process.
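
Because images are normalized to [-1, 1] and the Generator ends in a Tanh, outputs have to be mapped back to [0, 1] before plotting. The sketch below shows that step with NumPy. `denormalize` is a hypothetical helper, the random array only stands in for a generator output, and the `np.clip` call is an added safeguard that the notebook itself does not use.

```python
import numpy as np


def denormalize(image_chw):
    """Map a (C, H, W) array in [-1, 1] to an (H, W, C) array in [0, 1]."""
    image_hwc = np.transpose(image_chw, (1, 2, 0))   # channels last, as imshow expects
    return np.clip(image_hwc * 0.5 + 0.5, 0.0, 1.0)  # invert Normalize(0.5, 0.5)


rng = np.random.default_rng(0)
fake = np.tanh(rng.normal(size=(3, 256, 256)))  # stand-in for a generator output
display = denormalize(fake)

print(display.shape)                                # (256, 256, 3)
print(display.min() >= 0.0, display.max() <= 1.0)   # True True
```

The resulting array can be passed directly to `plt.imshow`.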


In [None]:
data_root = '/kaggle/input/gan-getting-started/'
print(data_root)
print(os.listdir(data_root))
monet_path = 'monet_jpg'
photo_path = 'photo_jpg'

In [None]:
class MonetPhoto(Dataset):
    def __init__(self, data_root, monet_path, photo_path, transform=None):
        """
        Initialize the MonetPhoto dataset with paths to Monet and photo images and optional transformations.
        
        Args:
            data_root (str): Root directory where the data is stored.
            monet_path (str): Subdirectory within data_root containing Monet-style images.
            photo_path (str): Subdirectory within data_root containing photo images.
            transform (callable, optional): Optional transform to be applied on a sample.
        """
        self.data_root = data_root  # Set the root directory for data
        self.monet_path = monet_path  # Set the subdirectory for Monet images
        self.monet_images = os.listdir(os.path.join(data_root, monet_path))  # List all Monet image filenames
        self.photo_path = photo_path  # Set the subdirectory for photo images
        self.photo_images = os.listdir(os.path.join(data_root, photo_path))  # List all photo image filenames
        self.transform = transform  # Set the transformation to be applied on images

    def __len__(self):
        """
        Return the total number of samples in the dataset.
        The length is the maximum of the number of Monet and photo images to ensure pairing.
        
        Returns:
            int: Total number of samples.
        """
        return max(len(self.monet_images), len(self.photo_images))  # Ensure all images are paired

    def __getitem__(self, idx):
        """
        Retrieve a sample from the dataset at the given index.
        It pairs a Monet image with a photo image. If indices exceed the number of available images,
        it wraps around using modulo operation.
        
        Args:
            idx (int): Index of the sample to retrieve.
        
        Returns:
            tuple: A pair of Monet and photo images after applying transformations.
        """
        # Open the Monet image at the given index, wrapping around if idx exceeds length
        monet_image = Image.open(os.path.join(self.data_root, self.monet_path, self.monet_images[idx % len(self.monet_images)]))
        # Open the photo image at the given index, wrapping around if idx exceeds length
        photo_image = Image.open(os.path.join(self.data_root, self.photo_path, self.photo_images[idx % len(self.photo_images)]))
        
        # Apply transformations if any are provided
        if self.transform:
            monet_image = self.transform(monet_image)  # Transform the Monet image
            photo_image = self.transform(photo_image)  # Transform the photo image
        
        return monet_image, photo_image  # Return the transformed image pair


# Define a sequence of transformations to apply to the images
transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize((256, 256)),  # Resize images to 256x256 pixels
    torchvision.transforms.ToTensor(),  # Convert PIL images to PyTorch tensors
    torchvision.transforms.Normalize(0.5, 0.5)  # Normalize tensor values to [-1, 1]
])

# Create an instance of the MonetPhoto dataset with the specified paths and transformations
dataset = MonetPhoto(data_root, monet_path, photo_path, transform=transforms)

# Report the total number of samples in the dataset
print(len(dataset))

# Calculate the number of samples for training (80%) and testing (20%) splits
train_size = int(len(dataset) * 0.8)
test_size = len(dataset) - train_size
# Split the dataset into training and testing subsets
training_data, testing_data = random_split(dataset, [train_size, test_size])

# Report the number of samples in the training subset
print(len(training_data))

# Create a DataLoader for the training subset with a batch size of 32, shuffling enabled, and using 2 worker processes.
# Pass the Subset itself: training_data.dataset would refer back to the full, unsplit dataset.
train_dataloader = DataLoader(training_data, batch_size=32, shuffle=True, num_workers=2)
# Create a DataLoader for the testing subset with a batch size of 32, shuffling enabled, and using 2 worker processes
test_dataloader = DataLoader(testing_data, batch_size=32, shuffle=True, num_workers=2)

# Retrieve the first batch of Monet and photo images from the training DataLoader
mones, photos = next(iter(train_dataloader))

# Plot the first Monet image in the batch
plt.imshow(mones[0].permute(1, 2, 0).numpy() * 0.5 + 0.5)  # Convert tensor to numpy array and denormalize for display

## Modeling
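
The size of the Discriminator's output map can be traced before running the model. The sketch below assumes a 256x256 input and reads the stride and padding of each 4x4 convolution from the class definitions in this section. Note that `CNNBlock` does not set `padding`, so it uses the default of 0 and its `padding_mode='reflect'` has no effect. `conv_out` is a hypothetical helper that applies the standard `Conv2d` output-size formula.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a Conv2d layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, stride, padding) of each 4x4 convolution in the Discriminator
layers = [
    ("initial", 2, 1),
    ("CNNBlock 64->128", 2, 0),
    ("CNNBlock 128->256", 2, 0),
    ("CNNBlock 256->512", 1, 0),
    ("final", 1, 1),
]

size = 256  # assumed input resolution
trace = []
for name, stride, padding in layers:
    size = conv_out(size, kernel=4, stride=stride, padding=padding)
    trace.append((name, size))
    print(f"{name:>18}: {size}x{size}")
```

Under these assumptions the output is a 26x26 single-channel map rather than one scalar. Each entry scores one region of the input, which is why the losses compare the whole map against a map of ones or zeros.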

In [None]:
class CNNBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=2, device="cuda"):
        """
        Initialize the CNNBlock with convolution, batch normalization, and activation layers.
        
        Args:
            in_channels (int): Number of input channels.
            out_channels (int): Number of output channels.
            stride (int, optional): Stride for the convolution. Defaults to 2.
            device (str, optional): Device to run the operations on ('cuda' or 'cpu'). Defaults to "cuda".
        """
        super().__init__()  # Initialize the parent class
        self.device = device  # Set the device
        self.conv = nn.Sequential(
            nn.Conv2d(
                in_channels, 
                out_channels, 
                kernel_size=4, 
                stride=stride, 
                bias=False, 
                padding_mode='reflect', 
                device=self.device
            ),  # Convolutional layer with specified parameters
            nn.BatchNorm2d(out_channels, device=self.device),  # Batch normalization to stabilize and accelerate training
            nn.LeakyReLU(0.2),  # Leaky ReLU activation with negative slope of 0.2
        )
        
    def forward(self, x):
        """
        Forward pass through the CNNBlock.
        
        Args:
            x (torch.Tensor): Input tensor.
        
        Returns:
            torch.Tensor: Output tensor after convolution, batch normalization, and activation.
        """
        return self.conv(x)  # Apply the sequential layers to the input


class Discriminator(nn.Module):
    def __init__(self, in_channels=3, features=[64, 128, 256, 512], device="cuda"):
        """
        Initialize the Discriminator model using a series of CNNBlocks.
        
        Args:
            in_channels (int, optional): Number of input channels. Defaults to 3 (e.g., RGB images).
            features (list, optional): List of feature sizes for each CNNBlock. Defaults to [64, 128, 256, 512].
            device (str, optional): Device to run the operations on ('cuda' or 'cpu'). Defaults to "cuda".
        """
        super().__init__()  # Initialize the parent class
        self.device = device  # Set the device
        self.initial = nn.Sequential(
            nn.Conv2d(
                in_channels, 
                features[0], 
                kernel_size=4, 
                stride=2, 
                padding=1, 
                padding_mode='reflect', 
                device=self.device
            ),  # Initial convolutional layer
            nn.LeakyReLU(0.2)  # Leaky ReLU activation
        )
        
        layers = []  # Initialize a list to hold subsequent CNNBlocks
        in_channels = features[0]  # Set the initial number of input channels
        for feature in features[1:]:
            layers.append(
                CNNBlock(in_channels, feature, stride=1 if feature == features[-1] else 2, device=self.device),
            )  # Append CNNBlock with appropriate stride
            in_channels = feature  # Update the number of input channels for the next block
        
        layers.append(
            nn.Conv2d(
                in_channels, 
                1, 
                kernel_size=4, 
                stride=1, 
                padding=1, 
                padding_mode='reflect', 
                device=self.device
            )
        )  # Final convolutional layer to produce a single output channel (real/fake classification)
            
        self.model = nn.Sequential(*layers)  # Combine all layers into a sequential model
        
    def forward(self, x):
        """
        Forward pass through the Discriminator.
        
        Args:
            x (torch.Tensor): Input tensor (e.g., an image).
        
        Returns:
            torch.Tensor: Output tensor representing the discriminator's prediction.
        """
        x = self.initial(x)  # Apply the initial convolution and activation
        return self.model(x)  # Apply the subsequent CNNBlocks and final convolution


class Block(nn.Module):
    def __init__(self, in_channels, out_channels, down=True, act='relu', use_dropout=False, device="cuda"):
        """
        Initialize a Block, which can act as either a downsampling or upsampling layer.
        
        Args:
            in_channels (int): Number of input channels.
            out_channels (int): Number of output channels.
            down (bool, optional): If True, performs downsampling using Conv2d; else, upsampling using ConvTranspose2d. Defaults to True.
            act (str, optional): Activation function to use ('relu' or 'leaky'). Defaults to 'relu'.
            use_dropout (bool, optional): Whether to include a dropout layer. Defaults to False.
            device (str, optional): Device to run the operations on ('cuda' or 'cpu'). Defaults to "cuda".
        """
        super().__init__()  # Initialize the parent class
        self.device = device  # Set the device
        self.conv = nn.Sequential(
            nn.Conv2d(
                in_channels, 
                out_channels, 
                kernel_size=4, 
                stride=2, 
                padding=1, 
                bias=False, 
                padding_mode='reflect', 
                device=self.device
            ) if down else nn.ConvTranspose2d(
                in_channels, 
                out_channels, 
                kernel_size=4, 
                stride=2, 
                padding=1, 
                bias=False, 
                device=self.device
            ),  # Choose Conv2d for downsampling or ConvTranspose2d for upsampling
            nn.BatchNorm2d(out_channels, device=self.device),  # Batch normalization
            nn.ReLU() if act == 'relu' else nn.LeakyReLU(0.2),  # Activation function
        )
        self.use_dropout = use_dropout  # Set whether to use dropout
        self.dropout = nn.Dropout(0.5)  # Define a dropout layer with 50% dropout rate
    
    def forward(self, x):
        """
        Forward pass through the Block.
        
        Args:
            x (torch.Tensor): Input tensor.
        
        Returns:
            torch.Tensor: Output tensor after convolution, normalization, activation, and optional dropout.
        """
        x = self.conv(x)  # Apply convolution, normalization, and activation
        return self.dropout(x) if self.use_dropout else x  # Apply dropout if enabled, else return the tensor


class Generator(nn.Module):
    def __init__(self, in_channels=3, features=64, device="cuda"):
        """
        Initialize the Generator model using a series of downsampling and upsampling Blocks.
        
        Args:
            in_channels (int, optional): Number of input channels. Defaults to 3 (e.g., RGB images).
            features (int, optional): Base number of feature channels. Defaults to 64.
            device (str, optional): Device to run the operations on ('cuda' or 'cpu'). Defaults to "cuda".
        """
        super().__init__()  # Initialize the parent class
        self.device = device  # Set the device
        self.initial_down = nn.Sequential(
            nn.Conv2d(
                in_channels, 
                features, 
                kernel_size=4, 
                stride=2, 
                padding=1, 
                padding_mode='reflect', 
                device=self.device
            ),  # Initial convolutional layer
            nn.LeakyReLU(0.2),  # Leaky ReLU activation
        )  # Output size: 128
        
        # Define downsampling Blocks
        self.down1 = Block(features, features*2, down=True, act='leaky', use_dropout=False, device=self.device)   # Output size: 64
        self.down2 = Block(features*2, features*4, down=True, act='leaky', use_dropout=False, device=self.device) # Output size: 32
        self.down3 = Block(features*4, features*8, down=True, act='leaky', use_dropout=False, device=self.device) # Output size: 16
        self.down4 = Block(features*8, features*8, down=True, act='leaky', use_dropout=False, device=self.device) # Output size: 8
        self.down5 = Block(features*8, features*8, down=True, act='leaky', use_dropout=False, device=self.device) # Output size: 4
        self.down6 = Block(features*8, features*8, down=True, act='leaky', use_dropout=False, device=self.device) # Output size: 2
        
        # Bottleneck layer
        self.bottleneck = nn.Sequential(
            nn.Conv2d(
                features*8, 
                features*8, 
                kernel_size=4, 
                stride=2, 
                padding=1, 
                padding_mode='reflect', 
                device=self.device
            ),  # Convolutional layer at the bottleneck
            nn.ReLU(),  # ReLU activation
        )  # Output size: 1x1
        
        # Define upsampling Blocks with skip connections
        self.up1 = Block(features*8, features*8, down=False, act='relu', use_dropout=True, device=self.device)  # Output size: 2
        self.up2 = Block(features*8*2, features*8, down=False, act='relu', use_dropout=True, device=self.device)  # Output size: 4
        self.up3 = Block(features*8*2, features*8, down=False, act='relu', use_dropout=True, device=self.device)  # Output size: 8
        self.up4 = Block(features*8*2, features*8, down=False, act='relu', use_dropout=False, device=self.device) # Output size: 16
        self.up5 = Block(features*8*2, features*4, down=False, act='relu', use_dropout=False, device=self.device) # Output size: 32
        self.up6 = Block(features*4*2, features*2, down=False, act='relu', use_dropout=False, device=self.device) # Output size: 64
        self.up7 = Block(features*2*2, features, down=False, act='relu', use_dropout=False, device=self.device)   # Output size: 128
        
        # Final upsampling layer to reconstruct the image
        self.finil_up = nn.Sequential(
            nn.ConvTranspose2d(
                features*2, 
                in_channels, 
                kernel_size=4, 
                stride=2, 
                padding=1, 
                device=self.device
            ),  # Transposed convolution to upsample to original image size
            nn.Tanh(),  # Tanh activation to scale the output between -1 and 1
        )
        
    def forward(self, x):
        """
        Forward pass through the Generator, implementing the U-Net architecture with skip connections.
        
        Args:
            x (torch.Tensor): Input tensor (e.g., an image).
        
        Returns:
            torch.Tensor: Output tensor representing the generated image.
        """
        d1 = self.initial_down(x)  # Apply initial downsampling
        d2 = self.down1(d1)  # Apply first downsampling Block
        d3 = self.down2(d2)  # Apply second downsampling Block
        d4 = self.down3(d3)  # Apply third downsampling Block
        d5 = self.down4(d4)  # Apply fourth downsampling Block
        d6 = self.down5(d5)  # Apply fifth downsampling Block
        d7 = self.down6(d6)  # Apply sixth downsampling Block
        
        bottleneck = self.bottleneck(d7)  # Apply bottleneck layer
        
        up1 = self.up1(bottleneck)  # Apply first upsampling Block
        up2 = self.up2(torch.cat([up1, d7], 1))  # Concatenate with corresponding downsampled feature map and apply second upsampling Block
        up3 = self.up3(torch.cat([up2, d6], 1))  # Concatenate with corresponding downsampled feature map and apply third upsampling Block
        up4 = self.up4(torch.cat([up3, d5], 1))  # Concatenate with corresponding downsampled feature map and apply fourth upsampling Block
        up5 = self.up5(torch.cat([up4, d4], 1))  # Concatenate with corresponding downsampled feature map and apply fifth upsampling Block
        up6 = self.up6(torch.cat([up5, d3], 1))  # Concatenate with corresponding downsampled feature map and apply sixth upsampling Block
        up7 = self.up7(torch.cat([up6, d2], 1))  # Concatenate with corresponding downsampled feature map and apply seventh upsampling Block

        # Concatenate with initial downsampled feature map and apply final upsampling to generate the output image
        return self.finil_up(torch.cat([up7, d1], 1))

## Training
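
One detail of alternating GAN updates is worth isolating. If the discriminator loss is backpropagated through generator output that has not been detached, gradients also accumulate in the generator's parameters. The toy sketch below demonstrates the effect. The two small linear layers are assumptions that stand in for the real models, and `generator_grad_after_d_loss` is a hypothetical helper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
generator = nn.Linear(4, 4)        # toy stand-in for the Generator
discriminator = nn.Linear(4, 1)    # toy stand-in for the Discriminator
x = torch.randn(8, 4)


def generator_grad_after_d_loss(detach):
    """Run one discriminator-style backward pass and return the generator's weight gradient."""
    generator.zero_grad(set_to_none=True)
    discriminator.zero_grad(set_to_none=True)
    fake = generator(x)
    if detach:
        fake = fake.detach()       # cut the graph between generator and discriminator
    score = discriminator(fake)
    d_loss = nn.functional.mse_loss(score, torch.zeros_like(score))
    d_loss.backward()
    return generator.weight.grad


leaks = generator_grad_after_d_loss(detach=False) is not None
blocked = generator_grad_after_d_loss(detach=True) is None
print(leaks, blocked)  # True True
```

Without `detach()`, the generator receives a gradient that pushes its outputs toward being classified as fake. If that gradient is not cleared, it is added to the generator's own loss gradient at the next generator step.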

In [None]:
class Trainer:
    def __init__(
        self,
        train_data: DataLoader,
        val_data: DataLoader,
        generator: torch.nn.Module,
        discriminator: torch.nn.Module,
        nb_epochs: int = 5,
        device: str = "cuda",
        save_path: str = None,
    ):
        """
        Initialize the Trainer with data loaders, models, training parameters, and device configuration.
        
        Args:
            train_data (DataLoader): DataLoader for training data.
            val_data (DataLoader): DataLoader for validation data.
            generator (torch.nn.Module): Generator model.
            discriminator (torch.nn.Module): Discriminator model.
            nb_epochs (int, optional): Number of training epochs. Defaults to 5.
            device (str, optional): Device to run the training on ('cuda' or 'cpu'). Defaults to "cuda".
            save_path (str, optional): Path to save the trained models. Defaults to None.
        """
        self.train_data = train_data        # Assign training data loader
        self.val_data = val_data            # Assign validation data loader
        self.generator = generator          # Assign generator model
        self.discriminator = discriminator  # Assign discriminator model
        self.nb_epochs = nb_epochs          # Set number of epochs
        self.device = device                # Set device for computation
        self.save_path = Path(save_path) if save_path else save_path  # Set save path as Path object if provided

        # Initialize a fixed set of validation samples for monitoring generator progress
        self.z = next(iter(self.val_data))[0][:32].to(self.device)
        
        # Initialize a dictionary to store training logs
        self.logs = {
            "Step": [],
            "Train_g_loss": [],
            "Train_d_loss": [],
            "Val_g_loss": [],
            "Val_d_loss": [],
            "Samples": [],
        }

    def init_optimizers(self, lr: float=3e-4, betas: tuple=(0.5, 0.999)):
        """
        Initialize the optimizers for the generator and discriminator.
        
        Args:
            lr (float, optional): Learning rate for the optimizers. Defaults to 3e-4.
            betas (tuple, optional): Beta parameters for the Adam optimizer. Defaults to (0.5, 0.999).
        """
        # Initialize Adam optimizer for the generator
        self.g_optimizer = torch.optim.Adam(
            self.generator.parameters(), lr=lr, betas=betas
        )
        # Initialize Adam optimizer for the discriminator
        self.d_optimizer = torch.optim.Adam(
            self.discriminator.parameters(), lr=lr, betas=betas
        )

    def train(self):
        """
        Execute the training loop for the GAN, including training and validation phases,
        logging, and model checkpointing.
        
        Returns:
            dict: Dictionary containing training logs.
        """
        # Ensure that both optimizers are initialized before training
        assert (getattr(self, "g_optimizer", None) is not None) and (
            getattr(self, "d_optimizer", None) is not None
        ), "Please run Trainer.init_optimizers() before train()"

        # If a saved model exists at the save path, load the model weights
        if self.save_path and self.save_path.exists():
            self.load_model()
            
        best_score = torch.inf  # Initialize the best validation score to infinity

        # Iterate over each epoch
        for i in range(self.nb_epochs):
            # Initialize cumulative losses for the epoch
            train_d_loss, train_g_loss, val_d_loss, val_g_loss = 0, 0, 0, 0
            self.generator.train()      # Set generator to training mode
            self.discriminator.train()  # Set discriminator to training mode
            
            # Training loop with progress bar
            loop = tqdm(
                enumerate(self.train_data),
                desc=f"Epoch {i + 1}/{self.nb_epochs} train",
                leave=False,
                total=len(self.train_data),
            )
            for step, (monet, photo) in loop:
                # The dataset yields (monet, photo); the generator input is the photo
                x = photo.to(self.device)  # Move input photos to the specified device
                y = monet.to(self.device)  # Move target Monet paintings to the specified device

                # Perform a training step and retrieve generator and discriminator losses
                g_loss, d_loss = self.train_step(x, y)

                # Accumulate the losses
                train_g_loss += g_loss
                train_d_loss += d_loss

                # Update the progress bar with average losses
                loop.set_postfix_str(
                    f"g_loss: {train_g_loss / (step + 1) :.2f}, d_loss: {train_d_loss / (step + 1) :.2f}"
                )

            # Validation phase
            self.generator.eval()  # Set generator to evaluation mode
            self.discriminator.eval()  # Set discriminator to evaluation mode

            # Validation loop with progress bar
            loop = tqdm(
                enumerate(self.val_data),
                desc=f"Epoch {i + 1}/{self.nb_epochs} validation",
                leave=True,
                total=len(self.val_data),
            )
            for step, (monet, photo) in loop:
                # Same ordering as in training: photo is the input, Monet is the target
                x = photo.to(self.device)  # Move input photos to the specified device
                y = monet.to(self.device)  # Move target Monet paintings to the specified device

                # Perform a validation step and retrieve generator and discriminator losses
                g_loss, d_loss = self.val_step(x, y)

                # Accumulate the validation losses
                val_g_loss += g_loss
                val_d_loss += d_loss

                # Update the progress bar with average validation losses
                loop.set_postfix_str(
                    f"g_loss: {val_g_loss / (step + 1) :.2f} d_loss: {val_d_loss / (step + 1) :.2f}"
                )

            # Check if the current validation loss is the best so far
            if self.save_path and best_score > val_g_loss:
                best_score = val_g_loss  # Update the best score
                self.save_model()  # Save the current model as the best model

            # Log the metrics for the current epoch
            self.log_metrics(
                step=i,
                train_g_loss=train_g_loss,
                train_d_loss=train_d_loss,
                val_g_loss=val_g_loss,
                val_d_loss=val_d_loss,
            )
            # Generate and display a sample translation of a fixed photo
            # (`photos` is the batch loaded in the data-preparation cell)
            with torch.no_grad():
                fake_img = self.generator(photos[0].unsqueeze(0).to(self.device))
            fake_img = fake_img[0].cpu()
            
            # Plot the original photo and the generated fake image
            fig, ax = plt.subplots(1, 2, figsize=(20, 4))
            ax[0].imshow(photos[0].permute(1, 2, 0).numpy() * 0.5 + 0.5)
            ax[0].set_title("Photo")
            ax[0].set(xticks=[], yticks=[])
            ax[1].imshow(fake_img.permute(1, 2, 0).numpy() * 0.5 + 0.5)
            ax[1].set_title("FakeMonet")
            ax[1].set(xticks=[], yticks=[])
            plt.show()
            
        return self.logs  # Return the training logs

    def train_step(
        self,
        x: torch.Tensor,
        y: torch.Tensor
    ) -> tuple:
        """
        Perform a single training step for both generator and discriminator.
        
        Args:
            x (torch.Tensor): Input tensor (e.g., photos).
            y (torch.Tensor): Target tensor (e.g., Monet paintings).
            
        Returns:
            tuple: Generator loss and discriminator loss.
        """
        self.g_optimizer.zero_grad(set_to_none=True)  # Reset generator gradients
        self.d_optimizer.zero_grad(set_to_none=True)  # Reset discriminator gradients
        
        # Generate fake images and classify them with the discriminator.
        # detach() keeps the discriminator loss from backpropagating into the generator.
        fake_ = self.discriminator(
            self.generator(x).detach()
        )
        # Classify real images with the discriminator
        real = self.discriminator(y)
        # Calculate discriminator loss: real images should be classified as ones and fake as zeros
        d_loss = (
            torch.nn.functional.mse_loss(real, torch.ones_like(real, device=self.device)) + 
            torch.nn.functional.mse_loss(fake_, torch.zeros_like(fake_, device=self.device))
        )

        d_loss.backward()  # Backpropagate discriminator loss
        self.d_optimizer.step()  # Update discriminator weights
        
        # Re-generate fake images for generator training
        fake = self.discriminator(
            self.generator(x)
        )
        
        # Calculate generator loss: fake images should be classified as ones
        g_loss = torch.nn.functional.mse_loss(fake, torch.ones_like(fake, device=self.device))
        g_loss.backward()  # Backpropagate generator loss
        self.g_optimizer.step()  # Update generator weights

        return g_loss.item(), d_loss.item()  # Return scalar losses

    @torch.no_grad()
    def val_step(
        self,
        x: torch.Tensor,
        y: torch.Tensor,
    ) -> tuple:
        """
        Perform a single validation step without updating the model.
        
        Args:
            x (torch.Tensor): Input tensor (e.g., photos).
            y (torch.Tensor): Target tensor (e.g., Monet paintings).
            
        Returns:
            tuple: Generator loss and discriminator loss.
        """
        # Generate fake images and classify them with the discriminator
        fake = self.discriminator(
            self.generator(x)
        )
        # Classify real images with the discriminator
        real = self.discriminator(y)
        # Calculate generator loss: fake images should be classified as ones
        g_loss = torch.nn.functional.mse_loss(fake, torch.ones_like(fake, device=self.device))
        # Calculate discriminator loss: real images as ones and fake as zeros
        d_loss = (
            torch.nn.functional.mse_loss(real, torch.ones_like(real, device=self.device)) + 
            torch.nn.functional.mse_loss(fake, torch.zeros_like(fake, device=self.device))
        )
        return g_loss.item(), d_loss.item()  # Return scalar losses

    @torch.no_grad()
    def log_metrics(
        self,
        step: int,
        train_g_loss: torch.Tensor,
        train_d_loss: torch.Tensor,
        val_g_loss: torch.Tensor,
        val_d_loss: torch.Tensor,
    ):
        """
        Log the training and validation metrics for the current epoch.
        
        Args:
            step (int): Current epoch number.
            train_g_loss (torch.Tensor): Cumulative generator loss during training.
            train_d_loss (torch.Tensor): Cumulative discriminator loss during training.
            val_g_loss (torch.Tensor): Cumulative generator loss during validation.
            val_d_loss (torch.Tensor): Cumulative discriminator loss during validation.
        """
        self.logs["Step"].append(step)  # Log the current epoch
        self.logs["Train_g_loss"].append(train_g_loss / len(self.train_data))  # Log average training generator loss
        self.logs["Train_d_loss"].append(train_d_loss / len(self.train_data))  # Log average training discriminator loss
        self.logs["Val_g_loss"].append(val_g_loss / len(self.val_data))  # Log average validation generator loss
        self.logs["Val_d_loss"].append(val_d_loss / len(self.val_data))  # Log average validation discriminator loss
        self.logs["Samples"].append(make_grid(self.generator(self.z).cpu() * 0.5 + 0.5, normalize=True))  # Log generated sample images

    def save_model(self, full: bool = False):
        """
        Save the generator and discriminator models.
        
        Args:
            full (bool, optional): Whether to save the full model or just the state dictionaries. Defaults to False.
        """
        if full:
            # Save the entire generator and discriminator models
            torch.save(self.generator, Path(self.save_path) / "generator.pth")
            torch.save(self.discriminator, Path(self.save_path) / "discriminator.pth")
        else:
            # Save only the state dictionaries (weights) of the models
            torch.save(
                self.generator.state_dict(),
                Path(self.save_path) / "generator_weights.pth",
            )
            torch.save(
                self.discriminator.state_dict(),
                Path(self.save_path) / "discriminator_weights.pth",
            )

    def load_model(self, full: bool = False):
        """
        Load the generator and discriminator models.
        
        Args:
            full (bool, optional): Whether to load the full model or just the state dictionaries. Defaults to False.
        """
        save_dir = Path(self.save_path)  # Accept both str and Path, as save_model does
        if (
            full
            and (save_dir / "generator.pth").is_file()
            and (save_dir / "discriminator.pth").is_file()
        ):
            # Load the entire pickled generator and discriminator modules.
            # weights_only=False is needed for full modules on recent PyTorch releases.
            self.generator = torch.load(
                save_dir / "generator.pth", map_location=self.device, weights_only=False
            )
            self.discriminator = torch.load(
                save_dir / "discriminator.pth", map_location=self.device, weights_only=False
            )
        elif (
            (save_dir / "generator_weights.pth").is_file() and
            (save_dir / "discriminator_weights.pth").is_file()
        ):
            # Load only the state dictionaries (weights), remapped to the current device
            self.generator.load_state_dict(
                torch.load(save_dir / "generator_weights.pth", map_location=self.device)
            )
            self.discriminator.load_state_dict(
                torch.load(save_dir / "discriminator_weights.pth", map_location=self.device)
            )
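
The alternating update in `train_step` uses a least-squares adversarial loss: discriminator scores are pushed toward ones for real images and zeros for generated ones, while the generator is trained to push its scores toward ones. A minimal sketch of one such step follows. The names `toy_generator` and `toy_discriminator` are hypothetical linear stand-ins, not the notebook's `Generator` and `Discriminator`, and the sketch detaches the generated batch during the discriminator update, a common way to keep that loss from writing gradients into the generator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins for the notebook's Generator and Discriminator.
toy_generator = nn.Linear(4, 4)
toy_discriminator = nn.Linear(4, 1)
g_opt = torch.optim.Adam(toy_generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(toy_discriminator.parameters(), lr=1e-3)

x = torch.randn(8, 4)  # plays the role of a photo batch
y = torch.randn(8, 4)  # plays the role of a Monet batch

# Discriminator step: score a detached fake batch and a real batch.
d_opt.zero_grad(set_to_none=True)
g_opt.zero_grad(set_to_none=True)
fake_score = toy_discriminator(toy_generator(x).detach())
real_score = toy_discriminator(y)
d_loss = (
    F.mse_loss(real_score, torch.ones_like(real_score))
    + F.mse_loss(fake_score, torch.zeros_like(fake_score))
)
d_loss.backward()
# Because of detach(), the generator parameters received no gradient here.
generator_grads_after_d_step = [p.grad for p in toy_generator.parameters()]
d_opt.step()

# Generator step: the fake batch is scored again, this time with gradients.
fake_score = toy_discriminator(toy_generator(x))
g_loss = F.mse_loss(fake_score, torch.ones_like(fake_score))
g_loss.backward()
generator_has_grads = all(p.grad is not None for p in toy_generator.parameters())
g_opt.step()

print(f"d_loss={d_loss.item():.4f}, g_loss={g_loss.item():.4f}")
```

The generator backward pass also writes gradients into the discriminator's parameters; they are harmless as long as the discriminator's gradients are reset before its next update, as they are at the top of this step.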

In [None]:
!mkdir -p models

In [None]:
# Set the device to GPU if available, otherwise default to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Initialize the Trainer with the necessary components and parameters
trainer = Trainer(
    train_data=train_dataloader,    # DataLoader for the training dataset
    val_data=test_dataloader,       # DataLoader for the validation dataset
    generator=Generator(in_channels=3, features=64),  # Instantiate the Generator model with 3 input channels and 64 feature maps
    discriminator=Discriminator(),  # Instantiate the Discriminator model with default parameters
    nb_epochs=20,                   # Set the number of training epochs to 20
    device=device,                  # Specify the device ('cuda' or 'cpu') for training
    save_path='./models/'           # Directory path where the trained models will be saved
)

trainer.load_model()       # Load pre-trained model weights from the specified save path, if available
trainer.init_optimizers()  # Initialize the optimizers for both the generator and discriminator models
logs = trainer.train()     # Begin the training process and store the training logs for later analysis
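
The `trainer.load_model()` call in the cell above resumes from saved weights only when the checkpoint files exist. The state-dictionary round trip it relies on can be sketched with a toy module. The names `toy_model` and `restored` and the file name `toy_weights.pth` are hypothetical, and a temporary directory stands in for `./models/`.

```python
import tempfile
from pathlib import Path

import torch
import torch.nn as nn

toy_model = nn.Linear(3, 2)  # hypothetical stand-in for the generator

with tempfile.TemporaryDirectory() as tmp_dir:
    weights_path = Path(tmp_dir) / "toy_weights.pth"

    # Save only the weights, not the pickled module object.
    torch.save(toy_model.state_dict(), weights_path)

    # Restoring requires an instance with the same architecture.
    restored = nn.Linear(3, 2)
    if weights_path.is_file():
        # map_location lets a checkpoint saved on GPU load on a CPU-only machine.
        state = torch.load(weights_path, map_location="cpu")
        restored.load_state_dict(state)

weights_match = all(
    torch.equal(a, b)
    for a, b in zip(toy_model.state_dict().values(), restored.state_dict().values())
)
print("weights match:", weights_match)
```

Saving state dictionaries keeps the checkpoint independent of how the class is imported, at the cost of having to rebuild the model with the same constructor arguments before loading.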

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(20, 4))
ax[0].plot(logs["Step"], logs["Train_g_loss"], label="Train_g_loss")
ax[1].plot(logs["Step"], logs["Train_d_loss"], label="Train_d_loss")
ax[0].plot(logs["Step"], logs["Val_g_loss"], label="Val_g_loss")
ax[1].plot(logs["Step"], logs["Val_d_loss"], label="Val_d_loss")
ax[0].set_title("Generator loss")
ax[1].set_title("Discriminator loss")
ax[0].legend()
ax[1].legend()
plt.show()

In [None]:
plt.imshow(photos[1].permute(1, 2, 0).numpy() * 0.5 + 0.5)

In [None]:
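with torch.no_grad():  # Inference only, so no gradients are needed
    fake_img = trainer.generator(photos[1].unsqueeze(0).to(device))  # Use the configured device, not a hard-coded 'cuda'
fake_img = fake_img[0].cpu()
plt.imshow(fake_img.permute(1, 2, 0).numpy() * 0.5 + 0.5)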

## Submission

In [None]:
!mkdir -p ../images

In [None]:
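with torch.no_grad():
    for i, (_, img) in enumerate(DataLoader(dataset, batch_size=1, shuffle=False)):
        # Translate the photo, move channels last, and map [-1, 1] to [0, 1]
        prediction = trainer.generator(img.to(device))[0].cpu().permute(1, 2, 0).numpy() * 0.5 + 0.5
        prediction = np.clip(prediction, 0.0, 1.0)  # Guard against uint8 wrap-around for out-of-range values
        im = Image.fromarray((prediction * 255).astype(np.uint8))
        im.save(f"../images/{i}.jpg")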
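
The export loop maps generator outputs from the normalized range [-1, 1] back to 8-bit pixel values. The conversion on its own can be sketched with NumPy, assuming a channels-first array as input. The helper name `to_uint8_image` is hypothetical, and the clamping and rounding are choices made for this sketch rather than something the notebook prescribes.

```python
import numpy as np


def to_uint8_image(chw: np.ndarray) -> np.ndarray:
    """Map a CHW float array in [-1, 1] to an HWC uint8 array in [0, 255]."""
    hwc = np.transpose(chw, (1, 2, 0))          # channels-first -> channels-last
    unit = np.clip(hwc * 0.5 + 0.5, 0.0, 1.0)   # [-1, 1] -> [0, 1], clamped
    return np.round(unit * 255).astype(np.uint8)


# A tiny 3-channel, 2x2 "image"; the last channel is out of range on purpose.
sample = np.stack([
    np.full((2, 2), -1.0),
    np.zeros((2, 2)),
    np.full((2, 2), 1.2),
])
image = to_uint8_image(sample)
print(image.shape, image.dtype)
```

Without the clamp, a value slightly above 1.0 would exceed 255 after scaling and is not guaranteed to convert to a sensible `uint8` value, which can show up as speckled pixels in the saved file.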

In [None]:
import shutil
shutil.make_archive("/kaggle/working/images", 'zip', "/kaggle/images")

## Conclusion

The Monet-Photo Image Translation Pipeline trains a generator and a discriminator adversarially to transform photographs into Monet-style paintings. The notebook prepares the image data, defines both networks, trains them with a least-squares (MSE) adversarial loss while logging training and validation losses and sample outputs for each epoch, and finally exports the generated images as a zip archive for submission. It can serve as a starting point for digital artists and researchers exploring GANs in creative and artistic applications. Future enhancements may include incorporating more diverse artistic styles, improving model scalability, and integrating additional loss functions to further refine the quality of generated images.
