# Week 5 Peer Review
## Introduction to Project
### What Problem Am I Solving?
The purpose of this project is to explore the capabilities of GANs. We are given two image folders: one of Monet paintings and one of ordinary photos. Since Monet gives all of his paintings a very distinctive feel, the goal of the project is to transfer his "look" onto the photos. At the end of this project, I will have created a zip file containing the photos after they have been transformed into the style of Monet.
### Data Description
As mentioned above, we are given two sets of images. The Monet set contains photos of his paintings, 300 in total. The other set contains 7,038 ordinary photos that the GAN will convert into Monet-style paintings.

## Exploratory Data Analysis Strategy
Exploratory Data Analysis (EDA) is essential for understanding the characteristics of the data, identifying patterns, and gaining insights that can be leveraged during model training and evaluation. Here's a good EDA strategy for this project:
1. Load and inspect the data - the goal of this step is to display a few images so that I can better understand the "look" I'm going for with my GAN.
2. Visualize the color distribution - the goal here is to see which colors Monet commonly used. This is helpful because it is difficult to judge the real color distribution by eye when looking at a whole painting at once. Histograms of the red, green, and blue intensities make that distribution explicit.
3. Understand the dataset characteristics - this step will give me a high-level view of what the data looks like and will provide meta information on the files.
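Step 2 can also be done at the pixel level rather than per image. The following is a minimal sketch, assuming an image is available as an `H x W x 3` `uint8` NumPy array; the helper name `channel_histograms` and the synthetic image are illustrative stand-ins, not part of the competition data.

```python
import numpy as np

def channel_histograms(image, bins=32):
    """Return per-channel pixel-value histograms for an H x W x 3 uint8 image."""
    # Bin edges cover the full 0..255 intensity range.
    edges = np.linspace(0, 256, bins + 1)
    return {
        name: np.histogram(image[:, :, idx], bins=edges)[0]
        for idx, name in enumerate(("red", "green", "blue"))
    }

# Synthetic stand-in for a 256 x 256 RGB image (assumption: a real image
# would be loaded with plt.imread or PIL instead).
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)

hists = channel_histograms(fake_image)
for name, counts in hists.items():
    # Every pixel lands in exactly one bin, so each total is 256 * 256.
    print(name, int(counts.sum()))
```

Summing these histograms over all paintings would give the overall palette, whereas per-image means only show how bright each channel is on average.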

In [22]:
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
from tqdm.notebook import tqdm
import torch.optim as optim
import torch.nn as nn
from PIL import Image
import torchvision
import shutil
import torch
import os
import random
import numpy as np
import matplotlib.pyplot as plt
from torchvision.utils import make_grid

In [None]:
monet_dir = "/kaggle/input/gan-getting-started/monet_jpg"
monet_images = os.listdir(monet_dir)
print("Number of Monet paintings:", len(monet_images))

photo_dir = "/kaggle/input/gan-getting-started/photo_jpg"
photo_images = os.listdir(photo_dir)
print("Number of photos:", len(photo_images))

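# Analyze image sizes: store one pixel count (height * width) per image,
# so that each image contributes a single value to the histogram
monet_sizes = []
for img_name in monet_images:
    img = plt.imread(os.path.join(monet_dir, img_name))
    monet_sizes.append(img.shape[0] * img.shape[1])

photo_sizes = []
for img_name in photo_images:
    img = plt.imread(os.path.join(photo_dir, img_name))
    photo_sizes.append(img.shape[0] * img.shape[1])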

# Plot size distributions
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.hist(monet_sizes, bins=20, edgecolor='black')
plt.title('Monet Image Size Distribution')
plt.xlabel('Size')
plt.ylabel('Frequency')

plt.subplot(1, 2, 2)
plt.hist(photo_sizes, bins=20, edgecolor='black')
plt.title('Photo Image Size Distribution')
plt.xlabel('Size')
plt.ylabel('Frequency')

plt.show()

Number of Monet paintings: 300
Number of photos: 7038


In [None]:
def plot_color_distribution(image_dir, title):
    r, g, b = [], [], []
    for img_name in os.listdir(image_dir):
        img = plt.imread(os.path.join(image_dir, img_name))
        r.append(np.mean(img[:,:,0]))
        g.append(np.mean(img[:,:,1]))
        b.append(np.mean(img[:,:,2]))
    
    plt.figure(figsize=(15,5))
    plt.subplot(1, 3, 1)
    plt.hist(r, bins=30, color='r', alpha=0.7)
    plt.xlabel('Red Intensity')
    plt.ylabel('Frequency')
    plt.title(f'Red Intensity Distribution - {title}')

    plt.subplot(1, 3, 2)
    plt.hist(g, bins=30, color='g', alpha=0.7)
    plt.xlabel('Green Intensity')
    plt.ylabel('Frequency')
    plt.title(f'Green Intensity Distribution - {title}')

    plt.subplot(1, 3, 3)
    plt.hist(b, bins=30, color='b', alpha=0.7)
    plt.xlabel('Blue Intensity')
    plt.ylabel('Frequency')
    plt.title(f'Blue Intensity Distribution - {title}')
    
    plt.tight_layout()
    plt.show()

plot_color_distribution(monet_dir, 'Monet Paintings')
plot_color_distribution(photo_dir, 'Photos')

In [None]:
class PaintingsDataset(Dataset):

    def __init__(self, folder_path, transform=None):
        self.folder_path = folder_path
        self.transform = transform
        self.images = [os.path.join(folder_path, f) for f in os.listdir(folder_path) if f.endswith(('.jpg', '.jpeg', '.png'))]

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image_path = self.images[idx]
        image = Image.open(image_path).convert('RGB')
        if self.transform:
            image = self.transform(image)
        return image
    
    def __get_crude_item__(self, idx):
        image_path = self.images[idx]
        image = Image.open(image_path).convert('RGB')
        return image

## Model Architecture Description
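For my model, I decided to use two PyTorch networks: a CycleGAN-style generator and a discriminator. Unlike a full CycleGAN, which trains two generator/discriminator pairs with a cycle-consistency loss, this notebook trains a single photo-to-Monet generator against a single discriminator (used as a critic) with a Wasserstein loss.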

The Discriminator class aims to distinguish between real images and those generated by the generator. This neural network will learn to classify images as real or fake. The discriminator uses a series of convolutional layers to downsample the image. Each block of downsampling reduces the spatial dimensions while increasing the number of channels. After several downsampling blocks, the discriminator performs upsampling. Finally, a convolutional layer is applied to produce the output.
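To make the downsampling and upsampling concrete, here is a small sketch that traces the spatial size of a 256 x 256 input through the discriminator. It uses the standard output-size formulas for convolutions and transposed convolutions (with dilation 1) and the kernel, stride, and padding values from the `Discriminator` class in this notebook; the helper names are my own.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride, padding, output_padding):
    """Output size of a transposed convolution along one spatial dimension."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

def discriminator_sizes(size=256):
    """Trace the spatial size after every layer group of the discriminator."""
    sizes = [size]
    # First convolution plus three downsampling blocks: kernel 4, stride 2, padding 1.
    for _ in range(4):
        size = conv_out(size, 4, 2, 1)
        sizes.append(size)
    # Three upscaling blocks: kernel 3, stride 2, padding 1, output_padding 1.
    for _ in range(3):
        size = conv_transpose_out(size, 3, 2, 1, 1)
        sizes.append(size)
    # Final convolution: kernel 3, stride 2, padding 1.
    sizes.append(conv_out(size, 3, 2, 1))
    return sizes

print(discriminator_sizes())  # [256, 128, 64, 32, 16, 32, 64, 128, 64]
```

Under these assumptions the discriminator returns a 64 x 64 map of scores per image rather than a single number, which is why the training loop flattens its output with `reshape(-1)` before averaging.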

The Generator class is responsible for generating images from input images. It learns to translate images from one domain to another. The generator takes the input image and converts it into another domain using a series of downsample and upsample blocks. The downsample blocks reduce the spatial dimensions of the input, while the upsample blocks increase the dimensions. Residual blocks help in maintaining the identity of the image during the transformation process. The output layer produces the final image.
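The way a residual block preserves identity can be shown with a toy example. This is only an illustration on plain Python lists, not the convolutional `ResidualBlock` used in the notebook: the block returns its input plus a learned correction, so if the correction is zero the input passes through unchanged.

```python
def residual_block(x, transform):
    """Return x + transform(x), element by element."""
    fx = transform(x)
    return [xi + fi for xi, fi in zip(x, fx)]

features = [0.5, -1.0, 2.0]

# A transform that outputs zeros leaves the features untouched.
unchanged = residual_block(features, lambda v: [0.0 for _ in v])

# A non-zero transform only adds a correction on top of the input.
adjusted = residual_block(features, lambda v: [0.5 * vi for vi in v])

print(unchanged)  # [0.5, -1.0, 2.0]
print(adjusted)   # [0.75, -1.5, 3.0]
```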

### Hyperparameter Tuning
While setting up my notebook and researching other ways of building this GAN, I saw that a TPU was crucial for running each model-development cell in a reasonable amount of time. Unfortunately, when I enable the TPU, none of the cells run and the keras library throws an error on import, so I am stuck using the GPU for this assignment. The first run of the code below took over 2 hours, so instead of tuning by trial and error I based my hyperparameters on what other high-scoring notebooks had used and made my next attempt my best attempt. I also searched for the epoch counts, learning rates, and batch sizes that work best for image processing and GAN training, and found that training for many epochs is crucial for GAN projects.

For this reason, I decided to train for 100 epochs. Training this long allowed the GAN to keep learning the image qualities and features needed to build new images in a Monet style. I kept the batch size small, at 6, since there are not many Monet paintings to train on.
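As a rough sanity check on what these settings imply, the sketch below counts the optimizer updates in a full run. It assumes the two data loaders are iterated together with `zip` (which stops at the shorter one) and that the critic is updated 5 times per generator update, as in the training cell of this notebook; the function name is illustrative.

```python
import math

def batches_per_epoch(n_photos, n_monets, batch_size):
    """Number of paired batches when two loaders are zipped together."""
    photo_batches = math.ceil(n_photos / batch_size)
    monet_batches = math.ceil(n_monets / batch_size)
    # zip() stops as soon as the shorter loader is exhausted.
    return min(photo_batches, monet_batches)

n_epochs = 100
critic_iterations = 5

steps = batches_per_epoch(n_photos=7038, n_monets=300, batch_size=6)
generator_updates = steps * n_epochs
critic_updates = generator_updates * critic_iterations

print(steps)              # 50
print(generator_updates)  # 5000
print(critic_updates)     # 25000
```

One consequence is that each epoch only sees 300 of the 7,038 photos (a different random subset each time because the loader shuffles), so the Monet set is the limiting factor.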

In [None]:
# Device and tensors configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.Tensor

# Hyperparameters
batch_size = 6
lr = 2e-4
n_epoches = 100
display_epoch = 10

transform = transforms.Compose([
    transforms.Resize((256, 256)),

    # Data augmentation 
    #transforms.ElasticTransform(alpha=150.0),
    transforms.GaussianBlur(kernel_size=(5, 9), sigma=(0.1, 3.)),
    transforms.ToTensor(),
    # mean=0.5, std=0.5 maps [0, 1] pixels to [-1, 1], the range of the generator's Tanh output
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

# Initialize datasets
monet_dataset = PaintingsDataset('/kaggle/input/gan-getting-started/monet_jpg', transform)
photo_dataset = PaintingsDataset('/kaggle/input/gan-getting-started/photo_jpg', transform)


# Initialize data loaders
monet_loader = DataLoader(monet_dataset, batch_size=batch_size, shuffle=True)
photo_loader = DataLoader(photo_dataset, batch_size=batch_size, shuffle=True)

In [None]:
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()

        channels_img = 3
        features_d = 64
        
        self.model = nn.Sequential(
            # Input : N x 3 x 256 x 256
            nn.Conv2d(channels_img, features_d, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),

            # Downsampling blocks
            self._block(features_d, features_d * 2, 4, 2, 1),
            self._block(features_d * 2, features_d * 4, 4, 2, 1),
            self._block(features_d * 4, features_d * 8, 4, 2, 1),

            # Upscaling blocks
            self._conv2d_transpose(features_d * 8, features_d * 4, 3, 2, 1, 1),
            self._conv2d_transpose(features_d * 4, features_d * 2, 3, 2, 1, 1),
            self._conv2d_transpose(features_d * 2, features_d, 3, 2, 1, 1),

            # Final convolutional layer
            nn.Conv2d(features_d, 1, kernel_size=3, stride=2, padding=1)
        )

    def _block(self, in_channels, out_channels, kernel_size, stride, padding):
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=False),
            nn.InstanceNorm2d(out_channels, affine=True),
            nn.LeakyReLU(0.2),
        )

    def _conv2d_transpose(self, in_channels, out_channels, kernel_size, stride, padding, output_padding):
        return nn.Sequential(
            nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride, padding, output_padding, bias=False),
            nn.InstanceNorm2d(out_channels, affine=True),
            nn.LeakyReLU(0.2),
        )

    def forward(self, x):
        return self.model(x)
    
    
class Generator(nn.Module):
    def __init__(self, num_residual_blocks=9):
        super(Generator, self).__init__()
        
        channels_img = 3
        features_g = 64
        
        # Initial convolution layer
        self.initial = nn.Sequential(
            nn.ReflectionPad2d(3),
            nn.Conv2d(channels_img, features_g, kernel_size=7, stride=1, padding=0),
            nn.InstanceNorm2d(features_g),
            nn.ReLU(inplace=True)
        )

        # Downsample layers
        self.downsample_blocks = nn.Sequential(
            self._downsample_block(features_g, features_g * 2),
            self._downsample_block(features_g * 2, features_g * 4)
        )

        # Residual blocks
        self.residual_blocks = nn.Sequential(
            *[ResidualBlock(features_g * 4) for _ in range(num_residual_blocks)]
        )

        # Upsample layers
        self.upsample_blocks = nn.Sequential(
            self._upsample_block(features_g * 4, features_g * 2),
            self._upsample_block(features_g * 2, features_g)
        )

        # Output layer
        self.output = nn.Sequential(
            nn.ReflectionPad2d(3),
            nn.Conv2d(features_g, channels_img, kernel_size=7, stride=1, padding=0),
            nn.Tanh()
        )

    def _downsample_block(self, in_channels, out_channels):
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=2, padding=1),
            nn.InstanceNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )

    def _upsample_block(self, in_channels, out_channels):
        return nn.Sequential(
            nn.ConvTranspose2d(in_channels, out_channels, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.InstanceNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        x = self.initial(x)
        x = self.downsample_blocks(x)
        x = self.residual_blocks(x)
        x = self.upsample_blocks(x)
        return self.output(x)

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super(ResidualBlock, self).__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, 3),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, 3),
            nn.InstanceNorm2d(channels)
        )

    def forward(self, x):
        return x + self.block(x)

In [None]:
G = Generator()
D = Discriminator()

models = [G, D]
for model in models:
    if torch.cuda.is_available():
        model.cuda()

optimizer_G = torch.optim.RMSprop(G.parameters(), lr=lr)
optimizer_D = torch.optim.RMSprop(D.parameters(), lr=lr)

In [None]:
def sample_images(real_X):    
    # turn off some learning parameters in order to evaluate model:
    G.eval()
    
    # move the batch to the active device and generate without tracking gradients
    real_pic = real_X.type(Tensor)
    with torch.no_grad():
        artificial_pic = G(real_pic)

    ncols = real_X.size(0)
    real_pic_grid = make_grid(real_pic, nrow=ncols, normalize=True)
    artificial_pic_grid = make_grid(artificial_pic, nrow=ncols, normalize=True)

    fig, axs = plt.subplots(2, 1, figsize=(20, 4))  

    axs[0].imshow(real_pic_grid.permute(1, 2, 0).cpu())
    axs[0].set_title("Augmented real photos from photo dataset")
    axs[0].axis('off')

    axs[1].imshow(artificial_pic_grid.permute(1, 2, 0).cpu())
    axs[1].set_title("Generated monet paintings from photos")
    axs[1].axis('off')


    plt.show()

In [None]:
CRITIC_ITERATIONS = 5
WEIGHT_CLIP = 0.01
lambda_identity = 5
identity_loss_function = nn.L1Loss()

for epoch in tqdm(range(n_epoches)):
    for i, (real_photo, real_monet) in enumerate(zip(photo_loader, monet_loader)):

        # Prepare the real and generated data
        real_photo = real_photo.type(Tensor)
        real_monet = real_monet.type(Tensor)

        # Train the Critic - We wish to maximize the difference between the fake and the real images - Max(E[critic(real)] - E[critic(fake)])
        for _ in range(CRITIC_ITERATIONS):
            optimizer_D.zero_grad()

            # Generate a batch of images
            artificial_monet = G(real_photo)

            # Critic scores for real and fake images
            critic_real = D(real_monet).reshape(-1)
            critic_fake = D(artificial_monet.detach()).reshape(-1)

            # Calculate the loss for the critic
            loss_critic = -(torch.mean(critic_real) - torch.mean(critic_fake))
            # The fake batch is detached, so the generator's graph is not touched here
            # and retain_graph is not needed
            loss_critic.backward()
            optimizer_D.step()

            # Weight clipping - to satisfy Lipschitz constraint
            for p in D.parameters():
                p.data.clamp_(-WEIGHT_CLIP, WEIGHT_CLIP)

        # Train the Generator - We wish to minimize -E[critic(fake)] plus the weighted identity loss
        optimizer_G.zero_grad()

        # Since we just updated D, perform another forward pass of all-fake batch through D
        gen_fake = D(artificial_monet).reshape(-1)

        # Calculate G's loss
        loss_G_Wasserstein = -torch.mean(gen_fake)

        # Calculate identity loss (L1 distance between the generated image and the input photo)
        identity_loss = identity_loss_function(artificial_monet, real_photo)

        # Total generator loss
        loss_G = loss_G_Wasserstein + lambda_identity * identity_loss
        
        loss_G.backward()
        optimizer_G.step()
        
    if (epoch + 1) % display_epoch == 0:
        test_real_photo, test_real_monet = next(iter(zip(photo_loader, monet_loader)))
        sample_images(test_real_photo.type(Tensor))

        print(f'Epoch {epoch + 1}/{n_epoches}')
        print(f'Generator loss: {loss_G.item():.4f}')
        print(f'Discriminator (Critic) loss: {loss_critic.item():.4f}')
    

## Results and Analysis
### Hyperparameter and Model Architecture Comparison Discussion
As mentioned previously, I used a CycleGAN-style generator and a critic to generate Monet-style paintings from the real photos dataset. The Generator consists of the following parts:

1. Initial convolution layer - applies a convolution with a large 7x7 kernel to the input image.
2. Downsample layers - two stride-2 convolutions that reduce the spatial size of the feature maps.
3. Residual blocks - nine blocks that preserve the identity of the input image.
4. Upsample layers - two transposed convolutions that bring the image back to its original size.
5. Output layer - a 7x7 convolution with a tanh activation function that produces the final generated image.

The Discriminator consists of convolutional layers with instance normalization and LeakyReLU activations. These layers aim to distinguish between real images and those generated by the generator.

The hyperparameters I chose to tweak for my submission run were the number of critic iterations, the weight clip, lambda identity, the identity loss function, the number of epochs, and the learning rate. Critic iterations is the number of times the critic (discriminator) is updated for each generator update, so the critic's weights are refreshed several times before the generator takes a step. Weight clip is the bound applied to every critic weight after each update to approximately satisfy the Lipschitz constraint. Lambda identity is the weight of the identity loss in the total generator loss. The identity loss function, here L1 loss, measures how far the generated image is from the input photo. The learning rate is the step size used by both RMSprop optimizers.
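For reference, the losses that these hyperparameters control can be written out as follows. This is my reading of the training cell rather than a formula taken from a paper: $x$ is a batch of photos, $y$ a batch of Monet paintings, $G$ the generator, $D$ the critic, $\lambda_{\text{id}} = 5$ the lambda identity weight, and $c = 0.01$ the weight clip. The identity term uses `nn.L1Loss`, which is the mean absolute difference over all pixels.

```latex
\begin{aligned}
\mathcal{L}_{D} &= -\Big(\mathbb{E}_{y}\big[D(y)\big] - \mathbb{E}_{x}\big[D(G(x))\big]\Big) \\
\mathcal{L}_{G} &= -\,\mathbb{E}_{x}\big[D(G(x))\big]
                   + \lambda_{\text{id}} \cdot \operatorname{mean}\,\big|G(x) - x\big| \\
w &\leftarrow \operatorname{clip}(w,\,-c,\,c) \quad \text{for every critic weight } w \text{ after each critic update}
\end{aligned}
```

The critic takes 5 steps on $\mathcal{L}_{D}$ for every generator step on $\mathcal{L}_{G}$.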

## Conclusions
### Results discussion
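Once my model was submitted, it received a score of 91.01. This competition scores submissions with MiFID (Memorization-informed Fréchet Inception Distance), where lower is better, so the number is a distance between the generated images and real Monet paintings rather than an accuracy percentage.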

## Code to Generate Submission File

In [None]:
# create path to save created monet paintings
transformed_save_dir = '../images'
if not os.path.exists(transformed_save_dir):
    os.makedirs(transformed_save_dir)
    
# freeze some network params to generate imgs
G.eval()


for i, real_photos in enumerate(photo_loader):
    real_photos = real_photos.to(device)
    with torch.no_grad():
        monet_style_imgs = G(real_photos)

    # Save each transformed image
    for j, img in enumerate(monet_style_imgs):
        save_path = os.path.join(transformed_save_dir, f'monet_painting_{i * batch_size + j}.png')
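        # Generator outputs lie in [-1, 1] (Tanh); rescale to [0, 1] before saving,
        # otherwise save_image clamps every negative value to black
        torchvision.utils.save_image(img * 0.5 + 0.5, save_path)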

print(f"Transformed images are saved in {transformed_save_dir}")

# Zip the directory the images were actually written to
shutil.make_archive("/kaggle/working/images", 'zip', transformed_save_dir)
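One detail worth keeping in mind when saving generated images: the generator ends in a Tanh, so its outputs lie in [-1, 1], while image files store 8-bit values in [0, 255]. The sketch below shows that mapping for a single value; the function name `tanh_to_uint8` is hypothetical and only meant to illustrate the arithmetic.

```python
def tanh_to_uint8(value):
    """Map a Tanh output in [-1, 1] to an 8-bit pixel intensity in [0, 255]."""
    # Clamp first so that small numerical overshoots stay in range.
    value = max(-1.0, min(1.0, value))
    # Shift and scale [-1, 1] to [0, 1], then to [0, 255].
    return round((value * 0.5 + 0.5) * 255)

print(tanh_to_uint8(-1.0))  # 0
print(tanh_to_uint8(0.0))   # 128
print(tanh_to_uint8(1.0))   # 255
```

Without the shift by 0.5, every negative generator output would be clamped to 0 and the saved paintings would come out much darker than intended.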