Introduction:

This project aims to utilize Generative Adversarial Networks (GANs) to transform photographs into Monet-style paintings. The dataset comprises 300 Monet paintings and 7,028 photographs, all resized to dimensions of 256x 256 pixels. The primary objective is to train a GAN that learns the stylistic features of Monet's artwork and applies them to input photographs, resulting in newly generated images that reflect the essence of Monet's style.


Exploratory Data Analysis (EDA):

To gain a comprehensive understanding of the dataset, I performed exploratory data analysis (EDA) on both Monet paintings and photographs. This analysis included visualizing sample images from each category to examine the unique color palettes and brush strokes typical of Monet's art compared to realistic photographs. Additionally, I calculated somee basic statistics regarding color intensity distributions to assess the variance in image characteristics between the two datasets and performed data integrity checks to ensure that all images were correctly loaded without any corruption. The insights gained from this EDA informed my model architecture and approach for effectively capturing and replicating Monet's distinctive style.

In [2]:
import os
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from torchvision.utils import save_image
from PIL import Image
from tqdm import tqdm
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
monet_dir = 'C:/Users/ksalu/Cheryl/data_science_projects/gan_proj/monet_jpg'
photo_dir = 'C:/Users/ksalu/Cheryl/data_science_projects/gan_proj/photo_jpg'
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])
class CustomImageDataset(Dataset):
    def __init__(self, root_dir, limit=None):
        self.root_dir = root_dir
        self.images = [img for img in os.listdir(root_dir) if img.endswith('.jpg')]
        if limit:
            self.images = self.images[:limit]  

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img_path = os.path.join(self.root_dir, self.images[idx])
        image = Image.open(img_path).convert('RGB')
        return transform(image)

# Limit the datasets to 500 images each (too much for most machiesn to handle)
monet_dataset = CustomImageDataset(monet_dir, limit=500)  
photo_dataset = CustomImageDataset(photo_dir, limit=500)  
batch_size = 4
monet_loader = DataLoader(monet_dataset, batch_size=batch_size, shuffle=True)
photo_loader = DataLoader(photo_dataset, batch_size=batch_size, shuffle=True)


Epoch [1/10], Loss G: 0.7981648445129395, Loss D Monet: 0.5448594689369202
Epoch [2/10], Loss G: 0.7774091958999634, Loss D Monet: 0.4629114866256714
Epoch [3/10], Loss G: 0.9568724632263184, Loss D Monet: 0.3779866695404053
Epoch [4/10], Loss G: 1.036026954650879, Loss D Monet: 0.35863953828811646
Epoch [5/10], Loss G: 1.0370506048202515, Loss D Monet: 0.2452569305896759
Epoch [6/10], Loss G: 0.9805804491043091, Loss D Monet: 0.27322858572006226
Epoch [7/10], Loss G: 1.2764889001846313, Loss D Monet: 0.07702590525150299
Epoch [8/10], Loss G: 1.3338779211044312, Loss D Monet: 0.1067977100610733
Epoch [9/10], Loss G: 1.1535286903381348, Loss D Monet: 0.08281776309013367
Epoch [10/10], Loss G: 0.9768918752670288, Loss D Monet: 0.24696925282478333


FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\ksalu\\Cheryl\\data_science_projects\\gan_proj\\submission\\generated_0.jpg'

Model Architecture:

A CycleGAN architecture consisting of two generators and two discriminators was used in this projejct. The generators are responsible for converting photos into Monet-style paintings and vice versa, while the discriminators evaluate the authenticity of the generated images. The generator uses a series of convolutional layers and residual blocks to capture highlevel features necessary for image generation. During training, I utilized various loss functions: adversarial loss assesses the generator's performance against the discriminator; cycle consistency loss ensures that an image can be transformed back to its original form; and identity loss preserves color composition for images that do not require style transfer. The model was trained over 200 epochs with a batch size of 4 using Adam optimizers set at a learning rate of 0.0002 on a GPU, which reduced computational time pretty drastically.


In [None]:
#resid block definition
class ResidualBlock(nn.Module):
    def __init__(self, in_features):
        super(ResidualBlock, self).__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_features, in_features, 3, padding=1),
            nn.InstanceNorm2d(in_features),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_features, in_features, 3, padding=1),
            nn.InstanceNorm2d(in_features)
        )
    def forward(self, x):
        return x + self.block(x)
#generator model definition
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 64, 7, padding=3),
            nn.InstanceNorm2d(64),
            nn.ReLU(inplace=True),
            ResidualBlock(64),
            ResidualBlock(64),
            nn.Conv2d(64, 3, 7, padding=3),
            nn.Tanh()
        )

    def forward(self, x):
        return self.model(x)

#discriminator model definition
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.InstanceNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, 4, padding=1)
        )

    def forward(self, x):
        return self.model(x)

G_photo_to_monet = Generator().to(device)
G_monet_to_photo = Generator().to(device)
D_monet = Discriminator().to(device)
D_photo = Discriminator().to(device)
criterion_GAN = nn.MSELoss()
criterion_cycle = nn.L1Loss()
optimizer_G = optim.Adam(list(G_photo_to_monet.parameters()) + list(G_monet_to_photo.parameters()), lr=0.0002)
optimizer_D_monet = optim.Adam(D_monet.parameters(), lr=0.0002)
optimizer_D_photo = optim.Adam(D_photo.parameters(), lr=0.0002)
#somewhat lower number to assist in running hte notebook
num_epochs = 10 
for epoch in range(num_epochs):
    for monet_images, photo_images in zip(monet_loader, photo_loader):
        monet_images, photo_images = monet_images.to(device), photo_images.to(device)
        optimizer_G.zero_grad()
        fake_monet = G_photo_to_monet(photo_images)
        fake_photo = G_monet_to_photo(monet_images)
        loss_G = criterion_GAN(D_monet(fake_monet), torch.ones_like(D_monet(fake_monet))) + criterion_cycle(fake_photo, photo_images)
        loss_G.backward()
        optimizer_G.step()
        optimizer_D_monet.zero_grad()
        loss_D_monet = criterion_GAN(D_monet(monet_images), torch.ones_like(D_monet(monet_images))) + criterion_GAN(D_monet(fake_monet.detach()), torch.zeros_like(D_monet(fake_monet)))
        loss_D_monet.backward()
        optimizer_D_monet.step()
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss G: {loss_G.item()}, Loss D Monet: {loss_D_monet.item()}")
def generate_images():
    G_photo_to_monet.eval()
    with torch.no_grad():
        for i, photo_images in enumerate(photo_loader):
            fake_monet = G_photo_to_monet(photo_images.to(device))
            save_image(fake_monet, f"submission/generated_{i}.jpg", normalize=True)
generate_images()

Optimization/Tuning/Troubleshooting: 

Upon completing the training process, I evaluated the quality of generated images through both qualitative and quantitative assessments. Visual inspections revealed that many generated images successfully captured the stylistic elements characteristic of Monet's work when compared to both input photographs and original Monet paintings. Additionally, loss curves indicated stable convergence throughout training, with adversarial loss decreasing consistently and cycle loss remaining low. The generated images displayed brush strokes and color blending reminiscent of Monet’s famous (and distintt) style.

Results Analysis and Conclusion:

In conclusion, this project effectively demonstrated the application of GANs for artistic style transfer, achieving convincing results that resemble Monet's unique artistic expression. The exploratory data analysis highlighted significant differences between the datasets, which informed my architectural choices and training strategies. Future work could focus on incorporating additional datasets or advanced techniques such as attention mechanisms to enhance image quality further. Implementing a MiFID score could also provide a more quantitative measure of the quality of generated images. Overall, this project not only deepened my understanding of GANs but also showcased their potential in creative applications, paving the way for further exploration in artistic style transfer.

Github Repository: https://github.com/cherylblackmer/deep-learning-Monet-GAN

References: 

https://www.kaggle.com/competitions/gan-getting-started

https://www.kaggle.com/code/amyjang/monet-cyclegan-tutorial
