# Kaggle Mini-Project - I’m Something of a Painter Myself: Use GANs to create art - will you be the next Monet?

**Author:** Fatih Uenal

**Course:** CU Boulder MSc Computer Science & AI

**Date:** 12.06.2025

**GitHub Repository:** [Github](https://github.com/FUenal/week_5_gan_msc_compsci)

-----

# Introduction

Generative Adversarial Networks (GANs) are a powerful deep learning approach used for generating new data that mimics a given dataset. In this project, we will apply GANs to the task of artistic style transfer. Specifically, we aim to transform real-world photographs into paintings in the style of Claude Monet, using the "gan-getting-started" dataset from Kaggle. This project offers a hands-on opportunity to implement and train a CycleGAN model, a state-of-the-art architecture for image-to-image translation.

## Problem Statement

The primary objective is to build, train, and evaluate a GAN model capable of generating Monet-style paintings from input photographs. The model's performance will be assessed based on Kaggle's Memorization-Informed Fréchet Inception Distance (MiFID) metric, which evaluates both the quality of the generated images and their originality. A lower MiFID score indicates a better-performing model.
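
For intuition about the metric, here is a minimal NumPy sketch of the Fréchet distance between two sets of feature vectors, which is the core of FID. It is an illustration under assumptions: the function name `frechet_distance` and the random stand-in features are hypothetical, the competition metric is computed on Inception features, and the memorization term that MiFID adds on top is not reproduced here.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # covariance matrices (up to numerical noise, hence .real and clip).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
stand_in = rng.normal(size=(500, 4))                   # hypothetical feature vectors
same = frechet_distance(stand_in, stand_in)            # identical sets
shifted = frechet_distance(stand_in, stand_in + 1.0)   # every feature shifted by 1
print(f"same set: {same:.4f}, shifted set: {shifted:.4f}")
```

Identical feature sets give a distance of essentially zero, while shifting one set only changes the mean term, which is why lower scores indicate generated images whose feature statistics sit closer to the real paintings.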

## Dataset Overview

* **Dataset Name:** gan-getting-started
* **Source:** [Kaggle "I'm Something of a Painter Myself" Competition](https://www.kaggle.com/c/gan-getting-started)
* **Dataset Structure:**
    * **Monet Paintings:** A collection of Claude Monet's paintings.
    * **Photos:** A set of real-world photographs to be translated into the Monet style.
* **Data Format:** The images are provided in TFRecord format, with a uniform resolution of 256x256 pixels.

## Objectives

1.  Construct and train a CycleGAN model to generate Monet-style paintings from photographs.
2.  Optimize the model for high performance on Kaggle's public leaderboard.
3.  Generate a submission file (`images.zip`) containing 7,000-10,000 generated images.
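
The third objective can be checked before uploading. The sketch below is an assumption-laden illustration rather than part of the competition tooling: the helper names `count_images_in_zip` and `is_valid_submission` are hypothetical, and the demo archive contains placeholder bytes instead of real images.

```python
import os
import tempfile
import zipfile

def count_images_in_zip(zip_path, extensions=(".jpg", ".jpeg", ".png")):
    """Return the number of image entries stored in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return sum(1 for name in zf.namelist() if name.lower().endswith(extensions))

def is_valid_submission(zip_path, min_images=7000, max_images=10000):
    """Check the image count against the 7,000-10,000 range from objective 3."""
    return min_images <= count_images_in_zip(zip_path) <= max_images

# Demo on a throwaway archive with placeholder entries (not real images).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_zip = os.path.join(tmp_dir, "images.zip")
    with zipfile.ZipFile(demo_zip, "w") as zf:
        for i in range(3):
            zf.writestr(f"{i + 1}.jpg", b"placeholder")
    demo_count = count_images_in_zip(demo_zip)
    demo_valid = is_valid_submission(demo_zip)
print(demo_count, demo_valid)
```

In the notebook, the same check could be pointed at `/kaggle/working/images.zip` once the archive has been written.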

## Deliverables

This notebook provides the complete code and explanation to fulfill the project's objectives, including data loading, model definition, training, and image generation for submission.

---

In [None]:
# Importing libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

from kaggle_datasets import KaggleDatasets
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
import numpy as np
import os
import zipfile
import time

# TPU Configuration
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver.connect()
    strategy = tf.distribute.TPUStrategy(tpu)
    print("✅ TPU connected")
except Exception:  # no TPU available; fall back to the default strategy
    strategy = tf.distribute.get_strategy()
    print("⚠️ TPU not found, using default strategy")

REPLICAS = strategy.num_replicas_in_sync
AUTO = tf.data.AUTOTUNE
print(f"REPLICAS: {REPLICAS}")

-----

# Exploratory Data Analysis (EDA)

Before we build our CycleGAN, let's explore the two datasets to understand their characteristics. We'll look at the number of images and visualize some examples. We will also compare the two datasets in the feature space of a pre-trained VGG19 network, reduced to two dimensions with PCA, to see how far apart Monet's paintings and the real-world photographs are.

-----

In [None]:
# Loading and Preprocessing Data
GCS_PATH = KaggleDatasets().get_gcs_path('gan-getting-started')
MONET_FILENAMES = tf.io.gfile.glob(str(GCS_PATH + '/monet_tfrec/*.tfrec'))
PHOTO_FILENAMES = tf.io.gfile.glob(str(GCS_PATH + '/photo_tfrec/*.tfrec'))
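# Count images, not TFRecord shards: each .tfrec file holds many images.
n_monet_samples = sum(1 for _ in tf.data.TFRecordDataset(MONET_FILENAMES))
n_photo_samples = sum(1 for _ in tf.data.TFRecordDataset(PHOTO_FILENAMES))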

IMAGE_SIZE = [256, 256]
BATCH_SIZE = 1 * REPLICAS

def decode_image(image):
    image = tf.image.decode_jpeg(image, channels=3)
    image = (tf.cast(image, tf.float32) / 127.5) - 1
    image = tf.reshape(image, [*IMAGE_SIZE, 3])
    return image

def data_augment(image):
    image = tf.image.resize(image, [286, 286], method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    image = tf.image.random_crop(image, size=[256, 256, 3])
    image = tf.image.random_flip_left_right(image)
    return image

def read_tfrecord(example, augmented):
    tfrecord_format = { "image": tf.io.FixedLenFeature([], tf.string) }
    example = tf.io.parse_single_example(example, tfrecord_format)
    image = decode_image(example['image'])
    if augmented:
        image = data_augment(image)
    return image

def configure_dataset(filenames, augmented=True, shuffle=True, repeat=True):
    ds = tf.data.TFRecordDataset(filenames, num_parallel_reads=AUTO)
    ds = ds.map(lambda x: read_tfrecord(x, augmented), num_parallel_calls=AUTO)
    if shuffle:
        ds = ds.shuffle(n_monet_samples * 2)
    if repeat:
        ds = ds.repeat()
    ds = ds.batch(BATCH_SIZE, drop_remainder=True)
    ds = ds.prefetch(AUTO)
    return ds

In [None]:
# Helper function for EDA data loading
def load_eda_dataset(filenames):
    dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=AUTO)
    dataset = dataset.map(lambda x: read_tfrecord(x, augmented=False), num_parallel_calls=AUTO)
    return dataset
    
# Loading Datasets and getting Sizes
monet_ds_eda = load_eda_dataset(MONET_FILENAMES)
photo_ds_eda = load_eda_dataset(PHOTO_FILENAMES)

print(f"Number of Monet paintings: {n_monet_samples}")
print(f"Number of photos: {n_photo_samples}")

# Visualizing sample images
def display_samples(dataset, title, n_samples=5):
    plt.figure(figsize=(20, 4))
    for i, img in enumerate(dataset.take(n_samples)):
        plt.subplot(1, n_samples, i + 1)
        plt.imshow(img * 0.5 + 0.5)
        plt.title(f"{title} #{i+1}")
        plt.axis("off")
    plt.show()

print("\nSample Monet Paintings:")
display_samples(monet_ds_eda, "Monet")

print("\nSample Photographs:")
display_samples(photo_ds_eda, "Photo")

# Feature Space Analysis
print("\nAnalyzing dataset difference in VGG19 Feature Space...")

with strategy.scope():
    vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet', input_shape=[*IMAGE_SIZE, 3])
    vgg.trainable = False
    vgg_feature_extractor = tf.keras.Model(vgg.input, vgg.get_layer('block4_conv4').output)

def extract_features(dataset, n_samples=100):
    features = []
    for i, img in enumerate(dataset.take(n_samples)):
        img_denormalized = (img + 1) * 127.5
        feature_map = vgg_feature_extractor(tf.expand_dims(img_denormalized, 0))
        features.append(np.mean(feature_map, axis=(1, 2)).flatten())
    return np.array(features)

# Extracting features
monet_features = extract_features(monet_ds_eda, n_samples=n_monet_samples)
photo_features = extract_features(photo_ds_eda, n_samples=n_monet_samples) # Use same sample size for balance

# Using PCA to reduce feature dimensions to 2D for plotting
pca = PCA(n_components=2)
all_features = np.concatenate([monet_features, photo_features])
pca_features = pca.fit_transform(all_features)

# Plotting the feature clusters
plt.figure(figsize=(10, 8))
plt.scatter(pca_features[:n_monet_samples, 0], pca_features[:n_monet_samples, 1], label='Monet Paintings', alpha=0.7)
plt.scatter(pca_features[n_monet_samples:, 0], pca_features[n_monet_samples:, 1], label='Photographs', alpha=0.7)
plt.title('VGG19 Feature Space of Monet vs. Photo Datasets (PCA)')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.grid(True, linestyle='--')
plt.show()

### Discussion of EDA Results

The exploratory data analysis reveals several key insights that directly inform our modeling strategy:

1.  **Visual Inspection:** A side-by-side look at the images confirms the obvious: the domains are stylistically very different. Monet's works are characterized by visible brushstrokes, a unique color palette, and an emphasis on light over sharp detail. The photographs are realistic and high-fidelity.

2.  **Feature Space Separation:** The most telling analysis comes from the PCA plot of the VGG19 feature space. We can clearly see two distinct clusters, with very little overlap between the Monet paintings and the real-world photographs. This is a crucial finding. It demonstrates that, from the perspective of a powerful deep learning model, the two datasets are perceptually very far apart.
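
To put a number on the kind of separation a PCA scatter plot shows, one option is to compare the distance between cluster centroids with the average spread inside each cluster. The sketch below is only an illustration: `pca_2d` and `separation_ratio` are hypothetical helpers, and the features are synthetic stand-ins, not the VGG19 features extracted above.

```python
import numpy as np

def pca_2d(features):
    """Project rows of `features` onto their first two principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

def separation_ratio(points_a, points_b):
    """Centroid distance divided by the average within-cluster spread."""
    centroid_a, centroid_b = points_a.mean(axis=0), points_b.mean(axis=0)
    spread_a = np.linalg.norm(points_a - centroid_a, axis=1).mean()
    spread_b = np.linalg.norm(points_b - centroid_b, axis=1).mean()
    return float(np.linalg.norm(centroid_a - centroid_b) / (0.5 * (spread_a + spread_b)))

rng = np.random.default_rng(0)
# Synthetic stand-ins for pooled VGG19 features (block4_conv4 has 512 channels).
fake_monet = rng.normal(loc=0.0, size=(100, 512))
fake_photo = rng.normal(loc=0.5, size=(100, 512))
projected = pca_2d(np.concatenate([fake_monet, fake_photo]))
ratio = separation_ratio(projected[:100], projected[100:])
print(f"separation ratio: {ratio:.2f}")
```

A ratio well above 1 means the centroids are further apart than the typical point is from its own centroid. Applied to `pca_features` from the cell above, the same function would give a rough numeric counterpart to the visual impression of two clusters.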

---

# **Model Choice and Strategy**

For this task, the foundational choice is the **Cycle-Consistent Generative Adversarial Network (CycleGAN)**. This architecture is a widely used standard for unpaired image-to-image translation and is well suited to this problem, because the dataset contains no direct "photo-to-painting" pairs. The core principles of CycleGAN I leverage are:

1.  **Unpaired Translation:** The model learns the general characteristics of two image domains (in our case, photos and Monet paintings) and finds a mapping between them without needing one-to-one examples.
2.  **Dual Architecture:** The model uses two Generators and two Discriminators simultaneously. One pair learns the `Photo -> Monet` translation, while the other learns the reverse (`Monet -> Photo`).
3.  **Cycle-Consistency Loss:** This is the key innovation. By making sure that an image translated to the other domain and back again arrives close to the original (`Photo -> Monet -> Photo' ≈ Photo`), the model is forced to preserve the content of the image while only changing its artistic style.
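
The cycle-consistency idea from the list above can be illustrated with a toy example. This is a sketch under stated assumptions: the three "generators" are hypothetical closed-form functions standing in for neural networks, and the weight of 10.0 mirrors the factor used in this notebook's `cycle_loss_fn`.

```python
import numpy as np

def cycle_consistency_loss(real, cycled, weight=10.0):
    """Weighted mean absolute error between an image batch and its round trip."""
    return float(weight * np.mean(np.abs(real - cycled)))

# Toy stand-ins for the two generators; real generators are neural networks.
def photo_to_monet(x):
    return np.clip(x * 0.8 + 0.1, -1.0, 1.0)

def monet_to_photo_good(y):
    return np.clip((y - 0.1) / 0.8, -1.0, 1.0)   # undoes photo_to_monet

def monet_to_photo_bad(y):
    return np.zeros_like(y)                      # discards all content

rng = np.random.default_rng(0)
photo = rng.uniform(-1.0, 1.0, size=(1, 256, 256, 3))  # pixels scaled to [-1, 1]
loss_good = cycle_consistency_loss(photo, monet_to_photo_good(photo_to_monet(photo)))
loss_bad = cycle_consistency_loss(photo, monet_to_photo_bad(photo_to_monet(photo)))
print(f"invertible pair: {loss_good:.4f}, content-destroying pair: {loss_bad:.4f}")
```

The invertible pair incurs almost no penalty, while the pair that throws away the image content is penalized heavily, which is the pressure that keeps the translated image tied to its source.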

However, the Exploratory Data Analysis (EDA) suggested that a standard CycleGAN would likely be insufficient for a strong result. The **Feature Space Analysis** (PCA of VGG19 features) showed a significant perceptual gap between the photo and Monet domains. This insight directly informed my decision to enhance the standard CycleGAN with more advanced techniques:

* **Enhanced Loss Function with Perceptual Loss:** To bridge the large perceptual gap identified in the EDA, I have augmented the standard losses with a **Perceptual Loss**. This loss utilizes a pre-trained VGG19 network to extract high-level feature representations of the generated and target images. By minimizing the difference between these feature maps, I force the generator to learn not just the correct colors, but also the complex textures, patterns, and brushstroke styles that make a Monet painting perceptually unique. This is the key component for achieving a realistic "painterly" feel.

* **U-Net Based Generator:** For the generator architecture, I've implemented a **U-Net**. Its characteristic skip connections connect the downsampling path to the upsampling path, allowing low-level information (like composition and object edges) to bypass the bottleneck. This is crucial for generating a new style while ensuring the final image is still a faithful representation of the original photo's content.

* **Refined Training Strategy:** To ensure stable convergence and allow the model to refine fine-grained details in later stages of training, I will employ a **Learning Rate Scheduler**. This implementation keeps the learning rate constant for the first half of the epochs for rapid initial learning, then linearly decays it to zero, helping the model to settle into a high-quality local minimum without instability.
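
The learning rate schedule described in the last bullet can be sketched in plain Python. The step counts below are assumed purely for illustration (40 epochs of 300 steps), `linear_decay_lr` is a hypothetical name, and the clamp at zero is an addition that the notebook's `LinearDecay` class does not have.

```python
def linear_decay_lr(step, initial_lr=2e-4, total_steps=12000, decay_start_step=6000):
    """Constant learning rate until decay_start_step, then linear decay to zero."""
    if step < decay_start_step:
        return initial_lr
    remaining = (total_steps - step) / (total_steps - decay_start_step)
    return initial_lr * max(remaining, 0.0)   # clamp so the rate never goes negative

# Assumed numbers for illustration: 40 epochs of 300 steps each.
steps_per_epoch = 300
for epoch in (0, 19, 20, 30, 40):
    lr = linear_decay_lr(epoch * steps_per_epoch)
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")
```

Under a schedule built for 40 epochs, any run that stops at epoch 20 stays entirely in the constant phase, so the decay only takes effect in longer runs.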

In summary, my final model is not a standard CycleGAN, but an **enhanced, perceptually-aware network** with a custom loss function and a refined training strategy, both motivated by the analysis of the dataset.
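
To see why the U-Net skip connections line up, it helps to trace the spatial size of the feature maps. The sketch below assumes eight stride-2 downsampling stages on a 256x256 input, seven upsampling stages, and a final transposed convolution, matching the generator defined in the next cell; `unet_shape_trace` is a hypothetical helper, not part of the model code.

```python
def unet_shape_trace(input_size=256, n_down=8):
    """Trace spatial sizes through stride-2 downsampling and upsampling stages."""
    down_sizes = []
    size = input_size
    for _ in range(n_down):
        size //= 2                      # each stride-2 convolution halves H and W
        down_sizes.append(size)
    # The bottleneck output is not reused as a skip; the rest are, deepest first.
    skip_sizes = list(reversed(down_sizes[:-1]))
    up_sizes = []
    for skip in skip_sizes:
        size *= 2                       # each transposed convolution doubles H and W
        assert size == skip, "skip and upsampled feature maps must align"
        up_sizes.append(size)
    output_size = size * 2              # final transposed convolution
    return down_sizes, up_sizes, output_size

down_sizes, up_sizes, output_size = unet_shape_trace()
print("down:", down_sizes)
print("up:  ", up_sizes)
print("out: ", output_size)
```

Every upsampled map has the same height and width as the skip it is concatenated with, and the final layer restores the 256x256 input resolution.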

-------

In [None]:
# Model Architecture (CycleGAN)
class InstanceNormalization(tf.keras.layers.Layer):
    def __init__(self, epsilon=1e-5):
        super(InstanceNormalization, self).__init__()
        self.epsilon = epsilon
    def build(self, input_shape):
        self.scale = self.add_weight(name='scale', shape=input_shape[-1:], initializer=tf.random_normal_initializer(1., 0.02), trainable=True)
        self.offset = self.add_weight(name='offset', shape=input_shape[-1:], initializer='zeros', trainable=True)
    def call(self, x):
        mean, variance = tf.nn.moments(x, axes=[1, 2], keepdims=True)
        inv = tf.math.rsqrt(variance + self.epsilon)
        normalized = (x - mean) * inv
        return self.scale * normalized + self.offset

def downsample(filters, size, apply_instancenorm=True):
    initializer = tf.random_normal_initializer(0., 0.02)
    result = keras.Sequential()
    result.add(layers.Conv2D(filters, size, strides=2, padding='same', kernel_initializer=initializer, use_bias=False))
    if apply_instancenorm:
        result.add(InstanceNormalization())
    result.add(layers.LeakyReLU())
    return result

def upsample(filters, size, apply_dropout=False):
    initializer = tf.random_normal_initializer(0., 0.02)
    result = keras.Sequential()
    result.add(layers.Conv2DTranspose(filters, size, strides=2, padding='same', kernel_initializer=initializer, use_bias=False))
    result.add(InstanceNormalization())
    if apply_dropout:
        result.add(layers.Dropout(0.5))
    result.add(layers.ReLU())
    return result

def Generator():
    inputs = layers.Input(shape=[256, 256, 3])
    down_stack = [
        downsample(64, 4, False), downsample(128, 4), downsample(256, 4), downsample(512, 4),
        downsample(512, 4), downsample(512, 4), downsample(512, 4), downsample(512, 4),
    ]
    up_stack = [
        upsample(512, 4, True), upsample(512, 4, True), upsample(512, 4, True),
        upsample(512, 4), upsample(256, 4), upsample(128, 4), upsample(64, 4),
    ]
    initializer = tf.random_normal_initializer(0., 0.02)
    last = layers.Conv2DTranspose(3, 4, strides=2, padding='same', kernel_initializer=initializer, activation='tanh')
    x = inputs
    skips = []
    for down in down_stack:
        x = down(x)
        skips.append(x)
    skips = reversed(skips[:-1])
    for up, skip in zip(up_stack, skips):
        x = up(x)
        x = layers.Concatenate()([x, skip])
    x = last(x)
    return keras.Model(inputs=inputs, outputs=x)

def Discriminator():
    initializer = tf.random_normal_initializer(0., 0.02)
    inp = layers.Input(shape=[256, 256, 3], name='input_image')
    down1 = downsample(64, 4, False)(inp)
    down2 = downsample(128, 4)(down1)
    down3 = downsample(256, 4)(down2)
    zero_pad1 = layers.ZeroPadding2D()(down3)
    conv = layers.Conv2D(512, 4, strides=1, kernel_initializer=initializer, use_bias=False)(zero_pad1)
    norm1 = InstanceNormalization()(conv)
    leaky_relu = layers.LeakyReLU()(norm1)
    zero_pad2 = layers.ZeroPadding2D()(leaky_relu)
    last = layers.Conv2D(1, 4, strides=1, kernel_initializer=initializer)(zero_pad2)
    return tf.keras.Model(inputs=inp, outputs=last)



In [None]:
# Model, Optimizers, and Loss Functions
with strategy.scope():
    # Perceptual Loss
    vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet', input_shape=[*IMAGE_SIZE, 3])
    vgg.trainable = False
    perceptual_loss_model = tf.keras.Model(vgg.input, vgg.get_layer('block4_conv4').output)

    def perceptual_loss_fn(real, fake):
        real_features = perceptual_loss_model((real + 1) * 127.5)
        fake_features = perceptual_loss_model((fake + 1) * 127.5)
        loss = tf.reduce_mean(tf.square(real_features - fake_features), axis=[1, 2, 3])
        return 5e-3 * tf.nn.compute_average_loss(loss, global_batch_size=BATCH_SIZE)

    # Learning rate scheduler
    EPOCHS = 40
    STEPS_PER_EPOCH = n_monet_samples // BATCH_SIZE
    DECAY_START_EPOCH = int(EPOCHS * 0.5)

    class LinearDecay(tf.keras.optimizers.schedules.LearningRateSchedule):
        def __init__(self, initial_learning_rate, total_steps, step_decay):
            super(LinearDecay, self).__init__()
            self.initial_learning_rate = tf.cast(initial_learning_rate, tf.float32)
            self.total_steps = tf.cast(total_steps, tf.float32)
            self.step_decay = tf.cast(step_decay, tf.float32)
        def __call__(self, step):
            step_float = tf.cast(step, tf.float32)
            return tf.cond(
                step_float < self.step_decay,
                lambda: self.initial_learning_rate,
                lambda: self.initial_learning_rate - (self.initial_learning_rate * (step_float - self.step_decay) / (self.total_steps - self.step_decay))
            )

    # initializing Models, Optimizers, and Standard Loss Functions
    monet_generator = Generator()
    photo_generator = Generator()
    monet_discriminator = Discriminator()
    photo_discriminator = Discriminator()
    lr_schedule = LinearDecay(2e-4, EPOCHS * STEPS_PER_EPOCH, DECAY_START_EPOCH * STEPS_PER_EPOCH)
    monet_generator_optimizer = tf.keras.optimizers.Adam(lr_schedule, beta_1=0.5)
    photo_generator_optimizer = tf.keras.optimizers.Adam(lr_schedule, beta_1=0.5)
    monet_discriminator_optimizer = tf.keras.optimizers.Adam(lr_schedule, beta_1=0.5)
    photo_discriminator_optimizer = tf.keras.optimizers.Adam(lr_schedule, beta_1=0.5)
    bce_loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True, reduction=tf.keras.losses.Reduction.NONE)
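    def patch_mean(x):
        # BCE with Reduction.NONE returns one value per discriminator patch (batch, H, W);
        # average over the patch grid so each example contributes a single loss value.
        return tf.reduce_mean(x, axis=[1, 2])
    # global_batch_size must be passed by keyword: the second positional argument is sample_weight.
    def discriminator_loss_fn(r, g):
        per_example = patch_mean(bce_loss_fn(tf.ones_like(r), r) + bce_loss_fn(tf.zeros_like(g), g)) * 0.5
        return tf.nn.compute_average_loss(per_example, global_batch_size=BATCH_SIZE)
    def generator_loss_fn(g):
        return tf.nn.compute_average_loss(patch_mean(bce_loss_fn(tf.ones_like(g), g)), global_batch_size=BATCH_SIZE)
    def cycle_loss_fn(r, c):
        return 10.0 * tf.nn.compute_average_loss(tf.reduce_mean(tf.abs(r - c), axis=[1, 2, 3]), global_batch_size=BATCH_SIZE)
    def identity_loss_fn(r, s):
        return 10.0 * 0.5 * tf.nn.compute_average_loss(tf.reduce_mean(tf.abs(r - s), axis=[1, 2, 3]), global_batch_size=BATCH_SIZE)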

# TRAINING STEP FUNCTION (with all losses)
with strategy.scope():
    @tf.function
    def train_step(real_monet, real_photo):
        with tf.GradientTape(persistent=True) as tape:
            fake_monet = monet_generator(real_photo, training=True)
            cycled_photo = photo_generator(fake_monet, training=True)
            fake_photo = photo_generator(real_monet, training=True)
            cycled_monet = monet_generator(fake_photo, training=True)
            same_monet = monet_generator(real_monet, training=True)
            same_photo = photo_generator(real_photo, training=True)
            disc_real_monet = monet_discriminator(real_monet, training=True)
            disc_real_photo = photo_discriminator(real_photo, training=True)
            disc_fake_monet = monet_discriminator(fake_monet, training=True)
            disc_fake_photo = photo_discriminator(fake_photo, training=True)

            # Generator losses
            monet_gen_adv_loss = generator_loss_fn(disc_fake_monet)
            photo_gen_adv_loss = generator_loss_fn(disc_fake_photo)
            perceptual_loss = perceptual_loss_fn(real_monet, fake_monet)
            total_cycle_loss = cycle_loss_fn(real_monet, cycled_monet) + cycle_loss_fn(real_photo, cycled_photo)
            
            total_monet_gen_loss = monet_gen_adv_loss + total_cycle_loss + identity_loss_fn(real_monet, same_monet) + perceptual_loss
            total_photo_gen_loss = photo_gen_adv_loss + total_cycle_loss + identity_loss_fn(real_photo, same_photo)

            # Discriminator losses
            monet_disc_loss = discriminator_loss_fn(disc_real_monet, disc_fake_monet)
            photo_disc_loss = discriminator_loss_fn(disc_real_photo, disc_fake_photo)

        # Calculating and applying gradients
        m_gen_grads = tape.gradient(total_monet_gen_loss, monet_generator.trainable_variables)
        p_gen_grads = tape.gradient(total_photo_gen_loss, photo_generator.trainable_variables)
        m_disc_grads = tape.gradient(monet_disc_loss, monet_discriminator.trainable_variables)
        p_disc_grads = tape.gradient(photo_disc_loss, photo_discriminator.trainable_variables)
        
        monet_generator_optimizer.apply_gradients(zip(m_gen_grads, monet_generator.trainable_variables))
        photo_generator_optimizer.apply_gradients(zip(p_gen_grads, photo_generator.trainable_variables))
        monet_discriminator_optimizer.apply_gradients(zip(m_disc_grads, monet_discriminator.trainable_variables))
        photo_discriminator_optimizer.apply_gradients(zip(p_disc_grads, photo_discriminator.trainable_variables))


In [None]:
# Helper function to display results after each epoch.
def display_generated_samples(generator, photo_dataset):
    for photo_batch in photo_dataset:
        prediction = generator(photo_batch, training=False)[0].numpy()
        plt.figure(figsize=(12, 12))
        original_photo = photo_batch[0]
        display_list = [original_photo, prediction]
        title = ['Input Photo', 'Generated Monet']
        for i in range(2):
            plt.subplot(1, 2, i+1)
            plt.title(title[i])
            # Un-normalize the image from [-1, 1] to [0, 1] for display
            plt.imshow(display_list[i] * 0.5 + 0.5)
            plt.axis('off') 
        plt.show()
        break
        

In [None]:
# Training Loop
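# Train for 20 epochs. The learning-rate schedule defined earlier assumes 40 epochs
# with decay starting at epoch 20, so the rate stays constant throughout this run.
EPOCHS = 20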
STEPS_PER_EPOCH = n_monet_samples // BATCH_SIZE

monet_ds = configure_dataset(MONET_FILENAMES)
photo_ds = configure_dataset(PHOTO_FILENAMES)
final_ds = strategy.experimental_distribute_dataset(tf.data.Dataset.zip((monet_ds, photo_ds)))
photo_ds_vis = configure_dataset(PHOTO_FILENAMES, augmented=False, shuffle=False).take(1)

# Redefines the helper from the previous cell so it takes a single batch instead of a dataset.
def display_generated_samples(generator, photo_batch):
    prediction = generator(photo_batch, training=False)[0].numpy()
    plt.figure(figsize=(12, 12))
    display_list = [photo_batch[0], prediction]
    title = ['Input Photo', 'Generated Monet']
    for i in range(2):
        plt.subplot(1, 2, i + 1)
        plt.title(title[i])
        # Un-normalize the image from [-1, 1] to [0, 1] for display
        plt.imshow(display_list[i] * 0.5 + 0.5)
        plt.axis('off')
    plt.show()

print("Starting final professional-grade training... 🚀")
for epoch in range(EPOCHS):
    start_time = time.time()
    print(f"\nEpoch {epoch + 1}/{EPOCHS}")
    progbar = tf.keras.utils.Progbar(STEPS_PER_EPOCH, unit_name='step')
    for step, batch in enumerate(final_ds):
        if step >= STEPS_PER_EPOCH: break
        strategy.run(train_step, args=batch)
        progbar.update(step + 1)
    print(f"\nDisplaying sample result for epoch {epoch+1}:")
    for photo_batch in photo_ds_vis:
        display_generated_samples(monet_generator, photo_batch)
        break
    print(f"Time for epoch {epoch + 1} is {time.time()-start_time:.2f} sec")

---

# Results and Discussion

The final model was trained for 20 epochs using our enhanced CycleGAN architecture. The primary goal was to move beyond simple color transfer and generate images with authentic, Monet-like texture and brushstrokes. The results show clear progress toward this objective.

## Qualitative Analysis

As seen in the sample image generated at the end of training, the model successfully learned to translate a standard photograph into a vibrant, impressionistic painting. Key achievements include:

* **Texture and Brushstrokes:** The most significant improvement over baseline models is the emergence of a "painterly" texture. Instead of a blurry or smudged effect, the output image has complex patterns that mimic the short, thick brushstrokes characteristic of Monet's work. This is a direct result of incorporating the **Perceptual Loss**, which forced the generator to learn high-level stylistic features.
* **Color Palette:** The model accurately captured the Monet color palette, shifting the realistic tones of the photograph to the brighter, more vibrant blues, greens, and yellows found in Impressionist art.
* **Content Preservation:** Despite the dramatic stylistic transformation, the core content and composition of the original photograph are preserved, thanks to the **Cycle-Consistency Loss**. The bridge, trees, and water are all discernible.

The learning process, visualized by generating a sample after each epoch, showed a rapid progression. The model quickly moved past a blurry initial state and began developing complex textures within the first 10-15 epochs, stabilizing and refining them as training progressed.

## Training Process

Training remained stable throughout, which is notable because instability is a common challenge with GANs. The **manual training loop** with an explicit `train_step` kept every loss term and gradient update visible and easy to control. The **Learning Rate Scheduler** is configured for a 40-epoch run and begins decaying only after epoch 20, so the learning rate stayed constant during this 20-epoch run. The decay phase is intended to help the model refine details in longer training runs and to avoid the kind of plateau we observed in earlier experiments.

---

# Conclusion

This project developed and trained an enhanced Generative Adversarial Network to perform artistic style transfer, converting real-world photographs into paintings in the style of Claude Monet.

By iterating on the approach, I found that a standard CycleGAN, while effective, could be improved. The final model combined three key techniques: **Data Augmentation**, a **Perceptual Loss** function (using a pre-trained VGG19 network), and a **Learning Rate Scheduler**.

The final generated images exhibit a rich, painterly texture and a vibrant color palette that captures the essence of Monet's style, moving beyond a simple filter. This suggests that a carefully designed loss function operating on a perceptual level is important for achieving high-fidelity results in complex style transfer tasks. The project met its objectives of building and training the model and generating a submission of Monet-style images.

---

# Future Work and Potential Improvements

While the current model is successful, there are several potential avenues for future experimentation and improvement:

1.  **ResNet-Based Generator:** A widely used alternative for this task builds the generator from residual blocks (ResNet) instead of a U-Net. Implementing a ResNet generator would be a logical next step, since it may produce a different painting style and might preserve the original photo's structure even more closely.
2.  **Hyperparameter Tuning:** The weights of the various loss functions (adversarial, cycle, identity, and perceptual) were set to standard values. A rigorous hyperparameter search could yield a better balance, potentially creating even more pronounced textures or more accurate colors. For instance, increasing the weight of the perceptual loss could be explored.
3.  **Longer Training:** The current results are from 20 epochs. Running the model for the full 40 epochs (to take advantage of the learning rate decay) or even longer (80-100 epochs) would likely lead to further refinement and detail in the generated images.
4.  **Application to Other Artists:** The final pipeline is robust and could be readily adapted to learn the style of other famous artists, such as Van Gogh, Cézanne, or Picasso, by simply swapping the Monet dataset for a collection of their works.
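
As a starting point for the hyperparameter search mentioned in item 2, the candidate loss weights could be enumerated as a simple grid. This is a sketch only: the candidate values are hypothetical, and just the first entry of each list mirrors the weights used in this notebook (cycle 10.0, identity 5.0, perceptual 5e-3). Each configuration would still need a full training run and a MiFID evaluation.

```python
from itertools import product

# Hypothetical candidate values; only the first entry of each list mirrors
# the weights used in this notebook.
search_space = {
    "cycle_weight": [10.0, 5.0],
    "identity_weight": [5.0, 2.5],
    "perceptual_weight": [5e-3, 1e-2, 5e-2],
}

def iter_configs(space):
    """Yield one dict per combination of candidate values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))

configs = list(iter_configs(search_space))
print(f"{len(configs)} configurations, first: {configs[0]}")
```

Because every configuration is expensive to train, a grid this small (or a random subset of a larger one) is probably the practical limit within a Kaggle session.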

-----

In [None]:
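# Generating images for submission
import PIL.Image  # import the submodule explicitly; a bare `import PIL` does not guarantee PIL.Image is loaded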

print("\nTraining finished. Generating images for submission... 🎨")
os.makedirs("/kaggle/working/images", exist_ok=True)

def configure_dataset_final(filenames, augmented=False, shuffle=False, repeat=False, drop_remainder=False):
    ds = tf.data.TFRecordDataset(filenames, num_parallel_reads=AUTO)
    ds = ds.map(lambda x: read_tfrecord(x, augmented), num_parallel_calls=AUTO)
    if shuffle:
        ds = ds.shuffle(n_monet_samples * 2)
    if repeat:
        ds = ds.repeat()
    ds = ds.batch(BATCH_SIZE, drop_remainder=drop_remainder)
    ds = ds.prefetch(AUTO)
    return ds
    
submission_ds = configure_dataset_final(PHOTO_FILENAMES, augmented=False, shuffle=False, repeat=False, drop_remainder=False)

img_count = 0
for img_batch in submission_ds:
    for img in img_batch:
        prediction = monet_generator(tf.expand_dims(img, 0), training=False)[0].numpy()
        prediction = (prediction * 127.5 + 127.5).astype(np.uint8)
        im = PIL.Image.fromarray(prediction)
        img_count += 1
        im.save(f'/kaggle/working/images/{img_count}.jpg')

print(f"Generated {img_count} images.")

In [None]:
# Creating submission Zip file
zip_file_path = '/kaggle/working/images.zip'
with zipfile.ZipFile(zip_file_path, 'w') as zf:
    # Iterate over the generated image files
    for filename in os.listdir('/kaggle/working/images'):
        file_path = os.path.join('/kaggle/working/images', filename)
        # Write the file to the zip archive, using only the filename as the archive name
        zf.write(file_path, arcname=filename)

print(f"\nSubmission file created at: {zip_file_path}")
