<a href="https://colab.research.google.com/github/CourtSingerr/GAN_MonetStyle_ImageTransformation/blob/main/GANsKaggleProject.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# GANs Kaggle Mini Project
#### Kaggle Competition: I’m Something of a Painter Myself - Use GANs to create art - will you be the next Monet?
#### https://www.kaggle.com/competitions/gan-getting-started
##### Courtney Singer, November 2024



---

# Description of the Problem

The core task in this competition is to learn a transformation from one domain of images (real photographs) to another (Monet-style paintings).

Instead of having pairs of images showing the same scene in both styles, this challenge relies on unpaired image-to-image translation, an ideal scenario for applying a Generative Adversarial Network (GAN) architecture.

**The goal** is to generate convincing Monet-style artwork from ordinary photographs.

In this analysis, I compare two architectures: CycleGAN and Contrastive Unpaired Translation (CUT). CycleGAN proves to be more effective and is used in my final model.

---

# Overview of the Data

The dataset consists of two distinct image collections: a relatively small set of approximately 300 Monet paintings and a larger set of about 7,000 real landscape photographs.

**Source:** The Kaggle "GAN - Getting Started" competition dataset (linked above), which provides the Monet paintings and the landscape photographs as two separate image sets.

**Size and Composition:**
  - Monet Paintings: Approximately 300 images of Monet-style artwork
  - Real Photographs: Around 7,000 images of everyday scenes and landscapes
  - Image dimensions: generally provided at a resolution of 256x256 pixels
  - All images are in RGB format, giving three color channels.


## Import Data

I imported the datasets directly from Kaggle using the Kaggle API and then extracted the files in the Colab environment.

In [1]:
pip install kaggle




In [5]:
# -p avoids an error if the directory already exists
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json


In [None]:
%%capture
!kaggle competitions download -c gan-getting-started -p ./data
!unzip ./data/gan-getting-started.zip -d ./data

---

# Exploratory Data Analysis

### Exploration of Raw Data

**Basic Directory and File Checks**

In [None]:
import os

# Paths (adjust if you extracted to a different location)
data_path = '/content/data'  # This should be where you unzipped your data
monet_dir = os.path.join(data_path, 'monet_jpg')
photo_dir = os.path.join(data_path, 'photo_jpg')

# Count the number of images in each directory
monet_images = os.listdir(monet_dir)
photo_images = os.listdir(photo_dir)

print(f"Number of Monet images: {len(monet_images)}")
print(f"Number of Photo images: {len(photo_images)}")
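
`os.listdir` returns every directory entry, so a stray non-image file would inflate the counts above. A minimal sketch that counts only `.jpg` files is shown here; `count_jpgs` is a hypothetical helper, not part of the original notebook:

```python
from pathlib import Path

def count_jpgs(directory):
    """Count files ending in .jpg (case-insensitive) directly inside directory."""
    return sum(
        1
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".jpg"
    )
```

Under these assumptions, `count_jpgs(monet_dir)` and `count_jpgs(photo_dir)` could be compared against the `len(...)` values printed above.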

**Visualizing a Few Samples**

In this step I view a handful of images from each set to get a qualitative feel for the data.

In [None]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import random

# Function to show a grid of images
def show_images(image_paths, title, num=5):
    plt.figure(figsize=(15,3))
    for i in range(num):
        img_path = random.choice(image_paths)
        img = mpimg.imread(os.path.join(data_path, img_path))
        plt.subplot(1, num, i+1)
        plt.imshow(img)
        plt.title(title)
        plt.axis('off')
    plt.show()

show_images([f"monet_jpg/{f}" for f in monet_images], title="Monet Paintings")
show_images([f"photo_jpg/{f}" for f in photo_images], title="Real Photos")

**Image Dimensions & Aspect Ratios**

Check if all images are consistent in dimensions and aspect ratios. You can also quickly verify image modes or any anomalies.

In [None]:
from PIL import Image
import numpy as np

def get_image_stats(image_dir, num_check=100):
    widths, heights = [], []
    image_files = os.listdir(image_dir)
    sample_files = image_files[:num_check]  # check only a subset for speed
    for f in sample_files:
        img_path = os.path.join(image_dir, f)
        img = Image.open(img_path)
        w, h = img.size
        widths.append(w)
        heights.append(h)
    return widths, heights

monet_widths, monet_heights = get_image_stats(monet_dir)
photo_widths, photo_heights = get_image_stats(photo_dir)

print(f"Monet Images: mean width={np.mean(monet_widths):.2f}, mean height={np.mean(monet_heights):.2f}")
print(f"Photo Images: mean width={np.mean(photo_widths):.2f}, mean height={np.mean(photo_heights):.2f}")
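
Mean width and height can hide a few odd-sized images, since a single outlier barely moves the average. A small sketch that tallies each distinct size instead; `size_counts` is a hypothetical helper and the example values are illustrative, not measurements from the dataset:

```python
from collections import Counter

def size_counts(widths, heights):
    """Tally how many images share each (width, height) pair."""
    return Counter(zip(widths, heights))

# Illustrative values only: two 256x256 images and one 320x256 image
example = size_counts([256, 256, 320], [256, 256, 256])
print(example.most_common())
```

With the lists returned by `get_image_stats`, a call such as `size_counts(monet_widths, monet_heights)` would show at a glance whether every sampled image shares one size.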

**Color Channel Distributions**

Comparing the distribution of pixel intensities for Monet paintings and real photos gives insight into how the two domains differ in color. The cell below computes the per-channel mean and standard deviation of pixel values across a sample of each set:

In [None]:
import tensorflow as tf

def load_and_preprocess(img_path):
    img = tf.io.read_file(img_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)  # Scale [0,1]
    return img

def image_pixel_stats(image_dir, num_check=100):
    image_files = os.listdir(image_dir)[:num_check]
    means = []
    stds = []
    for f in image_files:
        img_path = os.path.join(image_dir, f)
        img = load_and_preprocess(img_path)
        means.append(tf.reduce_mean(img, axis=[0,1]))
        stds.append(tf.math.reduce_std(img, axis=[0,1]))
    mean = tf.reduce_mean(tf.stack(means), axis=0)
    std = tf.reduce_mean(tf.stack(stds), axis=0)
    return mean, std

monet_mean, monet_std = image_pixel_stats(monet_dir)
photo_mean, photo_std = image_pixel_stats(photo_dir)

print("Monet mean RGB:", monet_mean.numpy())
print("Monet std RGB:", monet_std.numpy())
print("Photo mean RGB:", photo_mean.numpy())
print("Photo std RGB:", photo_std.numpy())
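
Note that `image_pixel_stats` averages the per-image standard deviations, which measures spread within images but not differences between images. The NumPy sketch below contrasts that with a pooled standard deviation over all pixels; `channel_stats` is a hypothetical helper and the two flat images are synthetic:

```python
import numpy as np

def channel_stats(images):
    """Per-channel mean and pooled std over a list of HxWx3 float arrays."""
    pixels = np.concatenate([img.reshape(-1, 3) for img in images], axis=0)
    return pixels.mean(axis=0), pixels.std(axis=0)

# Two synthetic flat images: no spread inside each one, but they differ from each other
dark = np.zeros((4, 4, 3))
bright = np.ones((4, 4, 3))

mean, pooled_std = channel_stats([dark, bright])
avg_of_stds = np.mean([dark.std(axis=(0, 1)), bright.std(axis=(0, 1))], axis=0)
```

Here the averaged per-image std is zero while the pooled std is not, so the two numbers answer different questions; which one is more useful depends on what is being compared.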

**Simple Histogram Visualization for Color Channels**

Plot histograms of pixel intensities for a few samples to visually inspect color distributions.

In [None]:
def plot_image_histogram(img_path):
    img = load_and_preprocess(img_path).numpy()
    colors = ['r', 'g', 'b']
    plt.figure(figsize=(10,3))
    for i, c in enumerate(colors):
        plt.hist(img[:,:,i].flatten(), bins=50, alpha=0.5, color=c)
    plt.title("Color Channel Distribution")
    plt.xlabel("Pixel Intensity")
    plt.ylabel("Count")
    plt.show()

# Example: Plot histogram for one Monet image and one Photo image
monet_sample = os.path.join(monet_dir, random.choice(monet_images))
photo_sample = os.path.join(photo_dir, random.choice(photo_images))

print("Monet image histogram:")
plot_image_histogram(monet_sample)

print("Photo image histogram:")
plot_image_histogram(photo_sample)

**Summary and Observations**

After running these EDA steps, you might note:
- The number of Monet images is much smaller than the number of photos.
- Image sizes might be consistent (256x256) or may need resizing.
- Monet paintings have different color distributions (more pastel tones, etc.) compared to the real photos.
- Pixel intensity distributions and mean values differ between Monet and real photos, which is what the model will need to learn to transform.

These insights can guide how you preprocess the data, choose normalization techniques, and set up your training pipeline.

### Data Cleaning

The dataset for the “GAN - Getting Started” competition is relatively clean since it’s curated for a specific challenge.



**Check for Corrupted or Incomplete Images**

Occasionally, data downloads can be incomplete or corrupted. A good sanity check:

In [None]:
from PIL import Image
import os

def verify_images(directory):
    for fname in os.listdir(directory):
        fpath = os.path.join(directory, fname)
        try:
            with Image.open(fpath) as img:
                img.verify()  # just verify if it can be opened
        except (IOError, SyntaxError) as e:
            print(f"Corrupted file found: {fpath}")

# Check both Monet and Photo directories
verify_images('/content/data/monet_jpg')
verify_images('/content/data/photo_jpg')
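
`Image.verify()` does not decode the pixel data, so it may not catch every truncated download. As a complementary, lightweight check, the sketch below tests whether a file begins with the JPEG start-of-image marker (`FF D8`) and ends with the end-of-image marker (`FF D9`). `has_jpeg_markers` is a hypothetical helper; some valid JPEGs carry trailing bytes after the end marker, so a `False` result is a hint to inspect the file rather than proof of corruption:

```python
def has_jpeg_markers(path):
    """Return True if the file starts with the JPEG SOI marker and ends with EOI."""
    with open(path, "rb") as f:
        data = f.read()
    return len(data) >= 4 and data[:2] == b"\xff\xd8" and data[-2:] == b"\xff\xd9"
```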

**Ensure Consistent Dimensions**

While most images should already be uniformly sized (often 256x256), verify this. If any images differ, resize them:

In [None]:
import cv2

IMG_SIZE = 256

def resize_and_overwrite(directory):
    for fname in os.listdir(directory):
        fpath = os.path.join(directory, fname)
        img = cv2.imread(fpath)
        if img is not None and (img.shape[0] != IMG_SIZE or img.shape[1] != IMG_SIZE):
            img_resized = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
            cv2.imwrite(fpath, img_resized)

resize_and_overwrite('/content/data/monet_jpg')
resize_and_overwrite('/content/data/photo_jpg')

**Remove Duplicate or Near-Duplicate Images**

While not strictly necessary, you could check for duplicates. Near-duplicates could bias the model. This step is optional and more involved:

In [None]:
%%capture
!pip install imagehash
from PIL import Image
import imagehash
from collections import defaultdict

In [None]:
def find_duplicates(directory):
    hashes = defaultdict(list)
    for fname in os.listdir(directory):
        fpath = os.path.join(directory, fname)
        img = Image.open(fpath)
        h = imagehash.average_hash(img)
        hashes[h].append(fpath)

    # Print duplicates
    for h, files in hashes.items():
        if len(files) > 1:
            print("Duplicates found:", files)

# Check for duplicates in Monet and Photo sets
find_duplicates('/content/data/monet_jpg')
find_duplicates('/content/data/photo_jpg')
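
`imagehash.average_hash` is a perceptual hash (64 bits by default), so visually similar but distinct images can share a hash. A complementary sketch for byte-identical files only, using the standard library; `find_exact_duplicates` is a hypothetical helper, not part of the original notebook:

```python
import hashlib
import os
from collections import defaultdict

def find_exact_duplicates(directory):
    """Group files by the SHA-256 of their bytes; return groups with more than one file."""
    groups = defaultdict(list)
    for fname in sorted(os.listdir(directory)):
        fpath = os.path.join(directory, fname)
        if os.path.isfile(fpath):
            with open(fpath, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            groups[digest].append(fname)
    return [names for names in groups.values() if len(names) > 1]
```

Files flagged by the perceptual hash but not by this check would be near-duplicates (or hash collisions) worth inspecting by eye.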

**Color/Mode Consistency**

Make sure all images are in RGB mode. If any image is grayscale or has an alpha channel, convert it:

In [None]:
def ensure_rgb(directory):
    for fname in os.listdir(directory):
        fpath = os.path.join(directory, fname)
        img = Image.open(fpath)
        if img.mode != 'RGB':
            img = img.convert('RGB')
            img.save(fpath)

ensure_rgb('/content/data/monet_jpg')
ensure_rgb('/content/data/photo_jpg')

## Data Preprocessing

**Splitting or Sampling the Data**




**Normalization**

It is standard practice to normalize pixel values for GAN training. CycleGAN, for example, often normalizes images to the range [-1, 1].

To achieve this normalization, you’ll likely do it on-the-fly with TensorFlow or PyTorch during dataset creation. For example, with TensorFlow:

In [3]:
import tensorflow as tf

def load_and_preprocess_image(img_path):
    img = tf.io.read_file(img_path)
    img = tf.image.decode_jpeg(img, channels=3)
    # Resize if not already done
    img = tf.image.resize(img, [IMG_SIZE, IMG_SIZE])
    # Normalize from [0,255] to [-1,1]
    img = (img / 127.5) - 1
    return img
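
The mapping in `load_and_preprocess_image` and its inverse can be checked on a few values with plain NumPy. This is an illustrative sketch, not the notebook's pipeline: `normalize` and `denormalize` are hypothetical names, and rounding before the cast is an added assumption (a bare `astype(np.uint8)` truncates, which can shift a value by one when floating-point error leaves it just below an integer):

```python
import numpy as np

def normalize(img_uint8):
    """Map pixel values from [0, 255] to [-1, 1]."""
    return img_uint8.astype(np.float32) / 127.5 - 1.0

def denormalize(img_float):
    """Map values from [-1, 1] back to [0, 255] as uint8, rounding and clipping."""
    return np.clip(np.rint(img_float * 127.5 + 127.5), 0, 255).astype(np.uint8)

pixels = np.array([0, 64, 128, 255], dtype=np.uint8)
scaled = normalize(pixels)       # endpoints land exactly on -1 and 1
restored = denormalize(scaled)   # round trip back to the original values
```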

**Creating TensorFlow Datasets**

Generate TF Dataset objects for Monet and photo images. This makes it easier to batch, shuffle, and prefetch.

In [None]:
monet_paths = tf.data.Dataset.list_files('/content/data/monet_jpg/*.jpg', shuffle=True)
photo_paths = tf.data.Dataset.list_files('/content/data/photo_jpg/*.jpg', shuffle=True)

monet_ds = monet_paths.map(load_and_preprocess_image, num_parallel_calls=tf.data.AUTOTUNE).batch(1)
photo_ds = photo_paths.map(load_and_preprocess_image, num_parallel_calls=tf.data.AUTOTUNE).batch(1)

# For CycleGAN training, often we zip these datasets so that each step you get a Monet and a photo image:
dataset = tf.data.Dataset.zip((monet_ds, photo_ds)).prefetch(tf.data.AUTOTUNE)
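
`tf.data.Dataset.zip` stops as soon as its shortest component is exhausted, much like Python's built-in `zip`. With roughly 300 Monet images and 7,000 photos, one pass over `dataset` therefore yields only about 300 pairs, and a later `dataset.take(1000)` cannot return more than that. The pure-Python sketch below illustrates the effect with stand-in lists; in `tf.data`, the analogous remedy would be calling `.repeat()` on the Monet dataset before zipping:

```python
from itertools import cycle, islice

monet = ["m0", "m1", "m2"]              # stand-in for the small Monet set
photos = [f"p{i}" for i in range(7)]    # stand-in for the larger photo set

# Plain zip stops at the shorter input
truncated = list(zip(monet, photos))

# Cycling the shorter input lets every photo be paired with some painting
repeated = list(islice(zip(cycle(monet), photos), len(photos)))
```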

**Data Augmentation**

Applying data augmentation can help the model generalize better.

In [None]:
def augment_image(img):
    img = tf.image.random_flip_left_right(img)
    # Add other augmentations if desired
    return img

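# Apply augmentations, then rebuild the zipped dataset so training sees the augmented images
monet_ds = monet_ds.map(augment_image, num_parallel_calls=tf.data.AUTOTUNE)
photo_ds = photo_ds.map(augment_image, num_parallel_calls=tf.data.AUTOTUNE)
dataset = tf.data.Dataset.zip((monet_ds, photo_ds)).prefetch(tf.data.AUTOTUNE)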

# Defining Model Architecture


I compare two models designed for unpaired image-to-image translation: CUT (Contrastive Unpaired Translation) and CycleGAN. CUT serves as the baseline; CycleGAN adds cycle-consistency constraints, at the cost of increased complexity, to try to improve performance.


**Model 1:** CUT (Contrastive Unpaired Translation) Architecture:
-  A newer framework that avoids explicit cycle-consistency by adopting a patchwise contrastive loss, which encourages the generated image to retain similar local features as the source image.
- Instead of enforcing that an image translated A → B → A returns to its original form, CUT focuses on aligning patches between the input and output images at a feature level using contrastive learning.
- Components
  - Single Generator: A single generator network that transforms images from domain A to domain B
  - Single Discriminator: A standard discriminator to ensure that the generated images are plausible in the target domain.
  - Contrastive Loss: Guides the generator to preserve content-specific details
- Simpler conceptually than CycleGAN
- Potential to ignore crucial details and produce images that are stylistically correct but lose the content structure.


**Model 2:** CycleGAN:
- A well-established framework designed for unpaired image-to-image translation
- Does not require paired examples of source and target images
- Uses two generators and two discriminators
  -  One generator (G) translates photos to Monet paintings, and the other (F) translates Monet paintings back to photos.
  - One discriminator determines if an image is a real Monet painting or a generated Monet-style image, the other determines if an image is a real photo or a generated photo.
- More complex training setup (two generators, two discriminators, and additional loss terms).
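
The cycle-consistency idea (translate A → B → A and compare with the original) can be illustrated with toy functions in place of the two generators. This is only a sketch: `toy_g`, `toy_f_inverse`, and `toy_f_identity` are hypothetical stand-ins, not trained networks:

```python
import numpy as np

def cycle_l1(x, forward, backward):
    """Mean absolute error between x and backward(forward(x))."""
    return float(np.mean(np.abs(x - backward(forward(x)))))

def toy_g(x):
    """Stand-in for the photo -> Monet generator."""
    return 0.5 * x + 0.1

def toy_f_inverse(y):
    """Stand-in for a Monet -> photo generator that exactly undoes toy_g."""
    return (y - 0.1) / 0.5

def toy_f_identity(y):
    """Stand-in for a Monet -> photo generator that ignores what toy_g did."""
    return y

x = np.linspace(-1.0, 1.0, 5)
loss_good = cycle_l1(x, toy_g, toy_f_inverse)   # near zero: the cycle returns to x
loss_bad = cycle_l1(x, toy_g, toy_f_identity)   # larger: the cycle does not return to x
```

The loss only rewards the pair of mappings for being mutually consistent, which is why it is combined with adversarial losses that push each output toward its target domain.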

### Building Models

#### CUT (Contrastive Unpaired Translation) Architecture

**Configurations**



In [None]:
%%capture
!pip install tensorflow-addons
!pip install --upgrade tensorflow tensorflow-addons keras

In [None]:
import os
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_addons as tfa


N_RES = 9
TAU = 0.07
LR = 2e-4
BETA_1 = 0.5
EPOCHS = 10
STEPS_PER_EPOCH = 1000
NCE_LAYERS = ["re_lu", "re_lu_1", "re_lu_2"]

**Model Components**

In [None]:
def resnet_block(x, filters):
    """A single ResNet block as used in image-to-image translation models."""
    init = tf.random_normal_initializer(0., 0.02)
    # Use Instance Normalization as is common in style-transfer tasks
    # If unavailable, can revert to BatchNormalization
    y = tf.keras.layers.Conv2D(filters, 3, padding='same', kernel_initializer=init)(x)
    y = tfa.layers.InstanceNormalization()(y)
    y = tf.keras.layers.ReLU()(y)
    y = tf.keras.layers.Conv2D(filters, 3, padding='same', kernel_initializer=init)(y)
    y = tfa.layers.InstanceNormalization()(y)
    return tf.keras.layers.add([x, y])  # residual connection

def build_generator(input_shape=(256,256,3), n_res=9):
    init = tf.random_normal_initializer(0., 0.02)
    inputs = tf.keras.Input(shape=input_shape)

    # Downsampling
    x = tf.keras.layers.Conv2D(64, 7, padding='same', kernel_initializer=init)(inputs)
    x = tf.keras.layers.ReLU()(x)

    x = tf.keras.layers.Conv2D(128, 3, strides=2, padding='same', kernel_initializer=init)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)
    x = tf.keras.layers.Conv2D(256, 3, strides=2, padding='same', kernel_initializer=init)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)

    # Residual blocks
    for _ in range(n_res):
        x = resnet_block(x, 256)

    # Upsampling
    x = tf.keras.layers.Conv2DTranspose(128, 3, strides=2, padding='same', kernel_initializer=init)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)

    x = tf.keras.layers.Conv2DTranspose(64, 3, strides=2, padding='same', kernel_initializer=init)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.ReLU()(x)

    x = tf.keras.layers.Conv2D(3, 7, padding='same', kernel_initializer=init, activation='tanh')(x)

    return tf.keras.Model(inputs, x, name='cut_generator')


def build_discriminator(input_shape=(256,256,3)):
    """Builds a PatchGAN discriminator."""
    init = tf.random_normal_initializer(0., 0.02)
    inp = tf.keras.Input(shape=input_shape, name='dis_input')

    x = tf.keras.layers.Conv2D(64,4,strides=2,padding='same',kernel_initializer=init)(inp)
    x = tf.keras.layers.LeakyReLU()(x)

    x = tf.keras.layers.Conv2D(128,4,strides=2,padding='same',kernel_initializer=init)(x)
    x = tfa.layers.InstanceNormalization()(x)
    x = tf.keras.layers.LeakyReLU()(x)

    x = tf.keras.layers.Conv2D(256,4,strides=2,padding='same',kernel_initializer=init)(x)
    x = tfa.layers.InstanceNormalization()(x)
    x = tf.keras.layers.LeakyReLU()(x)

    x = tf.keras.layers.Conv2D(512,4,strides=1,padding='same',kernel_initializer=init)(x)
    x = tfa.layers.InstanceNormalization()(x)
    x = tf.keras.layers.LeakyReLU()(x)

    x = tf.keras.layers.Conv2D(1,4,strides=1,padding='same',kernel_initializer=init)(x)
    return tf.keras.Model(inp, x, name='cut_discriminator')
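
Because every convolution in `build_discriminator` uses `padding='same'`, each layer's output size is the input size divided by its stride, rounded up. The sketch below follows that rule through the five layers to find the size of the patch grid the discriminator outputs; `same_padding_output` is a hypothetical helper:

```python
import math

def same_padding_output(size, strides):
    """Spatial size after a chain of padding='same' convolutions with the given strides."""
    for s in strides:
        size = math.ceil(size / s)
    return size

# Strides of the five Conv2D layers in build_discriminator
patch_grid = same_padding_output(256, [2, 2, 2, 1, 1])
print(patch_grid)
```

For a 256x256 input this gives a square grid of logits, each scoring one overlapping region of the image rather than the image as a whole.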

**Loss Function**

The CUT model uses a PatchNCE (contrastive) loss. TensorFlow has no built-in function for this, so I built a custom function modeled on the PyTorch implementation.

In [None]:
mse = tf.keras.losses.MeanSquaredError()

def gan_loss(pred_fake):
    """GAN loss for generator: tries to make pred_fake close to 1."""
    return mse(tf.ones_like(pred_fake), pred_fake)

def gan_loss_discriminator(pred_real, pred_fake):
    """Discriminator loss: real close to 1, fake close to 0."""
    real_loss = mse(tf.ones_like(pred_real), pred_real)
    fake_loss = mse(tf.zeros_like(pred_fake), pred_fake)
    return 0.5 * (real_loss + fake_loss)

def nce_loss(features_src, features_tgt, tau=TAU):
    """PatchNCE loss for one pair of feature maps (N,C,H,W)."""
    N, C, H, W = tf.unstack(tf.shape(features_src))
    features_src = tf.cast(features_src, tf.float32)
    features_tgt = tf.cast(features_tgt, tf.float32)

    # [N,C,H,W] -> [N,H,W,C]
    features_src = tf.transpose(features_src, [0, 2, 3, 1])
    features_tgt = tf.transpose(features_tgt, [0, 2, 3, 1])

    # Flatten H*W patches
    features_src = tf.reshape(features_src, [N * H * W, C])
    features_tgt = tf.reshape(features_tgt, [N * H * W, C])

    # Normalize
    features_src = tf.nn.l2_normalize(features_src, axis=1)
    features_tgt = tf.nn.l2_normalize(features_tgt, axis=1)

    # Similarity
    similarity = tf.matmul(features_src, features_tgt, transpose_b=True)
    similarity = similarity / tau

    num_patches = N * H * W
    labels = tf.range(num_patches)

    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=similarity)
    return tf.reduce_mean(loss)
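
`nce_loss` compares every patch with every other patch, so the similarity matrix has `(N*H*W)^2` entries. If a feature map is at full 256x256 resolution, that matrix becomes very large for a single image. The NumPy sketch below computes that size and shows one common way to bound it: sampling a fixed number of patch locations. This is an illustration under stated assumptions, not the notebook's loss: `patch_nce_numpy` is a hypothetical name, and the default of 256 sampled patches is an assumption:

```python
import numpy as np

def patch_nce_numpy(src, tgt, tau=0.07, num_patches=256, seed=0):
    """InfoNCE over sampled patch locations; src and tgt are (P, C) feature rows."""
    rng = np.random.default_rng(seed)
    k = min(num_patches, src.shape[0])
    idx = rng.choice(src.shape[0], size=k, replace=False)
    s = src[idx] / np.linalg.norm(src[idx], axis=1, keepdims=True)
    t = tgt[idx] / np.linalg.norm(tgt[idx], axis=1, keepdims=True)
    logits = s @ t.T / tau                        # (k, k) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))    # matching patches sit on the diagonal

# Size in GiB of the full float32 similarity matrix for one 256x256 feature map
full_matrix_gib = (256 * 256) ** 2 * 4 / 2**30

rng = np.random.default_rng(1)
feats = rng.normal(size=(1024, 16))
aligned = patch_nce_numpy(feats, feats)                 # each patch matches itself
shuffled = patch_nce_numpy(feats, feats[::-1].copy())   # patches paired with the wrong location
```

The aligned case gives a lower loss than the shuffled one, which is the behavior the loss relies on: it is small when each output patch is most similar to the input patch at the same location.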


**Feature Extraction**

In [None]:
def create_feature_extractor(base_model, layer_names):
    """Extract intermediate features for NCE loss."""
    outputs = [base_model.get_layer(name).output for name in layer_names]
    return tf.keras.Model(inputs=base_model.input, outputs=outputs, name='feature_extractor')


**Build Models and Setup**

In [None]:
generator_CUT = build_generator()
discriminator_CUT = build_discriminator()
feature_extractor = create_feature_extractor(generator_CUT, NCE_LAYERS)

gen_optimizer_CUT = tf.keras.optimizers.Adam(LR, beta_1=BETA_1)
disc_optimizer_CUT = tf.keras.optimizers.Adam(LR, beta_1=BETA_1)

# Setup TensorBoard logging
log_dir = "logs/CUT"
train_writer = tf.summary.create_file_writer(log_dir)

**Training Setup**

In [None]:
@tf.function
def cut_train_step(real_photo, real_monet):
    # You can add identity loss if desired, for better color preservation:
    # identity_monet = generator_CUT(real_monet, training=True)
    # identity_loss_val = tf.reduce_mean(tf.abs(real_monet - identity_monet)) * 10.0
    # Then add identity_loss_val to total_g_loss if you want

    with tf.GradientTape(persistent=True) as tape:
        fake_monet = generator_CUT(real_photo, training=True)
        disc_real = discriminator_CUT(real_monet, training=True)
        disc_fake = discriminator_CUT(fake_monet, training=True)

        g_gan_loss = gan_loss(disc_fake)
        d_loss = gan_loss_discriminator(disc_real, disc_fake)

        features_photo = feature_extractor(real_photo, training=True)
        features_fakemonet = feature_extractor(fake_monet, training=True)

        # Compute NCE loss
        nce = 0.0
        for f_p, f_fm in zip(features_photo, features_fakemonet):
            # Convert to N,C,H,W for NCE
            f_p = tf.transpose(f_p, [0,3,1,2])
            f_fm = tf.transpose(f_fm, [0,3,1,2])
            nce += nce_loss(f_p, f_fm, tau=TAU)
        nce = nce / len(features_photo)

        total_g_loss = g_gan_loss + nce  # + identity_loss_val if using

    gen_grad = tape.gradient(total_g_loss, generator_CUT.trainable_variables)
    disc_grad = tape.gradient(d_loss, discriminator_CUT.trainable_variables)

    gen_optimizer_CUT.apply_gradients(zip(gen_grad, generator_CUT.trainable_variables))
    disc_optimizer_CUT.apply_gradients(zip(disc_grad, discriminator_CUT.trainable_variables))

    return g_gan_loss, nce, d_loss


metrics_history_cut = {
    'epoch': [],
    'g_gan_loss': [],
    'nce_loss': [],
    'd_loss': []
}


In [None]:
# `dataset` was built with zip((monet_ds, photo_ds)), so it yields (monet, photo) pairs.
# Each element is a batched [N,H,W,3] tensor.
for epoch in range(1, EPOCHS+1):
    g_gan_losses = []
    nce_losses = []
    d_losses = []

    for step, (monet, photo) in enumerate(dataset.take(STEPS_PER_EPOCH)):
        g_loss_val, nce_val, d_loss_val = cut_train_step(photo, monet)
        g_gan_losses.append(g_loss_val.numpy())
        nce_losses.append(nce_val.numpy())
        d_losses.append(d_loss_val.numpy())

    avg_g_gan_loss = np.mean(g_gan_losses)
    avg_nce_loss = np.mean(nce_losses)
    avg_d_loss = np.mean(d_losses)

    # Log metrics
    metrics_history_cut['epoch'].append(epoch)
    metrics_history_cut['g_gan_loss'].append(avg_g_gan_loss)
    metrics_history_cut['nce_loss'].append(avg_nce_loss)
    metrics_history_cut['d_loss'].append(avg_d_loss)

    print(f"Epoch {epoch}/{EPOCHS} completed: "
          f"g_gan_loss={avg_g_gan_loss:.4f}, nce_loss={avg_nce_loss:.4f}, d_loss={avg_d_loss:.4f}")

    # Write to TensorBoard
    with train_writer.as_default():
        tf.summary.scalar('g_gan_loss', avg_g_gan_loss, step=epoch)
        tf.summary.scalar('nce_loss', avg_nce_loss, step=epoch)
        tf.summary.scalar('d_loss', avg_d_loss, step=epoch)

df_cut_metrics = pd.DataFrame(metrics_history_cut)
print(df_cut_metrics.head())

In [None]:
# Inspect the CUT metrics collected above
df_cut_metrics.head()

#### CycleGAN Architecture
CycleGAN requires two generators and two discriminators, as well as cycle-consistency and identity losses.

**Generators**

I use a ResNet-based generator here.

In [None]:
import tensorflow as tf
from tensorflow.keras import layers, Model

In [None]:
def residual_block(x, filters=256):
    init = tf.random_normal_initializer(0., 0.02)
    y = layers.Conv2D(filters, 3, padding='same', kernel_initializer=init)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding='same', kernel_initializer=init)(y)
    y = layers.BatchNormalization()(y)
    return layers.add([x, y])

def cyclegan_generator(input_shape=(256,256,3), num_res_blocks=6):
    init = tf.random_normal_initializer(0., 0.02)
    inputs = layers.Input(shape=input_shape)

    # Downsampling
    x = layers.Conv2D(64, 7, padding='same', kernel_initializer=init)(inputs)
    x = layers.ReLU()(x)
    x = layers.Conv2D(128, 3, strides=2, padding='same', kernel_initializer=init)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(256, 3, strides=2, padding='same', kernel_initializer=init)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    # Residual blocks
    for _ in range(num_res_blocks):
        x = residual_block(x, 256)

    # Upsampling
    x = layers.Conv2DTranspose(128, 3, strides=2, padding='same', kernel_initializer=init)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    x = layers.Conv2DTranspose(64, 3, strides=2, padding='same', kernel_initializer=init)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)

    x = layers.Conv2D(3, 7, padding='same', kernel_initializer=init, activation='tanh')(x)

    return Model(inputs, x, name='cyclegan_generator')

**Discriminators**

CycleGAN discriminators are PatchGAN discriminators. Unlike a conditional GAN discriminator, they take only a single image as input (no conditional concatenation) and classify patches as real or fake.

In [None]:
def cyclegan_discriminator(input_shape=(256,256,3)):
    init = tf.random_normal_initializer(0., 0.02)
    inp = layers.Input(shape=input_shape, name='input_image')

    x = layers.Conv2D(64,4,strides=2,padding='same',kernel_initializer=init)(inp)
    x = layers.LeakyReLU()(x)

    x = layers.Conv2D(128,4,strides=2,padding='same',kernel_initializer=init)(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)

    x = layers.Conv2D(256,4,strides=2,padding='same',kernel_initializer=init)(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)

    x = layers.Conv2D(512,4,strides=1,padding='same',kernel_initializer=init)(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)

    x = layers.Conv2D(1,4,strides=1,padding='same',kernel_initializer=init)(x)
    return Model(inp, x, name='cyclegan_discriminator')

**Defining Loss Functions**


In [None]:
import tensorflow as tf

mse_loss = tf.keras.losses.MeanSquaredError()

def generator_loss(fake_output):
    # The generator tries to fool the discriminator, so we compare to "ones"
    return mse_loss(tf.ones_like(fake_output), fake_output)

def discriminator_loss(real_output, fake_output):
    real_loss = mse_loss(tf.ones_like(real_output), real_output)
    fake_loss = mse_loss(tf.zeros_like(fake_output), fake_output)
    return (real_loss + fake_loss) * 0.5

# For cycleGAN:
LAMBDA = 10.0
def cycle_consistency_loss(real_image, cycled_image):
    return tf.reduce_mean(tf.abs(real_image - cycled_image)) * LAMBDA

def identity_loss(real_image, same_image):
    return tf.reduce_mean(tf.abs(real_image - same_image)) * LAMBDA * 0.5

l1_loss = tf.keras.losses.MeanAbsoluteError()

def l1_reconstruction_loss(real_image, generated_image):
    return l1_loss(real_image, generated_image)
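
The adversarial losses above use mean squared error against targets of 1 and 0 (a least-squares GAN objective). Their values at a few reference points can be checked with a NumPy sketch; the function names and the synthetic discriminator outputs here are hypothetical:

```python
import numpy as np

def lsgan_generator_loss(d_fake):
    """Least-squares generator loss: push D(fake) toward 1."""
    return float(np.mean((d_fake - 1.0) ** 2))

def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares discriminator loss: D(real) toward 1, D(fake) toward 0."""
    return 0.5 * float(np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2))

shape = (1, 32, 32, 1)  # one image, one logit per patch

# A discriminator that separates real from fake perfectly
perfect_d = lsgan_discriminator_loss(np.ones(shape), np.zeros(shape))
perfect_g = lsgan_generator_loss(np.zeros(shape))

# An undecided discriminator that outputs 0.5 everywhere
half = np.full(shape, 0.5)
undecided_d = lsgan_discriminator_loss(half, half)
undecided_g = lsgan_generator_loss(half)
```

These reference points can help when reading the logged losses: a discriminator loss near zero suggests the discriminator is winning easily, while values around the undecided level suggest the two sides are roughly balanced.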

**Build Generators and Discriminators, and Define Optimizers**

In [None]:
G = cyclegan_generator()   # Photo -> Monet
F = cyclegan_generator()   # Monet -> Photo
D_M = cyclegan_discriminator() # Distinguish Monet from generated Monet
D_P = cyclegan_discriminator() # Distinguish Photos from generated Photos

generator_g_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
generator_f_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
discriminator_m_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
discriminator_p_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)


**Train Model**

In [None]:
@tf.function
def cyclegan_train_step(real_monet, real_photo, G, F, D_M, D_P):
    with tf.GradientTape(persistent=True) as tape:
        # Generator forward passes
        fake_monet = G(real_photo, training=True)
        cycled_photo = F(fake_monet, training=True)

        fake_photo = F(real_monet, training=True)
        cycled_monet = G(fake_photo, training=True)

        # Identity mapping (optional)
        same_monet = G(real_monet, training=True)
        same_photo = F(real_photo, training=True)

        # Discriminator outputs
        disc_real_monet = D_M(real_monet, training=True)
        disc_fake_monet = D_M(fake_monet, training=True)

        disc_real_photo = D_P(real_photo, training=True)
        disc_fake_photo = D_P(fake_photo, training=True)

        # Generator losses
        gen_g_loss = generator_loss(disc_fake_monet)
        gen_f_loss = generator_loss(disc_fake_photo)

        total_cycle_loss = cycle_consistency_loss(real_photo, cycled_photo) + cycle_consistency_loss(real_monet, cycled_monet)
        total_identity_loss = identity_loss(real_monet, same_monet) + identity_loss(real_photo, same_photo)

        total_gen_g_loss = gen_g_loss + total_cycle_loss + total_identity_loss
        total_gen_f_loss = gen_f_loss + total_cycle_loss + total_identity_loss

        # Discriminator losses
        disc_m_loss = discriminator_loss(disc_real_monet, disc_fake_monet)
        disc_p_loss = discriminator_loss(disc_real_photo, disc_fake_photo)

    # Compute gradients
    generator_g_gradients = tape.gradient(total_gen_g_loss, G.trainable_variables)
    generator_f_gradients = tape.gradient(total_gen_f_loss, F.trainable_variables)

    discriminator_m_gradients = tape.gradient(disc_m_loss, D_M.trainable_variables)
    discriminator_p_gradients = tape.gradient(disc_p_loss, D_P.trainable_variables)

    # Apply gradients
    generator_g_optimizer.apply_gradients(zip(generator_g_gradients, G.trainable_variables))
    generator_f_optimizer.apply_gradients(zip(generator_f_gradients, F.trainable_variables))
    discriminator_m_optimizer.apply_gradients(zip(discriminator_m_gradients, D_M.trainable_variables))
    discriminator_p_optimizer.apply_gradients(zip(discriminator_p_gradients, D_P.trainable_variables))

    return {
        'gen_g_loss': total_gen_g_loss,
        'gen_f_loss': total_gen_f_loss,
        'disc_m_loss': disc_m_loss,
        'disc_p_loss': disc_p_loss,
        'cycle_loss': total_cycle_loss,
        'identity_loss': total_identity_loss
    }

**Logging Metrics Each Epoch for Later Evaluation**

In [None]:
metrics_history = {
    'epoch': [],
    'gen_g_loss': [],
    'gen_f_loss': [],
    'disc_m_loss': [],
    'disc_p_loss': [],
    'cycle_loss': [],
    'identity_loss': []
}

# Suppose we have a dataset of pairs for CycleGAN (real_monet, real_photo)
EPOCHS = 10
steps_per_epoch = 1000  # adjust based on data size

for epoch in range(1, EPOCHS+1):
    gen_g_losses = []
    gen_f_losses = []
    disc_m_losses = []
    disc_p_losses = []
    cycle_losses = []
    identity_losses = []

    for step, (real_monet, real_photo) in enumerate(dataset.take(steps_per_epoch)):
        results = cyclegan_train_step(real_monet, real_photo, G, F, D_M, D_P)
        gen_g_losses.append(results['gen_g_loss'].numpy())
        gen_f_losses.append(results['gen_f_loss'].numpy())
        disc_m_losses.append(results['disc_m_loss'].numpy())
        disc_p_losses.append(results['disc_p_loss'].numpy())
        cycle_losses.append(results['cycle_loss'].numpy())
        identity_losses.append(results['identity_loss'].numpy())

    # Average the metrics over the epoch
    metrics_history['epoch'].append(epoch)
    metrics_history['gen_g_loss'].append(sum(gen_g_losses)/len(gen_g_losses))
    metrics_history['gen_f_loss'].append(sum(gen_f_losses)/len(gen_f_losses))
    metrics_history['disc_m_loss'].append(sum(disc_m_losses)/len(disc_m_losses))
    metrics_history['disc_p_loss'].append(sum(disc_p_losses)/len(disc_p_losses))
    metrics_history['cycle_loss'].append(sum(cycle_losses)/len(cycle_losses))
    metrics_history['identity_loss'].append(sum(identity_losses)/len(identity_losses))

    print(f"Epoch {epoch}/{EPOCHS} completed.")

Epoch 1/10 completed.
Epoch 2/10 completed.
Epoch 3/10 completed.
Epoch 4/10 completed.
Epoch 5/10 completed.
Epoch 6/10 completed.
Epoch 7/10 completed.
Epoch 8/10 completed.
Epoch 9/10 completed.
Epoch 10/10 completed.


In [None]:
# The generators were created above as G and F
G.save('generator_G.h5')   # Photo->Monet generator
F.save('generator_F.h5')   # Monet->Photo generator

**Evaluation of the CycleGAN Method**

In [None]:
df_metrics_cycleGAN = pd.DataFrame(metrics_history)
df_metrics_cycleGAN.head()

| | epoch | gen_g_loss | gen_f_loss | disc_m_loss | disc_p_loss | cycle_loss | identity_loss |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 8.993119 | 8.923896 | 0.451812 | 0.43337 | 5.777522 | 2.710005 |
| 1 | 2 | 6.866129 | 6.857996 | 0.262974 | 0.28032 | 4.436697 | 2.034035 |
| 2 | 3 | 6.354539 | 6.311831 | 0.242731 | 0.248642 | 4.068135 | 1.849852 |
| 3 | 4 | 6.131477 | 6.075729 | 0.243348 | 0.259131 | 3.895566 | 1.776513 |
| 4 | 5 | 6.095644 | 6.027769 | 0.227133 | 0.247844 | 3.865579 | 1.755112 |
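
To put numbers on the trend in the table above, the relative change in each loss between the first and fifth epochs can be computed directly from the reported values. This is a minimal sketch in plain Python: the `first` and `fifth` dictionaries are copied from the table rows, and `pct_change` is a hypothetical helper name, not part of the notebook.

```python
# Loss values copied from the epoch 1 and epoch 5 rows of the table above.
first = {'gen_g_loss': 8.993119, 'gen_f_loss': 8.923896,
         'disc_m_loss': 0.451812, 'disc_p_loss': 0.43337,
         'cycle_loss': 5.777522, 'identity_loss': 2.710005}
fifth = {'gen_g_loss': 6.095644, 'gen_f_loss': 6.027769,
         'disc_m_loss': 0.227133, 'disc_p_loss': 0.247844,
         'cycle_loss': 3.865579, 'identity_loss': 1.755112}

def pct_change(start, end):
    """Relative change from start to end, in percent (negative = decrease)."""
    return 100.0 * (end - start) / start

# One entry per loss, e.g. changes['cycle_loss'] is the change in cycle loss
changes = {name: pct_change(first[name], fifth[name]) for name in first}
for name, change in changes.items():
    print(f"{name:>14}: {change:6.1f}%")
```

Every loss is lower at epoch 5 than at epoch 1; the generator and cycle losses fall by roughly a third, and most of that drop happens between epochs 1 and 2.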


Generate transformations on the test data to submit to the Kaggle competition for scoring.

In [None]:
import glob
import tensorflow as tf
import os

test_photos_paths = glob.glob('/kaggle/input/gan-getting-started/test_photos/*.jpg')  # Adjust path as needed

def load_and_preprocess_image(img_path):
    img = tf.io.read_file(img_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, [256,256])
    # Normalize as per training, for CycleGAN typically [-1,1]
    img = (img / 127.5) - 1
    return tf.expand_dims(img, 0)  # Add batch dimension

In [None]:
import numpy as np
import tensorflow as tf

os.makedirs('generated_images', exist_ok=True)

for i, path in enumerate(test_photos_paths):
    input_image = load_and_preprocess_image(path)
    fake_monet = generator_G(input_image, training=False)
    # De-normalize: from [-1,1] to [0,255]
    fake_monet = (fake_monet[0].numpy() * 127.5 + 127.5).astype(np.uint8)
    out_path = f'generated_images/image_{i}.jpg'
    tf.keras.utils.save_img(out_path, fake_monet)
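
For scoring, the generated images need to be bundled into a single archive. The sketch below uses only the standard library. It assumes the submission file should be named `images.zip` (worth confirming on the competition's evaluation page) and that the JPEGs sit in the `generated_images` directory used above; `zip_submission` is a hypothetical helper name.

```python
import os
import shutil

def zip_submission(image_dir='generated_images', archive_base='images'):
    """Pack every file in image_dir into <archive_base>.zip and return its path."""
    if not os.path.isdir(image_dir):
        raise FileNotFoundError(f"No such directory: {image_dir}")
    # make_archive appends the .zip extension itself
    return shutil.make_archive(archive_base, 'zip', root_dir=image_dir)
```

Calling `zip_submission()` after the generation loop writes `images.zip` to the current working directory.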

**Examine Training Process**

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10,6))

# Plot generator losses
plt.plot(df_metrics_cycleGAN['epoch'], df_metrics_cycleGAN['gen_g_loss'], label='gen_g_loss')
plt.plot(df_metrics_cycleGAN['epoch'], df_metrics_cycleGAN['gen_f_loss'], label='gen_f_loss')

# Plot discriminator losses
plt.plot(df_metrics_cycleGAN['epoch'], df_metrics_cycleGAN['disc_m_loss'], label='disc_m_loss')
plt.plot(df_metrics_cycleGAN['epoch'], df_metrics_cycleGAN['disc_p_loss'], label='disc_p_loss')

# Plot cycle and identity losses
plt.plot(df_metrics_cycleGAN['epoch'], df_metrics_cycleGAN['cycle_loss'], label='cycle_loss')
plt.plot(df_metrics_cycleGAN['epoch'], df_metrics_cycleGAN['identity_loss'], label='identity_loss')

plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('CycleGAN Training Losses Over Epochs')
plt.legend()
plt.grid(True)
plt.show()

### Hyperparameter Tuning

While the CUT model seemed like a good, more straightforward approach, training it was quite complex: it required custom loss functions and took significant time.

Because its performance was also slightly lower than CycleGAN's, I am moving forward with the CycleGAN approach. In this section I perform hyperparameter tuning on the model.

In [None]:
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    # Sample the learning rate and cycle-consistency weight from fixed choices
    lr = hp.Choice('learning_rate', [2e-4, 1e-4, 5e-5])
    lam = hp.Choice('lambda_cycle', [5, 10, 15])

    # Build generator and discriminator
    generator = build_generator(n_res=9)
    discriminator = build_discriminator()

    gen_optimizer = tf.keras.optimizers.Adam(lr, beta_1=0.5)
    disc_optimizer = tf.keras.optimizers.Adam(lr, beta_1=0.5)

    # CycleGAN isn't a standard Keras compile/fit scenario, so return the
    # pieces and let the custom training loop consume them.
    return (generator, discriminator, gen_optimizer, disc_optimizer, lam)

class CycleGANTuner(kt.BayesianOptimization):
    # Overriding run_trial lets the tuner drive a custom training loop.
    # In recent KerasTuner versions the returned float is the value to minimize.
    def run_trial(self, trial, *args, **kwargs):
        generator, discriminator, gen_opt, disc_opt, lam = build_model(trial.hyperparameters)
        # Short training session (3 epochs), scored by FID (lower is better)
        fid_score = train_and_get_fid(generator, discriminator, gen_opt, disc_opt, lambda_cycle=lam, epochs=3)
        return float(fid_score)

tuner = CycleGANTuner(
    # No hypermodel or objective needed: run_trial returns the value to minimize
    max_trials=10,
    directory='my_tuner_dir',
    project_name='cyclegan_tuning'
)

tuner.search()
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
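
To see how large the search space above actually is, the two `hp.Choice` lists can be enumerated by hand. This is a minimal sketch using only the standard library: the value lists are copied from `build_model`, while `search_space` is a name introduced here for illustration.

```python
from itertools import product

# Choice lists copied from build_model above
learning_rates = [2e-4, 1e-4, 5e-5]
lambda_cycles = [5, 10, 15]

# Every (learning_rate, lambda_cycle) pair the tuner can propose
search_space = [
    {'learning_rate': lr, 'lambda_cycle': lam}
    for lr, lam in product(learning_rates, lambda_cycles)
]

print(len(search_space))  # 9
for config in search_space:
    print(config)
```

With `max_trials=10`, the trial budget already exceeds the 9 distinct combinations, so an exhaustive sweep over this list would cost no more than the search as configured. Bayesian optimization is more likely to pay off once the ranges are widened, for example with a continuous `hp.Float` learning rate.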

---

# Final Model

### Evaluation

**Generate Predictions on the Test Set**

In [None]:
test_photos_paths = glob.glob('/kaggle/input/gan-getting-started/test_photos/*.jpg')  # Adjust path as needed

def load_and_preprocess_image(img_path):
    img = tf.io.read_file(img_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, [256,256])
    # Normalize as per training, for CycleGAN typically [-1,1]
    img = (img / 127.5) - 1
    return tf.expand_dims(img, 0)  # Add batch dimension

In [None]:
os.makedirs('generated_images_final', exist_ok=True)

for i, path in enumerate(test_photos_paths):
    input_image = load_and_preprocess_image(path)
    fake_monet = generator_G(input_image, training=False)
    # De-normalize: from [-1,1] to [0,255]
    fake_monet = (fake_monet[0].numpy() * 127.5 + 127.5).astype(np.uint8)
    out_path = f'generated_images_final/image_{i}.jpg'
    tf.keras.utils.save_img(out_path, fake_monet)
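
The normalization in `load_and_preprocess_image` and the de-normalization in the loop above are meant to be inverses of each other. The sketch below checks that round trip on scalar pixel values in plain Python; `normalize` and `denormalize` are hypothetical helper names, and the rounding and clamping step is an addition that the notebook code does not perform.

```python
def normalize(pixel):
    """Map a pixel value in [0, 255] to [-1, 1], as in load_and_preprocess_image."""
    return pixel / 127.5 - 1

def denormalize(value):
    """Map a generator output in [-1, 1] back to an integer in [0, 255]."""
    scaled = value * 127.5 + 127.5
    # Round before converting, then clamp so small overshoots stay in range
    return min(255, max(0, int(round(scaled))))

# The endpoints map exactly, and every 8-bit value survives the round trip
assert normalize(0) == -1.0 and normalize(255) == 1.0
assert all(denormalize(normalize(p)) == p for p in range(256))
```

Note that `astype(np.uint8)` truncates rather than rounds, so a de-normalized value just below an integer loses one intensity level. Applying `np.rint` and `np.clip` before the cast would avoid this.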

**Plotting Training Metrics**

In [None]:
import matplotlib.pyplot as plt

# Example: Plot generator and discriminator losses
plt.figure(figsize=(10,6))
plt.plot(df_metrics_cycleGAN['epoch'], df_metrics_cycleGAN['gen_g_loss'], label='gen_g_loss')
plt.plot(df_metrics_cycleGAN['epoch'], df_metrics_cycleGAN['gen_f_loss'], label='gen_f_loss')
plt.plot(df_metrics_cycleGAN['epoch'], df_metrics_cycleGAN['disc_m_loss'], label='disc_m_loss')
plt.plot(df_metrics_cycleGAN['epoch'], df_metrics_cycleGAN['disc_p_loss'], label='disc_p_loss')
plt.plot(df_metrics_cycleGAN['epoch'], df_metrics_cycleGAN['cycle_loss'], label='cycle_loss')
plt.plot(df_metrics_cycleGAN['epoch'], df_metrics_cycleGAN['identity_loss'], label='identity_loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Losses Over Epochs')
plt.legend()
plt.grid(True)
plt.show()

### Interpretation

**Visual Comparison of Generated Outputs**

In [None]:
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
import os

# Suppose you have a trained generator: generator_G (Photo->Monet)
# and a directory of test photos: /path/to/test_photos

def load_and_preprocess_image(img_path):
    img = tf.io.read_file(img_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, [256,256])
    img = (img / 127.5) - 1  # Scale to [-1,1]
    return tf.expand_dims(img, 0)  # Add batch dimension

test_image_paths = ["/path/to/test_photos/img1.jpg",
                    "/path/to/test_photos/img2.jpg",
                    "/path/to/test_photos/img3.jpg"]

plt.figure(figsize=(12, 8))
for i, path in enumerate(test_image_paths):
    input_image = load_and_preprocess_image(path)
    fake_monet = generator_G(input_image, training=False)
    fake_monet = (fake_monet[0].numpy() * 127.5 + 127.5).astype(np.uint8)

    # Original image
    orig = tf.image.decode_jpeg(tf.io.read_file(path), channels=3)
    orig = tf.image.resize(orig, [256,256]).numpy().astype(np.uint8)

    # Plot
    plt.subplot(len(test_image_paths), 2, 2*i + 1)
    plt.imshow(orig)
    plt.title("Original")
    plt.axis('off')

    plt.subplot(len(test_image_paths), 2, 2*i + 2)
    plt.imshow(fake_monet)
    plt.title("Translated (Monet-style)")
    plt.axis('off')

plt.tight_layout()
plt.show()