# Transform Photos to Monet Paintings with CycleGANs

This project implements a CycleGAN (Cycle-Consistent Generative Adversarial Network) to transfer the artistic style of Claude Monet's paintings to regular photographs.
It demonstrates unpaired image-to-image translation, where the model learns to capture the essence of Monet's style—color palettes, brush strokes, and impressionistic feel—and apply it to new photos without requiring directly corresponding pairs of photos and paintings.

### Importing Required Libraries

In [None]:
!pip -q install tensorflow

In [None]:
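import random  # Used later to pick sample photos for visualization
import warnings
from pathlib import Path  # Used to build dataset and model paths

warnings.simplefilter('ignore')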

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from keras.utils import plot_model
from keras.layers import TFSMLayer

import numpy as np
import matplotlib.pyplot as plt

print(f"TensorFlow version: {tf.__version__}")

The following code initializes a TPU strategy if a TPU is available. Otherwise, it falls back to TensorFlow's default distribution strategy, which runs on a GPU when one is visible and on the CPU if not.

In [None]:
try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver.connect()
    strategy = tf.distribute.TPUStrategy(tpu)
    print('Running on TPU:', tpu.master())
except ValueError:
    strategy = tf.distribute.get_strategy()
    print(f"Running on {strategy.num_replicas_in_sync} replicas")
    print(tf.config.list_physical_devices('GPU'))

### Defining Helper Functions

In [None]:
AUTOTUNE = tf.data.AUTOTUNE
IMAGE_SIZE = [256, 256]

def decode_image(file_path):
    """
    Reads and decodes a JPEG image file to a TensorFlow tensor,
    normalizes it to [-1, 1], and resizes it to IMAGE_SIZE.
    """
    image = tf.io.read_file(file_path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = (tf.cast(image, tf.float32) / 127.5) - 1
    image = tf.image.resize(image, IMAGE_SIZE) # Ensure resizing here
    return image


def load_dataset(directory):
    """
    Creates a tf.data.Dataset from JPEG images in a directory.
    """
    dataset = tf.data.Dataset.list_files(str(Path(directory) / "*.jpg"))
    dataset = dataset.map(decode_image, num_parallel_calls=AUTOTUNE)
    return dataset
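
The pixel scaling used by `decode_image`, and the inverse scaling applied later before plotting, can be checked with plain arithmetic. The sketch below is illustrative only: the helper names `normalize` and `denormalize` are not part of the notebook, and it works on single pixel values rather than tensors.

```python
def normalize(pixel):
    """Map an 8-bit pixel value in [0, 255] to [-1, 1], as decode_image does."""
    return pixel / 127.5 - 1.0


def denormalize(value):
    """Map a value in [-1, 1] back to [0, 1], the range matplotlib expects."""
    return value * 0.5 + 0.5


for pixel in (0, 64, 255):
    scaled = normalize(pixel)
    print(pixel, round(scaled, 4), round(denormalize(scaled), 4))
```

Black (0) maps to -1 and white (255) maps to 1, which matches the range of the generator's `tanh` output.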

## Understanding Image Style Transfer

Image Style Transfer is a computer vision task that involves recomposing an image in the style of another. Essentially, it combines the content of one image with the artistic style (textures, colors, patterns) of a reference style image. CycleGANs allow this to be done even when there are no direct pairs of content and styled images for training.

<center><img src="https://junyanz.github.io/CycleGAN/images/teaser.jpg" width="100%" style="vertical-align:middle;margin:20px 0px"></center>
<p style="color:gray; text-align:center;"><i>Examples of image-to-image translations by CycleGAN.</i></p>

## CycleGANs Overview

### Recap on Vanilla GANs

A standard Generative Adversarial Network (GAN) consists of two neural networks: a Generator ($\boldsymbol G$) and a Discriminator ($\boldsymbol D$).
*   The **Generator** tries to create realistic data (e.g., images) from random noise or a source domain image.
*   The **Discriminator** tries to distinguish between real data and fake data generated by $\boldsymbol G$.

They are trained in an adversarial process: $\boldsymbol G$ aims to fool $\boldsymbol D$, while $\boldsymbol D$ aims to get better at identifying fakes. This minimax game leads to $\boldsymbol G$ producing increasingly realistic data.
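
To make the minimax game concrete, the sketch below evaluates the standard GAN value function $V(D, G) = \mathbb{E}[\log D(x)] + \mathbb{E}[\log(1 - D(G(z)))]$ on hand-picked discriminator probabilities. The function name `gan_value` and the sample numbers are assumptions for illustration; no networks are involved.

```python
import math


def gan_value(d_real, d_fake):
    """Value V(D, G) given discriminator probabilities on real and fake samples."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that separates real from fake well scores close to 0.
confident = gan_value(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
# A fully fooled discriminator outputs 0.5 everywhere and scores -log(4).
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
print(round(confident, 4), round(fooled, 4))
```

The discriminator pushes this value up, while the generator pushes it down toward the "fooled" case.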

### What's Novel About CycleGANs?

CycleGANs introduce two key innovations for unpaired image-to-image translation:

1.  **Unpaired Training Data:** Unlike many style transfer methods that require paired examples (e.g., an exact photo and its stylized version), CycleGANs can learn from two separate, unpaired collections of images (e.g., a set of landscape photos and a set of Monet paintings).

2.  **Cycle Consistency Loss:** This is the core idea. If you translate an image from domain X to domain Y, and then translate it back from Y to X, you should ideally get the original image back. This enforces a structural and content consistency during translation.
    *   Let $G: X \rightarrow Y$ be the generator from domain X to Y, and $F: Y \rightarrow X$ be the generator from Y to X.
    *   Cycle consistency means $F(G(x)) \approx x$ for $x \in X$, and $G(F(y)) \approx y$ for $y \in Y$.
    *   The loss incurred from these cycles (e.g., L1 distance between $x$ and $F(G(x))$) is added to the standard adversarial losses, guiding the generators to learn meaningful mappings.
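
The cycle-consistency idea can be illustrated with toy mappings in place of neural networks. In this sketch, `G`, `F_exact`, and `F_biased` are hypothetical stand-ins operating on short lists of numbers; the point is only that an exact inverse gives zero cycle loss while an imperfect one is penalized.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)


def G(x):
    """Toy stand-in for the X -> Y generator."""
    return [2.0 * v + 1.0 for v in x]


def F_exact(y):
    """Exact inverse of G, so F(G(x)) equals x."""
    return [(v - 1.0) / 2.0 for v in y]


def F_biased(y):
    """Imperfect inverse that is off by 0.25 on every element."""
    return [(v - 1.0) / 2.0 + 0.25 for v in y]


x = [0.0, 0.5, -1.0]
exact_cycle_loss = l1_distance(x, F_exact(G(x)))
biased_cycle_loss = l1_distance(x, F_biased(G(x)))
print(exact_cycle_loss, biased_cycle_loss)
```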

## Data Loading and Preprocessing

The dataset used is a subset of the ["I'm Something of a Painter Myself" Kaggle competition](https://www.kaggle.com/competitions/gan-getting-started/data), containing Monet paintings and landscape photographs. We have downloaded a prepared subset of 300 images for each domain.

In [None]:
DATA_ROOT_DIR = "data"
MONET_DIR = Path(DATA_ROOT_DIR) / "monet_jpg_300"
PHOTO_DIR = Path(DATA_ROOT_DIR) / "photo_jpg_300"

MONET_FILENAMES = tf.io.gfile.glob(str(MONET_DIR / '*.jpg'))
print('Monet JPG Files:', len(MONET_FILENAMES))

PHOTO_FILENAMES = tf.io.gfile.glob(str(PHOTO_DIR / '*.jpg'))
print('Photo JPG Files:', len(PHOTO_FILENAMES))

Create `tf.data.Dataset` objects for efficient data loading and batching.

In [None]:
monet_ds = load_dataset(str(MONET_DIR)).batch(1)
photo_ds = load_dataset(str(PHOTO_DIR)).batch(1)

print("Monet Dataset:", monet_ds)
print("Photo Dataset:", photo_ds)

Let's visualize a sample from each dataset.

In [None]:
example_monet = next(iter(monet_ds))
example_photo = next(iter(photo_ds))

plt.figure(figsize=(8, 4))
plt.subplot(121)
plt.title('Sample Photo')
plt.imshow(example_photo[0] * 0.5 + 0.5) # Denormalize for viewing
plt.axis('off')

plt.subplot(122)
plt.title('Sample Monet Painting')
plt.imshow(example_monet[0] * 0.5 + 0.5) # Denormalize for viewing
plt.axis('off')
plt.show()

## Building the Generator
The Generator architecture is typically an encoder-decoder structure with skip connections (like U-Net). It uses downsampling blocks to encode the input image into a compact representation, and upsampling blocks to decode this representation into the target style image.

### Defining the Downsampling Block
Each downsampling block consists of a Convolutional layer, Instance Normalization (implemented via Group Normalization), and a LeakyReLU activation. Instance Normalization is preferred over Batch Normalization in style transfer tasks as it normalizes features per-sample, preserving style information better. We use `tf.keras.layers.GroupNormalization` with `groups` equal to the number of `filters` to achieve Instance Normalization.

In [None]:
OUTPUT_CHANNELS = 3

def downsample(filters, size, apply_instancenorm=True):
    initializer = tf.random_normal_initializer(0., 0.02)
    gamma_init = keras.initializers.RandomNormal(mean=0.0, stddev=0.02)

    result = keras.Sequential()
    result.add(layers.Conv2D(filters, size, strides=2, padding='same',
                             kernel_initializer=initializer, use_bias=False))

    if apply_instancenorm:
        # Use GroupNormalization as InstanceNormalization by setting groups=filters
        result.add(layers.GroupNormalization(groups=filters, gamma_initializer=gamma_init))

    result.add(layers.LeakyReLU())

    return result
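
Setting `groups` equal to the number of channels puts every channel in its own group, so the statistics are computed per sample and per channel, which is what Instance Normalization does. The following pure-Python sketch shows that computation on a single sample; it omits the learnable scale and offset, and the `instance_norm` helper and sample values are illustrative assumptions.

```python
import math


def instance_norm(channel, eps=1e-5):
    """Normalize one channel of one sample over its spatial positions."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return [(v - mean) / math.sqrt(var + eps) for v in channel]


# One sample with two channels, each flattened to four spatial positions.
sample = {
    "channel_0": [1.0, 2.0, 3.0, 4.0],
    "channel_1": [100.0, 200.0, 300.0, 400.0],
}
normalized = {name: instance_norm(values) for name, values in sample.items()}
for name, values in normalized.items():
    print(name, [round(v, 3) for v in values])
```

Both channels end up with nearly identical normalized values even though their raw scales differ by a factor of 100, because each channel is normalized using only its own statistics.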

### Defining the Upsampling Block
Each upsampling block uses a Transposed Convolutional layer (Conv2DTranspose) to increase spatial dimensions, followed by Instance Normalization (via Group Normalization) and ReLU activation. Dropout can be applied to some upsampling layers to prevent overfitting.

In [None]:
def upsample(filters, size, apply_dropout=False):
    initializer = tf.random_normal_initializer(0., 0.02)
    gamma_init = keras.initializers.RandomNormal(mean=0.0, stddev=0.02)

    result = keras.Sequential()
    result.add(layers.Conv2DTranspose(filters, size, strides=2,
                                      padding='same',
                                      kernel_initializer=initializer,
                                      use_bias=False))

    # Use GroupNormalization as InstanceNormalization by setting groups=filters
    result.add(layers.GroupNormalization(groups=filters, gamma_initializer=gamma_init))

    if apply_dropout:
        result.add(layers.Dropout(0.5))

    result.add(layers.ReLU())

    return result

### Assembling the Generator
The full generator model chains the downsampling blocks and the upsampling blocks in a U-Net-like layout. This implementation does not include separate residual blocks; instead, skip connections concatenate feature maps from downsampling layers to the corresponding upsampling layers, helping preserve low-level details.

In [None]:
def Generator():
    inputs = layers.Input(shape=[256,256,3])

    # Downsampling stack
    down_stack = [
        downsample(64, 4, apply_instancenorm=False), # (bs, 128, 128, 64)
        downsample(128, 4), # (bs, 64, 64, 128)
        downsample(256, 4), # (bs, 32, 32, 256)
        downsample(512, 4), # (bs, 16, 16, 512)
        downsample(512, 4), # (bs, 8, 8, 512)
        downsample(512, 4), # (bs, 4, 4, 512)
        downsample(512, 4), # (bs, 2, 2, 512)
        downsample(512, 4), # (bs, 1, 1, 512)
    ]

    # Upsampling stack
    up_stack = [
        upsample(512, 4, apply_dropout=True), # (bs, 2, 2, 1024)
        upsample(512, 4, apply_dropout=True), # (bs, 4, 4, 1024)
        upsample(512, 4, apply_dropout=True), # (bs, 8, 8, 1024)
        upsample(512, 4), # (bs, 16, 16, 1024)
        upsample(256, 4), # (bs, 32, 32, 512)
        upsample(128, 4), # (bs, 64, 64, 256)
        upsample(64, 4), # (bs, 128, 128, 128)
    ]

    initializer = tf.random_normal_initializer(0., 0.02)
    last = layers.Conv2DTranspose(OUTPUT_CHANNELS, 4,
                                  strides=2,
                                  padding='same',
                                  kernel_initializer=initializer,
                                  activation='tanh') # Output normalized to [-1, 1]

    x = inputs

    # Downsampling through the model
    skips = []
    for down in down_stack:
        x = down(x)
        skips.append(x)

    skips = reversed(skips[:-1]) # Exclude the last one (bottleneck)

    # Upsampling and establishing the skip connections
    for up, skip in zip(up_stack, skips):
        x = up(x)
        x = layers.Concatenate()([x, skip])

    x = last(x)

    return keras.Model(inputs=inputs, outputs=x)
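
The shape comments in `Generator` can be reproduced with simple arithmetic. The sketch below assumes a square input and uses the rule that a stride-2 `'same'` convolution halves the spatial size while a stride-2 transposed convolution doubles it; `trace_unet_shapes` is a hypothetical helper, not part of the model code.

```python
import math


def trace_unet_shapes(input_size=256):
    """Trace (spatial size, channels) through the generator's two stacks."""
    down_filters = [64, 128, 256, 512, 512, 512, 512, 512]
    up_filters = [512, 512, 512, 512, 256, 128, 64]

    size, down_shapes = input_size, []
    for filters in down_filters:
        size = math.ceil(size / 2)  # stride-2 convolution, padding='same'
        down_shapes.append((size, filters))

    # Skips come from every downsampling output except the bottleneck.
    skips = list(reversed(down_shapes[:-1]))

    up_shapes = []
    for filters, (skip_size, skip_channels) in zip(up_filters, skips):
        size = size * 2  # stride-2 transposed convolution, padding='same'
        assert size == skip_size, "skip and upsampled maps must align"
        up_shapes.append((size, filters + skip_channels))  # after Concatenate

    final_size = size * 2  # last Conv2DTranspose restores the input size
    return down_shapes, up_shapes, final_size


down_shapes, up_shapes, final_size = trace_unet_shapes()
print(down_shapes)
print(up_shapes)
print(final_size)
```

The channel counts in the upsampling stack are doubled relative to the block's own filters because each block's output is concatenated with a skip tensor.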

Visualize the Generator architecture.

In [None]:
gen_model_example = Generator()
plot_model(gen_model_example, show_shapes=True, show_layer_names=True, dpi=60)

In [None]:
gen_model_example.summary()

## Building the Discriminator
The Discriminator is a convolutional neural network that classifies input images (or patches of images, in a PatchGAN) as real or fake. It typically consists of several downsampling convolutional layers. Instance Normalization (via Group Normalization) is also used here.

In [None]:
def Discriminator():
    initializer = tf.random_normal_initializer(0., 0.02)
    gamma_init = keras.initializers.RandomNormal(mean=0.0, stddev=0.02)

    inp = layers.Input(shape=[256, 256, 3], name='input_image')

    x = inp

    down1 = downsample(64, 4, apply_instancenorm=False)(x) # (bs, 128, 128, 64) - No InstanceNorm on first layer typically
    down2 = downsample(128, 4)(down1) # (bs, 64, 64, 128)
    down3 = downsample(256, 4)(down2) # (bs, 32, 32, 256)

    zero_pad1 = layers.ZeroPadding2D()(down3) # (bs, 34, 34, 256)
    conv = layers.Conv2D(512, 4, strides=1, # Strides=1 for PatchGAN-like behavior
                         kernel_initializer=initializer,
                         use_bias=False)(zero_pad1) # (bs, 31, 31, 512)

    # Use GroupNormalization as InstanceNormalization. Input to this layer has 512 channels from 'conv'.
    norm1 = layers.GroupNormalization(groups=512, gamma_initializer=gamma_init)(conv)
    leaky_relu = layers.LeakyReLU()(norm1)

    zero_pad2 = layers.ZeroPadding2D()(leaky_relu) # (bs, 33, 33, 512)
    last = layers.Conv2D(1, 4, strides=1, # Output is a single channel (logit for real/fake)
                         kernel_initializer=initializer)(zero_pad2) # (bs, 30, 30, 1)

    return keras.Model(inputs=inp, outputs=last)
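
The 30x30 output means each logit judges one patch of the input rather than the whole image. A rough sketch of the size and receptive-field arithmetic follows; it treats each stride-2 `'same'` convolution as having one pixel of padding per side (which holds for even input sizes with a 4x4 kernel), and `patchgan_trace` is an illustrative helper rather than part of the model.

```python
def patchgan_trace(input_size=256):
    """Return (output size, receptive field) for the discriminator's conv stack."""
    # (kernel, stride, zero padding per side) for each convolution.
    layers_spec = [
        (4, 2, 1),  # downsample(64)
        (4, 2, 1),  # downsample(128)
        (4, 2, 1),  # downsample(256)
        (4, 1, 1),  # ZeroPadding2D + Conv2D(512) with default 'valid' padding
        (4, 1, 1),  # ZeroPadding2D + Conv2D(1) with default 'valid' padding
    ]
    size, receptive_field, jump = input_size, 1, 1
    for kernel, stride, pad in layers_spec:
        size = (size + 2 * pad - kernel) // stride + 1
        receptive_field += (kernel - 1) * jump  # growth in input pixels
        jump *= stride  # input-pixel distance between adjacent outputs
    return size, receptive_field


output_size, receptive_field = patchgan_trace()
print(output_size, receptive_field)
```

Under these assumptions each of the 30x30 logits covers a 70x70 region of the input, so the discriminator scores overlapping patches.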

Visualize the Discriminator architecture.

In [None]:
disc_model_example = Discriminator()
plot_model(disc_model_example, show_shapes=True, show_layer_names=True, dpi=60)

In [None]:
#Parameters
disc_model_example.summary()

## Building the CycleGAN Model

A CycleGAN requires two Generators ($G_{AB}$: Photo $\rightarrow$ Monet, $G_{BA}$: Monet $\rightarrow$ Photo) and two Discriminators ($D_A$: distinguishes real Photos from fake Photos generated by $G_{BA}$; $D_B$: distinguishes real Monet paintings from fake Monet paintings generated by $G_{AB}$).

In [None]:
with strategy.scope(): # For distributed training (TPU/multi-GPU)
    monet_generator = Generator() # Transforms photos to Monet-esque paintings (G_AB)
    photo_generator = Generator() # Transforms Monet paintings to photos (G_BA)

    monet_discriminator = Discriminator() # Differentiates real Monet and fake Monet (D_B)
    photo_discriminator = Discriminator() # Differentiates real photos and fake photos (D_A)

The `CycleGan` class encapsulates the entire model, its components, and the custom training step.

In [None]:
class CycleGan(keras.Model):
    def __init__(
        self,
        monet_generator, # G_AB (photo -> monet)
        photo_generator, # G_BA (monet -> photo)
        monet_discriminator, # D_B (discriminates monet)
        photo_discriminator, # D_A (discriminates photo)
        lambda_cycle=10, # Weight for cycle consistency loss
        lambda_identity=0.5 # Weight for identity loss (relative to lambda_cycle)
    ):
        super(CycleGan, self).__init__()
        self.m_gen = monet_generator
        self.p_gen = photo_generator
        self.m_disc = monet_discriminator
        self.p_disc = photo_discriminator
        self.lambda_cycle = lambda_cycle
        self.lambda_identity_factor = lambda_identity

    def compile(
        self,
        m_gen_optimizer,
        p_gen_optimizer,
        m_disc_optimizer,
        p_disc_optimizer,
        gen_loss_fn,
        disc_loss_fn,
        cycle_loss_fn,
        identity_loss_fn
    ):
        super(CycleGan, self).compile()
        self.m_gen_optimizer = m_gen_optimizer
        self.p_gen_optimizer = p_gen_optimizer
        self.m_disc_optimizer = m_disc_optimizer
        self.p_disc_optimizer = p_disc_optimizer
        self.gen_loss_fn = gen_loss_fn
        self.disc_loss_fn = disc_loss_fn
        self.cycle_loss_fn = cycle_loss_fn
        self.identity_loss_fn = identity_loss_fn

    @tf.function
    def train_step(self, batch_data):
        real_monet, real_photo = batch_data # Monet is domain B, Photo is domain A

        # Calculate actual identity loss weight
        current_lambda_identity = self.lambda_identity_factor * self.lambda_cycle

        with tf.GradientTape(persistent=True) as tape:
            # Forward cycle: Photo (A) -> Monet (B) -> Photo (A')
            fake_monet = self.m_gen(real_photo, training=True) # G_AB(A) = B_fake
            cycled_photo = self.p_gen(fake_monet, training=True) # G_BA(B_fake) = A_cycled

            # Backward cycle: Monet (B) -> Photo (A) -> Monet (B')
            fake_photo = self.p_gen(real_monet, training=True) # G_BA(B) = A_fake
            cycled_monet = self.m_gen(fake_photo, training=True) # G_AB(A_fake) = B_cycled

            # Identity mapping: Generator should not change images from its target domain
            same_monet = self.m_gen(real_monet, training=True) # G_AB(B) ideally should be B
            same_photo = self.p_gen(real_photo, training=True) # G_BA(A) ideally should be A

            # Discriminator outputs for real images
            disc_real_monet = self.m_disc(real_monet, training=True) # D_B(B_real)
            disc_real_photo = self.p_disc(real_photo, training=True) # D_A(A_real)

            # Discriminator outputs for fake images
            disc_fake_monet = self.m_disc(fake_monet, training=True) # D_B(B_fake)
            disc_fake_photo = self.p_disc(fake_photo, training=True) # D_A(A_fake)

            # Generator adversarial losses
            monet_gen_adv_loss = self.gen_loss_fn(disc_fake_monet) # For G_AB
            photo_gen_adv_loss = self.gen_loss_fn(disc_fake_photo) # For G_BA

            # Cycle consistency losses (L1 norm)
            forward_cycle_loss = self.cycle_loss_fn(real_photo, cycled_photo, self.lambda_cycle)
            backward_cycle_loss = self.cycle_loss_fn(real_monet, cycled_monet, self.lambda_cycle)
            total_cycle_loss = forward_cycle_loss + backward_cycle_loss

            # Identity losses (L1 norm)
            monet_identity_loss = self.identity_loss_fn(real_monet, same_monet, current_lambda_identity)
            photo_identity_loss = self.identity_loss_fn(real_photo, same_photo, current_lambda_identity)

            # Total generator losses
            total_monet_gen_loss = monet_gen_adv_loss + total_cycle_loss + monet_identity_loss
            total_photo_gen_loss = photo_gen_adv_loss + total_cycle_loss + photo_identity_loss

            # Discriminator losses
            monet_disc_loss = self.disc_loss_fn(disc_real_monet, disc_fake_monet)
            photo_disc_loss = self.disc_loss_fn(disc_real_photo, disc_fake_photo)

        # Calculate gradients
        m_gen_grads = tape.gradient(total_monet_gen_loss, self.m_gen.trainable_variables)
        p_gen_grads = tape.gradient(total_photo_gen_loss, self.p_gen.trainable_variables)
        m_disc_grads = tape.gradient(monet_disc_loss, self.m_disc.trainable_variables)
        p_disc_grads = tape.gradient(photo_disc_loss, self.p_disc.trainable_variables)

        # Apply gradients
        self.m_gen_optimizer.apply_gradients(zip(m_gen_grads, self.m_gen.trainable_variables))
        self.p_gen_optimizer.apply_gradients(zip(p_gen_grads, self.p_gen.trainable_variables))
        self.m_disc_optimizer.apply_gradients(zip(m_disc_grads, self.m_disc.trainable_variables))
        self.p_disc_optimizer.apply_gradients(zip(p_disc_grads, self.p_disc.trainable_variables))

        return {
            "monet_gen_loss": total_monet_gen_loss,
            "photo_gen_loss": total_photo_gen_loss,
            "monet_disc_loss": monet_disc_loss,
            "photo_disc_loss": photo_disc_loss,
            "total_cycle_loss": total_cycle_loss,
            "monet_identity_loss": monet_identity_loss,
            "photo_identity_loss": photo_identity_loss
        }

## Defining Loss Functions
The CycleGAN uses several types of loss functions:

**1. Adversarial Loss:** measures how well the generator can fool the discriminator.

Typically Binary Cross-Entropy (BCE) loss.

* For the discriminator, it's the sum of BCE for real images (target 1s) and fake images (target 0s).

* For the generator, it's BCE for fake images (target 1s, as it tries to make them look real).

In [None]:
with strategy.scope():
    # Using from_logits=True as discriminator output is a logit.
    # The string 'none' is accepted by both Keras 2 and Keras 3;
    # tf.keras.losses.Reduction is not available in Keras 3.
    bce = tf.keras.losses.BinaryCrossentropy(from_logits=True, reduction='none')

    def discriminator_loss(real, generated):
        real_loss = bce(tf.ones_like(real), real)
        generated_loss = bce(tf.zeros_like(generated), generated)
        total_disc_loss = real_loss + generated_loss
        # Average over the patch/image outputs and then over the batch
        # For PatchGAN, the loss is averaged over all patches.
        return tf.reduce_mean(total_disc_loss) * 0.5

    def generator_loss(generated):
        # Generator tries to make discriminator output 1 for fake images
        return tf.reduce_mean(bce(tf.ones_like(generated), generated))
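
For intuition about the numbers these losses produce, here is a pure-Python sketch of binary cross-entropy on logits, using the usual numerically stable form. The `*_sketch` function names and sample logits are assumptions for illustration and operate on flat lists instead of PatchGAN tensors.

```python
import math


def bce_with_logits(target, logit):
    """Numerically stable binary cross-entropy for a single logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def mean(values):
    return sum(values) / len(values)


def discriminator_loss_sketch(real_logits, fake_logits):
    """Half the sum of BCE on real (target 1) and fake (target 0) logits."""
    real_loss = mean([bce_with_logits(1.0, z) for z in real_logits])
    fake_loss = mean([bce_with_logits(0.0, z) for z in fake_logits])
    return 0.5 * (real_loss + fake_loss)


def generator_loss_sketch(fake_logits):
    """BCE on fake logits with target 1: the generator wants fakes judged real."""
    return mean([bce_with_logits(1.0, z) for z in fake_logits])


undecided = discriminator_loss_sketch([0.0, 0.0], [0.0, 0.0])
confident = discriminator_loss_sketch([4.0, 4.0], [-4.0, -4.0])
print(round(undecided, 4), round(confident, 4))
print(round(generator_loss_sketch([-4.0]), 4), round(generator_loss_sketch([4.0]), 4))
```

An undecided discriminator (all logits 0) sits at log 2, and the generator's loss drops as the discriminator's logits on fake images move toward positive values.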

**2. Cycle Consistency Loss:** penalizes the difference (L1 norm) between an original image and its cycled version. This is crucial for unpaired translation.

In [None]:
with strategy.scope():
    def calc_cycle_loss(real_image, cycled_image, LAMBDA_CYCLE):
        loss1 = tf.reduce_mean(tf.abs(real_image - cycled_image))
        return LAMBDA_CYCLE * loss1

**3. Identity Loss:** encourages the generator to act as an identity mapping when given an image from its target domain. E.g., the photo-to-Monet generator should not significantly alter an image that is already a Monet painting. This helps preserve color composition.

In [None]:
with strategy.scope():
    def identity_loss(real_image, same_image, LAMBDA_IDENTITY):
        loss = tf.reduce_mean(tf.abs(real_image - same_image))
        return LAMBDA_IDENTITY * loss
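
How the three loss terms combine for one generator can be worked through with plain numbers. The sketch mirrors the weighting used in `train_step`, where the identity weight is `lambda_identity * lambda_cycle`; the function name and the input values are made up for illustration.

```python
def total_generator_loss(adv_loss, forward_cycle_l1, backward_cycle_l1,
                         identity_l1, lambda_cycle=10.0, lambda_identity=0.5):
    """Combine adversarial, cycle, and identity terms for one generator."""
    cycle_term = lambda_cycle * (forward_cycle_l1 + backward_cycle_l1)
    identity_term = lambda_identity * lambda_cycle * identity_l1
    return adv_loss + cycle_term + identity_term


total = total_generator_loss(
    adv_loss=0.7,
    forward_cycle_l1=0.1,
    backward_cycle_l1=0.2,
    identity_l1=0.05,
)
print(round(total, 4))
```

With the default weights, the cycle term contributes 10 x 0.3 = 3.0 and the identity term 5 x 0.05 = 0.25, so the reconstruction terms dominate the adversarial term in this example.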

## Model Training

In [None]:
with strategy.scope():
    # Optimizers
    monet_generator_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
    photo_generator_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
    monet_discriminator_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)
    photo_discriminator_optimizer = tf.keras.optimizers.Adam(2e-4, beta_1=0.5)

    # Create CycleGAN model instance
    # lambda_identity is a factor of lambda_cycle (e.g., 0.5 * lambda_cycle)
    cycle_gan_model = CycleGan(
        monet_generator, photo_generator, monet_discriminator, photo_discriminator,
        lambda_cycle=10, lambda_identity=0.5
    )

    # Compile the model
    cycle_gan_model.compile(
        m_gen_optimizer = monet_generator_optimizer,
        p_gen_optimizer = photo_generator_optimizer,
        m_disc_optimizer = monet_discriminator_optimizer,
        p_disc_optimizer = photo_discriminator_optimizer,
        gen_loss_fn = generator_loss,
        disc_loss_fn = discriminator_loss,
        cycle_loss_fn = calc_cycle_loss,
        identity_loss_fn = identity_loss
    )

In [None]:
EPOCHS_TO_TRAIN = 50  # Lower this (e.g., 1-5) for a quick demonstration run

#Check if datasets are not empty
if tf.data.experimental.cardinality(monet_ds).numpy() == 0 or tf.data.experimental.cardinality(photo_ds).numpy() == 0:
    print("Error: One or both datasets are empty. Please check data loading.")
else:
    print(f"Starting training for {EPOCHS_TO_TRAIN} epochs...")

    # Zip the repeated datasets so every step receives one Monet batch and one photo batch
    history = cycle_gan_model.fit(
        tf.data.Dataset.zip((monet_ds.repeat(), photo_ds.repeat())),
        epochs=EPOCHS_TO_TRAIN,
        steps_per_epoch=max(len(MONET_FILENAMES), len(PHOTO_FILENAMES))
    )

The cell above starts the training process. Training GANs can be time-consuming, especially without a GPU/TPU. For a demonstration, a few epochs might suffice to see some initial results. For good quality, more epochs (e.g., 25-100+) are usually needed.

**Note:** With limited compute, consider training for a very small number of epochs (e.g., `epochs=1` or `epochs=5`) or skipping this step and directly loading pre-trained weights in the next section.
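
The effect of zipping two repeated datasets and setting `steps_per_epoch` to the larger file count can be sketched with `itertools`. The file names below are hypothetical. Also note that `tf.data.Dataset.list_files` shuffles by default, so the actual pairings in training differ from the fixed order shown here.

```python
from itertools import cycle, islice

monet_files = ["monet_0", "monet_1", "monet_2"]  # hypothetical names
photo_files = ["photo_0", "photo_1", "photo_2", "photo_3", "photo_4"]

# One epoch runs long enough to visit every item of the larger collection.
steps_per_epoch = max(len(monet_files), len(photo_files))

# cycle() plays the role of Dataset.repeat(); zip() pairs items by position.
pairs = list(islice(zip(cycle(monet_files), cycle(photo_files)), steps_per_epoch))
for monet, photo in pairs:
    print(monet, photo)
```

The smaller collection wraps around, so some of its items are seen more than once per epoch. No correspondence between the paired items is assumed, which is what makes the training unpaired.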

In [None]:
# Save trained models
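# Under Keras 2 a path without a suffix is written as a SavedModel directory.
# Keras 3 rejects such paths in save(): use a '.keras' or '.h5' suffix, or model.export().
monet_generator.save('monet_generator_trained_model')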
photo_generator.save('photo_generator_trained_model')
# Alternatively save just weights:
# monet_generator.save_weights('monet_generator_trained.weights.h5')
# photo_generator.save_weights('photo_generator_trained.weights.h5')

### Loading Pre-trained Weights
For convenience, pre-trained weights for the Monet generator (trained for approximately 50 epochs on the Kaggle dataset) can be downloaded and loaded. This allows for immediate visualization without lengthy training.

In [None]:
PRETRAINED_MODEL_PATH = "models/monet_generator_50_epochs"
monet_generator_pretrained = None

if Path(PRETRAINED_MODEL_PATH).exists():
    try:
        with strategy.scope():  # Use strategy scope if needed
            monet_generator_pretrained = TFSMLayer(
                PRETRAINED_MODEL_PATH,
                call_endpoint="serving_default"  # You might need to adjust this
            )
        print("Pre-trained Monet generator loaded successfully as TFSMLayer.")
    except Exception as e:
        print(f"Error loading pre-trained model: {e}")
else:
    print(f"Pre-trained model path {PRETRAINED_MODEL_PATH} not found. Skipping loading.")


In [None]:
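PRETRAINED_MODEL_PATH = "monet_generator_50_epochs"  # Alternate location without the "models/" prefix

# Keep a generator loaded by the previous cell instead of resetting it to None
if monet_generator_pretrained is not None:
    print("Pre-trained Monet generator already loaded. Skipping alternate path.")
elif Path(PRETRAINED_MODEL_PATH).exists():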
    try:
        with strategy.scope():
            # Load as an inference-only layer
            monet_generator_pretrained = TFSMLayer(
                PRETRAINED_MODEL_PATH,
                call_endpoint='serving_default'  # Adjust if your model uses a different endpoint
            )
        print("Pre-trained Monet generator loaded as TFSMLayer.")
    except Exception as e:
        print(f"Error loading pre-trained model: {e}")
else:
    print(f"Pre-trained model path {PRETRAINED_MODEL_PATH} not found. Skipping loading.")

## Visualize Results
After training (or by loading pre-trained weights), we can visualize the style transfer by taking a few sample photos and transforming them using the loaded (or trained) Monet generator.

In [None]:
def display_generated_images(generator_model, num_images=5):
    if generator_model is None:
        print("Generator model not available for visualization.")
        return
    if not PHOTO_FILENAMES:
        print("Photo filenames not loaded. Cannot generate images.")
        return

    plt.figure(figsize=(12, num_images * 4))
    for i in range(num_images):
        rand_idx = random.randint(0, len(PHOTO_FILENAMES) - 1)
        img_path = PHOTO_FILENAMES[rand_idx]
        input_image = decode_image(img_path)
        input_image_batch = tf.expand_dims(input_image, axis=0)  # Keep batch dim for model input

        # Get prediction (handles both TFSMLayer and regular models)
        prediction_output = generator_model(input_image_batch, training=False)

        # Extract output based on model type
        if isinstance(prediction_output, dict):  # TFSMLayer returns a dict
            prediction = prediction_output.get('output_0', prediction_output.get('serving_default', None))
            if prediction is None:
                prediction = list(prediction_output.values())[0]
        else:  # Regular model
            prediction = prediction_output[0] if isinstance(prediction_output, (list, tuple)) else prediction_output

        # Convert to numpy and remove batch dimension (if present)
        prediction = np.squeeze(prediction.numpy())  # Removes extra dims (e.g., (1, 256, 256, 3) → (256, 256, 3))
        input_image = input_image.numpy()  # Already (256, 256, 3)

        # Denormalize (if normalized to [-1, 1])
        display_input = (input_image * 0.5 + 0.5).clip(0, 1)  # clip to avoid overflow
        display_prediction = (prediction * 0.5 + 0.5).clip(0, 1)

        # Plot
        plt.subplot(num_images, 2, i * 2 + 1)
        plt.imshow(display_input)
        plt.title(f"Input Photo {i+1}")
        plt.axis("off")

        plt.subplot(num_images, 2, i * 2 + 2)
        plt.imshow(display_prediction)
        plt.title(f"Monet-esque Output {i+1}")
        plt.axis("off")

    plt.tight_layout()
    plt.show()
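
The output-handling branch in `display_generated_images` can be isolated into a small helper, which makes the fallback order easier to test. This is a sketch only: `extract_prediction` is a hypothetical name, and the key names are the ones the function above already looks for, not a guarantee about what a given SavedModel exposes.

```python
def extract_prediction(output, preferred_keys=("output_0", "serving_default")):
    """Return the tensor-like value from a model output of unknown structure."""
    if isinstance(output, dict):
        for key in preferred_keys:
            if key in output:
                return output[key]
        return next(iter(output.values()))  # fall back to the first entry
    if isinstance(output, (list, tuple)):
        return output[0]
    return output


# Plain strings stand in for tensors to show which branch is taken.
print(extract_prediction({"output_0": "from_output_0", "other": "ignored"}))
print(extract_prediction({"custom_name": "first_value"}))
print(extract_prediction(["first_item", "second_item"]))
print(extract_prediction("bare_tensor"))
```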

In [None]:
# Visualize using the pre-trained model
if monet_generator_pretrained:
    display_generated_images(monet_generator_pretrained, num_images=5)
else:
    print("Skipping visualization as pre-trained model was not loaded.")


In [None]:
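# Visualize using the trained and saved model.
# load_model reads a SavedModel directory under Keras 2 only; with Keras 3,
# load a '.keras'/'.h5' file here or wrap the directory in TFSMLayer instead.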
my_trained_generator = tf.keras.models.load_model('monet_generator_trained_model')
display_generated_images(my_trained_generator, num_images=3)

## Conclusion

This notebook demonstrated the implementation of a CycleGAN for artistic style transfer, specifically transforming photographs into Monet-like paintings. Key takeaways include understanding the CycleGAN architecture, the importance of cycle-consistency and identity losses for unpaired image translation, and the process of building, training, and evaluating such models.
