# <center> **AI Face Generation with a Kaggle TPU** </center>

***Welcome! This notebook contains a complete, end-to-end implementation of a Generative Adversarial Network (GAN) designed to create artificial human faces from scratch.***

***This notebook is specifically configured to run on a **Kaggle TPU VM v3-8 accelerator** for maximum training speed. We will walk through every step, from loading the data to training the competing Generator and Discriminator models.***

## <center>**Step 1: Import Essential Libraries**</center>

***First, we'll import all the necessary packages. This includes TensorFlow for building the model, Matplotlib for visualizing results, and helper libraries like PIL for image manipulation and `tqdm` for progress bars.***

In [None]:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import os
from tqdm import tqdm
import time

## <center>**Step 2: Initialize the TPU Strategy**</center>

***This is the most critical step for using a TPU. We create a `TPUStrategy` object that tells TensorFlow how to find and distribute the computations across all 8 cores of the TPU.***

***All model building, loss definitions, and optimizer creation must be done within the scope of this strategy (`with strategy.scope():`) to ensure they are placed on the TPU hardware for accelerated, distributed training.***

In [None]:
try:
    # Explicitly connect to the local TPU
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='local')
    
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.TPUStrategy(tpu)
    print('✅ TPU initialized successfully.')

except Exception as e:
    print(f"TPU initialization failed: {e}")
    # Fall back to the default strategy if TPU is not available
    strategy = tf.distribute.get_strategy()

print("✅ REPLICAS: ", strategy.num_replicas_in_sync)
REPLICAS = strategy.num_replicas_in_sync

## <center>**Step 3: Configure Project Parameters**</center>

***Here, we define all the key constants for our project. This includes image dimensions, the path to our dataset, and crucial training hyperparameters.***

***For TPU training, `GLOBAL_BATCH_SIZE` is the most important parameter. It is the product of `BATCH_SIZE_PER_REPLICA` (the number of images processed by a single TPU core per step) and `REPLICAS` (8 on a TPU v3-8, or 1 if the notebook fell back to the default strategy). Sizing the batch this way gives every TPU core a full per-replica batch at each step, which keeps all cores busy.***
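
***The batch-size arithmetic above can be sketched in plain Python. The dataset size of 10,000 images is a hypothetical figure used only for illustration, and `steps_per_epoch` is an illustrative helper that is not part of the notebook.***

```python
import math

BATCH_SIZE_PER_REPLICA = 128
REPLICAS = 8  # assumption: a TPU v3-8 exposes 8 replicas

GLOBAL_BATCH_SIZE = BATCH_SIZE_PER_REPLICA * REPLICAS


def steps_per_epoch(num_images, global_batch_size, drop_remainder=False):
    """Number of training steps needed to see the dataset once."""
    if drop_remainder:
        # The final partial batch is discarded.
        return num_images // global_batch_size
    # The final partial batch counts as one extra step.
    return math.ceil(num_images / global_batch_size)


NUM_IMAGES = 10_000  # hypothetical dataset size
print("Global batch size:", GLOBAL_BATCH_SIZE)
print("Steps per epoch:", steps_per_epoch(NUM_IMAGES, GLOBAL_BATCH_SIZE))
print("Steps per epoch (partial batch dropped):",
      steps_per_epoch(NUM_IMAGES, GLOBAL_BATCH_SIZE, drop_remainder=True))
```

***A larger per-replica batch means fewer, heavier steps per epoch, so the number of weight updates per epoch falls as the batch grows.***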

In [None]:
# Image dimensions
IMAGE_WIDTH = 128
IMAGE_HEIGHT = 128
IMAGE_CHANNELS = 3
# Keras expects image tensors as (height, width, channels)
IMAGE_SHAPE = (IMAGE_HEIGHT, IMAGE_WIDTH, IMAGE_CHANNELS)

# Dataset path
DATASET_PATH = '/kaggle/input/face-mask-lite-dataset/without_mask'

# The size of the random noise vector (latent space) for the generator
LATENT_DIM = 100

# --- TPU-Specific Training Configuration ---
BATCH_SIZE_PER_REPLICA = 128
GLOBAL_BATCH_SIZE = BATCH_SIZE_PER_REPLICA * REPLICAS
EPOCHS = 800

print(f"Batch size per replica: {BATCH_SIZE_PER_REPLICA}")
print(f"Global batch size (for all {REPLICAS} replicas): {GLOBAL_BATCH_SIZE}")

## <center>**Step 4: Load and Prepare the Dataset**</center>

***In this step, we load the images from the disk and prepare them for training. This involves several key actions:***

***1.  Reading each image file from the specified directory.***

***2.  Resizing it to our standard 128x128 dimensions.***

***3.  Normalizing the pixel values from the original `[0, 255]` range to `[-1, 1]`. This is a critical step that helps stabilize GAN training.***

***4.  Creating a high-performance `tf.data.Dataset` pipeline. This object efficiently shuffles, batches, and prefetches the data so that the TPU spends as little time as possible waiting for the next batch of images.***
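
***The normalization in step 3 can be checked by hand with a minimal sketch. The helper names `normalize` and `denormalize` are illustrative and do not appear in the notebook.***

```python
def normalize(pixel):
    """Map a pixel value from [0, 255] to [-1, 1]."""
    return (pixel - 127.5) / 127.5


def denormalize(value):
    """Map a value from [-1, 1] back to [0, 255]."""
    return value * 127.5 + 127.5


for pixel in (0, 64, 255):
    scaled = normalize(pixel)
    # The round trip should recover the original pixel value.
    print(pixel, round(scaled, 4), denormalize(scaled))
```

***The darkest pixel maps to -1 and the brightest to 1, which is the same range the Generator's `tanh` output layer produces.***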

In [None]:
def load_images_from_path(path, target_size=(IMAGE_WIDTH, IMAGE_HEIGHT)):
    """Loads all images from a directory, resizes, and normalizes them."""
    image_list = []
    print(f"Loading images from: {path}")

    # Use tqdm for a progress bar
    for filename in tqdm(os.listdir(path)):
        try:
            img_path = os.path.join(path, filename)
            # Open the image using PIL (the context manager closes the file),
            # convert to RGB, and resize
            with Image.open(img_path) as img:
                image_list.append(np.asarray(img.convert('RGB').resize(target_size)))
        except Exception as e:
            print(f"\nSkipping file {filename} due to error: {e}")

    # Convert the list of images to a single NumPy array
    images_np = np.array(image_list, dtype='float32')

    # Normalize the images to the range [-1, 1]. This is crucial for GAN stability.
    images_np = (images_np - 127.5) / 127.5
    return images_np

# --- Create the Data Pipeline for the TPU ---

# 1. Load the images into a NumPy array.
X_train = load_images_from_path(DATASET_PATH)
print(f"\nDataset loaded into memory. Shape of image array: {X_train.shape}")

# 2. Create a TensorFlow Dataset object.
train_dataset = tf.data.Dataset.from_tensor_slices(X_train)

# 3. Cache, shuffle, batch, and prefetch for high performance.
#   .cache(): Keeps the elements in memory so later epochs skip any upstream work.
#   .shuffle(): Randomizes the order of images each epoch for better training.
#   .batch(): Groups images into the large GLOBAL_BATCH_SIZE we defined;
#             drop_remainder=True keeps every batch the same static shape,
#             which suits XLA compilation on the TPU.
#   .prefetch(): Prepares the next batch on the host while the current one is processing.
train_dataset = (
    train_dataset
    .cache()
    .shuffle(buffer_size=X_train.shape[0])
    .batch(GLOBAL_BATCH_SIZE, drop_remainder=True)
    .prefetch(tf.data.AUTOTUNE)
)

print("✅ Data pipeline created successfully and is ready for training.")

## <center>**Step 5: Build the Generator**</center>

***Now we'll build the first of our two competing models: the **Generator**. This model acts as the "artist." Its job is to take a small vector of random numbers (from the "latent space") and learn how to transform it into a complex, 128x128 image that looks like a real human face.***

***The architecture progressively upsamples the input through a series of `Conv2DTranspose` layers. Each upsampling layer uses a stride of 2, so it doubles the height and width of its input, roughly the reverse of a strided convolution. The `tanh` activation on the final layer is important because it keeps the output pixel values in the `[-1, 1]` range, matching the normalization of our real images.***

***Crucially, we define the model within the `strategy.scope()` to ensure it is created on the TPU hardware for accelerated training.***
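
***The upsampling path can be traced with simple arithmetic. This is a sketch that assumes `padding='same'` and no explicit output padding, in which case a `Conv2DTranspose` layer multiplies the spatial size by its stride. `upsampled_size` is an illustrative helper, not a Keras function.***

```python
def upsampled_size(size, stride):
    """Output height/width of Conv2DTranspose with padding='same'."""
    return size * stride


size = 8  # spatial size after the initial Dense + Reshape
trace = [size]
for _ in range(4):  # four stride-2 upsampling blocks
    size = upsampled_size(size, stride=2)
    trace.append(size)

# The final stride-1 layer keeps the spatial size and only changes
# the channel count (64 -> 3).
size = upsampled_size(size, stride=1)
print("Spatial sizes:", trace)
print("Final output size:", size)
```

***Four doublings are exactly what it takes to get from the 8x8 starting grid to the 128x128 target resolution.***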

In [None]:
def build_generator(latent_dim=LATENT_DIM):
    """
    Creates the Generator model.
    It takes a random noise vector and upsamples it to a 128x128x3 image.
    """
    model = Sequential(name='Generator')

    # Start with a dense layer to project the noise vector into a suitable shape
    # We'll start with a small 8x8 image with 1024 filters
    model.add(layers.Dense(8 * 8 * 1024, input_dim=latent_dim))
    model.add(layers.Reshape((8, 8, 1024)))

    # --- Upsampling Block 1: 8x8 -> 16x16 ---
    model.add(layers.Conv2DTranspose(512, kernel_size=4, strides=2, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU(alpha=0.2))

    # --- Upsampling Block 2: 16x16 -> 32x32 ---
    model.add(layers.Conv2DTranspose(256, kernel_size=4, strides=2, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU(alpha=0.2))

    # --- Upsampling Block 3: 32x32 -> 64x64 ---
    model.add(layers.Conv2DTranspose(128, kernel_size=4, strides=2, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU(alpha=0.2))

    # --- Upsampling Block 4: 64x64 -> 128x128 ---
    model.add(layers.Conv2DTranspose(64, kernel_size=4, strides=2, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU(alpha=0.2))

    # --- Output Layer ---
    # Final layer to produce the 128x128 image with 3 color channels (RGB)
    # The 'tanh' activation function squashes the output pixel values to be between -1 and 1,
    # matching the normalization of our real training images.
    model.add(layers.Conv2DTranspose(3, kernel_size=5, padding='same', activation='tanh'))

    return model

# --- Create the Generator within the TPU strategy scope ---
with strategy.scope():
    generator = build_generator()

print("✅ Generator model built successfully.")
generator.summary()

## <center>**Step 6: Build the Discriminator**</center>

***Now we build the second model, the **Discriminator**. This model acts as the "art critic." It's a standard convolutional neural network (CNN) designed for image classification.***

***Its job is to take an image (either a real one from our dataset or a fake one from the Generator) and output a single probability score indicating how likely it thinks the image is to be real. It does this by downsampling the image using `Conv2D` layers to extract key features. A `GaussianNoise` layer on the input and a `Dropout` layer before the output are included to help stabilize training by preventing the critic from overpowering the artist too quickly.***
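
***The downsampling path mirrors the Generator's upsampling. The sketch below assumes `padding='same'`, where a `Conv2D` layer produces an output of size `ceil(input / stride)`. The helpers `downsampled_size` and `sigmoid` are illustrative, not Keras functions.***

```python
import math


def downsampled_size(size, stride):
    """Output height/width of Conv2D with padding='same'."""
    return math.ceil(size / stride)


def sigmoid(x):
    """Squash a raw score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


size = 128
for _ in range(4):  # four stride-2 downsampling blocks
    size = downsampled_size(size, stride=2)

# 512 filters in the last Conv2D block feed the Flatten layer.
flattened_features = size * size * 512
print("Final feature map size:", size)
print("Features after Flatten:", flattened_features)
print("Sigmoid of a neutral score:", sigmoid(0.0))
```

***A raw score of zero maps to a probability of 0.5, which is what a critic that cannot tell real from fake would output.***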

In [None]:
def build_discriminator(image_shape=IMAGE_SHAPE):
    """
    Creates the Discriminator model.
    It's a CNN that takes a 128x128x3 image and classifies it as real (output ~1) or fake (output ~0).
    """
    model = Sequential(name='Discriminator')

    # --- Downsampling Block 1: 128x128 -> 64x64 ---
    model.add(layers.GaussianNoise(0.1, input_shape=image_shape))
    model.add(layers.Conv2D(64, kernel_size=4, strides=2, padding='same'))
    model.add(layers.LeakyReLU(alpha=0.2))

    # --- Downsampling Block 2: 64x64 -> 32x32 ---
    model.add(layers.Conv2D(128, kernel_size=4, strides=2, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU(alpha=0.2))

    # --- Downsampling Block 3: 32x32 -> 16x16 ---
    model.add(layers.Conv2D(256, kernel_size=4, strides=2, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU(alpha=0.2))
    
    # --- Downsampling Block 4: 16x16 -> 8x8 ---
    model.add(layers.Conv2D(512, kernel_size=4, strides=2, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU(alpha=0.2))

    # --- Final Layers for Classification ---
    # Flatten the feature maps into a single vector
    model.add(layers.Flatten())
    
    # Dropout layer to prevent the discriminator from becoming too powerful too quickly.
    model.add(layers.Dropout(0.4))
    
    # Output layer: A single neuron with a sigmoid activation to give a probability (0 to 1).
    model.add(layers.Dense(1, activation='sigmoid'))

    return model

# --- Create the Discriminator within the TPU strategy scope ---
with strategy.scope():
    discriminator = build_discriminator()

print("✅ Discriminator model built successfully.")
discriminator.summary()

## <center>**Step 7: Define Loss Functions and Optimizers**</center>

***Now we define the "rules of the game." This involves setting up the loss functions and optimizers for both models.***

-   ***Loss Function: We use `BinaryCrossentropy`, which is perfect for a classification task with two outcomes (in our case, "real" or "fake"). The **Discriminator's loss** is a combination of its error on real images and its error on fake images. The **Generator's loss** is based purely on how well it manages to fool the Discriminator.***
  
-   ***Optimizers: We use the `Adam` optimizer, a popular and effective choice for GANs. We create two separate instances—one for each model—so they can learn independently.***
  
-   ***TPU Scope: The optimizers create variables, so they are defined within `strategy.scope()` to ensure those variables are placed on the TPU for distributed training. `tf.nn.compute_average_loss` divides each replica's summed loss by the global batch size, so that summing the per-replica results gives the average over the whole global batch.***
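
***To make the two losses concrete, here is a minimal sketch of binary cross-entropy for a single image, written in plain Python. The discriminator outputs of 0.9 and 0.2 are hypothetical, and `eps` is an assumed clipping constant that guards against `log(0)`.***

```python
import math


def binary_cross_entropy(label, prob, eps=1e-7):
    """Binary cross-entropy for one example, clipping prob away from 0 and 1."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


# Hypothetical discriminator outputs for one real and one generated image.
real_output = 0.9
fake_output = 0.2

# Discriminator: real images should score 1, generated images 0.
disc_loss = binary_cross_entropy(1, real_output) + binary_cross_entropy(0, fake_output)

# Generator: it wants its images to be scored as 1 (real).
gen_loss = binary_cross_entropy(1, fake_output)

print("Discriminator loss:", round(disc_loss, 4))
print("Generator loss:", round(gen_loss, 4))
```

***Here the Discriminator is doing well, so its loss is small while the Generator's loss is large. The two losses pull in opposite directions on the same `fake_output`.***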

In [None]:
# The optimizers create variables, so they must be built inside strategy.scope().
# The loss helpers are defined alongside them to keep everything in one place.
with strategy.scope():
    # --- Loss Function ---
    cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=False,
                                                       reduction=tf.keras.losses.Reduction.NONE)

    def discriminator_loss(real_output, fake_output):
        # The discriminator wants to classify real images as 1 and fake images as 0.
        real_loss = cross_entropy(tf.ones_like(real_output), real_output)
        fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
        total_loss = real_loss + fake_loss
        # We must scale the loss by the global batch size for proper gradient updates on the TPU.
        return tf.nn.compute_average_loss(total_loss, global_batch_size=GLOBAL_BATCH_SIZE)

    def generator_loss(fake_output):
        # The generator's goal is to fool the discriminator. It wants the discriminator
        # to classify its fake images as 1 (real).
        loss = cross_entropy(tf.ones_like(fake_output), fake_output)
        # Scale the loss by the global batch size.
        return tf.nn.compute_average_loss(loss, global_batch_size=GLOBAL_BATCH_SIZE)


    # --- Optimizers ---
    # Adam is a common choice of optimizer for GANs.
    # beta_1 is lowered to 0.5, and the discriminator gets a higher learning rate (4e-4)
    # than the generator (1e-4); both are common practices for stabilizing GAN training.
    # We need two separate optimizers because we train the two networks independently.
    generator_optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001, beta_1=0.5)
    discriminator_optimizer = tf.keras.optimizers.Adam(learning_rate=0.0004, beta_1=0.5)

print("✅ Loss functions and optimizers defined successfully within the TPU scope.")

## <center>**Step 8: Define the Core Training Step**</center>

***This is the heart of our GAN. We wrap the logic for a single training step in the `@tf.function` decorator, which tells TensorFlow to trace the Python code into a graph the first time it is called. Running as a compiled graph, rather than op by op in Python, is essential for getting maximum speed from the TPU.***

***The logic involves:***

***1.  Generating fake images.***

***2.  Asking the Discriminator to classify both real and fake images.***

***3.  Calculating the loss for both models.***

***4.  Calculating the gradients and updating the model weights.***

***The `distributed_train_step` function then acts as a manager, using `strategy.run` to execute this core `train_step` on all TPU cores in parallel and `strategy.reduce` to gather the results.***
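
***The interplay between per-replica loss scaling and the `SUM` reduction can be simulated in plain Python. This is a sketch under simplifying assumptions: four replicas, a global batch of eight examples, and made-up per-example losses. `replica_loss` only mimics the scaling performed by `tf.nn.compute_average_loss`; it is not the TensorFlow implementation.***

```python
def replica_loss(per_example_losses, global_batch_size):
    """Sum the local per-example losses and divide by the GLOBAL batch size."""
    return sum(per_example_losses) / global_batch_size


# Hypothetical per-example losses for a global batch of 8 examples.
global_losses = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
num_replicas = 4
shard = len(global_losses) // num_replicas

# Each replica sees only its own shard of the global batch.
per_replica = [
    replica_loss(global_losses[i * shard:(i + 1) * shard], len(global_losses))
    for i in range(num_replicas)
]

# A SUM reduction across replicas recovers the mean over the global batch.
reduced = sum(per_replica)
global_mean = sum(global_losses) / len(global_losses)
print("Per-replica losses:", per_replica)
print("Reduced (SUM):", reduced)
print("Mean over global batch:", global_mean)
```

***This is why the reduction uses `SUM` rather than `MEAN`: each replica has already divided by the global batch size, so averaging again would shrink the loss by a factor equal to the number of replicas.***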

In [None]:
# The @tf.function decorator compiles the function into a high-performance TensorFlow graph.
@tf.function
def train_step(images):
    """Executes a single training step on one batch of images."""
    
    # Dynamically get the per-replica batch size from the input tensor's shape.
    batch_size = tf.shape(images)[0]
    noise = tf.random.normal([batch_size, LATENT_DIM])

    # Use tf.GradientTape to record the operations for automatic differentiation.
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        # 1. Generate a batch of fake images
        generated_images = generator(noise, training=True)

        # 2. Get the discriminator's predictions
        real_output = discriminator(images, training=True)
        fake_output = discriminator(generated_images, training=True)

        # 3. Calculate the loss for each model
        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    # 4. Calculate and apply gradients
    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)
    generator_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    discriminator_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))
    
    return gen_loss, disc_loss


@tf.function
def distributed_train_step(dataset_inputs):
    """Executes the train_step function on each of the TPU replicas."""
    per_replica_losses = strategy.run(train_step, args=(dataset_inputs,))
    
    gen_loss = strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_losses[0], axis=None)
    disc_loss = strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica_losses[1], axis=None)
    
    return gen_loss, disc_loss


print("✅ Core training step functions defined; they are traced and compiled on their first call.")

## <center>**Step 9: Create Image Saving Utilities**</center>

***To monitor our GAN's progress during the long training process, it's essential to visualize its output. This helper function will generate a grid of sample images from the Generator and save it to a file.***

***We use a `fixed_seed` (a constant noise vector that doesn't change) to generate these samples. By giving the Generator the same starting point every time, we can clearly see how its ability to create a face from that seed evolves and improves over the epochs. This is a much better way to gauge progress than looking at completely random new images each time.***
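
***The difference between fixed and fresh noise can be illustrated without TensorFlow. This sketch uses Python's `random` module as a stand-in for `tf.random.normal`. `sample_noise` is an illustrative helper, and the seed value 0 is arbitrary.***

```python
import random

LATENT_DIM = 100


def sample_noise(rng, n, dim):
    """Draw n latent vectors of length dim from a standard normal distribution."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)  # assumption: any source of randomness would do
fixed_noise = sample_noise(rng, 16, LATENT_DIM)  # drawn once, before training

for epoch in range(3):
    monitoring_input = fixed_noise                   # identical at every epoch
    fresh_input = sample_noise(rng, 16, LATENT_DIM)  # different at every epoch
    print(epoch, monitoring_input is fixed_noise, fresh_input == fixed_noise)
```

***Because the monitoring input never changes, any difference between two saved grids comes from the Generator's weights and not from the noise.***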

In [None]:
# Create a directory to save the generated images during training
os.makedirs('gan_images', exist_ok=True)

def save_plot(images, epoch, n=4):
    """
    Generates a plot of n x n images and saves it to a file.
    The pixel values are rescaled from [-1, 1] to [0, 1] for plotting.
    """
    images = (images + 1) / 2.0  # Rescale from [-1, 1] to [0, 1]
    fig, axs = plt.subplots(n, n, figsize=(8, 8))
    for i in range(n * n):
        ax = axs[i // n, i % n]
        ax.imshow(images[i])
        ax.axis('off')
    fig.savefig(f"gan_images/generated_plot_epoch-{epoch+1:03d}.png")
    plt.close(fig)


# We'll use a fixed random seed to see how the same starting noise evolves over time.
# This makes it easier to judge the generator's progress.
fixed_seed = tf.random.normal([16, LATENT_DIM])

print("✅ Helper function for saving images created.")

## <center>**Step 10: The Main Training Loop**</center>

***This is the final step where we bring everything together and start training. We loop for the specified number of `EPOCHS`. In each epoch, we iterate through every batch of our `train_dataset`, calling our `distributed_train_step` function to update the models based on their performance.***

***We'll print the average loss for both models at the end of each epoch and save our sample images every 5 epochs so we can watch our GAN learn to create faces in near real-time.***
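
***As a quick sanity check on the saving schedule, the sketch below lists which snapshot files the loop would produce, assuming `EPOCHS = 800` and the file name pattern used by `save_plot`. `snapshot_name` and `SAVE_EVERY` are illustrative names.***

```python
EPOCHS = 800
SAVE_EVERY = 5  # the loop saves a grid every 5 epochs


def snapshot_name(epoch):
    """File name used for the grid saved after the given zero-based epoch."""
    return f"gan_images/generated_plot_epoch-{epoch + 1:03d}.png"


saved = [snapshot_name(e) for e in range(EPOCHS) if (e + 1) % SAVE_EVERY == 0]
print("Number of snapshots:", len(saved))
print("First:", saved[0])
print("Last:", saved[-1])
```

***The zero-padded epoch number keeps the files in chronological order when they are sorted by name, which is convenient for assembling them into an animation later.***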

In [None]:
def train_gan(dataset, epochs):
    """The main function to train the GAN models."""
    # Record the start time
    start = time.time()

    for epoch in range(epochs):
        epoch_start = time.time()
        
        # Initialize loss trackers for the epoch
        gen_loss_epoch = 0
        disc_loss_epoch = 0
        num_batches = 0

        # Iterate over each batch in the dataset.
        # The 'train_dataset' we created handles the batching automatically.
        for image_batch in tqdm(dataset, desc=f"Epoch {epoch + 1}/{epochs}"):
            # Execute one distributed training step and get the losses.
            g_loss, d_loss = distributed_train_step(image_batch)
            
            # Accumulate the losses
            gen_loss_epoch += g_loss
            disc_loss_epoch += d_loss
            num_batches += 1

        # --- End of Epoch ---
        # Calculate the average loss for the epoch
        avg_gen_loss = gen_loss_epoch / num_batches
        avg_disc_loss = disc_loss_epoch / num_batches
        
        epoch_time = time.time() - epoch_start
        
        print(f"Time for epoch {epoch + 1} is {epoch_time:.2f} sec")
        print(f"    Generator Loss: {avg_gen_loss:.4f}, Discriminator Loss: {avg_disc_loss:.4f}")

        # Generate and save a grid of images every 5 epochs
        if (epoch + 1) % 5 == 0:
            print("    Generating and saving sample images...")
            # Generate images from the fixed seed
            predictions = generator(fixed_seed, training=False)
            # Save the plot
            save_plot(predictions, epoch)

    # --- End of Training ---
    total_time = time.time() - start
    print(f"\n🎉 Training finished in {total_time / 60:.2f} minutes. 🎉")


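# --- Let's start the training! ---
# Distribute the dataset so each replica receives its own slice of every global batch.
# Passing undistributed batches to strategy.run would hand every replica the full batch.
dist_dataset = strategy.experimental_distribute_dataset(train_dataset)
print("Starting GAN training on the TPU...")
train_gan(dist_dataset, EPOCHS)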

## <center>**Step 11: Generate Final Images**</center>

***After the training is complete, the `generator` variable in our notebook holds the fully trained "artist." We can now use it to create new, unique faces on demand.***

***This cell generates a 5x5 grid of completely new faces by feeding brand new random noise vectors into the generator, showcasing the final result of our training process.***

In [None]:
# Create a new plot to display the final results
fig, axs = plt.subplots(5, 5, figsize=(12, 12))
fig.suptitle("Newly Generated Faces from Trained Model", fontsize=20)

# 1. Generate all 25 noise vectors at once. A single batched forward pass
#    avoids 25 separate calls to the generator.
noise = tf.random.normal([5 * 5, LATENT_DIM])

# 2. Pass the noise to the generator. `training=False` is important here:
#    it makes BatchNormalization use its moving statistics.
generated_images = generator(noise, training=False)

# 3. Post-process the images for display
# The generator outputs pixel values from [-1, 1]. We rescale them to [0, 1]
# so matplotlib can display the images correctly.
imgs_display = (generated_images.numpy() + 1) / 2.0

# 4. Plot each image in the grid
for i in range(5 * 5):
    ax = axs[i // 5, i % 5]
    ax.imshow(imgs_display[i])
    ax.axis('off')

plt.show()

## <center>**Step 12: Save the Final Model**</center>

***Saving the model allows us to load it later in a different notebook or application for inference, so we can generate new faces anytime without having to retrain the model from scratch.***
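
***For reference, reloading the saved file elsewhere might look like the sketch below. It assumes the `.h5` file written by this notebook is available at the given path and that the same TensorFlow version is installed. `load_generator` and `to_display_range` are illustrative helpers, not part of the notebook.***

```python
def load_generator(path="face_generator_model.h5"):
    """Load the saved generator for inference only."""
    # Imported lazily so the helper can be defined without TensorFlow installed.
    import tensorflow as tf
    # compile=False skips restoring an optimizer or loss, which inference does not need.
    return tf.keras.models.load_model(path, compile=False)


def to_display_range(value):
    """Rescale generator output from [-1, 1] to [0, 1] for plotting."""
    return (value + 1.0) / 2.0


# Usage (assumes the .h5 file exists and the latent size is 100):
#   import tensorflow as tf
#   generator = load_generator()
#   image = generator(tf.random.normal([1, 100]), training=False)[0]
#   image = to_display_range(image.numpy())
```

***Only the Generator is needed for inference, so the Discriminator does not have to be saved or reloaded.***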

In [None]:
generator.save('face_generator_model.h5')