## Generative Adversarial Networks

Generative Adversarial Networks (GANs) consist of two neural networks that are trained against each other: the generator and the discriminator.

A GAN is trained on an original dataset of genuine data.

The generator takes in random noise and tries to turn it into observations that look like they came from the original dataset. The discriminator, on the other hand, takes in observations and tries to predict whether each one is from the original dataset or is a "forgery" created by the generator. Training alternates between the two: the generator learns to produce more convincing forgeries, and the discriminator learns to get better at spotting them.

In more practical terms:
* An original dataset may contain images of cats
* The generator receives random noise as input and tries to turn that noise into cat images ("forgeries")
* The discriminator tries to determine whether each image it sees is real or forged
* The generator and the discriminator both get better at their jobs
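The "both get better" step is usually driven by two competing loss functions. Below is a minimal NumPy sketch, assuming binary cross-entropy (the standard choice for a real-vs-fake classifier, not necessarily the exact loss this notebook uses later) and the common "non-saturating" generator loss, where the generator is rewarded when the discriminator labels its forgeries as real. The discriminator probabilities are made-up numbers for illustration.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))

def discriminator_loss(p_real, p_fake):
    # The discriminator wants real images scored 1 and forgeries scored 0.
    return (binary_cross_entropy(np.ones_like(p_real), p_real)
            + binary_cross_entropy(np.zeros_like(p_fake), p_fake))

def generator_loss(p_fake):
    # Non-saturating form: the generator wants its forgeries scored 1 ("real").
    return binary_cross_entropy(np.ones_like(p_fake), p_fake)

# Hypothetical discriminator outputs (probability that an image is real).
p_real = np.array([0.9, 0.8, 0.95])   # scores for real cat photos
p_fake = np.array([0.1, 0.2, 0.05])   # scores for the generator's forgeries

print(discriminator_loss(p_real, p_fake))  # low: the discriminator is winning
print(generator_loss(p_fake))              # high: the forgeries are being caught
```

Training alternates between lowering `discriminator_loss` (updating only the discriminator's weights) and lowering `generator_loss` (updating only the generator's weights).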

In [5]:
import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow.keras import (
    layers,
    models,
    callbacks,
    losses,
    utils,
    metrics,
    optimizers,
)

IMAGE_SIZE = 64
CHANNELS = 1
BATCH_SIZE = 128
Z_DIM = 100
EPOCHS = 300
LOAD_MODEL = False
ADAM_BETA_1 = 0.5
ADAM_BETA_2 = 0.999
LEARNING_RATE = 0.0002
NOISE_PARAM = 0.1



Next, download the Lego brick images dataset (46,384 image files). Our GAN will learn to generate new images of Lego bricks.

See https://www.kaggle.com/datasets/joosthazelzet/lego-brick-images

NOTE: If you're running this in Colab, choose one of these options to get the dataset:

1. Upload your image data to a directory via the folder icon in the left sidebar.
2. Use the terminal at the bottom of Colab to `curl` the dataset into a directory, then unzip it:
   `curl -L -o ~/Downloads/lego-brick-images.zip https://www.kaggle.com/api/v1/datasets/download/joosthazelzet/lego-brick-images`
3. Download it with the `kagglehub` library, as in the cell below.

In [None]:
import kagglehub

# Download the dataset (or reuse a cached copy) and get its local directory
path = kagglehub.dataset_download("joosthazelzet/lego-brick-images")
print("Path to dataset files:", path)

train_data = utils.image_dataset_from_directory(
    path,
    labels=None,                          # Unlabeled: we only need the images themselves
    color_mode="grayscale",               # One channel, matching CHANNELS = 1
    image_size=(IMAGE_SIZE, IMAGE_SIZE),  # Resize every image to 64x64
    batch_size=BATCH_SIZE,
    shuffle=True,
    seed=0,                               # Fixes the shuffle order for reproducibility
    interpolation="bilinear",             # Resampling algorithm used when resizing
)

Pixel values range from 0 to 255. Networks train more easily on small, zero-centered inputs, so we rescale the images to the range -1 to 1, which is also the output range of a tanh activation.

In [None]:
def preprocess(img):
    img = tf.cast(img, tf.float32)
    img = (img - 127.5) / 127.5  # Maps [0, 255] to [-1, 1], the same range as a tanh output
    return img

train_data = train_data.map(preprocess)
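A quick NumPy check of the mapping used by `preprocess`, plus an inverse. The `deprocess` helper is hypothetical (it is not defined in this notebook), but something like it is needed to turn tanh-range generator output back into displayable 0 to 255 pixels.

```python
import numpy as np

def preprocess(img):
    # Same mapping as the notebook's preprocess: [0, 255] -> [-1, 1]
    return (img.astype(np.float32) - 127.5) / 127.5

def deprocess(img):
    # Hypothetical inverse for display: [-1, 1] -> [0, 255], rounded to whole pixels
    return np.clip(np.round(img * 127.5 + 127.5), 0, 255).astype(np.uint8)

pixels = np.array([0, 64, 255], dtype=np.uint8)
scaled = preprocess(pixels)     # black -> -1.0, white -> 1.0
restored = deprocess(scaled)    # back to the original uint8 values
print(scaled, restored)
```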

Now we build the discriminator. Its architecture makes most sense in terms of a **Hierarchy of Features**: a neural network doesn't see a "face" or a "cat" immediately. It builds that understanding up from the bottom, layer by layer.

The stack of layers below may look arbitrary, but this hierarchy of features explains its logic. Here is one example of what the hierarchy can look like:

### Step 1: The Initial "Glance" (Low-Level Features)
The model takes the raw pixel input and looks for the most basic building blocks of an image: edges, contrast, and simple lines. If we picture the discriminator as an art critic, this is the critic glancing at the canvas to check the brushwork: Is the image blurry? Are the lines sharp?


In [None]:
discriminator_input = layers.Input(shape=(IMAGE_SIZE, IMAGE_SIZE, CHANNELS))  # 64x64x1 grayscale image

x = layers.Conv2D(
    64,                  # Learn 64 basic filters (e.g., vertical lines, horizontal lines, curves)
    kernel_size=4,       # Look at a 4x4 pixel patch at a time
    strides=2,           # Move 2 pixels at a time. This halves the image size (64 -> 32)
    padding="same",      # Pad the borders so the output is exactly 64 / 2 = 32 pixels wide
    use_bias=False       # Bias is not strictly necessary here
)(discriminator_input)

x = layers.LeakyReLU(0.2)(x) # Keep positive responses (e.g., "found a line"); scale negative ones by 0.2 so gradients still flow
x = layers.Dropout(0.3)(x)   # During training, randomly zero 30% of activations so the critic can't memorize exact pixel values
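To make the last two lines concrete, here is what `LeakyReLU(0.2)` and `Dropout(0.3)` compute, sketched in NumPy. It mirrors the documented behavior of the Keras layers (dropout is active only during training, and surviving units are rescaled by 1 / (1 - rate)), but it is an illustration, not the Keras implementation.

```python
import numpy as np

def leaky_relu(x, negative_slope=0.2):
    # Positive values pass through; negative values are scaled by 0.2 instead of zeroed.
    return np.where(x > 0, x, negative_slope * x)

def dropout(x, rate, rng, training=True):
    # Inverted dropout: zero a random `rate` fraction of units and rescale the rest
    # by 1 / (1 - rate) so the expected activation is unchanged. No-op at inference.
    if not training:
        return x
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1 - rate), 0.0)

rng = np.random.default_rng(0)
x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(leaky_relu(x))
print(dropout(np.ones(10), 0.3, rng))
```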

### Step 2: Pattern Recognition (Mid-Level Features)

The network combines the lines and edges from Step 1 to form textures and shapes. It might find a circle, a corner, or a grid pattern.

In art-critic terms, the critic steps back: "Okay, the lines are sharp. Now, do I see actual shapes? Is that a curve representing an eye, or just random noise?"

In [None]:
# Input is now 32x32.
# We DOUBLE the filters (64 -> 128). Why?
# As the image gets smaller physically, the information gets DENSE.
# We need more "drawers" (filters) to file away these complex combinations of edges.
x = layers.Conv2D(
    128, kernel_size=4, strides=2, padding="same", use_bias=False)(x)

x = layers.BatchNormalization()(x) # New in this step: normalizes each channel's activations over the batch to keep training stable
x = layers.LeakyReLU(0.2)(x)
x = layers.Dropout(0.3)(x)
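`BatchNormalization` is the genuinely new piece in this step. Below is a minimal NumPy sketch of what it does during training, assuming channels-last input and Keras's default `epsilon=1e-3`: each channel is normalized with the mean and variance taken over the batch and all spatial positions, then rescaled by a learnable `gamma` and shifted by a learnable `beta`. At inference, Keras uses running averages of these statistics instead; that part is omitted here.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-3):
    # Training-mode batch normalization for a (batch, height, width, channels) tensor.
    mean = x.mean(axis=(0, 1, 2), keepdims=True)  # one mean per channel
    var = x.var(axis=(0, 1, 2), keepdims=True)    # one variance per channel
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
# A fake batch of Step 2 conv outputs: 8 images, 16x16 spatial, 128 channels,
# deliberately shifted and scaled far from zero mean / unit variance.
x = rng.normal(loc=5.0, scale=3.0, size=(8, 16, 16, 128))
y = batch_norm(x, gamma=np.ones(128), beta=np.zeros(128))
print(y.mean(), y.std())  # close to 0 and 1
```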

### Step 3: Part Detection (High-Level Features)

The network combines shapes to find object parts specific to your dataset. If you were training on faces, this layer might look for noses, eyes, and mouths; since we're training on Lego brick images, it looks for parts like studs and corners. This is where the discriminator can catch the "uncanny valley": a fake brick may look correct in its individual parts while its overall geometry is subtly wrong or "impossible."

In [None]:
# Input is now 16x16.
# Double filters again (128 -> 256).
x = layers.Conv2D(
    256, kernel_size=4, strides=2, padding="same", use_bias=False)(x)

x = layers.BatchNormalization()(x)
x = layers.LeakyReLU(0.2)(x)
x = layers.Dropout(0.3)(x)
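Doubling the filters has a cost. A `Conv2D` layer with `use_bias=False` holds kernel_size × kernel_size × input channels × filters weights, so when both the input and output channel counts double, the weight count quadruples. The small calculation below covers the three layers defined so far; `conv2d_weight_count` is a hypothetical helper, not a Keras function.

```python
def conv2d_weight_count(kernel_size, in_channels, filters, use_bias=False):
    # Each of the `filters` kernels is a kernel_size x kernel_size x in_channels tensor.
    weights = kernel_size * kernel_size * in_channels * filters
    return weights + (filters if use_bias else 0)

# (in_channels, filters) for the Step 1, 2 and 3 Conv2D layers
layers_so_far = [(1, 64), (64, 128), (128, 256)]
for in_ch, filters in layers_so_far:
    count = conv2d_weight_count(4, in_ch, filters)
    print(f"{in_ch:>4} -> {filters:>4} channels: {count:,} weights")
```

From Step 2 to Step 3 the count grows from 131,072 to 524,288 weights, which is the quadrupling described above.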

### Step 4: Global Features

This is the final and most abstract stage of feature extraction. The network is no longer looking for parts; it verifies the overall quality and plausibility of the complete object, for example whether the brick looks smooth and glossy as a whole.

In [None]:
# Input is now 8x8.
# Double filters again (256 -> 512); the output is 4x4x512.
x = layers.Conv2D(512, kernel_size=4, strides=2, padding="same", use_bias=False)(x)
x = layers.BatchNormalization()(x)
x = layers.LeakyReLU(0.2)(x)
x = layers.Dropout(0.3)(x)
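The spatial sizes quoted in the comments follow from the standard output-size rules for `Conv2D`: with `padding="same"` the output is ceil(size / stride), and with `padding="valid"` it is floor((size - kernel_size) / stride) + 1. A short sketch (`conv_output_size` is a hypothetical helper) traces Steps 1 to 4:

```python
import math

def conv_output_size(size, kernel_size, stride, padding):
    # Output-size rules for a 2D convolution:
    #   "same":  ceil(size / stride)
    #   "valid": floor((size - kernel_size) / stride) + 1
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - kernel_size) // stride + 1
    raise ValueError(f"unknown padding: {padding}")

size = 64
for filters in (64, 128, 256, 512):  # Steps 1 to 4
    size = conv_output_size(size, kernel_size=4, stride=2, padding="same")
    print(f"{size}x{size}x{filters}")  # 32x32x64, then 16x16x128, 8x8x256, 4x4x512
```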

### Step 5: Verdict

This final step collapses all the accumulated information into the single answer the discriminator must provide. The input is now 4x4x512, and a 4x4 kernel with stride 1 and `valid` (no) padding fits over it exactly once, so the output is 1 x 1 x 1: a single number.

So we output a single number between 0 and 1, where a high value indicates "Real Lego Brick" and a low value indicates "Fake Generator Image."

In [None]:
# Collapsing all 512 features into a single decision score
x = layers.Conv2D(1, kernel_size=4, strides=1, padding="valid", use_bias=False,
                  activation="sigmoid")(x)  # Sigmoid squashes the score into (0, 1)
discriminator_output = layers.Flatten()(x)  # 1x1x1 -> a single score per image
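To see why this layer collapses everything into one number, and why that number only lands between 0 and 1 when a sigmoid is applied, here is a NumPy sketch. The features and the filter are random stand-ins, not the outputs or weights of a trained network.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 4, 512))            # stand-in for one image's Step 4 output
kernel = rng.normal(scale=0.01, size=(4, 4, 512))  # stand-in for the single 4x4x512 filter

# With a 4x4 kernel, stride 1 and "valid" padding on a 4x4 input, the kernel fits
# exactly once, so the convolution is one weighted sum over all 4*4*512 = 8192 values.
logit = float(np.sum(features * kernel))  # unbounded raw score

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

prob_real = sigmoid(logit)  # squashed into (0, 1): the "real vs. fake" verdict
print(logit, prob_real)
```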

Note that the hierarchy above is not an exact, fixed sequence of features that the network is guaranteed to learn. Instead, it describes an abstraction that emerges from the learning process, supported by techniques like feature visualization (which show that early layers often do detect simple edges). But the *exact* features, and the order in which they appear, can vary from run to run because of random weight initialization and the randomness of training itself.

So treat the hierarchy of features above as an interpretation rather than a literal description.