<a href="https://colab.research.google.com/github/RDGopal/IB9AU-2026/blob/main/SD1_Auto_Encoder_Illustration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Autoencoding with Images

Autoencoders are neural networks used for unsupervised learning of efficient data codings (representations). They are trained to reproduce their input at their output; the useful part is the hidden layer in between, whose activations form a code that represents the input. The network consists of two parts: an encoder, which compresses the input into a latent-space representation, and a decoder, which reconstructs the input from that representation. This notebook demonstrates a simple autoencoder on the MNIST dataset.
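
In symbols, the idea can be sketched as follows. The notation ($f_\theta$ for the encoder, $g_\phi$ for the decoder, $\mathcal{L}$ for the reconstruction loss) is introduced here only for illustration and is not used elsewhere in this notebook:

$$
z = f_\theta(x), \qquad \hat{x} = g_\phi(z), \qquad \min_{\theta,\,\phi}\; \frac{1}{N}\sum_{i=1}^{N} \mathcal{L}\big(x_i,\; g_\phi(f_\theta(x_i))\big)
$$

Here $x \in [0, 1]^{784}$ is a flattened image, $z$ is its 32-dimensional latent code, $\hat{x}$ is the reconstruction, and $N$ is the number of training images.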


This cell imports the necessary libraries for building and training our autoencoder. `numpy` is for numerical operations, `matplotlib.pyplot` for plotting, and `tensorflow.keras` for building the neural network.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, losses

Here, we define key configuration parameters such as the size of the latent space (`LATENT_DIM_AE`), the flattened image size (`IMAGE_SIZE`), and training parameters (`EPOCHS`, `BATCH_SIZE`). We then load the MNIST dataset, which consists of handwritten digit images. For autoencoders, we only need the image data (`x_train`, `x_test`) as the input is also the target output. The pixel values are normalized to a `[0, 1]` range and the 28x28 images are flattened into a 784-dimensional vector.

In [None]:
# Configuration ---
LATENT_DIM_AE = 32  # Size of the compressed representation
IMAGE_SIZE = 784   # 28x28 pixels flattened
EPOCHS = 10        # Number of training epochs (keep low for quick demo)
BATCH_SIZE = 128   # Number of samples per gradient update

# Load and Preprocess MNIST Data ---
print("Loading MNIST data...")
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data() # We don't need labels (y) for standard AE

# Normalize pixel values to [0, 1] and flatten images
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), IMAGE_SIZE))
x_test = x_test.reshape((len(x_test), IMAGE_SIZE))
print(f"Training data shape: {x_train.shape}")
print(f"Test data shape: {x_test.shape}")

This section defines the architecture of our autoencoder. It's composed of three main parts:

*   **Encoder:** Takes the `IMAGE_SIZE`-dimensional input, passes it through a 128-unit dense layer, and compresses it into the `LATENT_DIM_AE`-dimensional latent representation. This bottleneck forces the encoder to learn a compressed, meaningful representation of the input data.
*   **Decoder:** Takes the `LATENT_DIM_AE`-dimensional latent representation, expands it through a 128-unit dense layer, and produces an `IMAGE_SIZE`-dimensional reconstruction. The `sigmoid` activation keeps the output pixel values within the `[0, 1]` range.
*   **Autoencoder Model:** Chains the encoder and decoder. It takes the original input, encodes it, and then decodes the latent representation back into a reconstruction. The model is compiled with the `adam` optimizer and the `BinaryCrossentropy` loss function, which suits pixel values between 0 and 1.
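
To get a feel for the size of this architecture, the sketch below counts the trainable parameters by hand, assuming the 784 → 128 → 32 → 128 → 784 dense layout described above. The helper `dense_params` and the name `HIDDEN_UNITS` are made up for this illustration; the counts can be compared against what `summary()` reports.

```python
def dense_params(n_in, n_out):
    """Trainable parameters in a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out

IMAGE_SIZE = 784      # flattened 28x28 image, as in the notebook
HIDDEN_UNITS = 128    # width of the intermediate Dense layers (hypothetical name)
LATENT_DIM_AE = 32    # bottleneck size, as in the notebook

encoder_params = (dense_params(IMAGE_SIZE, HIDDEN_UNITS)
                  + dense_params(HIDDEN_UNITS, LATENT_DIM_AE))
decoder_params = (dense_params(LATENT_DIM_AE, HIDDEN_UNITS)
                  + dense_params(HIDDEN_UNITS, IMAGE_SIZE))
compression_ratio = IMAGE_SIZE / LATENT_DIM_AE

print(f"Encoder parameters: {encoder_params:,}")
print(f"Decoder parameters: {decoder_params:,}")
print(f"Total parameters:   {encoder_params + decoder_params:,}")
print(f"Compression ratio:  {compression_ratio:.1f}x")
```

The compression ratio only compares vector lengths (784 inputs versus 32 latent values); it says nothing about how much information survives the bottleneck.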

In [None]:
# Build the Standard Autoencoder (AE) ---

# Encoder Network
# Input -> Dense Layer -> Bottleneck (Latent Space)
ae_encoder_inputs = keras.Input(shape=(IMAGE_SIZE,), name='encoder_input')
x = layers.Dense(128, activation='relu')(ae_encoder_inputs)
ae_latent_space = layers.Dense(LATENT_DIM_AE, activation='relu', name='ae_latent_space')(x)
ae_encoder = models.Model(ae_encoder_inputs, ae_latent_space, name='ae_encoder')
print("\nEncoder Architecture:")
ae_encoder.summary()

# Decoder Network
# Bottleneck (Latent Space) -> Dense Layer -> Output (Reconstruction)
ae_latent_inputs = keras.Input(shape=(LATENT_DIM_AE,), name='decoder_input')
x = layers.Dense(128, activation='relu')(ae_latent_inputs)
ae_outputs = layers.Dense(IMAGE_SIZE, activation='sigmoid', name='decoder_output')(x) # Sigmoid for [0,1] pixel output
ae_decoder = models.Model(ae_latent_inputs, ae_outputs, name='ae_decoder')
print("\nDecoder Architecture:")
ae_decoder.summary()

# Autoencoder Model (Connecting Encoder and Decoder)
ae_model_outputs = ae_decoder(ae_encoder(ae_encoder_inputs))
autoencoder = models.Model(ae_encoder_inputs, ae_model_outputs, name='autoencoder')
print("\nFull Autoencoder Architecture:")
autoencoder.summary()

# Compile the Autoencoder ---
# Use binary cross-entropy loss to compare the sigmoid outputs with pixel values in [0, 1]
autoencoder.compile(optimizer='adam', loss=losses.BinaryCrossentropy())


This cell trains the autoencoder model using the `x_train` data as both input and target output. The model learns to reconstruct the input images over a specified number of `EPOCHS`, with `validation_data` used to monitor performance on unseen data (`x_test`). The `loss` values indicate how well the autoencoder is performing its reconstruction task, with lower values being better.

In [None]:
# Train the Autoencoder ---
print("\n--- Training Autoencoder ---")
history_ae = autoencoder.fit(x_train, x_train, # Input data is used as both input and target output
                             epochs=EPOCHS,
                             batch_size=BATCH_SIZE,
                             shuffle=True,
                             validation_data=(x_test, x_test), # Evaluate on test set
                             verbose=1) # Set verbose=1 or 2 to see progress
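
To make the reported loss values easier to interpret, here is a minimal NumPy sketch of binary cross-entropy averaged over the pixels of each image. It is an illustration rather than the Keras implementation: the function name, the `eps` clipping value, and the toy 4-pixel "images" are all assumptions made for this example.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean per-pixel binary cross-entropy, one value per image (row)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # keep log() away from 0
    per_pixel = -(y_true * np.log(y_pred)
                  + (1.0 - y_true) * np.log(1.0 - y_pred))
    return per_pixel.mean(axis=-1)

# Toy 4-pixel "images" with made-up values, for illustration only
target = np.array([[0.0, 1.0, 1.0, 0.0]])
good_reconstruction = np.array([[0.1, 0.9, 0.8, 0.2]])
poor_reconstruction = np.array([[0.6, 0.4, 0.5, 0.7]])

good_loss = binary_crossentropy(target, good_reconstruction)
poor_loss = binary_crossentropy(target, poor_reconstruction)
print("Loss for the closer reconstruction: ", good_loss)
print("Loss for the poorer reconstruction:", poor_loss)
```

Because many MNIST pixels are gray rather than exactly 0 or 1, this loss stays above zero even for a perfect reconstruction. It is therefore more informative to compare loss values across epochs, or between training and validation, than to compare them against zero.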


This cell visualizes the autoencoder's performance. The `plot_reconstructions` function takes the first `n` images from the test set, encodes them into their latent representations, and decodes them back into reconstructed images. Comparing the originals (top row) with the reconstructions (bottom row) gives a qualitative sense of how well the autoencoder has learned to compress and decompress the image data.

In [None]:
# Visualize Results ---
print("\n--- Visualizing AE Reconstructions ---")

def plot_reconstructions(encoder, decoder, x_data, n=10):
    """Plots original and reconstructed images."""
    # Encode and decode some digits
    encoded_imgs = encoder.predict(x_data[:n])
    decoded_imgs = decoder.predict(encoded_imgs)

    plt.figure(figsize=(20, 4))
    plt.suptitle("AE: Original vs Reconstructed Images", fontsize=16)
    for i in range(n):
        # Display original image
        ax = plt.subplot(2, n, i + 1)
        plt.imshow(x_data[i].reshape(28, 28), cmap='gray')
        plt.title("Original")
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)

        # Display reconstructed image
        ax = plt.subplot(2, n, i + 1 + n)
        plt.imshow(decoded_imgs[i].reshape(28, 28), cmap='gray')
        plt.title("Reconstructed")
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
    plt.show()

# Select some images from the test set to display
plot_reconstructions(ae_encoder, ae_decoder, x_test, n=10)

print("\nStandard Autoencoder program finished!")
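
The side-by-side plot is a qualitative check. As a possible quantitative complement, the sketch below computes a mean squared error per image. The function name `per_image_mse` and the random stand-in arrays are assumptions made so the example runs on its own; in this notebook one would pass `x_test[:n]` and the corresponding decoder output instead.

```python
import numpy as np

def per_image_mse(originals, reconstructions):
    """Mean squared error between each original row and its reconstruction."""
    originals = np.asarray(originals, dtype=np.float64)
    reconstructions = np.asarray(reconstructions, dtype=np.float64)
    return np.mean((originals - reconstructions) ** 2, axis=1)

# Random stand-in data (not MNIST, not real model output)
rng = np.random.default_rng(0)
demo_originals = rng.random((5, 784))
noise = rng.normal(0.0, 0.05, size=(5, 784))
demo_reconstructions = np.clip(demo_originals + noise, 0.0, 1.0)

errors = per_image_mse(demo_originals, demo_reconstructions)
print("Per-image MSE:", np.round(errors, 4))
print("Index of the worst reconstruction:", int(np.argmax(errors)))
```

Sorting test images by this error is one way to find the digits the model reconstructs worst, which are often the most informative ones to plot.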

# Visualizing the Latent Vector
Let us visualize the 32-dimensional latent vectors that the autoencoder produces for several examples of each MNIST digit (0-9) from the test set, displaying each original image alongside its latent vector.

Reload the MNIST dataset to obtain the `y_test` labels. They were discarded when the data was first loaded, and they are needed to group the latent vectors by digit.


In [None]:
print("Reloading MNIST data to get y_test...")
(_, _), (_, y_test) = keras.datasets.mnist.load_data() # Only interested in y_test

print(f"y_test shape: {y_test.shape}")
print(f"First 5 y_test labels: {y_test[:5]}")

## Generate Latent Vectors

Use the previously trained `ae_encoder` model to transform the `x_test` images into their 32-dimensional latent representations. This will give us the 'bottleneck' vectors for each test image.


In [None]:
print("Generating latent vectors for x_test using ae_encoder...")
x_test_encoded = ae_encoder.predict(x_test)
print(f"Shape of generated latent vectors (x_test_encoded): {x_test_encoded.shape}")
print(f"First 5 latent vectors:\n{x_test_encoded[:5]}")
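
Because the latent layer uses a ReLU activation, every latent value is non-negative and some dimensions may be zero for many, or even all, inputs. The sketch below shows one way to summarize this. The function name `latent_activity` and the synthetic array are assumptions for illustration; in the notebook the function would be applied to `x_test_encoded`.

```python
import numpy as np

def latent_activity(latents, tol=0.0):
    """Fraction of samples in which each latent dimension exceeds `tol`."""
    latents = np.asarray(latents)
    return (latents > tol).mean(axis=0)

# Synthetic stand-in for x_test_encoded (made-up values, not real results)
rng = np.random.default_rng(0)
demo_latents = np.maximum(rng.normal(0.0, 1.0, size=(1000, 32)), 0.0)  # ReLU-like
demo_latents[:, 5] = 0.0  # simulate a unit that never activates

activity = latent_activity(demo_latents)
dead_units = np.flatnonzero(activity == 0.0)
print("Latent dimensions that never activate:", dead_units)
print(f"Mean active fraction: {activity.mean():.2f}")
```

Dimensions that never activate carry no information, so the effective size of the code can be smaller than `LATENT_DIM_AE`.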

The function below displays an original image next to its 32-dimensional latent vector, drawn as a bar chart. It is used in the next cell to show how each digit is represented in the latent space.



In [None]:
import matplotlib.pyplot as plt

def plot_latent_representation(original_image, latent_vector, digit, index):
    """Plots the original image and its 32-dimensional latent vector."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    fig.suptitle(f"Digit: {digit}, Sample: {index}", fontsize=16)

    # Plot original image
    axes[0].imshow(original_image.reshape(28, 28), cmap='gray')
    axes[0].set_title('Original Image')
    axes[0].axis('off')

    # Plot latent vector
    axes[1].bar(range(len(latent_vector)), latent_vector)
    axes[1].set_title('32-dim Latent Vector')
    axes[1].set_xlabel('Latent Dimension')
    axes[1].set_ylabel('Activation Value')
    axes[1].set_xticks([]) # Remove x-axis ticks for cleaner visualization of the vector
    axes[1].set_ylim(0, max(float(np.max(latent_vector)), 1e-6) * 1.1) # Scale the y-axis to this vector's maximum; the small floor avoids a zero-height axis if every activation is 0
    plt.tight_layout(rect=[0, 0.03, 1, 0.95])
    plt.show()

print("Defined plot_latent_representation function.")

For each digit, we will select a few examples from `x_test` and their corresponding latent vectors from `x_test_encoded`. Then, we use the `plot_latent_representation` function to visualize each original image alongside its latent vector.



In [None]:
print("Visualizing latent representations for each digit...")

# Number of samples to show for each digit
samples_per_digit = 3

# Get unique digits
unique_digits = np.unique(y_test)

for digit in unique_digits:
    print(f"\n--- Displaying samples for Digit: {digit} ---")
    # Find indices for the current digit
    indices = np.where(y_test == digit)[0]

    # Select a few random samples for visualization
    # Ensure we don't try to select more samples than available
    num_samples_to_show = min(samples_per_digit, len(indices))
    if num_samples_to_show == 0:
        print(f"No samples found for digit {digit}")
        continue

    # Pick samples at random without replacement; the selection changes between runs unless a seed is set (e.g. np.random.seed(0))
    # For a fixed selection, indices[:num_samples_to_show] could be used instead
    chosen_indices = np.random.choice(indices, num_samples_to_show, replace=False)

    for i, idx in enumerate(chosen_indices):
        original_image = x_test[idx]
        latent_vector = x_test_encoded[idx]
        plot_latent_representation(original_image, latent_vector, digit, i + 1)

print("Visualization complete.")

## Summary of Latent Space Representation

*   **Digit-Specific Patterns**:
    *   Some digits might show a strong activation in one particular dimension, while others show activations distributed across several dimensions.
    *   The overall shape of the bar chart (the pattern of active and inactive dimensions) appears somewhat consistent across samples of the same digit, suggesting that the autoencoder has learned characteristic features for each digit.

*   **Variability within a Digit**: Even within the same digit, there is some variability in the latent vectors. This is expected, as different handwritten samples of the same digit will have slight variations in style, thickness, and orientation. The autoencoder captures these nuances, but the core 'digit identity' is likely preserved by a consistent underlying pattern in the latent space.

In conclusion, the autoencoder compresses each 784-pixel image into a 32-dimensional latent vector whose activation pattern appears to vary with the digit, even though the model never saw the labels during training. The individual latent dimensions, however, have no obvious semantic meaning on their own.
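
The observations above come from inspecting a handful of bar charts. One way to probe them more systematically is to average the latent vectors of each digit and compare the class means with cosine similarity. This is a minimal sketch: the function names and the synthetic three-class data are assumptions for illustration, and nothing here is a result from the trained model.

```python
import numpy as np

def class_mean_latents(latents, labels):
    """Returns the sorted class labels and the mean latent vector of each class."""
    classes = np.unique(labels)
    means = np.stack([latents[labels == c].mean(axis=0) for c in classes])
    return classes, means

def cosine_similarity_matrix(vectors, eps=1e-12):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.maximum(norms, eps)  # guard against zero-length rows
    return unit @ unit.T

# Synthetic stand-in: 3 classes with 8-dim latents (made-up, not MNIST results)
rng = np.random.default_rng(0)
demo_labels = np.repeat(np.arange(3), 50)
demo_centers = np.eye(3, 8) * 5.0
demo_noise = rng.normal(0.0, 0.5, size=(150, 8))
demo_latents = np.abs(demo_centers[demo_labels] + demo_noise)

classes, means = class_mean_latents(demo_latents, demo_labels)
similarity = cosine_similarity_matrix(means)
print(np.round(similarity, 2))
```

In this notebook the call would be `class_mean_latents(x_test_encoded, y_test)`. Off-diagonal similarities well below 1 would be consistent with digit-specific patterns, while values close to 1 would suggest that the class averages largely overlap.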