## PROBLEM SET 1 - TAKE AT HOME (25 POINTS)

**You will lose all corresponding points if we can't access the implementation notebook URL or the GitHub URL. We will NOT message you. Do NOT invite the TAs to your GitHub repo (option 2 below) EARLIER than the morning of the day of the exam, since invites expire in 7 days.**

## Introduction

Generative modeling has progressed rapidly in the last few years, and the techniques developed in this field are also useful for other tasks such as semi-supervised learning, representation learning, and reinforcement learning. This problem set prepares you to understand the basics of one particularly successful generative modeling technique: Variational Autoencoders (VAEs). Understanding VAEs gives you a head start with more advanced models such as [Stable Diffusion, which is all the rage these days](https://clipdrop.co/stable-diffusion-turbo).



## Task 1: Study what VAEs are and how they work (0 points)

Consult this [blog post](https://jaan.io/what-is-variational-autoencoder-vae-tutorial/) and its [2D VAE implementation for the MNIST dataset](https://github.com/jaanli/variational-autoencoder).



## Task 2
Implement the VAE model for the MNIST dataset and train it. Plot the loss curves for the training and validation sets, and perform hyperparameter optimization on the size of the latent space as well as on the optimizer parameters. To earn all points, explain what each function in the code does, either as comments in the code or as markdown cells. Don't be frugal in your commentary. (15 points)

Replicate the final latent variable space from the figure below. You do not need to produce the animation; show only the final latent space $(z_1, z_2)$. (5 points)

Show VAE generated images for all digits 0-9 after model training. (5 points)

![](latent-variables.gif)

In [1]:
!pip install tensorflow numpy scipy matplotlib imageio optuna

import os
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import imageio
import optuna
from tensorflow.keras import layers, callbacks
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from scipy.stats import norm



In [None]:
# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((x_train.shape[0], -1))
x_test = x_test.reshape((x_test.shape[0], -1))

original_dim = x_train.shape[1]

In [None]:
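# Sampling function (reparameterization trick)
def sampling(args):
    """Draw z ~ N(z_mean, exp(z_log_var)) while keeping gradients flowing to the encoder."""
    z_mean, z_log_var = args
    # Use dynamic shapes so the unknown batch dimension is resolved at run time
    batch = tf.shape(z_mean)[0]
    dim = tf.shape(z_mean)[1]
    # Noise drawn from a standard normal distribution
    epsilon = tf.keras.backend.random_normal(shape=(batch, dim))
    # exp(0.5 * log_var) is the standard deviation
    return z_mean + tf.exp(0.5 * z_log_var) * epsilon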
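
The sampling function above implements the reparameterization trick: instead of sampling `z` directly, it samples standard normal noise and shifts and scales it with the encoder outputs. The following is a minimal NumPy sketch of the same arithmetic, not part of the notebook; the name `sample_latent` and the chosen mean and variance are illustrative assumptions.

```python
import numpy as np

def sample_latent(z_mean, z_log_var, rng):
    """Return z = mean + std * eps with eps ~ N(0, 1) (hypothetical helper)."""
    eps = rng.standard_normal(size=z_mean.shape)
    # exp(0.5 * log_var) converts a log-variance into a standard deviation
    return z_mean + np.exp(0.5 * z_log_var) * eps

rng = np.random.default_rng(0)
# Illustrative values: mean 1.5 and variance 0.25 (standard deviation 0.5)
z_mean = np.full((100_000, 2), 1.5)
z_log_var = np.full((100_000, 2), np.log(0.25))
z = sample_latent(z_mean, z_log_var, rng)
print(z.mean(), z.std())  # close to 1.5 and 0.5
```

Because the randomness lives entirely in `eps`, the result is a deterministic, differentiable function of `z_mean` and `z_log_var`, which is what lets backpropagation reach the encoder weights.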

In [None]:
def build_vae(latent_dim, intermediate_dim, learning_rate):
    # Encoder
    inputs = layers.Input(shape=(original_dim,), name='encoder_input')
    x = layers.Dense(intermediate_dim, activation='relu')(inputs)
    z_mean = layers.Dense(latent_dim, name='z_mean')(x)
    z_log_var = layers.Dense(latent_dim, name='z_log_var')(x)
    z = layers.Lambda(sampling, output_shape=(latent_dim,), name='z')([z_mean, z_log_var])

    # Decoder
    latent_inputs = layers.Input(shape=(latent_dim,), name='z_sampling')
    x = layers.Dense(intermediate_dim, activation='relu')(latent_inputs)
    outputs = layers.Dense(original_dim, activation='sigmoid')(x)

    # VAE model
    encoder = Model(inputs, [z_mean, z_log_var, z], name='encoder')
    decoder = Model(latent_inputs, outputs, name='decoder')
    outputs = decoder(encoder(inputs)[2])
    vae = Model(inputs, outputs, name='vae_mlp')

    # VAE loss
    reconstruction_loss = tf.keras.losses.binary_crossentropy(inputs, outputs)
    reconstruction_loss *= original_dim
    kl_loss = 1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var)
    kl_loss = tf.reduce_sum(kl_loss, axis=-1)
    kl_loss *= -0.5
    vae_loss = tf.reduce_mean(reconstruction_loss + kl_loss)
    vae.add_loss(vae_loss)

    # Compile VAE
    optimizer = Adam(learning_rate=learning_rate)
    vae.compile(optimizer=optimizer)
    return vae, encoder, decoder
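
The loss built in `build_vae` is the sum of a reconstruction term and a KL term. As a sanity check on the formulas, here is a small NumPy sketch of both terms. The function names are assumptions made for illustration. Note that Keras `binary_crossentropy` averages over pixels, which is why the notebook multiplies by `original_dim`; the sketch sums over pixels directly.

```python
import numpy as np

def kl_to_standard_normal(z_mean, z_log_var):
    """KL( N(mean, exp(log_var)) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + z_log_var - np.square(z_mean) - np.exp(z_log_var), axis=-1)

def bernoulli_reconstruction_loss(x, x_hat, eps=1e-7):
    """Binary cross-entropy summed over pixels (one value per example)."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat), axis=-1)

# A posterior equal to the prior has zero KL divergence
kl_zero = kl_to_standard_normal(np.zeros((1, 2)), np.zeros((1, 2)))
# Shifting one latent mean by 1 with unit variance costs 0.5 nats
kl_shift = kl_to_standard_normal(np.array([[1.0, 0.0]]), np.zeros((1, 2)))
# A near-perfect reconstruction gives a loss close to zero
rec = bernoulli_reconstruction_loss(np.array([[1.0, 0.0]]), np.array([[1.0, 0.0]]))
print(kl_zero, kl_shift, rec)
```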


In [None]:
def objective(trial):
    # Hyperparameters to be optimized
    latent_dim = trial.suggest_int('latent_dim', 2, 20)
    intermediate_dim = trial.suggest_int('intermediate_dim', 128, 1024)
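    # suggest_loguniform is deprecated in Optuna 3.x; suggest_float with log=True is the replacement
    learning_rate = trial.suggest_float('learning_rate', 1e-5, 1e-2, log=True)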

    # Build VAE model
    vae, _, _ = build_vae(latent_dim, intermediate_dim, learning_rate)

    # Early stopping callback
    early_stopping = callbacks.EarlyStopping(monitor='val_loss', patience=5)

    # Training the model
    history = vae.fit(
        x_train, x_train,
        epochs=50,  # Reduced for faster optimization
        batch_size=128,
        validation_data=(x_test, x_test),
        callbacks=[early_stopping],
        verbose=0
    )

    return min(history.history['val_loss'])

In [None]:
# Optuna study
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=10)

# Best hyperparameters
best_params = study.best_trial.params
vae, encoder, decoder = build_vae(best_params['latent_dim'], best_params['intermediate_dim'], best_params['learning_rate'])

# Train the model with best parameters
vae.fit(x_train, x_train, epochs=100, batch_size=128, validation_data=(x_test, x_test))

# Directory to save the plots
os.makedirs('latent_space_plots', exist_ok=True)
os.makedirs('generated_images_plots', exist_ok=True)
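
Task 2 asks for training and validation loss curves, which the training cell above does not plot. One way to do it, sketched under the assumption that the return value of `vae.fit(...)` is kept in a variable such as `history`, is to plot the `loss` and `val_loss` entries of `history.history`. The helper name `plot_loss_curves` is hypothetical, and the numbers in `example_history` are placeholders for illustration only, not results from a real run.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt

def plot_loss_curves(history_dict, filename=None):
    """Plot training and validation loss per epoch from a Keras-style history dict."""
    epochs = range(1, len(history_dict["loss"]) + 1)
    fig, ax = plt.subplots(figsize=(8, 5))
    ax.plot(epochs, history_dict["loss"], label="training loss")
    ax.plot(epochs, history_dict["val_loss"], label="validation loss")
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss (reconstruction + KL)")
    ax.legend()
    if filename is not None:
        fig.savefig(filename)
    return fig

# Placeholder values, NOT measured results
example_history = {"loss": [3.0, 2.0, 1.5], "val_loss": [3.2, 2.3, 1.9]}
fig = plot_loss_curves(example_history)
```

In the notebook this would be called as `plot_loss_curves(history.history)` after `history = vae.fit(...)`.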

In [None]:
# Function to plot and save the latent space at a given epoch
def save_latent_space_plot(encoder, epoch, data, labels):
    z_mean, _, _ = encoder.predict(data)
    plt.figure(figsize=(12, 10))
    plt.scatter(z_mean[:, 0], z_mean[:, 1], c=labels, cmap='viridis')
    plt.colorbar()
    plt.xlabel('z[0]')
    plt.ylabel('z[1]')
    filename = f'latent_space_plots/latent_space_epoch_{epoch}.png'
    plt.savefig(filename)
    plt.close()
    return filename

In [None]:
# Function to plot and save the generated images from the latent space at a given epoch
def save_generated_images_plot(decoder, epoch, grid_size=15, figure_size=28):
    figure = np.zeros((figure_size * grid_size, figure_size * grid_size))
    # Use the norm.ppf function to get more interesting points in the latent space
    grid_x = norm.ppf(np.linspace(0.05, 0.95, grid_size))
    grid_y = norm.ppf(np.linspace(0.05, 0.95, grid_size))
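    # The decoder expects latent_dim inputs, which may be larger than 2 after
    # hyperparameter optimization: vary the first two dimensions, keep the rest at 0
    latent_dim = decoder.input_shape[-1]
    for i, yi in enumerate(grid_y):
        for j, xi in enumerate(grid_x):
            z_sample = np.zeros((1, latent_dim))
            z_sample[0, :2] = [xi, yi]
            x_decoded = decoder.predict(z_sample, verbose=0)
            digit = x_decoded[0].reshape(figure_size, figure_size)
            figure[i * figure_size: (i + 1) * figure_size,
                   j * figure_size: (j + 1) * figure_size] = digit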
    filename = f'generated_images_plots/generated_images_epoch_{epoch}.png'
    plt.figure(figsize=(10, 10))
    plt.imshow(figure, cmap='Greys_r')
    plt.axis('off')
    plt.savefig(filename)
    plt.close()
    return filename
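
The grid above sweeps the latent space but does not guarantee that every digit 0-9 appears, which Task 2 asks for. One possible approach, offered as a sketch rather than the required method, is to average the encoder's `z_mean` over the test images of each class and decode those ten centroids. The helper `digit_centroids` is hypothetical, and the arrays below are synthetic stand-ins for the encoder output and the labels.

```python
import numpy as np

def digit_centroids(z_mean, labels, num_classes=10):
    """Average latent code per class; assumes every class occurs at least once."""
    centroids = np.zeros((num_classes, z_mean.shape[1]))
    for digit in range(num_classes):
        centroids[digit] = z_mean[labels == digit].mean(axis=0)
    return centroids

# Synthetic stand-in data: five examples per digit, latent code (d, -d) for digit d
labels = np.arange(10).repeat(5)
z_mean = np.stack([labels, -labels], axis=1).astype(float)
centroids = digit_centroids(z_mean, labels)
print(centroids[3])
```

With the trained models, the inputs would be `z_mean, _, _ = encoder.predict(x_test)` and `y_test`, and `decoder.predict(centroids)` would return one 784-pixel image per digit to reshape to 28x28 and display.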

In [None]:
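# Training loop with periodic snapshots for visualization.
# Rebuild the model with the best hyperparameters so the snapshots start from
# untrained weights, then train in 10-epoch chunks and save plots before each chunk.
vae, encoder, decoder = build_vae(best_params['latent_dim'], best_params['intermediate_dim'], best_params['learning_rate'])
filenames = []
generated_images_filenames = []
for epoch in range(0, 100, 10):  # Adjust number of epochs if necessary
    latent_space_filename = save_latent_space_plot(encoder, epoch, x_test, y_test)
    filenames.append(latent_space_filename)
    generated_image_filename = save_generated_images_plot(decoder, epoch)
    generated_images_filenames.append(generated_image_filename)
    # Continue training from `epoch` up to `epoch + 10`
    vae.fit(x_train, x_train, initial_epoch=epoch, epochs=epoch + 10, batch_size=128,
            validation_data=(x_test, x_test), verbose=0)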

# Creating GIFs
with imageio.get_writer('latent_space_evolution.gif', mode='I', loop=0) as writer:
    for filename in filenames:
        image = imageio.imread(filename)
        writer.append_data(image)

with imageio.get_writer('generated_images_evolution.gif', mode='I', loop=0) as writer:
    for filename in generated_images_filenames:
        image = imageio.imread(filename)
        writer.append_data(image)

You have two implementation options: (1) an all-in-one notebook or (2) a GitHub repo.

### Option 1: All in one Colab notebook

Submit a single Colab notebook URL that contains all the code and the outputs. The notebook must be self-contained and launchable in Google Colab via a button at the top of the notebook; see the regression notebook on the course site for an example of such a button. You **need to save all outputs in the notebook** so that the TAs can check that your code is working properly.

### Option 2: GitHub repo

If you prefer to work without notebooks, with or without containers, you can submit the implementation as Python scripts and version control your code in a private GitHub repo. Submit the GitHub URL, clearly document how to launch the runtime and install `requirements.txt`, and include all required figures in the README.md file.

IMPORTANT: Ensure that the GitHub repo remains private. If you submit a public GitHub repo you will be held responsible for violating the honor code.


Epoch 1/50


ValueError: in user code:

    File "/home/codespace/.python/current/lib/python3.10/site-packages/keras/src/engine/training.py", line 1401, in train_function  *
        return step_function(self, iterator)
    File "/home/codespace/.python/current/lib/python3.10/site-packages/keras/src/engine/training.py", line 1384, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/codespace/.python/current/lib/python3.10/site-packages/keras/src/engine/training.py", line 1373, in run_step  **
        outputs = model.train_step(data)
    File "/home/codespace/.python/current/lib/python3.10/site-packages/keras/src/engine/training.py", line 1150, in train_step
        y_pred = self(x, training=True)
    File "/home/codespace/.python/current/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/tmp/__autograph_generated_filec1oprb1m.py", line 12, in tf__call
        z = ag__.converted_call(ag__.ld(self).reparameterize, (ag__.ld(z_mean), ag__.ld(z_log_var)), None, fscope)
    File "/tmp/__autograph_generated_fileot3sz2dl.py", line 10, in tf__reparameterize
        eps = ag__.converted_call(ag__.ld(tf).random.normal, (), dict(shape=ag__.ld(mean).shape), fscope)

    ValueError: Exception encountered when calling layer 'vae' (type VAE).
    
    in user code:
    
        File "/tmp/ipykernel_3803/1732829699.py", line 50, in call  *
            z = self.reparameterize(z_mean, z_log_var)
        File "/tmp/ipykernel_3803/1732829699.py", line 59, in reparameterize  *
            eps = tf.random.normal(shape=mean.shape)
    
        ValueError: Cannot convert a partially known TensorShape (None, 2) to a Tensor.
    
    
    Call arguments received by layer 'vae' (type VAE):
      • x=tf.Tensor(shape=(None, 784), dtype=float32)
