This repository contains a Jupyter Notebook (`Variational_autoencoder.ipynb`) that implements a Variational Autoencoder (VAE) from scratch using PyTorch. The model is trained on the MNIST dataset to learn a probabilistic latent space and generate new, synthetic images of handwritten digits.
This project demonstrates the complete workflow for building and training a VAE:
- Dataset: The MNIST dataset is loaded and pre-processed.
- Model Definition: A `VAE` class is defined with three main components:
  - An Encoder that compresses images into a probability distribution (mean and log-variance).
  - A Reparameterization Trick to sample from this distribution.
  - A Decoder that reconstructs images from the sampled latent vectors.
- Loss Function: A custom loss function is implemented that combines:
  - Reconstruction Loss: Binary Cross Entropy (BCE) to make the output image look like the input.
  - Kullback-Leibler (KL) Divergence: A regularization term that forces the latent space to resemble a standard normal distribution.
- Training: The model is trained to minimize this combined loss.
- Sampling: After training, new images are generated by sampling random vectors from the latent space and passing them through the decoder.
The VAE is built using simple Multi-Layer Perceptrons (MLPs).
The Encoder takes a flattened $28 \times 28$ image as input:

- Input: $784$-dim vector (flattened image)
- Layer 1: Linear (784 -> 512) + ReLU
- Layer 2: Linear (512 -> 256) + ReLU
- Output: Two parallel Linear layers (256 -> 20) that output the mean (`mu`) and log-variance (`log_var`) of the latent distribution.
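As a rough sketch of this encoder half (the notebook packages these layers inside its single `VAE` class; the standalone `Encoder` module and names such as `fc_mu` below are illustrative, not taken from the notebook):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps a flattened 28x28 image to the parameters of a Gaussian latent distribution."""
    def __init__(self, input_dim=784, hidden1=512, hidden2=256, latent_dim=20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden1), nn.ReLU(),
            nn.Linear(hidden1, hidden2), nn.ReLU(),
        )
        self.fc_mu = nn.Linear(hidden2, latent_dim)       # mean of q(z|x)
        self.fc_log_var = nn.Linear(hidden2, latent_dim)  # log-variance of q(z|x)

    def forward(self, x):
        h = self.net(x)
        return self.fc_mu(h), self.fc_log_var(h)
```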
To allow for backpropagation, we sample from the latent distribution with the reparameterization trick: $z = \mu + \sigma \cdot \epsilon$, where $\sigma = \exp(0.5 \cdot \log\sigma^2)$ and $\epsilon \sim N(0, I)$.
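In code, the trick is only a few lines; this is a minimal sketch, with the function name `reparameterize` chosen for illustration:

```python
import torch

def reparameterize(mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    """Draw z = mu + sigma * eps so that gradients flow through mu and log_var."""
    std = torch.exp(0.5 * log_var)   # sigma = exp(log_var / 2)
    eps = torch.randn_like(std)      # eps ~ N(0, I); no gradient flows through eps
    return mu + eps * std
```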
The Decoder takes a 20-dimensional latent vector ($z$) and maps it back to image space:

- Input: $20$-dim latent vector ($z$)
- Layer 1: Linear (20 -> 256) + ReLU
- Layer 2: Linear (256 -> 512) + ReLU
- Output Layer: Linear (512 -> 784) + Sigmoid (to scale output pixels between 0 and 1).
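A matching sketch of the decoder half (again, in the notebook these layers live inside the `VAE` class; the standalone `Decoder` module here is illustrative):

```python
import torch.nn as nn

class Decoder(nn.Module):
    """Maps a 20-dim latent vector back to a flattened 28x28 image with pixels in [0, 1]."""
    def __init__(self, latent_dim=20, hidden1=256, hidden2=512, output_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden1), nn.ReLU(),
            nn.Linear(hidden1, hidden2), nn.ReLU(),
            nn.Linear(hidden2, output_dim), nn.Sigmoid(),  # scale pixels to [0, 1]
        )

    def forward(self, z):
        return self.net(z)
```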
The total loss is the sum of the Reconstruction Loss and the KL Divergence:
- Reconstruction Loss: Binary Cross Entropy (BCE) is used to measure the difference between the original and reconstructed images.
- KL Divergence: This term acts as a regularizer, pushing the learned latent distribution to be close to a standard normal distribution. It is calculated as:

  $KLD = -0.5 \times \sum(1 + \log(\sigma^2) - \mu^2 - \sigma^2)$
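A minimal sketch of this combined loss, assuming the reconstruction is compared to the flattened input with a summed BCE (the notebook's exact reduction and function name may differ):

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, log_var):
    """Total loss = reconstruction (BCE) + KL divergence to N(0, I)."""
    bce = F.binary_cross_entropy(recon_x, x, reduction="sum")
    # KLD = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    kld = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return bce + kld
```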
The project uses the standard MNIST dataset.
- Images are transformed into PyTorch Tensors.
- Data is loaded in batches using `DataLoader`.
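For reference, MNIST loading along these lines typically looks like the sketch below; the data directory and batch size are assumptions, not values taken from the notebook:

```python
import torch
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # convert PIL images to tensors with values in [0, 1]
train_set = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
```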
- Optimizer: Adam with a learning rate of `1e-3`.
- Epochs: The model is trained for 20 epochs.
- Sampling: After training, $64$ random vectors are sampled from a standard normal distribution ($N(0, 1)$) and passed through the decoder to generate new images (see the sketch after this list).
- Output: The generated samples are saved as `samples.png`.
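Putting the pieces together, a condensed sketch of the training loop and the sampling step might look like the following. It reuses the illustrative `Encoder`, `Decoder`, `reparameterize`, `vae_loss`, and `train_loader` sketches from the sections above and is not a copy of the notebook's code:

```python
import torch
from torchvision.utils import save_image

device = "cuda" if torch.cuda.is_available() else "cpu"
encoder, decoder = Encoder().to(device), Decoder().to(device)
params = list(encoder.parameters()) + list(decoder.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

for epoch in range(20):
    total = 0.0
    for x, _ in train_loader:
        x = x.view(x.size(0), -1).to(device)   # flatten 28x28 images to 784-dim vectors
        mu, log_var = encoder(x)
        z = reparameterize(mu, log_var)
        recon = decoder(z)
        loss = vae_loss(recon, x, mu, log_var)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total += loss.item()
    print(f"Epoch {epoch + 1}: loss = {total / len(train_loader.dataset):.2f}")

# Sampling: decode 64 random latent vectors and save them as an 8x8 grid
with torch.no_grad():
    z = torch.randn(64, 20, device=device)
    samples = decoder(z).view(64, 1, 28, 28)
    save_image(samples, "samples.png", nrow=8)
```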
- Ensure you have the required libraries installed: `pip install torch torchvision numpy matplotlib`
- Launch Jupyter Notebook: `jupyter notebook`
- Open the `Variational_autoencoder.ipynb` file.
- Run the cells sequentially from top to bottom. The notebook will:
  - Download the MNIST dataset.
  - Initialize the `VAE` model, optimizer, and loss function.
  - Run the training loop for 20 epochs, printing the loss.
  - Generate an $8 \times 8$ grid of new digit images and save it as `samples.png`.