## Week 3 : Autoencoders
```
- Generative Artificial Intelligence (Spring semester 2025)
- Professor: Muhammad Fahim
- Teaching Assistant: Ahmad Taha
```
<hr>

## Content
In this lab we will cover the following topics:
```
1. Types of autoencoders
2. Applications of autoencoders
3. Autoencoders training procedure
4. Reparametrisation trick

```

## Undercomplete & Overcomplete

PCA vs. Undercomplete autoencoders
* Autoencoders are much more flexible than PCA.
* Neural network activation functions introduce non-linearities into the encoding, whereas PCA is restricted to a linear transformation.
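
To make the comparison concrete, here is a minimal sketch of PCA written as a linear "encoder" and "decoder" built from an SVD. The function name `pca_reconstruct` and the synthetic rank-5 data are assumptions made for illustration; they are not part of the lab code. An undercomplete autoencoder with no activation functions and an MSE loss can at best match this reconstruction, which is why the non-linearities are what add flexibility.

```python
import torch

torch.manual_seed(0)

def pca_reconstruct(data, num_components):
    """Project data onto its top principal components and map it back."""
    mean = data.mean(dim=0, keepdim=True)
    centered = data - mean
    # Rows of vh are the principal directions, ordered by explained variance
    _, _, vh = torch.linalg.svd(centered, full_matrices=False)
    components = vh[:num_components]       # shape: (k, features)
    codes = centered @ components.T        # linear "encoder"
    return codes @ components + mean       # linear "decoder"

# Hypothetical data: 64-D points that actually live on a 5-D linear subspace
data = torch.randn(200, 5) @ torch.randn(5, 64)

errors = {}
for k in (2, 5):
    errors[k] = torch.mean((pca_reconstruct(data, k) - data) ** 2).item()
    print(f"k={k}: reconstruction MSE = {errors[k]:.6f}")
```

With 5 components the error should be close to zero because the data is exactly rank 5; with 2 components some variance is lost.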

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchsummary import summary
from torch.utils.data import TensorDataset, DataLoader

import torchvision
import torchvision.transforms as transforms


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

## Defining Undercomplete Autoencoder

In [3]:
## Undercomplete
class autoencoder(nn.Module):
    def __init__(self, input_size, latent_dim):
      super(autoencoder, self).__init__()
      # Step 1 : Define the encoder
      self.encoder = nn.Sequential(
            nn.Linear(input_size, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim)
        )
      # Step 2 : Define the decoder
      self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, input_size),
            nn.Sigmoid()  # Normalizes the output to [0, 1]
        )
      # Step 3 : Initialize the weights (optional)
      self.apply(self.__init_weights)

    def forward(self, x):
      # Step 1: Pass the input through encoder to get latent representation
      latent = self.encoder(x)
      # Step 2: Take latent representation and pass through decoder
      reconstructed = self.decoder(latent)
      return reconstructed

    def encode(self,input):
      #Step 1: Pass the input through the encoder to get latent representation
      return self.encoder(input)

    def __init_weights(self,m):
      #Init the weights (optional)
      if type(m) == nn.Linear:
          torch.nn.init.xavier_uniform_(m.weight)
          m.bias.data.fill_(0.01)

## Define training parameters

```
Step 1: Set training parameters (batch size, learning rate, optimizer, number of epochs, loss function)
Step 2: Create dataset (Randomly generated)
Step 3: Create data loader
Step 4: Define the training loop
```

In [4]:
batchSize = 100
learning_rate = 0.01
num_epochs = 20
sample = torch.randn((batchSize,1,64))
AE = autoencoder(64,5).to(device)
print(AE)

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(AE.parameters(),lr=learning_rate)

# Create a random dataset: 1000 samples of shape (1, 64)
data_loader = DataLoader(TensorDataset(torch.randn((1000,1,64))),batch_size=batchSize,shuffle=True)

autoencoder(
  (encoder): Sequential(
    (0): Linear(in_features=64, out_features=128, bias=True)
    (1): ReLU()
    (2): Linear(in_features=128, out_features=5, bias=True)
  )
  (decoder): Sequential(
    (0): Linear(in_features=5, out_features=128, bias=True)
    (1): ReLU()
    (2): Linear(in_features=128, out_features=64, bias=True)
    (3): Sigmoid()
  )
)


## Training Loop

In [5]:
for epoch in range(num_epochs):
    epoch_loss = 0.0
    for X in data_loader:
        X = X[0].to(device)

        optimizer.zero_grad()
        # forward
        output = AE(X)
        loss = criterion(output, X)

        # backward
        loss.backward()
        optimizer.step()

        epoch_loss += loss.item()

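    # log the loss of the last batch in this epoch
    # (epoch_loss / len(data_loader) would give the epoch average instead)
    print('epoch [{}/{}], loss:{:.4f}'.format(epoch + 1, num_epochs, loss.item()))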

epoch [1/20], loss:1.0053
epoch [2/20], loss:1.0112
epoch [3/20], loss:0.9800
epoch [4/20], loss:0.9516
epoch [5/20], loss:1.0006
epoch [6/20], loss:1.0519
epoch [7/20], loss:0.9141
epoch [8/20], loss:1.0003
epoch [9/20], loss:0.9990
epoch [10/20], loss:0.9892
epoch [11/20], loss:1.0715
epoch [12/20], loss:0.9802
epoch [13/20], loss:0.9971
epoch [14/20], loss:0.9738
epoch [15/20], loss:0.9497
epoch [16/20], loss:0.9772
epoch [17/20], loss:0.9003
epoch [18/20], loss:1.0191
epoch [19/20], loss:0.8944
epoch [20/20], loss:1.0608
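
The loss above stays close to 1.0 instead of decreasing. Two likely reasons: the inputs are independent standard normal noise, which has no low-dimensional structure for a 5-D bottleneck to exploit, and the final Sigmoid restricts outputs to (0, 1) while roughly half of the target values are negative. The sketch below reuses the same layer sizes on a hypothetical dataset that does have 5 underlying factors and lies in (0, 1); the data construction and the variable names are assumptions for illustration only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical structured dataset: 64-D points generated from 5 latent factors,
# squashed into (0, 1) so that a Sigmoid output layer can reach them
factors = torch.randn(1000, 5)
mixing = torch.randn(5, 64)
data = torch.sigmoid(factors @ mixing)

model = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 5),                  # encoder
    nn.Linear(5, 128), nn.ReLU(), nn.Linear(128, 64), nn.Sigmoid(),    # decoder
)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

losses = []
for step in range(200):
    optimizer.zero_grad()
    loss = criterion(model(data), data)   # full-batch reconstruction loss
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

On data like this the reconstruction loss should drop over training, which is the behaviour to look for when checking that an autoencoder is learning.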


## Regularized Autoencoder

Regularized autoencoders use a loss function that encourages the model to have other properties besides the ability to copy its input to its output.

* **Sparse Autoencoders**: impose a sparsity constraint on the latent activations $h$ by adding a regularization term, weighted by $\lambda$, to the reconstruction loss:
$$L(x,\hat{x}) + \lambda \sum_{i}|h_i|$$

  **Regularization form**: the penalty shown here is L1, but other kinds of penalties are possible.


* **Denoising Autoencoders**: autoencoders that are robust to noise. By adding stochastic noise to the input and training the model to reconstruct the clean input, we force the autoencoder to learn more robust features.

## Sparse Autoencoder

**Task**: implement a Sparse Autoencoder for 1D data of your choice

In [6]:
# TODO: implement a Sparse Autoencoder for 1D data of your choice

class SparseAutoencoder(nn.Module):
  def __init__(self, input_size, latent_dim):
    super(SparseAutoencoder, self).__init__()

    # Encoder: Compress input to latent space
    self.encoder = nn.Sequential(
        nn.Linear(input_size, 128),
        nn.ReLU(),
        nn.Linear(128, latent_dim),
        nn.ReLU()
    )

    # Decoder: Reconstruct input from latent space
    self.decoder = nn.Sequential(
        nn.Linear(latent_dim, 128),
        nn.ReLU(),
        nn.Linear(128, input_size),
        nn.Sigmoid()  # Normalize to [0, 1]
    )


  def forward(self, x):
    latent = self.encoder(x)  # Encode input
    reconstructed = self.decoder(latent)  # Decode latent representation
    return reconstructed, latent  # Return both reconstructed output and latent space


## Denoising Autoencoder

**Task**: implement a Denoising Autoencoder for the CIFAR-10 dataset. Choose one of the 10 classes, then define and train the model.

In [7]:
# TODO : implement a Denoising Autoencoder for CIFAR 10 dataset. Choose one class from the 10

class DenoisingAutoencoder(nn.Module):
  def __init__(self, input_size, latent_dim):
    super(DenoisingAutoencoder, self).__init__()

    # Encoder: Compress input to latent space
    self.encoder = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),
        nn.ReLU(),
        nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
        nn.ReLU(),
        nn.Conv2d(128, latent_dim, kernel_size=3, stride=2, padding=1),
        nn.ReLU()
    )

    # Decoder: Reconstruct input from latent space
    self.decoder = nn.Sequential(
        nn.ConvTranspose2d(latent_dim, 128, kernel_size=3, stride=2, padding=1, output_padding=1),
        nn.ReLU(),
        nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2, padding=1, output_padding=1),
        nn.ReLU(),
        nn.ConvTranspose2d(64, 3, kernel_size=3, stride=2, padding=1, output_padding=1),
        nn.Sigmoid()  # Normalize to [0, 1]
    )

  def forward(self, x):
    latent = self.encoder(x)  # Encode input
    reconstructed = self.decoder(latent)  # Decode latent representation
    return reconstructed
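
What makes an autoencoder "denoising" is the training step rather than the architecture: the model receives a corrupted input and is scored against the clean one. The sketch below shows one such step. The helper `add_gaussian_noise`, the noise level, the tiny stand-in model, and the random batch are all assumptions used to keep the example self-contained; the same step should apply to the `DenoisingAutoencoder` class above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def add_gaussian_noise(images, noise_std=0.1):
    """Corrupt images in [0, 1] with Gaussian noise and clip back to [0, 1]."""
    noisy = images + noise_std * torch.randn_like(images)
    return noisy.clamp(0.0, 1.0)

# Tiny stand-in model (hypothetical); any image-to-image autoencoder would do
tiny_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 3, kernel_size=3, padding=1),
    nn.Sigmoid(),
)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(tiny_model.parameters(), lr=1e-3)

clean = torch.rand(16, 3, 32, 32)   # random stand-in for a CIFAR-10 batch
noisy = add_gaussian_noise(clean)   # the model only ever sees this version

optimizer.zero_grad()
output = tiny_model(noisy)
loss = criterion(output, clean)     # the target is the clean image, not the noisy one
loss.backward()
optimizer.step()
```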

### Get Image data (CIFAR 10 dataset)

In [8]:
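# Note: Normalize maps pixel values from [0, 1] to [-1, 1], while the Sigmoid
# output of the decoder above lies in [0, 1]. Drop Normalize or end the decoder
# with nn.Tanh() so that the targets are reachable.
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])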

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
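
The task asks for a single class out of the 10. One way to do that is to collect the indices of the chosen label and wrap the dataset in `torch.utils.data.Subset`. The helper `indices_of_class` and the fake dataset below are assumptions so the sketch runs without downloading anything.

```python
import torch
from torch.utils.data import Subset, TensorDataset

def indices_of_class(targets, class_index):
    """Return the positions of all samples whose label equals class_index."""
    return [i for i, label in enumerate(targets) if int(label) == class_index]

# Fake stand-in for CIFAR-10: 10 random images with hand-picked labels
fake_images = torch.rand(10, 3, 32, 32)
fake_targets = [0, 3, 5, 3, 1, 3, 9, 0, 3, 2]
fake_dataset = TensorDataset(fake_images, torch.tensor(fake_targets))

selected = indices_of_class(fake_targets, class_index=3)
one_class_subset = Subset(fake_dataset, selected)
print(len(one_class_subset))  # 4
```

With the real data, the same helper could be applied as `Subset(trainset, indices_of_class(trainset.targets, 3))`, where index 3 corresponds to the "cat" class in CIFAR-10.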


## Variational Autoencoders

![caption](https://learnopencv.com/wp-content/uploads/2020/11/vae-diagram-1-1024x563.jpg)

In [9]:
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import transforms
from torchvision.utils import save_image

## Get data (MNIST)

In [10]:
# Hyper-parameters
image_size = 784
h_dim = 400
z_dim = 20
num_epochs = 15
batch_size = 128
learning_rate = 1e-3

# MNIST dataset
dataset = torchvision.datasets.MNIST(root='../../data',
                                     train=True,
                                     transform=transforms.ToTensor(),
                                     download=True)

# Data loader
data_loader = torch.utils.data.DataLoader(dataset=dataset,
                                          batch_size=batch_size,
                                          shuffle=True)

## Define VAE

In [11]:
# VAE model
class VAE(nn.Module):
  def __init__(self, image_size=784, h_dim=400, z_dim=20):
    super(VAE, self).__init__()
    # Encoder part
    self.fc1 = nn.Linear(image_size, h_dim)
    self.fc2 = nn.Linear(h_dim, z_dim)
    self.fc3 = nn.Linear(h_dim, z_dim)
    # Decoder part
    self.fc4 = nn.Linear(z_dim, h_dim)
    self.fc5 = nn.Linear(h_dim, image_size)

  def encode(self, x):
    h = F.relu(self.fc1(x))
    return self.fc2(h), self.fc3(h)

  def reparameterize(self, mu, log_var):
    std = torch.exp(log_var/2)
    eps = torch.randn_like(std)
    return mu + eps * std

  def decode(self, z):
    h = F.relu(self.fc4(z))
    return torch.sigmoid(self.fc5(h))

  def forward(self, x):
    mu, log_var = self.encode(x)
    z = self.reparameterize(mu, log_var)
    x_reconst = self.decode(z)
    return x_reconst, mu, log_var
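
The `reparameterize` method is the reparametrisation trick listed in the lab contents: instead of sampling $z \sim \mathcal{N}(\mu, \sigma^2)$ directly, it samples $\epsilon \sim \mathcal{N}(0, I)$ and computes $z = \mu + \sigma \cdot \epsilon$, so the randomness sits outside the path that gradients follow. The standalone sketch below repeats that function and checks that gradients reach both `mu` and `log_var`; the tensor shapes are arbitrary choices for illustration.

```python
import torch

torch.manual_seed(0)

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    std = torch.exp(0.5 * log_var)   # sigma = exp(log(sigma^2) / 2)
    eps = torch.randn_like(std)      # noise does not depend on the parameters
    return mu + eps * std

mu = torch.zeros(4, 20, requires_grad=True)
log_var = torch.zeros(4, 20, requires_grad=True)

z = reparameterize(mu, log_var)
z.sum().backward()

# Both encoder outputs receive gradients through the sampled z
print(mu.grad.shape, log_var.grad.shape)
```

Here `mu.grad` is all ones because $\partial z / \partial \mu = 1$, while `log_var.grad` depends on the sampled noise.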

### Train Autoencoder

In [12]:
model = VAE(image_size, h_dim, z_dim).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

In [13]:
# Start training
mse_loss = nn.MSELoss()
for epoch in range(num_epochs):
    for i, (x, _) in enumerate(data_loader):
        # Forward pass
        x = x.to(device).view(-1, image_size)
        x_reconst, mu, log_var = model(x)

        # Compute reconstruction loss and kl divergence
        reconst_loss = mse_loss(x_reconst, x)
        kl_div = - 0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())

        # Backprop and optimize
        loss = reconst_loss + kl_div
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 10 == 0:
            print ("Epoch[{}/{}], Step [{}/{}], Reconst Loss: {:.4f}"
                   .format(epoch+1, num_epochs, i+1, len(data_loader), reconst_loss.item()))

    with torch.no_grad():
        # Save the sampled images
        z = torch.randn(batch_size, z_dim).to(device)
        out = model.decode(z).view(-1, 1, 28, 28)
        save_image(out,'./sampled-{}.png'.format(epoch+1))

        # Save the reconstructed images
        out, _, _ = model(x)
        x_concat = torch.cat([x.view(-1, 1, 28, 28), out.view(-1, 1, 28, 28)], dim=3)
        save_image(x_concat, './reconst-{}.png'.format(epoch+1))

Epoch[1/15], Step [10/469], Reconst Loss: 0.1220
Epoch[1/15], Step [20/469], Reconst Loss: 0.0863
Epoch[1/15], Step [30/469], Reconst Loss: 0.0700
Epoch[1/15], Step [40/469], Reconst Loss: 0.0705
Epoch[1/15], Step [50/469], Reconst Loss: 0.0682
Epoch[1/15], Step [60/469], Reconst Loss: 0.0699
Epoch[1/15], Step [70/469], Reconst Loss: 0.0669
Epoch[1/15], Step [80/469], Reconst Loss: 0.0659
Epoch[1/15], Step [90/469], Reconst Loss: 0.0714
Epoch[1/15], Step [100/469], Reconst Loss: 0.0717
Epoch[1/15], Step [110/469], Reconst Loss: 0.0661
Epoch[1/15], Step [120/469], Reconst Loss: 0.0691
Epoch[1/15], Step [130/469], Reconst Loss: 0.0698
Epoch[1/15], Step [140/469], Reconst Loss: 0.0683
Epoch[1/15], Step [150/469], Reconst Loss: 0.0680
Epoch[1/15], Step [160/469], Reconst Loss: 0.0689
Epoch[1/15], Step [170/469], Reconst Loss: 0.0687
Epoch[1/15], Step [180/469], Reconst Loss: 0.0682
Epoch[1/15], Step [190/469], Reconst Loss: 0.0691
Epoch[1/15], Step [200/469], Reconst Loss: 0.0703
Epoch[1/1
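
One thing to watch in the loop above: `nn.MSELoss()` averages over every pixel by default, while the KL term is summed over the batch and all latent dimensions, so the two terms are on very different scales and the KL term may dominate. A reconstruction loss that barely moves, as in the log, would be consistent with that. The sketch below shows one possible alternative that sums both terms and divides by the batch size. The function name `vae_loss` and this particular normalisation are assumptions, not the lab's prescribed loss.

```python
import torch
import torch.nn.functional as F

def vae_loss(x_reconst, x, mu, log_var):
    """Per-sample reconstruction + KL, with both terms reduced the same way."""
    batch_size = x.size(0)
    # Sum squared errors over all pixels instead of averaging them
    reconst = F.mse_loss(x_reconst, x, reduction='sum') / batch_size
    # KL divergence between N(mu, sigma^2) and N(0, I), same formula as above
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp()) / batch_size
    return reconst + kl, reconst, kl

# Toy check with a perfect reconstruction and mu = 1, log_var = 0
x = torch.rand(8, 784)
mu = torch.ones(8, 20)
log_var = torch.zeros(8, 20)
total, reconst, kl = vae_loss(x.clone(), x, mu, log_var)
print(total.item(), reconst.item(), kl.item())
```

In the toy check the KL term reduces to $0.5 \sum \mu^2 = 0.5 \cdot 20 = 10$ per sample, which makes the relative scale of the two terms easy to inspect.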

**Task :** Add tensorboard to log the encoder loss and weights

In [None]:
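# Log the training loss, encoder weights, and reconstructions to TensorBoard
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter('./logs')

for epoch in range(num_epochs):
    for x, _ in data_loader:
        # Same training step as in the loop above
        x = x.to(device).view(-1, image_size)
        x_reconst, mu, log_var = model(x)
        reconst_loss = mse_loss(x_reconst, x)
        kl_div = - 0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
        loss = reconst_loss + kl_div
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Log the loss of the last batch after each epoch
    writer.add_scalar('Loss/train', loss.item(), epoch)

    # Log the encoder weights (fc1, fc2, fc3) as histograms
    for name in ('fc1', 'fc2', 'fc3'):
        writer.add_histogram('Encoder/' + name, getattr(model, name).weight, epoch)

    # Add reconstructed images from the last batch
    grid = torchvision.utils.make_grid(x_reconst.detach().view(-1, 1, 28, 28))
    writer.add_image('Reconstructed Images', grid, epoch)

writer.close()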


## Resources

* [Auto-Encoding Variational Bayes](https://arxiv.org/pdf/1312.6114.pdf)
* [Variational inference: A review for statisticians](https://arxiv.org/pdf/1601.00670.pdf)
* [Tutorial on variational autoencoders](https://arxiv.org/pdf/1606.05908.pdf)
* [Stochastic Backpropagation and Approximate Inference in Deep Generative Models](https://arxiv.org/pdf/1401.4082.pdf)