### Importing Libraries

In [1]:
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from torchvision import transforms
from torchvision.utils import save_image

#### Defining Hyperparameters

**Hyperparameters in Variational Autoencoders:** Hyperparameters are configuration values set before training a Variational Autoencoder (VAE) model. They determine the model's architecture and how it is trained.

**Role of Hyperparameters:** Hyperparameter values affect the model's ability to learn and generalize. The number of hidden layers, the number of neurons per layer, and the dimension of the latent space determine the model's capacity to represent complex patterns in the data. The learning rate, batch size, activation functions, and optimization algorithm determine how efficiently training proceeds and how well it converges.

**Impact of Hyperparameters:** Well-tuned hyperparameters improve performance and the quality of the learned representation. Poorly chosen hyperparameters can lead to underfitting, overfitting, slow convergence, or outright failure to train. Understanding their role and choosing them carefully is therefore essential for good results with Variational Autoencoders.

In [2]:
# Define hyperparameters
image_size = 784
hidden_dim = 400
latent_dim = 20
batch_size = 128
epochs = 10

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='../../data',
                                           train=True,
                                           transform=transforms.ToTensor(),
                                           download=True)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_dataset = torchvision.datasets.MNIST(root='../../data',
                                          train=False,
                                          transform=transforms.ToTensor())

# No shuffling for evaluation, so every epoch reconstructs the same first test batch
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

# Create the directory for reconstructed and sampled images if it does not exist
sample_dir = 'results'
os.makedirs(sample_dir, exist_ok=True)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 503: Service Unavailable

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to ../../data/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:00<00:00, 16397628.39it/s]


Extracting ../../data/MNIST/raw/train-images-idx3-ubyte.gz to ../../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 503: Service Unavailable

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to ../../data/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 513868.21it/s]


Extracting ../../data/MNIST/raw/train-labels-idx1-ubyte.gz to ../../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 503: Service Unavailable

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to ../../data/MNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:00<00:00, 4469853.48it/s]


Extracting ../../data/MNIST/raw/t10k-images-idx3-ubyte.gz to ../../data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 503: Service Unavailable

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to ../../data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 5190879.77it/s]

Extracting ../../data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ../../data/MNIST/raw






### Variational Autoencoder Model

**Autoencoder models** are a type of neural network designed primarily for unsupervised learning. They consist of two main components: an encoder and a decoder. The primary goal of an Autoencoder is to learn efficient representations (compress data) by reconstructing the original inputs from the encoded versions.

The encoder maps the input data into a lower-dimensional latent space called **the bottleneck or encoding layer**. It compresses the information present in the input data while retaining important features. The decoder then takes the compressed representation from the encoder and attempts to reconstruct the original input.

Autoencoders have several applications, including **dimensionality reduction, denoising data, feature extraction, anomaly detection, and generative modeling**.

In dimensionality reduction, the encoder learns a compact representation of the input data, reducing the number of features without losing significant information. Denoising involves removing noise from corrupted data by learning clean representations. Feature extraction refers to extracting meaningful features from raw data, allowing easier interpretation and further use in supervised learning tasks. Anomaly detection identifies unusual data points based on differences between the original input and the reconstruction provided by the Autoencoder. Lastly, generative modeling creates new samples similar to existing ones by feeding random latent vectors to the decoder and letting it generate outputs.

Variational Autoencoders (VAEs) extend traditional Autoencoders by making the latent space probabilistic. Instead of mapping each input to a single point, the encoder outputs the parameters of a distribution over latent vectors (in this notebook, the mean and log-variance of a Gaussian), and the decoder reconstructs the input from a sample drawn from that distribution. A regularization term in the loss keeps these distributions close to a standard normal prior, so new data can be generated by sampling from the prior and decoding the sample. This makes VAEs useful for modeling the complex, nonlinear distributions found in real-world data.
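
The stochastic step that distinguishes a VAE from a plain autoencoder can be sketched without PyTorch. The snippet below is a minimal standard-library illustration for a single latent dimension, assuming the encoder has already produced a mean `mu` and a log-variance `logvar`. The function name `reparameterize_scalar` and the example values are hypothetical and chosen only for this sketch.

```python
import math
import random
import statistics


def reparameterize_scalar(mu, logvar, rng):
    """Draw z ~ N(mu, exp(logvar)) as mu + eps * std with eps ~ N(0, 1)."""
    std = math.exp(logvar / 2)   # log-variance -> standard deviation
    eps = rng.gauss(0.0, 1.0)    # noise that does not depend on mu or logvar
    return mu + eps * std


rng = random.Random(0)             # fixed seed so the run is repeatable
mu, logvar = 1.5, math.log(0.25)   # hypothetical encoder outputs (std = 0.5)
samples = [reparameterize_scalar(mu, logvar, rng) for _ in range(20000)]

sample_mean = statistics.fmean(samples)
sample_std = statistics.pstdev(samples)
print(f"sample mean = {sample_mean:.2f}, sample std = {sample_std:.2f}")
```

Because the randomness enters only through `eps`, each sample is a differentiable function of `mu` and `logvar`, which is what allows gradients to reach the encoder during training.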

Autoencoders are a versatile neural network architecture. Here are some key application areas, each with an illustrative scenario of how a large technology company could apply them:

1. **Anomaly Detection**

   - **Concept:** Autoencoders are trained to reconstruct their input data. When a data point has a large reconstruction error (i.e., the recreated version deviates heavily from the original), it is likely an anomaly.
   - **Example scenario (Netflix):** A streaming service such as Netflix could use autoencoders to detect anomalies in user viewing patterns. If a user's viewing habits suddenly deviate from their usual preferences, this could indicate fraudulent account access and help identify potential security issues.

2. **Data Compression**

   - **Concept:** Autoencoders can learn a compressed representation of the input data while retaining essential features. This compressed data can be used for storage optimization or faster transmission.
   - **Example scenario (Facebook):** A platform such as Facebook could use autoencoders to compress uploaded images and videos, storing and transmitting them more efficiently, which reduces infrastructure costs and shortens loading times.

3. **Feature Extraction**

   - **Concept:** The bottleneck layer of an autoencoder, where the compressed representation resides, often captures the most critical features of the input data. This makes it valuable for feature extraction in other machine learning models.
   - **Example scenario (Amazon):** Amazon might use autoencoders to extract key features from product images on its platform. These features can then be fed into recommendation algorithms to suggest relevant products to customers based on their past purchases or browsing behavior.

4. **Image Denoising**

   - **Concept:** Autoencoders can be trained to remove noise from images while preserving the underlying structure. This is achieved by learning a mapping from noisy images to their clean counterparts.
   - **Example scenario (Google Photos):** Google Photos could potentially use autoencoders to improve the quality of user-uploaded images. Removing noise caused by compression or camera limitations can enhance the visual clarity of stored photos.

5. **Generative Tasks**

   - **Concept:** Variational Autoencoders (VAEs) are a type of autoencoder that can generate new data similar to the training data. This has applications in areas like image generation or product recommendation.
   - **Example scenario (Pinterest):** Pinterest could potentially use VAEs to generate new image recommendations for users based on their browsing history and saved pins, personalizing the experience by suggesting visually interesting content.

These are just a few examples. As machine learning models become more complex and data volumes continue to grow, autoencoders are likely to remain useful across a range of data processing tasks.
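
The anomaly detection idea from the list above (flag inputs whose reconstruction error is unusually large) can be sketched in a few lines. This is a minimal standard-library illustration, not a description of any company's system: the helper names `reconstruction_error` and `fit_threshold`, the mean-plus-three-standard-deviations rule, and all numbers are assumptions made for this example.

```python
import statistics


def reconstruction_error(original, reconstruction):
    """Mean squared error between one input and its reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstruction)) / len(original)


def fit_threshold(normal_errors, k=3.0):
    """Threshold = mean + k * std of the errors seen on normal data."""
    return statistics.fmean(normal_errors) + k * statistics.pstdev(normal_errors)


# Hypothetical reconstruction errors measured on data known to be normal
normal_errors = [0.10, 0.12, 0.09, 0.11, 0.10]
threshold = fit_threshold(normal_errors)

# Hypothetical new inputs paired with the reconstructions a trained model returned
new_pairs = [
    ([0.0, 1.0, 0.0, 1.0], [0.1, 0.8, 0.2, 0.7]),  # close reconstruction
    ([0.0, 1.0, 0.0, 1.0], [0.9, 0.1, 0.9, 0.2]),  # poor reconstruction
]
new_errors = [reconstruction_error(x, x_hat) for x, x_hat in new_pairs]
flags = [err > threshold for err in new_errors]
print(flags)  # prints [False, True]
```

Calibrating the threshold on errors from known-normal data, rather than on the batch being scored, keeps a single extreme outlier from inflating the threshold that is supposed to catch it.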



In [3]:
class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()

        self.fc1 = nn.Linear(image_size, hidden_dim)
        self.fc2_mean = nn.Linear(hidden_dim, latent_dim)
        self.fc2_logvar = nn.Linear(hidden_dim, latent_dim)
        self.fc3 = nn.Linear(latent_dim, hidden_dim)
        self.fc4 = nn.Linear(hidden_dim, image_size)

    def encode(self, x):
        h = F.relu(self.fc1(x))
        mu = self.fc2_mean(h)
        log_var = self.fc2_logvar(h)
        return mu, log_var

    def reparameterize(self, mu, logvar):
        std = torch.exp(logvar/2)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        h = F.relu(self.fc3(z))
        out = torch.sigmoid(self.fc4(h))
        return out

    def forward(self, x):
        # x: (batch_size, 1, 28,28) --> (batch_size, 784)
        mu, logvar = self.encode(x.view(-1, image_size))
        z = self.reparameterize(mu, logvar)
        reconstructed = self.decode(z)
        return reconstructed, mu, logvar

# Define model and optimizer
model = VAE().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
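
To make the link between hyperparameters and model size concrete, the parameter count of the five linear layers defined above can be worked out by hand. The sketch below uses only the standard library and the values from the hyperparameter cell; the helper `linear_params` is a hypothetical name introduced for this example, and the figure of 60,000 training images is the size of the standard MNIST training split.

```python
import math


def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


image_size, hidden_dim, latent_dim = 784, 400, 20  # values from the hyperparameter cell
batch_size, train_images = 128, 60000              # MNIST training split size

layer_params = {
    "fc1": linear_params(image_size, hidden_dim),
    "fc2_mean": linear_params(hidden_dim, latent_dim),
    "fc2_logvar": linear_params(hidden_dim, latent_dim),
    "fc3": linear_params(latent_dim, hidden_dim),
    "fc4": linear_params(hidden_dim, image_size),
}
total_params = sum(layer_params.values())
batches_per_epoch = math.ceil(train_images / batch_size)

print(layer_params)
print("total parameters:", total_params)        # 652824
print("batches per epoch:", batches_per_epoch)  # 469
```

Almost all parameters sit in `fc1` and `fc4`, so `hidden_dim` dominates model size while `latent_dim` mainly controls how tight the bottleneck is. The 469 batches per epoch match the batch counter printed in the training log.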


In [4]:
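# Define the loss: reconstruction term plus KL divergence (the negative ELBO)
def loss_function(reconstructed_image, original_image, mu, logvar):
    # Reconstruction term: binary cross-entropy summed over all pixels and all images in the batch
    bce = F.binary_cross_entropy(reconstructed_image, original_image.view(-1, image_size), reduction='sum')
    # KL divergence between N(mu, exp(logvar)) and the standard normal prior N(0, I),
    # in closed form, summed over latent dimensions and over the batch
    kld = 0.5 * torch.sum(logvar.exp() + mu.pow(2) - 1 - logvar)
    return bce + kld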
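
The closed-form KL term used in `loss_function` can be sanity-checked numerically. The sketch below is a standard-library illustration for a single latent dimension: it compares the closed form against a Monte Carlo average of `log q(z) - log p(z)` over samples from `q`. The function names and the values of `mu` and `logvar` are hypothetical and chosen only for this check.

```python
import math
import random


def kl_closed_form(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, 1) ) for one latent dimension."""
    return 0.5 * (math.exp(logvar) + mu ** 2 - 1.0 - logvar)


def kl_monte_carlo(mu, logvar, n_samples, rng):
    """Estimate the same KL as the average of log q(z) - log p(z) over z ~ q."""
    std = math.exp(logvar / 2)
    total = 0.0
    for _ in range(n_samples):
        z = mu + std * rng.gauss(0.0, 1.0)
        # The -0.5 * log(2 * pi) terms of both densities cancel in the difference
        log_q = -0.5 * logvar - (z - mu) ** 2 / (2.0 * math.exp(logvar))
        log_p = -0.5 * z ** 2
        total += log_q - log_p
    return total / n_samples


rng = random.Random(0)   # fixed seed so the run is repeatable
mu, logvar = 0.8, -1.0   # hypothetical encoder outputs
exact = kl_closed_form(mu, logvar)
estimate = kl_monte_carlo(mu, logvar, 100_000, rng)
print(f"closed form = {exact:.4f}, Monte Carlo = {estimate:.4f}")
```

The two numbers should agree to about two decimal places. The closed form is zero exactly when `mu = 0` and `logvar = 0`, that is, when the encoder's distribution already equals the prior.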


In [5]:
# Train function
def train(epoch):
    model.train()
    train_loss = 0
    for i, (images, _) in enumerate(train_loader):
        images = images.to(device)
        reconstructed, mu, logvar = model(images)
        loss = loss_function(reconstructed, images, mu, logvar)
        optimizer.zero_grad()
        loss.backward()
        train_loss += loss.item()
        optimizer.step()

        if i % 100 == 0:
            print("Train Epoch {} [Batch {}/{}]\tLoss: {:.3f}".format(epoch, i, len(train_loader), loss.item()/len(images)))

    print('=====> Epoch {}, Average Loss: {:.3f}'.format(epoch, train_loss/len(train_loader.dataset)))



In [6]:

# Test function
def test(epoch):
    model.eval()
    test_loss = 0
    with torch.no_grad():
        for batch_idx, (images, _) in enumerate(test_loader):
            images = images.to(device)
            reconstructed, mu, logvar = model(images)
            test_loss += loss_function(reconstructed, images, mu, logvar).item()
            if batch_idx == 0:
                # Use -1 so the reshape also works when a batch is smaller than batch_size
                comparison = torch.cat([images[:5], reconstructed.view(-1, 1, 28, 28)[:5]])
                save_image(comparison.cpu(), os.path.join(sample_dir, 'reconstruction_' + str(epoch) + '.png'), nrow=5)

    print('=====> Average Test Loss: {:.3f}'.format(test_loss/len(test_loader.dataset)))

In [7]:

# Main function
for epoch in range(1, epochs + 1):
    train(epoch)
    test(epoch)
    with torch.no_grad():
        # Skip the encoder: sample z from the standard Gaussian prior and feed it to the decoder to generate samples
        sample = torch.randn(64, latent_dim).to(device)
        generated = model.decode(sample).cpu()
        save_image(generated.view(64, 1, 28, 28), os.path.join(sample_dir, 'sample_' + str(epoch) + '.png'))

Train Epoch 1 [Batch 0/469]	Loss: 547.280
Train Epoch 1 [Batch 100/469]	Loss: 183.040
Train Epoch 1 [Batch 200/469]	Loss: 151.355
Train Epoch 1 [Batch 300/469]	Loss: 137.569
Train Epoch 1 [Batch 400/469]	Loss: 134.028
=====> Epoch 1, Average Loss: 163.875
=====> Average Test Loss: 128.640
Train Epoch 2 [Batch 0/469]	Loss: 128.948
Train Epoch 2 [Batch 100/469]	Loss: 120.071
Train Epoch 2 [Batch 200/469]	Loss: 125.021
Train Epoch 2 [Batch 300/469]	Loss: 120.767
Train Epoch 2 [Batch 400/469]	Loss: 121.096
=====> Epoch 2, Average Loss: 122.219
=====> Average Test Loss: 116.592
Train Epoch 3 [Batch 0/469]	Loss: 119.334
Train Epoch 3 [Batch 100/469]	Loss: 115.349
Train Epoch 3 [Batch 200/469]	Loss: 115.099
Train Epoch 3 [Batch 300/469]	Loss: 113.851
Train Epoch 3 [Batch 400/469]	Loss: 112.155
=====> Epoch 3, Average Loss: 114.947
=====> Average Test Loss: 112.369
Train Epoch 4 [Batch 0/469]	Loss: 115.396
Train Epoch 4 [Batch 100/469]	Loss: 110.299
Train Epoch 4 [Batch 200/469]	Loss: 114.632
