<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Variational AutoEncoders (VAE) in Generative AI
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial'><b>Introduction</b></p>

<p style = 'font-size:16px;font-family:Arial'>A Variational Autoencoder (VAE) is a type of machine learning model designed to generate new data similar to what it has learned. Unlike traditional autoencoders, which compress data into fixed representations, VAEs encode data into probability distributions (mean and standard deviation). This means that instead of mapping an input to a single point, VAEs create a range of possible values and sample from this range to generate outputs.<br> This approach introduces randomness (stochasticity), allowing the model to create diverse and realistic outputs from the same input. This makes VAEs particularly useful for applications like image generation, data synthesis, and anomaly detection, where variety and flexibility are important.<br>
<ol style = 'font-size:16px;font-family:Arial'>Architecture of VAE:
<li>VAEs employ an encoder-decoder architecture, where the encoder transforms input data into a probability distribution in the latent space.</li>
<li>The latent code is a probabilistic encoding representing a distribution of potential representations, not just a single point.</li>
<li>The decoder reconstructs data from a sampled point in the latent distribution, refining parameters to minimize training reconstruction loss.</li>
<li >Training balances reconstruction loss and a regularization term, like Kullback-Leibler divergence, to shape the latent space distribution.</li>
<li>Iteratively adjusting the parameters during training yields a latent space representation from which the data can be reconstructed accurately.</li>
    <center><img src="images/vae.png" alt="VAE"></center>
</ol>
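
<p style = 'font-size:16px;font-family:Arial'>The sampling step in the architecture above can be illustrated with a minimal sketch in plain Python. It assumes the encoder has already produced a mean and a log-variance for each latent dimension; the values of mu and logvar below are made up for illustration, and the helper name reparameterize is our own choice. The sketch draws z = mu + sigma * eps with eps taken from a standard normal distribution.</p>

```python
import math
import random

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element by element."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

rng = random.Random(0)   # fixed seed so the run is repeatable
mu = [0.5, -1.0]         # hypothetical encoder output: means
logvar = [0.0, -2.0]     # hypothetical encoder output: log-variances

z_a = reparameterize(mu, logvar, rng)
z_b = reparameterize(mu, logvar, rng)
print(z_a)
print(z_b)  # differs from z_a although mu and logvar are unchanged
```

<p style = 'font-size:16px;font-family:Arial'>Two draws from the same distribution give two different latent vectors, which is where the variety in the generated outputs comes from.</p>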
<ol style = 'font-size:16px;font-family:Arial'>Use cases of VAE:
<li><b>Image Generation</b>: Generate new faces, fashion images, artwork, etc.</li>
<li><b>Anomaly Detection</b>: Identify unusual patterns in data (e.g., fraud detection).</li>
<li><b>Video Prediction</b>: Generate future frames based on past frames.               </li>
<li><b>Data Compression</b>: Compress images, audio, and text efficiently.              </li>
<li><b>Drug Discovery</b>: Generate new molecular structures in pharmaceuticals.        </li>
</ol>

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial'>1. Configuring the environment</b>

<p style = 'font-size:18px;font-family:Arial'><b>1.1 Install the required libraries</b></p>

In [None]:
%%capture
!pip install torchvision

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial'><b>Note: </b><i>Please restart the kernel after executing these two lines. The simplest way to restart the Kernel is by typing zero zero: <b> 0 0</b></i></p>
</div>

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial'><b>1.2 Import the required libraries</b></p>

<p style = 'font-size:16px;font-family:Arial'>Here, we import the required libraries, set environment variables and environment paths (if required).</p>

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
import matplotlib.pyplot as plt
import os
from PIL import Image

In [None]:
from teradataml import *

# Modify the following to match the specific client environment settings
display.max_rows = 5

<p style = 'font-size:18px;font-family:Arial'><b>1.3 Connect to Vantage</b></p>
<p style = 'font-size:16px;font-family:Arial'>When prompted, enter the password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../../UseCases/startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)
execute_sql('''SET query_band='DEMO=VAE_Python.ipynb;' UPDATE FOR SESSION;''')

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial'>2. Load Image Dataset</b>

<p style = 'font-size:18px;font-family:Arial'><b>2.1 Setting device if GPU is available</b></p>

In [None]:
# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

<p style = 'font-size:18px;font-family:Arial'><b>2.2 Loading the Dataset</b></p>

In [None]:
# Custom Dataset Class
class XrayDataset(Dataset):
    def __init__(self, root, transform=None):
        self.root = root
        self.transform = transform
        self.images = [os.path.join(root, fname) for fname in os.listdir(root) if self.is_image_file(fname)]
        print(f"Found {len(self.images)} images in {root}")

    def is_image_file(self, filename):
        # Match on the file extension, ignoring case (e.g. '.PNG' or '.JPG')
        return filename.lower().endswith(('.jpeg', '.jpg', '.png'))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        img_path = self.images[index]
        with open(img_path, 'rb') as f:
            img = Image.open(f).convert('L')  # Convert to grayscale
        if self.transform:
            img = self.transform(img)
        return img, 0  # Return a dummy label since we are not using labels

In [None]:
# Data loading
data_dir = 'xrays'
transform = transforms.Compose([
    transforms.Resize((128, 128)),  # Resize images
    transforms.ToTensor(),  # Convert to tensor
    # Keep pixel values in range [0, 1]
])

dataset = XrayDataset(root=data_dir, transform=transform)
train_loader = DataLoader(dataset, batch_size=64, shuffle=True)
print(f"Data loader created with {len(train_loader)} batches")

<hr style='height:2px;border:none;background-color:#00233C;'>
<p style = 'font-size:20px;font-family:Arial'><b>3. Define the VAE model</b></p>
<p style = 'font-size:16px;font-family:Arial'>
A VAE consists of two parts:<br>
Encoder: Compresses the image into a latent space.<br>
Decoder: Reconstructs the image from the latent space.</p>


In [None]:
# Define the VAE model
class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        self.fc1 = nn.Linear(128*128, 400)
        self.fc21 = nn.Linear(400, 20)  # Mean
        self.fc22 = nn.Linear(400, 20)  # Log variance
        self.fc3 = nn.Linear(20, 400)
        self.fc4 = nn.Linear(400, 128*128)

    def encode(self, x):
        h1 = F.relu(self.fc1(x))
        return self.fc21(h1), self.fc22(h1)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5*logvar)
        eps = torch.randn_like(std)
        return mu + eps*std

    def decode(self, z):
        h3 = F.relu(self.fc3(z))
        return torch.sigmoid(self.fc4(h3))

    def forward(self, x):
        mu, logvar = self.encode(x.view(-1, 128*128))
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar
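
<p style = 'font-size:16px;font-family:Arial'>To get a feel for the size of the fully connected VAE defined above, its trainable parameters can be counted by hand. The sketch below assumes each nn.Linear(n_in, n_out) layer holds n_in * n_out weights plus n_out biases (the PyTorch default). The helper names linear_params and vae_param_count are illustrative and not part of the notebook.</p>

```python
def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

def vae_param_count(input_dim, hidden_dim, latent_dim):
    """Total parameters of the five-layer VAE layout used above."""
    return (
        linear_params(input_dim, hidden_dim)           # fc1
        + 2 * linear_params(hidden_dim, latent_dim)    # fc21 (mean) and fc22 (log variance)
        + linear_params(latent_dim, hidden_dim)        # fc3
        + linear_params(hidden_dim, input_dim)         # fc4
    )

small = vae_param_count(128 * 128, 400, 20)
print(f"{small:,}")  # 13,148,424
```

<p style = 'font-size:16px;font-family:Arial'>Almost all of these parameters sit in the first and last layers, because they connect to every pixel of the 128 x 128 image.</p>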

<hr style='height:2px;border:none;background-color:#00233C;'>
<p style = 'font-size:20px;font-family:Arial'><b>4. Define Loss Function</b></p>
<p style = 'font-size:16px;font-family:Arial'>
The loss has two components:<br>
Reconstruction Loss (MSE or BCE) → Ensures output image is similar to input.<br>
Kullback-Leibler (KL) Divergence → Ensures meaningful latent space.</p>

In [None]:
# Reconstruction loss + KL divergence loss
def loss_function(recon_x, x, mu, logvar):
    BCE = F.binary_cross_entropy(recon_x, x.view(-1, 128*128), reduction='sum')
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return BCE + KLD
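
<p style = 'font-size:16px;font-family:Arial'>The KL term above is the closed-form divergence between a diagonal Gaussian N(mu, exp(logvar)) and the standard normal N(0, 1). The following plain-Python sketch evaluates the same expression, rearranged algebraically, for two hand-picked cases; the function name kl_to_standard_normal is our own.</p>

```python
import math

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, 1) ), summed over latent dimensions."""
    # Same quantity as -0.5 * sum(1 + logvar - mu^2 - exp(logvar))
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))

print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0: already a standard normal
print(kl_to_standard_normal([1.0], [0.0]))            # 0.5: mean shifted by one
```

<p style = 'font-size:16px;font-family:Arial'>The penalty is zero only when the encoder outputs a standard normal, and it grows as the mean or variance moves away from it.</p>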

<hr style='height:2px;border:none;background-color:#00233C;'>
<p style = 'font-size:20px;font-family:Arial'><b>5. Train the model</b></p>
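<p style = 'font-size:16px;font-family:Arial'>
We train the VAE for 10 epochs with the Adam optimizer (learning rate 1e-3).<br>
For each batch, the model reconstructs the input, the combined loss (reconstruction + KL divergence) is computed, and the weights are updated by backpropagation.<br>
After each epoch, the average loss per image is printed.</p>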

In [None]:
# Training loop
vae = VAE().to(device)
optimizer = torch.optim.Adam(vae.parameters(), lr=1e-3)

epochs = 10
for epoch in range(epochs):
    vae.train()
    train_loss = 0
    for batch_idx, (data, _) in enumerate(train_loader):
        data = data.to(device)
        optimizer.zero_grad()
        recon_batch, mu, logvar = vae(data)
        
        # Ensure data is in the correct range
        data = torch.clamp(data, 0, 1)
        
        loss = loss_function(recon_batch, data, mu, logvar)
        loss.backward()
        train_loss += loss.item()
        optimizer.step()

    print(f'Epoch {epoch + 1}, Loss: {train_loss / len(train_loader.dataset)}')
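
<p style = 'font-size:16px;font-family:Arial'>The printed loss is summed over all pixels and divided by the number of images, so its values are large. As a rough reference point (an assumption for orientation, not a property of the trained model), the sketch below computes the reconstruction loss of a decoder that outputs 0.5 for every pixel. It covers only the BCE part of the loss and ignores the KL term.</p>

```python
import math

def uninformative_bce_per_image(num_pixels):
    """BCE summed over one image when every pixel is predicted as 0.5."""
    # For any target t in [0, 1]: -(t*ln(0.5) + (1 - t)*ln(0.5)) = ln(2)
    return num_pixels * math.log(2)

baseline = uninformative_bce_per_image(128 * 128)
print(f"Per-image BCE of a constant 0.5 prediction: {baseline:.1f}")
```

<p style = 'font-size:16px;font-family:Arial'>For 128 x 128 images this is about 11,357, so a per-image loss well below that value suggests the decoder has learned more than a flat gray image.</p>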

<hr style='height:2px;border:none;background-color:#00233C;'>
<p style = 'font-size:20px;font-family:Arial'><b>6. Generate new images</b></p>
<p style = 'font-size:16px;font-family:Arial'>
Now that we have trained our VAE Model, we can use this to create new images.</p>

In [None]:
# Generate new X-ray images
vae.eval()
with torch.no_grad():
    z = torch.randn(64, 20).to(device)
    sample = vae.decode(z).cpu()
    sample = sample.view(64, 1, 128, 128)

    # Plot the generated images
    fig, axes = plt.subplots(8, 8, figsize=(10, 10))
    for i, ax in enumerate(axes.flatten()):
        ax.imshow(sample[i].squeeze(), cmap='gray')
        ax.axis('off')
    plt.show()
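
<p style = 'font-size:16px;font-family:Arial'>Besides sampling random latent vectors, one common way to explore a trained VAE is to move along a straight line between two latent vectors. The sketch below only builds the latent vectors; the helper interpolate_latents is hypothetical and not part of the notebook, and the latent size of 20 is taken from the model above.</p>

```python
import torch

def interpolate_latents(z_start, z_end, steps):
    """Return `steps` latent vectors evenly spaced on the line from z_start to z_end."""
    weights = torch.linspace(0.0, 1.0, steps).unsqueeze(1)  # shape (steps, 1)
    return z_start + weights * (z_end - z_start)            # shape (steps, latent_dim)

torch.manual_seed(0)
z_start = torch.randn(1, 20)  # 20 matches the latent size of the VAE above
z_end = torch.randn(1, 20)
path = interpolate_latents(z_start, z_end, steps=8)
print(path.shape)  # torch.Size([8, 20])
```

<p style = 'font-size:16px;font-family:Arial'>Passing path.to(device) to vae.decode would then give eight images that should change gradually from the first sample to the second.</p>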

<p style = 'font-size:16px;font-family:Arial'>
The next step is to iterate on the image size and loss function to enhance the quality of the generated images. The code below repeats the steps above while changing the image size, the loss function, and the training loop, so that we can compare the generated images with the earlier ones.</p>

In [None]:
# Data loading and transformation
data_dir = 'xrays'
transform = transforms.Compose([
    transforms.Resize((256, 256)),  # Increase image resolution
    transforms.ToTensor(),  # Convert to tensor
    # Keep pixel values in [0, 1]: the sigmoid decoder output and the BCE loss
    # both expect that range, so no Normalize to [-1, 1] is applied here
])

dataset = XrayDataset(root=data_dir, transform=transform)
train_loader = DataLoader(dataset, batch_size=64, shuffle=True)
print(f"Data loader created with {len(train_loader)} batches")

# Define the VAE model
class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        self.fc1 = nn.Linear(256*256, 512)
        self.fc21 = nn.Linear(512, 100)  # Mean
        self.fc22 = nn.Linear(512, 100)  # Log variance
        self.fc3 = nn.Linear(100, 512)
        self.fc4 = nn.Linear(512, 256*256)

    def encode(self, x):
        h1 = F.relu(self.fc1(x))
        return self.fc21(h1), self.fc22(h1)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5*logvar)
        eps = torch.randn_like(std)
        return mu + eps*std

    def decode(self, z):
        h3 = F.relu(self.fc3(z))
        return torch.sigmoid(self.fc4(h3))

    def forward(self, x):
        x = x.view(-1, 256*256)
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

# Reconstruction loss + KL divergence loss
def loss_function(recon_x, x, mu, logvar):
    BCE = F.binary_cross_entropy(recon_x, x.view(-1, 256*256), reduction='sum')
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return BCE + KLD

# Training loop
vae = VAE().to(device)
optimizer = torch.optim.Adam(vae.parameters(), lr=1e-3)  # Same learning rate as the first run

epochs = 200  # More epochs than the first run (10)

for epoch in range(epochs):
    vae.train()
    train_loss = 0
    for batch_idx, (data, _) in enumerate(train_loader):
        data = data.to(device)
        optimizer.zero_grad()
        recon_batch, mu, logvar = vae(data)
        
        # Ensure data is in the correct range
        data = torch.clamp(data, 0, 1)
        
        loss = loss_function(recon_batch, data, mu, logvar)
        loss.backward()
        train_loss += loss.item()
        optimizer.step()

    print(f'Epoch {epoch + 1}, Loss: {train_loss / len(train_loader.dataset)}')

# Generate new X-ray images
vae.eval()
# Create a folder to store generated images
output_folder = "vae_generated_images"
os.makedirs(output_folder, exist_ok=True)

with torch.no_grad():
    z = torch.randn(64, 100).to(device)  # Adjusted latent dimension
    sample = vae.decode(z).cpu()
    sample = sample.view(64, 1, 256, 256)
    
    # Save images individually
    for i in range(64):
        img_array = sample[i].squeeze().numpy()  # Convert to NumPy
        plt.imsave(f"{output_folder}/generated_{i}.png", img_array, cmap="gray")
    
    # Plot the generated images
    fig, axes = plt.subplots(8, 8, figsize=(12, 12))
    for i, ax in enumerate(axes.flatten()):
        ax.imshow(sample[i].squeeze(), cmap='gray')
        ax.axis('off')
    plt.show()

print(f"Generated images saved in '{output_folder}'")


<hr style='height:2px;border:none;background-color:#00233C;'>
<p style = 'font-size:20px;font-family:Arial'><b>7. Convert the VAE to ONNX Model</b></p>
<p style = 'font-size:16px;font-family:Arial'>
Once we have created the VAE, we can convert it into an ONNX model.</p>

In [None]:
import torch
import torch.nn as nn
import torch.onnx

# Ensure model is in evaluation mode
vae.eval()

# Freeze parameters (Important!)
for param in vae.parameters():
    param.requires_grad = False  # Disable gradients for all parameters

# Define a wrapper module for the decoder
class VAE_Decoder(nn.Module):
    def __init__(self, vae):
        super(VAE_Decoder, self).__init__()
        self.decoder = vae.decode  # Extract only the decoder part

    def forward(self, z):
        return self.decoder(z)

# Create the decoder module
vae_decoder = VAE_Decoder(vae).to(device)

# Define a dummy latent input (batch_size=1, latent_dim=100)
dummy_input = torch.randn(1, 100, device=device).detach()  # Ensure tensor does not require gradients

# Convert to ONNX format
onnx_model_path = "vae_decoder.onnx"
torch.onnx.export(
    vae_decoder, 
    dummy_input, 
    onnx_model_path,
    export_params=True,
    opset_version=11,
    do_constant_folding=True,
    input_names=["latent_vector"],
    output_names=["generated_image"],
    dynamic_axes={"latent_vector": {0: "batch_size"}, "generated_image": {0: "batch_size"}}
)

print(f"ONNX model saved at: {onnx_model_path}")

<p style = 'font-size:16px;font-family:Arial'>We have saved the decoder as vae_decoder.onnx. Any Python script can load this file to generate and store images.</p>

<hr style='height:2px;border:none;background-color:#00233C;'>
<p style = 'font-size:20px;font-family:Arial'><b>8. Using ONNX Model to Generate Images</b></p>
<p style = 'font-size:16px;font-family:Arial'>
The code below can also be run as a separate script to load the ONNX model and generate images.</p>

In [None]:
import onnxruntime as ort
import numpy as np
import matplotlib.pyplot as plt

# Load ONNX model
session = ort.InferenceSession("vae_decoder.onnx")

# Generate random latent vectors
latent_dim = 100  # Must match trained model
latent_vectors = np.random.randn(10, latent_dim).astype(np.float32)  # Generate 10 samples

# Run inference using the ONNX model
outputs = session.run(["generated_image"], {"latent_vector": latent_vectors})

# Convert and display images
fig, axes = plt.subplots(2, 5, figsize=(10, 5))
for i, ax in enumerate(axes.flatten()):
    ax.imshow(outputs[0][i].reshape(256, 256), cmap="gray")  # Adjust based on training setup
    ax.axis("off")

plt.show()
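
<p style = 'font-size:16px;font-family:Arial'>The decoder returns each image as a flat vector of values between 0 and 1. To store the images in a standard 8-bit format, they can be reshaped and rescaled as in the sketch below. The helper to_uint8_images is hypothetical, and random numbers stand in for outputs[0] so that the example runs without the ONNX file.</p>

```python
import numpy as np

def to_uint8_images(flat_batch, height, width):
    """Reshape (N, height*width) floats in [0, 1] to (N, height, width) uint8."""
    images = np.clip(flat_batch, 0.0, 1.0).reshape(-1, height, width)
    return np.round(images * 255).astype(np.uint8)

# Stand-in for outputs[0]: random values instead of real decoder output
fake_output = np.random.default_rng(0).random((10, 256 * 256), dtype=np.float32)
images = to_uint8_images(fake_output, 256, 256)
print(images.shape, images.dtype)  # (10, 256, 256) uint8
```

<p style = 'font-size:16px;font-family:Arial'>With the real model, calling to_uint8_images(outputs[0], 256, 256) would give arrays that can be written to disk with any image library.</p>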

In [None]:
outputs

<footer style="padding-bottom:35px; border-bottom:3px solid #91A0Ab">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>