Coursera's Generative AI for Data Science with Copilot

* Handwritten Digit Image Notebook - MNIST Dataset

Tutor: Google Gemini 2.0, Advanced Experimental

Key improvements and explanations in this complete, runnable code:

* Complete Training Loop: Includes data loading (MNIST) and fully implemented train() and test() functions.

* MNIST Data Loading: Correctly uses torchvision.datasets.MNIST and DataLoader to handle the MNIST dataset. The transform is set up to convert images to tensors in the range [0, 1].

* Device Handling: Uses torch.device to automatically use a GPU if available, otherwise falls back to CPU. The model and data are moved to the correct device.

* Loss Function: The loss_function calculates both the reconstruction loss (Binary Cross-Entropy) and the KL divergence. Crucially, reduction='sum' is used in the BCE loss (summing over all pixels and all samples in the batch). This is a common choice for VAEs because it keeps the reconstruction term on the same scale as the KL term, which is also summed.

* Input/Output Shapes: The code now explicitly handles the expected input shape of MNIST images (1x28x28). The convolutional layers are set up to process this correctly. The Unflatten layer in the decoder correctly reshapes the latent vector back into a 3D tensor for the transposed convolutions.

* Sigmoid Activation: The decoder's final layer uses a Sigmoid activation function, which is essential to constrain the output pixel values to the range [0, 1], matching the normalized MNIST data.

* Optimizer: Uses the Adam optimizer, which is a good general-purpose optimizer for deep learning.

* Reparameterization Trick: The reparameterize function is correctly implemented, ensuring proper gradient flow during training.

* Latent Dimension: The latent_dim is a parameter to the VAE class, allowing you to experiment with different sizes of the latent space.

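* Generation of Samples: Added a section within the test() function that generates new samples from the trained VAE by sampling latent vectors from the standard normal distribution and decoding them. The resulting tensor is not yet saved or displayed; the code leaves that step as a comment.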

* Print Statements: Adds print statements to show training progress, including the average loss per epoch.

* Batching: Processes data in batches, which is critical for efficient training on GPUs.

* No Labels Needed (Unsupervised): Correctly ignores the labels from the MNIST dataset in the training loop, as VAEs are unsupervised learning models.
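
The 7x7 feature-map size mentioned under Input/Output Shapes can be checked by hand. The following is a minimal sketch, assuming the standard output-size formulas for nn.Conv2d and nn.ConvTranspose2d with dilation 1. The helper names conv2d_out and conv_transpose2d_out are hypothetical; they are not part of the notebook or of PyTorch.

```python
def conv2d_out(size, kernel, stride, padding):
    """Output height/width of a Conv2d layer (assumes dilation=1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose2d_out(size, kernel, stride, padding, output_padding):
    """Output height/width of a ConvTranspose2d layer (assumes dilation=1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Encoder: two stride-2 convolutions shrink 28 -> 14 -> 7
h1 = conv2d_out(28, kernel=3, stride=2, padding=1)
h2 = conv2d_out(h1, kernel=3, stride=2, padding=1)

# Decoder: two stride-2 transposed convolutions grow 7 -> 14 -> 28
d1 = conv_transpose2d_out(h2, kernel=3, stride=2, padding=1, output_padding=1)
d2 = conv_transpose2d_out(d1, kernel=3, stride=2, padding=1, output_padding=1)

# Without output_padding the first decoder layer would give 13, not 14,
# so the reconstruction would no longer match the 28x28 input.
d1_no_pad = conv_transpose2d_out(h2, kernel=3, stride=2, padding=1, output_padding=0)

print(h1, h2, d1, d2, d1_no_pad)
```

This is why the decoder's transposed convolutions use output_padding=1: with kernel_size=3, stride=2, padding=1 it is what makes each layer exactly double the spatial size.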

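The KL term in the loss function uses the closed form for the divergence between a diagonal Gaussian and the standard normal. As a rough cross-check, and not as part of the notebook, the sketch below compares that closed form for a single latent dimension against a simple numerical integration in plain Python. The function names kl_closed_form and kl_numeric, the integration range, and the step count are all assumptions chosen for illustration.

```python
import math


def kl_closed_form(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, 1) ) for one latent dimension."""
    return -0.5 * (1.0 + log_var - mu ** 2 - math.exp(log_var))


def kl_numeric(mu, log_var, lo=-12.0, hi=12.0, steps=20_000):
    """Midpoint-rule estimate of the same KL integral (cross-check only)."""
    sigma = math.exp(0.5 * log_var)
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        # Log-densities of q = N(mu, sigma^2) and p = N(0, 1) at x
        log_q = -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2
        log_p = -0.5 * math.log(2 * math.pi) - 0.5 * x ** 2
        total += math.exp(log_q) * (log_q - log_p) * dx
    return total


mu, log_var = 0.5, -1.0
print(kl_closed_form(mu, log_var), kl_numeric(mu, log_var))
```

The two values should agree closely. In loss_function the same per-dimension expression is summed over all latent dimensions and all samples in the batch.
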
In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

class VAE(nn.Module):
    def __init__(self, latent_dim=20):  # Added a default latent_dim
        super().__init__()
        self.latent_dim = latent_dim

        # Encoder
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),  # Input: 1x28x28 (MNIST)
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128), #  7x7 because of two stride=2 convolutions.  28 / 2 / 2 = 7
            nn.ReLU()
        )

        # Latent Space
        self.z_mean = nn.Linear(128, latent_dim)
        self.z_log_var = nn.Linear(128, latent_dim)

        # Decoder
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 64 * 7 * 7),
            nn.ReLU(),
            nn.Unflatten(1, (64, 7, 7)),  # Reshape to (64, 7, 7)
            nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid()  # Sigmoid to get output between 0 and 1 (pixel values)
        )

    def encode(self, x):
        x = self.encoder(x)
        mu = self.z_mean(x)
        log_var = self.z_log_var(x)
        return mu, log_var

    def reparameterize(self, mu, log_var):
        std = torch.exp(0.5 * log_var)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        x_hat = self.decoder(z)
        return x_hat

    def forward(self, x):
        mu, log_var = self.encode(x)
        z = self.reparameterize(mu, log_var)
        x_hat = self.decode(z)
        return x_hat, mu, log_var


# --- Loss Function ---
def loss_function(x_hat, x, mu, log_var):
    BCE = F.binary_cross_entropy(x_hat, x, reduction='sum')  # Reconstruction loss, summed over pixels and batch
    KLD = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())  # KL(q(z|x) || N(0, I)), summed over latent dims and batch
    return BCE + KLD


# --- Data Loading (MNIST) ---
transform = transforms.Compose([
    transforms.ToTensor(),  # Converts to tensor and scales to [0, 1]
    # No Normalize transform: BCE needs targets in [0, 1], matching the decoder's Sigmoid output.
])

train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=128, shuffle=False)


# --- Training Loop ---
def train(model, optimizer, epochs=10):
    model.train()
    for epoch in range(epochs):
        train_loss = 0
        for batch_idx, (data, _) in enumerate(train_loader):  # Labels are ignored: the VAE trains without supervision
            data = data.to(device)  # Move data to device
            optimizer.zero_grad()
            x_hat, mu, log_var = model(data)
            loss = loss_function(x_hat, data, mu, log_var)
            loss.backward()
            train_loss += loss.item()
            optimizer.step()

            if batch_idx % 100 == 0:
                print(f'Epoch: {epoch+1}, Batch: {batch_idx}, Loss: {loss.item() / len(data)}')  # Average loss per sample in this batch

        print(f'====> Epoch: {epoch+1} Average loss: {train_loss / len(train_loader.dataset)}')


# --- Testing ---
def test(model):
    model.eval()
    test_loss = 0
    with torch.no_grad():
        for data, _ in test_loader:
            data = data.to(device)
            x_hat, mu, log_var = model(data)
            test_loss += loss_function(x_hat, data, mu, log_var).item()

    test_loss /= len(test_loader.dataset)
    print(f'====> Test set loss: {test_loss}')

    # --- Generate Samples ---
    with torch.no_grad():
        z = torch.randn(64, model.latent_dim).to(device) # Sample from standard normal
        generated_images = model.decode(z).cpu()
        # Save or display generated_images (e.g., using torchvision.utils.save_image)


# --- Device Setup ---
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# --- Model Initialization ---
model = VAE().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# --- Run Training and Testing ---
train(model, optimizer, epochs=10)  # Increase epochs (e.g., 20) to train longer
test(model)

Epoch: 1, Batch: 0, Loss: 648.296142578125
Epoch: 1, Batch: 100, Loss: 207.4195556640625
Epoch: 1, Batch: 200, Loss: 194.0279083251953
Epoch: 1, Batch: 300, Loss: 176.4379119873047
Epoch: 1, Batch: 400, Loss: 154.67413330078125
====> Epoch: 1 Average loss: 197.44828819986978
Epoch: 2, Batch: 0, Loss: 149.11216735839844
Epoch: 2, Batch: 100, Loss: 136.32981872558594
Epoch: 2, Batch: 200, Loss: 129.10147094726562
Epoch: 2, Batch: 300, Loss: 126.16657257080078
Epoch: 2, Batch: 400, Loss: 119.9908218383789
====> Epoch: 2 Average loss: 128.29019767252603
Epoch: 3, Batch: 0, Loss: 123.14994049072266
Epoch: 3, Batch: 100, Loss: 114.07067108154297
Epoch: 3, Batch: 200, Loss: 114.66558837890625
Epoch: 3, Batch: 300, Loss: 112.60355377197266
Epoch: 3, Batch: 400, Loss: 112.46279907226562
====> Epoch: 3 Average loss: 114.31106189778646
Epoch: 4, Batch: 0, Loss: 112.81857299804688
Epoch: 4, Batch: 100, Loss: 110.68171691894531
Epoch: 4, Batch: 200, Loss: 103.15380859375
Epoch: 4, Batch: 300, Loss: