For each practical exercise (TP), please work in groups of two or three. Then, create a **private GitHub repository** and add me (GitHub username: **arthur-75**) as a collaborator. Finally, share the link to your project (or TP) in the [Practical Exercises](https://docs.google.com/spreadsheets/d/1V-YKgHn71FnwjoFltDhWsPJS7uIuAh9lj6SP2DSCvlY/edit?usp=sharing) spreadsheet and make sure to choose your **team name** :-)

# **Variational Autoencoders on CelebA Images**

In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, utils

In [None]:
# --- Hyperparameters ---
IMAGE_SIZE = 32
CHANNELS = 3
BATCH_SIZE = 128
NUM_FEATURES = 128
Z_DIM = 200
LEARNING_RATE = 0.0005
EPOCHS = 30
BETA = 2000  # Weight on reconstruction loss
LOAD_MODEL = False
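# Pick the best available device: CUDA GPU, then Apple Silicon (MPS), else CPU
DEVICE = (
    "cuda" if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available()
    else "cpu"
)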
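`BETA` is only declared in this cell. As a hedged illustration of one common way such a weight enters a VAE objective, here is a minimal sketch that assumes a pixel-wise mean squared error reconstruction term and a standard Gaussian prior on the latent space. The helper name `vae_loss` is hypothetical and not part of the notebook; the later sections may define the loss differently.

```python
import torch
import torch.nn.functional as F

BETA = 2000  # same value as the hyperparameter above

def vae_loss(x, x_recon, z_mean, z_log_var, beta=BETA):
    """Hypothetical helper: beta-weighted reconstruction loss plus KL divergence."""
    # Reconstruction: mean squared error over all pixels of the batch, scaled by beta
    recon = beta * F.mse_loss(x_recon, x, reduction="mean")
    # KL(N(z_mean, exp(z_log_var)) || N(0, I)): summed over latent dims, averaged over the batch
    kl = (-0.5 * torch.sum(1 + z_log_var - z_mean.pow(2) - z_log_var.exp(), dim=1)).mean()
    return recon + kl, recon, kl

# Toy check on random tensors shaped like one batch (128 x 3 x 32 x 32, Z_DIM = 200)
x = torch.rand(128, 3, 32, 32)
x_recon = torch.rand(128, 3, 32, 32)
z_mean, z_log_var = torch.randn(128, 200), torch.randn(128, 200)
total, recon, kl = vae_loss(x, x_recon, z_mean, z_log_var)
print(f"recon={recon.item():.1f}  kl={kl.item():.1f}  total={total.item():.1f}")
```

Under this assumption the averaged MSE is a small per-pixel number, while the KL term is summed over all 200 latent dimensions; a large weight such as 2000 keeps the reconstruction term from being dominated by the KL term.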

## **1: Dataset Preparation and Exploration**

1. **Obtain the CelebA dataset** from Kaggle and **unzip** it into a suitable directory (e.g., `img_align_celeba`).  
   * [CelebFaces Attributes (CelebA) Dataset](https://www.kaggle.com/datasets/jessicali9530/celeba-dataset)   
2. **Check the data structure**:  
   * **ImageFolder** expects one subfolder per class, but CelebA comes as a single folder containing all images. Point `root` at the **parent** of that folder so that all images are treated as one dummy class, and adapt your **folder path** accordingly.  
3. **Create a Transform** to:  
   * **Resize** images to 32×32 (`IMAGE_SIZE`); the low resolution keeps training fast.  
   * **Convert** them to tensors; `ToTensor` already scales pixel values to [0, 1], and further normalization is optional.  
4. **Wrap the dataset** in a **DataLoader** with:  
   * `batch_size=BATCH_SIZE` and `shuffle=True`.  
   * `drop_last=True`, which discards the final incomplete batch so that every batch contains exactly `BATCH_SIZE` images.  
5. **Visualize** a small batch to ensure everything is loading correctly (e.g., using a grid display).

In [None]:
# Path to CelebA images
DATA_PATH = xxx

# Transforms: resize to 32x32 and convert to a tensor with values in [0,1]
transform = transforms.Compose([
    xxx.xxx((xxx, xxx)),
    xxx.xxx()  # values in [0,1]
])

# Dataset & DataLoader
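# ImageFolder expects root/<class_name>/<images>, so point it at the PARENT of the
# image folder: all images then fall into a single dummy class (label 0).
# NB: DATA_PATH must not end with a slash, otherwise os.path.dirname returns DATA_PATH itself.
train_dataset = datasets.ImageFolder(
    root=os.path.dirname(DATA_PATH),  # folder containing the subfolder "img_align_celeba"
    transform=transform
)
# NB: If your folder structure differs, adjust the root path accordingly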

train_loader = DataLoader(
    xxx,
    xxx=xxx,
    shuffle=True,
    num_workers=2,
    drop_last=True
)

# Utility: show a batch of faces
def show_batch(images, title=""):
    grid_img = utils.make_grid(images, nrow=8)  # tile the batch into one (C, H, W) image, 8 per row
    plt.figure(figsize=(12, 6))
    plt.imshow(grid_img.permute(1, 2, 0).cpu().numpy())  # matplotlib expects (H, W, C)
    plt.title(title)
    plt.axis("off")
    plt.show()

# Test: Show a sample batch
data_iter = iter(train_loader)
images, _ = next(data_iter)
show_batch(images, title="Training Samples")