<a href="https://colab.research.google.com/github/alex-jk/painting-lora-finetune/blob/main/neural_style_transfer_photos_to_paintings.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Install dependencies**

In [1]:
# Colab usually has recent torch/torchvision, but this is safe.
!pip -q install --upgrade torch torchvision pillow
import torch, torchvision, PIL
print("Torch:", torch.__version__, "| Vision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())

Torch: 2.8.0+cu126 | Vision: 0.23.0+cu126
CUDA available: True


In [3]:
import math
from pathlib import Path
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

### **Set device and ImageNet normalization (for VGG-19 features)**

- **Device selection:** Use GPU (`cuda`) if available; otherwise fall back to CPU. All tensors/ops must be on the **same device**.
- **ImageNet normalization:** VGG-19 expects RGB inputs scaled to **[0,1]** and normalized per channel with:
  - mean = `[0.485, 0.456, 0.406]`
  - std  = `[0.229, 0.224, 0.225]`
<br>We apply it as: `(x - mean) / std` (broadcast per channel over H×W).
- **Why:** VGG-19's pretrained weights assume this exact normalization; skipping it shifts the feature distributions, which can destabilize the optimization or produce odd colors.
- **`.to(device)` on mean/std:** Puts these constants on the same device as the images, avoiding device-mismatch errors.

In [4]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).to(device)
IMAGENET_STD  = torch.tensor([0.229, 0.224, 0.225]).to(device)
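
The broadcasting step is easy to get wrong, so here is a minimal sketch of it on a dummy batch. The names `mean`, `std`, and `x` are stand-ins chosen for illustration, and the tensors stay on CPU so the snippet runs anywhere. It assumes the usual `(N, C, H, W)` layout: a bare `(3,)` tensor lines up with the last axis (width), not the channel axis, so it has to be reshaped to `(1, 3, 1, 1)` first.

```python
import torch

# Stand-ins for the notebook's constants (kept on CPU for portability).
mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])

# Dummy all-white batch: 1 image, 3 channels, 4x5 pixels.
x = torch.ones(1, 3, 4, 5)

# A (3,) tensor broadcasts against the trailing axis (W=5), which fails here.
try:
    _ = x - mean
    naive_ok = True
except RuntimeError:
    naive_ok = False

# Reshaping to (1, 3, 1, 1) aligns each statistic with its channel.
normalized = (x - mean.view(1, 3, 1, 1)) / std.view(1, 3, 1, 1)
red_value = normalized[0, 0, 0, 0].item()  # (1 - 0.485) / 0.229

print(naive_ok)                 # False
print(tuple(normalized.shape))  # (1, 3, 4, 5)
print(round(red_value, 3))      # 2.249
```

Note that when the width happens to be 3, the naive subtraction would run without error and silently normalize along the wrong axis, which is one reason to reshape explicitly.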

# **NST Code**

### Image I/O helpers

**load_image(path, target_long_side=None)**<br>
Convert to **RGB** so every image is 3 channels as VGG expects.<br>
Resize by the **longer side** to control compute/memory while keeping aspect ratio; **LANCZOS** gives high-quality downscaling (less aliasing → cleaner features).<br>
`ToTensor()` makes a float tensor scaled to **[0,1]**, which is required for the ImageNet normalization we apply later.<br>
`unsqueeze(0)` adds a **batch dimension** → models expect shape `(N, C, H, W)` even for a single image.<br>
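
The long-side resize reduces to a small piece of arithmetic, sketched here in plain Python. `scaled_size` is a hypothetical helper written for illustration (it is not part of the notebook); it mirrors the scale-and-round step only, without touching any image data.

```python
def scaled_size(w, h, target_long_side):
    """Return (new_w, new_h) so that the longer side equals target_long_side."""
    scale = target_long_side / max(w, h)
    return round(w * scale), round(h * scale)

print(scaled_size(4000, 3000, 512))  # (512, 384)  landscape, downscaled
print(scaled_size(3000, 4000, 512))  # (384, 512)  portrait, downscaled
print(scaled_size(640, 480, 1024))   # (1024, 768) smaller image is upscaled
```

As the last call suggests, this formula also enlarges images whose longer side is below the target, so the target acts as an exact size rather than an upper bound.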

**save_image(tensor, path)**<br>
`detach()` drops the autograd graph since we’re done optimizing pixels and just want the data.<br>
`clamp(0,1)` enforces valid image range after optimization so saved colors are not out of bounds.<br>
`cpu()` moves the data to host memory, since PIL works on CPU-side arrays, not GPU tensors.<br>
`squeeze(0)` removes the batch dimension, leaving the `(C, H, W)` tensor that `ToPILImage()` expects (it converts to PIL's `(H, W, C)` layout internally).<br>

In [5]:
def load_image(path, target_long_side=None):
    img = Image.open(path).convert("RGB")
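    if target_long_side is not None:
        w, h = img.size  # PIL reports (width, height)
        scale = target_long_side / max(w, h)
        # Floor each side at 1 px so extreme aspect ratios never round to 0.
        new_size = (max(1, round(w * scale)), max(1, round(h * scale)))
        img = img.resize(new_size, Image.LANCZOS)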
    x = transforms.ToTensor()(img).unsqueeze(0).to(device)  # (1,3,H,W) in [0,1]
    return x

def save_image(tensor, path):
    x = tensor.detach().clamp(0,1).cpu().squeeze(0)
    transforms.ToPILImage()(x).save(Path(path))
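
The tensor-side steps of `save_image` can be traced on a dummy tensor to see what each call changes. This is a sketch only: `img` is a made-up stand-in for an optimized image, deliberately given values outside `[0, 1]`, and the final PIL conversion is left out so the snippet needs nothing beyond `torch`.

```python
import torch

# Stand-in for an optimized image: tracks gradients, values span [-0.5, 1.5].
img = torch.linspace(-0.5, 1.5, 3 * 4 * 6).reshape(1, 3, 4, 6).requires_grad_()

# Same chain as in save_image, minus the PIL conversion.
out = img.detach().clamp(0, 1).cpu().squeeze(0)

print(img.requires_grad, out.requires_grad)  # True False
print(tuple(img.shape), tuple(out.shape))    # (1, 3, 4, 6) (3, 4, 6)
print(out.min().item(), out.max().item())    # 0.0 1.0
```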

### Normalization module

**What it does**<br>
Applies per-channel ImageNet normalization to an input tensor: `(x - mean) / std`.<br>
Stores `mean` and `std` reshaped to `(1, 3, 1, 1)` so they broadcast over H×W.

**Why we need it**<br>
VGG-19 was trained on ImageNet-normalized RGB; matching that distribution makes its features meaningful and stable for NST.<br>
`register_buffer(...)` stores `mean`/`std` as non-trainable module state: they move with the module on `.to(device)`, are saved in `state_dict`, and are excluded from `parameters()`.

In [None]:
class Normalization(nn.Module):
    def __init__(self, mean, std):
        super().__init__()
        self.register_buffer("mean", mean.view(1,3,1,1))
        self.register_buffer("std",  std.view(1,3,1,1))

    def forward(self, x):
        # Broadcasts the (1,3,1,1) buffers over the (N,3,H,W) input.
        return (x - self.mean) / self.std
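
The buffer behavior described above can be checked on a toy module. `ChannelScale` is a hypothetical class written only for this sketch, and a dtype cast stands in for a device move (both go through `Module.to`) so that no GPU is needed.

```python
import torch
import torch.nn as nn

class ChannelScale(nn.Module):
    """Hypothetical toy module holding one per-channel buffer."""

    def __init__(self, factor):
        super().__init__()
        # Registered as a buffer: part of the module's state, but not trainable.
        self.register_buffer("factor", factor.view(1, 3, 1, 1))

    def forward(self, x):
        return x * self.factor

m = ChannelScale(torch.tensor([1.0, 2.0, 3.0]))
y = m(torch.ones(1, 3, 2, 2))

print(list(m.state_dict().keys()))  # ['factor']
print(len(list(m.parameters())))    # 0
print(tuple(y.shape))               # (1, 3, 2, 2)

# Module.to() converts buffers along with parameters.
m.to(torch.float64)
print(m.factor.dtype)               # torch.float64
```

Because the module has no parameters, passing `m.parameters()` to an optimizer would give it nothing to update, which is the desired outcome for fixed normalization constants.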