<a href="https://colab.research.google.com/github/alex-jk/painting-lora-finetune/blob/main/combine_photos_image_generator.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Combine photos and generate a new, similar image**

### What this code does

We create a **new image** that captures the common look of several input photos.
Instead of averaging pixels (which gives a blurry, ghosted result when the shots are not aligned), we average
**deep features** extracted by a frozen, pretrained CNN (**VGG19**).

- **VGG19**: a convolutional neural network trained on ImageNet. We **don’t train it**;
  we only use it as a feature extractor (its internal activations capture textures,
  shapes, and layout cues).

- **How it works**:
  1) Load your photos and pass them through VGG19 to grab activations at a few layers.
  2) Compute the **centroid (mean)** of those activations across your photos.
  3) Initialize a blank-ish image and **optimize its pixels** so that, when passed
     through VGG19, its activations match that centroid.
  4) A small **total-variation (TV)** term keeps the result smooth/clean.

- **Why VGG**: Its intermediate layers form a robust “perceptual space” that is learned
  from over a million ImageNet images rather than hand-crafted. That makes it good at capturing
  what’s *shared* across different viewpoints/plates without needing to train any new model.
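
The four steps above can be sketched end to end as follows. This is a minimal illustration, not the notebook's implementation: a tiny frozen random CNN stands in for pretrained VGG19 (so nothing is downloaded), random tensors stand in for the photos, and both loss terms are summed rather than averaged. All of these are assumptions made only to keep the sketch small and self-contained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# ASSUMPTION: a tiny frozen random CNN stands in for pretrained VGG19.
extractor = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
).eval()
for p in extractor.parameters():
    p.requires_grad_(False)  # the extractor is never trained

# Step 1: pass the "photos" (random stand-ins here) through the extractor.
photos = torch.rand(4, 3, 32, 32)
with torch.no_grad():
    feats = extractor(photos)                 # shape (4, 16, 32, 32)

# Step 2: centroid (mean) of the activations across photos.
centroid = feats.mean(dim=0, keepdim=True)    # shape (1, 16, 32, 32)

# Step 3: start from a blank-ish gray canvas and optimize its pixels.
canvas = (0.5 + 0.01 * torch.randn(1, 3, 32, 32)).clamp(0, 1).requires_grad_(True)
optimizer = torch.optim.Adam([canvas], lr=0.02)
tv_weight = 2e-6

losses = []
for step in range(200):
    optimizer.zero_grad()
    feature_loss = F.mse_loss(extractor(canvas), centroid, reduction="sum")
    # Step 4: total-variation term (sum of absolute neighbor differences).
    tv = (canvas[..., 1:, :] - canvas[..., :-1, :]).abs().sum() \
       + (canvas[..., :, 1:] - canvas[..., :, :-1]).abs().sum()
    loss = feature_loss + tv_weight * tv
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        canvas.clamp_(0.0, 1.0)               # keep pixels in a valid range
    losses.append(loss.item())

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Only `canvas` is handed to the optimizer, so the extractor's weights never change; the image itself is the thing being learned.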

First, upload the images that will be used to generate a new, similar image.

In [1]:
# 0) Upload your images (4–8 works well)
from google.colab import files
uploads = files.upload()
image_paths = list(uploads.keys())

Saving 3d1d3b01f2b975f7315e32d67df547e5.jpg to 3d1d3b01f2b975f7315e32d67df547e5.jpg
Saving 66f8f9.jpg to 66f8f9.jpg
Saving grechnevaya_kasha_v_multivarke_na_moloke-560590.jpg to grechnevaya_kasha_v_multivarke_na_moloke-560590.jpg
Saving grechnevaya-kasha-s-molokom-_1290017009_0.jpg to grechnevaya-kasha-s-molokom-_1290017009_0.jpg


In [2]:
# 1) Install PyTorch + torchvision (Colab already ships both, so this is usually a no-op).
#    Use the default index: wheels from the /whl/cpu index are CPU-only and cannot use a GPU.
!pip -q install torch torchvision

In [3]:
# 2) Imports & helpers
import torch, torch.nn as nn, torch.optim as optim
from torchvision import models, transforms as T
from PIL import Image, ImageOps
import numpy as np

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

#### Config (what each knob does)

- TARGET_SIZE: Output resolution (pixels). Bigger = sharper but slower.
- NUM_STEPS: Optimization iterations. More = more detail/refinement.
- LR: Learning rate for pixel updates. Lower if results jitter; higher to converge faster.
- TV_WEIGHT: Smoothness strength (total variation). Raise for cleaner/smoother look; lower for sharper/edgier detail.
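
To make the TV_WEIGHT knob concrete, here is a minimal sketch of a total-variation term. It assumes the anisotropic form (sum of absolute differences between neighboring pixels); the notebook's exact formula is not shown in this section, and `total_variation` is a hypothetical helper name.

```python
import torch

def total_variation(img):
    """Sum of absolute differences between vertically and horizontally adjacent pixels."""
    dh = (img[..., 1:, :] - img[..., :-1, :]).abs().sum()
    dw = (img[..., :, 1:] - img[..., :, :-1]).abs().sum()
    return dh + dw

torch.manual_seed(0)
flat = torch.full((1, 3, 64, 64), 0.5)   # perfectly smooth image
noisy = torch.rand(1, 3, 64, 64)         # pure noise

tv_flat = total_variation(flat).item()
tv_noisy = total_variation(noisy).item()
print(f"flat: {tv_flat:.1f}  noisy: {tv_noisy:.1f}")
```

A flat image has zero total variation and noise has a very large one, so adding `TV_WEIGHT * total_variation(img)` to the loss pushes the optimizer away from speckle. Because this form sums over every pixel pair, its raw value grows with resolution, which is one plausible reason a weight as small as 2e-6 is used.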

**FEATURE_LAYERS (indices into VGG19's `features` stack)**

- 8  (relu2_2): color/texture cues
- 15 (relu3_3): mid-level structure
- 22 (relu4_2): higher-level shapes/layout

Each index maps to a loss weight. Adjust the weights to bias the result (e.g., more weight on 22 → stronger structure).
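
As an illustration of how those indices and weights might be used, the sketch below builds an untrained stack with the same layer ordering as VGG19's convolutional part (truncated after index 22, with channel widths shrunk to keep it light), collects activations at the chosen indices, and combines per-layer MSE terms using the weights. `build_vgg19_like`, `extract_features`, and `weighted_feature_loss` are hypothetical helper names, and the weights here are random, so only the indexing logic carries over to the real network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

FEATURE_LAYERS = {8: 1.0, 15: 1.0, 22: 1.0}

def build_vgg19_like(width_div=8):
    """Untrained stand-in: VGG19 layer ordering up to relu4_2, with narrower channels."""
    cfg = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M", 512, 512]
    layers, in_ch = [], 3
    for v in cfg:
        if v == "M":
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            out_ch = v // width_div
            layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU()]
            in_ch = out_ch
    return nn.Sequential(*layers).eval()

def extract_features(net, x, layer_ids):
    """Run x through net and keep the activations at the requested indices."""
    feats, last = {}, max(layer_ids)
    for idx, layer in enumerate(net):
        x = layer(x)
        if idx in layer_ids:
            feats[idx] = x
        if idx == last:
            break  # deeper layers are not needed
    return feats

def weighted_feature_loss(feats, targets, layer_weights):
    """Weighted sum of per-layer MSE terms."""
    return sum(w * F.mse_loss(feats[i], targets[i]) for i, w in layer_weights.items())

torch.manual_seed(0)
net = build_vgg19_like()
with torch.no_grad():
    a = extract_features(net, torch.rand(1, 3, 64, 64), FEATURE_LAYERS)
    b = extract_features(net, torch.rand(1, 3, 64, 64), FEATURE_LAYERS)

for idx in FEATURE_LAYERS:
    print(idx, type(net[idx]).__name__, tuple(a[idx].shape))
loss_ab = weighted_feature_loss(a, b, FEATURE_LAYERS)
```

Each pooling layer halves the spatial size, so deeper indices describe the image on a coarser grid; that is why weighting index 22 more heavily favors layout over fine texture.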

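**VGG_MEAN / VGG_STD**

Per-channel (RGB) ImageNet mean and standard deviation used to normalize inputs to VGG.
With pixel values scaled to [0, 1], we normalize as (img − mean) / std before extracting
features, so the network sees inputs in the range it was trained on and activations are
comparable across images.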
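
A minimal sketch of that normalization, assuming image tensors shaped (N, 3, H, W) with values in [0, 1]. `vgg_normalize` and `vgg_denormalize` are hypothetical helper names; the constants are the same ones defined in the config cell (without the `.to(device)` move, so the snippet runs on CPU).

```python
import torch

# Shape (1, 3, 1, 1) so the stats broadcast over batch, height, and width.
VGG_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
VGG_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def vgg_normalize(img):
    """Map [0, 1] RGB pixels into the input space VGG expects."""
    return (img - VGG_MEAN) / VGG_STD

def vgg_denormalize(x):
    """Inverse mapping, back to [0, 1] pixels."""
    return x * VGG_STD + VGG_MEAN

gray = torch.full((2, 3, 4, 4), 0.5)   # two mid-gray images
normed = vgg_normalize(gray)
print(normed[0, :, 0, 0])              # one value per RGB channel
```

Keeping the optimized image in [0, 1] and normalizing only on the way into the network means the result can be saved directly, without an inverse transform.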

In [None]:
# --- Config you can tweak ---
TARGET_SIZE = 512          # 384–640 is a good range (bigger = slower, sharper)
NUM_STEPS   = 400          # 300–800 typical
LR          = 0.08
TV_WEIGHT   = 2e-6         # smoothness regularizer; raise for smoother, lower for crisper

# Feature layers to match (VGG19 feature indices):
# 8 = relu2_2, 15 = relu3_3, 22 = relu4_2  (good default mix)
FEATURE_LAYERS = {8: 1.0, 15: 1.0, 22: 1.0}

# VGG normalization stats
VGG_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1,3,1,1).to(device)
VGG_STD  = torch.tensor([0.229, 0.224, 0.225]).view(1,3,1,1).to(device)