<a href="https://colab.research.google.com/github/AditeyaAItronics/devsoc-bits/blob/main/neural_style_transfer/style_transfer_vgg.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## 🧠 Neural Style Transfer using VGG-19

In this task, you’ll implement **Neural Style Transfer (NST)** using a **pre-trained VGG-19** model. This will help you deepen your understanding of convolutional neural networks (CNNs), feature extraction, and how deep learning can be used for image generation.

---

### ✅ Objectives

- Build a Neural Style Transfer pipeline using **VGG-19**.
- Use only the **first convolutional layer** from **each of the five convolutional blocks** in VGG-19.
    - These layers strike a balance between general texture features and higher-level abstraction.
    - Deeper layers become too specialized for object recognition and are less effective for capturing style.

---

### ⚙️ System Requirements

VGG-19 is a **computationally heavy** model. If your system struggles to run it:

- Use **Google Colab** to access free GPU resources.
- This will allow faster computation and smoother experimentation.

### 📄 Reference Paper

We’ve linked the paper on **CNN-based image style transformation** below.

It uses VGG-19 and provides insight into the theory behind NST.

> 🔗 https://drive.google.com/file/d/1Dbxaazv-L2SbC3gY4cPlqOQmM2iGmwyB/view

Please read it carefully—it will help you understand what’s going on inside the model and how different layers contribute to the stylization process.

## Implementation thought process

- Load pre-trained VGG-19
  - Use a pre-trained VGG-19 model from PyTorch (`torchvision.models.vgg19`, loaded with `pretrained=True` or, in newer torchvision releases, the `weights=` argument) or TensorFlow (`tf.keras.applications.VGG19`).
  - Set the model to evaluation mode and freeze its weights.

- Select the first conv layer from each block
  - In VGG-19, the first convolutional layers of the five blocks are conventionally named `conv1_1`, `conv2_1`, `conv3_1`, `conv4_1`, and `conv5_1`. Extract the outputs of these layers for the style and content representations.

- Build the NST pipeline
  - Preprocess images: resize, normalize, and convert images to tensors.
  - Extract features: pass the content and style images through the model, capturing activations from the selected layers.
  - Compute losses:
     - Content loss: taken from `conv4_1` in this task (Gatys et al. use `conv4_2`).
     - Style loss: use Gram matrices from all five selected layers.

- Optimization: start with the content image (or white noise) and iteratively update it to minimize the combined loss.

## NST Pipeline

In [10]:
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

In [18]:
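# `pretrained=True` is deprecated since torchvision 0.13; `weights=` selects the same ImageNet weights.
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()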

for param in vgg.parameters():
    param.requires_grad = False

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
vgg = vgg.to(device)

In [19]:
## layer selection
# The indices of the layers we want to use for feature extraction
selected_layers = {'0': 'conv1_1', '5': 'conv2_1', '10': 'conv3_1', '19': 'conv4_1', '28': 'conv5_1'}
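
The indices in `selected_layers` follow from how the VGG-19 feature extractor is laid out: each convolution is followed by a ReLU, and each block ends with a max-pooling layer. The sketch below recomputes the indices from that layout in plain Python. `VGG19_CFG` and `conv_layer_indices` are names introduced here for illustration, and the configuration list is assumed to match the non-batch-norm VGG-19 used in this notebook.

```python
# Layer configuration of VGG-19's convolutional part (assumed layout):
# integers are conv output channels, "M" marks the max-pool that ends a block.
VGG19_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
             512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

def conv_layer_indices(cfg):
    """Map names like 'conv3_1' to their position in a Sequential that holds
    Conv2d + ReLU for every conv entry and one MaxPool2d for every 'M'."""
    indices = {}
    idx, block, conv = 0, 1, 1
    for entry in cfg:
        if entry == "M":
            idx += 1          # the pooling layer occupies one slot
            block += 1
            conv = 1
        else:
            indices[f"conv{block}_{conv}"] = idx
            idx += 2          # the conv and its ReLU occupy two slots
            conv += 1
    return indices

all_convs = conv_layer_indices(VGG19_CFG)
# Keep only the first conv of each block, keyed by string index
first_convs = {str(i): name for name, i in all_convs.items() if name.endswith("_1")}
print(first_convs)
# {'0': 'conv1_1', '5': 'conv2_1', '10': 'conv3_1', '19': 'conv4_1', '28': 'conv5_1'}
```

The result matches the dictionary above, so the hard-coded keys can be cross-checked this way if the layer choice changes.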

In [20]:
## Preprocessing function
def image_loader(image_path, imsize=512):
    loader = transforms.Compose([
        transforms.Resize((imsize, imsize)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    image = Image.open(image_path).convert('RGB')
    image = loader(image).unsqueeze(0)
    return image

In [21]:
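## Feature extraction function
def get_features(image, model, layers):
    """Run `image` through `model` and collect the outputs of the layers named in `layers`."""
    features = {}
    x = image
    # named_children() yields the same ('0', '1', ...) names as model._modules.items()
    for name, layer in model.named_children():
        x = layer(x)
        if name in layers:
            # torchvision's VGG uses ReLU(inplace=True), so the tensor stored here
            # ends up holding the post-ReLU activation of the selected conv layer.
            features[layers[name]] = x
    return features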
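
It can help to check what `get_features` actually stores. torchvision's VGG uses `nn.ReLU(inplace=True)`, so the tensor saved for a conv layer is overwritten by the ReLU that follows it. The sketch below shows this on a small stand-in network. `toy` and `toy_layers` are hypothetical, and `get_features` is restated (iterating with `named_children()`, which yields the same names) so the snippet runs on its own.

```python
import torch
import torch.nn as nn

def get_features(image, model, layers):
    features = {}
    x = image
    for name, layer in model.named_children():
        x = layer(x)
        if name in layers:
            features[layers[name]] = x
    return features

torch.manual_seed(0)
# Hypothetical two-block network mimicking VGG's conv -> in-place ReLU -> pool pattern
toy = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(2),
    nn.Conv2d(4, 8, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
).eval()
toy_layers = {'0': 'conv1_1', '3': 'conv2_1'}

image = torch.randn(1, 3, 16, 16)
with torch.no_grad():
    feats = get_features(image, toy, toy_layers)
    raw_conv1 = toy[0](image)  # conv output before any ReLU touches it

shapes = {name: tuple(t.shape) for name, t in feats.items()}
print(shapes)
print("raw conv min:", raw_conv1.min().item())
print("stored feature min:", feats['conv1_1'].min().item())
```

The stored `conv1_1` tensor has no negative values even though the raw convolution output does, so the features used for the losses are effectively the post-ReLU activations.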

In [22]:
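## style representation function
def gram_matrix(tensor):
    """Normalized Gram matrix of a feature map (used here with batch size 1)."""
    b, c, h, w = tensor.size()
    # Flatten each channel into a row: (b * c, h * w)
    features = tensor.view(b * c, h * w)
    # Inner products between every pair of channels
    G = torch.mm(features, features.t())
    # Normalize by the number of elements in the feature map
    return G.div(b * c * h * w)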
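
A quick check of `gram_matrix` on random data shows its shape and symmetry. Because the function flattens the batch and channel dimensions together, a batch of more than one image yields a single `(b*c, b*c)` matrix that mixes images. `gram_matrix_batched` below is a hypothetical variant, not part of the notebook, that keeps one Gram matrix per image.

```python
import torch

def gram_matrix(tensor):
    b, c, h, w = tensor.size()
    features = tensor.view(b * c, h * w)
    G = torch.mm(features, features.t())
    return G.div(b * c * h * w)

def gram_matrix_batched(tensor):
    """Hypothetical variant: one (c, c) Gram matrix per image in the batch."""
    b, c, h, w = tensor.size()
    features = tensor.reshape(b, c, h * w)
    G = torch.bmm(features, features.transpose(1, 2))
    return G.div(c * h * w)

torch.manual_seed(0)
single = torch.randn(1, 3, 4, 5)
G = gram_matrix(single)
print(G.shape)                                   # torch.Size([3, 3])
print(torch.allclose(G, G.t()))                  # True: Gram matrices are symmetric

batch = torch.randn(2, 3, 4, 5)
print(gram_matrix(batch).shape)                  # torch.Size([6, 6])
print(gram_matrix_batched(batch).shape)          # torch.Size([2, 3, 3])
```

For a single image the two functions agree, so the notebook's version is fine as long as the batch size stays at 1.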

In [23]:
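## loss function
# Relative weights of the two loss terms. With these values the weighted content
# term dominates (compare the logged losses in the training cell); raise
# style_weight for stronger stylization.
content_weight = 1e4
style_weight = 1e2

def compute_content_loss(gen_feat, content_feat):
    """Mean squared error between generated and content activations."""
    return torch.mean((gen_feat - content_feat) ** 2)

def compute_style_loss(gen_feats, style_feats):
    """Sum over layers of the mean squared difference between Gram matrices."""
    style_loss = 0
    for layer in style_feats:
        G = gram_matrix(gen_feats[layer])
        A = gram_matrix(style_feats[layer])
        style_loss += torch.mean((G - A) ** 2)
    return style_loss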
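
Written out, the losses computed by the functions above take the following form for a single image. The notation is introduced here for illustration: $x$, $c$, and $s$ are the generated, content, and style images; $F^l(\cdot)$ is the activation of selected layer $l$ reshaped to a $C_l \times H_l W_l$ matrix; $l_c$ is the content layer (`conv4_1`); $\alpha$ and $\beta$ stand for `content_weight` and `style_weight`. The normalization constants follow this notebook's code and may differ from those in the paper.

$$
G^l(x) = \frac{F^l(x)\, F^l(x)^\top}{C_l H_l W_l}
$$

$$
\mathcal{L}_{\text{content}} = \frac{1}{C_{l_c} H_{l_c} W_{l_c}} \sum_{i,j} \left( F^{l_c}_{ij}(x) - F^{l_c}_{ij}(c) \right)^2
$$

$$
\mathcal{L}_{\text{style}} = \sum_{l} \frac{1}{C_l^2} \sum_{i,j} \left( G^l_{ij}(x) - G^l_{ij}(s) \right)^2
$$

$$
\mathcal{L}_{\text{total}} = \alpha\, \mathcal{L}_{\text{content}} + \beta\, \mathcal{L}_{\text{style}}
$$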

In [None]:
!git clone https://github.com/AditeyaAItronics/devsoc-bits

In [34]:
content_img_path = '/content/devsoc-bits/neural_style_transfer/images/test_images/alien.jpg'
style_img_path = '/content/devsoc-bits/neural_style_transfer/images/style/picasso.jpg'

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load images
content_img = image_loader(content_img_path).to(device)
style_img = image_loader(style_img_path).to(device)

# Extract features
content_features = get_features(content_img, vgg, selected_layers)
style_features = get_features(style_img, vgg, selected_layers)

# Start from a detached copy of the content image so it is a leaf tensor we can optimize
generated_img = content_img.clone().detach().to(device)
generated_img.requires_grad_(True)

# Optimizer
# Note: LBFGS.step() requires a closure; the training cell below replaces this optimizer with Adam.
optimizer = torch.optim.LBFGS([generated_img])

In [38]:
optimizer = torch.optim.Adam([generated_img], lr=0.015)
num_steps = 1000

for step in range(num_steps):
    optimizer.zero_grad()
    gen_features = get_features(generated_img, vgg, selected_layers)
    content_loss = compute_content_loss(gen_features['conv4_1'], content_features['conv4_1'])
    style_loss = compute_style_loss(gen_features, style_features)
    total_loss = content_weight * content_loss + style_weight * style_loss
    total_loss.backward()
    optimizer.step()
    if step % 20 == 0:
        print(f"Step {step}: Content Loss: {content_loss.item():.4f}, Style Loss: {style_loss.item():.4f}, Total Loss: {total_loss.item():.4f}")

Step 0: Content Loss: 0.0061, Style Loss: 0.0003, Total Loss: 60.8689
Step 20: Content Loss: 0.0391, Style Loss: 0.0003, Total Loss: 390.8276
Step 40: Content Loss: 0.0143, Style Loss: 0.0003, Total Loss: 143.3551
Step 60: Content Loss: 0.0081, Style Loss: 0.0003, Total Loss: 81.2332
Step 80: Content Loss: 0.0058, Style Loss: 0.0003, Total Loss: 58.2111
Step 100: Content Loss: 0.0046, Style Loss: 0.0003, Total Loss: 46.4638
Step 120: Content Loss: 0.0039, Style Loss: 0.0003, Total Loss: 39.3022
Step 140: Content Loss: 0.0035, Style Loss: 0.0003, Total Loss: 34.6906
Step 160: Content Loss: 0.0031, Style Loss: 0.0003, Total Loss: 30.6571
Step 180: Content Loss: 0.0028, Style Loss: 0.0003, Total Loss: 28.3569
Step 200: Content Loss: 0.0027, Style Loss: 0.0003, Total Loss: 26.6253
Step 220: Content Loss: 0.0025, Style Loss: 0.0003, Total Loss: 25.3948
Step 240: Content Loss: 0.0025, Style Loss: 0.0003, Total Loss: 24.5711
Step 260: Content Loss: 0.0023, Style Loss: 0.0003, Total Loss: 23.0
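
The loop above uses Adam. `torch.optim.LBFGS` is a common alternative for this kind of image optimization, but its `step()` needs a closure that clears the gradients, recomputes the loss, and calls `backward()`. The sketch below shows that pattern on a toy quadratic problem rather than on the NST loss. `target` and `x` are stand-ins introduced here, with `x` playing the role of `generated_img`.

```python
import torch

# Toy stand-ins: x plays the role of generated_img, target the role of the optimum
target = torch.tensor([1.0, -2.0, 3.0])
x = torch.zeros(3, requires_grad=True)

optimizer = torch.optim.LBFGS([x], lr=1.0, max_iter=20)

def closure():
    # LBFGS may evaluate the loss several times per step, so all of the
    # forward and backward work lives inside the closure.
    optimizer.zero_grad()
    loss = torch.sum((x - target) ** 2)
    loss.backward()
    return loss

initial_loss = closure().item()
for _ in range(5):
    optimizer.step(closure)
final_loss = torch.sum((x.detach() - target) ** 2).item()
print(f"initial loss: {initial_loss:.4f}, final loss: {final_loss:.6f}")
```

To apply this to style transfer, the body of the closure would compute `total_loss` the same way as the Adam loop and return it.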

In [39]:
def im_convert(tensor):
    image = tensor.clone().detach().cpu().squeeze(0)
    mean = torch.tensor([0.485, 0.456, 0.406]).view(3,1,1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(3,1,1)
    image = image * std + mean
    image = image.clamp(0, 1)
    return transforms.ToPILImage()(image)

output_image = im_convert(generated_img)
output_image.save('output.png')
output_image.show()
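
`im_convert` undoes the ImageNet normalization applied in `image_loader` and clamps the result to the valid pixel range. The sketch below checks both steps on random data in plain PyTorch. `original`, `perturbed`, and the offset of 5.0 are made up for illustration. The clamp matters because the optimizer works in normalized space and can push values outside the range of a real image.

```python
import torch

mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

torch.manual_seed(0)
original = torch.rand(3, 8, 8)           # stand-in for a ToTensor() output in [0, 1]
normalized = (original - mean) / std     # the per-channel operation Normalize applies
restored = normalized * std + mean       # the inverse used in im_convert
max_error = (restored - original).abs().max().item()
print(f"round-trip max error: {max_error:.2e}")

# Simulate an optimizer pushing values far out of range, then clamp as im_convert does
perturbed = normalized + 5.0
clamped = (perturbed * std + mean).clamp(0, 1)
print(clamped.min().item(), clamped.max().item())
```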