<a href="https://colab.research.google.com/github/bhavya6701/comp473-project/blob/main/comp473_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Implementation of Artistic Style Transfer Using Convolutional Neural Networks
**Authors:** Shibin Koshy [40295019], Ruturajsinh Vihol [40154693], Bhavya Manjibhai Ruparelia [40164863]

## Introduction

This notebook implements *"A Neural Algorithm of Artistic Style"* by Gatys et al., which introduced a method for combining the **content** of one image with the **style** of another using Convolutional Neural Networks (CNNs). The method, commonly called *Neural Style Transfer*, uses the feature maps of a pre-trained network (**VGG-19** in the original paper) to build separate content and style representations.

### Key Concepts

1. **Feature Extraction**:
   - The **VGG-16**, **VGG-19**, and **ResNet-18** networks are used to extract features from both the content and style images.
   - Features are extracted from specific layers of the network to represent different levels of abstraction.

2. **Style and Content Representation**:
   - **Content Loss**: Measures the difference between the content features of the target image and the original content image.
   - **Style Loss**: Uses the **Gram matrix** of feature maps to capture the texture and style of the reference style image.

3. **Total Loss**:
   - The final loss is a weighted sum of the style loss and content loss.
   - This total loss is minimized to iteratively update the target image, blending the desired content with the artistic style.
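
For reference, the three losses summarized above can be written as in Gatys et al. Here $F^l$ and $P^l$ are the layer-$l$ feature maps of the target and content images, $G^l$ and $A^l$ are the Gram matrices of the target and style images, $N_l$ is the number of feature maps in layer $l$, $M_l$ is their spatial size, and $w_l$ are per-layer style weights. The code in `model_utils` is not shown in this notebook and may normalize these terms differently.

$$
\begin{aligned}
\mathcal{L}_{content} &= \frac{1}{2} \sum_{i,j} \left( F^{l}_{ij} - P^{l}_{ij} \right)^{2} \\
G^{l}_{ij} &= \sum_{k} F^{l}_{ik} F^{l}_{jk} \\
E_{l} &= \frac{1}{4 N_{l}^{2} M_{l}^{2}} \sum_{i,j} \left( G^{l}_{ij} - A^{l}_{ij} \right)^{2} \\
\mathcal{L}_{style} &= \sum_{l} w_{l} E_{l} \\
\mathcal{L}_{total} &= \alpha \, \mathcal{L}_{content} + \beta \, \mathcal{L}_{style}
\end{aligned}
$$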

### Implementation Details

- **Framework**: The implementation uses **PyTorch** for flexibility and efficiency.
- **Pre-trained Model**: PyTorch's **VGG-16**, **VGG-19**, and **ResNet-18** models are used for feature extraction.
- **Optimization**:
   - The target image (initialized as noise or a copy of the content image) is iteratively optimized to minimize the total loss.
   - Most experiments use the **Adam optimizer**; a later section uses **L-BFGS** for comparison.

This notebook walks through Neural Style Transfer step by step, from loading the models to comparing networks, content layers, loss weights, and optimizers.

## Imports

In [None]:
import json
import os
import torch
import warnings

from decimal import Decimal
from model_utils import (
    load_models,
    style_transfer_from_content,
    style_transfer_from_noise
)

from image_utils import load_image, tensor_to_image, plot_style_transfer, plot_images

In [None]:
# Filter out warnings
warnings.filterwarnings("ignore")

# Set the home directory
HOME = os.getcwd()

# Set the device
device = torch.device("cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu")

## Load Pre-trained Models

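The VGG and ResNet models are pre-trained on the ImageNet dataset and are available in the `torchvision.models` module. They are used here to extract features from the content and style images.

The torchvision VGG models are split into two main parts:
1. **`features`**: Contains the convolutional layers of the network.
2. **`classifier`**: Contains the fully connected layers of the network.

The torchvision ResNet models are organized differently: a convolutional stem is followed by four residual stages (`layer1` to `layer4`) and a final fully connected layer (`fc`).

For Neural Style Transfer, only the convolutional part of each model is needed: the `features` module for VGG and the residual stages for ResNet.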

- **`VGG-16`**: A VGG variant with 16 weight layers (13 convolutional, 3 fully connected), known for its simple, uniform design.
- **`VGG-19`**: A VGG variant with 19 weight layers (16 convolutional, 3 fully connected); it is the network used in the original paper.
- **`ResNet-18`**: A ResNet variant with 18 weight layers, known for its residual (skip) connections.
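
As a minimal sketch of how intermediate features can be collected from a sequential `features` module, the helper below runs an image through the layers one by one and keeps the outputs of the named ones. The function `extract_features` and the tiny stand-in network are illustrative assumptions, not the code in `model_utils`. The index-to-name mapping in `VGG19_LAYER_INDEX` follows the layer order of torchvision's `vgg19().features`.

```python
import torch
import torch.nn as nn

# Positions of selected conv layers inside torchvision's vgg19().features
VGG19_LAYER_INDEX = {
    "0": "conv1_1",
    "5": "conv2_1",
    "10": "conv3_1",
    "19": "conv4_1",
    "21": "conv4_2",
    "28": "conv5_1",
}


def extract_features(features: nn.Sequential, image: torch.Tensor, layer_index: dict) -> dict:
    """Run `image` through `features` and keep the outputs of the named layers."""
    collected = {}
    x = image
    for name, layer in features.named_children():
        x = layer(x)
        if name in layer_index:
            collected[layer_index[name]] = x
    return collected


# Tiny stand-in network (hypothetical) so the sketch runs without downloading weights.
toy_features = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),   # index "0"
    nn.ReLU(),                                   # index "1"
    nn.MaxPool2d(2),                             # index "2"
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # index "3"
)
toy_index = {"0": "conv1_1", "3": "conv2_1"}

with torch.no_grad():
    feats = extract_features(toy_features, torch.rand(1, 3, 32, 32), toy_index)

for layer_name, feature_map in feats.items():
    print(layer_name, tuple(feature_map.shape))
```

With a real VGG-19, the same helper would be called with the model's `features` module and `VGG19_LAYER_INDEX`.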

In [None]:
# Load pre-trained models from torchvision
model_dict = load_models(device)

## Load Images

Load the content and style images that we want to blend using Neural Style Transfer. You can upload your own images or use the default images provided.

- **Content Image**: The image whose content we want to preserve in the final output. Modify the `CONTENT_IMAGE_PATH` variable to the path of your desired content image.
- **Style Image**: The image whose style we want to apply to the content image. Modify the `STYLE_IMAGE_PATH` variable to the path of your desired style image.

In [None]:
CONTENT_IMAGE_PATH = HOME + "/data/input_images/content/content14.jpeg"
STYLE_IMAGE_PATH = HOME + "/data/input_images/style/style22.jpeg"

content = load_image(CONTENT_IMAGE_PATH).to(device)
style = load_image(STYLE_IMAGE_PATH, shape=content.shape[-2:]).to(device)

# Display the content and style images
plot_images(
    fig_size=(10, 6),
    rows=1,
    cols=2,
    images=[tensor_to_image(content), tensor_to_image(style)],
    titles=["Content Image", "Style Image"],
    axis="off",
)

## Initialize Parameters

The parameters for the Neural Style Transfer algorithm are loaded from the `model_config.json` file. To change them, edit that file, then restart the runtime to apply the changes.


In [None]:
# Load the JSON file
with open("model_config.json", "r") as json_file:
    data = json.load(json_file)

# Extracting data into variables
layers = data["layers"]
params = data["hyperparameters"]

iterations = params["iters"]
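
The actual contents of `model_config.json` are not shown here. The sketch below is a hypothetical configuration that contains only the keys this notebook reads (`layers`, `content_layer`, and the `iters`, `alpha`, `beta`, and `lr` hyperparameters), along with a small check that fails early if a hyperparameter is missing. All values are placeholders, and `validate_config` is an illustrative helper, not part of the project.

```python
import json

# Hypothetical config text: key names mirror what the notebook reads,
# values are placeholders rather than the project's real settings.
example_config_text = """
{
  "layers": {},
  "content_layer": {"vgg": "conv4_2"},
  "hyperparameters": {"iters": 500, "alpha": 1, "beta": 1e6, "lr": 0.01}
}
"""

REQUIRED_HYPERPARAMETERS = {"iters", "alpha", "beta", "lr"}


def validate_config(config: dict) -> dict:
    """Return the hyperparameters, raising KeyError if any expected key is absent."""
    params = config["hyperparameters"]
    missing = REQUIRED_HYPERPARAMETERS - params.keys()
    if missing:
        raise KeyError(f"Missing hyperparameters: {sorted(missing)}")
    return params


example_config = json.loads(example_config_text)
example_params = validate_config(example_config)
print(example_params["iters"], example_params["alpha"] / example_params["beta"])
```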

## VGG-16 vs VGG-19 vs ResNet-18 : A Comparative Analysis

- **Content Features**:
    - **VGG-16**: Extracted from the `Conv4_2` layer.
    - **VGG-19**: Extracted from the `Conv4_2` layer.
    - **ResNet-18**: Extracted from the `layer3` block.
- **Style Features**:
    - **VGG-16**: Extracted from multiple layers (`Conv1_1`, `Conv2_1`, `Conv3_1`, `Conv4_1`, `Conv5_1`).
    - **VGG-19**: Extracted from multiple layers (`Conv1_1`, `Conv2_1`, `Conv3_1`, `Conv4_1`, `Conv5_1`).
    - **ResNet-18**: Extracted from multiple layers (`layer1`, `layer2`, `layer3`, `layer4`).

The **loss function** has two components:
1. **Content Loss**: Measures the difference between content features of the target and content images.
2. **Style Loss**: Measures the difference between Gram matrices of the style features from the target and style images.

The total loss, a weighted sum of content and style losses, is minimized through an iterative optimization process. Starting with the content image, the target image is gradually updated using gradient descent to produce the final stylized output.
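
The optimization described above can be sketched on a toy problem. A single frozen convolution stands in for the pre-trained network, the losses use mean-squared normalization, and the values of `alpha`, `beta`, and the learning rate are arbitrary. These are all assumptions made for illustration, so this is not the implementation in `model_utils`.

```python
import torch
import torch.nn as nn


def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a (1, C, H, W) feature map, scaled by its number of elements."""
    _, c, h, w = feat.shape
    flat = feat.reshape(c, h * w)
    return flat @ flat.t() / (c * h * w)


torch.manual_seed(0)

# Frozen stand-in feature extractor (hypothetical; the notebook uses VGG/ResNet).
extractor = nn.Conv2d(3, 8, kernel_size=3, padding=1)
for p in extractor.parameters():
    p.requires_grad_(False)

content = torch.rand(1, 3, 16, 16)
style = torch.rand(1, 3, 16, 16) * 0.2          # darker image, so its statistics differ
target = content.clone().requires_grad_(True)   # start from a copy of the content image

alpha, beta = 1.0, 1e4                          # placeholder loss weights
content_feat = extractor(content)
style_gram = gram_matrix(extractor(style))

optimizer = torch.optim.Adam([target], lr=0.01)
losses = []
for step in range(30):
    optimizer.zero_grad()
    target_feat = extractor(target)
    content_loss = torch.mean((target_feat - content_feat) ** 2)
    style_loss = torch.mean((gram_matrix(target_feat) - style_gram) ** 2)
    total_loss = alpha * content_loss + beta * style_loss
    total_loss.backward()      # gradients flow to the target image only
    optimizer.step()
    losses.append(total_loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Only `target` is passed to the optimizer, so the network weights stay fixed and the image itself is what gets updated.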

The following sections use the Adam optimizer to minimize the total loss and generate style transfer outputs with the **VGG-16**, **VGG-19**, and **ResNet-18** models. In each case, the initial target image is a copy of the content image.

### VGG-16 Style Transfer

In [None]:
# Call the style transfer function for the VGG-16 model
vgg_16_saved_images, total_losses = style_transfer_from_content(
    "vgg-16", model_dict, content, style, data, device
)

# Create a plot with the images and total loss values
plot_style_transfer(vgg_16_saved_images, total_losses, iterations)

### VGG-19 Style Transfer

In [None]:
# Call the style transfer function for the VGG-19 model
vgg_19_saved_images, total_losses = style_transfer_from_content(
    "vgg-19", model_dict, content, style, data, device
)

# Create a plot with the images and total loss values
plot_style_transfer(vgg_19_saved_images, total_losses, iterations)

### ResNet-18 Style Transfer

In [None]:
# Call the style transfer function for the ResNet-18 model
resnet_18_saved_images, total_losses = style_transfer_from_content(
    "resnet-18", model_dict, content, style, data, device
)

# Create a plot with the images and total loss values
plot_style_transfer(resnet_18_saved_images, total_losses, iterations)


### Style Transfer Analysis

The output images from the **VGG-16**, **VGG-19**, and **ResNet-18** models are compared to evaluate the quality of style transfer. The **VGG-19** model produces more detailed and visually appealing results, as it captures more complex features from the style image. The **VGG-16** model gives a smoother, less detailed output, which may be preferred for certain artistic styles. The **ResNet-18** model performs poorly at artistic style transfer here. A plausible explanation is that its residual connections favor sparse, high-level features over the rich spatial and texture details needed for style representation. All three networks were pre-trained for ImageNet classification, so the difference is more likely architectural than a consequence of the pre-training task.

The choice therefore comes down to the two VGG models and depends on the desired level of detail and texture in the final stylized image. For intricate and complex styles, the **VGG-19** model is recommended, while the **VGG-16** model may be suitable for simpler and smoother styles.

In [None]:
plot_images(
    fig_size=(20, 8),
    rows=1,
    cols=3,
    images=[
        vgg_16_saved_images[-1],
        vgg_19_saved_images[-1],
        resnet_18_saved_images[-1],
    ],
    titles=["VGG-16 Final Image", "VGG-19 Final Image", "ResNet-18 Final Image"],
    axis="off",
)

## Content Layer Comparison

In the Neural Style Transfer process, the content features are extracted from intermediate layers of the pre-trained VGG model. The choice of content layer affects the level of detail preserved in the final stylized output.

In this section, the impact of different content layers (`Conv1_1`, `Conv2_1`, `Conv3_1`, `Conv4_1`, and `Conv5_1`) on the style transfer process is analyzed. By changing the content layer, we can control the level of abstraction and detail in the final stylized image.

**Analysis**: In the outputs below, using deeper layers (`Conv4_1` and `Conv5_1`) for content extraction preserves more detailed structures from the content image: the outputs show finer textures and shapes and capture more intricate patterns. Using shallower layers (`Conv1_1` and `Conv2_1`) gives smoother, more abstract stylized outputs in which the overall structure is maintained without fine details. The choice of content layer depends on the desired balance between structure and detail in the final stylized image.

In [None]:
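compare_layers = ["conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv5_1"]
layer_images = []
layer_titles = []

# Remember the configured content layer so it can be restored after the comparison
original_content_layer = data["content_layer"]["vgg"]

# Iterate through the layers
for layer in compare_layers:
    print(f"Running style transfer for layer: {layer}")

    # Override the content layer in the in-memory config (the JSON file is not modified)
    data["content_layer"]["vgg"] = layer

    # Call the style transfer function for the VGG-19 model
    saved_images, total_losses = style_transfer_from_content(
        "vgg-19", model_dict, content, style, data, device
    )

    # Keep the output image for the comparison plot
    layer_images.append(saved_images[-2])
    layer_titles.append(f"CNN Layer: {layer}")

# Restore the original content layer for the sections that follow
data["content_layer"]["vgg"] = original_content_layer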

# Create a plot with all the images obtained using different layers
plot_images(
    fig_size=(20, 4),
    rows=1,
    cols=5,
    images=layer_images,
    titles=layer_titles,
    axis="off",
)

## α/β Ratio Analysis

The α/β ratio in the Neural Style Transfer algorithm controls the relative importance of content and style in the final output. By adjusting the α/β ratio, we can emphasize either the content or style features, leading to different artistic effects.

**Low α/β Ratio (α=1, β=1e7, ratio 1e-7)**: The style loss is weighted far more heavily than the content loss. The final output image is dominated by the style features, resulting in a strong artistic style with minimal content influence.

**High α/β Ratio (α=1, β=1e4, ratio 1e-4)**: The content loss carries relatively more weight, so the final output image stays closer to the content image. The artistic style is subtle, and the content structure is preserved.

The choice of α/β ratio depends on the desired artistic effect. A low ratio emphasizes the style features, creating bold and visually striking images, while a high ratio preserves the content structure, resulting in subtle and nuanced stylized outputs. The optimal ratio varies based on the content and style images, as well as the artistic intent of the user.
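
To make the effect of the weighting concrete, the sketch below computes the α/β label for each style weight and the share of the total loss contributed by the style term. The raw loss values (`content_loss=1.0`, `style_loss=1e-5`) are made-up numbers chosen only for illustration, and `ratio_label` and `style_share` are hypothetical helpers.

```python
from decimal import Decimal


def ratio_label(alpha: float, beta: float) -> str:
    """Format alpha/beta in scientific notation with one significant digit."""
    return f"{Decimal(alpha / beta):.0E}"


def style_share(alpha: float, beta: float, content_loss: float, style_loss: float) -> float:
    """Fraction of the weighted total loss that comes from the style term."""
    content_term = alpha * content_loss
    style_term = beta * style_loss
    return style_term / (content_term + style_term)


alpha = 1  # assumed content weight, matching the α=1 used in the text
for beta in (1e4, 1e5, 1e6, 1e7):
    share = style_share(alpha, beta, content_loss=1.0, style_loss=1e-5)
    print(f"alpha/beta = {ratio_label(alpha, beta)}, style share of total loss = {share:.3f}")
```

As β grows, the style term takes up a larger share of the total loss, which is why low α/β ratios yield more heavily stylized outputs.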

In [None]:
style_weights = [1e4, 1e5, 1e6, 1e7]
output_images = [tensor_to_image(content), tensor_to_image(style)]
image_titles = ["Content Image", "Style Image"]

# Call the style transfer function for the VGG-19 model
for sw in style_weights:
    print(f"Running style transfer algorithm with style weight = {sw}")
    data["hyperparameters"]["beta"] = sw
    saved_images, total_losses = style_transfer_from_content(
        "vgg-19", model_dict, content, style, data, device
    )

    output_images.append(saved_images[-2])
    image_titles.append(f"α/β Ratio = {Decimal(data['hyperparameters']['alpha']/sw):.0E}")

# Create a plot with all the images obtained using different style weight values
plot_images(
    fig_size=(12, 16),
    rows=3,
    cols=2,
    images=output_images,
    titles=image_titles,
    axis="off",
)

## Style Transfer with L-BFGS

In the previous sections, Neural Style Transfer was performed using the Adam optimizer for optimization. While Adam is efficient and widely used, the L-BFGS optimization algorithm offers certain advantages for style transfer tasks.

**L-BFGS** (Limited-memory Broyden-Fletcher-Goldfarb-Shanno) is a popular optimization algorithm for large-scale optimization problems. It approximates the inverse Hessian matrix using limited memory, making it memory-efficient and suitable for high-dimensional optimization tasks.

The **VGG-19** model is used for style transfer with L-BFGS optimization. The initial target image is set to white noise, and the optimization process is carried out using L-BFGS to generate stylized outputs. The results are compared with those obtained using the Adam optimizer to evaluate the quality and efficiency of the L-BFGS algorithm.
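
Unlike Adam, PyTorch's `torch.optim.LBFGS` needs a closure that re-evaluates the loss, because it may evaluate the objective several times within a single `step`. The sketch below shows that pattern on a toy quadratic objective starting from white noise. The objective and the tensor named `goal` are assumptions standing in for the style transfer loss, and `style_transfer_from_noise` may be implemented differently.

```python
import torch

torch.manual_seed(0)

goal = torch.rand(1, 3, 8, 8)                          # stand-in for the loss minimum
target = torch.randn(1, 3, 8, 8, requires_grad=True)   # white-noise starting image

optimizer = torch.optim.LBFGS([target], lr=1)
losses = []


def closure():
    """Clear gradients, recompute the loss, backpropagate, and return the loss."""
    optimizer.zero_grad()
    loss = torch.sum((target - goal) ** 2)
    loss.backward()
    losses.append(loss.item())
    return loss


# Each call to step() may run several inner L-BFGS iterations.
for _ in range(5):
    optimizer.step(closure)

final_loss = torch.sum((target - goal) ** 2).item()
print(f"loss evaluations: {len(losses)}, final loss: {final_loss:.2e}")
```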

In [None]:
# Modify the hyperparameters
data["hyperparameters"]["iters"] = 100
data["hyperparameters"]["beta"] = 1e5
data["hyperparameters"]["lr"] = 1

# Keep the iteration count used for plotting in sync with the updated config
iterations = data["hyperparameters"]["iters"]

saved_images, total_losses = style_transfer_from_noise(
    "vgg-19", model_dict, content, style, data, device
)

# Final stylized image from the VGG-19 + L-BFGS run
vgg19_lbfgs_output_image = saved_images[-1]

# Create a plot with the images and total loss values
plot_style_transfer(saved_images, total_losses, iterations)

## Variation in Results (Using L-BFGS Optimizer)

This section repeats the L-BFGS style transfer several times to show how much the result varies between runs. L-BFGS is a quasi-Newton alternative to Adam: it uses curvature information estimated from recent gradients, which often lets it converge in fewer iterations at the cost of more computation per step.

In each trial, the initial target image is set to white noise, and the L-BFGS optimizer is used to minimize the total loss.

**Observations**:
- The L-BFGS optimizer produces visually appealing results with smooth transitions between content and style features.
- The final output differs from run to run: each run starts from a different random noise image, so the optimization converges to a different local minimum.
- The L-BFGS optimizer is computationally intensive but provides high-quality stylized outputs with rich texture and detail.
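
If repeatable trials are wanted, the noise initialization can be seeded. The sketch below shows the idea with a hypothetical `make_noise` helper. Whether `style_transfer_from_noise` accepts a seed or a pre-built noise image is not shown in this notebook. If it draws from PyTorch's default generator, calling `torch.manual_seed` before each trial would be the equivalent step, but that is an assumption.

```python
import torch


def make_noise(shape, seed=None):
    """Return a white-noise tensor; passing a fixed `seed` makes it repeatable."""
    generator = torch.Generator()
    if seed is not None:
        generator.manual_seed(seed)
    else:
        generator.seed()  # pick a fresh, non-deterministic seed
    return torch.randn(shape, generator=generator)


same_a = make_noise((1, 3, 4, 4), seed=0)
same_b = make_noise((1, 3, 4, 4), seed=0)
other = make_noise((1, 3, 4, 4), seed=1)

print("same seed, same noise:", torch.equal(same_a, same_b))
print("different seed, same noise:", torch.equal(same_a, other))
```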

In [None]:
output_images = [tensor_to_image(content), tensor_to_image(style)]
image_titles = ["Content Image", "Style Image"]

# Call the style transfer function for the VGG-19 model (L-BFGS)
for i in range(4):
    print(f"Trial {i+1} for style transfer algorithm using L-BFGS")
    saved_images, total_losses = style_transfer_from_noise(
        "vgg-19", model_dict, content, style, data, device
    )

    output_images.append(saved_images[-2])
    image_titles.append(f"Trial {i+1} - Total Loss: {total_losses[-1]:.2f}")

# Create a plot with all the images obtained from the different L-BFGS trials
plot_images(
    fig_size=(12, 16),
    rows=3,
    cols=2,
    images=output_images,
    titles=image_titles,
    axis="off",
)