# Neural Style Transfer (NST)
This notebook implements the NST algorithm proposed in "A Neural Algorithm of Artistic Style" (Gatys et al.), available at https://arxiv.org/pdf/1508.06576
It serves as a playground for testing the algorithm with different images, styles, and configurations.

In [None]:
import torch
from torchvision import models

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using {device}")

For NST, the default feature extractor is VGG-19, pretrained for image recognition. Since there is no benefit in training VGG from scratch, we use the pretrained version from the TorchVision model library.

In [None]:
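# `pretrained=True` is deprecated since torchvision 0.13 in favour of `weights`;
# on older versions, use models.vgg19(pretrained=True) instead
vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.to(device).eval()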

# Freeze the model parameters
for param in vgg.parameters():
    param.requires_grad = False
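
The loss function in this notebook looks up feature maps by layer name (for example `'conv4_2'`), so something has to collect them from the network. The following is a minimal sketch of one way to do that, not necessarily how the rest of this notebook does it. The helper name `extract_feature_maps` and the `VGG19_LAYERS` mapping are assumptions introduced for illustration; the indices refer to positions in `models.vgg19().features`, and the selected layers follow the paper (conv1_1 to conv5_1 for style, conv4_2 for content). A small stand-in network is used so the sketch runs without downloading weights.

```python
import torch
from torch import nn

# Assumed mapping: index in vgg19().features -> layer name used in the paper
VGG19_LAYERS = {
    "0": "conv1_1",
    "5": "conv2_1",
    "10": "conv3_1",
    "19": "conv4_1",
    "21": "conv4_2",
    "28": "conv5_1",
}

def extract_feature_maps(image, model, layers=VGG19_LAYERS):
    """Run `image` through `model` layer by layer and keep the selected outputs."""
    feature_maps = {}
    x = image
    # For an nn.Sequential, named_children() yields ("0", layer), ("1", layer), ...
    for index, layer in model.named_children():
        x = layer(x)
        if index in layers:
            feature_maps[layers[index]] = x
    return feature_maps

# Stand-in model (hypothetical) to demonstrate the helper
toy_model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(4, 8, kernel_size=3, padding=1),
)
toy_image = torch.rand(1, 3, 8, 8)
toy_maps = extract_feature_maps(toy_image, toy_model, {"0": "first", "2": "last"})
print({name: tuple(fmap.shape) for name, fmap in toy_maps.items()})
```

With the real network the call would be `extract_feature_maps(image, vgg)`. Note that torchvision's VGG uses in-place ReLUs, so the stored tensors end up holding post-activation values unless they are cloned.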

The core of NST is the loss computed on the generated image, which is what pushes it towards the desired result.

Two losses need to be combined. The first is the content loss, which directly compares the feature maps of the content and generated images. It ensures that the output image resembles the content image in form.
The second is the style loss, which compares the Gram matrices of the style and generated images. A Gram matrix holds the correlations between the channels of a feature map, which captures style as repeating patterns and textures. This loss pushes the generated image to adopt the style of the style image.
Combining the two losses yields an image with the content of the content image, rendered in the style of the style image. Adjusting the weights of the two losses controls what the generated image looks like.
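
For reference, the losses described above can be written as in the paper. Here $F^l$ and $P^l$ are the feature maps of the generated and content images at layer $l$, $G^l$ and $A^l$ are the Gram matrices of the generated and style images, $N_l$ is the number of feature maps in layer $l$, $M_l$ is their size (height times width), and $w_l$ are per-layer weights. The code in this notebook uses mean squared error instead, so its constant factors differ from the ones shown here.

```latex
\begin{aligned}
\mathcal{L}_{\text{content}} &= \frac{1}{2} \sum_{i,j} \left( F^{l}_{ij} - P^{l}_{ij} \right)^2 \\
G^{l}_{ij} &= \sum_{k} F^{l}_{ik} F^{l}_{jk} \\
\mathcal{L}_{\text{style}} &= \sum_{l} w_l \, \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left( G^{l}_{ij} - A^{l}_{ij} \right)^2 \\
\mathcal{L}_{\text{total}} &= \alpha \, \mathcal{L}_{\text{content}} + \beta \, \mathcal{L}_{\text{style}}
\end{aligned}
```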

In [None]:
def calculate_gram_matrix(tensor):
    # Extract dimensions (assumes a batch size of 1)
    _, n_channels, height, width = tensor.size()

    # Flatten each channel's feature map into a row: (n_channels, height * width)
    # reshape also handles non-contiguous tensors, which view does not
    features = tensor.reshape(n_channels, height * width)

    # Calculate the Gram matrix as the product of the features with their transpose
    gram = torch.mm(features, features.t())
    
    return gram
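
To make the Gram matrix concrete, here is a small worked example on a hypothetical toy feature map with two channels. The function is repeated so the block is self-contained; `toy_features` is an illustrative value, not data from this notebook.

```python
import torch

def calculate_gram_matrix(tensor):
    # Same computation as above: flatten each channel, then multiply by the transpose
    _, n_channels, height, width = tensor.size()
    features = tensor.reshape(n_channels, height * width)
    return torch.mm(features, features.t())

# Toy feature map: batch of 1, 2 channels, 1x2 spatial grid
# Channel 0 is [1, 2] and channel 1 is [3, 4]
toy_features = torch.tensor([[[[1.0, 2.0]], [[3.0, 4.0]]]])
toy_gram = calculate_gram_matrix(toy_features)

# Entry (i, j) is the dot product of channel i with channel j:
# (0, 0) = 1*1 + 2*2 = 5, (0, 1) = 1*3 + 2*4 = 11, (1, 1) = 3*3 + 4*4 = 25
print(toy_gram)
```

The result is a symmetric `(n_channels, n_channels)` matrix, so its size is independent of the spatial size of the image.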

In [None]:
def calculate_loss(generated_feature_maps, content_feature_maps, style_grams, alpha, beta):
    # Content loss: MSE between the conv4_2 feature maps of the content and generated images
    # Other implementations suggest that conv4_2 alone is sufficient for the content loss
    content_loss = torch.nn.functional.mse_loss(
        generated_feature_maps['conv4_2'], content_feature_maps['conv4_2']
    )

    # Style loss: MSE between the Gram matrices of the generated and style images, per layer
    style_loss = 0
    for layer in style_grams.keys():
        generated_gram = calculate_gram_matrix(generated_feature_maps[layer])
        layer_style_loss = torch.nn.functional.mse_loss(generated_gram, style_grams[layer])
        style_loss = style_loss + layer_style_loss

    # Average style loss across layers
    style_loss = style_loss / len(style_grams)

    # Combine the content and style losses with their respective weights
    total_loss = alpha * content_loss + beta * style_loss

    return total_loss, content_loss, style_loss
    