# BEST 2019 Final examination

![](https://github.com/glouppe/info8010-deep-learning/blob/master/tutorials/images/style_transfer.png?raw=1)

## Neural Style Transfer

The goal of your project is to reimplement the algorithm proposed in [A Neural Algorithm of Artistic Style](https://arxiv.org/pdf/1508.06576.pdf).

We expect you to come up with a working implementation of the original algorithm, following the steps below.

This corresponds to your **minimum viable product**.

- Given a natural image and a stylized image, the goal is to produce a new image that keeps the content of the former and applies the style of the latter.

- We expect you to work with a pretrained neural network (e.g. VGG16, ResNet...) which will serve as a feature extractor. To do so, you will have to write your own `forward()` function.

- You will have to reimplement the different losses presented in the paper.

- Produce at least one good-looking sample for the content image you were given.

**BONUS:**

- Play with the different layers used for extracting the features.

- Investigate how to balance the preservation of the content with respect to the transferred style.

- Come up with a way that extends the algorithm to the use of multiple styles.

- Investigate the differences between starting with a pretrained network and one which is randomly initialized.



## Non-exhaustive summary of [A Neural Algorithm of Artistic Style](https://arxiv.org/pdf/1508.06576.pdf)

The authors propose to cast the style transfer problem as an optimization procedure over the pixels of the target image. To do so, they define a loss with two terms, a content loss and a style loss:

- The content loss is the mean squared error (MSE) between the feature maps of the target and content images, accumulated over the selected layers.
- The style loss is the MSE between the Gram matrices of the vectorized feature maps (one vector per channel) of the target and style images.

The two losses are then combined and minimized with gradient descent.
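
For reference, the losses summarized above can be restated in the paper's notation. Here $F^l$ and $P^l$ are the feature maps of the generated and content images at layer $l$, $G^l$ and $A^l$ are the Gram matrices of the generated and style images, $N_l$ is the number of channels, $M_l$ the number of spatial positions, and $w_l$, $\alpha$, $\beta$ are weighting factors. Note that the paper uses sums of squared errors, so an MSE-based implementation differs from these formulas by constant factors.

$$
\begin{aligned}
\mathcal{L}_{content} &= \frac{1}{2} \sum_{i,j} \left(F^l_{ij} - P^l_{ij}\right)^2 \\
G^l_{ij} &= \sum_k F^l_{ik} F^l_{jk} \\
E_l &= \frac{1}{4 N_l^2 M_l^2} \sum_{i,j} \left(G^l_{ij} - A^l_{ij}\right)^2 \\
\mathcal{L}_{style} &= \sum_l w_l E_l \\
\mathcal{L}_{total} &= \alpha \, \mathcal{L}_{content} + \beta \, \mathcal{L}_{style}
\end{aligned}
$$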

In [1]:
from torchvision import models
from torchvision import transforms
from PIL import Image
import argparse
import torch
import torchvision
import torch.nn as nn
import numpy as np
from types import SimpleNamespace

### Experimental Setting

In [2]:
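config = SimpleNamespace()
config.content = 'content_image.jpg'  # path to the content image
config.style = 'style_image.jpg'      # path to the style image
config.max_size = 400                 # longest side of the content image, in pixels
config.total_step = 2000              # number of optimization steps
config.log_step = 1                   # print the losses every log_step steps (alternative: 10)
config.sample_step = 10               # save a sample every sample_step steps
config.style_weight = 100             # weight of the style loss relative to the content loss
config.lr = 0.003                     # Adam learning rate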
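
If you would rather drive these settings from the command line, the same fields could be filled by `argparse` (which is imported above but otherwise unused). The following is only a sketch: `build_config` is a hypothetical helper, and the flag names simply mirror the attributes of `config`.

```python
import argparse
from types import SimpleNamespace


def build_config(argv=None):
    """Parse command-line flags into a SimpleNamespace (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Neural style transfer settings")
    parser.add_argument("--content", default="content_image.jpg")
    parser.add_argument("--style", default="style_image.jpg")
    parser.add_argument("--max_size", type=int, default=400)
    parser.add_argument("--total_step", type=int, default=2000)
    parser.add_argument("--log_step", type=int, default=1)
    parser.add_argument("--sample_step", type=int, default=10)
    parser.add_argument("--style_weight", type=float, default=100)
    parser.add_argument("--lr", type=float, default=0.003)
    # vars() turns the argparse.Namespace into a dict of attribute values.
    return SimpleNamespace(**vars(parser.parse_args(argv)))


# Pass an explicit list: inside a notebook, sys.argv holds the kernel's own flags.
config_from_cli = build_config(["--style_weight", "50"])
print(config_from_cli.style_weight, config_from_cli.lr)
```

Passing an empty list, `build_config([])`, yields the defaults used in this notebook.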

In [3]:
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [4]:
device

device(type='cpu')

### Architecture

Define your favourite neural network, which will serve as a [feature extractor](https://pytorch.org/docs/stable/torchvision/models.html).

Redefine your forward pass for computing the features from the network.


In [4]:
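class PretrainedNet(nn.Module):
    def __init__(self):
        """Wrap the VGG16 feature stack and record which layer outputs to keep."""
        super().__init__()

        # Indices into vgg16.features: conv2_1, conv3_1, conv4_1, conv4_3, conv5_3.
        # (The paper's conv1_1 ~ conv5_1 would be [0, 5, 10, 17, 24] in VGG16.)
        self.select = [str(el) for el in [5, 10, 17, 21, 28]]
        # Newer torchvision releases prefer weights=models.VGG16_Weights.DEFAULT.
        self.pretrainedNet = models.vgg16(pretrained=True).features
        # Only the target image is optimized, so the network weights stay frozen.
        for param in self.pretrainedNet.parameters():
            param.requires_grad_(False)

    def forward(self, x):
        """Return the outputs of the selected layers as a list of feature maps."""
        features = []

        for name, layer in self.pretrainedNet.named_children():
            x = layer(x)
            if name in self.select:
                features.append(x)
        return features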
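
To check which layers a list of indices actually selects, the layer names can be rebuilt from the network configuration. The sketch below assumes the standard VGG16 layout used by torchvision (configuration "D", every convolution followed by a ReLU); `VGG16_CFG` and `vgg_layer_names` are illustrative names, not part of torchvision.

```python
# Assumed layer layout of torchvision's VGG16 `features` (configuration "D"):
# integers are conv output channels, "M" is a max-pooling layer.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def vgg_layer_names(cfg):
    """Return the layer name for each index of the Sequential built from cfg."""
    names = []
    block, conv = 1, 1
    for item in cfg:
        if item == "M":
            names.append(f"pool{block}")
            block += 1
            conv = 1
        else:
            # Each convolution is immediately followed by a ReLU.
            names.append(f"conv{block}_{conv}")
            names.append(f"relu{block}_{conv}")
            conv += 1
    return names


names = vgg_layer_names(VGG16_CFG)
for index in [5, 10, 17, 21, 28]:
    print(index, names[index])
```

Under this assumption, the indices `[5, 10, 17, 21, 28]` map to conv2_1, conv3_1, conv4_1, conv4_3 and conv5_3, so a different list is needed if you want the conv1_1 to conv5_1 layers used in the paper.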

### Image loader

In [5]:
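def load_image(image_path, transform=None, max_size=None, shape=None):
    """Load an image and convert it to a torch tensor.

    `shape` follows the PIL convention (width, height). A `transform` ending in
    ToTensor is required, since the result is moved to `device`.
    """
    image = Image.open(image_path).convert('RGB')  # drop alpha / grayscale modes

    if max_size:
        # Rescale so that the longest side equals max_size.
        scale = max_size / max(image.size)
        size = np.array(image.size) * scale
        # Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter.
        image = image.resize(tuple(int(s) for s in size), Image.LANCZOS)

    if shape:
        image = image.resize(tuple(shape), Image.LANCZOS)

    if transform:
        image = transform(image).unsqueeze(0)

    return image.to(device)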
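
The rescaling step can be checked without loading any image. This is a minimal sketch of the same arithmetic; `scaled_size` is a hypothetical helper and, like the loader, it truncates to integers.

```python
def scaled_size(size, max_size):
    """Scale a (width, height) pair so its longest side equals max_size."""
    width, height = size
    scale = max_size / max(width, height)
    # PIL expects integer pixel sizes, ordered as (width, height).
    return int(width * scale), int(height * scale)


print(scaled_size((800, 600), 400))
```

Keep in mind that PIL sizes are ordered (width, height), whereas image tensors are ordered (height, width).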

The torchvision pretrained models are trained on ImageNet, where images are normalized with `mean=[0.485, 0.456, 0.406]` and `std=[0.229, 0.224, 0.225]`. We use the same normalization statistics here.

In [6]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
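
To view or save an optimized image, this normalization has to be undone. One way is to apply `Normalize` a second time with inverted statistics, which is where the constants `(-2.12, -2.04, -1.80)` and `(4.37, 4.46, 4.44)` used later in the notebook come from. The short sketch below derives them (the names `inv_mean` and `inv_std` are illustrative).

```python
mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)

# Normalize computes y = (x - mean) / std, so x = (y - (-mean / std)) / (1 / std).
# Applying Normalize again with these statistics therefore undoes the first one.
inv_mean = tuple(-m / s for m, s in zip(mean, std))
inv_std = tuple(1 / s for s in std)

print([round(v, 2) for v in inv_mean])
print([round(v, 2) for v in inv_std])
```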

Load the content and style images.
Resize the style image to the same size as the content image.


In [7]:
content = load_image(config.content, transform, max_size=config.max_size)
# Tensors are (N, C, H, W) while PIL's resize expects (width, height).
style = load_image(config.style, transform, shape=[content.size(3), content.size(2)])

### Optimization

Initialize the target image with the content image.

In [8]:
# Initialize your pretrained neural net and don't forget to put it in evaluation mode.
net = PretrainedNet().to(device).eval()

In [9]:
target = content.clone().requires_grad_(True)

In [10]:
optimizer = torch.optim.Adam([target], lr=config.lr)

The main training loop of the algorithm: the content and style losses are handled separately (be careful to vectorize your feature maps and to follow the formulas presented in the paper).

In [11]:
old_target = target.clone().squeeze()
old_style = style.clone().squeeze()

In [None]:
# Content and style features do not depend on the target, so compute them once.
with torch.no_grad():
    content_features = net(content)
    style_features = net(style)

# Inverse of the ImageNet normalization, used when saving samples.
denorm = transforms.Normalize((-2.12, -2.04, -1.80), (4.37, 4.46, 4.44))

for step in range(config.total_step):
    # Extract the five feature maps of the current target image
    target_features = net(target)

    style_loss = 0
    content_loss = 0

    for f_target, f_content, f_style in zip(target_features, content_features, style_features):
        # Content loss: MSE between target and content feature maps
        content_loss += torch.mean((f_target - f_content) ** 2)

        # Vectorize the feature maps: one row per channel
        _, c, h, w = f_target.shape
        f_target = f_target.view(c, h * w)
        f_style = f_style.view(c, h * w)

        # Compute Gram matrices
        target_gram = torch.mm(f_target, f_target.t())
        style_gram = torch.mm(f_style, f_style.t())

        # Style loss: squared Gram difference with the paper's 1 / (4 N^2 M^2) factor
        style_loss += torch.sum((target_gram - style_gram) ** 2) / (4 * (c * h * w) ** 2)

    # Compute total loss, backprop and optimize (4 lines of code in total)
    loss = content_loss + config.style_weight * style_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Output
    if (step + 1) % config.log_step == 0:
        # Change this print into a log with tensorboardx
        print('Step [{}/{}], Content Loss: {:.4f}, Style Loss: {:.4f}'
              .format(step + 1, config.total_step, content_loss.item(), style_loss.item()))

    if (step + 1) % config.sample_step == 0:
        # Save the generated image (you can also change it to see it with tensorboardx)
        img = target.detach().clone().squeeze(0)
        img = denorm(img).clamp_(0, 1)
        torchvision.utils.save_image(img, 'output-{}.png'.format(step + 1))

Step [1/2000], Content Loss: 0.0000, Style Loss: 6.3722
Step [2/2000], Content Loss: 0.1678, Style Loss: 6.1999
Step [3/2000], Content Loss: 0.5983, Style Loss: 6.0252
Step [4/2000], Content Loss: 1.2125, Style Loss: 5.8433
Step [5/2000], Content Loss: 1.9695, Style Loss: 5.6548
Step [6/2000], Content Loss: 2.8134, Style Loss: 5.4685
Step [7/2000], Content Loss: 3.6911, Style Loss: 5.2897
Step [8/2000], Content Loss: 4.5307, Style Loss: 5.1209
Step [9/2000], Content Loss: 5.2898, Style Loss: 4.9613
Step [10/2000], Content Loss: 5.9721, Style Loss: 4.8099
Step [11/2000], Content Loss: 6.6064, Style Loss: 4.6650
Step [12/2000], Content Loss: 7.2124, Style Loss: 4.5262
Step [13/2000], Content Loss: 7.8019, Style Loss: 4.3932
Step [14/2000], Content Loss: 8.3796, Style Loss: 4.2655
Step [15/2000], Content Loss: 8.9441, Style Loss: 4.1424
Step [16/2000], Content Loss: 9.4919, Style Loss: 4.0236
Step [17/2000], Content Loss: 10.0251, Style Loss: 3.9091
Step [18/2000], Content Loss: 10.5412, 
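
The Gram-matrix part of the loop can be checked in isolation on random tensors, which is a quick way to catch shape or normalization mistakes before running the full optimization. This is a minimal sketch: `gram_matrix` and `layer_style_loss` are illustrative helper names, and the random inputs merely stand in for real feature maps.

```python
import torch


def gram_matrix(features):
    """Gram matrix of a (1, C, H, W) feature map: one row per channel."""
    _, c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.t()


def layer_style_loss(target_features, style_features):
    """Squared Gram difference, normalized by 4 * (C*H*W)^2 as in the training loop."""
    _, c, h, w = target_features.shape
    diff = gram_matrix(target_features) - gram_matrix(style_features)
    return torch.sum(diff ** 2) / (4 * (c * h * w) ** 2)


torch.manual_seed(0)
a = torch.randn(1, 3, 4, 4)
b = torch.randn(1, 3, 4, 4)
print(gram_matrix(a).shape)            # a C x C matrix
print(layer_style_loss(a, a).item())   # identical features give zero loss
print(layer_style_loss(a, b).item() > 0)
```

The Gram matrix has one row and one column per channel, regardless of the spatial size of the feature map.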

## Bonus

<div class="alert alert-danger">
<b>EXERCISE</b>:


Play with the different layers used for extracting the features.


</div>

<div class="alert alert-danger">
<b>EXERCISE</b>:
    
Investigate how to balance the preservation of the content with respect to the transferred style.


</div>

<div class="alert alert-danger">
<b>EXERCISE</b>:

Come up with a way that extends the algorithm to the use of multiple styles.

</div>
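
One possible starting point for this exercise, offered as an assumption rather than as something taken from the paper, is to sum the per-style losses with user-chosen weights. The sketch below does this for a single layer; `blended_style_loss` and its `weights` argument are hypothetical, and random tensors stand in for real feature maps.

```python
import torch


def gram_matrix(features):
    """Gram matrix of a (1, C, H, W) feature map: one row per channel."""
    _, c, h, w = features.shape
    flat = features.reshape(c, h * w)
    return flat @ flat.t()


def blended_style_loss(target_features, style_features_list, weights):
    """Weighted sum of per-style losses for a single layer (illustrative only)."""
    _, c, h, w = target_features.shape
    target_gram = gram_matrix(target_features)
    loss = target_features.new_zeros(())
    for style_features, weight in zip(style_features_list, weights):
        diff = target_gram - gram_matrix(style_features)
        loss = loss + weight * torch.sum(diff ** 2) / (4 * (c * h * w) ** 2)
    return loss


torch.manual_seed(0)
target = torch.randn(1, 3, 4, 4)
style_a = torch.randn(1, 3, 4, 4)
style_b = torch.randn(1, 3, 4, 4)
only_a = blended_style_loss(target, [style_a, style_b], [1.0, 0.0])
mix = blended_style_loss(target, [style_a, style_b], [0.5, 0.5])
print(only_a.item(), mix.item())
```

Other designs are equally plausible, for example blending the style Gram matrices before computing a single loss, and comparing them is part of the exercise.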

<div class="alert alert-danger">
<b>EXERCISE</b>:

Investigate the differences between starting with a pretrained network and one which is randomly initialized.


</div>