In [0]:
%matplotlib inline

## Style Transfer - with PyTorch and Google Colab

Created for TMCS software development course, 2019

Style transfer takes an arbitrary image and processes it so that it takes on the stylistic features of a second image fed into the algorithm. The procedure is described in the paper *A Neural Algorithm of Artistic Style* by L.A. Gatys, A.S. Ecker, and M. Bethge, linked [here](https://arxiv.org/pdf/1508.06576.pdf).

![what style transfer looks like](https://cdn-images-1.medium.com/max/2000/1*uIlgYKjp-1ZboXK8ff6ztg.jpeg)

I chose a Google Colab notebook for this project because you will be performing operations on large tensor objects. Such tasks are highly parallelisable, and Google provides GPUs that you can all use for free via a browser, without having to set up CUDA yourselves. This should speed up the development process, since you won't have to wait long for the optimisation to finish.

This notebook provides a framework in which you will write your own modules; once they pass the tests given below, they will come together on Friday - BYOP (bring your own pictures).

The algorithm itself is simple: it takes a style image, a content image, and an input (seed) image, and uses a pre-trained VGG neural net to output a style-transferred image. We define two distances, content and style, that measure how far the input image is from the content image and the style image respectively, then use gradient descent to minimise both.
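
One way to minimise both distances at once, and the approach taken in the paper, is to minimise a single weighted sum. Writing $I$ for the input image, $C$ for the content image and $S$ for the style image, and taking $\alpha$ and $\beta$ as weighting factors that you choose yourself:

$$L_{total}(I) = \alpha L_{content}(I, C) + \beta L_{style}(I, S)$$

A larger ratio $\alpha / \beta$ preserves more of the content, while a smaller one emphasises the style.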

# Packages, GPUs and TPUs

The packages that we will be using to build this program are:

*   ``pytorch`` (imported as ``torch``) - necessary for working with neural networks ([documentation](https://pytorch.org/docs/stable/index.html))
*   ``pillow`` (imported as ``PIL``) - THE Python imaging library ([documentation](https://pillow.readthedocs.io/en/stable/))
*   ``numpy`` and ``matplotlib.pyplot`` - standard, and necessary for Pythonic number crunching and plotting

Image processing takes a lot less time to run on a GPU, and when you work in Colab you can use one for free - go to *Runtime -> Change Runtime Type* and set *Hardware accelerator* to GPU or TPU.
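
Once the runtime type is set, the code still has to ask for the GPU explicitly. The following is a minimal sketch of the usual pattern, assuming a GPU runtime (TPUs need additional setup that is not covered here); the names ``device`` and ``x`` are just illustrative.

```python
import torch

# Use the GPU when Colab has assigned one; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors (and, later, the network) must be moved to the chosen device explicitly.
x = torch.rand(1, 3, 8, 8).to(device)
print(device, x.device)
```

If this prints ``cpu``, the hardware accelerator has probably not been enabled for the current session.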

# Load and prepare the input images

The ``image_loader`` function below should import the style and content images. It must ensure that both images are the same size and return each one as a torch tensor that can be used for subsequent operations. Note that you will need to insert a fake batch dimension into the output tensor, which is required to fit the VGG neural network's input dimensions - that is, ``(batch_size, num_channels, height, width)``.

Then, build a function that displays the images, to ensure they've loaded correctly.
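
As a hint for the tensor layout, here is a small sketch of the resize-and-batch step on its own. It is not the required solution: ``to_batched_tensor`` is a hypothetical helper, the images are generated in memory rather than loaded from your files, and forcing a square size (which distorts the aspect ratio) is just one simple way of making the two sizes agree.

```python
import numpy as np
import torch
from PIL import Image


def to_batched_tensor(pil_image, size):
    """Hypothetical helper: resize a PIL image and return a (1, C, H, W) float tensor."""
    resized = pil_image.convert("RGB").resize((size, size))  # PIL expects (width, height)
    array = np.asarray(resized, dtype=np.float32) / 255.0    # (H, W, C), values in [0, 1]
    tensor = torch.from_numpy(array).permute(2, 0, 1)        # reorder to (C, H, W)
    return tensor.unsqueeze(0)                               # add the fake batch dimension


# Stand-in images created in memory, so the sketch runs without any files.
style_demo = to_batched_tensor(Image.new("RGB", (300, 200), "red"), size=64)
content_demo = to_batched_tensor(Image.new("RGB", (120, 480), "blue"), size=64)
print(style_demo.shape, content_demo.shape)
```

To display such a tensor with ``matplotlib``, the steps are reversed: drop the batch dimension with ``squeeze(0)`` and move the channels last with ``permute(1, 2, 0)``.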

In [0]:
def image_loader():
  # Write your function here!
  raise NotImplementedError("image_loader has not been written yet")

In [0]:
# Testing the function

assert isinstance(style_img, torch.Tensor), "The data type of the output is incorrect"
assert style_img.size() == content_img.size(), "The two input images are not the same size"
assert len(style_img.size()) == 4, "Wrong tensor dimension for NN"

# Build the loss functions

**Content Loss** is a function that represents the distance of the input image $I$ from the content image $C$ for an individual layer of the NN. We pass both images to the NN to get the intermediate feature representations $F_{Il}$ and $F_{Cl}$ at layer $l$. The loss function $L_{content}(I, C)$ should return the distance $|F_{Il} - F_{Cl}|^2$, which is the mean squared error between the two sets of feature maps.
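
A minimal sketch of this distance, assuming the feature maps have already been extracted, could look like the following. The function name ``content_loss`` and the random stand-in features are illustrative only. Note that ``mse_loss`` averages over all elements, so it differs from the paper's sum of squared differences by a constant factor.

```python
import torch
import torch.nn.functional as F


def content_loss(input_features, content_features):
    """Mean squared error between two feature maps of identical shape."""
    # detach() treats the content features as a fixed target, not something to optimise.
    return F.mse_loss(input_features, content_features.detach())


# Random stand-ins for the layer-l activations, shaped (batch, channels, height, width).
torch.manual_seed(0)
f_content = torch.rand(1, 4, 8, 8)
f_input = torch.rand(1, 4, 8, 8, requires_grad=True)

loss = content_loss(f_input, f_content)
loss.backward()  # gradients flow back to the input side only
print(loss.item())
```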


**Style Loss** is a little more involved to calculate, as can be seen in the paper. It involves calculating the Gram matrix for each layer, for both the style image $S$ and the input image $I$. Entry $G^l_{ij}$ of the Gram matrix is the inner product between the vectorised feature maps $i$ and $j$ in layer $l$: it represents the correlation between those feature maps.

The contribution of each layer to the total style loss is described by
$$E_l = \frac{1}{4N_l^2M_l^2} \sum_{i,j}|G^{Sl}_{ij} - G^{Il}_{ij}|^2$$

where $G^{Il}_{ij}$ and $G^{Sl}_{ij}$ are the respective Gram representations in layer $l$ of $I$ and $S$. $N_l$ is the number of feature maps in layer $l$, each of size $M_l = \text{height} \times \text{width}$. Thus, the total style loss across all layers is
$$L_{style}(I, S) = \sum_l w_l E_l$$
where we weight the contribution of each layer's loss by some factor $w_l$.
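
The formulas above can be sketched in code as follows. This assumes a batch size of 1 and uses random tensors in place of real VGG activations; ``gram_matrix`` and ``style_layer_loss`` are illustrative names, and the layer weights are arbitrary.

```python
import torch


def gram_matrix(features):
    """Gram matrix of a (1, N, H, W) feature tensor; returns an (N, N) tensor."""
    _, n_maps, height, width = features.size()
    flat = features.reshape(n_maps, height * width)  # one row per vectorised feature map
    return flat @ flat.t()                           # G_ij = <F_i, F_j>


def style_layer_loss(input_features, style_features):
    """E_l for a single layer, including the 1 / (4 N_l^2 M_l^2) prefactor."""
    _, n_maps, height, width = input_features.size()
    m = height * width
    diff = gram_matrix(style_features).detach() - gram_matrix(input_features)
    return diff.pow(2).sum() / (4.0 * n_maps**2 * m**2)


# Random stand-ins for the activations of two layers.
torch.manual_seed(0)
style_feats = [torch.rand(1, 3, 4, 4), torch.rand(1, 6, 2, 2)]
input_feats = [torch.rand(1, 3, 4, 4), torch.rand(1, 6, 2, 2)]
weights = [0.5, 0.5]  # w_l, chosen arbitrarily for this sketch

style_loss = sum(
    w * style_layer_loss(i, s) for w, i, s in zip(weights, input_feats, style_feats)
)
print(style_loss.item())
```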

# Import and normalise the neural network

Here, we import a pre-trained 19-layer VGG neural network, like that used in the paper. The VGG19 was developed for an image classification competition, scoring 7.3% top-5 error (that's very good). It contains 5 stacks of convolutional layers, with 2-4 layers in each - they're named conv1_1 to conv5_4. A good explanation of how convolutional neural nets work can be found [here](https://www.youtube.com/watch?v=YRhxdVk_sIs).

![structure of VGG-19](https://cdn-images-1.medium.com/max/1600/1*cufAO77aeSWdShs3ba5ndg.jpeg)

PyTorch’s implementation of VGG (in ``torchvision``) is a module divided into two main child modules: ``features`` (containing the convolution and pooling layers - conv and maxpool) and ``classifier`` (containing the fully connected layers - FC1, FC2). We will use the ``features`` module because we need the outputs of the individual convolution layers to measure content and style loss. Some layers behave differently during training than during evaluation, so we must set the network to evaluation mode using ``.eval()``.
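
To illustrate how the outputs of individual convolution layers can be collected, here is a sketch that walks through a sequential module layer by layer. So that it runs without downloading any weights, it uses a tiny stand-in network rather than the real VGG-19. The layer names ``conv_1``, ``conv_2`` and the helper ``collect_conv_outputs`` are made up for this example. With the real network you would pass the ``features`` module of torchvision's VGG-19 instead.

```python
import torch
import torch.nn as nn

# A tiny stand-in with the same kinds of layers as VGG's features module (NOT the real VGG-19).
features = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(inplace=False),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.ReLU(inplace=False),
).eval()  # evaluation mode, as described above


def collect_conv_outputs(model, image):
    """Run image through model layer by layer, keeping each convolution's output."""
    outputs = {}
    conv_index = 0
    x = image
    for layer in model.children():
        x = layer(x)
        if isinstance(layer, nn.Conv2d):
            conv_index += 1
            outputs[f"conv_{conv_index}"] = x
    return outputs


image = torch.rand(1, 3, 32, 32)
# no_grad is fine for a quick inspection; the actual optimisation needs gradients.
with torch.no_grad():
    activations = collect_conv_outputs(features, image)

for name, act in activations.items():
    print(name, tuple(act.shape))
```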

Note that VGG networks are trained on images with each channel normalized by mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225]. You will need to normalize the image before sending it into the network.
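
One possible way to apply this normalisation is a small module that can sit in front of the network. This is a sketch under the assumption that the image tensor holds values in [0, 1] with shape ``(batch, 3, height, width)``; the class name ``Normalization`` is our own choice.

```python
import torch
import torch.nn as nn


class Normalization(nn.Module):
    """Subtract the per-channel mean and divide by the per-channel std."""

    def __init__(self, mean, std):
        super().__init__()
        # Shape (3, 1, 1) broadcasts against (batch, 3, height, width).
        # Buffers follow the module when it is moved with .to(device).
        self.register_buffer("mean", torch.tensor(mean).view(-1, 1, 1))
        self.register_buffer("std", torch.tensor(std).view(-1, 1, 1))

    def forward(self, image):
        return (image - self.mean) / self.std


normalize = Normalization(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

image = torch.full((1, 3, 4, 4), 0.5)  # a flat grey stand-in image
normalized = normalize(image)
print(normalized[0, :, 0, 0])
```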

# Gradient descent function

We want to train the input image in order to minimise the content and style losses. It's a good idea to use the existing L-BFGS algorithm in ``torch`` to run the gradient descent. Create an L-BFGS optimizer function that accepts the image as a tensor and returns an optimizer object.
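
L-BFGS in ``torch`` is used a little differently from most optimizers: ``step`` expects a closure that recomputes the loss, because the algorithm may evaluate it several times per step. The toy sketch below shows that pattern on a simple squared-distance loss that stands in for the content and style losses; the names ``demo_optimizer``, ``image`` and ``target`` are illustrative only.

```python
import torch

# Toy stand-in for the input image: we optimise the tensor's values directly.
torch.manual_seed(0)
target = torch.rand(1, 3, 4, 4)
image = torch.rand(1, 3, 4, 4, requires_grad=True)

# The optimizer is given the tensor(s) it is allowed to change.
demo_optimizer = torch.optim.LBFGS([image])


def closure():
    """Clear old gradients, recompute the loss, backpropagate and return the loss."""
    demo_optimizer.zero_grad()
    loss = (image - target).pow(2).sum()  # placeholder for the style/content losses
    loss.backward()
    return loss


initial_loss = closure().item()
for _ in range(5):
    demo_optimizer.step(closure)
final_loss = closure().item()
print(initial_loss, final_loss)
```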

In [0]:
# testing the optimizer object

assert isinstance(optimizer, torch.optim.LBFGS), "Look at the lbfgs docs in torch"

Once all the above functions exist and work together, you can chain them into a single function that runs the style transfer!

In [0]:
# Build your run_style_transfer() function here!