# Neural Style Transfer with Keras

---
## Introduction
In this post, we are going to use Keras to implement neural style transfer. In neural style transfer, we take a content image and a style image and generate a new version of the content image based on the textures in the style image. For example, take a look at the following photo. 

![Example of neural style transfer](https://sunshineatnoon.github.io/assets/posts/2017-05-19-a-brief-summary-on-neural-style-transfer/1.png)

Any pretrained convnet can be used to implement neural style transfer, but we will be basing our implementation on the [original paper](https://arxiv.org/pdf/1508.06576.pdf), so we will use the pretrained VGG19 network that is included with Keras. 

There are essentially three steps to the process
1. First, we have to configure our network to compute layer activations for the content image, the style image, and the generated image simultaneously. We can do this by treating the three images as a single mini-batch. 
2. Second, we have to use the outputs of the layers to define a meaningful loss function so that when we minimize it, it creates an image with the content of our content image and the style of our style image. 
3. Finally, we just run gradient-descent and minimize the loss function by continuously updating the pixel values of our generated image. 

---
## Preparing our Data

### Pratical Tip
So far, all the projects have worked fine on a CPU. Neural style transfer, however, runs really, _really_ slow without a GPU. Running this on my MacBook Pro took approximately 525 seconds per iteration, but it only took 22 seconds per iteration using the free GPUs available through [Google Colaboratory](https://colab.research.google.com/). If you're unsure how to set this up, I have a post on it in the Appendix for this project. Once you've read that, be sure to run the code below to access local files in your Drive. 

In [0]:
from google.colab import drive
drive.mount('/content/drive/')

### Loading the Data 
Next, provide the paths to your target image and style image. You can see examples of the images I used in the `inputs` and `style` directories for this notebook.

In [0]:
from keras.preprocessing.image import load_img, img_to_array

# CHANGE: Set this based on your images
content_image_path = '/content/drive/My Drive/Colab Notebooks/cnn06/inputs/IU_portrait.png'
style_image_path = '/content/drive/My Drive/Colab Notebooks/cnn06/style/the-starry-night.jpg'

# Dimensins for the generated picture
width, height = load_img(content_image_path).size
img_height = 400
img_width = int(width * img_height / height)

### Transforming the Data
As always, we have to do some preprocessing before we can feed our data into a neural network. In the case of VGG19, we have to use the `preprocess_input()` function, which you can se in the [Keras Applications Documentation](https://keras.io/applications/#usage-examples-for-image-classification-models). Furthermore, we have to undo some transformations that VGG19 performs, so we will write a `deprocess_image()` helper as well.

In [0]:
import numpy as np
from keras.applications import vgg19 

def preprocess_image(image_path):
    img = load_img(image_path, target_size=(img_height, img_width))
    img = img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = vgg19.preprocess_input(img)
    return img 

def deprocess_image(image):
    # Reverses a transformation in vgg19.preprocess_input()
    image[:, :, 0] += 103.939
    image[:, :, 1] += 116.779
    image[:, :, 2] += 123.68 
    # Convert from BGR to RGB (also vgg19.preprocess_input())
    image = image[:, :, ::-1]
    return np.clip(image, 0, 255).astype('uint8')

### Loading VGG19 and Applying to Three Images
Finally, we can load the VGG19 network just as we've done before. Previously, we've been configure our networks to work on random minibatchs of training data, but for neural style transfer we are always going to feed in the same three images. The target and style image are never going to change so we load them as constants. We use a placeholder to pass in the constantly changing generated image. In order to feed these all through the network at the same time, we concatenate them into a minibatch. 

As a final note, since our loss function is defined using the outputs of convolutional layers, we don't need to load the densely connected layers on top of the network. 

In [0]:
from keras import backend as K 

# Generated image is a placeholder others are constant 
content_image = K.constant(preprocess_image(content_image_path))
style_image = K.constant(preprocess_image(style_image_path))
generated_image = K.placeholder((1, img_height, img_width, 3))

# Combine the images into a "batch" 
input_tensor = K.concatenate([content_image, style_image, generated_image], axis=0)

# Load the model 
model = vgg19.VGG19(input_tensor=input_tensor, weights='imagenet', include_top=False)

---
## Defining the Loss Function 
The loss function for neural style transfer is a little weird, and it has three separate components: (1) the content loss, (2) the style loss, and (3) the variation loss. The content loss measures how well our image resembles the content image, the style loss measures how well our image resembles the style image, and the variation loss punishes our generated image for poor continuity – that is, it encourages smooth transitions in the pixels. We will assign weights to each of these loss functions then sum them together as our overall loss.

### Content Loss 
Content loss is probably the simplest to understand, we can simply take the distance between the content image and the generated image using the sum of squared differences. 

In [0]:
def content_loss(target, generated):
    return K.sum(K.square(generated - target))

### Style Loss
Style loss is a bit harder to understand. Here we have to take the sum of squared differences between the gram matrix of the style image and the gram matrix of the generated image. The gram matrix essentially measures the correlation between each pair of channels which is intended to tell us what textures should occur together. 

In [None]:
def gram_matrix(x):
    """Captures the 'style' of an image."""
    features = K.batch_flatten(K.permute_dimensions(x, (2, 0, 1)))
    gram = K.dot(features, K.transpose(features))
    return gram 

def style_loss(style, generated):
    S = gram_matrix(style)
    G = gram_matrix(generated)
    channels = 3 
    size = img_height * img_width
    return K.sum(K.square(S - G)) / (4. * (channels ** 2) * (size ** 2))

### Variational Loss
Finally, the variational loss looks at the continuity of pixel values in both the vertical and horizontal directions to ensure that the generated image is not becoming too pixelated.

In [None]:
def total_variation_loss(x):
    a = K.square(x[:, :img_height - 1, :img_width - 1, :] -
                 x[:, 1:, :img_width - 1, :])
    b = K.square(x[:, :img_height - 1, :img_width - 1, :] -
                 x[:, :img_height - 1, 1:, :])
    return K.sum(K.pow(a + b, 1.25))

### Choosing our Layers
We need to combine the content loss, style loss, and variational loss into a single function, but first we have to decide which layer outputs to use for content and style. 

For content, we want to use a layer that captures a lot of high level features in the image to preserve as much of the original content as possible. For this reason, we are going to use a deep layer for content loss. In this case, we choose `block5_conv2` as our content layer.

For style layers, we are going to combine the style layers from various depth to capture styles at different levels. For instance, capturing the edges, patterns, textures, and perhaps some higher level stylistic features as well. 

In [None]:
# Dictionary: layer name => layer output 
layers_dict = {layer.name: layer.output for layer in model.layers}

# Our layer selections
content_layer = 'block5_conv2'
style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1',
                'block4_conv1',
                'block5_conv1']

### The Loss Function
Finally, we combine all of the loss functions together. It is up to use to determine how we want to weight the style, content, and variation losses in our final computation. These weights are hyperparemeters, so feel free to change them. I got pretty good results with the ones below, but I also have fine results with drastically different parameters. Note that the content loss is only between the content image and generated image, the style loss is only between the style image and generated image, and the variational loss only involves the generated image.  

In [0]:
# Weights in the weighted average 
total_variation_weight = 1e-2 
style_weight = 0.5 
content_weight = 0.5

# Overall Loss 
loss = K.variable(0.)

# Loss += Content Loss 
layer_features = layers_dict[content_layer]
content_image_featues = layer_features[0, :, :, :]
generated_image_features = layer_features[2, :, :, :]
loss = loss + content_weight * content_loss(content_image_featues,
                                            generated_image_features)

# Loss += Style Loss 
for layer_name in style_layers: 
    layer_features = layers_dict[layer_name]
    style_image_features = layer_features[1, :, :, :]
    generated_image_features = layer_features[2, :, :, :]
    temp_style_loss = style_loss(style_image_features, 
                                 generated_image_features)
    loss = loss + (style_weight / len(style_layers)) * temp_style_loss
    
# Loss += Variation Loss 
loss = loss + total_variation_weight * total_variation_loss(generated_image)

---
## Gradient-Descent 
As we've done before, we use the Keras Backend to get the gradients of the loss function with respect to the generated image. Then, we define a function that can provide the current loss and gradients for the generated image. 

One tricky thing is the optimization... In the original paper, the L-BFGS optimization algorithm is used. We will use it too, and you can read about it on [Wikipedia](https://en.wikipedia.org/wiki/Limited-memory_BFGS), but fortunately we don't have to implement it ourselves! Unfortunately, it isn't provided in Keras, so we have to use Scipy. Furthermore, the Scipy requires two separate functions to retrieve the loss and gradients. We have a single function `fetch_loss_and_grads()` for this. To avoid duplicated computations, we can use this evaluator class to store the result and only make one function call. Eventually I may change this, but for now let's stick with the original.

In [0]:
grads = K.gradients(loss, generated_image)[0]
fetch_loss_and_grads = K.function([generated_image], [loss, grads])

class Evaluator(object):

    def __init__(self):
        self.loss_value = None
        self.grads_values = None

    def loss(self, x):
        assert self.loss_value is None
        x = x.reshape((1, img_height, img_width, 3))
        outs = fetch_loss_and_grads([x])
        loss_value = outs[0]
        grad_values = outs[1].flatten().astype('float64')
        self.loss_value = loss_value
        self.grad_values = grad_values
        return self.loss_value

    def grads(self, x):
        assert self.loss_value is not None
        grad_values = np.copy(self.grad_values)
        self.loss_value = None
        self.grad_values = None
        return grad_values

evaluator = Evaluator()

---
## Neural Style Transfer
Now, all we have to do is specify the number of desired iterations, provide our initial image, and continuosly perform gradient descent on our loss function. Each iteration actually performs 20 steps of gradient descent (`maxfun=20`). 

In [55]:
import os
from scipy.optimize import fmin_l_bfgs_b
import imageio

# Provide your output path and intended iterations 
base_path = '/content/drive/My Drive/Colab Notebooks/cnn06/outputs/'
iterations = 30

# Our generated image is initially the content image
x = preprocess_image(content_image_path)
x = x.flatten()

# Save the first image 
original_path = os.path.join(base_path, '0.png')
original_img = deprocess_image(x.reshape((img_height, img_width, 3)))
imageio.imwrite(original_path, original_img)

# Perform the intended number of iterations
for i in range(1, iterations + 1):
    print('Start of iteration', i)
    x, min_val, _ = fmin_l_bfgs_b(evaluator.loss, x, fprime=evaluator.grads, maxfun=20)
    print('Current loss value:', min_val)
    img = x.copy().reshape((img_height, img_width, 3))
    img = deprocess_image(img)
    fname = os.join.path(base_path, str(i) + '.png')
    imageio.imwrite(fname, img)

# Save the final output
final_path = os.join.path(base_path, 'final.png')
imageio.imwrite(final_path, img)

Start of iteration 1
Current loss value: 1192790300.0
Start of iteration 2
Current loss value: 576547700.0
Start of iteration 3
Current loss value: 413295300.0
Start of iteration 4
Current loss value: 337355140.0
Start of iteration 5
Current loss value: 294651900.0
Start of iteration 6
Current loss value: 264794980.0
Start of iteration 7
Current loss value: 238312660.0
Start of iteration 8
Current loss value: 222517490.0
Start of iteration 9
Current loss value: 208912460.0
Start of iteration 10
Current loss value: 193770380.0
Start of iteration 11
Current loss value: 184535520.0
Start of iteration 12
Current loss value: 178183970.0
Start of iteration 13
Current loss value: 171971380.0
Start of iteration 14
Current loss value: 166173570.0
Start of iteration 15
Current loss value: 162028960.0
Start of iteration 16
Current loss value: 158924140.0
Start of iteration 17
Current loss value: 155823730.0
Start of iteration 18
Current loss value: 153069360.0
Start of iteration 19
Current loss v

---
## Creating GIFs from Outputs 
Before showing you some of my results, here is a helper function you can use to transform a directory of numerically ordered photos (1.png, 2.png, 3.jpg, etc.) into a GIF. This allows us to see the photo transforming over time, which is pretty cool.

In [41]:
import os 
import re 
import imageio

# only keep files that followed the `number.png` convention
file_format = re.compile(r'(\d+).png')

# Sort the images numerically (e.g. 1.png, 2.png, ...)
def sort_key(filename):
    """Return the numeric portion of filename."""
    return int(filename.split('.')[0])

# Transform the files in the directory into a gif 
def images_to_gif(source_dir, name='final.gif'):
    """Turn a directory of numerically labeled photos into gifs."""
    files = sorted([fname for fname in os.listdir(source_dir) 
                    if file_format.match(fname)], key=sort_key)
    images = [imageio.imread(os.path.join(source_dir, fname)) 
              for fname in files]
    imageio.mimsave(os.path.join(source_dir, name), images, 
                    'GIF', duration=0.05)

I used this to generate an image for all of my photos.

In [46]:
source = '/content/drive/My Drive/Colab Notebooks/cnn06/outputs/img7/'
images_to_gif(source)

---
## Results

### Example 1
```python 
# Images
content_image = './inputs/germany.jpg'
style_image = './style/the_shipwreck_of_the_minotaur.jpg'

# Parameters
total_variation_weight = 1e-4
style_weight = 1.
content_weight = 0.025
```

![Neckarfront in Tübingeng, Germany](./outputs/img1/final.gif)

### Example 2
```python 
# Images
content_image = './inputs/IU_portrait.png'
style_image = './style/femme_nue_assise_pablo_picasso.jpg'

# Parameters
total_variation_weight = 1e-4
style_weight = 1.
content_weight = 0.025
```

![IU + Femme Nue Assise](./outputs/img2/final.gif)

### Example 3
```python 
# Images
content_image = './inputs/IU_portrait.png'
style_image = './style/composition_vii.jpg'

# Parameters
total_variation_weight = 1e-2
style_weight = 0.5
content_weight = 0.5
```

![IU + composition vii](./outputs/img3/final.gif)

### Example 4
```python 
# Images
content_image = './inputs/IU_portrait.png'
style_image = './style/the-starry-night.jpg'

# Parameters
total_variation_weight = 1e-2
style_weight = 0.5
content_weight = 0.5
```

![IU + Starry Night](./outputs/img4/final.gif)

---
## Summary 
In my opinion, this is a lot cooler than DeepDream because we have way more influence over what the final image look likes. I actually think it would be better to actual slow the learning down so that we can create GIFs with much smoother transitions. If you look a the outputs,by just the third iteration the generated image is usually a lot closer to the final output than the content image. 

### Original Neural Style Transfer Paper
You can find the original neural style transfer paper [here](https://arxiv.org/pdf/1508.06576.pdf)

### Content Images 
The first content image is of the Neckarfront in Tübingen, Germany. This is one of the images used in the original paper. The second content image is a portrait of IU – a Korean singer-songwriter and actress. If you're interested... [IU - 삐삐](https://youtu.be/nM0xDI5R50E).

### Style Images 
All of the style images were used in the original NST paper as well. 
+ _The Shipwreck of the Minotaur_ by JMW Turner (1805)
+ _Femme nue assise_ by Pablo Picasso (1910)
+ _Composition VII_ by Wassily Kandinsky (1913)
+ _The Starry Night_ by Vincent van Gogh (1889)