# 8.3 Neural style transfer
In addition to DeepDream, another major development in deep-learning-driven image modification is neural style transfer. The neural style transfer algorithm has undergone many refinements and spawned many variations since its original introduction in the summer of 2015, and it has made its way into many smartphone photo apps.

Neural style transfer consists of applying the style of a reference image to a target image while conserving the content of the target image. Here is an example:

![style_transfer](images/8_3_0_styletransfer.png)

*Style* essentially means the textures, colors, and visual patterns in the image, at various spatial scales; *content* is the higher-level macrostructure of the image. For example, the blue-and-yellow circular brushstrokes are considered to be the style of the Vincent van Gogh painting above, and the buildings in the Tübingen photograph are considered to be the content.

As it turns out, the deep-learning-based implementations of style transfer offer results unparalleled by what had been previously achieved with classical computer-vision techniques, and they triggered an amazing renaissance in creative applications of computer vision.

The key notion behind implementing style transfer is the same idea that’s central to all deep-learning algorithms: define a loss function that specifies what we want to achieve, and then minimize it. We know what we want to achieve: conserving the content of the original image while adopting the style of the reference image. If we were able to mathematically define content and style, then an appropriate loss function to minimize would be the following:

`loss = distance(style(reference_image) - style(generated_image)) +
       distance(content(original_image) - content(generated_image))`
       
Here, `distance` is a norm function such as the L2 norm, `content` is a function that takes an image and computes a representation of its content, and `style` is a function that takes an image and computes a representation of its style. Minimizing this loss causes `style(generated_image)` to be close to `style(reference_image)`, and `content(generated_image)` to be close to `content(original_image)`, thus achieving style transfer as we defined it.
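
As a rough illustration of the shape of this objective, the sketch below wires the two terms together in NumPy. The `content` and `style` callables used here are toy stand-ins invented for the example (raw pixels and per-channel mean color); they are not the convnet-based definitions introduced in the following sections, and `squared_l2` is just one possible choice of `distance`.

```python
import numpy as np

def squared_l2(a, b):
    """One possible `distance`: the squared L2 norm of the difference."""
    return float(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def transfer_loss(generated, original, reference, content, style):
    """Style term plus content term, mirroring the formula above."""
    style_term = squared_l2(style(reference), style(generated))
    content_term = squared_l2(content(original), content(generated))
    return style_term + content_term

# Toy stand-ins (assumptions for illustration only).
def toy_content(img):
    return img                      # raw pixels

def toy_style(img):
    return img.mean(axis=(0, 1))    # mean color per channel

rng = np.random.RandomState(0)
original = rng.rand(4, 4, 3)
reference = rng.rand(4, 4, 3)

# Starting from the original image: content term is zero, style term is not.
loss_at_original = transfer_loss(original, original, reference,
                                 toy_content, toy_style)
# Starting from the reference image: style term is zero, content term is not.
loss_at_reference = transfer_loss(reference, original, reference,
                                  toy_content, toy_style)
```

Neither starting point has zero loss, which is the point of the formulation: the minimizer has to trade the two terms off against each other.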

The fundamental observation made by Gatys et al. was that deep convolutional neural networks offer a way to mathematically define the `style` and `content` functions. Let's see how.

## 8.3.1 The content loss
Activations from earlier layers in a network contain local information about the image, whereas activations from higher layers contain increasingly global, abstract information. We would expect the content of an image, which is more global and abstract, to be captured by the representations of the upper layers in a convnet.

A good candidate for content loss is the L2 norm between the activations of an upper layer in a pretrained convnet, computed over the target image, and the activations of the same layer computed over the generated image. This ensures that, as seen from the upper layer, the generated image will look similar to the original target image.

## 8.3.2 The style loss
Whereas the content loss uses only a single upper layer, the style loss uses multiple layers of a convnet. It aims to preserve similar internal correlations within the activations of different layers, across the style-reference image and the generated image. This encourages the textures found at different spatial scales to look similar across the style-reference image and the generated image.

In short, we can use a pretrained convnet to define a loss that will do the following:

 - Preserve content by maintaining similar high-level layer activations between the target content image and the generated image. The convnet should “see” both the target image and the generated image as containing the same things.
 - Preserve style by maintaining similar correlations within activations for both low-level layers and high-level layers.

Now, let’s look at a Keras implementation of the original 2015 neural style transfer algorithm. It shares many similarities with the DeepDream implementation.

## 8.3.3 Neural style transfer in Keras
Neural style transfer can be implemented using any pretrained convnet. We’ll use the VGG19 network, which is a simple variant of the VGG16 network with three more convolutional layers. This is the general process:

 1. Set up a network that computes VGG19 layer activations for the style-reference image, the target image, and the generated image at the same time.
 2. Use the layer activations computed over these three images to define the loss function described earlier, which we’ll minimize in order to achieve style transfer.
 3. Set up a gradient-descent process to minimize this loss function.
 
We will start by defining the paths to the style-reference image and the target image. To make sure that the processed images are a similar size (different sizes make style transfer more difficult), we will resize them all to a shared height of 400 px.

For the target image, I used a picture of Abraham Lincoln (below).

![abe](images/abe.jpg)

And the style-reference image is of characters from the video game Fortnite (below). Our goal is to generate a new image that looks like Abe Lincoln, but in the style of Fortnite. Let's get to coding and see what we are able to create!

![fortnite](images/fortnite.jpg)

**DEFINING INITIAL VARIABLES**

In [1]:
from keras.preprocessing.image import load_img, img_to_array

target_image_path = 'images/abe.jpg' # path to image we are going to transform
style_reference_image_path = 'images/fortnite.jpg' # path to image whose style we want to mimic

# Set dimensions of the generated picture
width, height = load_img(target_image_path).size
img_height = 400
img_width = int(width * img_height / height)

Using TensorFlow backend.


We will need some helper functions for loading, preprocessing, and postprocessing the images that go in and out of the VGG19 convnet.

**HELPER FUNCTIONS**

In [2]:
import numpy as np
from keras.applications import vgg19

def preprocess_image(image_path):
    img = load_img(image_path, target_size=(img_height, img_width))
    img = img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = vgg19.preprocess_input(img)
    return img

def postprocess_image(x):
    # Undo the zero-centering applied by vgg19.preprocess_input by adding back
    # the mean ImageNet pixel value of each channel (channels are in BGR order).
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68
    
    # Convert image from BGR to RGB
    x = x[:, :, ::-1]
    x = np.clip(x, 0, 255).astype('uint8')
    return x
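
To make the round trip concrete, here is a NumPy-only sketch of what the two helpers do to pixel values. It assumes that `vgg19.preprocess_input` follows the Caffe-style convention (RGB to BGR, then subtraction of the ImageNet channel means, with no rescaling). The names `to_vgg_space` and `from_vgg_space` are hypothetical, and the rounding step before the `uint8` cast is an addition that `postprocess_image` does not have.

```python
import numpy as np

# Mean ImageNet pixel values in BGR order (same constants as above).
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68])

def to_vgg_space(rgb):
    """RGB uint8 image (H, W, 3) -> zero-centered BGR float array."""
    bgr = rgb[..., ::-1].astype('float64')
    return bgr - IMAGENET_BGR_MEAN

def from_vgg_space(x):
    """Inverse: add the means back, flip BGR -> RGB, clip to [0, 255]."""
    bgr = x + IMAGENET_BGR_MEAN
    rgb = bgr[..., ::-1]
    # Rounding avoids off-by-one errors from float truncation (assumption:
    # not part of the original helper).
    return np.clip(np.rint(rgb), 0, 255).astype('uint8')

rgb = np.random.RandomState(0).randint(0, 256, size=(2, 3, 3)).astype('uint8')
restored = from_vgg_space(to_vgg_space(rgb))
```

Under these assumptions, `restored` matches `rgb` pixel for pixel, which is the property we rely on when saving the generated image.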

Now we will set up the VGG19 network. It takes a batch of three images as input: the target image, the style-reference image, and a placeholder that will contain the generated image. The placeholder is a symbolic tensor whose values are provided externally via NumPy arrays. The target and style-reference images are static, and are therefore defined with `K.constant`, whereas the values contained in the placeholder of the generated image will change over time.

**LOADING THE PRETRAINED VGG19 NETWORK & APPLYING IT TO 3 IMAGES**

In [3]:
from keras import backend as K

target_image = K.constant(preprocess_image(target_image_path))
style_reference_image = K.constant(preprocess_image(style_reference_image_path))
combination_image = K.placeholder((1, img_height, img_width, 3)) # Placeholder that will contain generated image

print('Combining input images into single batch.')
# Combine the 3 images into a single batch
input_tensor = K.concatenate([target_image, style_reference_image, combination_image], axis=0)

print('Loading model.')
# Build the VGG19 network with batch of 3 images. Model is loaded with pretrained ImageNet weights
model = vgg19.VGG19(input_tensor=input_tensor, weights='imagenet', include_top=False)

print('Model loaded.')

Combining input images into single batch.
Loading model.
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5
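
The batch layout matters later, because the loss code picks images out of the batch by index. The NumPy sketch below mirrors the concatenation with small dummy arrays (the shapes and fill values are made up for illustration) to show which index corresponds to which image.

```python
import numpy as np

h, w = 4, 6  # tiny stand-in for (img_height, img_width)
target = np.full((1, h, w, 3), 0.0)
style_reference = np.full((1, h, w, 3), 1.0)
combination = np.full((1, h, w, 3), 2.0)

# Same order as the K.concatenate call above.
batch = np.concatenate([target, style_reference, combination], axis=0)

print(batch.shape)        # (3, 4, 6, 3)
print(batch[:, 0, 0, 0])  # [0. 1. 2.]
```

Index 0 is therefore the target image, index 1 the style reference, and index 2 the generated image. The same indexing applies to every layer's activations, since each layer processes the whole batch.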


Now we will define the content loss, which will ensure the top layer of the VGG19 convnet has a similar view of the target and generated images.

**CONTENT LOSS**

In [4]:
def content_loss(base, combination):
    return K.sum(K.square(combination - base))

Next, we will define the style loss. It uses an auxiliary function to compute the Gram matrix of an input matrix. In our case, the input is a layer's feature matrix, and its Gram matrix is a map of the correlations found between the channels of that feature matrix.

**STYLE LOSS**

In [5]:
def gram_matrix(x):
    features = K.batch_flatten(K.permute_dimensions(x, (2, 0, 1)))
    gram = K.dot(features, K.transpose(features))
    return gram

def style_loss(style, combination):
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3
    size = img_height * img_width
    return K.sum(K.square(S - C)) / (4. * (channels ** 2) * (size ** 2))
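
To see what the Gram matrix captures, here is a NumPy sketch that applies the same computation as `gram_matrix` to a tiny hand-made feature map. The function name `gram_matrix_np` and the example values are assumptions for illustration.

```python
import numpy as np

def gram_matrix_np(x):
    """x: feature map of shape (height, width, channels)."""
    # Move channels first, then flatten each channel into a row vector.
    features = np.transpose(x, (2, 0, 1)).reshape(x.shape[2], -1)
    # Entry (i, j) is the inner product of channel i with channel j.
    return np.dot(features, features.T)

# 2x2 feature map with 2 channels: channel 0 is all ones,
# channel 1 is nonzero only in the top-left position.
x = np.zeros((2, 2, 2))
x[:, :, 0] = 1.0
x[0, 0, 1] = 3.0

gram = gram_matrix_np(x)
print(gram)
# [[4. 3.]
#  [3. 9.]]

# Rearranging the spatial positions leaves the Gram matrix unchanged.
flipped = x[::-1, ::-1, :]
print(np.allclose(gram_matrix_np(flipped), gram))  # True
```

Because the inner products sum over all spatial positions, the Gram matrix records which channels fire together but not where. That is why it works as a description of texture rather than layout.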

In addition to content loss and style loss, we will add the total variation loss, which operates on the pixels of the generated combination image. This helps prevent the output image from being overly pixelated. We can interpret it as a regularization loss.

**TOTAL VARIATION LOSS**

In [6]:
def total_variation_loss(x):
    a = K.square(
        x[:, :img_height - 1, :img_width - 1, :] -
        x[:, 1:, :img_width - 1, :])
    b = K.square(
        x[:, :img_height - 1, :img_width - 1, :] -
        x[:, :img_height - 1, 1:, :])
    return K.sum(K.pow(a + b, 1.25))
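
As a quick sanity check on what this loss measures, the following NumPy sketch reimplements the same formula and compares a perfectly flat image with a noisy one. The name `total_variation_np` and the test images are assumptions made for this example.

```python
import numpy as np

def total_variation_np(x):
    """x: image batch of shape (1, height, width, channels)."""
    h, w = x.shape[1], x.shape[2]
    # Squared differences between vertically adjacent pixels.
    a = np.square(x[:, :h - 1, :w - 1, :] - x[:, 1:, :w - 1, :])
    # Squared differences between horizontally adjacent pixels.
    b = np.square(x[:, :h - 1, :w - 1, :] - x[:, :h - 1, 1:, :])
    return float(np.sum(np.power(a + b, 1.25)))

flat = np.ones((1, 8, 8, 3))
noisy = np.random.RandomState(0).rand(1, 8, 8, 3)

print(total_variation_np(flat))                               # 0.0
print(total_variation_np(noisy) > total_variation_np(flat))   # True
```

A constant image has no differences between neighboring pixels and so incurs no penalty, while pixel-level noise is penalized. This is the sense in which the term acts as a smoothness regularizer.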
    

The loss that we minimize is a weighted sum of the content, style, and total variation losses. To compute the content loss, we use only one upper layer, the `block5_conv2` layer. For the style loss, we use a list of layers that spans both low-level and high-level layers. We then add the total variation loss at the end.

Depending on the style-reference image and content image we use, we’ll probably want to adjust the `content_weight` coefficient. A higher `content_weight` means the target content will be more recognizable in the generated image.

**DEFINING FINAL LOSS THAT WE WILL MINIMIZE**

In [7]:
# Dictionary that maps layer names to activation tensors
outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])

# Content loss layer
content_layer = 'block5_conv2'

# Style loss layers
style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1',
                'block4_conv1',
                'block5_conv1']

# Weights in the weighted sum of the loss components
total_variation_weight = 1e-4
style_weight = 1.
content_weight = 0.025 # A higher content_weight means the generated image will look more like our target img (Abe)

# Add the content loss
loss = K.variable(0.) # Define loss by adding all components to this scalar variable
layer_features = outputs_dict[content_layer]
target_image_features = layer_features[0, :, :, :]
combination_features = layer_features[2, :, :, :]
loss += content_weight * content_loss(target_image_features, combination_features)

# Add a style loss component for each target layer
for layer_name in style_layers:
    layer_features = outputs_dict[layer_name]
    style_reference_features = layer_features[1, :, :, :]
    combination_features = layer_features[2, :, :, :]
    sl = style_loss(style_reference_features, combination_features)
    loss += (style_weight / len(style_layers)) * sl
    
# Add the total variation loss
loss += total_variation_weight * total_variation_loss(combination_image)
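
The arithmetic of the final objective can be sketched with plain numbers. The helper `combine_losses` and the component values passed to it are hypothetical; only the default weights are taken from the cell above.

```python
def combine_losses(content, style_per_layer, total_variation,
                   content_weight=0.025, style_weight=1.0,
                   total_variation_weight=1e-4):
    """Weighted sum of the three loss components."""
    loss = content_weight * content
    # Each style layer contributes an equal share of style_weight.
    for sl in style_per_layer:
        loss += (style_weight / len(style_per_layer)) * sl
    loss += total_variation_weight * total_variation
    return loss

# Made-up component values, just to show the effect of content_weight.
components = dict(content=1000.0,
                  style_per_layer=[10.0] * 5,
                  total_variation=500.0)
low = combine_losses(content_weight=0.025, **components)
high = combine_losses(content_weight=0.25, **components)
```

With these made-up values, the content term contributes 25 to the total at the default weight and 250 at the larger one, while the style and total variation terms stay the same. Raising `content_weight` therefore gives the optimizer more incentive to keep the generated image close to the target.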

Finally, we’ll set up the gradient-descent process. In the original Gatys paper that introduced neural style transfer, optimization is performed using the L-BFGS algorithm, so that’s what we’ll use here. This is a key difference from the DeepDream example. The L-BFGS algorithm comes packaged with SciPy, but there are two slight limitations with the SciPy implementation:

 - It requires that you pass the value of the loss function and the value of the gradients as two separate functions.
 - It can only be applied to flat vectors, whereas we have a 3D image array.

It would be inefficient to compute the value of the loss function and the value of the gradients independently, because doing so would lead to a lot of redundant computation between the two; the process would be almost twice as slow as computing them at the same time. To bypass this, we’ll set up a Python class called `Evaluator` that computes both the loss value and the gradient values at once. It returns the loss value when `loss` is called and caches the gradients, which are then returned by the following call to `grads`.

**SETTING UP GRADIENT-DESCENT PROCESS**

In [8]:
# Get the gradients of the generated image with regard to the loss
grads = K.gradients(loss, combination_image)[0]

# Function to fetch values of current loss & current gradients
fetch_loss_and_grads = K.function([combination_image], [loss, grads])

# Evaluator class wraps `fetch_loss_and_grads` & lets us retrieve losses & gradients w/ 2 separate method calls...
# which is required by SciPy optimizer we will use
class Evaluator(object):

    def __init__(self):
        self.loss_value = None
        self.grad_values = None

    def loss(self, x):
        assert self.loss_value is None
        x = x.reshape((1, img_height, img_width, 3))
        outs = fetch_loss_and_grads([x])
        loss_value = outs[0]
        grad_values = outs[1].flatten().astype('float64')
        self.loss_value = loss_value
        self.grad_values = grad_values
        return self.loss_value

    def grads(self, x):
        assert self.loss_value is not None
        grad_values = np.copy(self.grad_values)
        self.loss_value = None
        self.grad_values = None
        return grad_values

evaluator = Evaluator()
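
The caching pattern can be checked in isolation, without Keras. The sketch below uses a toy quadratic in place of `fetch_loss_and_grads` and counts how often it is evaluated; `toy_loss_and_grads`, `CachingEvaluator`, and the call counter are all hypothetical names introduced for this illustration.

```python
import numpy as np

calls = {'count': 0}

def toy_loss_and_grads(x):
    """Stand-in for fetch_loss_and_grads: f(x) = sum(x^2), grad = 2x."""
    calls['count'] += 1
    return float(np.sum(x ** 2)), 2.0 * x

class CachingEvaluator(object):
    def __init__(self, fn):
        self.fn = fn
        self.loss_value = None
        self.grad_values = None

    def loss(self, x):
        # Compute both quantities once and keep the gradients for later.
        self.loss_value, self.grad_values = self.fn(x)
        return self.loss_value

    def grads(self, x):
        # Hand back the cached gradients, then clear the cache.
        grad_values = np.copy(self.grad_values)
        self.loss_value = None
        self.grad_values = None
        return grad_values

ev = CachingEvaluator(toy_loss_and_grads)
x = np.array([1.0, -2.0, 3.0])
loss_value = ev.loss(x)    # triggers the single underlying evaluation
grad_values = ev.grads(x)  # served from the cache
```

After one `loss`/`grads` pair, the underlying function has been evaluated only once. This is the saving the `Evaluator` class is designed to provide. Note that the pattern relies on the optimizer calling `loss` before `grads` for the same `x`.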

Now we can run the gradient-descent process using SciPy’s L-BFGS algorithm. We will save the current generated image at each iteration of the loop (here, a single iteration lets L-BFGS run for up to 20 evaluations of the loss and gradients, as set by `maxfun=20`).

**STYLE-TRANSFER LOOP**

In [9]:
from scipy.optimize import fmin_l_bfgs_b
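# Note: scipy.misc.imsave is deprecated in SciPy 1.0.0 and removed in 1.2.0;
# on newer SciPy versions, use imageio.imwrite instead.
from scipy.misc import imsave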
import time

In [10]:
result_prefix = 'abe_fortnite'
iterations = 20

# Initial state: the target image
x = preprocess_image(target_image_path)

# Flatten image so it can be processed by SciPy
x = x.flatten()

# Run L-BFGS over pixels of generated img to minimize neural style loss
# Note: have to pass function that computes loss & function that computes gradients separately
for i in range(iterations):
    print('Start of iteration', i)
    start_time = time.time()
    x, min_val, info = fmin_l_bfgs_b(evaluator.loss, x, fprime=evaluator.grads, maxfun=20)
    print('Current loss value:', min_val)
    
    # Save the current generated image
    img = x.copy().reshape((img_height, img_width, 3))
    img = postprocess_image(img)
    fname = result_prefix + '_at_iteration_%d.png' % i
    imsave(fname, img)
    print('Image saved as', fname)
    end_time = time.time()
    print('Iteration %d completed in %ds' % (i, end_time - start_time))

('Start of iteration', 0)
('Current loss value:', 4539376600.0)
('Image saved as', 'abe_fortnite_at_iteration_0.png')
Iteration 0 completed in 639s
('Start of iteration', 1)


`imsave` is deprecated in SciPy 1.0.0, and will be removed in 1.2.0.
Use ``imageio.imwrite`` instead.


('Current loss value:', 1744123100.0)
('Image saved as', 'abe_fortnite_at_iteration_1.png')
Iteration 1 completed in 587s
('Start of iteration', 2)
('Current loss value:', 1086659500.0)
('Image saved as', 'abe_fortnite_at_iteration_2.png')
Iteration 2 completed in 604s
('Start of iteration', 3)
('Current loss value:', 813086600.0)
('Image saved as', 'abe_fortnite_at_iteration_3.png')
Iteration 3 completed in 599s
('Start of iteration', 4)
('Current loss value:', 670284540.0)
('Image saved as', 'abe_fortnite_at_iteration_4.png')
Iteration 4 completed in 618s
('Start of iteration', 5)
('Current loss value:', 577944450.0)
('Image saved as', 'abe_fortnite_at_iteration_5.png')
Iteration 5 completed in 768s
('Start of iteration', 6)
('Current loss value:', 510998180.0)
('Image saved as', 'abe_fortnite_at_iteration_6.png')
Iteration 6 completed in 844s
('Start of iteration', 7)
('Current loss value:', 459679650.0)
('Image saved as', 'abe_fortnite_at_iteration_7.png')
Iteration 7 completed in 

And now let's see some of the results! After 20 iterations, here is our Fortnite Abe creation!

![fortniteabe](abe_fortnite_at_iteration_19.png)

Some additional examples are shown below. It's important to keep in mind that this technique merely achieves a form of image retexturing, or texture transfer. It works best with style-reference images that are strongly textured and highly self-similar, and with content targets that don’t require high levels of detail in order to be recognizable. It typically can’t achieve fairly abstract feats such as transferring the style of one portrait to another. The algorithm is closer to classical signal processing than to AI, so don’t expect it to work like magic!

![neural_transfer](images/8_3_3_neural_transfer.png)

Additionally, note that running this style-transfer algorithm is slow. But the transformation operated by the setup is simple enough that it can be learned by a small, fast feedforward convnet. Fast style transfer can be achieved by first spending a lot of compute cycles to generate input-output training examples for a fixed style-reference image, using the method outlined here, and then training a simple convnet to learn this style-specific transformation. Once that’s done, stylizing a given image is instantaneous: it’s just a forward pass of this small convnet.

## 8.3.4 Wrapping up
 - Style transfer consists of creating a new image that preserves the contents of a target image while also capturing the style of a reference image.
 - Content can be captured by the high-level activations of a convnet.
 - Style can be captured by the internal correlations of the activations of different layers of a convnet.
 - Hence, deep learning allows style transfer to be formulated as an optimization process using a loss defined with a pretrained convnet.
 - Starting from this basic idea, many variants and refinements are possible.