# Neural Style Transfer

Neural style transfer consists of applying the style of a reference image to a target image while conserving the content of the target image.

![capture](https://user-images.githubusercontent.com/13174586/51967686-fde6d000-2495-11e9-878c-f675bacb01c1.JPG)

In this context, style essentially means the textures, colors, and visual patterns in the image at various spatial scales, and content is the higher-level macrostructure of the image. For instance, the blue-and-yellow circular brushstrokes are the style of *Starry Night* by Vincent van Gogh, and the buildings in the *Tübingen* photograph are the content.

The idea of style transfer, which is tightly related to that of texture generation, has had a long history in the image processing community prior to the development of neural style transfer in 2015. But as it turns out, the deep-learning-based implementations of style transfer offer results unparalleled by what had been previously achieved with classical computer-vision techniques, and they triggered an amazing renaissance in creative applications of computer vision.

The key notion behind implementing style transfer is the same idea that’s central to all deep-learning algorithms: we define a loss function to specify what we want to achieve, and we minimize this loss. We know what we want to achieve: conserving the content of the original image while adopting the style of the reference image. If we were able to mathematically define content and style, then an appropriate loss function to minimize would be the following:

`loss = distance(style(reference_image) - style(generated_image)) +
distance(content(original_image) - content(generated_image))`

Here, `distance` is a norm function such as the L2 norm, `content` is a function that takes an image and computes a representation of its content, and `style` is a function that takes an image and computes a representation of its style. Minimizing this loss causes `style(generated_image)` to be close to `style(reference_image)`, and `content(generated_image)` to be close to `content(original_image)`, thus achieving style transfer as we defined it.

A fundamental observation made by Gatys et al. was that deep convolutional neural networks offer a way to mathematically define the `style` and `content` functions.

Let’s see how.

### The Content Loss

As we already know, activations from earlier layers in a network contain local information about the image, whereas activations from higher layers contain increasingly global, abstract information. Formulated in a different way, the activations of the different layers of a convnet provide a decomposition of the contents of an image over different spatial scales. Therefore, we’d expect the content of an image, which is more global and abstract, to be captured by the representations of the upper layers in a convnet.

A good candidate for content loss is thus the L2 norm between the activations of an upper layer in a pretrained convnet, computed over the target image, and the activations of the same layer computed over the generated image. This guarantees that, as seen from the upper layer, the generated image will look similar to the original target image. Assuming that what the upper layers of a convnet see is really the content of their input images, then this works as a way to preserve image content.
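As a minimal sketch of this idea, assume the upper-layer activations are already available as NumPy arrays. The arrays below are filled with made-up values rather than real convnet outputs, and `content_distance` is a hypothetical helper name. It computes the sum of squared differences, that is, the squared L2 norm of the difference:

```python
import numpy as np

def content_distance(target_features, generated_features):
    """Squared L2 distance between two activation arrays of the same shape."""
    return float(np.sum(np.square(generated_features - target_features)))

# Hypothetical activations of shape (height, width, channels).
rng = np.random.default_rng(0)
target_features = rng.normal(size=(4, 4, 8))
generated_features = target_features + 0.1  # every activation is off by 0.1

print(content_distance(target_features, target_features))     # 0.0 for identical activations
print(content_distance(target_features, generated_features))  # about 128 * 0.1**2 = 1.28
```

The distance is zero only when the two sets of activations match exactly, and it grows as the generated image drifts away from the target as seen by that layer.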

### The Style Loss

The content loss only uses a single upper layer, but the style loss as defined by Gatys et al. uses multiple layers of a convnet: you try to capture the appearance of the style-reference image at all spatial scales extracted by the convnet, not just a single scale. For the style loss, Gatys et al. use the Gram matrix of a layer’s activations: the inner product of the feature maps of a given layer. This inner product can be understood as representing a map of the correlations between the layer’s features. These feature correlations capture the statistics of the patterns of a particular spatial scale, which empirically correspond to the appearance of the textures found at this scale.

Hence, the style loss aims to preserve similar internal correlations within the activations of different layers, across the style-reference image and the generated image. In turn, this guarantees that the textures found at different spatial scales look similar across the style-reference image and the generated image.
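To make the Gram matrix concrete, here is a small NumPy sketch. It assumes a single layer's activations are given as a `(height, width, channels)` array. The values are random placeholders, not real VGG19 features, and this `gram_matrix` is an illustrative stand-in for a backend implementation:

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (height, width, channels) activation array."""
    channels = features.shape[-1]
    # One row per channel, one column per spatial position.
    flat = features.reshape(-1, channels).T
    return flat @ flat.T  # shape: (channels, channels)

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 5, 3))  # hypothetical activations
gram = gram_matrix(features)

print(gram.shape)                 # (3, 3)
print(np.allclose(gram, gram.T))  # True: the Gram matrix is symmetric

# Shuffling spatial positions changes where patterns occur but not the
# channel co-activation statistics, so the Gram matrix is unchanged.
shuffled = rng.permutation(features.reshape(-1, 3)).reshape(5, 5, 3)
print(np.allclose(gram_matrix(shuffled), gram))  # True
```

The last check hints at why the Gram matrix suits style: it records which features fire together, not where they fire, so it describes texture rather than layout.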

In short, we can use a pretrained convnet to define a loss that will do the following:
 - Preserve content by maintaining similar high-level layer activations between the target content image and the generated image. The convnet should “see” both the target image and the generated image as containing the same things.
 - Preserve style by maintaining similar correlations within activations for both low-level layers and high-level layers. Feature correlations capture textures: the generated image and the style-reference image should share the same textures at different spatial scales.

Now, let’s look at a Keras implementation of the original 2015 neural style transfer algorithm.

### Neural Style Transfer in Keras

Neural style transfer can be implemented using any pretrained convnet. Here, we’ll use the VGG19 network used by Gatys et al. VGG19 is a simple variant of the VGG16 network introduced in chapter 5, with three more convolutional layers.

This is the general process:
 - Set up a network that computes VGG19 layer activations for the style-reference image, the target image, and the generated image at the same time.
 - Use the layer activations computed over these three images to define the loss function described earlier, which we’ll minimize in order to achieve style transfer.
 - Set up a gradient-descent process to minimize this loss function.

Let’s start by defining the paths to the style-reference image and the target image. To make sure that the processed images are a similar size (widely different sizes make style transfer more difficult), we’ll later resize them all to a shared height of 400 px.

### Define Initial Variables

In [1]:
from keras.preprocessing.image import load_img, img_to_array

target_image_path= 'Style_Transfer\\content.jpg' #Path to the image we want to transform
style_reference_image_path= 'Style_Transfer\\Van_Gogh_Starry_Night_Google_Art_Project.jpg' #Path to the style image

width, height= load_img(target_image_path).size     #Original size of the target image
img_height= 400                                     #Shared height of the processed images
img_width= int(width*img_height/height)             #Width that preserves the aspect ratio

Using TensorFlow backend.


We need some auxiliary functions for loading, preprocessing, and postprocessing the images that go in and out of the VGG19 convnet.

### Auxiliary Functions

In [2]:
import numpy as np
from keras.applications import vgg19

def preprocess_image(image_path):
    img= load_img(image_path, target_size=(img_height, img_width))
    img= img_to_array(img)
    img= np.expand_dims(img, axis=0)
    img= vgg19.preprocess_input(img)
    return img

def deprocess_image(x):
    x[:,:,0] += 103.939                 #Adds back the mean ImageNet pixel value of each
    x[:,:,1] += 116.779                 #channel. This reverses the zero-centering
    x[:,:,2] += 123.68                  #done by vgg19.preprocess_input
    x= x[:,:,::-1]                      #Converts images from 'BGR' to 'RGB'. This is also part of the reversal of vgg19.preprocess_input.
    x= np.clip(x, 0, 255).astype('uint8')
    return x
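The following round-trip check is a sketch that runs without Keras. `caffe_style_preprocess` is a hypothetical stand-in that mimics what `vgg19.preprocess_input` does to a single image (RGB to BGR, then subtracting the ImageNet channel means). The `deprocess_image` below is a non-mutating variant of the function above, and its `np.rint` call is an addition that guards against float round-off before the integer cast:

```python
import numpy as np

IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68])

def caffe_style_preprocess(rgb):
    """Hypothetical stand-in for vgg19.preprocess_input on one image."""
    bgr = rgb[:, :, ::-1].astype("float64")  # RGB -> BGR
    return bgr - IMAGENET_BGR_MEANS          # zero-center each channel

def deprocess_image(x):
    x = x + IMAGENET_BGR_MEANS               # add the channel means back (new array)
    x = x[:, :, ::-1]                        # BGR -> RGB
    x = np.rint(x)                           # assumption: round before casting
    return np.clip(x, 0, 255).astype("uint8")

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)
restored = deprocess_image(caffe_style_preprocess(image))
print(np.array_equal(restored, image))  # True
```

Because this variant builds a new array instead of using `+=`, the caller's data is left unmodified, which can be convenient when the same array is reused across iterations.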

Let’s set up the VGG19 network. It takes as input a batch of three images: the style-reference image, the target image, and a placeholder that will contain the generated image. A placeholder is a symbolic tensor, the values of which are provided externally via Numpy arrays. The style-reference and target images are static and thus defined using `K.constant`, whereas the values contained in the placeholder of the generated image will change over time.

### Load The Pretrained VGG19 Network and Apply it to The Three Images

In [3]:
from keras import backend as K

target_image= K.constant(preprocess_image(target_image_path))
style_reference_image= K.constant(preprocess_image(style_reference_image_path))
combination_image= K.placeholder((1,img_height, img_width,3))   #Placeholder that will contain the generated image

In [4]:
input_tensor= K.concatenate([target_image, style_reference_image, combination_image], axis=0) #Combines the three
                                                                                               #images in a single batch
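The order of the images in the batch matters, because each image (and later its activations) is recovered by its index along the first axis. The NumPy sketch below illustrates this with tiny constant arrays standing in for the three images. The sizes and values are made up for illustration:

```python
import numpy as np

height, width = 4, 6  # tiny hypothetical image size
target = np.zeros((1, height, width, 3))
style = np.ones((1, height, width, 3))
generated = np.full((1, height, width, 3), 2.0)

# Same ordering as above: target, style reference, generated image.
batch = np.concatenate([target, style, generated], axis=0)
print(batch.shape)  # (3, 4, 6, 3)

# Index along the batch axis to recover each image.
print(batch[0].mean(), batch[1].mean(), batch[2].mean())  # 0.0 1.0 2.0
```

With this ordering, index 0 is the target image, index 1 the style-reference image, and index 2 the generated image.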

In [5]:
print('Loading Model...')
model= vgg19.VGG19(input_tensor= input_tensor,                      #Builds the VGG19 network with
                   weights='imagenet',                              #the batch of three images as input. The model will be
                   include_top= False)                              #loaded with pretrained ImageNet weights.

model.summary()
print('...Model Loaded')

Loading Model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, None, None, 3)     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, None, None, 128)   147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, None, None, 128)   0   

Let’s define the content loss, which will make sure the top layer of the VGG19 convnet has a similar view of the target image and the generated image.

### Content Loss

In [8]:
def content_loss(base, combination):
    return K.sum(K.square(combination-base))

Next is the style loss. It uses an auxiliary function to compute the Gram matrix of an input matrix: a map of the correlations found in the original feature matrix.

### Style Loss

In [9]:
def gram_matrix(x):
    features= K.batch_flatten(K.permute_dimensions(x,(2,0,1)))   #Shape: (channels, height*width)
    gram= K.dot(features, K.transpose(features))                 #Shape: (channels, channels)
    return gram

In [11]:
def style_loss(style, combination):
    S= gram_matrix(style)                #Gram matrix of the style-reference features
    C= gram_matrix(combination)          #Gram matrix of the generated-image features
    channels= 3
    size=img_height*img_width
    return K.sum(K.square(S-C))/(4.0*(channels**2)*(size**2))

To these two loss components, we add a third: the total variation loss, which operates on the pixels of the generated combination image. It encourages spatial continuity in the generated image, thus avoiding overly pixelated results. You can interpret it as a regularization loss.
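As a rough illustration of what such a regularizer measures, the NumPy sketch below uses one simple form of total variation: the sum of squared differences between neighboring pixels. This is an assumption made for illustration and not necessarily the exact formula used in the implementation. `total_variation` is a hypothetical helper name:

```python
import numpy as np

def total_variation(image):
    """Sum of squared differences between neighboring pixels.

    `image` has shape (height, width, channels).
    """
    vertical = np.square(image[1:, :, :] - image[:-1, :, :])
    horizontal = np.square(image[:, 1:, :] - image[:, :-1, :])
    return float(vertical.sum() + horizontal.sum())

flat = np.full((8, 8, 3), 0.5)                      # constant image
noisy = np.random.default_rng(0).random((8, 8, 3))  # independent random pixels

print(total_variation(flat))                           # 0.0
print(total_variation(noisy) > total_variation(flat))  # True
```

A perfectly smooth image scores zero, while pixel-level noise scores high, so adding this term to the loss pushes the optimizer toward spatially continuous results.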

### Total Variation Loss