------

### Neural Style Transfer 

-----

- Neural style transfer applies the style of a reference image to a target image while preserving the content of the target image.
- Style means the textures, colors, and visual patterns in an image at various spatial scales; content is the higher-level macro-structure of the image.
- For example, in Van Gogh's The Starry Night, the style refers to the blue and yellow circular brushstrokes.
- The core of the algorithm is the same as in all deep learning algorithms: define a meaningful loss function and then minimize it.
- If we were able to define "content" and "style" mathematically, the loss function would look like the following:

> loss = distance(style(reference_image) - style(generated_image)) + distance(content(original_image) - content(generated_image))

So, how do we define the content and style mathematically?


***The content loss***

- Activations from earlier layers in the network contain local information about the image, whereas activations from higher layers contain increasingly global, abstract information.
- In other words, the activations of different layers of a convnet provide a decomposition of the contents of an image over different spatial scales.
- **A good candidate for the content loss is therefore the L2 norm between the activations of an upper layer in a pretrained convnet computed over the target image and the activations of the same layer computed over the generated image.**
- The content loss uses only a single upper layer.
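
As a rough illustration of the idea above, here is a minimal NumPy sketch of a content distance between two activation tensors. It uses the sum of squared differences (the squared L2 distance), which is what the Keras `content_loss` later in this notebook computes. The name `content_loss_np`, the random "activations", and their shape are assumptions made for illustration; they are not real VGG19 outputs.

```python
import numpy as np

def content_loss_np(base_features, generated_features):
    # Sum of squared differences between two activation tensors.
    return float(np.sum(np.square(generated_features - base_features)))

# Hypothetical stand-in for the activations of one upper layer (height, width, channels).
rng = np.random.default_rng(0)
base = rng.normal(size=(25, 25, 512))

same = content_loss_np(base, base)           # identical activations
shifted = content_loss_np(base, base + 0.1)  # every activation moved by 0.1
```

Identical activations give a loss of zero, and the loss grows with the squared size of the shift, so minimizing it pulls the generated image's high-level features toward those of the target image.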

***The style loss***

- The style loss uses multiple layers of a convnet. 
- We try to capture the appearance of the style reference image at all spatial scales extracted by the convnet.
- For the style loss, we use the **Gram matrix** of a layer's activations: the inner product of the feature maps of a given layer.
- The Gram matrix computation is very similar to a correlation computation; the major difference is that the means are not subtracted when computing the Gram matrix.
- This inner product can be understood as a map of the correlations between the layer's features.
- These feature correlations capture the statistics of the patterns at a particular spatial scale, which empirically correspond to the textures found at that scale.
- **The style loss aims to preserve similar internal correlations within the activations of different layers, across the style reference image and the generated image. In turn, this guarantees that the textures found at different spatial scales look similar across the style reference image and the generated image.**
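
To make the Gram matrix and its relation to correlation more concrete, here is a small NumPy sketch. It flattens each channel of an activation tensor into a row vector and takes all pairwise inner products; subtracting the per-channel mean first and dividing by the number of positions gives the covariance matrix instead. The function name `gram_matrix_np` and the random activations are assumptions for illustration only.

```python
import numpy as np

def gram_matrix_np(x):
    # x has shape (height, width, channels); one flattened row per channel.
    channels = x.shape[-1]
    features = x.transpose(2, 0, 1).reshape(channels, -1)
    # Entry (i, j) is the inner product of feature map i with feature map j.
    return features @ features.T

rng = np.random.default_rng(0)
activations = rng.normal(size=(8, 6, 4))  # hypothetical layer output
gram = gram_matrix_np(activations)

# Same computation with the means removed and a 1/N factor: the covariance matrix.
features = activations.transpose(2, 0, 1).reshape(4, -1)
centered = features - features.mean(axis=1, keepdims=True)
covariance = centered @ centered.T / features.shape[1]
```

The result is a symmetric `channels x channels` matrix that no longer depends on where in the image a pattern occurred, only on which features tend to fire together.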

-----

***Keras Implementation of Style-Transfer***

-----

We start from a pretrained convnet. The original paper used VGG19, so we will use it as well.

- Set up a network that computes VGG19 layer activations for the style reference image, the target image, and the generated image at the same time.
- Use the layer activations computed over these three images to define the loss function, which we will minimize.
- Set up a gradient descent process to minimize this loss function.

--------

-------

***Defining Initial Variables***

-------

In [1]:
from keras.preprocessing.image import load_img, img_to_array

target_image_path = 'style_transfer/image_scene.jpg'
style_reference_image_path = 'style_transfer/starry_night.jpg'

### Defining dimensions for the generated image
width, height = load_img(target_image_path).size 
img_height = 400
img_width = int(width*img_height/height) 

Using TensorFlow backend.
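
The width computation in the cell above fixes the height of the generated image at 400 pixels and scales the width by the same factor, so the aspect ratio of the target image is kept. A tiny sketch of that arithmetic, where the helper name `scaled_width` and the 1200x800 input size are hypothetical:

```python
def scaled_width(width, height, target_height=400):
    # Scale the width by the same factor that maps height to target_height.
    return int(width * target_height / height)

# A hypothetical 1200x800 photo becomes 600x400.
example_width = scaled_width(1200, 800)
```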


-----
***Auxiliary Functions***

-----

In [2]:
import numpy as np 
from keras.applications import vgg19

def preprocess_image(image_path):
    img = load_img(image_path, target_size = (img_height, img_width))
    img = img_to_array(img)
    img = np.expand_dims(img, axis = 0)
    img = vgg19.preprocess_input(img)
    return img

def deprocess_image(x):
    ### Undo the zero-centering: add back the ImageNet mean pixel values
    ### that vgg19.preprocess_input subtracted (channels are in BGR order here).
    x[:,:,0] += 103.939
    x[:,:,1] += 116.779
    x[:,:,2] += 123.68
    ### Converting BGR back to RGB
    ### Also a preprocess step in vgg19 which is being reversed
    x = x[:,:,::-1]
    x = np.clip(x,0,255).astype('uint8')
    return x
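
To see that the two helpers above are inverses of each other, here is a NumPy-only round trip. It is a sketch under the assumption, stated in the comments above, that `vgg19.preprocess_input` converts RGB to BGR and subtracts the ImageNet channel means without any rescaling. The names `caffe_style_preprocess` and `deprocess_np` are hypothetical stand-ins, and unlike `deprocess_image` this version rounds before casting and does not modify its input in place.

```python
import numpy as np

# ImageNet channel means in BGR order, as used in deprocess_image above.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68])

def caffe_style_preprocess(rgb):
    # RGB -> BGR, then zero-center each channel.
    bgr = rgb[..., ::-1].astype('float64')
    return bgr - IMAGENET_BGR_MEAN

def deprocess_np(x):
    # Add the means back, BGR -> RGB, then round and clip into valid pixel values.
    x = x + IMAGENET_BGR_MEAN
    x = x[..., ::-1]
    return np.clip(np.rint(x), 0, 255).astype('uint8')

rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)  # tiny fake image
processed = caffe_style_preprocess(rgb)
restored = deprocess_np(processed)
```

After the round trip, `restored` matches the original pixels, which is the property the optimization loop relies on when it saves intermediate results.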

Now let's set up the VGG19 network.

- It takes as input a batch of three images: the target image, the style reference image, and a placeholder for the generated image.
- The first two are static and are therefore defined with K.constant, whereas the generated image keeps changing during optimization, so it is held in a placeholder.

In [3]:
from keras import backend as K

target_image = K.constant(preprocess_image(target_image_path))
style_reference_image = K.constant(preprocess_image(style_reference_image_path))
combination_image = K.placeholder((1, img_height, img_width, 3))

input_tensor = K.concatenate([target_image, style_reference_image,combination_image], axis=0)

model = vgg19.VGG19(input_tensor = input_tensor, 
                   weights = 'imagenet', 
                   include_top = False)
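
Because the three images are concatenated along the batch axis, every layer output carries them at fixed batch indices: 0 for the target image, 1 for the style reference image, and 2 for the generated image. The NumPy sketch below mimics that layout with tiny constant arrays; the array names and sizes are placeholders for illustration, not real images.

```python
import numpy as np

h, w = 4, 6  # hypothetical tiny image size
target = np.zeros((1, h, w, 3))
style = np.ones((1, h, w, 3))
combination = np.full((1, h, w, 3), 2.0)

# Same order as K.concatenate([...], axis=0) in the cell above.
batch = np.concatenate([target, style, combination], axis=0)

target_slice = batch[0, :, :, :]       # features of the target image
style_slice = batch[1, :, :, :]        # features of the style reference image
combination_slice = batch[2, :, :, :]  # features of the generated image
```

This is why the loss cells later index `layer_features` with 0, 1, and 2.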

In [4]:
### Defining content loss

def content_loss(base, combination):
    return K.sum(K.square(combination - base))

In [5]:
### Defining style loss

def gram_matrix(x):
    features = K.batch_flatten(K.permute_dimensions(x,(2,0,1)))
    gram = K.dot(features, K.transpose(features))
    return gram 

def style_loss(style, combination):
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3 
    size = img_height * img_width
    return K.sum(K.square(S-C)) / (4. * (channels**2) * (size ** 2) )

------
***Total Variation loss*** - This is the third type of loss, which operates on the pixels of the generated combination image. It encourages spatial continuity, thus avoiding overly pixelated results.

-----

In [6]:
def total_variation_loss(x):
    a = K.square(
        x[:,:img_height-1, :img_width-1,:]-
        x[:,1:,:img_width-1,:]
    )
    b = K.square(
        x[:,:img_height-1, :img_width-1,:]-
        x[:,:img_height-1, 1:,:]
    )
    return K.sum(K.pow(a+b, 1.25))
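
A NumPy re-implementation of the same formula shows what this loss rewards. This is a sketch for intuition only: the name `total_variation_loss_np` and the two small test images are assumptions, and the image size is read from the array instead of the global `img_height` and `img_width`.

```python
import numpy as np

def total_variation_loss_np(x):
    # x has shape (1, height, width, channels).
    h, w = x.shape[1], x.shape[2]
    # Squared difference between each pixel and its neighbour one row below.
    a = np.square(x[:, :h - 1, :w - 1, :] - x[:, 1:, :w - 1, :])
    # Squared difference between each pixel and its neighbour one column to the right.
    b = np.square(x[:, :h - 1, :w - 1, :] - x[:, :h - 1, 1:, :])
    return float(np.sum(np.power(a + b, 1.25)))

flat = np.full((1, 8, 8, 3), 0.5)  # perfectly smooth image
rng = np.random.default_rng(0)
noisy = flat + rng.normal(scale=0.1, size=flat.shape)  # same image with pixel noise

tv_flat = total_variation_loss_np(flat)
tv_noisy = total_variation_loss_np(noisy)
```

The smooth image scores exactly zero, while pixel-level noise raises the loss, so adding this term nudges the optimizer toward spatially coherent results.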

-----

***The loss we minimize is the weighted sum of these three losses***

------
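
Written out, the quantity assembled in the next few cells can be summarized as follows. The notation is introduced here for illustration and is not taken from the original paper: $w_c$, $w_s$, and $w_{tv}$ stand for `content_weight`, `style_weight`, and `total_variation_weight`, $\mathcal{S}$ is the list `style_layers`, $F$ denotes the activations of `content_layer`, $G$ denotes a Gram matrix, and $N$ and $M$ are the `channels` and `size` values used in `style_loss`.

```latex
\begin{aligned}
L_{\text{total}} &= w_c \, L_{\text{content}}
  + \frac{w_s}{|\mathcal{S}|} \sum_{l \in \mathcal{S}} L_{\text{style}}^{(l)}
  + w_{tv} \, L_{\text{tv}} \\
L_{\text{content}} &= \sum_{i} \left( F_i^{\text{gen}} - F_i^{\text{target}} \right)^2 \\
L_{\text{style}}^{(l)} &= \frac{1}{4 N^2 M^2} \sum_{i,j}
  \left( G_{ij}^{(l),\,\text{style}} - G_{ij}^{(l),\,\text{gen}} \right)^2
\end{aligned}
```

With the values used in this notebook, $w_c = 0.025$, $w_s = 1$, $w_{tv} = 10^{-4}$, and $|\mathcal{S}| = 5$, so each style layer contributes with weight $0.2$.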

In [7]:
outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])

In [8]:
content_layer = 'block5_conv2'

In [9]:
style_layers = ['block1_conv1',
                'block2_conv1',
                'block3_conv1',
                'block4_conv1',
                'block5_conv1']
total_variation_weight = 1e-4
style_weight = 1.
content_weight = 0.025


In [10]:
loss = K.variable(0.)
layer_features = outputs_dict[content_layer]
target_image_features = layer_features[0,:,:,:]
combination_features = layer_features[2,:,:,:]
loss += content_weight * content_loss(target_image_features, combination_features)



In [11]:
for layer_name in style_layers:
    layer_features = outputs_dict[layer_name]
    style_reference_features = layer_features[1,:,:,:]
    combination_features = layer_features[2,:,:,:]
    s1 = style_loss(style_reference_features, combination_features)
    loss += (style_weight/len(style_layers)) * s1 



In [12]:
loss += total_variation_weight * total_variation_loss(combination_image)

-----
***Now, we will set up the gradient descent process, using the L-BFGS algorithm***

-----

- SciPy ships an L-BFGS implementation, but it only works on flat 1D vectors, whereas we are working with a 3D image tensor.
- In the form used here (with `fprime`), it also expects the loss and the gradients to be passed as two separate functions.

So we will write a class that computes the loss and the gradients together in one pass and exposes them as two separate methods, which can be passed to the SciPy function later.

In [13]:
### Gets the gradients of the loss with respect to the generated image
grads = K.gradients(loss, combination_image)[0]

### Function to fetch the current values of loss and gradients
fetch_loss_and_grads = K.function([combination_image], [loss,grads])


In [14]:
### fetch_loss_and_grads returns the loss and the gradients together in a single call.
### SciPy asks for them through two separate callbacks, so instead of evaluating the
### network twice, the following class caches both values on the loss call and
### hands the cached gradients back on the grads call.

class Evaluator(object):
    
    def __init__(self):
        self.loss_value = None
        self.grad_values = None
    
    def loss(self, x):
        assert self.loss_value is None
        x = x.reshape((1, img_height, img_width, 3))
        outs = fetch_loss_and_grads([x])
        loss_value = outs[0]
        grad_values = outs[1].flatten().astype('float64')
        self.loss_value = loss_value
        self.grad_values = grad_values
        return self.loss_value
    
    def grads(self, x):
        assert self.loss_value is not None
        grad_values = np.copy(self.grad_values)
        self.loss_value = None
        self.grad_values = None
        return grad_values
    
evaluator = Evaluator()
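
The same two-callback pattern can be tried on a toy problem without Keras. The sketch below is an illustration under stated assumptions, not the notebook's method: `toy_loss_and_grads` is a hypothetical quadratic standing in for `fetch_loss_and_grads`, and `CachingEvaluator` is a variant of the class above that remembers the last input it saw instead of asserting a strict loss-then-grads call order.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

shape = (1, 4, 5, 3)  # tiny stand-in for (1, img_height, img_width, 3)
target = np.linspace(0.0, 1.0, int(np.prod(shape))).reshape(shape)

def toy_loss_and_grads(image):
    # Hypothetical loss: squared distance to a fixed target, with its gradient.
    diff = image - target
    return float(np.sum(diff ** 2)), 2.0 * diff

class CachingEvaluator:
    def __init__(self):
        self.calls = 0      # number of joint loss/gradient evaluations
        self._x = None
        self._loss = None
        self._grads = None

    def _evaluate(self, x):
        # Recompute only when the optimizer asks about a new point.
        if self._x is None or not np.array_equal(x, self._x):
            loss, grads = toy_loss_and_grads(x.reshape(shape))
            self._x = x.copy()
            self._loss = loss
            self._grads = grads.flatten().astype('float64')
            self.calls += 1

    def loss(self, x):
        self._evaluate(x)
        return self._loss

    def grads(self, x):
        self._evaluate(x)
        return self._grads.copy()

toy_evaluator = CachingEvaluator()
x0 = np.zeros(int(np.prod(shape)))  # L-BFGS works on a flat vector
x_opt, min_val, info = fmin_l_bfgs_b(toy_evaluator.loss, x0,
                                     fprime=toy_evaluator.grads, maxfun=20)
result = x_opt.reshape(shape)       # back to image shape
```

On this toy problem the optimizer recovers `target`, and the flatten/reshape steps mirror what the real loop does with the generated image.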

----
***Now we will run the optimization using SciPy's L-BFGS algorithm***

----

In [15]:
from scipy.optimize import fmin_l_bfgs_b
from scipy.misc import imsave  ### Deprecated in SciPy 1.0.0 and removed in 1.2.0; newer setups can use imageio.imwrite instead
import time

In [16]:
result_prefix = 'style_transfer/my_result'
iterations = 20

In [17]:
### SciPy's L-BFGS can only process flat vectors, so the initial image is flattened

x = preprocess_image(target_image_path)
x = x.flatten()

In [18]:
for i in range(iterations):
    print('Start of Iteration', i)
    start_time = time.time()
    
    ### Runs the L-BFGS optimization over the pixels of the generated image to minimize the neural style loss.
    ### maxfun = 20 caps each outer iteration at roughly 20 loss/gradient evaluations.
    x, min_val, info = fmin_l_bfgs_b(evaluator.loss, x, 
                                     fprime = evaluator.grads, maxfun = 20)
    print("Current loss value:", min_val)
    img = x.copy().reshape((img_height, img_width, 3))
    img = deprocess_image(img)
    fname = result_prefix + '_iterations_%d.png'%i
    imsave(fname, img)
    end_time = time.time()
    print('Iteration Time: ',end_time-start_time)

Start of Iteration 0
Current loss value: 1179999600.0
Iteration Time:  11.190200567245483
Start of Iteration 1


`imsave` is deprecated in SciPy 1.0.0, and will be removed in 1.2.0.
Use ``imageio.imwrite`` instead.
  del sys.path[0]


Current loss value: 443627300.0
Iteration Time:  8.03190565109253
Start of Iteration 2
Current loss value: 266511000.0
Iteration Time:  8.210888147354126
Start of Iteration 3
Current loss value: 188589890.0
Iteration Time:  8.100772142410278
Start of Iteration 4
Current loss value: 151635310.0
Iteration Time:  8.202980756759644
Start of Iteration 5
Current loss value: 127149490.0
Iteration Time:  8.437873601913452
Start of Iteration 6
Current loss value: 109568660.0
Iteration Time:  8.099299907684326
Start of Iteration 7
Current loss value: 97293176.0
Iteration Time:  8.129997253417969
Start of Iteration 8
Current loss value: 88165690.0
Iteration Time:  8.063009023666382
Start of Iteration 9
Current loss value: 80935944.0
Iteration Time:  8.16801118850708
Start of Iteration 10
Current loss value: 74897920.0
Iteration Time:  8.126678466796875
Start of Iteration 11
Current loss value: 70108650.0
Iteration Time:  8.085767269134521
Start of Iteration 12
Current loss value: 65960076.0
Itera