<a href="https://colab.research.google.com/github/Sambhav300899/Neural-Style-Transfer-TF/blob/master/demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Neural Style Transfer** is a deep learning technique that transfers the style of one image onto another while retaining the content of the original image. This notebook uses the technique proposed by Leon A. Gatys, Alexander S. Ecker and Matthias Bethge in https://arxiv.org/abs/1508.06576.

NOTE - The paper used L-BFGS for optimisation, but Adam is used here instead.

The basic steps are -       
1. Take a content and style image.
2. Take a target image which will be the output.
3. Calculate the content and style loss for the target image to measure how closely its content and style match the original images.
4. Calculate gradients for the target image.
5. Update the target image.
6. Repeat steps 3 to 5 for n iterations.
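
The steps above can be sketched on a toy problem. This is only an illustration under loud assumptions: the "images" are small random arrays, the content loss is a pixel MSE, the style loss is a gap in mean brightness, and the gradient is written by hand instead of coming from a CNN. The names `total_loss` and `gradient` and the weights are hypothetical and do not appear in the notebook code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Steps 1-2: toy content and style "images"; the target starts as a copy of the content.
content = rng.uniform(0.0, 1.0, size=(8, 8))
style = rng.uniform(0.0, 1.0, size=(8, 8))
target = content.copy()

# Hypothetical loss weights and learning rate, chosen only for this toy problem.
content_weight, style_weight, lr = 1.0, 10.0, 1.0

def total_loss(x):
    # Step 3: stand-in losses (pixel MSE for content, mean-brightness gap for style).
    content_loss = np.mean((x - content) ** 2)
    style_loss = (x.mean() - style.mean()) ** 2
    return content_weight * content_loss + style_weight * style_loss

def gradient(x):
    # Step 4: analytic gradient of total_loss with respect to every pixel of x.
    grad_content = 2.0 * (x - content) / x.size
    grad_style = 2.0 * (x.mean() - style.mean()) / x.size
    return content_weight * grad_content + style_weight * grad_style

initial_loss = total_loss(target)
for _ in range(200):                    # Step 6: repeat
    target -= lr * gradient(target)     # Step 5: update the target image
final_loss = total_loss(target)

print("loss before:", initial_loss, "loss after:", final_loss)
```

In the notebook itself the gradient comes from tf.GradientTape and the update from Adam, but the loop has the same shape.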

To perform step 3 we need some way to extract the content and style of the original images. This is done with a pretrained CNN, since a CNN is essentially a stack of learned filters that extract relevant features from an image. The layers near the input extract low-level features such as edges, while the layers near the output extract high-level features such as ears and noses. One more important thing to note is that a CNN does not learn to encode the exact pixels of an image; it learns to encode what the image represents, which is why its intermediate activations can be used to describe both the style and the content of the image.

Here VGG19 and VGG16 have been used for feature extraction.

<img src="https://datascience-enthusiast.com/figures/louvre_generated.png" alt="drawing"/>



**VGG19 Architecture**         
![alt text](https://www.researchgate.net/profile/Clifford_Yang/publication/325137356/figure/fig2/AS:670371271413777@1536840374533/llustration-of-the-network-architecture-of-VGG-19-model-conv-means-convolution-FC-means.jpg)

**VGG16 Architecture**                       
![alt text](https://neurohive.io/wp-content/uploads/2018/11/vgg16-1-e1542731207177.png)


Clone the repository for style and content image samples

In [0]:
!git clone https://github.com/Sambhav300899/Neural-Style-Transform-TF.git

In [0]:
!cp -r Neural-Style-Transform-TF/content .
!cp -r Neural-Style-Transform-TF/styles .

Import the required libraries

In [0]:
%tensorflow_version 2.x
import cv2
import argparse
import progressbar
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from google.colab.patches import cv2_imshow
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import plot_model
from tensorflow.keras.applications import vgg19, vgg16

**Visualising the activations**      
Let's visualise the activations of VGG19. We use a picture of a dog because dog pictures are abundant in ImageNet. We visualise the activations for the style and content layers and can see that the earlier layers of the network extract gradients which are similar to brush strokes in a painting, while the final layer contains more information about the content. One interesting thing to note here is that the activations get sparser as we move deeper into the network. This is because the network filters out information which is not required and fires only for features related to the class of the object.

The layers that we will use for content loss are - 'block5_conv2'    
The layers that we will use for style loss are - 'block1_conv1', 'block2_conv1', 'block3_conv1', 'block4_conv1', 'block5_conv1'
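
The sparsity mentioned above can be put into a number by counting how many activations are exactly zero after the ReLU. The sketch below is an assumption-laden illustration: it uses synthetic arrays as a stand-in for real feature maps, and `activation_sparsity` is a hypothetical helper that is not part of the notebook. With real data you would pass the NumPy version of one of the feature maps produced by the model.

```python
import numpy as np

def activation_sparsity(feature_map):
    # Fraction of activations that are exactly zero, i.e. switched off by the ReLU.
    return float(np.mean(feature_map == 0))

rng = np.random.default_rng(0)
pre_activation = rng.normal(size=(1, 8, 8, 16))

# Synthetic stand-ins: the "late" layer has a stronger negative bias before the
# ReLU, so fewer units fire. Real VGG feature maps are not generated like this.
early = np.maximum(pre_activation, 0.0)
late = np.maximum(pre_activation - 1.5, 0.0)

print("early layer sparsity:", activation_sparsity(early))
print("late layer sparsity:", activation_sparsity(late))
```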

In [0]:
#function to get the model and its outputs
def get_model(shape, layers):
    base = vgg19.VGG19(include_top = False, weights = 'imagenet', input_shape = shape)
    base.summary()

    #get the outputs from the model
    outputs = [base.get_layer(name).output for name in layers]

    model = Model(base.input, outputs)
    return model

if __name__ == "__main__":
    #the style and content layers that we use for style transfer
    layers = ['block1_conv1', 'block2_conv1', 'block3_conv1', 'block4_conv1', 'block5_conv1', 'block5_conv2']

    #load the image and resize it to make the visualisation easier
    img = cv2.imread('content/dog.jpg')
    #cv2 loads images as BGR, but preprocess_input expects RGB
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (512, 512))
    model = get_model(img.shape, layers)

    #add a batch dimension and apply the VGG19 preprocessing
    batch = np.expand_dims(img.astype('float32'), axis = 0)
    batch = vgg19.preprocess_input(batch)

    feature_maps = model(batch)
    imgs_per_row = 16

    for layer, feature_map in zip(layers, feature_maps):
        #convert to numpy once per layer instead of once per channel
        activations = feature_map.numpy()[0]
        features = activations.shape[-1]
        size = activations.shape[0]
        n_cols = features // imgs_per_row

        display_grid = np.zeros((size * n_cols, imgs_per_row * size))

        for col in range(n_cols):
            for row in range(imgs_per_row):
                #normalise each channel to a displayable 0-255 range
                channel_image = activations[:, :, col * imgs_per_row + row].copy()
                channel_image -= channel_image.mean()
                #the small epsilon avoids dividing by zero for channels that never fire
                channel_image /= (channel_image.std() + 1e-5)
                channel_image *= 64
                channel_image += 128
                channel_image = np.clip(channel_image, 0, 255).astype(np.uint8)

                display_grid[col * size : (col + 1) * size, row * size : (row + 1) * size] = channel_image

        scale = 1. / size
        plt.figure(figsize=(scale * display_grid.shape[1], scale * display_grid.shape[0]))
        plt.title(layer)
        plt.grid(False)
        plt.imshow(display_grid, aspect='auto', cmap='viridis')


**Define utility functions** 

Here the pre_proc function converts an image to a float array and adds a batch dimension, which makes the input suitable for passing to the network.

The VGG19 and VGG16 preprocessing functions convert the image from RGB to BGR and subtract the ImageNet channel means before it is passed to the network. To undo this and obtain the image in a displayable format we use the deprocessing function.
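
The round trip between preprocessing and deprocessing can be checked with plain NumPy. This is a minimal sketch, assuming the Caffe-style preprocessing used by the Keras VGG models (RGB to BGR, then subtracting the channel means). `caffe_preprocess` and `caffe_deprocess` are hypothetical stand-ins written for this example; they are not the Keras functions themselves.

```python
import numpy as np

# ImageNet channel means in BGR order, the same constants used in deprocess_img_vgg.
BGR_MEANS = np.array([103.939, 116.779, 123.68])

def caffe_preprocess(rgb):
    # Hypothetical stand-in: flip RGB to BGR, then zero-centre each channel.
    bgr = rgb[..., ::-1].astype('float64')
    return bgr - BGR_MEANS

def caffe_deprocess(x):
    # Inverse: add the means back, flip BGR to RGB, clip to the valid pixel range.
    rgb = (x + BGR_MEANS)[..., ::-1]
    # Rounding before the cast avoids off-by-one errors from float truncation.
    return np.clip(np.rint(rgb), 0, 255).astype('uint8')

rgb = np.random.default_rng(0).integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
restored = caffe_deprocess(caffe_preprocess(rgb))
round_trip_ok = np.array_equal(rgb, restored)
print("round trip recovers the image:", round_trip_ok)
```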

In [0]:
def pre_proc(img):
    '''
    pre process the image to input to the network

    args - 
    img : input image to be pre processed
    '''
    img = np.array(img).astype('float')
    img = np.expand_dims(img, axis = 0)

    return img

def deprocess_img_vgg(processed_img):
  '''
  de process the image after
  passing through a VGG model

  args -
  processed_img : image to be de processed
  '''
  x = processed_img.copy()

  if len(x.shape) == 4:
    x = np.squeeze(x, 0)

  x[:, :, 0] += 103.939
  x[:, :, 1] += 116.779
  x[:, :, 2] += 123.68
  x = x[:, :, ::-1]

  x = np.clip(x, 0, 255).astype('uint8')
  return x


**Define the losses**

**content loss**         
The content loss is just the Mean Squared Error between the feature maps of the content image and those of the target image. We use this loss to keep the content the same in both images.  
The content loss is given by - 
![alt text](https://miro.medium.com/max/1050/1*34xPuexhGCHT7xZ17wVvDQ.jpeg)
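
As a small numeric illustration of the content loss, here is a NumPy sketch on random arrays that stand in for feature maps. The function name `content_loss_np` and the shapes are assumptions made for this example only.

```python
import numpy as np

def content_loss_np(target_features, content_features):
    # Mean squared error between two feature maps of the same shape.
    return float(np.mean((target_features - content_features) ** 2))

rng = np.random.default_rng(0)
content_features = rng.normal(size=(4, 4, 8))

# Identical feature maps give zero loss.
loss_same = content_loss_np(content_features, content_features)
# Shifting every activation by 0.5 gives a loss close to 0.5 ** 2.
loss_shifted = content_loss_np(content_features + 0.5, content_features)

print(loss_same, loss_shifted)
```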

**style loss**           
To find the style loss we use multiple output feature maps, as specified above. We can't directly use the MSE between the feature maps, because style is more of a similarity measure and two images of the same style need not be exactly the same. Instead, we calculate the gram matrix of each feature map and use the difference between the gram matrices as the style loss. The basic idea of a gram matrix is that it captures the correlation between the channels of a feature map by taking the dot product between every pair of channels, since the dot product is a measure of similarity between two vectors. Minimising the difference between the gram matrices of the target image and the style image pulls the style of the target image towards that of the style image.

The loss between gram matrices of one single layer is given by - 
![alt text](https://miro.medium.com/max/1188/1*IoozR3xGzaSqtEqGEKcWMQ.jpeg)

The total style loss is given by (wl is the loss weight of each layer) -              
![alt text](https://miro.medium.com/max/533/1*n7wIYY399mOdO9jJGM6aoA.jpeg)
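
The reason a gram matrix describes style and not layout can be seen in a few lines of NumPy. The sketch below mirrors the normalisation used in this notebook (dividing by the number of spatial positions) and shows that shuffling the spatial positions of a feature map leaves its gram matrix unchanged. `gram_matrix_np` and the random feature map are assumptions made for this example.

```python
import numpy as np

def gram_matrix_np(features):
    # features has shape (height, width, channels); the result is (channels, channels).
    channels = features.shape[-1]
    flat = features.reshape(-1, channels)
    # Dot product between every pair of channels, averaged over spatial positions.
    return flat.T @ flat / flat.shape[0]

rng = np.random.default_rng(0)
features = rng.normal(size=(6, 6, 4))

# Shuffle the spatial positions while keeping each position's channel vector intact.
flat = features.reshape(-1, 4)
shuffled = flat[rng.permutation(flat.shape[0])].reshape(6, 6, 4)

gram = gram_matrix_np(features)
gram_shuffled = gram_matrix_np(shuffled)

print(gram.shape)
print(np.allclose(gram, gram_shuffled))
```

Because the spatial arrangement is discarded, two images with different layouts but similar textures can have similar gram matrices.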

**Variation Loss**               
One downside of this implementation is that it produces a lot of high frequency artifacts. We can reduce them with a total variation loss, which encourages spatial continuity and thus avoids overly pixelated results. This loss is already implemented in TensorFlow as tf.image.total_variation and can be called directly when calculating the loss.
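
To make the variation loss concrete, here is a NumPy sketch of the same quantity that tf.image.total_variation computes for a single image: the sum of absolute differences between vertically and horizontally neighbouring pixels. `total_variation_np` and the two test images are assumptions made for this example.

```python
import numpy as np

def total_variation_np(image):
    # image has shape (height, width, channels).
    vertical = np.abs(image[1:, :, :] - image[:-1, :, :])
    horizontal = np.abs(image[:, 1:, :] - image[:, :-1, :])
    return float(vertical.sum() + horizontal.sum())

# A constant image has no variation at all.
flat_image = np.full((4, 4, 1), 7.0)

# A 4x4 checkerboard of 0s and 1s changes at every neighbouring pair:
# 12 vertical pairs plus 12 horizontal pairs.
checkerboard = (np.indices((4, 4)).sum(axis=0) % 2)[..., None].astype('float64')

print(total_variation_np(flat_image))    # 0.0
print(total_variation_np(checkerboard))  # 24.0
```

A noisy, pixelated image behaves like the checkerboard and is penalised heavily, while a smooth image is penalised very little.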

**Finally, our total loss is a weighted sum of these three losses.**



In [0]:
def gram_matrix(input_tensor):
  '''
  calculates the gram matrix of a given input tensor
  
  args - 
  input_tensor - input tensor
  '''
  channels = int(input_tensor.shape[-1])
  a = tf.reshape(input_tensor, [-1, channels])
  n = tf.shape(a)[0]
  gram = tf.matmul(a, a, transpose_a=True)
  return gram / tf.cast(n, tf.float32)

def calc_style_loss(base_style, gram_target):
  '''
  calculates the style loss for a single layer

  args - 
  base_style - the feature maps of the target image
  gram_target - the gram matrix of the style image
  '''
  gram_style = gram_matrix(base_style)

  return tf.reduce_mean(tf.square(gram_style - gram_target))

def calc_content_loss(img_op, img_content):
  '''
  calculate the content loss for a single layer

  args - 
  img_op : feature maps of target image
  img_content : feature maps of content image
  '''
  return tf.reduce_mean(tf.square(img_op - img_content))


**Define the style transfer class**

In [0]:

class style_transfer():
    '''
    A class containing the functions related to style transfer

    init args -
    model_name : can be different model types, eg vgg19, vgg16 etc.
    content_path : path to input content image
    style_path : path to input style image
    '''
    def __init__(self, model_name, content_path, style_path):
        #read images
        self.style = cv2.imread(style_path)
        self.content = cv2.imread(content_path)

        print ('STYLE IMAGE')
        cv2_imshow(self.style)
        print ('\n\nCONTENT IMAGE')
        cv2_imshow(self.content)

        self.style = cv2.resize(self.style, (self.content.shape[1], self.content.shape[0]))

        self.h, self.w = self.style.shape[:2]

        #get the model
        self.make_model(model_name)

    def get_feature_maps(self):
        '''
        get the intermediate feature outputs required to
        calculate the style and content loss
        '''
        #run the model on style and content images
        output = self.model(self.create_ip())

        #collect style and content features
        style_features = [style_layer[0] for style_layer in output[len(self.content_layers):]]
        content_features = [content_layer[1] for content_layer in output[:len(self.content_layers)]]

        return style_features, content_features

    def set_loss_weights(self, content_weight, style_weight, variation_weight):
        '''
        set the weights for how much each loss contributes to the total loss

        args -
        content_weight : weight param for content loss
        style_weight : weight param for style loss
        variation_weight : weight param for variation loss
        '''
        self.content_weight = content_weight
        self.style_weight = style_weight
        self.variation_weight = variation_weight

    def make_model(self, model_name):
        '''
        create the model according to the input model specified and
        select the layers to be used for style and content loss
        also set the loss weights according to the model

        args -
        model_name : type of model. e.g - vgg16, vgg19
        '''
        if model_name.lower() == 'vgg19':
            #get the pretrained model
            base = vgg19.VGG19(input_shape = (self.h, self.w, 3), include_top = False, weights = 'imagenet')

            #setting the content and style images
            self.content_layers = ['block5_conv2']
            self.style_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1', 'block4_conv1', 'block5_conv1']

            #get the outputs from the model
            content_outputs = [base.get_layer(name).output for name in self.content_layers]
            style_outputs = [base.get_layer(name).output for name in self.style_layers]
            model_outputs = content_outputs + style_outputs

            #set loss weights, deprocessing function and processing function
            self.set_loss_weights(1e3, 1e-2, 30)
            self.de_proc_func = deprocess_img_vgg
            self.pre_processing_func = vgg19.preprocess_input

        elif model_name.lower() == 'vgg16':
            base = vgg16.VGG16(input_shape = (self.h, self.w, 3), include_top = False, weights = 'imagenet')

            self.content_layers = ['block5_conv3'] #, 'block5_conv2']
            self.style_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1', 'block4_conv1', 'block5_conv1']

            content_outputs = [base.get_layer(name).output for name in self.content_layers]
            style_outputs = [base.get_layer(name).output for name in self.style_layers]
            model_outputs = content_outputs + style_outputs

            self.set_loss_weights(1e5, 1e-2, 1)
            self.de_proc_func = deprocess_img_vgg
            self.pre_processing_func = vgg16.preprocess_input

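        else:
            raise ValueError('{} is not a supported model, please select from vgg19 or vgg16'.format(model_name))

        #create the model
        self.model = Model(base.input, model_outputs)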
        self.model.summary()

    def calc_loss(self, combined, gram_style_features, content_features):
        '''
        calculates the combined style and content loss

        args -
        combined : the output image
        gram_style_features : gram matrices calculated for style feature maps
        content_features : content feature maps
        '''
        #get model output for the combined image
        model_op = self.model(combined)

        #get the style and content output features for the combined image
        style_op_features = model_op[len(self.content_layers):]
        content_op_features = model_op[:len(self.content_layers)]

        style_score = 0
        content_score = 0

        #calculate style score over the feature maps
        weight_per_style_layer = 1. / float(len(self.style_layers))
        for target_style, comb_style in zip(gram_style_features, style_op_features):
            style_score = style_score + weight_per_style_layer * calc_style_loss(comb_style[0], target_style)

        #calculate content score over the feature maps
        weight_per_content_layer = 1. / float(len(self.content_layers))
        for target_content, comb_content in zip(content_features, content_op_features):
            content_score = content_score + weight_per_content_layer * calc_content_loss(comb_content[0], target_content)

        style_loss = self.style_weight * style_score
        content_loss = self.content_weight * content_score
        loss = style_loss + content_loss

        return loss, style_loss, content_loss

    def create_ip(self):
        '''
        creates input to a model and performs preprocessing
        steps for the content and style image
        '''
        #change to float and expand the dims
        content_ip = pre_proc(self.content)
        style_ip = pre_proc(self.style)

        #preprocessing function of the trained model
        content_ip = self.pre_processing_func(content_ip)
        style_ip = self.pre_processing_func(style_ip)

        return np.concatenate((style_ip, content_ip))

    def combine(self, num_iter, lr, output_path, starting):
        '''
        function to combine the image. Calculates
        and updates the image by calculating the gradients

        args -
        num_iter : number of iterations to run the update loop
        lr : learning rate for the optimizer
        output_path : path to save the final image
        starting : to set the starting input as different images
        '''
        #expand dims and apply model pre processing

        if starting == 'zeros':
            combined = pre_proc(np.zeros_like(self.content))
        elif starting == 'content':
            combined = pre_proc(self.content)
        elif starting == 'style':
            combined = pre_proc(self.style)
        elif starting == 'random':
            combined = pre_proc(np.random.uniform(low = 0., high = 255.0, size = self.content.shape))
        else:
            raise ValueError('{} no such starting option, please select from zeros, content, style or random'.format(starting))

        combined = self.pre_processing_func(combined)
        combined = tf.Variable(combined, dtype = tf.float32)

        #get the style and content features
        style_features, content_features = self.get_feature_maps()
        #get the gram matrices of the style features using the style feature maps
        gram_style_features = [gram_matrix(style_feature) for style_feature in style_features]
        #define our optimizer with the given learning rate
        opt = Adam(lr)

        #values that the image needs to be clipped to after update
        norm_means = np.array([103.939, 116.779, 123.68])
        min_vals = -norm_means
        max_vals = 255 - norm_means

        img = None
        intermediate_images = []
        losses = []

        #number of iterations between two of the 10 intermediate snapshots
        snapshot_step = max(num_iter // 10, 1)

        for i in progressbar.progressbar(range(num_iter)):
            #calculate loss and then the gradients for the combined image
            with tf.GradientTape() as tape:
                total_loss, style_loss, content_loss = self.calc_loss(combined, gram_style_features, content_features)
                #keep the variation loss as a tensor so that it contributes to the gradients
                variation_loss = self.variation_weight * tf.reduce_sum(tf.image.total_variation(combined))
                full_loss = total_loss + variation_loss

            grads = tape.gradient(full_loss, combined)

            #apply the gradient updates
            opt.apply_gradients([(grads, combined)])
            clipped = tf.clip_by_value(combined, min_vals, max_vals)
            combined.assign(clipped)

            print (" total loss:{} style_loss:{} content_loss:{} variation_loss:{}".format(
                                        total_loss, style_loss, content_loss, variation_loss))

            #store plain floats (total, style, content) so that the losses can be plotted
            losses.append([float(total_loss), float(style_loss), float(content_loss)])
            img = combined.numpy()
            if i % snapshot_step == 0 and len(intermediate_images) < 10:
                intermediate_images.append(self.de_proc_func(img[0]))

        plt.plot(range(0, num_iter), losses)
        cv2.imwrite(output_path, self.de_proc_func(img[0]))
        plt.savefig("loss.png")

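        for i in range(0, len(intermediate_images)):
            #pass thickness and lineType by keyword, otherwise cv2.LINE_AA is read as the thickness
            intermediate_images[i] = cv2.putText(intermediate_images[i], "step {}".format(i * int(num_iter / 10)),
                                    (20, 70), cv2.FONT_HERSHEY_SIMPLEX, 3, (0, 0, 255), thickness = 3, lineType = cv2.LINE_AA)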

        row_1 = intermediate_images[0]
        for i in range(1, 5):
            row_1 = np.hstack((row_1, intermediate_images[i]))

        row_2 = intermediate_images[5]
        for i in range(6, 10):
            row_2 = np.hstack((row_2, intermediate_images[i]))

        cv2.imwrite("intermediate_images.png", np.vstack((row_1, row_2)))
        plt.show()


Now we can finally run style transfer on an image and check the results. Some examples are shown below.

In [0]:
style_transfer_net = style_transfer('vgg19', 'content/content.jpg', 'styles/style.jpg')
style_transfer_net.combine(500, 5, 'output.png', 'content')

output = cv2.imread('output.png')
intermediate_image = cv2.imread('intermediate_images.png')
w = 512
r = intermediate_image.shape[0] / w
intermediate_image = cv2.resize(intermediate_image, (int(intermediate_image.shape[1] / r), w))
cv2_imshow(output)

print ("\n\nthe intermediate outputs are :")
cv2_imshow(intermediate_image)

In [0]:
style_transfer_net = style_transfer('vgg19', 'content/content.jpg', 'styles/starry_night.jpg')
style_transfer_net.combine(500, 5, 'output.png', 'content')

output = cv2.imread('output.png')
intermediate_image = cv2.imread('intermediate_images.png')
w = 512
r = intermediate_image.shape[0] / w
intermediate_image = cv2.resize(intermediate_image, (int(intermediate_image.shape[1] / r), w))
cv2_imshow(output)

print ("\n\nthe intermediate outputs are :")
cv2_imshow(intermediate_image)

You can try out other samples as well. Clear the notebook outputs once, then change the image paths in the code above to run on a different sample. You can also download the code from the repo and run it on your system directly.

https://github.com/Sambhav300899/Neural-Style-Transfer-TF

**References** - 
* Deep Learning with Python by François Chollet
* https://arxiv.org/abs/1705.04058
* https://towardsdatascience.com/neural-style-transfer-tutorial-part-1-f5cd3315fa7f
* https://www.tensorflow.org/tutorials/generative/style_transfer
* https://towardsdatascience.com/neural-networks-intuitions-2-dot-product-gram-matrix-and-neural-style-transfer-5d39653e7916

**Some issues with the current implementation** - 
* It can't be run on very high resolution content or the GPU runs out of memory.
* This process can't be run in real time.
* A lot of sharp gradients pop up around some objects.

**Future Work** - 
* An image to image model can be trained on a set of styles and outputs generated with this technique to make it real time.
* CycleGAN can be used for real time style transfer for a given style.
* Figuring out a way to use it with high resolution images (maybe downscale and then upscale with a super resolution GAN, but that seems like a very naive solution).
* Adding a quality measure of the style transfer.

**Experiments which can be done** - 
* How changing the loss weights affects the output (a detailed study and not just changing them and inspecting the result visually).
* Seeing how additional pre processing such as blurring before updating the image might help.

**An experiment with starting points**       
By default the content image is passed as the initial target image to calculate the loss. What if we used some different images as the starting points instead?

**NOTE** - Please clear all outputs before running this or the notebook keeps getting disconnected. To clear all outputs (edit -> clear all outputs)

**NOTE** - Also note that running the following cells will take some time as the number of iterations has to be increased

In [0]:
'''
let's try giving the starting image as style now
'''
style_transfer_net = style_transfer('vgg19', 'content/content.jpg', 'styles/style.jpg')
style_transfer_net.combine(5000, 5, 'output.png', 'style')

output = cv2.imread('output.png')
intermediate_image = cv2.imread('intermediate_images.png')
w = 512
r = intermediate_image.shape[0] / w
intermediate_image = cv2.resize(intermediate_image, (int(intermediate_image.shape[1] / r), w))
cv2_imshow(output)

print ("\n\nthe intermediate outputs are :")
cv2_imshow(intermediate_image)

In [0]:
'''
let's try giving the starting image as zeros now
'''
style_transfer_net = style_transfer('vgg19', 'content/content.jpg', 'styles/style.jpg')
style_transfer_net.combine(5000, 5, 'output.png', 'zeros')

output = cv2.imread('output.png')
intermediate_image = cv2.imread('intermediate_images.png')
w = 512
r = intermediate_image.shape[0] / w
intermediate_image = cv2.resize(intermediate_image, (int(intermediate_image.shape[1] / r), w))
cv2_imshow(output)

print ("\n\nthe intermediate outputs are :")
cv2_imshow(intermediate_image)

In [0]:
'''
let's try giving the starting image as random now
'''
style_transfer_net = style_transfer('vgg19', 'content/content.jpg', 'styles/style.jpg')
style_transfer_net.combine(5000, 5, 'output.png', 'random')

output = cv2.imread('output.png')
intermediate_image = cv2.imread('intermediate_images.png')
w = 512
r = intermediate_image.shape[0] / w
intermediate_image = cv2.resize(intermediate_image, (int(intermediate_image.shape[1] / r), w))
cv2_imshow(output)

print ("\n\nthe intermediate outputs are :")
cv2_imshow(intermediate_image)