# <center> Implementation of Neural Style Transfer</center>

## Overview
- NST: Training an Image
- Style and Content Images
- Feature extraction from style and content images
- Gram Matrix
- Loss Function for NST
  - Style Loss
  - Content Loss
- Generate Image

#### We will first import all necessary libraries and frameworks we are going to use.

In [None]:
import tensorflow as tf
import numpy as np
import os
import cv2

from tensorflow.keras.applications import VGG19
from tensorflow.keras.applications.vgg19 import preprocess_input
from PIL import Image

### NST: Training an Image
<br>
How do we generate an image in Neural Style Transfer?

In typical problems (e.g. classification, regression), we build a model, i.e. a set of operations on operands (weights, biases and data), and train the model's weights and biases to fit the distribution of the data. In Neural Style Transfer, the network's weights are kept fixed. Instead, we treat the pixel values of the image as the trainable parameters and train the image itself (i.e. change its pixel values during training) to generate the output image.
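A minimal sketch of this idea, assuming only TensorFlow: the trainable "parameters" are the pixels of a `tf.Variable` image, and a toy objective is minimized by updating those pixels directly. The target image, the names `target` and `pixels`, and the plain gradient-descent update are illustrative assumptions; the real NST objective (style + content loss) and the Adam optimizer appear later in this notebook.

```python
import tensorflow as tf

tf.random.set_seed(0)

# Hypothetical toy target; in NST the objective is the style + content loss instead
target = tf.random.uniform((1, 8, 8, 3))

# The image itself is the trainable variable; no network weights are involved
pixels = tf.Variable(tf.zeros((1, 8, 8, 3)))
learning_rate = 0.1

losses = []
for step in range(50):
    with tf.GradientTape() as tape:
        loss = tf.reduce_sum((pixels - target) ** 2)
    grad = tape.gradient(loss, pixels)       # d(loss)/d(pixels), same shape as the image
    pixels.assign_sub(learning_rate * grad)  # plain gradient descent on the pixel values
    losses.append(float(loss))

print(f'first loss: {losses[0]:.4f}, last loss: {losses[-1]:.8f}')
```

The gradient has the same shape as the image, so every pixel gets its own update, exactly as a weight would in ordinary training.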

### Style and Content Images
<br><br>
#### Style Image:
Image from which we want to take the most basic information, i.e. edges, simple and fundamental shapes, basic color schemes, textures, etc.
!['Style Image'](https://miro.medium.com/max/767/1*B5zSHvNBUP6gaoOtaIy4wg.jpeg)
<br><br>
#### Content Image:
Image from which we want to take complex shapes, i.e. combinations of shapes and colors. <br>For example, if the content image contains a dog, we want a dog in our generated image.
!['Content Image'](https://i.ytimg.com/vi/xVJwwWQlQ1o/maxresdefault.jpg)

#### Loading an image and converting the image into a Tensor
<br>
Images come in various sizes, and performing NST on very large images is very time-consuming. Hence we limit the image height and width to at most 512 pixels: if either exceeds 512, we rescale the image so that its longer side is 512 while preserving the aspect ratio.
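The resizing rule can be checked without loading any image. The helper below, `resized_shape`, is hypothetical; it is a sketch that mirrors the two branches of `load_image` and returns the (height, width) that function would produce, including the truncation to an integer.

```python
def resized_shape(height, width, max_length=512):
    """Return the (height, width) load_image would produce: the longer side is
    capped at max_length and the other side is scaled by the same factor
    (truncated to an int). Images that already fit are left unchanged."""
    if height > width:
        if height > max_length:
            return max_length, int(width * (max_length / height))
    else:
        if width > max_length:
            return int(height * (max_length / width)), max_length
    return height, width

print(resized_shape(768, 512))   # portrait image larger than 512 -> (512, 341)
print(resized_shape(389, 320))   # already small enough -> (389, 320)
print(resized_shape(512, 1024))  # landscape image -> (256, 512)
```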

In [None]:
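def load_image(image_path, max_length):
    """Load an image as a float32 RGB tensor of shape (1, height, width, 3)
    with values in [0, 1], downscaled so its longer side is at most max_length."""
    if os.path.exists(image_path):
        image = cv2.imread(image_path)  # OpenCV returns BGR, or None on failure
        if image is None:
            raise OSError('Could not read image: ' + image_path)

        height, width = image.shape[0:2]

        # Rescale the longer side to max_length, keeping the aspect ratio
        if height > width:
            if height > max_length:
                width = int(width * (max_length / height))
                height = max_length
                image = cv2.resize(image, (width, height), interpolation=cv2.INTER_AREA)
        else:
            if width > max_length:
                height = int(height * (max_length / width))
                width = max_length
                image = cv2.resize(image, (width, height), interpolation=cv2.INTER_AREA)

        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # BGR -> RGB
        image_rgb = np.expand_dims(image_rgb, axis=0)       # add batch dimension
        image_tensor = tf.convert_to_tensor(image_rgb)
        image_tensor = tf.cast(image_tensor, tf.float32)
        image_tensor = image_tensor / 255.                  # scale to [0, 1]

        return image_tensor
    else:
        raise OSError('Invalid Image path!')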

In [None]:
image = load_image('drive/MyDrive/nst_workshop/neural_style_transfer/content/waterfall.jpeg',512)
print(image.shape)

(1, 512, 341, 3)


### Feature Extraction from style and content images

We need to extract features from the style and content images so that we can compare them with the features of the generated image and bring the generated image closer to the content and style images in specific ways.
<br><br>
To do feature extraction, we need a feature extractor; we will use a pre-trained VGG19 CNN for this purpose.
!['VGG19'](https://www.researchgate.net/profile/Clifford_Yang/publication/325137356/figure/fig2/AS:670371271413777@1536840374533/llustration-of-the-network-architecture-of-VGG-19-model-conv-means-convolution-FC-means.jpg)

We do not need the whole VGG19, just specific layers from it:
- The first convolutional layer of each block for style (basic information)
- A mid-level layer for content (more complex information)

Hence, for style and content, we will use following layers of VGG19:
- Style: [block1_conv1, block2_conv1, block3_conv1, block4_conv1,block5_conv1]
- Content: [block4_conv2]
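Each style layer comes one pooling step after the previous one, so it sees the image at half the resolution and with as many or more channels. A minimal sketch to inspect this, assuming `weights=None` (random weights, so nothing is downloaded) since only the output shapes matter here; the `probe` model and the 224×224 dummy input are illustrative choices.

```python
import tensorflow as tf

# weights=None builds the VGG19 architecture with random weights (no download);
# only output shapes are inspected, so the weight values do not matter
vgg = tf.keras.applications.VGG19(include_top=False, weights=None)

layer_names = ['block1_conv1', 'block2_conv1', 'block3_conv1',
               'block4_conv1', 'block5_conv1', 'block4_conv2']
probe = tf.keras.Model(vgg.input, [vgg.get_layer(n).output for n in layer_names])

# A dummy batch containing one 224x224 RGB image
features = probe(tf.zeros((1, 224, 224, 3)))
shapes = {name: tuple(f.shape) for name, f in zip(layer_names, features)}
for name, shape in shapes.items():
    print(name, shape)
```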

In [None]:
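def create_feature_extractor(convnet, style_layers, content_layers):
    # Build a model that maps an input image to the outputs of the chosen layers
    # (style layers first, then content layers)
    layers = style_layers + content_layers
    outputs = [convnet.get_layer(layer_name).output for layer_name in layers]
    model = tf.keras.models.Model(convnet.inputs, outputs)

    return model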


content_layers = ['block4_conv2']
style_layers = ['block1_conv1','block2_conv1','block3_conv1','block4_conv1','block5_conv1']

convnet = VGG19(include_top=False,weights='imagenet')
feature_extractor = create_feature_extractor(convnet,style_layers,content_layers)
feature_extractor.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg19/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, None, None, 3)]   0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, None, None, 128)  

### Gram Matrix

We need to extract basic information from the style image and compare it with the same information from the generated image. The tool we use for this comparison is the 'Gram Matrix'.

We have the outputs (feature maps) of the style layers defined above. For each layer, we flatten every feature map into a vector and stack these vectors into a matrix. Multiplying this matrix by its transpose gives the layer's 'Gram Matrix'.
<br><br>
Flatten<br>
!['Flatten'](https://www.w3resource.com/w3r_images/numpy-manipulation-ndarray-flatten-function-image-1.png)
<br><br>
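In symbols (notation ours): for a style layer $l$ with $C_l$ filters and feature maps of size $H_l \times W_l$, let $F^l \in \mathbb{R}^{H_l W_l \times C_l}$ be the matrix whose $i$-th column is the flattened feature map of filter $i$ (the `tf.reshape(tensor, shape=[-1, num_channels])` step in the `gram_matrix` code). The Gram matrix is

$$G^l = (F^l)^\top F^l \in \mathbb{R}^{C_l \times C_l}, \qquad G^l_{ij} = \sum_{k=1}^{H_l W_l} F^l_{ki}\, F^l_{kj},$$

so entry $G^l_{ij}$ is the dot product of the flattened feature maps of filters $i$ and $j$.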

Intuition behind 'Gram Matrix':
<br>
We need a way to compare the relations between shapes, edges and textures in the style image and in the generated image. In the style image shown earlier (The Starry Night), individual brush strokes form circular shapes, and the edges of each stroke relate in a particular way to the nearby circular shapes. This relation is what we call the style in which the image was painted.
!['Style Image'](https://miro.medium.com/max/767/1*B5zSHvNBUP6gaoOtaIy4wg.jpeg)
We want the same relation in the generated image, so we need to capture how the edges of individual strokes relate to the circular shapes. We assume that stroke edges and circular shapes are each detected by at least one filter in the style layers, so we extract the relations between the outputs of all filters within each style layer, for every style layer.
!['Feature maps'](https://adeshpande3.github.io/assets/deconvnet.png)
<br><br>
How do we extract relation between output of all filters?

We use the dot product between vectors. The dot product of two vectors tells us how similar they are; for feature maps, it is large when two filters respond strongly at the same locations.<br>
!['dot product'](https://ml-cheatsheet.readthedocs.io/en/latest/_images/khan_academy_matrix_product.png)<br>
So we flatten the output of each filter in a layer, stack these vectors into a matrix, and multiply the matrix by its transpose, which computes the dot product for every pair of flattened feature maps.
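A small NumPy check of this claim (NumPy instead of TensorFlow for brevity; the random "feature maps" are hypothetical stand-ins for a style layer's output): flattening with a reshape to `(-1, channels)`, as `gram_matrix` does, and multiplying by the transpose gives a matrix whose entry (i, j) is the dot product of the flattened maps of filters i and j.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature maps of one layer: batch of 1, 4x4 spatial size, 3 filters
features = rng.standard_normal((1, 4, 4, 3))

# Flatten each filter's 4x4 map into a column: shape (16, 3)
matrix = features.reshape(-1, features.shape[-1])

# Gram matrix: all pairwise dot products between the flattened maps, shape (3, 3)
gram = matrix.T @ matrix

# Entry (i, j) equals the dot product of the flattened maps of filters i and j
i, j = 0, 2
direct = np.dot(features[0, :, :, i].ravel(), features[0, :, :, j].ravel())
print(gram.shape, np.isclose(gram[i, j], direct))  # (3, 3) True
```

Because every pair is covered, the Gram matrix is also symmetric: the relation between filters i and j is the same as between j and i.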
<br><br>
After computing the relations between all pairs of feature maps, we compare these relations for the generated image and the style image, and bring them closer to each other by training the generated image.

In [None]:
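def gram_matrix(tensor):
    # tensor: feature maps of shape (1, height, width, channels); assumes batch size 1
    shape = tensor.get_shape()
    num_channels = int(shape[3])
    # Flatten each channel's feature map into a column: (height*width, channels)
    matrix = tf.reshape(tensor, shape=[-1, num_channels])
    # (channels, channels) matrix of pairwise dot products, with the batch axis added back
    gram = tf.expand_dims(tf.matmul(tf.transpose(matrix), matrix), axis=0)

    return gram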


temp_input_tensor = tf.constant(np.random.randn(1,5,5,512))
gram_matrix_tensor = gram_matrix(temp_input_tensor)
print(gram_matrix_tensor.shape)

(1, 512, 512)


### Loss function for NST

Content Loss:<br>
Content loss is the mean squared difference between the feature maps of the content layer for the generated image and for the content image. The feature maps are compared directly because we want to keep the spatial information of the content image.

Style Loss:<br>
Style loss is the mean squared difference between the Gram matrices of the style-layer feature maps for the generated image and for the style image, summed over all style layers, so that the generated image gets the same texture as the style image.
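Written out (notation ours), with $x$ the generated image, $s$ the style image, $c$ the content image, $F^l(\cdot)$ the feature maps of layer $l$, $G^l(\cdot)$ their Gram matrix, $\mathcal{S}$ and $\mathcal{C}$ the sets of style and content layers, and $\operatorname{mean}$ the average over all entries, the `loss` function in the next cell computes:

$$\mathcal{L}_{\text{style}} = w_s \sum_{l \in \mathcal{S}} \operatorname{mean}\Big[\big(G^l(x) - G^l(s)\big)^2\Big], \qquad \mathcal{L}_{\text{content}} = w_c \sum_{l \in \mathcal{C}} \operatorname{mean}\Big[\big(F^l(x) - F^l(c)\big)^2\Big]$$

$$\mathcal{L} = \mathcal{L}_{\text{style}} + \mathcal{L}_{\text{content}}$$

In the `style_transfer` class further down, $w_s$ and $w_c$ are the user-supplied `style_weight` and `content_weight` divided by the number of style and content layers respectively, and the training step adds `tv_weight` times the total variation of $x$ to $\mathcal{L}$.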

In [None]:
def loss(outputs, style_features, content_features, style_weight, content_weight):
    # outputs = [style Gram matrices, content feature maps] for the generated image
    style_outputs = outputs[0]
    content_outputs = outputs[1]

    # Style loss: mean squared difference between Gram matrices, summed over layers
    style_loss = []
    for index, style_output in enumerate(style_outputs):
        layer_loss = tf.reduce_mean((style_output - style_features[index])**2)
        style_loss.append(layer_loss)

    style_loss = tf.add_n(style_loss)
    style_loss *= style_weight

    # Content loss: mean squared difference between feature maps, summed over layers
    content_loss = []
    for index, content_output in enumerate(content_outputs):
        layer_loss = tf.reduce_mean((content_output - content_features[index])**2)
        content_loss.append(layer_loss)

    content_loss = tf.add_n(content_loss)
    content_loss *= content_weight

    total_loss = style_loss + content_loss
    return total_loss

### Generate Image

We now have the loss function, the images and the feature extractor. Let's put it all together and build an NST machine.

In [None]:
class style_transfer(tf.keras.models.Model):
    
    def __init__(self,convnet,input_preprocessor,style_layers,content_layers,style_weight,content_weight):
        
        super(style_transfer, self).__init__()
        
        self.feature_extractor = self.create_feature_extractor(convnet,style_layers,content_layers)
        self.feature_extractor.trainable = False
        self.input_preprocessor_ = input_preprocessor
        self.style_layers = style_layers
        self.content_layers = content_layers
        self.num_style_layers = len(style_layers)
        self.style_weight = style_weight/self.num_style_layers
        self.content_weight = content_weight/len(content_layers)
        

    def call(self,inputs):
        
        inputs = inputs*255
        preprocessed_inputs = self.input_preprocessor_(inputs)
        
        outputs = self.feature_extractor(preprocessed_inputs)
        style_outputs = outputs[:self.num_style_layers]
        content_outputs = outputs[self.num_style_layers:]
        
        style_outputs = [self.gram_matrix(style_output) for style_output in style_outputs]
        
        return [style_outputs,content_outputs]
    
    
    @staticmethod
    def create_feature_extractor(convnet,style_layers,content_layers):
        layers = style_layers+content_layers
        outputs = [convnet.get_layer(layer_name).output for layer_name in layers]
        model = tf.keras.models.Model(convnet.inputs,outputs)
        
        return model
    
    @staticmethod
    def gram_matrix(tensor):
        shape = tensor.get_shape()
        num_channels = int(shape[3])
        matrix = tf.reshape(tensor, shape=[-1, num_channels])
        gram = tf.expand_dims(tf.matmul(tf.transpose(matrix), matrix),axis=0)

        return gram
    
    
    @staticmethod
    def loss(outputs,style_features,content_features,style_weight,content_weight):
        style_outputs = outputs[0]
        content_outputs = outputs[1]
        
        style_loss = []
        for index,style_output in enumerate(style_outputs):
            loss = tf.reduce_mean((style_output - style_features[index])**2)
            style_loss.append(loss)
        
        style_loss = tf.add_n(style_loss)
        style_loss *= style_weight
        
        
        content_loss = []
        for index,content_output in enumerate(content_outputs):
            loss = tf.reduce_mean((content_output - content_features[index])**2)
            content_loss.append(loss)
        
        content_loss = tf.add_n(content_loss)
        content_loss *= content_weight
        
        loss = style_loss + content_loss
        return loss
    
    
    @staticmethod
    def clip_pixels(image):
        clipped_image = tf.clip_by_value(image,clip_value_min = 0.0,clip_value_max = 1.0)
        return clipped_image  

A function to convert a TensorFlow tensor to a PIL Image

In [None]:
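def tensor_to_image(tensor):
    # Scale [0, 1] floats back to [0, 255] and convert to 8-bit integers
    tensor = tensor * 255
    tensor = np.array(tensor, dtype=np.uint8)

    # Drop the batch dimension: (1, height, width, 3) -> (height, width, 3)
    if np.ndim(tensor) > 3:
        tensor = tensor[0]

    return Image.fromarray(tensor)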

Define the content layers and style layers.

We also define style_weight and content_weight, which control how strongly the style and the content, respectively, are reflected in the generated image.

There is one more weight here, tv_weight (total variation weight). It is applied to the total variation loss, which measures how much neighboring pixel values differ, i.e. the amount of high-frequency noise in an image. Reducing this loss suppresses the noise and gives a smoother image.
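A minimal NumPy sketch of the quantity being penalized, written to follow the definition documented for `tf.image.total_variation` (the sum of absolute differences between vertically and horizontally neighboring pixel values). The helper name `total_variation` and the two test images are hypothetical.

```python
import numpy as np

def total_variation(image):
    """Total variation of an (H, W, C) image: the sum of absolute differences
    between vertically and horizontally adjacent pixel values."""
    vertical = np.abs(image[1:, :, :] - image[:-1, :, :]).sum()
    horizontal = np.abs(image[:, 1:, :] - image[:, :-1, :]).sum()
    return vertical + horizontal

# A constant image has no variation at all
flat = np.full((4, 4, 3), 0.5)

# Alternating black/white columns: every horizontal neighbor pair differs by 1
stripes = np.zeros((4, 4, 1))
stripes[:, 1::2] = 1.0

print(total_variation(flat), total_variation(stripes))  # 0.0 12.0
```

A noisy image behaves like the striped one (many large neighbor differences), so adding `tv_weight` times this value to the loss pushes the optimizer toward smoother pixel values.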

We define the number of epochs and steps_per_epoch (the number of times we update the pixel values of the generated image in each epoch).

In [None]:
content_layers = ['block4_conv2']
style_layers = ['block1_conv1','block2_conv1','block3_conv1','block4_conv1','block5_conv1']

style_weight = 10.0
content_weight = 1e7
tv_weight = 20.0

epochs = 15
steps_per_epoch = 100

We create folders for the content, style and generated images in our Drive.

In [None]:
if not os.path.exists('drive/MyDrive/neural_style_transfer'):
  os.mkdir('drive/MyDrive/neural_style_transfer')

if not os.path.exists('drive/MyDrive/neural_style_transfer/content'):
  os.mkdir('drive/MyDrive/neural_style_transfer/content')

if not os.path.exists('drive/MyDrive/neural_style_transfer/style'):
  os.mkdir('drive/MyDrive/neural_style_transfer/style')

if not os.path.exists('drive/MyDrive/neural_style_transfer/outputs'):
  os.mkdir('drive/MyDrive/neural_style_transfer/outputs')

if not os.path.exists('drive/MyDrive/neural_style_transfer/outputs/intermediate'):
  os.mkdir('drive/MyDrive/neural_style_transfer/outputs/intermediate')

if not os.path.exists('drive/MyDrive/neural_style_transfer/outputs/final'):
  os.mkdir('drive/MyDrive/neural_style_transfer/outputs/final')

Define the paths to the content and style images and to the output folders for the generated image

In [None]:
content_image_path = 'drive/MyDrive/neural_style_transfer/content/Einstein.jpg' #Einstein.jpg
style_image_path = 'drive/MyDrive/neural_style_transfer/style/la_muse.jpg' #la_muse.jpg

final_output_path = 'drive/MyDrive/neural_style_transfer/outputs/final/'
intermediate_output_path = 'drive/MyDrive/neural_style_transfer/outputs/intermediate/'

Load the content and style images

In [None]:
content_image = load_image(content_image_path,512)
style_image = load_image(style_image_path,512)

In [None]:
print('Content_image size:',content_image.shape)
print('Style_image size:',style_image.shape)

Content_image size: (1, 389, 320, 3)
Style_image size: (1, 512, 512, 3)


We load a pre-trained convnet (VGG19) for feature extraction and create an instance of the NST machine.

We also create the optimizer used to train the generated image.

In [None]:
convnet = VGG19(include_top=False,weights='imagenet')

nst = style_transfer(convnet,preprocess_input,style_layers,content_layers,style_weight,content_weight)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01,beta_1=0.99,epsilon=1e-1)

We compute the target features: the feature maps of the content layer for the content image, and the Gram matrices of the style-layer feature maps for the style image.

In [None]:
style_features = nst(style_image)[0]
content_features = nst(content_image)[1]

We make a copy of the content image as the generated image and define it as a TensorFlow variable so that we can train it.

In [None]:
generated_image = tf.Variable(content_image)

We build the output names from the names of the content and style images, and create a folder for the intermediate outputs of this content/style pair.

In [None]:
style_image_name = style_image_path.split('/')[-1].split('.')[0]
content_image_name = content_image_path.split('/')[-1].split('.')[0]

if not os.path.exists(intermediate_output_path+content_image_name+'_'+style_image_name):
  os.mkdir(intermediate_output_path+content_image_name+'_'+style_image_name)
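An alternative way to derive these names, sketched with the standard `os.path` functions; the helper `image_name` is hypothetical. Unlike `split('.')[0]`, `os.path.splitext` strips only the last extension, so file names containing extra dots are kept intact, and `os.path.join` inserts the path separators.

```python
import os

def image_name(path):
    """Return the file name without its directory and final extension."""
    return os.path.splitext(os.path.basename(path))[0]

style_name = image_name('drive/MyDrive/neural_style_transfer/style/la_muse.jpg')
content_name = image_name('drive/MyDrive/neural_style_transfer/content/Einstein.jpg')
print(style_name, content_name)  # la_muse Einstein

# split('.')[0] would return 'my' here; splitext keeps 'my.holiday'
print(image_name('photos/my.holiday.jpg'))  # my.holiday

# os.path.join adds the separators between path components
print(os.path.join('outputs', 'intermediate', content_name + '_' + style_name))
```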

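We define a function 'train_step'. In this function, we compute the feature maps of the content layer and the Gram matrices of the style-layer feature maps for the generated image, calculate the style loss and content loss (plus the weighted total variation loss), compute the gradients of the loss with respect to the generated image, and apply them to it. After each update, the pixel values are clipped to [0, 1].

We then call this function epochs × steps_per_epoch times (15 × 100 = 1500 updates here), saving an intermediate image after every epoch, and we have the final image of NST!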

In [None]:
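# Note: style_weight and content_weight were already passed to nst when it was
# created, so reassigning them here would not change nst.style_weight or
# nst.content_weight; re-create nst to train with different weights.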


@tf.function
def train_step(image,nst,optimizer,style_features,content_features,tv_weight):
    
    with tf.GradientTape() as tape:
        outputs = nst(image)
        loss = nst.loss(outputs,style_features,content_features,nst.style_weight,nst.content_weight)
        loss += tv_weight * tf.image.total_variation(image)
    
    gradients = tape.gradient(loss,image)
    optimizer.apply_gradients([(gradients,image)])
    image.assign(nst.clip_pixels(image))


generated_image = tf.Variable(content_image)


for epoch in range(epochs):
  for step in range(steps_per_epoch):

    train_step(generated_image,nst,optimizer,style_features,content_features,tv_weight)
  print(f'\nEpoch: {epoch+1}\nTotal Steps: {(epoch+1)*steps_per_epoch}')
  tensor_to_image(generated_image).save(intermediate_output_path+content_image_name+'_'+style_image_name+'/'+str(epoch+1)+'.jpg')

print('\nSaving Final Image:')
tensor_to_image(generated_image).save(final_output_path+content_image_name+'_'+style_image_name+'.jpg')
print('Final Image Saved')


Epoch: 1
Total Steps: 100

Epoch: 2
Total Steps: 200

Epoch: 3
Total Steps: 300

Epoch: 4
Total Steps: 400

Epoch: 5
Total Steps: 500

Epoch: 6
Total Steps: 600

Epoch: 7
Total Steps: 700

Epoch: 8
Total Steps: 800

Epoch: 9
Total Steps: 900

Epoch: 10
Total Steps: 1000

Epoch: 11
Total Steps: 1100

Epoch: 12
Total Steps: 1200

Epoch: 13
Total Steps: 1300

Epoch: 14
Total Steps: 1400

Epoch: 15
Total Steps: 1500

Saving Final Image:
Final Image Saved
