<a href="https://colab.research.google.com/github/Simodiri/Vision-Perception/blob/main/Fast_Style_Transfer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Importing modules
Import all the necessary modules:

*  `numpy`: working with arrays
*  `tensorflow`: tensor operations
*  `tensorflow.keras`: creating neural networks
*  `pillow`: converting an image to a numpy array and vice versa
*  `time`: measuring the time of each iteration
*  `matplotlib`: displaying images and graphs in the notebook
*  `requests`, `base64`, `io`: downloading and loading images from a URL
*  `os`, `pathlib`: operating-system-level commands and file paths


In [None]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import vgg19
from tensorflow.keras.models import load_model,Model
from PIL import Image
import time
import matplotlib.pyplot as plt
import matplotlib
import requests
import base64
import os
from pathlib import Path
from io import BytesIO
matplotlib.rcParams['figure.figsize'] = (12,12)
matplotlib.rcParams['axes.grid'] = False

#Define utility functions

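- *load_image* loads an image from a file path and converts it into a numpy array. If `dim` is given, the image is either resized to exactly that size (`resize=True`) or shrunk to fit within it while keeping its aspect ratio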

In [None]:
def load_image(image_path, dim=None, resize=False):
    img= Image.open(image_path)
    if dim:
        if resize:
            img=img.resize(dim)
        else:
            img.thumbnail(dim)
    img= img.convert("RGB")
    return np.array(img)

- *load_url_image* loads an image from a URL and converts it into a numpy array

In [None]:
def load_url_image(url,dim=None,resize=False):
    img_request=requests.get(url)
    img= Image.open(BytesIO(img_request.content))
    if dim:
        if resize:
            img=img.resize(dim)
        else:
            img.thumbnail(dim)
    img= img.convert("RGB")
    return np.array(img)

- *array_to_img* converts an array to an image

In [None]:
def array_to_img(array):
    array=np.array(array,dtype=np.uint8)
    if np.ndim(array)>3:
        assert array.shape[0]==1
        array=array[0]
    return Image.fromarray(array)
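
One detail worth keeping in mind with `array_to_img`: casting floats directly to `uint8` does not saturate out-of-range values. The sketch below is an assumption about how one might guard against this, not part of the original notebook; it clips and rounds before casting, and `to_uint8_safe` is a hypothetical helper name.

```python
import numpy as np

def to_uint8_safe(array):
    """Clip to the valid pixel range, round, then cast to uint8 (hypothetical helper)."""
    array = np.asarray(array, dtype=np.float32)
    return np.clip(np.rint(array), 0, 255).astype(np.uint8)

# Values below 0 and above 255 are saturated instead of being cast as they are
pixels = np.array([[-12.4, 0.0, 127.6, 255.0, 300.2]])
safe = to_uint8_safe(pixels)
print(safe)
```

The result could then be passed to `Image.fromarray` exactly as `array_to_img` does.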

- *show_image* plots a single image

In [None]:
def show_image(image,title=None):
    if len(image.shape)>3:
        image=tf.squeeze(image,axis=0)
    plt.imshow(image)
    if title:
        plt.title(title)

- *plot_images_grid* plots a batch of images in a grid

In [None]:
def plot_images_grid(images,num_rows=1):
    n=len(images)
    if n > 1:
        num_cols=np.ceil(n/num_rows)
        fig,axes=plt.subplots(ncols=int(num_cols),nrows=int(num_rows))
        axes=axes.flatten()
        fig.set_size_inches((15,15))
        for i,image in enumerate(images):
            axes[i].imshow(image)
    else:
        plt.figure(figsize=(10,10))
        plt.imshow(images[0])

#Create the fast style transfer
The training model is an encoder-decoder architecture with residual layers. Its output has the same size as the input and is the generated image. The model is trained with a loss called **perceptual loss**, which compares two similar-looking images and measures the content and style discrepancies between them.
Training the model requires a dataset of varied images. A common choice is the **COCO dataset**, which contains 328k images; this code uses a **Kaggle challenge dataset** of landscape images. A style image is also needed, so that the autoencoder can learn its style.

The image below describes how fast style transfer works: \
**INSERT THE IMAGE** (change the network name in the image) \
To train the model, batches of input training images are fed into the autoencoder, whose output is the *styled image*. During training, the output batches are passed into the loss model (in this case VGG19) and features are extracted from different layers. These features are used to compute the style loss and the content loss, whose sum gives the perceptual loss mentioned above that trains the network.

The main highlights of the network:

* Residual layers
* Encoder-decoder model
* The output of the decoder is passed to the loss model (VGG) to calculate the loss
* Training is compute-intensive, since the images pass through two networks at every step



#Compute Loss
A pretrained model, here **VGG19**, calculates the style loss and the content loss (check whether to change it).

In [None]:
vgg=vgg19.VGG19(weights='imagenet',include_top=False)
vgg.summary()

Then define the layers:

In [None]:
content_layers=['block4_conv2']

style_layers=['block1_conv1',
            'block2_conv1',
            'block3_conv1',
            'block4_conv1',
            'block5_conv1']

Let's define a class that builds the loss model, with some additional methods for accessing feature maps from the network.

In [None]:
class LossModel:
  def __init__(self,pretrained_model,content_layers,style_layers):
    self.model=pretrained_model
    self.content_layers=content_layers
    self.style_layers=style_layers
    self.loss_model=self.get_model()

  def get_model(self):
    self.model.trainable=False
    layer_names=self.style_layers+self.content_layers
    outputs=[self.model.get_layer(name).output for name in layer_names]
    new_model=Model(inputs=self.model.input,outputs=outputs)
    return new_model

  def get_activations(self,inputs):
    inputs=inputs*255.0
    style_length=len(self.style_layers)
    outputs=self.loss_model(vgg19.preprocess_input(inputs))
    style_output,content_output=outputs[:style_length],outputs[style_length:]
    content_dict={name:value for name,value in zip(self.content_layers,content_output)}
    style_dict={name:value for name,value in zip(self.style_layers,style_output)}
    return {'content':content_dict,'style':style_dict}
    

Now create the loss model using the class defined before:

In [None]:
loss_model = LossModel(vgg, content_layers, style_layers)
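
`get_activations` rescales its input to the [0, 255] range and then calls `vgg19.preprocess_input`. As a rough NumPy sketch of what that preprocessing does in its default mode (an RGB to BGR channel flip, then subtraction of the ImageNet per-channel means, with no scaling), consider the following. The name `vgg_preprocess_sketch` is hypothetical, and the real Keras function should be used in practice.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by the default VGG preprocessing
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg_preprocess_sketch(images_rgb):
    """Approximate VGG19 preprocessing for a float batch in [0, 255], shape (B, H, W, 3)."""
    images_bgr = images_rgb[..., ::-1].astype(np.float32)  # RGB -> BGR
    return images_bgr - IMAGENET_BGR_MEANS                  # zero-center each channel

# A single pixel with R=10, G=20, B=30
batch = np.array([10.0, 20.0, 30.0], dtype=np.float32).reshape(1, 1, 1, 3)
out = vgg_preprocess_sketch(batch)
print(out[0, 0, 0])
```

This is why the loss model expects pixel values in [0, 255] rather than [0, 1]: the subtracted means are expressed on that scale.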

Two functions compute the two types of loss (content and style), with `gram_matrix` as a helper for the style loss:

In [None]:
def content_loss(placeholder,content,weight):
    assert placeholder.shape == content.shape
    return weight*tf.reduce_mean(tf.square(placeholder-content))

def gram_matrix(x):
    gram=tf.linalg.einsum('bijc,bijd->bcd', x, x)
    return gram/tf.cast(x.shape[1]*x.shape[2]*x.shape[3],tf.float32)

def style_loss(placeholder,style, weight):
    assert placeholder.shape == style.shape
    s=gram_matrix(style)
    p=gram_matrix(placeholder)
    return weight*tf.reduce_mean(tf.square(s-p))
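
To make the Gram matrix concrete, here is a NumPy sketch of the same `einsum` contraction (NumPy stands in for TensorFlow, and `gram_matrix_np` is a hypothetical name). It checks that the result equals the product of the flattened feature map with its own transpose, normalized by height × width × channels as in `gram_matrix`.

```python
import numpy as np

def gram_matrix_np(x):
    """Gram matrix of a feature batch with shape (B, H, W, C); returns (B, C, C)."""
    gram = np.einsum('bijc,bijd->bcd', x, x)
    return gram / (x.shape[1] * x.shape[2] * x.shape[3])

rng = np.random.default_rng(0)
features = rng.normal(size=(2, 4, 5, 3)).astype(np.float32)
gram = gram_matrix_np(features)

# Equivalent formulation: flatten the spatial grid and multiply F^T by F
flat = features.reshape(2, 4 * 5, 3)
reference = np.matmul(flat.transpose(0, 2, 1), flat) / (4 * 5 * 3)
print(gram.shape, np.allclose(gram, reference, atol=1e-5))
```

Each entry of the Gram matrix measures how strongly two channels fire together across the whole image, which is why it captures style while discarding spatial layout.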

The **perceptual loss** is computed as a weighted sum of these losses:

In [None]:
def preceptual_loss(predicted_activations,content_activations,
                    style_activations,content_weight,style_weight,
                    content_layers_weights,style_layer_weights):
    pred_content = predicted_activations["content"]
    pred_style = predicted_activations["style"]
    c_loss = tf.add_n([content_loss(pred_content[name],content_activations[name],
                                  content_layers_weights[i]) for i,name in enumerate(pred_content.keys())])
    c_loss = c_loss*content_weight
    s_loss = tf.add_n([style_loss(pred_style[name],style_activations[name],
                                style_layer_weights[i]) for i,name in enumerate(pred_style.keys())])
    s_loss = s_loss*style_weight
    return c_loss+s_loss
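
As a worked example of the weighting, the sketch below reproduces the arithmetic with plain NumPy arrays in place of VGG activations. All layer names, values and weights are made up for illustration; they are not the settings used in this notebook.

```python
import numpy as np

def mse(a, b):
    """Mean squared difference between two arrays."""
    return float(np.mean(np.square(a - b)))

def gram(x):
    """Gram matrix of a (B, H, W, C) batch, normalized by H*W*C."""
    g = np.einsum('bijc,bijd->bcd', x, x)
    return g / (x.shape[1] * x.shape[2] * x.shape[3])

# Toy "activations": one content layer and two style layers (illustrative values)
pred = {'content': {'c1': np.ones((1, 2, 2, 2))},
        'style': {'s1': np.ones((1, 2, 2, 2)), 's2': np.full((1, 2, 2, 2), 2.0)}}
content_target = {'c1': np.zeros((1, 2, 2, 2))}
style_target = {'s1': np.ones((1, 2, 2, 2)), 's2': np.ones((1, 2, 2, 2))}

content_weight, style_weight = 10.0, 100.0
content_layer_weights, style_layer_weights = [1.0], [0.5, 0.5]

# Per-layer losses are weighted and summed, then scaled by the global weight
c_loss = content_weight * sum(
    w * mse(pred['content'][n], content_target[n])
    for w, n in zip(content_layer_weights, pred['content']))
s_loss = style_weight * sum(
    w * mse(gram(pred['style'][n]), gram(style_target[n]))
    for w, n in zip(style_layer_weights, pred['style']))
total = c_loss + s_loss
print(c_loss, s_loss, total)
```

The per-layer weights balance layers against each other, while `content_weight` and `style_weight` set the overall trade-off between preserving content and matching style.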

#Creating the autoencoder
To create the autoencoder, the following building blocks are needed:
* `ReflectionPadding2D`: applies reflection padding to images in convolutional networks
* `InstanceNormalization`: normalizes each channel of each image over its spatial dimensions
* `ConvLayer`: combines reflection padding, a convolution and instance normalization
* `ResidualLayer`: residual layer with two `ConvLayer` blocks
* `UpsampleLayer`: upsamples the bottleneck representation in the autoencoder (it acts as a deconvolution).

In [None]:
class ReflectionPadding2D(tf.keras.layers.Layer):
    def __init__(self, padding=(1, 1), **kwargs):
        super(ReflectionPadding2D, self).__init__(**kwargs)
        self.padding = tuple(padding)
    def call(self, input_tensor):
        padding_width, padding_height = self.padding
        return tf.pad(input_tensor, [[0,0], [padding_height, padding_height], 
                                     [padding_width, padding_width], [0,0] ], 'REFLECT')
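
To illustrate what reflection padding does, the sketch below uses `np.pad` with `mode='reflect'`, which mirrors values around the border without repeating the edge pixel, as `tf.pad` does in `'REFLECT'` mode. The tiny 3×3 input is made up for illustration.

```python
import numpy as np

image = np.arange(1, 10).reshape(1, 3, 3, 1)  # (batch, height, width, channels)
pad = 3 // 2                                   # padding for a 3x3 kernel

# Pad only the two spatial axes, mirroring values without repeating the border
padded = np.pad(image, [(0, 0), (pad, pad), (pad, pad), (0, 0)], mode='reflect')
print(padded[0, :, :, 0])

# A 'valid' k x k convolution with stride 1 shrinks each side by k - 1,
# so the padded input yields an output of the original size.
out_size = padded.shape[1] - 3 + 1
print(out_size)
```

Padding by `kernel_size // 2` before an unpadded convolution therefore keeps the spatial size unchanged at stride 1, without the dark borders that zero padding can introduce.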

In [None]:
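class InstanceNormalization(tf.keras.layers.Layer):
    def __init__(self,**kwargs):
        super(InstanceNormalization, self).__init__(**kwargs)
    def build(self,input_shape):
        # Create the per-channel parameters once, so that Keras tracks and trains them
        channels = input_shape[-1]
        self.shift = self.add_weight(name='shift', shape=(channels,), initializer='zeros', trainable=True)
        self.scale = self.add_weight(name='scale', shape=(channels,), initializer='ones', trainable=True)
    def call(self,inputs):
        # Normalize each channel of each image over its spatial dimensions
        mu, var = tf.nn.moments(inputs, [1,2], keepdims=True)
        epsilon = 1e-3
        normalized = (inputs-mu)/tf.sqrt(var + epsilon)
        return self.scale * normalized + self.shift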
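
The normalization step can be checked in isolation. This NumPy sketch (with the hypothetical name `instance_norm_np`, and without the learnable scale and shift) computes the statistics over height and width separately for every image and channel, then confirms that each channel ends up with roughly zero mean and unit standard deviation.

```python
import numpy as np

def instance_norm_np(x, epsilon=1e-3):
    """Normalize each channel of each image over height and width; x has shape (B, H, W, C)."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mu) / np.sqrt(var + epsilon)

rng = np.random.default_rng(0)
# Two images whose channels have very different offsets and scales
x = rng.normal(loc=[0.0, 50.0, -20.0], scale=[1.0, 10.0, 5.0], size=(2, 8, 8, 3))
y = instance_norm_np(x)
print(y.mean(axis=(1, 2)))  # close to 0 for every image and channel
print(y.std(axis=(1, 2)))   # close to 1 for every image and channel
```

Unlike batch normalization, the statistics never mix different images, so the result for one image does not depend on what else is in the batch.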

In [None]:
class ConvLayer(tf.keras.layers.Layer):
    def __init__(self,filters,kernel_size,strides=1,**kwargs):
        super(ConvLayer,self).__init__(**kwargs)
        self.padding=ReflectionPadding2D([k//2 for k in kernel_size])
        self.conv2d=tf.keras.layers.Conv2D(filters,kernel_size,strides)
        self.bn=InstanceNormalization()
    def call(self,inputs):
        x=self.padding(inputs)
        x=self.conv2d(x)
        x=self.bn(x)
        return x

In [None]:
class ResidualLayer(tf.keras.layers.Layer):
    def __init__(self,filters,kernel_size,**kwargs):
        super(ResidualLayer,self).__init__(**kwargs)
        self.conv2d_1=ConvLayer(filters,kernel_size)
        self.conv2d_2=ConvLayer(filters,kernel_size)
        self.relu=tf.keras.layers.ReLU()
        self.add=tf.keras.layers.Add()
    def call(self,inputs):
        residual=inputs
        x=self.conv2d_1(inputs)
        x=self.relu(x)
        x=self.conv2d_2(x)
        x=self.add([x,residual])
        return x

In [None]:
class UpsampleLayer(tf.keras.layers.Layer):
    def __init__(self,filters,kernel_size,strides=1,upsample=2,**kwargs):
        super(UpsampleLayer,self).__init__(**kwargs)
        self.upsample=tf.keras.layers.UpSampling2D(size=upsample)
        self.padding=ReflectionPadding2D([k//2 for k in kernel_size])
        self.conv2d=tf.keras.layers.Conv2D(filters,kernel_size,strides)
        self.bn=InstanceNormalization()
    def call(self,inputs):
        x=self.upsample(inputs)
        x=self.padding(x)
        x=self.conv2d(x)
        return self.bn(x)
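
`UpSampling2D` uses nearest-neighbour interpolation by default, which simply repeats rows and columns. A minimal NumPy sketch of that step (`upsample_nearest_np` is a hypothetical name) is:

```python
import numpy as np

def upsample_nearest_np(x, factor=2):
    """Nearest-neighbour upsampling of a (B, H, W, C) batch along height and width."""
    x = np.repeat(x, factor, axis=1)     # repeat rows
    return np.repeat(x, factor, axis=2)  # repeat columns

x = np.array([[1, 2], [3, 4]]).reshape(1, 2, 2, 1)
up = upsample_nearest_np(x)
print(up[0, :, :, 0])
```

In `UpsampleLayer` the repeated pixels are then smoothed by the padded convolution that follows, which is how the layer stands in for a transposed convolution.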

With these classes in place, the autoencoder has this architecture:
* 3 ConvLayer
* 5 ResidualLayer
* 2 UpsampleLayer, followed by a final ConvLayer that maps back to 3 channels

In [None]:
class StyleTransferModel(tf.keras.Model):
    def __init__(self,**kwargs):
        super(StyleTransferModel, self).__init__(name='StyleTransferModel',**kwargs)
        self.conv2d_1= ConvLayer(filters=32,kernel_size=(9,9),strides=1,name="conv2d_1_32") #first three conv layers, doubling the filters at each step
        self.conv2d_2= ConvLayer(filters=64,kernel_size=(3,3),strides=2,name="conv2d_2_64")
        self.conv2d_3= ConvLayer(filters=128,kernel_size=(3,3),strides=2,name="conv2d_3_128")
        self.res_1=ResidualLayer(filters=128,kernel_size=(3,3),name="res_1_128")
        self.res_2=ResidualLayer(filters=128,kernel_size=(3,3),name="res_2_128")
        self.res_3=ResidualLayer(filters=128,kernel_size=(3,3),name="res_3_128")
        self.res_4=ResidualLayer(filters=128,kernel_size=(3,3),name="res_4_128")
        self.res_5=ResidualLayer(filters=128,kernel_size=(3,3),name="res_5_128")
        self.deconv2d_1= UpsampleLayer(filters=64,kernel_size=(3,3),name="deconv2d_1_64")
        self.deconv2d_2= UpsampleLayer(filters=32,kernel_size=(3,3),name="deconv2d_2_32")
        self.deconv2d_3= ConvLayer(filters=3,kernel_size=(9,9),strides=1,name="deconv2d_3_3")
        self.relu=tf.keras.layers.ReLU()
    def call(self, inputs):
        x=self.conv2d_1(inputs)
        x=self.relu(x) #use relu as activation function
        x=self.conv2d_2(x)
        x=self.relu(x)
        x=self.conv2d_3(x)
        x=self.relu(x)
        x=self.res_1(x)
        x=self.res_2(x)
        x=self.res_3(x)
        x=self.res_4(x)
        x=self.res_5(x)
        x=self.deconv2d_1(x)
        x=self.relu(x)
        x=self.deconv2d_2(x)
        x=self.relu(x)
        x=self.deconv2d_3(x)
        x = (tf.nn.tanh(x) + 1) * (255.0 / 2)
        return x
    
    ## used to print shapes of each layer to check if input shape == output shape
   
    def print_shape(self,inputs):
        print(inputs.shape)
        x=self.conv2d_1(inputs)
        print(x.shape)
        x=self.relu(x)
        x=self.conv2d_2(x)
        print(x.shape)
        x=self.relu(x)
        x=self.conv2d_3(x)
        print(x.shape)
        x=self.relu(x)
        x=self.res_1(x)
        print(x.shape)
        x=self.res_2(x)
        print(x.shape)
        x=self.res_3(x)
        print(x.shape)
        x=self.res_4(x)
        print(x.shape)
        x=self.res_5(x)
        print(x.shape)
        x=self.deconv2d_1(x)
        print(x.shape)
        x=self.relu(x)
        x=self.deconv2d_2(x)
        print(x.shape)
        x=self.relu(x)
        x=self.deconv2d_3(x)
        print(x.shape)

Define the input shape and batch size:

In [None]:
input_shape=(256,256,3)
batch_size=4

Create the style model using the `StyleTransferModel` class defined above:

In [None]:
style_model = StyleTransferModel()

style_model.print_shape(tf.zeros(shape=(1,*input_shape))) #check input shape and output shape
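
The shapes printed by `print_shape` can also be predicted by hand. The sketch below traces only the spatial size through the network, assuming (as in the layers above) reflection padding of `kernel_size // 2` followed by an unpadded convolution; `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel, stride):
    """Spatial size after reflection padding of kernel//2 and a 'valid' convolution."""
    padded = size + 2 * (kernel // 2)
    return (padded - kernel) // stride + 1

size = 256
trace = [size]
for kernel, stride in [(9, 1), (3, 2), (3, 2)]:  # the three encoder ConvLayers
    size = conv_out(size, kernel, stride)
    trace.append(size)
# The residual layers use 3x3 kernels with stride 1, so they keep the size unchanged
for _ in range(2):                                # the two UpsampleLayers
    size = conv_out(size * 2, 3, 1)
    trace.append(size)
size = conv_out(size, 9, 1)                       # final 9x9 ConvLayer
trace.append(size)
print(trace)
```

The two stride-2 convolutions divide the size by four and the two upsampling layers multiply it back, so a 256×256 input should come out as a 256×256 image.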