# Second Part: Experiment Results
* In this notebook we analyse the reconstructions obtained with the trained Decoders.  
* Although the backbone Encoder was common to all the networks, each Decoder reconstructed the image in its own way.  
* Unfortunately, only 20,000 image pairs were used during training, which is probably why the images tend to be very noisy.  
Here are some examples, with a comparison against the [original paper results](https://arxiv.org/pdf/1703.06868.pdf).

![image](https://drive.google.com/uc?export=view&id=12mm2E-tN8I-vnjCoXNnPf-eErxX3ETUm)  
  
* The first thing we notice is the lower quality of our results, even though the same network as in the original research was used.  
Huang and Belongie used the same fixed Encoder, so the difference in dataset size alone appears to have been critical for performance.
* ResNet34 is considerably harder to optimize than VGG19, which is probably why we see more noise, especially in the no-residual case, where the quality is probably the lowest.
* The advantage of ResNet34 (with residuals) is that it can reconstruct a more detailed image thanks to the better gradient flow provided by the residual blocks. We can infer this from a simple comparison with its counterpart without residual blocks: as the training results show, that network optimized more slowly and reached a worse loss, and it could not achieve comparable results.

* A possible explanation for the results of ResNet34 (w/o residuals) is the lack of good gradient flow (as the comparison with ResNet34 suggests), together with the shortage of data, which affected the performance of the other Decoders as well.



#### The following cells are the same as in the first notebook; they are needed to set up the networks.

In [None]:
# from google.colab import drive
# drive.mount('/content/drive')

In [3]:
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import torch.nn.functional as F
import plotly.graph_objects as go
from torch.optim import lr_scheduler
from PIL import ImageFile
from architectures import *
from utils import *
import ipywidgets as widgets
ImageFile.LOAD_TRUNCATED_IMAGES = True
Image.MAX_IMAGE_PIXELS = 100000000000 


In [4]:
import IPython
js_code = '''
function ClickConnect(){
console.log("Working");
document.querySelector("colab-toolbar-button#connect").click()
}
setInterval(ClickConnect,60000)
'''
display(IPython.display.Javascript(js_code))

<IPython.core.display.Javascript object>

In [None]:
!nvidia-smi -L

In [None]:
!pip3 install pytorch-lightning==1.5.10
import pytorch_lightning as pl

SEED = 2005

pl.seed_everything(SEED)

# Preprocessing of test data

As anticipated earlier, we can use images of any size, because the main architecture is fully convolutional: there are no feed-forward layers with a fixed input size.  
* Six different paths are defined (three for the content images and three for the style images) so that different sets of images can be displayed.
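
The size independence mentioned above can be illustrated with a minimal sketch. It assumes a single-channel image and one 3x3 kernel, and `conv2d_valid` is a hypothetical helper written only for this illustration (it is not part of the repository, and NumPy is used so that the sketch runs without the notebook's modules). The same nine weights are applied to inputs of different sizes, and only the output size changes.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2D 'valid' cross-correlation of a single-channel image (illustrative only)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Slide the same kernel over every position of the input
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3))  # the only parameters; their count does not depend on the input

small = conv2d_valid(rng.normal(size=(8, 8)), kernel)
large = conv2d_valid(rng.normal(size=(12, 20)), kernel)
print(small.shape, large.shape)
```

A fully connected layer, by contrast, would need a weight matrix whose shape is tied to one specific input size. The `Resize((512, 512))` in the next cell is therefore a convenience for display rather than a requirement of the architecture.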

In [7]:
transform = transforms.Compose([transforms.Resize((512, 512)),
                               transforms.ToTensor()])

pers_style1 = torchvision.datasets.ImageFolder(root = "./paths/style1", transform = transform)
pers_style2 = torchvision.datasets.ImageFolder(root = "./paths/style2", transform = transform)
pers_style3 = torchvision.datasets.ImageFolder(root = "./paths/style3", transform = transform)

pers_content1 = torchvision.datasets.ImageFolder(root = "./paths/content1", transform = transform)
pers_content2 = torchvision.datasets.ImageFolder(root = "./paths/content2", transform = transform)
pers_content3 = torchvision.datasets.ImageFolder(root = "./paths/content3", transform = transform)



In [8]:
# Here we define the architecture of the AdaIN network, using the Encoder and Decoder defined in the architectures.py file in the repository.

class Neural_style_network(pl.LightningModule):
  def __init__(self, lr, dec_path, alpha, num_epochs, device, first_train = False, net = 'vgg', residuals = True):
    super(Neural_style_network,self).__init__()
    
    self.enc = Encoder(device) # The Encoder is common to all the networks.
    self.path = dec_path       # Path where the model was saved, or where it is about to be saved.
    if net == 'vgg':
      self.dec = Decoder()     # The Decoder is the mirror of the Encoder
    elif net == 'res':
      self.dec = DecodedRes(residuals) # The ResNet Decoder is defined as a separate class, and it also changes depending on whether residuals are used.

    self.first = first_train           # We need to specify if it is the first trial with a network or not
    self.lr = lr
    self.alpha = alpha                 # alpha is the style weight to apply in the total loss function.
    self.epochs = num_epochs           # These are the number of epochs
    self.loss_list = []
    self.optimizer = torch.optim.Adam(self.dec.parameters(), lr=self.lr)
    self.scheduler = lr_scheduler.CosineAnnealingLR(self.optimizer, T_max=self.epochs, eta_min= 0, last_epoch= -1, verbose=True)
    
    if not first_train:
      self.checkpoint = load_model(self.path, self, device)
      self.scheduler.load_state_dict(self.checkpoint['scheduler'])

    
  
  def forward(self, content_image, style_image, test = None):  #This method implements the architecture as sketched in the structure.
    enc_image = self.enc.forward(content_image)
    enc_style = self.enc.forward(style_image, lista = True)  # The list that we give to the encoder means that we want to retrieve from the layers of the networks the levels associated to relu_1_1, relu_2_1, relu_3_1, relu_4_1.

    adapted_image = self.AdaIn(enc_image, enc_style[-1])
    
    if test is not None: # At test time we want to return the decoded image
      
      decoded_adapt = (1-test)*enc_image + test*adapted_image # Blend content and stylized features to control how much style is applied to the decoded image.
      
      return self.dec(decoded_adapt)
    
    decoded_adapt =  self.dec(adapted_image)
    renc = self.enc.forward(decoded_adapt, lista = True)

    content_loss = self.Content_loss(renc[-1], adapted_image)
    style_loss = self.Style_loss(renc, enc_style)  

    return self.total_loss(content_loss, style_loss)
  
  def training_step(self, batch, batch_idx):
    content, style = batch
    loss = self.forward(content, style)
    self.log("train_loss", loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)
    
    return loss

  def validation_step(self, batch, batch_idx):
    content, style = batch
    loss = self.forward(content, style)
    self.log("test_loss", loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)
    self.loss_list.append(loss.item())
    return loss

  def on_train_epoch_end(self, *args, **kwargs):
    if self.path is not None:  # If the target path is not defined we don't save the model.
      model = self.state_dict()
      loss = sum(self.loss_list)/len(self.loss_list)
      checkpoint = {}
      checkpoint['model_state'] = model
      checkpoint['scheduler'] = self.scheduler.state_dict()
      if self.first: # we initialize the checkpoint loss list
        checkpoint['loss'] = [loss]
        self.first = False # Now we know that for a given path there will be already some information stored. 

      else:
        checkpoint['loss'] = self.checkpoint['loss'] + [loss] # Update the previous information.

      self.loss_list = [] # re-initialize the loss before the next epoch
      save_model(checkpoint, self.path)
      self.checkpoint = load_model(self.path, self, self.device)
      self.scheduler.load_state_dict(self.checkpoint['scheduler'])
    else:
      return



  def configure_optimizers(self):
    return [self.optimizer],[self.scheduler]

  def calc_mean_std(self, input, eps=1e-5): # Given a batch of feature maps, compute the mean and std of each channel of each sample; both have shape (batch, channel, 1, 1)
    batch_size, channels = input.shape[:2]

    reshaped = input.view(batch_size, channels, -1) # Reshape channel wise
    mean = torch.mean(reshaped, dim = 2).view(batch_size, channels, 1, 1) # Calculate mean and reshape
    std = torch.sqrt(torch.var(reshaped, dim=2)+eps).view(batch_size, channels, 1, 1) # Calculate variance, add epsilon (avoid 0 division),
                                                                                      # calculate std and reshape
    return mean, std

  def total_loss(self, content_loss, style_loss): # This is the total loss
    return content_loss + self.alpha*style_loss


  def AdaIn(self, content, style):
    assert content.shape[:2] == style.shape[:2] # Only first two dim, such that different image sizes is possible
    batch_size, n_channels = content.shape[:2]
    mean_content, std_content = self.calc_mean_std(content)
    mean_style, std_style = self.calc_mean_std(style)

    output = std_style*((content - mean_content) / (std_content)) + mean_style # Normalize, then modify mean and std
    return output

  def Content_loss(self, input, target): # Content loss is a simple MSE Loss, we want to reduce the distance of the AdaIn output, with the re-encoded stylized image
    loss = F.mse_loss(input, target)
    return loss

  def Style_loss(self, input, target):
    mean_loss, std_loss = 0, 0

    for input_layer, target_layer in zip(input, target): 
      mean_input_layer, std_input_layer = self.calc_mean_std(input_layer)
      mean_target_layer, std_target_layer = self.calc_mean_std(target_layer)

      mean_loss += F.mse_loss(mean_input_layer, mean_target_layer) # Distance in the same channels is reduced within the same layer, and then it is done for all the layers.
      std_loss += F.mse_loss(std_input_layer, std_target_layer)

    return mean_loss+std_loss


In [None]:
# Placeholder hyperparameters: the networks are only used for inference in this notebook
lr = 0
alpha = 0
num_epochs = 0
universal_device = 'cuda'

PATH = './models/vgg.pt'
PATH2 = './models/resnet.pt'
PATH3 = './models/resnet_nores.pt'

FIRST = False

## These are the main architectures ##
vgg = Neural_style_network( lr, PATH, alpha, num_epochs, universal_device, first_train = FIRST)
resnet34 = Neural_style_network( lr, PATH2, alpha, num_epochs, universal_device, first_train = FIRST, net = 'res', residuals = True)
resnet34_nores = Neural_style_network( lr, PATH3, alpha, num_epochs, universal_device, first_train = FIRST, net = 'res', residuals = False)

dizM = {vgg: 'vgg19', resnet34: 'resnet34', resnet34_nores: 'resnet34 with no res'}

## 1. Experiments

In the following cell we apply different styles to the same content images, using a fixed style percentage, and compare the results.  
The outputs are visualized in a grid.  
Several architectures can be compared at the same time.
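
As a reminder of what "applying a style" means here, the following is a minimal NumPy sketch of the AdaIN operation defined in the `AdaIn` and `calc_mean_std` methods above. The random arrays are stand-ins for encoder features (an assumption made for illustration; the real features come from the `Encoder`), and the two style arrays deliberately have different spatial sizes.

```python
import numpy as np

def calc_mean_std(feat, eps=1e-5):
    """Per-sample, per-channel mean and std of a (B, C, H, W) array."""
    b, c = feat.shape[:2]
    flat = feat.reshape(b, c, -1)
    mean = flat.mean(axis=2).reshape(b, c, 1, 1)
    # ddof=1 matches the unbiased default of torch.var used in the notebook
    std = np.sqrt(flat.var(axis=2, ddof=1) + eps).reshape(b, c, 1, 1)
    return mean, std

def adain(content, style):
    """Normalize the content features, then give them the style statistics."""
    mean_c, std_c = calc_mean_std(content)
    mean_s, std_s = calc_mean_std(style)
    return std_s * (content - mean_c) / std_c + mean_s

rng = np.random.default_rng(0)
content = rng.normal(size=(1, 4, 16, 16))                       # stand-in for encoded content
style_a = rng.normal(loc=2.0, scale=3.0, size=(1, 4, 8, 8))     # stand-in for one encoded style
style_b = rng.normal(loc=-1.0, scale=0.5, size=(1, 4, 10, 12))  # a second style, different size

out_a = adain(content, style_a)
out_b = adain(content, style_b)

for name, out, style in [("a", out_a, style_a), ("b", out_b, style_b)]:
    mean_o, std_o = calc_mean_std(out)
    mean_s, std_s = calc_mean_std(style)
    # The output keeps the content's spatial layout but carries the style's channel statistics
    print(name, out.shape, np.allclose(mean_o, mean_s), np.allclose(std_o, std_s, atol=1e-3))
```

The same content features give different outputs for each style, while the output shape always follows the content. The grid below compares the decoded versions of such outputs.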

In [10]:
figsize = (15,15)

# In order to use the interactive tools we need to define the Checkbox objects as follows #
vgg19B = widgets.Checkbox(description = 'vgg19', style = {'description_width':'initial'})
resnet34B = widgets.Checkbox(description = 'resnet34', style = {'description_width':'initial'})
resnet34_noresB = widgets.Checkbox(description = 'resnet34_nores', style = {'description_width':'initial'})
test_perc = widgets.FloatSlider(
                                  value=1,
                                  min=0,
                                  max=1,
                                  step=0.1,
                                  description='Style percentage: ',
                                  disabled=False,
                                  continuous_update=False,
                                  orientation='horizontal',
                                  readout=True,
                                  readout_format='.1f',
                              )

# We construct the graph layout
box_layout = widgets.Layout(display = 'inline-flex', flex_flow = 'row', align_items = 'stretch',
                            border = 'solid', width = '100%')

ui = widgets.HBox([test_perc, vgg19B, resnet34B, resnet34_noresB], layout = box_layout)


def append_models(model1,model2,model3, style_perc): # Receives the three checkbox values (booleans) and the slider value (float); it is called every time a widget changes.
  model_list = []
  model_name_list = []
  if model1:
    model_list.append(vgg) 
    model_name_list.append(dizM[vgg])
  if model2:
    model_list.append(resnet34) 
    model_name_list.append(dizM[resnet34])
  if model3:
    model_list.append(resnet34_nores)
    model_name_list.append(dizM[resnet34_nores])

  if len(model_list)!=0:
    plot_grid_images(model_list, style_perc, pers_content1, pers_style3, model_name_list, universal_device, figsize)

out = widgets.interactive_output(append_models, {'model1': vgg19B,
                                                 'model2': resnet34B,
                                                 'model3': resnet34_noresB,
                                                 'style_perc': test_perc})
display(ui, out)




HBox(children=(FloatSlider(value=1.0, continuous_update=False, description='Style percentage: ', max=1.0, read…

Output()

## 2. Experiments
The following cell analyzes the influence of the style percentage on a given (content, style) pair. This can be done for several models at once.
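
The style percentage corresponds to the `test` argument of `forward`, which blends the encoded content with the AdaIN output before decoding. A minimal sketch of that blend follows; `blend` is a hypothetical name, and the random arrays stand in for the real encoder and AdaIN features.

```python
import numpy as np

def blend(content_feat, adapted_feat, style_perc):
    """Linear interpolation between content features and stylized features."""
    return (1 - style_perc) * content_feat + style_perc * adapted_feat

rng = np.random.default_rng(0)
content_feat = rng.normal(size=(1, 4, 8, 8))  # stand-in for enc_image
adapted_feat = rng.normal(size=(1, 4, 8, 8))  # stand-in for the AdaIN output

no_style = blend(content_feat, adapted_feat, 0.0)    # equals the content features
full_style = blend(content_feat, adapted_feat, 1.0)  # equals the AdaIN output
half_style = blend(content_feat, adapted_feat, 0.5)  # midpoint of the two

print(np.allclose(no_style, content_feat), np.allclose(full_style, adapted_feat))
```

With a percentage of 0 the Decoder simply reconstructs the content image, so this setting also gives a view of the pure reconstruction quality of each Decoder.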


In [11]:
figsize = (70,70)

content = pers_content2[1][0]
style = pers_style2[0][0]

# Same model-selection logic as in the previous cell
def analyze_image(model1,model2,model3): 
  model_list = []
  model_name_list = []
  if model1:
    model_list.append(vgg) 
    model_name_list.append(dizM[vgg])
  if model2:
    model_list.append(resnet34) 
    model_name_list.append(dizM[resnet34])
  if model3:
    model_list.append(resnet34_nores)
    model_name_list.append(dizM[resnet34_nores])

  if len(model_list)!=0:
    plot_different_styles(model_list, content, style, model_name_list, universal_device, figsize)
    
ui2 = widgets.HBox([vgg19B, resnet34B, resnet34_noresB], layout = box_layout)    
out = widgets.interactive_output(analyze_image, {'model1': vgg19B,
                                                 'model2': resnet34B,
                                                 'model3': resnet34_noresB})

display(ui2, out)

HBox(children=(Checkbox(value=True, description='vgg19', style=DescriptionStyle(description_width='initial')),…

Output()

# Training results
Now that we have seen the results, we can look at how the networks performed at training time. Since the loss of every network is computed on the features of the same Encoder, the losses can be compared.
* ResNet34 seems to reach the lowest loss, even though its outputs are noisier than those of VGG19. The loss value is therefore not a reliable indicator of quality; moreover, quality in this task is largely subjective, so it is not easy to find an objective way to measure it.
* ResNet34 (w/o residuals) shows its difficulties during training.
* Training each network took about 3 days. The first epoch alone could last up to 16 hours, and because of the Colab limitations and the unpredictable GPU assignment it was not possible to establish in advance when training would finish.
* For these reasons the networks could only be fed 20k image pairs; otherwise training would have taken much longer.
* It is interesting to see that all the architectures are able to maintain the content while matching the style, even though the results are not optimal. This suggests that the approach is valid and effective even with a small amount of data.
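
To make concrete what the loss measures, here is a minimal NumPy sketch of the objective defined in `Content_loss`, `Style_loss` and `total_loss` above. The random arrays stand in for the multi-layer encoder features, and the channel counts are arbitrary choices for illustration.

```python
import numpy as np

def calc_mean_std(feat, eps=1e-5):
    """Per-sample, per-channel mean and std of a (B, C, H, W) array."""
    b, c = feat.shape[:2]
    flat = feat.reshape(b, c, -1)
    mean = flat.mean(axis=2).reshape(b, c, 1, 1)
    std = np.sqrt(flat.var(axis=2, ddof=1) + eps).reshape(b, c, 1, 1)
    return mean, std

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def style_loss(inputs, targets):
    """Sum over layers of the MSE between channel means and between channel stds."""
    total = 0.0
    for x, y in zip(inputs, targets):
        mean_x, std_x = calc_mean_std(x)
        mean_y, std_y = calc_mean_std(y)
        total += mse(mean_x, mean_y) + mse(std_x, std_y)
    return total

def total_loss(content_loss, style_loss_value, alpha):
    return content_loss + alpha * style_loss_value

rng = np.random.default_rng(0)
# Stand-ins for features taken from several encoder layers (shapes are illustrative)
style_feats = [rng.normal(size=(1, c, 8, 8)) for c in (4, 8, 16)]
output_feats = [rng.normal(loc=0.5, size=(1, c, 8, 8)) for c in (4, 8, 16)]

c_loss = mse(output_feats[-1], style_feats[-1])  # in the notebook the target is the AdaIN output
s_loss = style_loss(output_feats, style_feats)
print(total_loss(c_loss, s_loss, alpha=1.0))
```

Both terms compare feature statistics only, never pixels, which is one plausible reason why a lower loss does not have to correspond to a visually cleaner image.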

In [12]:
# The losses were saved during training; this cell plots their trend with the interactive plotly library.
loss1 = vgg.checkpoint['loss']
loss2 = resnet34.checkpoint['loss']
loss3 = resnet34_nores.checkpoint['loss']

fig = go.Figure()
# Each curve gets its own x range, since the networks may have been trained for a different number of epochs
fig.add_trace(go.Scatter(x=list(range(len(loss1))), y=loss1,
                    mode='lines+markers',
                    name='Vgg', showlegend = True))
fig.add_trace(go.Scatter(x=list(range(len(loss2))), y=loss2,
                    mode='lines+markers',
                    name='Resnet34', showlegend = True))
fig.add_trace(go.Scatter(x=list(range(len(loss3))), y=loss3,
                    mode='lines+markers',
                    name='Resnet34 (no res.)', showlegend = True))
fig.update_layout(title='Comparison of the training results',
                   xaxis_title='Epochs',
                   yaxis_title='Test Loss',
                  width = 1000, height = 500)

fig.show()
