<a href="https://colab.research.google.com/github/Zinni98/DL-Project/blob/main/project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

_Conventions and notes:_

- _A few short sentences are copied verbatim from the paper. We did this where the original wording was already clear and concise, so there was no reason to rewrite, expand or summarise it._

- _Where code is not commented, it is because we considered it trivial._

- _Where no markdown precedes a chunk of code, it is because the code was covered in class, a similar chunk was explained earlier, or a docstring suited the context better._

# Unsupervised Domain Adaptation
In this project, our goal is to improve accuracy with respect to a baseline. The baseline is a pre-trained ResNet34 fine-tuned in a supervised fashion on the source training set. The baseline accuracy is obtained by testing on the target domain test set.  
Note that by source and target datasets we mean the whole datasets containing the real world and product images.  
For both domains, we used an $80\%$/$20\%$ split into training and test sets respectively.  
For the DA implementation, we adopted the approach proposed in [SymNet](https://arxiv.org/pdf/1904.04663.pdf), which has a fairly simple architecture and puts most of its effort into the definition of the losses.  
In short, the paper is inspired by the idea of *injecting confusion to force the feature extractor to learn invariant features* with respect to the domain shift. Those features are learned via domain-adversarial training.    
Concerning the structure, ResNet34 is modified by cutting off its classifier block (the final fully connected layer) and appending two classifiers:
- $C_s$ : source classifier;
- $C_t$ : target classifier.

As the network name suggests, everything revolves around a symmetric design of source and target task classifiers, on top of which another classifier ($C_{st}$) is built. The latter shares its layer neurons with $C_s$ and $C_t$.  
Both domain discrimination and domain confusion are implemented based on the constructed additional classifier.

In [None]:
from google.colab import drive  # to mount personal drive


from tqdm import tqdm   # for progress bar 
from time import sleep

import torch  # importing pytorch
import torch.optim as optim  # importing optimizer module
from torch.utils.data import Subset  # useful in defining data of interest in a dataset
import torch.nn as nn  # Neural Network tools
from torch.utils.tensorboard import SummaryWriter # to get plots of trends
from torch.utils.data import DataLoader  # to collate samples into batches
# import torch.nn.functional as F

import torchvision
import torchvision.transforms as T  # to apply transformations to dataset images
from torchvision.datasets import ImageFolder  # to load and applying transformations on data
#import torchvision.transforms.functional as F

from sklearn.model_selection import train_test_split  # to split a dataset into training and test set

import math

import matplotlib.pyplot as plt
%matplotlib inline

import numpy as np

In [None]:
drive.mount('/content/gdrive/')

Drive already mounted at /content/gdrive/; to attempt to forcibly remount, call drive.mount("/content/gdrive/", force_remount=True).


## Data Extraction
In `get_data(batch_size, root_dir)` the following steps are performed:
- image transforms are defined; the adopted transformation sequence is taken from [ResNet Transforms](https://pytorch.org/hub/pytorch_vision_resnet/);
- images are loaded from the local drive and the transforms are applied;
- the data is split into training and test sets;
- individual samples are collated into batches.

The returned objects are the product and real world domain data loaders.
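To illustrate what a stratified 80/20 split does, here is a minimal stdlib-only sketch. It is not scikit-learn's implementation: the helper `stratified_split` and the toy labels are hypothetical, and the only point is that each class keeps roughly the same share in both parts.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                       # shuffle within the class
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])              # first part goes to the test set
        train.extend(indices[n_test:])             # the rest goes to the training set
    return sorted(train), sorted(test)


# Toy labels: 3 classes with 10 images each (hypothetical data).
labels = [c for c in range(3) for _ in range(10)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

The resulting index lists play the same role as the ones passed to `Subset` in the notebook.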

In [None]:
def get_data(batch_size: int, root_dir: str) -> tuple:
  """
  Builds the data loaders for the product and real world domains.

  Params:
  ------
  batch_size: int
    batch size for the dataloader
  root_dir: str
    Directory of adaptiope_small (e.g. "something/something_else/adaptiope_small")

  Returns
  -------
  tuple
    (product_train_loader, product_test_loader, rw_train_loader, rw_test_loader)
  """

  # Transforms for resnet taken from https://pytorch.org/hub/pytorch_vision_resnet/
  transform_img = list()
  transform_img.append(T.Resize(256))
  transform_img.append(T.CenterCrop(224))
  transform_img.append(T.ToTensor())
  transform_img.append(T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]))
  transform_img = T.Compose(transform_img)

  # load data
  product_images_dataset = ImageFolder(root = f"{root_dir}/product_images/", transform = transform_img)
  rw_images_dataset = ImageFolder(root = f"{root_dir}/real_life/", transform = transform_img)

  product_train_indexes, product_test_indexes = train_test_split(list(range(len(product_images_dataset.targets))),
                                                test_size = 0.2, stratify = product_images_dataset.targets, random_state = 42)
  
  rw_train_indexes, rw_test_indexes = train_test_split(list(range(len(rw_images_dataset.targets))),
                                                test_size = 0.2, stratify = rw_images_dataset.targets, random_state = 42)
  

  product_train_data = Subset(product_images_dataset, product_train_indexes)
  product_test_data = Subset(product_images_dataset, product_test_indexes)

  rw_train_data = Subset(rw_images_dataset, rw_train_indexes)
  rw_test_data = Subset(rw_images_dataset, rw_test_indexes)

  product_train_loader = DataLoader(product_train_data, batch_size, shuffle = False)
  product_test_loader = DataLoader(product_test_data, batch_size, shuffle = False)

  rw_train_loader = DataLoader(rw_train_data, batch_size, shuffle = False)
  rw_test_loader = DataLoader(rw_test_data, batch_size, shuffle = False)

  return product_train_loader, product_test_loader, rw_train_loader, rw_test_loader




## Network initialization
The pretrained ResNet34 model is initialized. Since for the baseline we decided to perform a simple fine-tune, the original classifier layer is replaced and gradients are enabled for it.

In [None]:
def initialize_resnet34(num_classes: int, pretrained: bool = True):
  """
  resnet34 initialization

  Parameters
  ----------
  num_classes: int
    number of categories the net should output

  pretrained: bool
    Specify if pretrained version of resnet should be retrieved

  """

  model = torchvision.models.resnet34(pretrained=pretrained)

  in_features = model.fc.in_features  # 512 for ResNet34

  # replace the ImageNet classifier with a new layer sized for our categories
  model.fc = nn.Linear(in_features, num_classes)
  for param in model.fc.parameters():
    param.requires_grad = True

  return model

## Cross-entropy loss for training data


In [None]:
def get_ce_cost_function() -> torch.nn.CrossEntropyLoss:
  """
  Simply returns an object for computing the cross entropy loss
  """
  cost_function = torch.nn.CrossEntropyLoss()
  return cost_function
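As a reminder of what this loss computes, the sketch below reproduces it in plain Python for a toy batch: softmax over the logits, then the negative log-probability of the true class, averaged over the batch. The helper names and the logits are made up for illustration, and the averaging assumes the default mean reduction of `torch.nn.CrossEntropyLoss`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_cross_entropy(batch_logits, targets):
    """Average of -log p_y over the batch, where y is the true class index."""
    losses = [-math.log(softmax(logits)[y]) for logits, y in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


# Two toy samples with three classes each (hypothetical values).
batch_logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
targets = [0, 2]
loss = mean_cross_entropy(batch_logits, targets)
print(f"mean cross entropy: {loss:.4f}")
```

For the second sample all logits are equal, so its contribution is $\log 3$ regardless of the true class.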

## Defining the optimizer
We have written the abstract class `AnnealingOptimizer` in order to define an optimizer that updates the learning rate using the annealing strategy proposed in the [Symnet paper](https://arxiv.org/pdf/1904.04663.pdf).

The strategy used is the following:
$$\eta = \frac{\eta_0}{(1+\alpha p)^\beta}$$
where:

- $\eta_0$ is the base learning rate, which by default is 0.001. Note that it has been changed with respect to the one proposed in the paper, because we noticed that it was too high.

- $p$ is the progress in training: $p = \frac{\text{epoch}}{\text{total epochs}}$. Note that the value of $p$ is updated each time the function `update_lr()` is called.

- $\alpha = 10$ is a constant
- $\beta = 0.75$ is a constant
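As a worked illustration of the formula above, the following stdlib-only sketch evaluates the schedule at a few epochs. The function name `annealed_lr` and the choice of 50 total epochs are assumptions made for this example; the classifier rate is taken as 10 times the feature extractor rate, following the paper.

```python
def annealed_lr(epoch: int, total_epochs: int, base_lr: float = 0.001,
                alpha: float = 10.0, beta: float = 0.75) -> float:
    """eta = eta_0 / (1 + alpha * p) ** beta, with p = epoch / total_epochs."""
    p = epoch / total_epochs
    return base_lr / (1.0 + alpha * p) ** beta


total_epochs = 50  # hypothetical training length
for epoch in (0, 10, 25, 50):
    fe_lr = annealed_lr(epoch, total_epochs)
    classifier_lr = 10 * fe_lr  # classifier layers use a 10x larger rate
    print(f"epoch {epoch:2d}: feature extractor lr = {fe_lr:.6f}, classifier lr = {classifier_lr:.6f}")
```

At epoch 0 the rate equals $\eta_0$, and it decreases monotonically as training progresses.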

Then the class `ResNetOptimizer` inherits from `AnnealingOptimizer` and just defines the optimizer to be used. This optimizer will be used for ResNet34, which is the network of choice both for the baseline and the upper bound.

<br>

We decided to use the annealing optimizer also for the baseline and the upper bound, in order to compare it better with [Symnet](https://arxiv.org/pdf/1904.04663.pdf).





In [None]:
from abc import ABC, abstractmethod

class AnnealingOptimizer(torch.optim.Optimizer, ABC):
  """
  Defines an abstract class in order to implement an SGD optimizer using an annealing strategy
  """
  def __init__(self, model, nr_epochs, lr: float = 0.001, epoch: int = 0) -> None:
    if not 0.0 <= lr:
      raise ValueError(f"Invalid learning rate: {lr}")
    if not 0 <= epoch:
      raise ValueError(f"Invalid epoch value: {epoch}")
    
    self.nr_epochs = nr_epochs
    self.epoch = epoch
    self._alpha = 10
    self._beta = 0.75
    self._base_lr = lr

  def update_lr(self):
    """
    Updates the learning rate using the annealing strategy.
    For the annealing strategy to work correctly, this method should be called once per epoch during training.

    The learning rate for the classifier is 10 times larger than the feature extractor's, as proposed in the [Symnet paper](https://arxiv.org/pdf/1904.04663.pdf)
    """
    self.epoch += 1
    new_lr = self._compute_lr()
    for g in self.optimizer.param_groups:
      if g["name"] == "fe":
        g["lr"] = new_lr
      else:
        g["lr"] = new_lr*10

    
  def _compute_lr(self):
    """
    Computes the learning rate using the proposed annealing strategy

    Returns
    -------
    float
      updated learning rate
    """
    etap = 1 / ((1 + self._alpha * self.epoch / self.nr_epochs ) ** self._beta)
    return self._base_lr * etap

  def step(self):
    self.optimizer.step()
  
  def zero_grad(self):
    self.optimizer.zero_grad()
  
  

class ResNetOptimizer(AnnealingOptimizer):
  """
  Implements an annealing optimizer for Resnet
  """
  def __init__(self, model, nr_epochs, lr: float = 0.001, epoch: int = 0) -> None:
    super(ResNetOptimizer ,self).__init__(model, nr_epochs, lr, epoch)
    
    # Note that names for parameters group are important in order to update each group differently
    self.optimizer = optim.SGD([
                {'params': self.__get_fe_params(model), "name": "fe"},
                {'params': model.fc.parameters(), "lr": self._compute_lr()*10, "name": "classifier"}
            ], lr=lr, momentum=0.9)
    

  def __get_fe_params(self, model):
    """
    Takes parameters of the Resnet's feature extractor
    """
    fe_layers = list(model.children())[:-1]
    all_parameters = [param for layer in fe_layers for param in layer.parameters()]
    for param in all_parameters:
      yield param

## Baseline
### Training procedure
Briefly:
- The net is set to train mode.

- The training dataset is iterated over in batches of `batch_size` samples.

- For each batch, inputs and targets are moved to the specified device, then the predicted outputs and the loss are computed.

- After that, an optimization step is performed in order to update the weights.

- Finally, the accuracy and the cumulative loss are computed.
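To make the last step concrete, here is a small stdlib-only sketch of how per-batch values can be aggregated into epoch-level metrics. The per-sample losses and correctness flags are made-up toy numbers. It assumes the loss function returns the mean over each batch (the default for `torch.nn.CrossEntropyLoss`); under that assumption, summing the batch means and dividing by the number of samples gives a value scaled down by roughly the batch size compared with the per-sample mean.

```python
# Toy per-sample losses and correctness flags for two batches (hypothetical numbers).
batches = [
    {"losses": [1.0, 3.0], "correct": [True, False]},
    {"losses": [2.0], "correct": [True]},
]

samples = 0
sum_of_batch_means = 0.0    # what accumulating loss.item() yields with mean reduction
sum_of_sample_losses = 0.0  # batch means re-weighted by batch size
correct = 0

for batch in batches:
    n = len(batch["losses"])
    batch_mean = sum(batch["losses"]) / n
    samples += n
    sum_of_batch_means += batch_mean
    sum_of_sample_losses += batch_mean * n
    correct += sum(batch["correct"])

accuracy = correct / samples * 100               # percentage of correct predictions
scaled_loss = sum_of_batch_means / samples       # sum of batch means over sample count
mean_loss = sum_of_sample_losses / samples       # true per-sample mean loss
print(f"accuracy={accuracy:.2f}%  scaled_loss={scaled_loss:.4f}  mean_loss={mean_loss:.4f}")
```

Either quantity can be used to follow the trend over epochs, as long as the same one is used consistently when comparing runs.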

In [None]:
def training_step_baseline(net, data_loader: torch.utils.data.DataLoader, optimizer,
                           cost_function, device: str = 'cuda') -> tuple:

  """
  Performs the training of the network for one epoch.

  Parameters
  ----------
  net
    network model.
  
  data_loader: torch.utils.data.DataLoader
    Data loader initialized with the training set.
  
  optimizer: AnnealingOptimizer
    Optimizer of choice (must expose `step()` and `zero_grad()`)

  cost_function: torch.nn.modules._Loss
    Loss function to be used

  device: str
    Device in which computations should be performed.
    Admitted values:

    - "cpu"

    - "cuda:n" -> where n is the gpu number in case of multiple gpu configurations
  
  Returns
  -------
  tuple
    A tuple of length 2 containing:
    
    - Cumulative loss for the whole training set

    - Cumulative accuracy for the whole training set
  

  """
  samples = 0.
  cumulative_loss = 0.
  cumulative_accuracy = 0.
  
  net.train() 
 
  # iterate over the training set
  for batch_idx, (inputs, targets) in enumerate(data_loader):
    # load data into GPU
    inputs = inputs.to(device)
    targets = targets.to(device)
      
    # forward pass
    outputs = net(inputs)

    # loss computation
    loss = cost_function(outputs,targets)

    # backward pass
    loss.backward()
    
    # parameters update
    optimizer.step()

    # gradients reset
    optimizer.zero_grad()

    # fetch prediction and loss value
    samples += inputs.shape[0]
    cumulative_loss += loss.item()
    _, predicted = outputs.max(dim=1) # max() returns (maximum_value, index_of_maximum_value)

    # compute training accuracy
    cumulative_accuracy += predicted.eq(targets).sum().item()

  return cumulative_loss/samples, (cumulative_accuracy/samples)*100


### Test procedure
- The network is set to evaluation mode. 

- After this, we disable gradient computation, since gradients are not needed for testing. From here on, the testing procedure is analogous to training, with the only difference that the network weights are not updated.

In [None]:
def test_step_baseline(net, data_loader, cost_function, device='cuda'):
  """
  Test the network for one epoch

  Parameters:
  ----------
  net
    network model.
  data_loader
    Data loader initialized with the test set.
  cost_function: torch.nn.modules._Loss
    Loss function to be used
  device: str
    Device in which computations should be performed.
    Admitted values:

    - "cpu"

    - "cuda:n" -> where n is the gpu number in case of multiple gpu configurations
  
  Returns
  -------
  tuple
    A tuple of length 2 containing:
    
    - Cumulative loss for the whole test set

    - Cumulative accuracy for the whole test set
  
  """

  samples = 0.
  cumulative_loss = 0.
  cumulative_accuracy = 0.

  # set the network to evaluation mode
  net.eval() 

  # disable gradient computation (we are only testing, we do not want our model to be modified in this step!)
  with torch.no_grad():

    # iterate over the test set
    for batch_idx, (inputs, targets) in enumerate(data_loader):
      
      # load data into GPU
      inputs = inputs.to(device)
      targets = targets.to(device)
        
      # forward pass
      outputs = net(inputs)

      # loss computation
      loss = cost_function(outputs, targets)

      # fetch prediction and loss value
      samples+=inputs.shape[0]
      cumulative_loss += loss.item() # Note: the .item() is needed to extract scalars from tensors
      _, predicted = outputs.max(1)

      # compute accuracy
      cumulative_accuracy += predicted.eq(targets).sum().item()

  return cumulative_loss/samples, cumulative_accuracy/samples*100

### Main Function (Product → Real World)
This function is a 'wrapper' that calls each of the functions defined above when needed.  
First, the parameter values are defined as arguments of the function.  
Then, in sequence:     
- the data is extracted, processed and loaded;
- the network, optimizer and cost function are initialized;
- the training loop runs for a fixed number of epochs. In each epoch the following steps are performed:    
  - computation of training loss and accuracy;
  - computation of test loss and accuracy;
  - logging of the obtained values to the writer.

- At the end of training, the network is evaluated on the source training set and on the target test set.

Here the network is trained on product images and then tested on real world ones.

In [None]:
def main_PRD_to_RW(batch_size=128, 
         device='cuda', 
         learning_rate=0.0001, 
         weight_decay=0.000001, 
         momentum=0.9, 
         epochs=50,
         entropy_loss_weight=0.1,
         nr_classes = 20, 
         img_root="gdrive/My Drive/Colab Notebooks/data/adaptiope_small",
         runs_dir="gdrive/My Drive/Colab Notebooks/runs/exp2"
         ):

  writer = SummaryWriter(log_dir=runs_dir)

  ## DataLoader split the size of the given dataset into #of elements in the dataset/batch size
  source_train_loader, source_test_loader, target_train_loader, target_test_loader = get_data(batch_size, img_root)
  print('DataLoaders Done')
  net = initialize_resnet34(nr_classes).to(device)
  print('Network Init Done')
  #optimizer = get_optimizer_SGD(net, learning_rate, wd = weight_decay, momentum = momentum)
  optimizer = ResNetOptimizer(net, epochs)
  print('Got Optimizer')
  cost_function = get_ce_cost_function()
  print('Got Cost Function')
  print('Time to train!\n==========================BASELINE========================')

  for e in range(epochs):
    ##BASELINE


    # def training_step_baseline(net, data_loader, optimizer, cost_function, scheduler, device='cuda'):
    train_loss, train_accuracy = training_step_baseline(net, source_train_loader, optimizer, cost_function, device)
    #def test_step_baseline(net, data_loader, cost_function, device='cuda'):
    test_loss, test_accuracy = test_step_baseline(net, target_test_loader, cost_function, device)

    optimizer.update_lr()

    print('Epoch: {:d}'.format(e+1))
    print('\t Training loss {:.5f}, Training accuracy {:.2f}'.format(train_loss, train_accuracy))
    print('\t Test loss {:.5f}, Test accuracy {:.2f}'.format(test_loss, test_accuracy))
    print('-----------------------------------------------------')
    
    # add values to logger
    """writer.add_scalar('Loss/train_loss', train_loss, e + 1)
    writer.add_scalar('Loss/test_loss', test_loss, e + 1)
    writer.add_scalar('Accuracy/train_accuracy', train_accuracy, e + 1)
    writer.add_scalar('Accuracy/test_accuracy', test_accuracy, e + 1)"""
  

  # perform final test step and print the final metrics
  print('After training:')
  train_loss, train_accuracy = test_step_baseline(net, source_train_loader, cost_function, device)
  test_loss, test_accuracy = test_step_baseline(net, target_test_loader, cost_function, device)

  print('\t Training loss {:.5f}, Training accuracy {:.2f}'.format(train_loss, train_accuracy))
  print('\t Test loss {:.5f}, Test accuracy {:.2f}'.format(test_loss, test_accuracy))
  print('-----------------------------------------------------')


  
  # close the logger
  writer.close()

In [None]:
main_PRD_to_RW()

In [None]:
runs = f"{runsdir_matteo}/baseline/PRD2RW"
%load_ext tensorboard
%tensorboard --logdir "$runs"

### Main Function (Real World → Product)
Performs exactly the same process as `main_PRD_to_RW()`, just in the other direction: train on real world images and test on product images.

In [None]:
def main_RW_to_PRD(batch_size=128, 
         device='cuda', 
         learning_rate=0.0001, 
         weight_decay=0.000001, 
         momentum=0.9, 
         epochs=50,
         entropy_loss_weight=0.1,
         nr_classes = 20, 
         img_root="gdrive/My Drive/Colab Notebooks/data/adaptiope_small",
         runs_dir="gdrive/My Drive/Colab Notebooks/runs/exp2"
         ):

  writer = SummaryWriter(log_dir=runs_dir)

  ## DataLoader split the size of the given dataset into #of elements in the dataset/batch size
  target_train_loader, target_test_loader, source_train_loader, source_test_loader = get_data(batch_size, img_root)
  print('DataLoaders Done')
  net = initialize_resnet34(nr_classes).to(device)
  print('Network Init Done')
  #optimizer = get_optimizer_SGD(net, learning_rate, wd = weight_decay, momentum = momentum)
  optimizer = ResNetOptimizer(net, epochs)
  print('Got Optimizer')
  cost_function = get_ce_cost_function()
  print('Got Cost Function')
  print('Time to train!\n==========================BASELINE========================')

  for e in range(epochs):
    ##BASELINE


    # def training_step_baseline(net, data_loader, optimizer, cost_function, scheduler, device='cuda'):
    train_loss, train_accuracy = training_step_baseline(net, source_train_loader, optimizer, cost_function, device)
    #def test_step_baseline(net, data_loader, cost_function, device='cuda'):
    test_loss, test_accuracy = test_step_baseline(net, target_test_loader, cost_function, device)

    optimizer.update_lr()

    print('Epoch: {:d}'.format(e+1))
    print('\t Training loss {:.5f}, Training accuracy {:.2f}'.format(train_loss, train_accuracy))
    print('\t Test loss {:.5f}, Test accuracy {:.2f}'.format(test_loss, test_accuracy))
    print('-----------------------------------------------------')
    
    # add values to logger
    """writer.add_scalar('Loss/train_loss', train_loss, e + 1)
    writer.add_scalar('Loss/test_loss', test_loss, e + 1)
    writer.add_scalar('Accuracy/train_accuracy', train_accuracy, e + 1)
    writer.add_scalar('Accuracy/test_accuracy', test_accuracy, e + 1)"""
  

  # perform final test step and print the final metrics
  print('After training:')
  train_loss, train_accuracy = test_step_baseline(net, source_train_loader, cost_function, device)
  test_loss, test_accuracy = test_step_baseline(net, target_test_loader, cost_function, device)

  print('\t Training loss {:.5f}, Training accuracy {:.2f}'.format(train_loss, train_accuracy))
  print('\t Test loss {:.5f}, Test accuracy {:.2f}'.format(test_loss, test_accuracy))
  print('-----------------------------------------------------')


  
  # close the logger
  writer.close()

In [None]:
main_RW_to_PRD()

In [None]:
runs = f"{runsdir_matteo}/baseline/RW2PRD"
%load_ext tensorboard
%tensorboard --logdir "$runs"

## UPPER BOUND IMPLEMENTATION
The *upper bound* consists of training in a supervised fashion using the target domain's labels and testing on the target domain itself.

### Product $\to$ Real World

In [None]:
def main_upper_bound(batch_size=BATCH_SIZE, 
                     device='cuda', 
                     learning_rate=0.0001, 
                     weight_decay=0.000001,
                     momentum=0.9, 
                     epochs=15,
                     entropy_loss_weight=0.1,
                     nr_classes = num_classes, 
                     img_root=rootdir_matteo,
                     runs_dir="gdrive/My Drive/Colab Notebooks/runs/exp2"
                     ):
  
  writer = SummaryWriter(log_dir=f"{runsdir_matteo}/runs_upper_bound/PRD2RW")

  source_train_loader, source_test_loader, target_train_loader, target_test_loader = get_data(batch_size, img_root)
  print('DataLoaders Done')
  net = initialize_resnet34(nr_classes).to(device)
  print('Network Init Done')
  optimizer = ResNetOptimizer(net, epochs)
  print('Got Optimizer')
  cost_function = get_ce_cost_function()
  print('Got Cost Function')
  
  print('Time to train!\n==========================UPPER BOUND========================')

  # Progress bar over epochs, inspired by: https://towardsdatascience.com/training-models-with-a-progress-a-bar-2b664de3e13e
  # Each iteration is one full pass over the target training set followed by one evaluation,
  # so training_step_baseline is called once per epoch rather than once per batch.
  with tqdm(range(epochs), unit="epoch") as tepoch:
    for e in tepoch:
      tepoch.set_description(f"Epoch {e + 1}")
      train_loss, train_accuracy = training_step_baseline(net, target_train_loader, optimizer, cost_function, device)
      test_loss, test_accuracy = test_step_baseline(net, target_test_loader, cost_function, device)
      tepoch.set_postfix(train_loss=train_loss, training_accuracy=train_accuracy, tst_loss=test_loss, tst_accuracy=test_accuracy)

      optimizer.update_lr()

      # add values to logger
      writer.add_scalar('Loss/train_loss', train_loss, e + 1)
      writer.add_scalar('Loss/test_loss', test_loss, e + 1)
      writer.add_scalar('Accuracy/train_accuracy', train_accuracy, e + 1)
      writer.add_scalar('Accuracy/test_accuracy', test_accuracy, e + 1)
  

  # perform final test step and print the final metrics
  print('After training:')
  train_loss, train_accuracy = test_step_baseline(net, target_train_loader, cost_function, device)
  test_loss, test_accuracy = test_step_baseline(net, target_test_loader, cost_function, device)

  print('\t Training loss {:.5f}, Training accuracy {:.2f}'.format(train_loss, train_accuracy))
  print('\t Test loss {:.5f}, Test accuracy {:.2f}'.format(test_loss, test_accuracy))
  print('-----------------------------------------------------')
  
  # close the logger
  writer.close()

In [None]:
torch.cuda.empty_cache()

In [None]:
main_upper_bound()

In [None]:
runs = f"{runsdir_matteo}/runs_upper_bound/PRD2RW"
%load_ext tensorboard
%tensorboard --logdir "$runs"

### Real World $\to$ Product

In [None]:
def main_UB_RW2PRD(batch_size=BATCH_SIZE, 
                     device='cuda', 
                     learning_rate=0.0001, 
                     weight_decay=0.000001,
                     momentum=0.9, 
                     epochs=15,
                     entropy_loss_weight=0.1,
                     nr_classes = num_classes, 
                     img_root=rootdir_matteo,
                     runs_dir="gdrive/My Drive/Colab Notebooks/runs/exp2"
                     ):
  
  writer = SummaryWriter(log_dir=f"{runsdir_matteo}/runs_upper_bound/RW2PRD")

  target_train_loader, target_test_loader, source_train_loader, source_test_loader = get_data(batch_size, img_root)
  print('DataLoaders Done')
  net = initialize_resnet34(nr_classes).to(device)
  print('Network Init Done')
  optimizer = ResNetOptimizer(net, epochs)
  print('Got Optimizer')
  cost_function = get_ce_cost_function()
  print('Got Cost Function')
  
  print('Time to train!\n==========================UPPER BOUND========================')

  # Progress bar over epochs, inspired by: https://towardsdatascience.com/training-models-with-a-progress-a-bar-2b664de3e13e
  # Each iteration is one full pass over the target training set followed by one evaluation,
  # so training_step_baseline is called once per epoch rather than once per batch.
  with tqdm(range(epochs), unit="epoch") as tepoch:
    for e in tepoch:
      tepoch.set_description(f"Epoch {e + 1}")
      train_loss, train_accuracy = training_step_baseline(net, target_train_loader, optimizer, cost_function, device)
      test_loss, test_accuracy = test_step_baseline(net, target_test_loader, cost_function, device)
      tepoch.set_postfix(train_loss=train_loss, training_accuracy=train_accuracy, tst_loss=test_loss, tst_accuracy=test_accuracy)

      optimizer.update_lr()

      # add values to logger
      writer.add_scalar('Loss/train_loss', train_loss, e + 1)
      writer.add_scalar('Loss/test_loss', test_loss, e + 1)
      writer.add_scalar('Accuracy/train_accuracy', train_accuracy, e + 1)
      writer.add_scalar('Accuracy/test_accuracy', test_accuracy, e + 1)
  

  # perform final test step and print the final metrics
  print('After training:')
  train_loss, train_accuracy = test_step_baseline(net, target_train_loader, cost_function, device)
  test_loss, test_accuracy = test_step_baseline(net, target_test_loader, cost_function, device)

  print('\t Training loss {:.5f}, Training accuracy {:.2f}'.format(train_loss, train_accuracy))
  print('\t Test loss {:.5f}, Test accuracy {:.2f}'.format(test_loss, test_accuracy))
  print('-----------------------------------------------------')
  
  # close the logger
  writer.close()

In [None]:
torch.cuda.empty_cache()

In [None]:
main_UB_RW2PRD()

In [None]:
runs = f"{runsdir_matteo}/runs_upper_bound/RW2PRD"
%load_ext tensorboard
%tensorboard --logdir "$runs"

## Domain Adaptation Technique : SymNet
The design of the proposed symmetric network is characterized by:

- The feature extractor $G$. We use the feature extractor of ResNet34 (i.e. ResNet34 without the last fully connected layer), so that results can be compared with those of the baseline and the upper bound.

- Two parallel task classifiers, $C_s$ and $C_t$. Each is a single fully connected layer (as proposed in the paper) with 20 neurons, one per category of the problem.

| ![symnet](https://drive.google.com/uc?id=1qyPClxz8zcJvhVGF84-IKv1Owljt0h7H) |
|:--:|
| *The image shows a sketch of the network architecture, including errors which will be explained later in this notebook.* |

In [None]:
class SymNet(nn.Module):
  """
  Class representing the proposed symmetric network
  """
  def __init__(self, n_classes: int = 20) -> None:
    super(SymNet, self).__init__()
    resnet = initialize_resnet34(n_classes, True)
    # Taking the feature extractor of resnet34
    # Reference: https://stackoverflow.com/questions/55083642/extract-features-from-last-hidden-layer-pytorch-resnet18
    self.feature_extractor = torch.nn.Sequential(*list(resnet.children())[:-1])
    self.source_classifier = nn.Linear(in_features=512, out_features=n_classes)
    self.target_classifier = nn.Linear(in_features=512, out_features=n_classes)
  

  def forward(self, x: torch.Tensor) -> tuple:
    """
    Performs the forward pass

    Parameters
    ----------
    x : torch.Tensor
      Input tensor to the network
    
    Returns
    -------
    tuple
      The returned values are respectively the result of the source classifier, target classifier and the concatenation of the two.

    """
    features = self.feature_extractor(x)
    # (N, 512, 1, 1) -> (N, 512); unlike squeeze(), this keeps the batch dimension when N == 1
    features = torch.flatten(features, 1)
    source_output = self.source_classifier(features)
    # source_output = nn.Softmax(source_output)

    target_output = self.target_classifier(features)
    # target_output = nn.Softmax(target_output)

    source_target_classifier = torch.cat((source_output, target_output), dim=1)
    
    return source_output , target_output, source_target_classifier
  
  def parameters(self) -> torch.Tensor:
    """
    Parameters of the network

    Yields
    ------
    torch.Tensor
      Network parameter
    """
    fe = list(self.feature_extractor.parameters())
    sc = list(self.source_classifier.parameters())
    tc = list(self.target_classifier.parameters())
    tot = fe + sc + tc
    for param in tot:
      yield param
    
  def classifier_parameters(self) -> torch.Tensor:
    """
    Parameters of the classification layer

    Yields
    ------
    torch.Tensor
      Classification layer parameter
    """
    sc = list(self.source_classifier.parameters())
    tc = list(self.target_classifier.parameters())
    tot = sc + tc
    for param in tot:
      yield param

  def feature_extractor_parameters(self) -> torch.Tensor:
    """
    Parameters of the feature extractor

    Yields
    ------
    torch.Tensor
      Feature extractor parameter
    """
    return self.feature_extractor.parameters()


### Optimizer for symnet
SymNet uses the `AnnealingOptimizer` strategy to adjust the learning rate across epochs, as proposed in the paper.
The `SymNetOptimizer` class just defines the optimizer with the right parameters.
Additionally, again following the paper, the learning rate for the combined classifier (i.e. $C_{st}$) is set 10 times larger than the feature extractor learning rate.

In [None]:
class SymNetOptimizer(AnnealingOptimizer):
  """
  Implements an annealing optimizer for SymNet
  """
  def __init__(self, model, nr_epochs, lr: float = 0.001, epoch: int = 0):
    super(SymNetOptimizer ,self).__init__(model, nr_epochs, lr, epoch)

    # Note that names for parameters group are important in order to update each group differently
    self.optimizer = optim.SGD([
                {'params': model.feature_extractor_parameters(), "name": "fe"},
                {'params': model.classifier_parameters(), "lr": self._compute_lr()*10, "name": "classifier"}
            ], lr=lr, momentum=0.9)
  

    

### Notation used
Before defining the losses, we introduce the notation used in the formulas:

- The classifiers are denoted as: $C^s$ for the source classifier, $C^t$ for the target classifier and $C^{st}$ for the combined classifier (source + target)

- $K$ is the output dimension of each classifier, which corresponds to the number of categories
($K=K_s=K_t= \text{number of categories}$)

- $v^s(x) \in R^K$, $v^t(x) \in R^K$ and $[v^s(x),v^t(x)] \in R^{2K}$ are the output vectors of $C^s$, $C^t$ and $C^{st}$ respectively **before the softmax operation**

- $p^s(x) \in [0,1]^K$, $p^t(x) \in [0,1]^K$ and $p^{st}(x) \in [0,1]^{2K}$ are the output vectors of $C^s$, $C^t$ and $C^{st}$ respectively **after the softmax operation**. The $k^{th}$ element of each vector is denoted $p^s_k$ and $p^t_k$ with $k \in \{1,...,K\}$, and $p^{st}_k$ with $k \in \{1,...,2K\}$<br>
_Note: $p^{st}$ is computed with a softmax over all $2K$ outputs, so it is not equal to the concatenation of $p^s$ and $p^t$_
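
The note above can be checked numerically. The following is a minimal sketch in plain Python, with made-up logits and $K = 2$ (the values and the `softmax` helper are only illustrative): the softmax over the concatenated logits $[v^s, v^t]$ sums to 1 over all $2K$ entries, whereas the concatenation of $p^s$ and $p^t$ sums to 2.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for K = 2 categories (values chosen only for illustration)
v_s = [2.0, 0.5]   # output of C^s before the softmax
v_t = [1.0, 1.5]   # output of C^t before the softmax

p_s = softmax(v_s)          # K-way softmax of the source classifier
p_t = softmax(v_t)          # K-way softmax of the target classifier
p_st = softmax(v_s + v_t)   # 2K-way softmax of the combined classifier

print(sum(p_s + p_t))  # the concatenation of p^s and p^t sums to 2
print(sum(p_st))       # p^st sums to 1
```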



### Defining the losses
The [SymNet paper](https://arxiv.org/pdf/1904.04663.pdf) defines two losses:
- One for updating the weights of the three classifiers ($C^s$, $C^t$ and $C^{st}$). We refer to this loss as the "classifier loss"

- The other for updating the weights of the feature extractor. We refer to this loss as the "feature extractor loss"

In the following sections, these two losses are explained in greater detail.

*Note: remember that the weights of $C^{st}$ are shared with the other two classifiers*

#### Classifier loss
The objective for updating the classifiers' weights is the following:

$$\min_{C^s, C^t, C^{st}} \mathcal{E}^s_{task}(G,C^s) + \mathcal{E}^t_{task}(G,C^t) + \mathcal{E}^{st}_{domain}(G,C^{st})$$

The whole objective is composed of three errors, which are defined as follows:

- The error for the source task classifier is a simple cross entropy, which considers only the output corresponding to the true category ($y^s_i$): $$\mathcal{E}^s_{task}(G,C^s) = - \frac{1}{n_s}\sum_{i=1}^{n_s}{\log(p^s_{y^s_i}(x^s_i))}$$

- The same is done for the target classifier, using again the source training samples (since labels for the target are not given): $$\mathcal{E}^t_{task}(G,C^t) = - \frac{1}{n_s}\sum_{i=1}^{n_s}{\log(p^t_{y^s_i}(x^s_i))}$$
The use of this loss is essential to provide the correspondence between $C^s$ and $C^t$, which makes it possible to achieve category-level domain confusion later on, through one of the errors used to update the feature extractor

- By using only these two errors, $C^s$ and $C^t$ learn the exact same thing, so the third error, which acts on the combined classifier $C^{st}$, is needed to distinguish between the two: $$\mathcal{E}^{st}_{domain}(G,C^{st}) = - \frac{1}{n_t}\sum_{j=1}^{n_t}{\log(\sum_{k=1}^{K}{p^{st}_{K+k}(x^t_j)})} \\ -\frac{1}{n_s}\sum_{i=1}^{n_s}{\log(\sum_{k=1}^{K}{p^{st}_{k}(x^s_i)})}$$ It is important to notice that this error is computed without any category label, so it is possible to take advantage of the target training samples as well. The sums $\sum_{k=1}^{K}{p^{st}_{K+k}(x^t_j)}$ and $\sum_{k=1}^{K}{p^{st}_{k}(x^s_i)}$ can be seen as the probability of classifying an input sample $x$ as target or source respectively.
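
To make the last point concrete, here is a minimal sketch in plain Python of how the domain probabilities and the domain error come out of the combined output. The logits and the helper names `softmax` and `domain_probabilities` are hypothetical and only serve the illustration, with $K = 2$ and one sample per domain.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def domain_probabilities(v_st):
    """Split the 2K-way softmax mass into (P(source), P(target))."""
    k = len(v_st) // 2
    p_st = softmax(v_st)
    return sum(p_st[:k]), sum(p_st[k:])


# Hypothetical combined logits [v^s, v^t] for K = 2
v_st_source_sample = [3.0, 1.0, 0.0, -1.0]   # mass mostly on the first K entries
v_st_target_sample = [0.0, -1.0, 1.0, 3.0]   # mass mostly on the last K entries

p_src, _ = domain_probabilities(v_st_source_sample)
_, p_tgt = domain_probabilities(v_st_target_sample)

# One source and one target sample, so n_s = n_t = 1
e_domain = -math.log(p_tgt) - math.log(p_src)
print(p_src, p_tgt, e_domain)
```

The error shrinks towards 0 as the source sample puts all its mass on the first $K$ entries and the target sample on the last $K$ ones.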

#### Feature extractor loss

As in other adversarial training strategies for domain adaptation, the aim is to find a feature extractor $G$ that is invariant to the domain, i.e. a feature extractor that generalizes better across domains. To do that, the [paper](https://arxiv.org/pdf/1904.04663.pdf) proposes a "two-level domain confusion" method based on a domain-level confusion loss and a category-level confusion loss.
The objective for updating the feature extractor is the following:

$$\min_{G} \mathcal{F}^{st}_{category}(G, C^{st}) + \lambda (\mathcal{F}^{st}_{domain}(G, C^{st}) + \mathcal{M}^{st}(G, C^{st}))$$

Here $\lambda \in [0,1]$ is a trade-off parameter that suppresses the noisy signals of $\mathcal{F}^{st}_{domain}(G, C^{st})$ and $\mathcal{M}^{st}(G, C^{st})$ at the early stages of training. At the beginning the convolutional features do not yet extract meaningful information (the network has not been trained on the task), so we need better convolutional features before starting to confuse them.
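
The training loop further down in this notebook computes $\lambda$ at every epoch as $2 / (1 + e^{-10 p}) - 1$, where $p$ is the ratio between the current epoch and the total number of epochs. The sketch below tabulates that schedule; the function name `confusion_weight` and the `gamma` argument are hypothetical names introduced here for illustration.

```python
import math


def confusion_weight(epoch, nr_epochs, gamma=10.0):
    """Trade-off lambda for a given epoch: 0 at the start, close to 1 at the end."""
    progress = epoch / nr_epochs            # training progress p in [0, 1)
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


nr_epochs = 40  # same default as main_uda
for epoch in (0, 1, 5, 10, 20, 39):
    print(f"epoch {epoch:2d}: lambda = {confusion_weight(epoch, nr_epochs):.3f}")
```

With this schedule the two confusion terms are switched off at the first epoch and gain weight quickly, so they are almost at full strength well before the end of training.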

As for the classifier objective, it is possible to distinguish three distinct terms:

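- $\mathcal{F}^{st}_{category}(G, C^{st})$ is the category-level confusion loss. It is computed on the labelled source samples and asks both the source half and the target half of the $C^{st}$ output to predict the true category $y^s_i$: $$\mathcal{F}^{st}_{category}(G,C^{st}) = - \frac{1}{2n_s}\sum_{i=1}^{n_s}{\log(p^{st}_{y^s_i+K}(x^s_i))} - \frac{1}{2n_s}\sum_{i=1}^{n_s}{\log(p^{st}_{y^s_i}(x^s_i))}$$ Note that `feature_category_loss` below applies `nn.CrossEntropyLoss` to each half of the $C^{st}$ logits separately, so each half is normalised by its own $K$-way softmax.

- $\mathcal{F}^{st}_{domain}(G, C^{st})$ is the domain-level confusion loss. It is computed on the unlabelled target samples and is smallest when the probability mass is split evenly between the source half and the target half of the output: $$\mathcal{F}^{st}_{domain}(G,C^{st}) = - \frac{1}{2n_t}\sum_{j=1}^{n_t}{\log(\sum_{k=1}^{K}{p^{st}_{K+k}(x^t_j)})} - \frac{1}{2n_t}\sum_{j=1}^{n_t}{\log(\sum_{k=1}^{K}{p^{st}_{k}(x^t_j)})}$$

- $\mathcal{M}^{st}(G, C^{st})$ is the entropy minimization term. It is computed on the target samples as the entropy of the category probabilities obtained by summing the two halves of $p^{st}$, $q^{st}_k = p^{st}_k + p^{st}_{K+k}$: $$\mathcal{M}^{st}(G,C^{st}) = - \frac{1}{n_t}\sum_{j=1}^{n_t}\sum_{k=1}^{K}{q^{st}_k(x^t_j)\log(q^{st}_k(x^t_j))}$$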

In [None]:
def source_loss(output, label):
  """
  Returns
  -------
  Cross entropy loss
  """
  loss_fun = nn.CrossEntropyLoss()
  loss = loss_fun(output, label)
  return loss

def target_loss(output, label):
  return source_loss(output, label)

def source_target_loss(output, st = True):
  """
  st = True if train sample belongs to source, False otherwise
  """
  n_classes = int(output.size(1)/2)
  soft = nn.Softmax(dim=1)
  prob_out = soft(output)
  if st:
    loss = -(prob_out[:,:n_classes].sum(1).log().mean())
  else:
    loss = -(prob_out[:,n_classes:].sum(1).log().mean())
  return loss

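def feature_category_loss(output_st, label):
  """
  Category-level confusion loss on labelled source samples.

  Note: nn.CrossEntropyLoss is applied to each half of the logits separately,
  so each half is normalised by its own K-way softmax rather than by the
  2K-way softmax of the combined classifier.
  """
  n_classes = int(output_st.size(1)/2)

  loss_fun_1 = nn.CrossEntropyLoss()
  loss_fun_2 = nn.CrossEntropyLoss()

  # same labels for both halves: source half first, target half second
  loss_1 = loss_fun_1(output_st[:, :n_classes], label)/2
  loss_2 = loss_fun_2(output_st[:,n_classes:], label)/2
  return loss_1 + loss_2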

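def feature_domain_loss(output_st):
  """
  Domain-level confusion loss: average of the two domain losses, so that it is
  smallest when the probability mass is split evenly between the two halves
  """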
  n_classes = int(output_st.size(1)/2)

  soft = nn.Softmax(dim=1)
  prob_out = soft(output_st)

  loss_1 = -(prob_out[:,:n_classes]).sum(1).log().mean()/2
  loss_2 = -(prob_out[:,n_classes:]).sum(1).log().mean()/2

  return loss_1 + loss_2



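def entropyMinimizationPrinciple(output_st):
    """
    Entropy of the per-category probabilities obtained by summing the source
    half and the target half of the 2K-way softmax, averaged over the batch
    """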
    nr_classes = int(output_st.size(1)/2)
    soft = nn.Softmax(dim=1)
    prob_out = soft(output_st)

    p_st_source = prob_out[:, :nr_classes]
    p_st_target = prob_out[:, nr_classes:]
    qst = p_st_source + p_st_target

    emp = -qst.log().mul(qst).sum(1).mean()

    return emp
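
As a sanity check on `feature_domain_loss` and `entropyMinimizationPrinciple`, the following plain-Python sketch reproduces their per-sample arithmetic without PyTorch. The helper names `domain_confusion` and `merged_entropy` and the probability vectors are hypothetical, chosen only for illustration with $K = 2$.

```python
import math


def domain_confusion(p_source):
    """Per-sample value of feature_domain_loss given the total mass of the source half."""
    p_target = 1.0 - p_source
    return -0.5 * math.log(p_source) - 0.5 * math.log(p_target)


def merged_entropy(p_st):
    """Per-sample value of entropyMinimizationPrinciple for a 2K-way probability vector."""
    k = len(p_st) // 2
    q = [p_st[i] + p_st[k + i] for i in range(k)]   # q_k = p^st_k + p^st_{K+k}
    return -sum(q_k * math.log(q_k) for q_k in q)


# Domain confusion is smallest when the mass is split evenly between the halves
for p in (0.1, 0.5, 0.9):
    print(f"P(source) = {p:.1f} -> domain confusion = {domain_confusion(p):.4f}")

# Hypothetical 2K-way outputs for K = 2: in both, each half holds 0.5 of the mass
confident = [0.45, 0.05, 0.45, 0.05]   # the first category gets 0.9 in total
uncertain = [0.25, 0.25, 0.25, 0.25]   # the two categories are indistinguishable
print(merged_entropy(confident), merged_entropy(uncertain))
```

Both example vectors are fully confused at the domain level, yet only the first one has low entropy: the two terms act on different aspects of the same output, which is why they can be minimized together.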

In [None]:
def training_step_uda(net, src_data_loader, target_data_loader, optimizer, lam, e,device = 'cuda'):
  source_samples = 0.
  target_samples = 0.
  cumulative_classifier_loss = 0.
  cumulative_feature_loss = 0.
  cumulative_accuracy = 0.

  target_iter = iter(target_data_loader)

  net.train()

  # iterate over the training set
  for batch_idx, (inputs_source, labels) in enumerate(src_data_loader):
    try:
      inputs_target, _ = next(target_iter)
    except StopIteration:
      # the target loader ran out before the source one: restart it
      target_iter = iter(target_data_loader)
      inputs_target, _ = next(target_iter)
    inputs_target = inputs_target.to(device)
    
    # load data into GPU
    inputs_source = inputs_source.to(device)
    labels = labels.to(device)

    length_source_input = inputs_source.shape[0]

    ## concatenation along batch dimension.
    inputs = torch.cat((inputs_source, inputs_target), dim=0)

    # forward pass
    c_s, c_t, c_st = net(inputs)

    c_s_source = c_s[:length_source_input,:]
    c_s_target = c_s[length_source_input:,:]

    c_t_source = c_t[:length_source_input,:]
    c_t_target = c_t[length_source_input:,:]

    c_st_source = c_st[:length_source_input,:]
    c_st_target = c_st[length_source_input:,:]


    # Equation 5 of the paper
    error_source_task = source_loss(c_s_source, labels)

    # Equation 6 of the paper
    error_target_task = target_loss(c_t_source, labels)

    # Equation 7 of the paper
    domain_loss_source = source_target_loss(c_st_source)
    domain_loss_target = source_target_loss(c_st_target, st = False)
    error_domain = domain_loss_source + domain_loss_target

    classifier_total_loss = error_source_task + error_target_task + error_domain

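    # Backpropagate the classifier loss; the graph is retained because the
    # feature extractor loss below reuses the same forward pass
    classifier_total_loss.backward(retain_graph = True)

    # The classifier loss must not update the feature extractor: drop its gradients
    for param in net.feature_extractor.parameters():
      param.grad.data.zero_()
    
    # Save the classifier-loss gradients of the two classifiers and clear them
    # before the second backward pass
    class_params = []
    for param in net.source_classifier.parameters():
      class_params.append(param.grad.data.clone())
      param.grad.data.zero_()
    for param in net.target_classifier.parameters():
      class_params.append(param.grad.data.clone())
      param.grad.data.zero_()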

    # Equation 8 of the paper
    error_feature_category = feature_category_loss(c_st_source, labels)

    # Equation 9 of the paper
    error_feature_domain = feature_domain_loss(c_st_target)

    min_entropy = entropyMinimizationPrinciple(c_st_target)

    # Equation 11 of the paper
    feature_total_loss = error_feature_category + lam * (error_feature_domain + min_entropy)

    feature_total_loss.backward()

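    # Restore the classifier-loss gradients on the two classifiers, overwriting
    # the feature-loss gradients left there by the second backward pass
    idx = 0
    for param in net.source_classifier.parameters():
      param.grad.data = class_params[idx]
      idx += 1
    for param in net.target_classifier.parameters():
      param.grad.data = class_params[idx]
      idx += 1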

    
    optimizer.step()
    optimizer.zero_grad()

    ## optimizer classifier losses composed loss
    ## order is important here!
    


    # accumulate statistics
    source_samples+=inputs_source.shape[0]
    target_samples+=inputs_target.shape[0]
    
    cumulative_classifier_loss += classifier_total_loss.item()
    cumulative_feature_loss += feature_total_loss.item()
    _, predicted = c_s_source.max(dim = 1) ## index of the largest logit, i.e. the predicted class
    cumulative_accuracy += predicted.eq(labels).sum().item()

  return cumulative_classifier_loss/source_samples, cumulative_feature_loss/target_samples, cumulative_accuracy/source_samples*100
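
The `try`/`except` at the top of the loop restarts the target loader whenever it runs out before the source loader does. A minimal sketch of the same idea as a reusable generator is given below; the helper name `cycle_batches` is hypothetical and not part of this notebook, and plain lists stand in for the two `DataLoader` objects.

```python
def cycle_batches(loader):
    """Yield batches forever, starting a fresh pass whenever the loader is exhausted.

    Unlike itertools.cycle, nothing is cached: iter(loader) is called again for
    every pass, so a shuffling DataLoader would draw a new order each time.
    """
    while True:
        for batch in loader:
            yield batch


# Plain lists stand in for the two DataLoaders (illustrative data only)
source_batches = ["s0", "s1", "s2", "s3", "s4"]
target_batches = ["t0", "t1"]

target_iter = cycle_batches(target_batches)
pairs = [(src, next(target_iter)) for src in source_batches]
print(pairs)
```

Every source batch gets paired with a target batch even though the target side is shorter. An empty loader would make the generator spin forever, so a real helper would probably want to guard against that case.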


In [None]:
def test_step(net, data_target_test_loader, device='cuda:0'):

    '''
    Evaluates the target classifier on the given test set

    Params
    ------

    net : model 
    data_target_test_loader : DataLoader obj of the domain to test on
    device : GPU or CPU device

    Returns
    -------
    Cumulative loss divided by the number of samples, accuracy percentage

    '''

    samples = 0.
    cumulative_loss = 0.
    cumulative_accuracy = 0.

    net.eval()

    with torch.no_grad():

        for batch_idx, (inputs, labels) in enumerate(data_target_test_loader):

            # load data into GPU
            inputs = inputs.to(device)
            targets = labels.to(device)
        
            # forward pass
            _, c_t, _ = net(inputs)

            # apply the loss
            loss = target_loss(c_t, targets)

            # print statistics
            samples+=inputs.shape[0]
            cumulative_loss += loss.item() # Note: the .item() is needed to extract scalars from tensors
            _, predicted = c_t.max(1)
            cumulative_accuracy += predicted.eq(targets).sum().item()

    return cumulative_loss/samples, cumulative_accuracy/samples*100

In [None]:
classes = ['backpack', 'bookcase', 'car jack', 'comb', 'crown', 'file cabinet', 'flat iron', 'game controller', 'glasses', 'helicopter', 'ice skates', 'letter tray', 'monitor', 'mug', 'network switch', 'over-ear headphones', 'pen', 'purse', 'stand mixer', 'stroller']

cuda = "cuda" if torch.cuda.is_available() else "cpu"
BATCH_SIZE = 128
num_classes = len(classes)
rootdir_matteo = '/content/gdrive/MyDrive/Colab Notebooks/Deep Learning labs/DA Project/adaptiope_small'
rootdir_alessandro = 'gdrive/My Drive/Colab Notebooks/data/adaptiope_small'
rootdir_alessandro_uni = 'gdrive/My Drive/project/data/adaptiope_small'

### Product $\to$ Real World

In [None]:
from torch.utils.tensorboard import SummaryWriter
import math

def main_uda(batch_size=128,
         device=cuda, 
         lr = 0.01,
         weight_decay=0.000001, 
         momentum=0.9, 
         epochs=40,
         entropy_loss_weight=0.1,
         nr_classes = num_classes, 
         img_root=rootdir_alessandro
         ):
    
  # writer = SummaryWriter(log_dir="gdrive/My Drive/Colab Notebooks/runs/exp2")

  ## Each DataLoader splits its dataset into (number of elements in the dataset / batch size) batches
  source_train_loader, source_test_loader, target_train_loader, target_test_loader = get_data(batch_size, img_root)
  print('DataLoaders Done')
  net = SymNet().to(device)
  print('Network Init Done')
  optimizer = SymNetOptimizer(model = net, nr_epochs = epochs) #get_optimizer_ADAM_uda(model=net, e=0, nr_epochs = epochs,lr=lr, wd=weight_decay)
  # optimizer_2 = get_optimizer_ADAM_uda(model=net, lr=lr, wd=weight_decay, e=0, nr_epochs=epochs, classifier=False)
  print('Got optimizers')

  for e in range(epochs):
    lam = 2 / (1 + math.exp(-1 * 10 * e / epochs)) - 1
    #def training_step_uda(net, src_data_loader, target_data_loader, optimizer_1, optimizer_2, lam, device = 'cuda')
    train_ce_loss, train_en_loss, train_accuracy = training_step_uda(net=net, src_data_loader=source_train_loader, 
                                                        target_data_loader=target_train_loader, 
                                                        optimizer=optimizer, lam=lam, e=e, device=device)
    torch.cuda.empty_cache()
    
    test_loss, test_accuracy = test_step(net, target_test_loader, device)

    print('Epoch: {:d}'.format(e+1))
    print('\t Train: CE loss {:.5f}, Entropy loss {:.5f}, Accuracy {:.2f}'.format(train_ce_loss, train_en_loss, train_accuracy))
    print('\t Test: CE loss {:.5f}, Accuracy {:.2f}'.format(test_loss, test_accuracy))
    print('-----------------------------------------------------')
    optimizer.update_lr()

In [None]:
torch.cuda.empty_cache()

In [None]:
main_uda()

In [None]:
!kill 490
runs = f"{runsdir_matteo}/DA/PRD2RW"
%load_ext tensorboard
%tensorboard --logdir "{runs}"

### Real World $\to$ Product

In [None]:
def main_uda_RW2PRD(batch_size=128,
         device=cuda, 
         lr = 0.01,
         weight_decay=0.000001, 
         momentum=0.9, 
         epochs=30,
         entropy_loss_weight=0.1,
         nr_classes = num_classes, 
         img_root=rootdir_matteo
         ):
    
  writer = SummaryWriter(log_dir=f"{runsdir_matteo}/DA/RW2PRD")

  ## Each DataLoader splits its dataset into (number of elements in the dataset / batch size) batches
  target_train_loader, target_test_loader, source_train_loader, source_test_loader = get_data(batch_size, img_root)
  print('DataLoaders Done')
  net = SymNet().to(device)
  print('Network Init Done')
  optimizer = SymNetOptimizer(model = net, nr_epochs = epochs) #get_optimizer_ADAM_uda(model=net, e=0, nr_epochs = epochs,lr=lr, wd=weight_decay)
  # optimizer_2 = get_optimizer_ADAM_uda(model=net, lr=lr, wd=weight_decay, e=0, nr_epochs=epochs, classifier=False)
  print('Got optimizers')

  for e in range(epochs):
    lam = 2 / (1 + math.exp(-1 * 10 * e / epochs)) - 1
    #def training_step_uda(net, src_data_loader, target_data_loader, optimizer_1, optimizer_2, lam, device = 'cuda')
    train_ce_loss, train_en_loss, train_accuracy = training_step_uda(net=net, src_data_loader=source_train_loader, 
                                                        target_data_loader=target_train_loader, 
                                                        optimizer=optimizer, lam=lam, e=e, device=device)
    torch.cuda.empty_cache()
    
    test_loss, test_accuracy = test_step(net, target_test_loader, device)

    print('Epoch: {:d}'.format(e+1))
    print('\t Train: CE loss {:.5f}, Entropy loss {:.5f}, Accuracy {:.2f}'.format(train_ce_loss, train_en_loss, train_accuracy))
    print('\t Test: CE loss {:.5f}, Accuracy {:.2f}'.format(test_loss, test_accuracy))
    print('-----------------------------------------------------')


    # add values to logger
    writer.add_scalar('Loss/train_ce_loss', train_ce_loss, e + 1)
    writer.add_scalar('Loss/train_en_loss', train_en_loss, e + 1)
    writer.add_scalar('Loss/test_loss', test_loss, e + 1)
    writer.add_scalar('Accuracy/train_accuracy', train_accuracy, e + 1)
    writer.add_scalar('Accuracy/test_accuracy', test_accuracy, e + 1)


    optimizer.update_lr()
  
  writer.close()

In [None]:
torch.cuda.empty_cache()

In [None]:
main_uda_RW2PRD()