# Unsupervised Domain Adaptation Project


## 1: Data download
Load data to project from Google Drive. Copy a subset of classes of images to the path:
- `adaptiope_small/product_images`
- `adaptiope_small/real_life` 

two directories. They represent images from two different domain **product** and **real_life**

In [1]:
from os import makedirs, listdir
from tqdm import tqdm
from google.colab import drive
from os.path import join
from shutil import copytree

drive.mount('/content/gdrive')

!mkdir dataset
!cp "gdrive/My Drive/Colab Notebooks/data/Adaptiope.zip" dataset/
# !ls dataset

!unzip -qq dataset/Adaptiope.zip   # unzip file

!rm -rf dataset/Adaptiope.zip 
!rm -rf adaptiope_small

Mounted at /content/gdrive


In [2]:
!mkdir adaptiope_small
classes = listdir("Adaptiope/product_images")
print(classes)
classes = ["backpack", "bookcase", "car jack", "comb", "crown", "file cabinet", "flat iron", "game controller", "glasses",
           "helicopter", "ice skates", "letter tray", "monitor", "mug", "network switch", "over-ear headphones", "pen",
           "purse", "stand mixer", "stroller"]
domain_classes = ["product_images", "real_life"]
for d, td in zip(["Adaptiope/product_images", "Adaptiope/real_life"], ["adaptiope_small/product_images", "adaptiope_small/real_life"]):
  makedirs(td)
  for c in tqdm(classes):
    c_path = join(d, c)
    c_target = join(td, c)
    copytree(c_path, c_target)

['tape dispenser', 'comb', 'car jack', 'toothbrush', 'handcuffs', 'magic lamp', 'usb stick', 'glasses', 'tank', 'computer', 'rubber boat', 'shower head', 'telescope', 'quadcopter', 'scooter', 'tyrannosaurus', 'rc car', 'syringe', 'cordless fixed phone', 'bottle', 'vr goggles', 'razor', 'helicopter', 'hat', 'dart', 'compass', 'hoverboard', 'corkscrew', 'projector', 'file cabinet', 'smoking pipe', 'rifle', 'mug', 'fan', 'sewing machine', 'pen', 'keyboard', 'knife', 'trash can', 'tent', 'drum set', 'nail clipper', 'phonograph', 'monitor', 'toilet brush', 'skateboard', 'electric guitar', 'screwdriver', 'coat hanger', 'speakers', 'boxing gloves', 'roller skates', 'computer mouse', 'ladder', 'motorbike helmet', 'scissors', 'handgun', 'power strip', 'ruler', 'microwave', 'golf club', 'stapler', 'watering can', 'over-ear headphones', 'umbrella', 'pipe wrench', 'vacuum cleaner', 'purse', 'in-ear headphones', 'webcam', 'pikachu', 'letter tray', 'chainsaw', 'ice cube tray', 'fighter jet', 'grill'

100%|██████████| 20/20 [00:01<00:00, 15.10it/s]
100%|██████████| 20/20 [00:00<00:00, 32.13it/s]


In [3]:
product_path = 'adaptiope_small/product_images'
real_life_path = 'adaptiope_small/real_life'

## 2: Domain-Adversarial training of Neural Network 

We implement DANN UDA method [DANN](https://arxiv.org/pdf/1505.07818.pdf)  

![DANN.png](https://raw.githubusercontent.com/CrazyAlvaro/UDA/main/images/DANN.png)

As displayed in the model architecture above, DANN is consist of three component: feature extractor, domain classifier, and label predictor(classifier).

While in order to adversarial training from both label predictor and domain classifier, a gradient reversal layer(GRL) is added.

### 2.0: Import Libraries and Data Loading


In [4]:
from PIL import Image
from os.path import join
import math

img = Image.open(join(product_path, 'backpack', 'backpack_003.jpg'))
print('Image size: ', img.size)
#img

Image size:  (679, 679)


import libraries

In [5]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import softmax
from torchvision import transforms
from torchvision.datasets import ImageFolder
from torchvision.models import vgg11, alexnet 
from torch.utils.data import DataLoader, random_split
from torchvision.transforms.transforms import ToTensor

configuration constants

In [6]:
img_size = 256
# mean, std used by pre-trained models from PyTorch
mean, std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
config = dict(epochs=10, batch_size=64,lr=0.01, wd=0.001, momentum=0.9, alpha=10, beta=0.75, gamma=10)

Configue GPU

In [7]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)

cuda:0


In [8]:
def get_dataset(root_path):
  '''
    Get dataset from specific data path

    # parameters:
        root_path: path to image folder

    # return: train_loader, test_loader
  '''
  # Construct image transform
  image_transform = transforms.Compose([
    transforms.Resize(img_size),
    transforms.CenterCrop(img_size),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
  ])

  # Load data from filesystem
  image_dataset = ImageFolder(root_path, transform=image_transform)

  return image_dataset

def get_dataloader(dataset, batch_size, shuffle_train=True, shuffle_test=False):
  '''
    Get DataLoader from specific data path

    # parameters:
        dataset: ImageFolder instance
        batch_size: batch_size for DataLoader
        shuffle_train: whether to shuffle training data
        shuffle_test: whether to shuffle test data
  '''
  # Get train, test number
  num_total = len(dataset)
  num_train = int(num_total * 0.8 + 1)
  num_test  = num_total - num_train

  # random split dataset
  data_train, data_test = random_split(dataset, [num_train, num_test])

  # initialize dataloaders
  loader_train = DataLoader(data_train, batch_size=batch_size, shuffle=shuffle_train)
  loader_test  = DataLoader(data_test, batch_size=batch_size, shuffle=shuffle_test)

  return loader_train, loader_test

### 2.1 Define Feature Extractor with Pretrain Network

For the feature extractor, we select pretrained AlexNet. 
The reason to choose AlexNet comparing more recent Network like 
Residural Network is because it has a good balance between model 
performance and traning complexity. Even though ResNet may perform 
better than AlexNet by a reasonable amount of gain, it takes way much longer 
to train or tune a complex network like this, which will dramatically increase the training time.

In [9]:
class FeatureExtractor(nn.Module):
  """
  FeatureExtractor

  Pretrained neural network as a backbone for later domain adaptation task
  """
  def __init__(self):
    super(FeatureExtractor, self).__init__()

    # Feature Extractor with AlexNet
    self.feature_extractor = alexnet(weights='DEFAULT')
    self.feature_dim = self.feature_extractor.classifier[-1].in_features

    # make the last layer identity
    self.feature_extractor.classifier[-1] = nn.Identity()

  def forward(self, x):
    return self.feature_extractor(x)
  
  def output_dim(self):
    return self.feature_dim

### 2.2 Define Classifier, Discriminator with RevereLayerF for training the Feature Extractor

For the classifier, we implement a three fully connected linear layer 
with LeakyReLU because of its advantage over regular ReLU activation function. 
And finally we use Logrithm Softmax function as the selection layer.

In [10]:
from torch.nn.modules.activation import Softmax
class Classifier(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(Classifier, self).__init__()
        self.classifier = nn.Sequential(
            nn.Linear(input_dim, 1024),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(1024, 512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(512, output_dim),
            nn.LogSoftmax(dim=1)
        )
    
    def forward(self, X):
        return self.classifier(X) 

Here we implement a ReverseLayer Function to pass thourgh the data 
as it is without doing any compuation, but on backward propagation, 
it will reverse the sign of the value to provide the capability 
to adversarial training from the later Discriminator.

In [11]:
from torch.autograd import Function

class ReverseLayerF(Function):
    @staticmethod
    def forward(ctx, tensor): 
        """
        Without doing any computation
        """
        return tensor.view_as(tensor)

    @staticmethod
    def backward(ctx, grad_output):
        """
        Change the sign of the gradient 
        """
        return grad_output.neg(), None

Here for the discriminator, we only implement a two-layer linear connection here to avoid overly complicate the Discriminator, becuase usually more complex discriminator will have a negative effect on adversarial training.

In [12]:
class Discriminator(nn.Module):
    def __init__(self, input_dim):
        super(Discriminator, self).__init__()
        self.discriminator =  nn.Sequential(
            nn.Linear(int(input_dim), 1024),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(1024,1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Sigmoid()
        )

    def forward(self, x):
        validity = self.discriminator(x)
        return validity 

Finally we connect the feature extractor, classifer, and discriminator to form a 
Domain Adversarial Neural Network. Both classifier and discriminator will take in 
data from the fearture extractor processed the imput images. The classifer take the 
number of classes as input, the output is the prediction of which class of the current 
image belong. While the discriminator are supposed to distinguish the image from two 
different domains.

The discriminator will serve as a doamin alignment unit to train the feature extractor 
to extract domain independent features from both domains.

In [13]:
class DANN(nn.Module):
  """ 
  DANN

  Implement the domain adversarial neural network that train the feature extractor from both classification and discrimination of 
  different domains.
  """
  def __init__(self, num_classes):
    """ 
    Parameter:
      @num_classes: number of classes of different images
    """
    super(DANN, self).__init__()
    self.output_dim = num_classes

    # define inner network component
    self.feature_extractor = FeatureExtractor()
    self.classifier = Classifier(self.feature_extractor.output_dim(), num_classes)
    self.discriminator = Discriminator(self.feature_extractor.output_dim())  
  
  def forward(self, x):
    feature_output = self.feature_extractor(x)

    class_pred = self.classifier(feature_output)

    # Add a ReverseLayer here for negative gradient computation
    reverse_feature = ReverseLayerF.apply(feature_output)
    domain_pred = self.discriminator(reverse_feature)

    return class_pred, domain_pred 

### 2.3 Cost function

For the classification loss function, we use the common
Cross Entropy Loss function.

In [14]:
def get_class_loss_func():
  return nn.CrossEntropyLoss()

### 2.4 Optimizer

Setting the **learning rate** according to the original [paper](https://arxiv.org/pdf/1505.07818.pdf) section 5.2.2

$$ \mu_p =  \frac{\mu_0}{(1+\alpha \cdot p)^\beta}$$

where p is the training progress linearly changing from 0 to 1.

And for the learning rate, for the pretrain weights, we set the learning rate only to be 1/10 
of the learning rate for the classifier. And we use Stochastic Gradient Descent to optimize the 
model.

In [15]:
def get_optimizer(model, config, progress, adversarial=True):
  '''
  get_optimizer

  parameter:
    @model: Neural Network to be optimizd
    @config: configuration dictionary contains parameters
    @progress: training progress to configurate learning rate
    @adersarial: if we are in adversarial traning scenario 

  return:
    @optimizer: the optimizer we use to train our model

  '''
  learning_rate = config['lr']
  learning_rate = learning_rate / ((1 + config['alpha']*progress)**config['beta'])

  weight_decay  = config['wd']
  momentum      = config['momentum']

  feature_ext   = model.get_submodule("feature_extractor")
  classifier    = model.get_submodule("classifier")
  discriminator = model.get_submodule("discriminator")

  pre_trained_weights = feature_ext.parameters()

  if adversarial:
    other_weights = list(classifier.parameters()) + list(discriminator.parameters())
  else:
    other_weights = list(classifier.parameters())

  # assign parameters to parameters
  optimizer = torch.optim.SGD([
    {'params': pre_trained_weights},
    {'params': other_weights, 'lr': learning_rate}
  ], lr= learning_rate/10, weight_decay=weight_decay, momentum=momentum)
  
  return optimizer

### 2.5 Training Loop and Testing Loop

In [16]:
def train_loop(dataloader, model, device, progress):
  """
  train_loop

  Iterate through dataloader to train the network with SGD optimizer.

  Parameters:
    @dataloader: Pytorch dataloader to iterate through training
    @model: Neural Network model that we are training
    @device: GPU or CPU
    @progress: the progress of traning based on current epoch over total epochs
  """
  size = len(dataloader.dataset)
  loss_fn = get_class_loss_func()

  optimizer = get_optimizer(model, config, progress, adversarial=False)

  for batch, (X, y) in enumerate(dataloader):
    X, y = X.to(device), y.to(device)
    
    # compute prediction and loss
    class_pred, _ = model(X)

    # classification loss
    loss = loss_fn(class_pred, y) 
    curr_loss = loss.item()
    
    # backpropagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if batch % 10 == 0:
      current = batch * len(X)
      print(f"## Meter ## current loss: {curr_loss:>7f} [{current:>5d}/{size:>5d}]")

In [17]:
def test_loop(dataloader, model, device):
  """ 
  test_loop

  Test the model by iterate through the dataloader and compute the correctness.

  Parameters:
    @dataloader: Pytorch dataloader to iterate through training
    @model: Neural Network model that we are training
    @progress: the progress of traning based on current epoch over total epochs
  
  @return:
    @test_loss: test loss
    @correct: correctness of the test data set
  """
  test_loss, correct = 0, 0
  loss_fn = get_class_loss_func()

  with torch.no_grad():
    for X, y in dataloader:
      X, y = X.to(device), y.to(device)
      class_pred, _ = model(X)

      test_loss += loss_fn(class_pred, y).item()
      correct += (class_pred.argmax(1) == y).type(torch.float).sum().item()

  size = len(dataloader.dataset)
  num_batches = len(dataloader)

  test_loss /= num_batches
  correct /= size
  print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

  return test_loss, correct

### 2.6 Training Function

In [41]:
def training(model, train_dataloader, test_dataloader, config, device):
  """ 
  training

  Acturall training function iterate through the training epochs
  """
  epochs = config['epochs']

  for epoch in range(epochs):
    print(f"Epoch {epoch+1}\n------------------")
    progress = epoch/epochs

    train_loop(train_dataloader, model, device, progress)

  test_loop(train_dataloader, model, device)
  print("Done")

## 3 Training without using Domain Adaptation techniques

 ### 3.1 Product Domain -> Real Life

In [19]:
# Get dataloader
product_dataset   = get_dataset(product_path)
real_life_dataset = get_dataset(real_life_path)

#### 3.1.1 Training on Source Domain

In [20]:
train_dataloader, test_dataloader = get_dataloader(product_dataset, config['batch_size'])

model = DANN(len(product_dataset.classes)).to(device)

# Training
training(model, train_dataloader, test_dataloader, config, device)

Downloading: "https://download.pytorch.org/models/alexnet-owt-7be5be79.pth" to /root/.cache/torch/hub/checkpoints/alexnet-owt-7be5be79.pth


  0%|          | 0.00/233M [00:00<?, ?B/s]

Epoch 1
------------------
## Meter ## current loss: 3.022623 [    0/ 1601]
## Meter ## current loss: 2.370648 [  640/ 1601]
## Meter ## current loss: 0.623379 [ 1280/ 1601]
Epoch 2
------------------
## Meter ## current loss: 0.561586 [    0/ 1601]
## Meter ## current loss: 0.368818 [  640/ 1601]
## Meter ## current loss: 0.529716 [ 1280/ 1601]
Epoch 3
------------------
## Meter ## current loss: 0.151164 [    0/ 1601]
## Meter ## current loss: 0.156962 [  640/ 1601]
## Meter ## current loss: 0.271796 [ 1280/ 1601]
Epoch 4
------------------
## Meter ## current loss: 0.132108 [    0/ 1601]
## Meter ## current loss: 0.124176 [  640/ 1601]
## Meter ## current loss: 0.094285 [ 1280/ 1601]
Epoch 5
------------------
## Meter ## current loss: 0.102055 [    0/ 1601]
## Meter ## current loss: 0.076171 [  640/ 1601]
## Meter ## current loss: 0.113716 [ 1280/ 1601]
Epoch 6
------------------
## Meter ## current loss: 0.055870 [    0/ 1601]
## Meter ## current loss: 0.056029 [  640/ 1601]
## Me

#### 3.1.2 Test on Real Life

In [21]:
loader_target_dataset = DataLoader(real_life_dataset, batch_size=config['batch_size'], shuffle=False)

# model.load_state_dict(torch.load('model_state.pt', map_location='cpu'))
test_loop(loader_target_dataset, model, device)

Test Error: 
 Accuracy: 63.5%, Avg loss: 1.235210 



(1.2352098813280463, 0.635)

In [22]:
del train_dataloader, test_dataloader, loader_target_dataset
del model
print(torch.cuda.memory_allocated())

513480192


### 3.2 Real Life -> Product


#### 3.2.1 Training on Real Life

In [23]:
train_dataloader, test_dataloader = get_dataloader(real_life_dataset, config['batch_size'])

model = DANN(len(real_life_dataset.classes)).to(device)

# Training
training(model, train_dataloader, test_dataloader, config, device)

Epoch 1
------------------
## Meter ## current loss: 3.017269 [    0/ 1601]
## Meter ## current loss: 2.727951 [  640/ 1601]
## Meter ## current loss: 1.711068 [ 1280/ 1601]
Epoch 2
------------------
## Meter ## current loss: 2.275105 [    0/ 1601]
## Meter ## current loss: 1.220824 [  640/ 1601]
## Meter ## current loss: 1.335409 [ 1280/ 1601]
Epoch 3
------------------
## Meter ## current loss: 0.973024 [    0/ 1601]
## Meter ## current loss: 0.608552 [  640/ 1601]
## Meter ## current loss: 0.610510 [ 1280/ 1601]
Epoch 4
------------------
## Meter ## current loss: 0.376343 [    0/ 1601]
## Meter ## current loss: 0.473844 [  640/ 1601]
## Meter ## current loss: 0.344296 [ 1280/ 1601]
Epoch 5
------------------
## Meter ## current loss: 0.477025 [    0/ 1601]
## Meter ## current loss: 0.330870 [  640/ 1601]
## Meter ## current loss: 0.323125 [ 1280/ 1601]
Epoch 6
------------------
## Meter ## current loss: 0.224157 [    0/ 1601]
## Meter ## current loss: 0.283363 [  640/ 1601]
## Me

#### 3.2.2 Testing on Product


In [24]:
loader_target_dataset = DataLoader(product_dataset, batch_size=config['batch_size'], shuffle=False)

# model.load_state_dict(torch.load('model_state.pt', map_location='cpu'))
test_loop(loader_target_dataset, model, device)

Test Error: 
 Accuracy: 80.3%, Avg loss: 0.723673 



(0.7236725512193516, 0.8035)

In [25]:
del train_dataloader, test_dataloader, loader_target_dataset
del model
print(torch.cuda.memory_allocated())

513021440


## 4: Define UDA functions


### 4.1 Adversarial Discriminator Loss

Here we compute the discrimination loss of from the discriminator

In [26]:
def get_discriminator_loss(source_pred, target_pred): 
    """ 
    get_discriminator_loss

    parameters:
        @source_pred: model prediction from the source data
        @target_pred: model prediction from the target data

    return:
        @domain_loss: computed domain loss
    """
    domain_pred = torch.cat((source_pred, target_pred),dim=0).cuda()
    #print(domain_pred.shape) # [128,1024]
    source_truth = torch.zeros(len(source_pred))
    target_truth = torch.ones(len(target_pred))
    domain_truth = torch.cat((source_truth, target_truth),dim=0).cuda()
    #print(domain_truth.shape) # [128]

    domain_loss = domain_truth*torch.log(1/domain_pred)+(1-domain_truth)*torch.log(1/(1-domain_pred))
    domain_loss = domain_loss.mean()

    return domain_loss 

### 4.2 Adversarial optimizer

We are using the Stochastic Gradient Descent optimizer and 
set learning rate for the pre_trained_weights to be 1/10
of other learning rate.

In [27]:
def get_adversarial_optimizer(model, config, progress, adversarial=True):
  '''
  Get Adversarial Optimizers
  '''
  lr, wd, momtm = config['lr'], config['wd'], config['momentum']
  lr = lr / ((1 + config['alpha']*progress)**config['beta'])

  feature_ext   = model.get_submodule("feature_extractor")
  classifier    = model.get_submodule("classifier")
  discriminator = model.get_submodule("discriminator")

  pre_trained_weights   = feature_ext.parameters()
  classifier_weights    = classifier.parameters()
  discriminator_weights = discriminator.parameters()

  feature_optim       = torch.optim.SGD([{'params': pre_trained_weights}],     lr=lr/10, weight_decay=wd, momentum=momtm)
  classifier_optim    = torch.optim.SGD([{'params': classifier_weights}],      lr=lr,    weight_decay=wd, momentum=momtm)
  discriminator_optim = torch.optim.SGD([{'params': discriminator_weights}],   lr=lr,    weight_decay=wd, momentum=momtm)
  
  return feature_optim, classifier_optim, discriminator_optim 

### 4.3 Adversarial Train Loop

Setting the **domain adaptation parameter** according to the original [paper](https://arxiv.org/pdf/1505.07818.pdf) section 5.2.2

$$ \lambda_p = \frac{2}{1 + exp(-\gamma \cdot p)} - 1 $$

where p is the training progress linearly changing from 0 to 1.

So here we optimize the model by calculating the classification loss and discrimination loss. 
Then we optimize the classifier, the discriminator, and the feature extractor based
on the loss we get.

In [28]:
def adversarial_train_loop(source_loader, target_loader, model, config, progress, device):
  """
  parameters:
    @source_loader
    @target_loader
    @model
    @config
    @progress
    @device

  return:
    @best_state
    @best_loss
  """
  size = len(source_loader.dataset)
  
  # cross entropy loss
  classification_loss = get_class_loss_func()

  # Get three optimizer
  feature_optim, class_optim, discriminator_optim = get_adversarial_optimizer(model, config, progress)

  # Target data loader iterator
  iter_target = iter(target_loader)

  domain_adapt = 2 / (1 + math.exp(-config['gamma']*progress)) - 1

  for batch, (X_source, y_source) in enumerate(source_loader):
    try:
      X_target, _ = next(iter_target)
    except:
      iter_target = iter(target_loader)
      X_target, _ = next(iter_target)  

    # Some internal bug return nested tesnor with size 1
    if len(X_source) < 64:
      continue

    X_source, y_source, X_target = X_source.to(device), y_source.to(device), X_target.to(device)

    class_pred_source, domain_pred_source = model(X_source)
    _,                 domain_pred_target = model(X_target)

    class_loss   = classification_loss(class_pred_source, y_source)
    discrim_loss = get_discriminator_loss(domain_pred_source, domain_pred_target)

    feature_optim.zero_grad()

    # Update discriminator
    discriminator_optim.zero_grad()
    discrim_loss.backward(retain_graph=True)
    discriminator_optim.step()

    # Update classifier
    class_optim.zero_grad()
    class_loss.backward(retain_graph=True)
    class_optim.step()

    # Update feature extractor
    feature_optim.step()  

    # Total loss
    total_loss = class_loss - domain_adapt * discrim_loss 

    if batch % 10 == 0:
      class_loss, discrim_loss, current = class_loss.item(), discrim_loss.item(), batch * len(X_source)
      total_loss = total_loss.item()
      # print(f"## Meter  ## [{current:>5d}/{size:>5d}]")
      print(f"## Meter  ## classification loss: {class_loss:>7f} discrim loss: {discrim_loss:>7f} total loss: {total_loss:>7f}[{current:>5d}/{size:>5d}]")

    del class_loss, discrim_loss 
    del X_source, y_source, X_target, class_pred_source, domain_pred_source, domain_pred_target
  
  # return best_state, best_loss

### 4.4 Adversarial Test Loop

In [29]:
def adversarial_test_loop(dataloader, model, device, name=""):
  """ 
  adversarial_test_loop

  Test the model compute the loss and accuracy
  """
  test_loss, correct = 0, 0

  class_loss_func = get_class_loss_func()

  with torch.no_grad():
    for X, y in dataloader:
      X, y = X.to(device), y.to(device)
      class_pred, _ = model(X)

      test_loss += class_loss_func(class_pred, y).item()
      correct += (class_pred.argmax(1) == y).type(torch.float).sum().item()

  size = len(dataloader.dataset)
  num_batches = len(dataloader)

  test_loss /= num_batches
  correct /= size
  print(f"{name} Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

  return test_loss, correct

### 4.5 Adversarial Training

In [30]:
def adversarial_training(model, source_loader, source_test_loader, target_loader, config, device):
  """ 
  adversarial_training

  Training the adversarial model with the config
  """
  no_improve_count = 0

  for epoch in range(config['epochs']):
    print(f"Epoch {epoch+1}\n------------------")
    progress = epoch/config['epochs']

    adversarial_train_loop(source_loader, target_loader, model, config, progress, device)

    source_loss, _ = adversarial_test_loop(source_test_loader, model, device, "Source Test")

  print("Done")

## 5 Training with UDA Techniques

In [31]:
torch.cuda.empty_cache()

In [32]:
adv_model = DANN(len(product_dataset.classes)).to(device)

### 5.1 Product -> Real Life

#### 5.1.1 Training on Product



In [33]:
train_dataloader, train_test_dataloader = get_dataloader(product_dataset, config['batch_size'])
target_dataloader, target_test_dataloader = get_dataloader(real_life_dataset, config['batch_size'])

In [34]:
torch.autograd.set_detect_anomaly(True)
adversarial_training(adv_model, train_dataloader, train_test_dataloader, target_dataloader, config, device)

Epoch 1
------------------
## Meter  ## classification loss: 3.021090 discrim loss: 0.696953 total loss: 3.021090[    0/ 1601]
## Meter  ## classification loss: 2.344038 discrim loss: 0.696019 total loss: 2.344038[  640/ 1601]
## Meter  ## classification loss: 0.490591 discrim loss: 0.695255 total loss: 0.490591[ 1280/ 1601]
Source Test Test Error: 
 Accuracy: 78.9%, Avg loss: 0.787940 

Epoch 2
------------------
## Meter  ## classification loss: 0.333283 discrim loss: 0.696012 total loss: 0.011644[    0/ 1601]
## Meter  ## classification loss: 0.363338 discrim loss: 0.695608 total loss: 0.041885[  640/ 1601]
## Meter  ## classification loss: 0.492627 discrim loss: 0.695451 total loss: 0.171248[ 1280/ 1601]
Source Test Test Error: 
 Accuracy: 91.0%, Avg loss: 0.356655 

Epoch 3
------------------
## Meter  ## classification loss: 0.238965 discrim loss: 0.694827 total loss: -0.290211[    0/ 1601]
## Meter  ## classification loss: 0.219590 discrim loss: 0.694609 total loss: -0.309420[  

#### 5.1.2 Testing on Real Life

In [35]:
loader_target_dataset = DataLoader(real_life_dataset, batch_size=config['batch_size'], shuffle=False)

test_loop(loader_target_dataset, adv_model, device)

Test Error: 
 Accuracy: 63.5%, Avg loss: 1.246334 



(1.2463341187685728, 0.6355)

### 5.2 Real Life -> Product

#### 5.2.1 Training on Real Life

In [36]:
del adv_model

In [37]:
train_dataloader, train_test_dataloader = get_dataloader(real_life_dataset, config['batch_size'])
target_dataloader, target_test_dataloader = get_dataloader(product_dataset, config['batch_size'])

In [38]:
# Training
adv_model = DANN(len(product_dataset.classes)).to(device)
adversarial_training(adv_model, train_dataloader, train_test_dataloader, target_dataloader, config, device)

Epoch 1
------------------
## Meter  ## classification loss: 2.983728 discrim loss: 0.699459 total loss: 2.983728[    0/ 1601]
## Meter  ## classification loss: 2.726490 discrim loss: 0.695359 total loss: 2.726490[  640/ 1601]
## Meter  ## classification loss: 1.870590 discrim loss: 0.694424 total loss: 1.870590[ 1280/ 1601]
Source Test Test Error: 
 Accuracy: 67.7%, Avg loss: 1.198479 

Epoch 2
------------------
## Meter  ## classification loss: 1.087310 discrim loss: 0.695395 total loss: 0.765956[    0/ 1601]
## Meter  ## classification loss: 1.110395 discrim loss: 0.695068 total loss: 0.789192[  640/ 1601]
## Meter  ## classification loss: 0.514551 discrim loss: 0.695518 total loss: 0.193140[ 1280/ 1601]
Source Test Test Error: 
 Accuracy: 72.7%, Avg loss: 0.866361 

Epoch 3
------------------
## Meter  ## classification loss: 0.638297 discrim loss: 0.694762 total loss: 0.109170[    0/ 1601]
## Meter  ## classification loss: 0.457286 discrim loss: 0.694884 total loss: -0.071934[  6

#### 5.2.2 Testing on Product

In [39]:
loader_target_dataset = DataLoader(product_dataset, batch_size=config['batch_size'], shuffle=False)

test_loop(loader_target_dataset, adv_model, device)

Test Error: 
 Accuracy: 80.0%, Avg loss: 0.708086 



(0.7080864875169937, 0.7995)

In [40]:
# del source_dataset, train_dataloader, test_dataloader, target_dataset, loader_target_dataset
del adv_model

## 6 Result Analysis

### 6.1 Training Accuracy and Test Result Comparison

AlexNet(Training) serve as an upper bound of the model accuracy

| Method   | Product -> Real Life | Real Life -> Product | Avg |
|----------|:-------------:|:------:|:---:|
| AlexNet(Training) |  | |
| AlexNet |  63.5% | 80.3%  | |
| DANN(UDA) |    63.5%   |   80.0%  | |
| UDA Gain | | |


- [ ] Accuracy Gain Discuss 
- [ ] Why there is no Gain
- [ ] What potentially could be done to improve the accuracy


### 6.2 Model Selection and Hyperparameter Tuning

### 6.1 Product -> Real Life


#### 6.1.1 Purely training on Source Domain-Product

Without using any domain adaptation techniques, within 10 epochs training, the classifier network achieves  64.6% accuracy on the target domain.

#### 6.1.2 Training on both Source and Target Domains

Using DANN doamin adaptation technique, the classifier with feature extractor trained on both source and target domain achives 64.6% accuracy, which is the same as the one trained solely on the source domain, with no improvement on accuracy.

### 6.2 Real Life -> Product

#### 6.2.1 Purely Training on Source Domain-Real Life

The classifier trained solely on the source domain achives 79.2% accuracy on the target domain.

#### 6.2.2 Training on both Source and Target Domains

With feature extractor trained on both source and target domain, the classifer achives 91.9% accuracy on the target domain, which is about 12% improvement over feature extractor purely trained on source domain.