# Unsupervised Domain Adaptation Project


## 1: Data download
Load the data into the project from Google Drive, then copy a subset of the image classes into two directories:
- `adaptiope_small/product_images`
- `adaptiope_small/real_life`

They hold images from two different domains, **product** and **real_life**.

In [3]:
from os import makedirs, listdir
from tqdm import tqdm

from os.path import join
from shutil import copytree

In [2]:
from google.colab import drive
drive.mount('/content/gdrive')

!mkdir dataset
!cp "gdrive/My Drive/Colab Notebooks/data/Adaptiope.zip" dataset/
# !ls dataset

!unzip -qq dataset/Adaptiope.zip   # unzip file

!rm -rf dataset/Adaptiope.zip 
!rm -rf adaptiope_small

Mounted at /content/gdrive


In [4]:
# !mkdir adaptiope_small
classes = listdir("Adaptiope/product_images")
print(classes)
classes = ["backpack", "bookcase", "car jack", "comb", "crown", "file cabinet", "flat iron", "game controller", "glasses",
           "helicopter", "ice skates", "letter tray", "monitor", "mug", "network switch", "over-ear headphones", "pen",
           "purse", "stand mixer", "stroller"]
domain_classes = ["product_images", "real_life"]
for d, td in zip(["Adaptiope/product_images", "Adaptiope/real_life"], ["adaptiope_small/product_images", "adaptiope_small/real_life"]):
  # makedirs(td)
  for c in tqdm(classes):
    c_path = join(d, c)
    c_target = join(td, c)
    copytree(c_path, c_target)

['sword', 'boxing gloves', 'ring binder', 'brachiosaurus', 'electric guitar', 'hand mixer', 'letter tray', 'stapler', 'diving fins', 'nail clipper', 'snow shovel', 'toilet brush', 'golf club', 'handgun', 'pipe wrench', 'spatula', 'ruler', 'power strip', 'wallet', 'computer mouse', 'acoustic guitar', 'mug', 'roller skates', 'glasses', 'compass', 'power drill', 'trash can', 'over-ear headphones', 'grill', 'bottle', 'fan', 'smartphone', 'smoking pipe', 'phonograph', 'ice skates', 'purse', 'mixing console', 'in-ear headphones', 'electric shaver', 'calculator', 'game controller', 'pen', 'keyboard', 'flat iron', 'sewing machine', 'bicycle helmet', 'car jack', 'hair dryer', 'printer', 'tyrannosaurus', 'vacuum cleaner', 'baseball bat', 'shower head', 'file cabinet', 'microwave', 'wheelchair', 'comb', 'computer', 'lawn mower', 'motorbike helmet', 'drum set', 'rubber boat', 'wristwatch', 'bicycle', 'pogo stick', 'ice cube tray', 'syringe', 'scooter', 'axe', 'crown', 'toothbrush', 'rifle', 'netwo

100%|██████████| 20/20 [00:03<00:00,  6.57it/s]
100%|██████████| 20/20 [00:05<00:00,  3.72it/s]


In [5]:
product_path = 'adaptiope_small/product_images'
real_life_path = 'adaptiope_small/real_life'

## 2: Domain-Adversarial training of Neural Network
We implement the DANN method for unsupervised domain adaptation (UDA): [DANN](https://arxiv.org/pdf/1505.07818.pdf).

 

### 2.0: Import Libraries and Data Loading


In [6]:
from PIL import Image
from os.path import join
import math

img = Image.open(join(product_path, 'backpack', 'backpack_003.jpg'))
print('Image size: ', img.size)
#img

Image size:  (679, 679)


import libraries

In [7]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import softmax
from torchvision import transforms
from torchvision.datasets import ImageFolder
from torchvision.models import vgg11, alexnet 
from torch.utils.data import DataLoader, random_split
from torchvision.transforms.transforms import ToTensor

configuration constants

In [8]:
img_size = 256
# mean, std used by pre-trained models from PyTorch
mean, std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
config = dict(epochs=10, batch_size=64,lr=0.01, wd=0.001, momentum=0.9, alpha=10, beta=0.75, gamma=10)

Configure GPU

In [9]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)

cuda:0


In [10]:
def get_dataset(root_path):
  '''
    Build an ImageFolder dataset from a data path

    # parameters:
        root_path: path to image folder

    # return: ImageFolder dataset with resize, center-crop and normalization transforms
  '''
  # Construct image transform
  image_transform = transforms.Compose([
    transforms.Resize(img_size),
    transforms.CenterCrop(img_size),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
  ])

  # Load data from filesystem
  image_dataset = ImageFolder(root_path, transform=image_transform)

  return image_dataset

def get_dataloader(dataset, batch_size, shuffle_train=True, shuffle_test=False):
  '''
    Split a dataset 80/20 and wrap both parts in DataLoaders

    # parameters:
        dataset: ImageFolder instance
        batch_size: batch_size for DataLoader
        shuffle_train: whether to shuffle training data
        shuffle_test: whether to shuffle test data

    # return: loader_train, loader_test
  '''
  # Get train, test number
  num_total = len(dataset)
  num_train = int(num_total * 0.8 + 1)
  num_test  = num_total - num_train

  # random split dataset
  data_train, data_test = random_split(dataset, [num_train, num_test])

  # initialize dataloaders
  loader_train = DataLoader(data_train, batch_size=batch_size, shuffle=shuffle_train)
  loader_test  = DataLoader(data_test, batch_size=batch_size, shuffle=shuffle_test)

  return loader_train, loader_test
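
As a quick check of the split arithmetic in `get_dataloader`, the sketch below reproduces the size computation in plain Python. The helper name `split_sizes` and the dataset size of 2000 are assumptions made for illustration; 2000 is consistent with the 1601 training samples that appear in the training logs of this notebook.

```python
def split_sizes(num_total, train_fraction=0.8):
    """Mirror the train/test size arithmetic used in get_dataloader."""
    num_train = int(num_total * train_fraction + 1)
    num_test = num_total - num_train
    return num_train, num_test

# 2000 is an assumed dataset size, not read from disk
print(split_sizes(2000))  # prints (1601, 399)
```

With a batch size of 64, 1601 training samples leave a final batch of a single image, which matters for any code that assumes full batches.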

### 2.1 Define Feature Extractor with Pretrained Network

In [11]:
alexnet

<function torchvision.models.alexnet.alexnet>

In [12]:
class FeatureExtractor(nn.Module):
  def __init__(self, pretrained=True):
    super(FeatureExtractor, self).__init__()

    self.feature_extractor = alexnet(pretrained=pretrained)
    self.feature_dim = self.feature_extractor.classifier[-1].in_features

    # make the last layer identity
    self.feature_extractor.classifier[-1] = nn.Identity()

  def forward(self, x):
    return self.feature_extractor(x)
  
  def output_dim(self):
    return self.feature_dim

### 2.2 Define Classifier, Discriminator with ReverseLayerF for training the Feature Extractor

In [13]:
class Classifier(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(Classifier, self).__init__()
        self.classifier = nn.Sequential(
            nn.Linear(input_dim, 1024),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(1024, 512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(512, output_dim)
        )
    
    def forward(self, X):
        return self.classifier(X) 

In [14]:
class Discriminator(nn.Module):
    def __init__(self, input_dim):
        super(Discriminator, self).__init__()
        self.discriminator =  nn.Sequential(
            nn.Linear(int(input_dim), 1024),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(1024,1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Sigmoid() # binary classification
        )

    def forward(self, x):
        validity = self.discriminator(x)
        return validity 

In [15]:
from torch.autograd import Function

class ReverseLayerF(Function):
    @staticmethod
    def forward(ctx, tensor):
        return tensor.view_as(tensor)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg(), None

In [17]:
class DANN(nn.Module):
  # def __init__(self, num_classes, adversarial=True):
  def __init__(self, num_classes, pretrained=True):
    super(DANN, self).__init__()
    self.output_dim = num_classes

    # define inner network component
    self.feature_extractor = FeatureExtractor(pretrained=pretrained) # 4096
    self.classifier = Classifier(self.feature_extractor.output_dim(), num_classes) # 4096 -> num_classes, 20, target ++
    self.discriminator = Discriminator(self.feature_extractor.output_dim())   # 4096 -> 1 -> adversarial --
  
  def forward(self, x):
    feature_output = self.feature_extractor(x)

    class_pred = self.classifier(feature_output)

    # Add a ReverseLayer here for negative gradient computation
    reverse_feature = ReverseLayerF.apply(feature_output)  
    domain_pred = self.discriminator(reverse_feature)

    return class_pred, domain_pred 
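
For reference, the gradient reversal used in `forward` can be written as a pseudo-function. The notation follows the DANN paper; the scaling factor $\lambda$ is the paper's, whereas `ReverseLayerF` as defined in this notebook negates the gradient without scaling, which corresponds to $\lambda = 1$ under this reading of the code:

$$ R_\lambda(x) = x, \qquad \frac{dR_\lambda}{dx} = -\lambda I $$

Writing $f$ for the extracted features, $\theta_f$ for the feature extractor parameters and $L_d$ for the domain loss, the gradient that reaches the feature extractor through the discriminator branch is

$$ -\lambda \, \frac{\partial L_d}{\partial f} \, \frac{\partial f}{\partial \theta_f} $$

so a descent step on the discriminator parameters lowers $L_d$, while the same backward pass pushes the feature extractor towards raising it.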

### 2.3 Cost function

In [18]:
class BinaryDiceLoss(nn.Module):
    r"""Dice loss of binary class
    Args:
        smooth: A float number to smooth loss, and avoid NaN error, default: 1
        p: Denominator value: \sum{x^p} + \sum{y^p}, default: 2
        predict: A tensor of shape [N, *]
        target: A tensor of shape same with predict
        reduction: Reduction method to apply, return mean over batch if 'mean',
            return sum if 'sum', return a tensor of shape [N,] if 'none'
    Returns:
        Loss tensor according to arg reduction
    Raise:
        Exception if unexpected reduction
    """
    def __init__(self, smooth=1, p=2, reduction='mean'):
        super(BinaryDiceLoss, self).__init__()
        self.smooth = smooth
        self.p = p
        self.reduction = reduction

    def forward(self, predict, target):
        assert predict.shape[0] == target.shape[0], "predict & target batch size don't match"
        predict = predict.contiguous().view(predict.shape[0], -1)
        target = target.contiguous().view(target.shape[0], -1)

        num = torch.sum(torch.mul(predict, target), dim=1) + self.smooth
        den = torch.sum(predict.pow(self.p) + target.pow(self.p), dim=1) + self.smooth

        loss = 1 - num / den

        if self.reduction == 'mean':
            return loss.mean()
        elif self.reduction == 'sum':
            return loss.sum()
        elif self.reduction == 'none':
            return loss
        else:
            raise Exception('Unexpected reduction {}'.format(self.reduction))

In [19]:
import torch.nn.functional as F
class DiceLoss(nn.Module):
    """Dice loss, need one hot encode input
    Args:
        weight: An array of shape [num_classes,]
        ignore_index: class index to ignore
        predict: A tensor of shape [N, C, *]
        target: A tensor of same shape with predict
        other args pass to BinaryDiceLoss
    Return:
        same as BinaryDiceLoss
    """
    def __init__(self, weight=None, ignore_index=None, **kwargs):
        super(DiceLoss, self).__init__()
        self.kwargs = kwargs
        self.weight = weight
        self.ignore_index = ignore_index

    def forward(self, predict, target):
        # one hot encode input
        num_class = predict.shape[1]
        # one hot
        target = F.one_hot(target, num_classes=num_class)
        
        assert predict.shape == target.shape, 'predict & target shape do not match'
        dice = BinaryDiceLoss(**self.kwargs)
        total_loss = 0
        predict = F.softmax(predict, dim=1)

        for i in range(target.shape[1]):
            if i != self.ignore_index:
                dice_loss = dice(predict[:, i], target[:, i])
                if self.weight is not None:
                    assert self.weight.shape[0] == target.shape[1], \
                        'Expect weight shape [{}], get[{}]'.format(target.shape[1], self.weight.shape[0])
                    dice_loss *= self.weight[i]
                total_loss += dice_loss

        return total_loss/target.shape[1]

In [20]:
def get_class_loss_func(dice_loss=False):
  if dice_loss:
    return DiceLoss()
  else:
    return nn.CrossEntropyLoss()

### 2.4 Optimizer

Setting the **learning rate** according to the original [paper](https://arxiv.org/pdf/1505.07818.pdf) section 5.2.2

$$ \mu_p =  \frac{\mu_0}{(1+\alpha \cdot p)^\beta}$$

where p is the training progress linearly changing from 0 to 1.
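
To get a feel for how quickly the rate decays, the sketch below evaluates the formula over 10 epochs with the values from `config` ($\mu_0 = 0.01$, $\alpha = 10$, $\beta = 0.75$). The function name `annealed_lr` is illustrative only, and progress is taken as `epoch / epochs`, as in the training functions of this notebook.

```python
def annealed_lr(lr0, progress, alpha=10.0, beta=0.75):
    """Learning rate mu_p = mu_0 / (1 + alpha * p) ** beta for progress p in [0, 1]."""
    return lr0 / (1.0 + alpha * progress) ** beta

epochs = 10
for epoch in range(epochs):
    p = epoch / epochs  # progress never reaches 1 with this definition
    print(f"epoch {epoch + 1:2d}  p={p:.1f}  lr={annealed_lr(0.01, p):.6f}")
```

The rate starts at $\mu_0$ for $p = 0$ and decreases monotonically as $p$ grows.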

In [21]:
def get_optimizer(model, config, progress, adversarial=True):
  '''
  Config Optimizer
  '''
  learning_rate = config['lr']
  learning_rate = learning_rate / ((1 + config['alpha']*progress)**config['beta'])

  weight_decay  = config['wd']
  momentum      = config['momentum']

  feature_ext   = model.get_submodule("feature_extractor")
  classifier    = model.get_submodule("classifier")
  discriminator = model.get_submodule("discriminator")

  pre_trained_weights = feature_ext.parameters()

  if adversarial:
    other_weights = list(classifier.parameters()) + list(discriminator.parameters())
  else:
    other_weights = list(classifier.parameters())

  # assign parameter groups to the optimizer: pretrained weights use lr/10, the new heads use the full lr
  optimizer = torch.optim.SGD([
    {'params': pre_trained_weights},
    {'params': other_weights, 'lr': learning_rate}
  ], lr= learning_rate/10, weight_decay=weight_decay, momentum=momentum)
  
  return optimizer

### 2.5 Training Loop and Testing Loop

In [22]:
def train_loop(dataloader, model, device, progress, dice_loss=False):
  """
    Return:
      @best_state: best performance model state parameters
      @best_loss: best performance loss
  """
  size = len(dataloader.dataset)
  loss_fn = get_class_loss_func(dice_loss)

  optimizer = get_optimizer(model, config, progress, adversarial=False)

  best_loss  = float('inf')
  best_state = None

  for batch, (X, y) in enumerate(dataloader):
    X, y = X.to(device), y.to(device)
    
    # compute prediction and loss
    class_pred, _ = model(X)
    # class_pred, domain_pred 

    # classification loss
    loss = loss_fn(class_pred, y)
    
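    # store best state
    curr_loss = loss.item()
    if curr_loss < best_loss:
      best_loss  = curr_loss
      # clone the tensors: state_dict() returns references that later optimizer steps would overwrite
      best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
      print(f"## Update ## best_state with loss: {curr_loss:>7f}")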
    
    # backpropagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if batch % 10 == 0:
      current = batch * len(X)
      print(f"## Meter  ## current loss: {curr_loss:>7f} [{current:>5d}/{size:>5d}]")
    
  return best_state, best_loss

In [23]:
def test_loop(dataloader, model, device, dice_loss=False):
  test_loss, correct = 0, 0
  loss_fn = get_class_loss_func(dice_loss)

  with torch.no_grad():
    for X, y in dataloader:
      X, y = X.to(device), y.to(device)
      class_pred, _ = model(X)

      test_loss += loss_fn(class_pred, y).item()
      correct += (class_pred.argmax(1) == y).type(torch.float).sum().item()

  size = len(dataloader.dataset)
  num_batches = len(dataloader)

  test_loss /= num_batches
  correct /= size
  print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

  return test_loss, correct

### 2.6 Training Function

In [24]:
def training(model, train_dataloader, test_dataloader, config, device, dice_loss=False):
  epochs = config['epochs']
  # print(f"Learning_rate {config['lr']}, weight_decay {config['wd']}")

  best_state, best_loss = None, float('inf')
  no_improve_count = 0

  for epoch in range(epochs):
    print(f"Epoch {epoch+1}\n------------------")
    progress = epoch/epochs

    curr_state, _ = train_loop(train_dataloader, model, device, progress, dice_loss)

    # Test with curr_state
    model.load_state_dict(curr_state)

    test_loss, _ = test_loop(test_dataloader, model, device)

    # store the best performed state parameters
    if test_loss < best_loss:
      no_improve_count = 0

      best_state = curr_state
      best_loss  = test_loss
    else:
      no_improve_count += 1
    
    if no_improve_count > 2:
      print(f"## No Improvement on Test Set, Stopping ##")
      break

  model.load_state_dict(best_state)
  print("Done")
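
The stopping rule in `training` (stop once the test loss has failed to improve more than twice in a row) can be traced on its own. The sketch below is a standalone illustration with made-up loss values; `early_stopping_trace` and its `patience` parameter are hypothetical names, not part of the notebook.

```python
def early_stopping_trace(test_losses, patience=2):
    """Return (best_epoch, stop_epoch) as 0-based indices, following the rule in training()."""
    best_loss, best_epoch, no_improve = float("inf"), None, 0
    for epoch, loss in enumerate(test_losses):
        if loss < best_loss:
            best_loss, best_epoch, no_improve = loss, epoch, 0
        else:
            no_improve += 1
        if no_improve > patience:
            return best_epoch, epoch
    return best_epoch, len(test_losses) - 1

losses = [1.00, 0.90, 0.95, 0.96, 0.97, 0.50]  # made-up values for illustration
print(early_stopping_trace(losses))  # prints (1, 4)
```

In this trace the loop stops at the fifth epoch and never sees the later improvement to 0.50, which is the usual trade-off of a small patience.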


## 3 Training without using Domain Adaptation techniques

### 3.1 Product Domain -> Real Life

In [25]:
# Get dataloader
product_dataset   = get_dataset(product_path)
real_life_dataset = get_dataset(real_life_path)

#### 3.1.1 Training on Source Domain

In [26]:
train_dataloader, test_dataloader = get_dataloader(product_dataset, config['batch_size'])

model = DANN(len(product_dataset.classes)).to(device)

# Training
training(model, train_dataloader, test_dataloader, config, device)

  f"The parameter '{pretrained_param}' is deprecated since 0.13 and will be removed in 0.15, "
Downloading: "https://download.pytorch.org/models/alexnet-owt-7be5be79.pth" to /root/.cache/torch/hub/checkpoints/alexnet-owt-7be5be79.pth


  0%|          | 0.00/233M [00:00<?, ?B/s]

Epoch 1
------------------
## Update ## best_state with loss: 3.017887
## Meter  ## current loss: 3.017887 [    0/ 1601]
## Update ## best_state with loss: 3.009115
## Update ## best_state with loss: 2.962784
## Update ## best_state with loss: 2.910611
## Update ## best_state with loss: 2.884295
## Update ## best_state with loss: 2.818291
## Update ## best_state with loss: 2.719617
## Update ## best_state with loss: 2.669361
## Update ## best_state with loss: 2.518023
## Update ## best_state with loss: 2.474796
## Update ## best_state with loss: 2.452263
## Meter  ## current loss: 2.452263 [  640/ 1601]
## Update ## best_state with loss: 2.167019
## Update ## best_state with loss: 1.939430
## Update ## best_state with loss: 1.771688
## Update ## best_state with loss: 1.713741
## Update ## best_state with loss: 1.373284
## Update ## best_state with loss: 1.130638
## Update ## best_state with loss: 0.735657
## Update ## best_state with loss: 0.696007
## Update ## best_state with loss: 0.

#### 3.1.2 Test on Real Life

In [27]:
loader_target_dataset = DataLoader(real_life_dataset, batch_size=config['batch_size'], shuffle=False)

# model.load_state_dict(torch.load('model_state.pt', map_location='cpu'))
test_loop(loader_target_dataset, model, device)

Test Error: 
 Accuracy: 41.1%, Avg loss: 2.237883 



(2.237883090041578, 0.411)

In [28]:
del train_dataloader, test_dataloader, loader_target_dataset
del model
print(torch.cuda.memory_allocated())

513480192


### 3.2 Real Life -> Product


#### 3.2.1 Training on Real Life

In [29]:
train_dataloader, test_dataloader = get_dataloader(real_life_dataset, config['batch_size'])

model = DANN(len(real_life_dataset.classes)).to(device)

# Training
training(model, train_dataloader, test_dataloader, config, device)

  f"The parameter '{pretrained_param}' is deprecated since 0.13 and will be removed in 0.15, "


Epoch 1
------------------
## Update ## best_state with loss: 2.990183
## Meter  ## current loss: 2.990183 [    0/ 1601]
## Update ## best_state with loss: 2.968288
## Update ## best_state with loss: 2.962405
## Update ## best_state with loss: 2.941779
## Update ## best_state with loss: 2.919952
## Update ## best_state with loss: 2.888187
## Update ## best_state with loss: 2.871280
## Update ## best_state with loss: 2.810315
## Update ## best_state with loss: 2.770933
## Update ## best_state with loss: 2.742910
## Meter  ## current loss: 2.742910 [  640/ 1601]
## Update ## best_state with loss: 2.644434
## Update ## best_state with loss: 2.567697
## Update ## best_state with loss: 2.538021
## Update ## best_state with loss: 2.392668
## Update ## best_state with loss: 2.353948
## Update ## best_state with loss: 2.150758
## Update ## best_state with loss: 1.997384
## Update ## best_state with loss: 1.986576
## Meter  ## current loss: 1.986576 [ 1280/ 1601]
## Update ## best_state with lo

#### 3.2.2 Testing on Product


In [30]:
loader_target_dataset = DataLoader(product_dataset, batch_size=config['batch_size'], shuffle=False)

# model.load_state_dict(torch.load('model_state.pt', map_location='cpu'))
test_loop(loader_target_dataset, model, device)

Test Error: 
 Accuracy: 81.0%, Avg loss: 0.687159 



(0.6871585550834425, 0.81)

In [31]:
del train_dataloader, test_dataloader, loader_target_dataset
del model
print(torch.cuda.memory_allocated())

513021440


### Dice Training

In [32]:
train_dataloader, test_dataloader = get_dataloader(real_life_dataset, config['batch_size'])

model = DANN(len(real_life_dataset.classes)).to(device)
# Training
dice_config = dict(epochs=10, batch_size=256, lr=0.1, wd=0.001, momentum=0.9, alpha=10, beta=0.75, gamma=10)
training(model, train_dataloader, test_dataloader, dice_config, device, dice_loss=True)

  f"The parameter '{pretrained_param}' is deprecated since 0.13 and will be removed in 0.15, "


Epoch 1
------------------
## Update ## best_state with loss: 0.026176
## Meter  ## current loss: 0.026176 [    0/ 1601]
## Update ## best_state with loss: 0.026165
## Update ## best_state with loss: 0.026159
## Meter  ## current loss: 0.026178 [  640/ 1601]
## Update ## best_state with loss: 0.026154
## Meter  ## current loss: 0.026209 [ 1280/ 1601]
## Update ## best_state with loss: 0.026130
Test Error: 
 Accuracy: 5.3%, Avg loss: 2.995358 

Epoch 2
------------------
## Update ## best_state with loss: 0.026222
## Meter  ## current loss: 0.026222 [    0/ 1601]
## Update ## best_state with loss: 0.026156
## Meter  ## current loss: 0.026167 [  640/ 1601]
## Update ## best_state with loss: 0.026155
## Update ## best_state with loss: 0.026150
## Update ## best_state with loss: 0.026135
## Meter  ## current loss: 0.026167 [ 1280/ 1601]
## Update ## best_state with loss: 0.026055
Test Error: 
 Accuracy: 6.0%, Avg loss: 2.991173 

Epoch 3
------------------
## Update ## best_state with loss

### Summary: Dice Loss Results

Dice loss is not helpful for domain adaptation, at least in our case: in the epochs shown above, test accuracy stays near chance level (5.3% and 6.0% with 20 classes). Our original motivation for using Dice loss was to learn more class-shared information. The model may need more carefully tuned hyperparameters to train with Dice loss.
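
One way to look at the flat training loss is to evaluate the Dice formula from `BinaryDiceLoss` and `DiceLoss` by hand. The sketch below is a plain-Python approximation for a single sample; the helper names are hypothetical, and the uniform softmax output for an untrained model is an assumption.

```python
def dice_term(p, t, smooth=1.0):
    """Per-sample, per-class term as in BinaryDiceLoss with p=2 (no factor 2 in the numerator)."""
    return 1.0 - (p * t + smooth) / (p ** 2 + t ** 2 + smooth)

def dice_loss_single(probs, true_class):
    """Average of the per-class terms for one sample, mirroring DiceLoss.forward."""
    terms = [dice_term(p, 1.0 if i == true_class else 0.0) for i, p in enumerate(probs)]
    return sum(terms) / len(probs)

num_classes = 20
uniform = [1.0 / num_classes] * num_classes   # assumed output of an untrained model
perfect = [1.0] + [0.0] * (num_classes - 1)   # ideal prediction for class 0

print(f"uniform: {dice_loss_single(uniform, 0):.6f}")  # prints uniform: 0.026152
print(f"perfect: {dice_loss_single(perfect, 0):.6f}")  # prints perfect: 0.016667
```

Under these assumptions the loss can only move between roughly 0.0262 and 0.0167, in line with the values around 0.026 in the training log. Such a narrow range may contribute to the slow progress, although this is a hypothesis that the run does not confirm.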

## 4: Define UDA functions


### 4.1 Adversarial Discriminator Loss

In [33]:
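def get_discriminator_loss(source_pred, target_pred): 
    # Stack source and target domain predictions and flatten [N, 1] -> [N]
    domain_pred = torch.cat((source_pred, target_pred), dim=0).view(-1)
    # Domain labels: 0 = source, 1 = target, created on the same device as the predictions
    source_truth = torch.zeros(len(source_pred), device=domain_pred.device)
    target_truth = torch.ones(len(target_pred), device=domain_pred.device)
    domain_truth = torch.cat((source_truth, target_truth), dim=0)

    # Binary cross-entropy; with flattened predictions the product is element-wise
    # instead of broadcasting [N, 1] against [N] into an [N, N] matrix
    domain_loss = domain_truth*torch.log(1/domain_pred)+(1-domain_truth)*torch.log(1/(1-domain_pred))
    domain_loss = domain_loss.mean()

    return domain_loss 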
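
The expression in `get_discriminator_loss` is the binary cross-entropy between the domain predictions and the domain labels (0 for source, 1 for target). The plain-Python sketch below, with a hypothetical helper `domain_bce` and made-up predictions, shows two reference points: a discriminator that cannot tell the domains apart and one that separates them well.

```python
import math

def domain_bce(preds, labels):
    """Mean binary cross-entropy; labels are 0 for source and 1 for target."""
    terms = [
        -(d * math.log(p) + (1 - d) * math.log(1 - p))
        for p, d in zip(preds, labels)
    ]
    return sum(terms) / len(terms)

labels = [0, 0, 1, 1]              # two source and two target samples
confused = [0.5, 0.5, 0.5, 0.5]    # discriminator cannot tell the domains apart
confident = [0.1, 0.2, 0.9, 0.8]   # discriminator separates the domains

print(domain_bce(confused, labels))   # equals ln 2
print(domain_bce(confident, labels))  # smaller than ln 2
```

A domain loss that stays close to $\ln 2 \approx 0.693$ during adversarial training suggests that the features carry little domain information, which is what the feature extractor is pushed towards.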

### 4.2 Adversarial optimizer

In [34]:
def get_adversarial_optimizer(model, config, progress, adversarial=True):
  '''
  Get Adversarial Optimizers
  '''
  lr, wd, momtm = config['lr'], config['wd'], config['momentum']
  lr = lr / ((1 + config['alpha']*progress)**config['beta'])

  feature_ext   = model.get_submodule("feature_extractor")
  classifier    = model.get_submodule("classifier")
  discriminator = model.get_submodule("discriminator")

  pre_trained_weights   = feature_ext.parameters()
  classifier_weights    = classifier.parameters()
  discriminator_weights = discriminator.parameters()

  feature_optim       = torch.optim.SGD([{'params': pre_trained_weights}],     lr=lr/10, weight_decay=wd, momentum=momtm)
  # two_head_optim      = torch.optim.SGD([{'params': [discriminator.parameters(), classifier.parameters()]}],    lr=lr, weight_decay=wd, momentum=momtm)
  classifier_optim    = torch.optim.SGD([{'params': classifier_weights}],      lr=lr,    weight_decay=wd, momentum=momtm)
  discriminator_optim = torch.optim.SGD([{'params': discriminator_weights}],   lr=lr,    weight_decay=wd, momentum=momtm)
  
  return feature_optim, classifier_optim, discriminator_optim

### 4.3 Adversarial Train Loop

Setting the **domain adaptation parameter** according to the original [paper](https://arxiv.org/pdf/1505.07818.pdf) section 5.2.2

$$ \lambda_p = \frac{2}{1 + \exp(-\gamma \cdot p)} - 1 $$

where p is the training progress linearly changing from 0 to 1.
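
The schedule can be evaluated directly to see how fast the adaptation weight ramps up. This is a small sketch with $\gamma = 10$ as in `config`; the function name `domain_adapt_factor` is illustrative only.

```python
import math

def domain_adapt_factor(progress, gamma=10.0):
    """lambda_p = 2 / (1 + exp(-gamma * p)) - 1, rising from 0 towards 1."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0

for p in (0.0, 0.1, 0.5, 1.0):
    print(f"p={p:.1f}  lambda={domain_adapt_factor(p):.4f}")
```

The factor is exactly 0 at the start of training, so the domain term has no weight early on, and it approaches 1 as training progresses.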

In [36]:
def adversarial_train_loop(source_loader, target_loader, model, config, progress, device):
  """
  return:
    @best_state
    @best_loss
  """
  size = len(source_loader.dataset)
  
  # cross entropy loss
  classification_loss = get_class_loss_func()

  # Get the three optimizers (feature extractor, classifier, discriminator)
  feature_optim, class_optim, discriminator_optim = get_adversarial_optimizer(model, config, progress)

  # Target data loader iterator
  iter_target = iter(target_loader)

  domain_adapt = 2 / (1 + math.exp(-config['gamma']*progress)) - 1

  best_loss, best_state = float('inf'), None

  for batch, (X_source, y_source) in enumerate(source_loader):
    try:
      X_target, _ = next(iter_target)
    except StopIteration:
      # target loader exhausted: restart it
      iter_target = iter(target_loader)
      X_target, _ = next(iter_target)  

    # Skip the last, incomplete source batch (it can contain a single sample)
    if len(X_source) < config['batch_size']:
      continue

    X_source, y_source, X_target = X_source.to(device), y_source.to(device), X_target.to(device)

    class_pred_source, domain_pred_source = model(X_source)
    _,                 domain_pred_target = model(X_target)

    class_loss   = classification_loss(class_pred_source, y_source)
    discrim_loss = get_discriminator_loss(domain_pred_source, domain_pred_target)
    
     
    feature_optim.zero_grad()

    # Update discriminator
    discriminator_optim.zero_grad()
    discrim_loss.backward(retain_graph=True)
    discriminator_optim.step()

    # Update classifier
    class_optim.zero_grad()
    class_loss.backward(retain_graph=True)
    class_optim.step()

    # Update feature extractor
    feature_optim.step()  

    # Total loss
    total_loss = class_loss - domain_adapt * discrim_loss 

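    if total_loss < best_loss:
      best_loss = total_loss.detach()
      # clone the tensors: state_dict() returns references that later optimizer steps would overwrite
      best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
      print(f"## Update ## best_state updated with loss: {total_loss:>7f}")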

    if batch % 10 == 0:
      class_loss, discrim_loss, current = class_loss.item(), discrim_loss.item(), batch * len(X_source)
      total_loss = total_loss.item()
      print(f"## Meter  ## [{current:>5d}/{size:>5d}]")
      # print(f"## Meter  ## classification loss: {class_loss:>7f} discrim loss: {discrim_loss:>7f} total loss: {total_loss:>7f}[{current:>5d}/{size:>5d}]")

    del class_loss, discrim_loss 
    del X_source, y_source, X_target, class_pred_source, domain_pred_source, domain_pred_target
  
  return best_state, best_loss

### 4.4 Adversarial Test Loop

In [37]:
def adversarial_test_loop(dataloader, model, device, name=""):
  test_loss, correct = 0, 0

  class_loss_func = get_class_loss_func()

  with torch.no_grad():
    for X, y in dataloader:
      X, y = X.to(device), y.to(device)
      class_pred, _ = model(X)

      test_loss += class_loss_func(class_pred, y).item()
      correct += (class_pred.argmax(1) == y).type(torch.float).sum().item()

  size = len(dataloader.dataset)
  num_batches = len(dataloader)

  test_loss /= num_batches
  correct /= size
  print(f"{name} Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

  return test_loss, correct

### 4.5 Adversarial Training

In [38]:
def adversarial_training(model, source_loader, source_test_loader, target_loader, config, device):
  # print(f"Learning_rate {config['lr']}, weight_decay {config['wd']}")
  best_loss, best_state = float('inf'), None
  no_improve_count = 0

  for epoch in range(config['epochs']):
    print(f"Epoch {epoch+1}\n------------------")
    progress = epoch/config['epochs']

    curr_state, _ = adversarial_train_loop(source_loader, target_loader, model, config, progress, device)

    # Load the best state
    model.load_state_dict(curr_state)

    source_loss, _ = adversarial_test_loop(source_test_loader, model, device, "Source Test")
    target_loss, _ = adversarial_test_loop(target_loader, model, device, "Target Train")
    
    if target_loss < best_loss:
      no_improve_count = 0

      best_loss = target_loss
      best_state = curr_state
    else:
      no_improve_count += 1

    if no_improve_count > 2:
      print(f"## No Improvement on Target Set, Stopping ##")
      break

  model.load_state_dict(best_state)
  print("Done")

## 5 Training with UDA Techniques

In [39]:
torch.cuda.empty_cache()

In [40]:
adv_model = DANN(len(product_dataset.classes)).to(device)

  f"The parameter '{pretrained_param}' is deprecated since 0.13 and will be removed in 0.15, "


### 5.1 Product -> Real Life

#### 5.1.1 Training on Product

In [41]:
train_dataloader, train_test_dataloader = get_dataloader(product_dataset, config['batch_size'])
target_dataloader, target_test_dataloader = get_dataloader(real_life_dataset, config['batch_size'])

In [42]:
torch.autograd.set_detect_anomaly(True)
adversarial_training(adv_model, train_dataloader, train_test_dataloader, target_dataloader, config, device)

Epoch 1
------------------
## Update ## best_state updated with loss: 3.016792
## Meter  ## [    0/ 1601]
## Update ## best_state updated with loss: 2.975507
## Update ## best_state updated with loss: 2.933060
## Update ## best_state updated with loss: 2.903765
## Update ## best_state updated with loss: 2.838359
## Update ## best_state updated with loss: 2.767416
## Update ## best_state updated with loss: 2.681646
## Update ## best_state updated with loss: 2.552303
## Update ## best_state updated with loss: 2.518499
## Update ## best_state updated with loss: 2.309509
## Meter  ## [  640/ 1601]
## Update ## best_state updated with loss: 2.241719
## Update ## best_state updated with loss: 1.952263
## Update ## best_state updated with loss: 1.790956
## Update ## best_state updated with loss: 1.665749
## Update ## best_state updated with loss: 1.433195
## Update ## best_state updated with loss: 1.162809
## Update ## best_state updated with loss: 0.853084
## Update ## best_state updated wit

#### 5.1.2 Testing on Real Life

In [64]:
loader_target_dataset = DataLoader(real_life_dataset, batch_size=config['batch_size'], shuffle=False)

test_loop(loader_target_dataset, adv_model, device)

Test Error: 
 Accuracy: 63.4%, Avg loss: 1.221483 



(1.2214834745973349, 0.634)

### 5.2 Real Life -> Product

#### 5.2.1 Training on Real Life

In [44]:
train_dataloader, train_test_dataloader = get_dataloader(real_life_dataset, config['batch_size'])
target_dataloader, target_test_dataloader = get_dataloader(product_dataset, config['batch_size'])

In [45]:
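# Training. Note: adv_model is not re-created in this cell, so unless a fresh
# DANN is instantiated beforehand, it continues from the weights trained in 5.1.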
adversarial_training(adv_model, train_dataloader, train_test_dataloader, target_dataloader, config, device)

Epoch 1
------------------
## Update ## best_state updated with loss: 1.140018
## Meter  ## [    0/ 1601]
## Update ## best_state updated with loss: 0.757911
## Meter  ## [  640/ 1601]
## Update ## best_state updated with loss: 0.687860
## Update ## best_state updated with loss: 0.648907
## Meter  ## [ 1280/ 1601]
Source Test Test Error: 
 Accuracy: 71.7%, Avg loss: 0.893429 

Target Train Test Error: 
 Accuracy: 86.0%, Avg loss: 0.430494 

Epoch 2
------------------
## Update ## best_state updated with loss: 0.294673
## Meter  ## [    0/ 1601]
## Update ## best_state updated with loss: 0.060093
## Meter  ## [  640/ 1601]
## Update ## best_state updated with loss: -0.033203
## Meter  ## [ 1280/ 1601]
Source Test Test Error: 
 Accuracy: 74.7%, Avg loss: 0.760008 

Target Train Test Error: 
 Accuracy: 90.2%, Avg loss: 0.309493 

Epoch 3
------------------
## Update ## best_state updated with loss: -0.309140
## Meter  ## [    0/ 1601]
## Update ## best_state updated with loss: -0.349268
#

#### 5.2.2 Testing on Product

In [46]:
loader_target_dataset = DataLoader(product_dataset, batch_size=config['batch_size'], shuffle=False)

test_loop(loader_target_dataset, adv_model, device)

Test Error: 
 Accuracy: 90.7%, Avg loss: 0.301166 



(0.30116607186573674, 0.907)

In [47]:
# del source_dataset, train_dataloader, test_dataloader, target_dataset, loader_target_dataset
del adv_model

## 6 Summary

### 6.1 Product -> Real Life


#### 6.1.1 Purely Training on Source Domain (Product)

Without any domain adaptation technique, the classifier network reaches 64.6% accuracy on the target domain after 10 epochs of training.

#### 6.1.2 Training on both Source and Target Domains

Using the DANN domain adaptation technique, the classifier with a feature extractor trained on both the source and target domains achieves 64.6% accuracy. This is the same as the classifier trained solely on the source domain, so there is no improvement in accuracy.

### 6.2 Real Life -> Product

#### 6.2.1 Purely Training on Source Domain (Real Life)

The classifier trained solely on the source domain achieves 79.2% accuracy on the target domain.

#### 6.2.2 Training on both Source and Target Domains

With the feature extractor trained on both the source and target domains, the classifier achieves 91.9% accuracy on the target domain, an improvement of 12.7 percentage points over the feature extractor trained purely on the source domain.

## 7 UDA Ablation Study

### Dice Loss

Please refer to the dice loss discussion in the section above.

### How Pretraining Helps

In [48]:
train_dataloader, train_test_dataloader = get_dataloader(product_dataset, config['batch_size'])
target_dataloader, target_test_dataloader = get_dataloader(real_life_dataset, config['batch_size'])
adv_model = DANN(len(product_dataset.classes), pretrained=False).to(device)
torch.autograd.set_detect_anomaly(True)
adversarial_training(adv_model, train_dataloader, train_test_dataloader, target_dataloader, config, device)

  f"The parameter '{pretrained_param}' is deprecated since 0.13 and will be removed in 0.15, "


Epoch 1
------------------
## Update ## best_state updated with loss: 2.992111
## Meter  ## [    0/ 1601]
## Update ## best_state updated with loss: 2.991035
## Meter  ## [  640/ 1601]
## Meter  ## [ 1280/ 1601]
Source Test Test Error: 
 Accuracy: 6.0%, Avg loss: 2.996106 

Target Train Test Error: 
 Accuracy: 5.1%, Avg loss: 2.997126 

Epoch 2
------------------
## Update ## best_state updated with loss: 2.674066
## Meter  ## [    0/ 1601]
## Update ## best_state updated with loss: 2.673329
## Update ## best_state updated with loss: 2.672538
## Meter  ## [  640/ 1601]
## Update ## best_state updated with loss: 2.670534
## Meter  ## [ 1280/ 1601]
Source Test Test Error: 
 Accuracy: 5.3%, Avg loss: 2.996476 

Target Train Test Error: 
 Accuracy: 4.7%, Avg loss: 2.996897 

Epoch 3
------------------
## Update ## best_state updated with loss: 2.472155
## Meter  ## [    0/ 1601]
## Update ## best_state updated with loss: 2.466030
## Meter  ## [  640/ 1601]
## Update ## best_state updated w

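Adversarial training tries to prevent the model from learning and relying on domain-specific knowledge. If the network is not pretrained, however, it starts with no useful features at all, so it has to learn the classification task from scratch while also being pushed towards domain-invariant features. This makes learning harder and much slower: in the epochs shown above, accuracy stays around 5% on both the source and the target domain.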
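As a sanity check on the log above, the sketch below computes the accuracy and cross-entropy of a classifier that predicts a uniform distribution over the classes. The class count of 20 is taken from the `classes` list in section 1. It assumes the logged loss is a plain cross-entropy, which this section does not show.

```python
import math

num_classes = 20  # size of the class subset copied in section 1

# A uniform prediction assigns probability 1 / num_classes to every class.
chance_accuracy = 1.0 / num_classes
chance_cross_entropy = -math.log(1.0 / num_classes)

print(f"chance accuracy:      {chance_accuracy:.1%}")
print(f"chance cross-entropy: {chance_cross_entropy:.4f}")
```

Chance level is 5% accuracy and a cross-entropy of ln(20), roughly 2.996. The unpretrained model logs 4.7% to 6.0% accuracy and an average loss of about 2.996, so it has not yet moved away from chance in the epochs shown.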

### How UDA Differs Across Feature Levels

To apply adversarial training to mid-level features, we **copy** the AlexNet definition from torchvision and modify `forward` so that it also returns the flattened output of the convolutional `features` block.

In [49]:
class AlexNet(nn.Module):
    def __init__(self, num_classes: int = 1000, dropout: float = 0.5) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )  # flattened output is 256 * H * W: 12544 = 256 * 7 * 7 (e.g. 256x256 inputs); 224x224 inputs give 256 * 6 * 6 = 9216
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            nn.Dropout(p=dropout),
            nn.Linear(256 * 6 * 6, 4096), 
            nn.ReLU(inplace=True),
            nn.Dropout(p=dropout),
            nn.Linear(4096, 4096),  # original 4096
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feature = self.features(x)
        x = self.avgpool(feature)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        feature = feature.view(feature.size(0), -1)
        # return x, x
        return x, feature
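The hard-coded discriminator input size of 12544 can be checked by hand. The sketch below walks through the spatial size of the `features` block using the usual convolution and pooling size formula with floor rounding. The helper names are illustrative, and the 256x256 input size is an assumption, since the image transform is defined elsewhere in the notebook.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a Conv2d/MaxPool2d layer (floor rounding, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def alexnet_feature_size(input_size: int) -> int:
    """Flattened length of the `features` output for a square input image."""
    s = conv_out(input_size, kernel=11, stride=4, padding=2)  # conv1
    s = conv_out(s, kernel=3, stride=2)                       # maxpool
    s = conv_out(s, kernel=5, padding=2)                      # conv2
    s = conv_out(s, kernel=3, stride=2)                       # maxpool
    for _ in range(3):                                        # conv3 to conv5
        s = conv_out(s, kernel=3, padding=1)
    s = conv_out(s, kernel=3, stride=2)                       # maxpool
    return 256 * s * s                                        # 256 output channels


for size in (224, 256):
    print(size, alexnet_feature_size(size))
```

With 224x224 inputs the block yields a 6x6 map (9216 values), while 256x256 inputs yield a 7x7 map (12544 values). Because `feature` is taken before `avgpool`, its length depends on the input resolution, so the value 12544 only holds for input sizes that produce a 7x7 map.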


In [50]:
class FeatureExtractor(nn.Module):
  def __init__(self, pretrained=True):
    super(FeatureExtractor, self).__init__()
    state_dict = alexnet(pretrained=pretrained).state_dict()
    self.feature_extractor = AlexNet()
    self.feature_extractor.load_state_dict(state_dict)
    self.feature_dim = self.feature_extractor.classifier[-1].in_features
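    # flattened size of the conv `features` output (256 * 7 * 7); hard-coded,
    # so it only holds for input resolutions that produce a 7x7 feature map
    self.adv_feature_dim = 12544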
    print(self.feature_dim, self.adv_feature_dim)
    # print(f"Feature dimension: {self.feature_dim}")
    # make the last layer identity
    self.feature_extractor.classifier[-1] = nn.Identity()

  def forward(self, x):
    out = self.feature_extractor(x)
    return out
  
  def output_dim(self):
    return self.feature_dim
  
  def adv_output_dim(self):
    return self.adv_feature_dim

In [51]:
class Discriminator(nn.Module):
    def __init__(self, input_dim):
        super(Discriminator, self).__init__()
        self.discriminator =  nn.Sequential(
            nn.Linear(int(input_dim), 1024),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(1024,1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Sigmoid()
        )

    def forward(self, x):
        validity = self.discriminator(x)
        return validity 

In [52]:
class DANN(nn.Module):
  # def __init__(self, num_classes, adversarial=True):
  def __init__(self, num_classes, pretrained=True):
    super(DANN, self).__init__()
    self.output_dim = num_classes

    # define inner network component
    self.feature_extractor = FeatureExtractor(pretrained=pretrained)
    self.classifier = Classifier(self.feature_extractor.output_dim(), num_classes)
    self.discriminator = Discriminator(self.feature_extractor.adv_output_dim())  
  
  def forward(self, x):
    # 4096, 12544
    feature_output, adv_feature = self.feature_extractor(x)
    
    class_pred = self.classifier(feature_output)

    # Add a ReverseLayer here for negative gradient computation
    reverse_feature = ReverseLayerF.apply(adv_feature)
    domain_pred = self.discriminator(reverse_feature)

    return class_pred, domain_pred 
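`ReverseLayerF` is defined earlier in the notebook and is not reproduced here. As a toy illustration of what a gradient-reversal layer does, the sketch below works through a scalar case by hand: a one-weight feature extractor `w`, a one-weight discriminator `v`, and a reversal strength `lam`. All names and values are hypothetical, and this is not the notebook's implementation.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def domain_loss(w: float, v: float, x: float, y: float) -> float:
    """Binary cross-entropy of a one-weight discriminator on the feature w * x."""
    p = sigmoid(v * (w * x))
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def grads_with_reversal(w, v, x, y, lam):
    """Return (grad for v, grad for w) with a reversal layer between them."""
    f = w * x                     # feature extractor output
    p = sigmoid(v * f)            # discriminator prediction
    dz = p - y                    # dL/dz for sigmoid followed by BCE
    grad_v = dz * f               # discriminator receives the ordinary gradient
    grad_f = dz * v               # ordinary gradient w.r.t. the feature
    grad_w = (-lam * grad_f) * x  # reversal layer flips the sign and scales by lam
    return grad_v, grad_w


w, v, x, y, lam, lr = 0.5, 2.0, 1.0, 1.0, 1.0, 0.1
before = domain_loss(w, v, x, y)
grad_v, grad_w = grads_with_reversal(w, v, x, y, lam)

# One gradient-descent step on each parameter, taken separately.
loss_after_disc_step = domain_loss(w, v - lr * grad_v, x, y)
loss_after_feat_step = domain_loss(w - lr * grad_w, v, x, y)
print(before, loss_after_disc_step, loss_after_feat_step)
```

The discriminator step lowers the domain loss, while the same optimizer step applied to the feature weight raises it. This is the intended effect: the feature extractor is driven towards features the discriminator cannot separate by domain.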

#### Product -> Real Life

In [53]:
train_dataloader, train_test_dataloader = get_dataloader(product_dataset, config['batch_size'])
target_dataloader, target_test_dataloader = get_dataloader(real_life_dataset, config['batch_size'])
adv_model = DANN(len(product_dataset.classes), pretrained=True).to(device)
torch.autograd.set_detect_anomaly(True)
adversarial_training(adv_model, train_dataloader, train_test_dataloader, target_dataloader, config, device)

  f"The parameter '{pretrained_param}' is deprecated since 0.13 and will be removed in 0.15, "


4096 12544
Epoch 1
------------------
## Update ## best_state updated with loss: 3.043845
## Meter  ## [    0/ 1601]
## Update ## best_state updated with loss: 2.998204
## Update ## best_state updated with loss: 2.971330
## Update ## best_state updated with loss: 2.917037
## Update ## best_state updated with loss: 2.868568
## Update ## best_state updated with loss: 2.748608
## Update ## best_state updated with loss: 2.717655
## Update ## best_state updated with loss: 2.674238
## Update ## best_state updated with loss: 2.527443
## Update ## best_state updated with loss: 2.440267
## Update ## best_state updated with loss: 2.204468
## Meter  ## [  640/ 1601]
## Update ## best_state updated with loss: 2.135589
## Update ## best_state updated with loss: 1.967083
## Update ## best_state updated with loss: 1.820502
## Update ## best_state updated with loss: 1.471345
## Update ## best_state updated with loss: 1.295348
## Update ## best_state updated with loss: 1.132693
## Update ## best_state 

In [63]:
loader_target_dataset = DataLoader(real_life_dataset, batch_size=config['batch_size'], shuffle=False)

test_loop(loader_target_dataset, adv_model, device)

Test Error: 
 Accuracy: 64.0%, Avg loss: 1.236698 



(1.2366984952241182, 0.64)

#### Real Life -> Product


In [65]:
config = dict(epochs=10, batch_size=64, lr=0.01, wd=0.001, momentum=0.9, alpha=10, beta=0.75, gamma=10)
train_dataloader, train_test_dataloader = get_dataloader(real_life_dataset, config['batch_size'])
target_dataloader, target_test_dataloader = get_dataloader(product_dataset, config['batch_size'])
adv_model = DANN(len(real_life_dataset.classes), pretrained=True).to(device)
torch.autograd.set_detect_anomaly(True)
adversarial_training(adv_model, train_dataloader, train_test_dataloader, target_dataloader, config, device)

  f"The parameter '{pretrained_param}' is deprecated since 0.13 and will be removed in 0.15, "


4096 12544
Epoch 1
------------------
## Update ## best_state updated with loss: 2.984358
## Meter  ## [    0/ 1601]
## Update ## best_state updated with loss: 2.974941
## Update ## best_state updated with loss: 2.974074
## Update ## best_state updated with loss: 2.964523
## Update ## best_state updated with loss: 2.937143
## Update ## best_state updated with loss: 2.914748
## Update ## best_state updated with loss: 2.902528
## Update ## best_state updated with loss: 2.827305
## Update ## best_state updated with loss: 2.759071
## Update ## best_state updated with loss: 2.720922
## Meter  ## [  640/ 1601]
## Update ## best_state updated with loss: 2.629232
## Update ## best_state updated with loss: 2.615841
## Update ## best_state updated with loss: 2.553625
## Update ## best_state updated with loss: 2.432843
## Update ## best_state updated with loss: 2.409011
## Update ## best_state updated with loss: 2.330181
## Update ## best_state updated with loss: 2.208629
## Update ## best_state 

In [67]:
loader_target_dataset = DataLoader(product_dataset, batch_size=config['batch_size'], shuffle=False)

test_loop(loader_target_dataset, adv_model, device)

Test Error: 
 Accuracy: 80.8%, Avg loss: 0.634504 



(0.6345044905901887, 0.8075)
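The `alpha`, `beta` and `gamma` entries in `config` have the same names and values (10, 0.75, 10) as the annealing schedule described in the original DANN paper (Ganin et al.). Assuming `adversarial_training` uses them in that way, which this section does not show, the learning rate and the adversarial weight would evolve roughly as sketched below. The function names and the per-epoch progress variable `p` are illustrative.

```python
import math

config = dict(epochs=10, lr=0.01, alpha=10, beta=0.75, gamma=10)


def dann_lr(progress: float, lr0: float, alpha: float, beta: float) -> float:
    """Annealed learning rate lr0 / (1 + alpha * p) ** beta, with p in [0, 1]."""
    return lr0 / (1.0 + alpha * progress) ** beta


def dann_lambda(progress: float, gamma: float) -> float:
    """Adversarial weight 2 / (1 + exp(-gamma * p)) - 1, rising from 0 towards 1."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


for epoch in range(config["epochs"]):
    p = epoch / config["epochs"]  # training progress at the start of the epoch
    lr = dann_lr(p, config["lr"], config["alpha"], config["beta"])
    lam = dann_lambda(p, config["gamma"])
    print(f"epoch {epoch + 1:2d}  p={p:.1f}  lr={lr:.5f}  lambda={lam:.3f}")
```

Under this schedule the adversarial weight starts at zero, so early updates are dominated by the classification loss, and the domain term is phased in as training progresses.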

Configuration: `config = dict(epochs=10, batch_size=64, lr=0.01, wd=0.001, momentum=0.9, alpha=10, beta=0.75, gamma=10)`

| Domain               | Method   | Target accuracy (%) | Note |
| -------------------- | -------- | ------------------- | ---- |
| real_life -> product | baseline | 81.0                |      |
| real_life -> product | UDA      | 90.7                |      |
| real_life -> product | UDA(mid) | 80.8                |      |
| product -> real_life | baseline | 41.1                |      |
| product -> real_life | UDA      | 63.4                |      |
| product -> real_life | UDA(mid) | 64.0                |      |
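To make the comparison in the table explicit, the short sketch below computes the gain of each UDA variant over the baseline in percentage points. The accuracies are copied from the table, and the dictionary layout is just one convenient way to hold them.

```python
# Target accuracies (%) copied from the table above.
results = {
    "real_life -> product": {"baseline": 81.0, "UDA": 90.7, "UDA(mid)": 80.8},
    "product -> real_life": {"baseline": 41.1, "UDA": 63.4, "UDA(mid)": 64.0},
}

# Gain over the baseline, in percentage points, for every non-baseline method.
gains = {
    direction: {
        method: round(acc - scores["baseline"], 1)
        for method, acc in scores.items()
        if method != "baseline"
    }
    for direction, scores in results.items()
}

for direction, by_method in gains.items():
    for method, gain in by_method.items():
        print(f"{direction:22s} {method:9s} {gain:+.1f} points")
```

With these numbers, adversarial training on the final features improves on the baseline in both directions. The mid-level variant does about as well for product -> real_life but ends slightly below the baseline for real_life -> product.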

