# Unsupervised Domain Adaptation Project


## Part-1: Data download
Load the data into the project from Google Drive, then copy a subset of the image classes into two directories:
- `adaptiope_small/product_images`
- `adaptiope_small/real_life`

They hold images from two different domains, **product** and **real_life**.

In [None]:
from os import makedirs, listdir
from tqdm import tqdm
from google.colab import drive
from os.path import join
from shutil import copytree

drive.mount('/content/gdrive')

!mkdir dataset
!cp "gdrive/My Drive/Colab Notebooks/data/Adaptiope.zip" dataset/
# !ls dataset

!unzip -qq dataset/Adaptiope.zip   # unzip file

!rm -rf adaptiope_small

Mounted at /content/gdrive


In [None]:
!mkdir adaptiope_small
classes = listdir("Adaptiope/product_images")
print(classes)
classes = ["backpack", "bookcase", "car jack", "comb", "crown", "file cabinet", "flat iron", "game controller", "glasses",
           "helicopter", "ice skates", "letter tray", "monitor", "mug", "network switch", "over-ear headphones", "pen",
           "purse", "stand mixer", "stroller"]
domain_classes = ["product_images", "real_life"]
for d, td in zip(["Adaptiope/product_images", "Adaptiope/real_life"], ["adaptiope_small/product_images", "adaptiope_small/real_life"]):
  makedirs(td)
  for c in tqdm(classes):
    c_path = join(d, c)
    c_target = join(td, c)
    copytree(c_path, c_target)

['grill', 'skateboard', 'crown', 'nail clipper', 'notepad', 'quadcopter', 'scooter', 'compass', 'file cabinet', 'rubber boat', 'bottle', 'rc car', 'monitor', 'network switch', 'fighter jet', 'tank', 'hat', 'screwdriver', 'flat iron', 'keyboard', 'toothbrush', 'knife', 'computer mouse', 'pogo stick', 'wristwatch', 'bookcase', 'fan', 'sleeping bag', 'rifle', 'motorbike helmet', 'hair dryer', 'scissors', 'baseball bat', 'corkscrew', 'cordless fixed phone', 'ice cube tray', 'over-ear headphones', 'ruler', 'umbrella', 'snow shovel', 'puncher', 'vr goggles', 'stroller', 'axe', 'magic lamp', 'usb stick', 'ring binder', 'wheelchair', 'pipe wrench', 'hourglass', 'toilet brush', 'game controller', 'helicopter', 'mug', 'tyrannosaurus', 'sewing machine', 'comb', 'roller skates', 'printer', 'smoking pipe', 'stand mixer', 'ladder', 'microwave', 'hoverboard', 'sword', 'hot glue gun', 'cellphone', 'trash can', 'binoculars', 'handcuffs', 'backpack', 'golf club', 'computer', 'car jack', 'syringe', 'show

100%|██████████| 20/20 [00:02<00:00,  7.31it/s]
100%|██████████| 20/20 [00:05<00:00,  3.79it/s]


## Part-2: Image Classification Neural Network

 

### Part-2.0: Data Loading

First we load the data and preprocess it.

In [None]:
product_path = 'adaptiope_small/product_images'
real_life_path = 'adaptiope_small/real_life'

In [None]:
!pwd
!ls

/content
Adaptiope  adaptiope_small  dataset  gdrive  sample_data


In [None]:
from PIL import Image
from os.path import join

img = Image.open(join(product_path, 'backpack', 'backpack_003.jpg'))
print('Image size: ', img.size)
#img

Image size:  (679, 679)


Import libraries

In [None]:
import torch
from torchvision import transforms
from torchvision.datasets import ImageFolder
from torchvision.models import vgg16, resnet18, resnet34
from torch.utils.data import DataLoader, random_split

Configuration constants

In [None]:
img_size = 256
# mean, std used by pre-trained models from PyTorch
mean, std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
batch_size = 100
learning_rate = 0.001
num_epochs = 15
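
As a rough illustration of what these `mean` and `std` constants are for, the sketch below applies the per-channel formula `(x - mean) / std` to a single RGB pixel in plain Python. The helper name `normalize_pixel` is hypothetical and only used here; pixel values are assumed to be already scaled to the range [0, 1].

```python
# Channel statistics copied from the configuration cell above
mean, std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]

def normalize_pixel(rgb, mean=mean, std=std):
    """Apply (x - mean) / std channel by channel to one RGB pixel in [0, 1]."""
    return [(x - m) / s for x, m, s in zip(rgb, mean, std)]

# A pixel equal to the channel means maps to zero in every channel
print(normalize_pixel([0.485, 0.456, 0.406]))
# A pure white pixel ends up above 2 in every channel
print(normalize_pixel([1.0, 1.0, 1.0]))
```

After normalization the inputs are roughly zero-centered per channel, which is the input distribution the pre-trained weights were fitted to.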

Configure GPU

In [None]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)

cuda:0


In [None]:
def get_dataset(root_path):
  '''
    Build an ImageFolder dataset from a specific data path

    # parameters:
        root_path: path to image folder

    # return: ImageFolder dataset with the resize, crop and normalize transform applied
  '''
  # Construct image transform
  image_transform = transforms.Compose([
    transforms.Resize(img_size),
    transforms.CenterCrop(img_size),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
  ])

  # Load data from filesystem
  image_dataset = ImageFolder(root_path, transform=image_transform)

  return image_dataset

def get_dataloader(dataset, batch_size, shuffle_train=True, shuffle_test=False):
  '''
    Randomly split a dataset into train and test parts and wrap each in a DataLoader

    # parameters:
        dataset: ImageFolder instance
        batch_size: batch_size for DataLoader
        shuffle_train: whether to shuffle training data
        shuffle_test: whether to shuffle test data

    # return: loader_train, loader_test
  '''
  # Get train, test number
  num_total = len(dataset)
  num_train = int(num_total * 0.8 + 1)
  num_test  = num_total - num_train

  # random split dataset
  data_train, data_test = random_split(dataset, [num_train, num_test])

  # initialize dataloaders
  loader_train = DataLoader(data_train, batch_size=batch_size, shuffle=shuffle_train)
  loader_test  = DataLoader(data_test, batch_size=batch_size, shuffle=shuffle_test)

  return loader_train, loader_test
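
To make the split arithmetic concrete, the sketch below reproduces the size computation from `get_dataloader` in plain Python. The helper name `split_sizes` is hypothetical, and the total of 2000 images is an assumption (20 classes with 100 images each), chosen because it matches the 1601 training samples reported in the training log further down.

```python
def split_sizes(num_total, train_fraction=0.8):
    """Mirror the size arithmetic in get_dataloader: int(n * 0.8 + 1) for train."""
    num_train = int(num_total * train_fraction + 1)
    num_test = num_total - num_train
    return num_train, num_test

# Assumed total: 20 classes with 100 images each
print(split_sizes(2000))
```

Because of the `+ 1`, the training part receives one sample more than a plain 80% share, and the test part one fewer.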

### Part-2.1 Pretrained Network

Here we start from a pretrained neural network, fine-tune it on one domain of the **Adaptiope** subset (the source domain), and test it on the other (target) domain. Comparing the two results sets the baseline for the UDA method applied later.

In [None]:
pd_dataset = get_dataset(product_path)
len(pd_dataset.classes)

20

### Part-2.2 Define the Deep Residual Network

In [None]:
def initialize_model(num_classes, model_type="ResNet"):
  if model_type.startswith("ResNet"):
    model = resnet18(pretrained=True)
    in_features = model.fc.in_features
    model.fc = torch.nn.Linear(in_features=in_features, out_features=num_classes)
  else:
    model = vgg16(pretrained=True)
    in_features = model.classifier[-1].in_features
    model.classifier[-1] = torch.nn.Linear(in_features=in_features, out_features=num_classes)

  return model

In [None]:
# model = initialize_model(20, "vgg")
# count  = 0
# for name, param in model.named_parameters():
#   # if name.startswith('fc'):
#   # print(name) 
#   count += 1
# print(count)

# print(model.__class__.__name__)

### Part-2.3 Cost function

The cost function is cross-entropy. For the optimizer in Part-2.4, the parameters are divided into two groups: the last fully connected layer is trained with `learning_rate`, and all other layers with `0.1 * learning_rate`.

In [None]:
def get_cost_function():
  return torch.nn.CrossEntropyLoss()

### Part-2.4 Optimizer

In [None]:
def get_optimizer(model, learning_rate, weight_decay, momentum):

  # Get model name
  model_name = model.__class__.__name__

  # define final layer name by different model
  if model_name == "ResNet":
    final_layer_name = "fc"
  elif model_name == "VGG":
    final_layer_name = "classifier.6"
  else:
    raise Exception(f'## GET_OPTIMIZER ## - Undefined Model Type {model_name}')

  pre_trained_weights = []
  final_layer_weights = []

  # collect all the parameters that require gradient updates
  for name, param in model.named_parameters():
    if param.requires_grad:
      if name.startswith(final_layer_name):
        final_layer_weights.append(param)
      else:
        pre_trained_weights.append(param)

  # assign the two parameter groups to the optimizer
  optimizer = torch.optim.SGD([
    {'params': pre_trained_weights},
    {'params': final_layer_weights, 'lr': learning_rate}
  ], lr= learning_rate/10, weight_decay=weight_decay, momentum=momentum)
  
  return optimizer
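
The grouping rule used by `get_optimizer` can be checked without loading a model. The sketch below is a simplified stand-in, not the optimizer itself: it maps parameter names to the learning rate their group would get, using the same `startswith` test. The helper name `assign_learning_rates` is hypothetical, and the name lists are small hand-written samples of the kind of names a torchvision ResNet or VGG exposes.

```python
def assign_learning_rates(param_names, final_layer_name, learning_rate):
    """Map each parameter name to the learning rate its group would receive."""
    rates = {}
    for name in param_names:
        if name.startswith(final_layer_name):
            rates[name] = learning_rate        # newly initialized final layer
        else:
            rates[name] = learning_rate / 10   # pre-trained layers
    return rates

# Hand-written sample of ResNet-style parameter names
resnet_names = ["conv1.weight", "layer1.0.conv1.weight", "fc.weight", "fc.bias"]
for name, lr in assign_learning_rates(resnet_names, "fc", 0.001).items():
    print(f"{name:<24} lr={lr}")

# Hand-written sample of VGG-style names: only classifier.6 is the final layer
vgg_names = ["features.0.weight", "classifier.0.weight", "classifier.6.weight"]
print(assign_learning_rates(vgg_names, "classifier.6", 0.001))
```

Matching on the prefix `classifier.6` rather than `classifier` keeps the earlier VGG classifier layers in the lower learning rate group.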

### Part-2.5 Training and Testing Step

In [None]:
def train_loop(dataloader, model, loss_fn, optimizer, device):
  size = len(dataloader.dataset)
  model.train()  # make sure BatchNorm and dropout layers are in training mode

  for batch, (X, y) in enumerate(dataloader):
    X, y = X.to(device), y.to(device)
    
    # compute prediction and loss
    predicts = model(X)
    loss = loss_fn(predicts, y)

    # backpropagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if batch % 100 == 0:
      loss, current = loss.item(), batch * len(X)
      print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
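
The training log further down shows a single progress line per epoch. The sketch below suggests why, assuming 1601 training samples and a batch size of 50 (both taken from the training cell and its log). The helper name `progress_points` is hypothetical, and it uses the nominal batch size where `train_loop` uses `len(X)`, so the two differ only if the final, smaller batch is logged.

```python
import math

def progress_points(num_samples, batch_size, log_every):
    """Return (batch index, samples seen so far) for every batch that logs."""
    num_batches = math.ceil(num_samples / batch_size)
    return [(b, b * batch_size) for b in range(num_batches) if b % log_every == 0]

# Logging every 100 batches, as in train_loop
print(progress_points(1601, 50, log_every=100))
# Logging every 10 batches instead
print(progress_points(1601, 50, log_every=10))
```

With only 33 batches per epoch, `batch % 100 == 0` is true for batch 0 alone, while a smaller interval such as 10 would report progress at batches 0, 10, 20 and 30.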

In [None]:
def test_loop(dataloader, model, loss_fn, device):
  test_loss, correct = 0, 0

  # switch to evaluation mode so BatchNorm uses its running statistics,
  # remembering the previous mode so it can be restored afterwards
  was_training = model.training
  model.eval()

  with torch.no_grad():
    for X, y in dataloader:
      X, y = X.to(device), y.to(device)
      predicts = model(X)
      test_loss += loss_fn(predicts, y).item()
      correct += (predicts.argmax(1) == y).type(torch.float).sum().item()

  model.train(was_training)

  size = len(dataloader.dataset)
  num_batches = len(dataloader)

  test_loss /= num_batches
  correct /= size
  print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

  return test_loss, correct
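
One detail of `test_loop` is that the reported average loss is a mean of per-batch means, while the accuracy is a per-sample ratio. The sketch below shows how the two kinds of average can differ when the last batch is smaller. Both helper names are hypothetical and the loss values are made up for illustration.

```python
def mean_of_batch_means(batch_losses):
    """Average the per-batch mean losses, as test_loop does."""
    return sum(loss for loss, _ in batch_losses) / len(batch_losses)

def sample_weighted_mean(batch_losses):
    """Weight each batch mean by its batch size to get the per-sample mean."""
    total = sum(loss * n for loss, n in batch_losses)
    return total / sum(n for _, n in batch_losses)

# (mean loss, batch size) pairs; the numbers are made up for illustration
batches = [(1.0, 50), (1.0, 50), (4.0, 1)]
print(mean_of_batch_means(batches))
print(sample_weighted_mean(batches))
```

In the notebook's setting the batches are nearly equal in size, so the gap should be small; it only becomes noticeable when the final batch is much smaller than the rest.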

### Part-2.6 Training

In [None]:
def training(model, train_dataloader, test_dataloader, device, epochs=10, lr=0.001, wd=0.001, momentum=0.9):
  print(f"Learning_rate {lr}, weight_decay {wd}")
  loss_fn = get_cost_function()
  optimizer = get_optimizer(model, lr, wd, momentum)

  for epoch in range(epochs):
    print(f"Epoch {epoch+1}\n------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer, device)
    test_loop(test_dataloader, model, loss_fn, device)

  print("Done")

In [None]:
# Get train_dataloader, test_dataloader
dataset = get_dataset(product_path)
train_dataloader, test_dataloader = get_dataloader(dataset, 50)

# Get model
model = initialize_model(len(dataset.classes)).to(device)

# Training
training(model, train_dataloader, test_dataloader, device, num_epochs, learning_rate)


Learning_rate 0.001, weight_decay 0.001
Epoch 1
------------------
loss: 3.031182 [    0/ 1601]
Test Error: 
 Accuracy: 52.1%, Avg loss: 2.268782 

Epoch 2
------------------
loss: 2.164959 [    0/ 1601]
Test Error: 
 Accuracy: 78.2%, Avg loss: 1.468856 

Epoch 3
------------------
loss: 1.253728 [    0/ 1601]
Test Error: 
 Accuracy: 88.0%, Avg loss: 0.990233 

Epoch 4
------------------
loss: 0.934313 [    0/ 1601]
Test Error: 
 Accuracy: 90.0%, Avg loss: 0.746172 

Epoch 5
------------------
loss: 0.645032 [    0/ 1601]
Test Error: 
 Accuracy: 92.7%, Avg loss: 0.588191 

Epoch 6
------------------
loss: 0.566190 [    0/ 1601]
Test Error: 
 Accuracy: 93.7%, Avg loss: 0.493843 

Epoch 7
------------------
loss: 0.362208 [    0/ 1601]
Test Error: 
 Accuracy: 94.2%, Avg loss: 0.429325 

Epoch 8
------------------
loss: 0.254228 [    0/ 1601]
Test Error: 
 Accuracy: 93.2%, Avg loss: 0.396686 

Epoch 9
------------------
loss: 0.237645 [    0/ 1601]
Test Error: 
 Accuracy: 95.0%, Avg loss:

In [None]:
torch.save(model.state_dict(), 'model_state.pt')

### Part-2.7 Testing on Target Domain
Apply the model trained on the source domain directly to the target domain. This result will be used for comparison with the results obtained after domain adaptation.

In [None]:
target_dataset = get_dataset(real_life_path)
loader_target_dataset = DataLoader(target_dataset, batch_size=100, shuffle=False)

# model.load_state_dict(torch.load('model_state.pt', map_location='cpu'))
loss_fn = get_cost_function()
test_loop(loader_target_dataset, model, loss_fn, device)


Test Error: 
 Accuracy: 9.6%, Avg loss: 3.616436 



(3.616436266899109, 0.096)

## TODO

### TODO: Dataset unzip Google Drive, Copy to folder

TODO: Batch progress number error

Otherwise Continue UDA

## Part-3: UDA 

Here we use the Contrastive Domain Adaptation method proposed [here]().
We train the previous network and run the test on both the source domain and the target domain.

In [None]:
#DANN
import numpy as np

dataloader_source = DataLoader(dataset, batch_size=100, shuffle=True)
dataloader_target = DataLoader(target_dataset, batch_size=100, shuffle=False)

n_epoch = 15
for epoch in range(n_epoch):

    len_dataloader = min(len(dataloader_source), len(dataloader_target))
    # print(len_dataloader)
    data_source_iter = iter(dataloader_source)
    data_target_iter = iter(dataloader_target)
    print('train start')

    for i in range(0, len_dataloader):
        p = float(i + epoch * len_dataloader) / n_epoch / len_dataloader
        alpha = 2. / (1. + np.exp(-10 * p)) - 1

        # training model using source data
        data_source = next(data_source_iter)
        s_img, s_label = data_source

        batch_size = len(s_label)
        # 0 denotes the source domain
        domain_label = torch.zeros(batch_size).long()
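
The `alpha` value computed in the loop above follows a schedule that ramps from 0 toward 1 as training progresses. The sketch below evaluates the same formula in plain Python so the shape of the ramp is easy to inspect. The helper name `dann_alpha` is hypothetical, and 20 batches per epoch is an assumption (2000 images at a batch size of 100).

```python
import math

def dann_alpha(step, total_steps, gamma=10.0):
    """Ramp from 0 toward 1 as training progress p = step / total_steps grows."""
    p = step / total_steps
    return 2.0 / (1.0 + math.exp(-gamma * p)) - 1.0

total = 15 * 20  # n_epoch times an assumed 20 batches per epoch
for step in (0, total // 4, total // 2, total):
    print(f"step {step:>3d}  alpha = {dann_alpha(step, total):.4f}")
```

The schedule starts at exactly 0, so the domain term has no influence on the first step, and it saturates close to 1 well before the end of training.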

## Part-4: Comparison & Discussion
Here we compare the test results from the direct method and the UDA method.

## Part-5: Conclusion