# Welcome!

Welcome, CS5260 learners! 
To minimize result discrepancies caused by differences between hardware architectures, your project will be evaluated here on Google Colab.

Please read the following before you start.

- All the instructions below assume you are running this notebook on Google Colab. You can also run and debug this notebook in Jupyter Notebook with minor modifications.
- Check Google Colab's [tutorial](https://colab.research.google.com/notebooks/welcome.ipynb#).

In [0]:
# Replace A0123456X with your matriculation number.
MATRIC_NUM = 'A0112101M'

# Introduction

## Filesystem

Because Google Colab uses a special file system, accessing files here works a little differently from accessing them on your local machine.

Here's what will happen when the following block runs on our side.
1. The TA's Google Drive will be mounted on the virtual machine that runs this Colab notebook, at `/content/drive/My Drive/`
2. A special variable `ROOT` will be set to `/content/drive/My Drive/CS5260/`
3. This `ROOT` variable, along with your matriculation number, will be used to locate resources related to your submission.

The filesystem will look like this:

```
/content/drive/My Drive/CS5260/ (ROOT)
  |____ model
  |  |____ model.pt
  |____ images
  |  |____ artifacts
  |  |  |____ 0000.png
  |  |  |____ 0001.png
  |  |  |____ ...
  |  |____ cancer_regions
  |  |  |____ XXXX.png
  |  |  |____ XXXX.png
  |  |  |____ ...
  |  |____ ...
  |____ results
  |  |____ MATRIC_NUM.txt
  |____ MATRIC_NUM
     |____ MATRIC_NUM.ipynb
     |____ other_supporting_files
     |____ ...
```

Therefore, in your algorithm, use `os.path.join(ROOT, "model")` instead of `../model/`; the same applies to `../images/` and `../results/`.

You can debug your code by creating the same folders on your Google Drive.
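
As a minimal sketch, the snippet below builds every resource path from `ROOT` with `os.path.join` instead of relative paths such as `../model/`. The helper name `resource_paths` is hypothetical, and the `ROOT` value in the usage line is simply the Colab mount point described above.

```python
import os.path as osp


def resource_paths(root, matric_num):
    """Return absolute paths to the course resources, all built from ROOT.

    Hypothetical helper: it mirrors the folder layout shown above instead of
    relying on relative paths like '../model/'.
    """
    return {
        'model': osp.join(root, 'model', 'model.pt'),
        'images': osp.join(root, 'images'),
        'results': osp.join(root, 'results', '{}.txt'.format(matric_num)),
        'submission': osp.join(root, matric_num),
    }


paths = resource_paths(osp.join('/content', 'drive', 'My Drive', 'CS5260'), 'A0123456X')
print(paths['results'])
```

Because only `ROOT` changes between machines, switching from a local folder to the mounted Drive is a one-line change.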

In [2]:
import sys
import os.path as osp
ROOT = 'C:\\Users\\tey_s\\Desktop\\CS5260'  # local Jupyter path; overwritten below on Colab

# Colab: uncomment
from google.colab import drive
drive.mount('/content/drive')
ROOT = osp.join('/content', 'drive', 'My Drive', 'CS5260')
sys.path.append(osp.join(ROOT, MATRIC_NUM))

Mounted at /content/drive


# Preparation

## Runtime Setup

Before running any code block, click "Runtime" in the menu bar and select "Change runtime type". In the popup window, set "Hardware accelerator" to "GPU".

Then run the following cell to check the device type of your machine. If it prints "GPU is available.", your environment is set up correctly.

In [3]:
import torch
if torch.cuda.is_available():
  print("GPU is available.")
  device = torch.device('cuda')
else:
  print("Change runtime type to GPU for better performance.")
  device = torch.device('cpu')

GPU is available.


## Libraries

You can import libraries as in a Jupyter notebook. To install a library, use `!pip install package-name`.

Please place all your imports in the following cell.

In [4]:
# Colab: uncomment
!pip install adversarial-robustness-toolbox
from load_model_36 import load_model

# Colab: comment
# from load_model_37 import load_model

import torchvision
import torch
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import time
import datetime
import lightgbm as lgb
from copy import deepcopy

from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV

from art.classifiers import PyTorchClassifier
from art.defences import TotalVarMin, ThermometerEncoding, SpatialSmoothing, PixelDefend, LabelSmoothing, JpegCompression, GaussianAugmentation, FeatureSqueezing
from art.defences import GaussianNoise, HighConfidence, ReverseSigmoid, Rounded, ClassLabels

print('Import done!')

Collecting adversarial-robustness-toolbox
  Downloading https://files.pythonhosted.org/packages/f7/b5/7c7ef44bd2729140930612b4d10af2dbcfa0ca6c9592251c490100b4753a/adversarial_robustness_toolbox-1.2.0-py3-none-any.whl (486kB)

# Submission

## How do I submit?
1. Place this notebook, along with all your supporting documents, in a folder named with your matriculation number.
2. Zip this folder and name the zip archive with your matriculation number.
3. Submit the zip archive using [this Google Form](https://forms.gle/A77s1N5tzu4XAr2QA) (Google account required).
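
Steps 1 and 2 can also be done from Python with the standard library's `shutil.make_archive`. This is a sketch, assuming the submission folder sits inside some parent directory; the function name `zip_submission` and the temporary paths in the usage part are placeholders.

```python
import os
import shutil
import tempfile
import zipfile


def zip_submission(parent_dir, matric_num):
    """Zip <parent_dir>/<matric_num>/ into <parent_dir>/<matric_num>.zip.

    base_dir keeps the folder name inside the archive, so unzipping it
    recreates the <matric_num>/ folder with the notebook and supporting files.
    """
    base_name = os.path.join(parent_dir, matric_num)
    return shutil.make_archive(base_name, 'zip', root_dir=parent_dir, base_dir=matric_num)


# Usage example with a throwaway folder standing in for the real submission.
parent = tempfile.mkdtemp()
os.makedirs(os.path.join(parent, 'A0123456X'))
open(os.path.join(parent, 'A0123456X', 'A0123456X.ipynb'), 'w').close()
archive = zip_submission(parent, 'A0123456X')
with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
print(names)
```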

## How do I add supporting libraries that are not available on pip?
Please keep in mind that `os.path.join(ROOT, MATRIC_NUM)` points to the directory of your submission. This directory has also been added to `sys.path` in the code cell above. If your supporting libraries live in deeper directories, e.g. `os.path.join(ROOT, MATRIC_NUM, 'libs')`, append those directories to `sys.path` as well.

After adding all relevant directories to `sys.path`, you should be able to import the modules directly by name.
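
A minimal sketch of this pattern, assuming a hypothetical `libs` subfolder that contains a module `my_helpers.py` (both names are illustrative only). A temporary directory stands in for `os.path.join(ROOT, MATRIC_NUM)` so the snippet can run anywhere.

```python
import importlib
import os
import sys
import tempfile

# Stand-in for os.path.join(ROOT, MATRIC_NUM); in the notebook, use the real path.
submission_dir = tempfile.mkdtemp()
libs_dir = os.path.join(submission_dir, 'libs')  # hypothetical deeper folder
os.makedirs(libs_dir)

# Create a tiny hypothetical module so the import below has something to find.
with open(os.path.join(libs_dir, 'my_helpers.py'), 'w') as f:
    f.write('def greet(name):\n    return "hello, " + name\n')

# Append the directory (not the .py file) to sys.path, then import by module name.
if libs_dir not in sys.path:
    sys.path.append(libs_dir)
importlib.invalidate_caches()  # make sure the freshly written module is visible

import my_helpers

print(my_helpers.greet('CS5260'))
```

Note that `sys.path` must contain the folder that holds the module, not the module file itself.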

# Now it's Your Turn
Please complete this notebook.
When evaluating your submission, we will open this notebook directly and click "Runtime -> Run all" in the menu bar. Your results should then appear in `os.path.join(ROOT, 'results')` as `<MATRIC_NUM>.txt`, e.g. `A0123456X.txt`. The format is quoted here:

> This text file contains one entry per test image separated by a ‘newline’ character.
> Each entry must contain image id and your top-1 prediction separated by ‘#’, e.g. 1000#0.

**We will not handle crashes.**
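
The quoted format can be produced and read back with a few lines of Python, mirroring what `write_results` and the evaluation script's `load_label` do below. This is a sketch; `format_results` and `parse_results` are hypothetical helper names, and the ids and labels are made up for illustration.

```python
def format_results(image_ids, labels):
    """Return one '<image id>#<top-1 label>' entry per line, newline-separated."""
    return ''.join('{}#{}\n'.format(image_id, int(label))
                   for image_id, label in zip(image_ids, labels))


def parse_results(text):
    """Inverse of format_results: map each image id to its label (kept as a string)."""
    return dict(line.split('#', 1) for line in text.splitlines() if line)


text = format_results(['1000', '1001'], [0, 3])
print(text, end='')
# 1000#0
# 1001#3

# In the notebook, the text would then be written to the results file, e.g.
# with open(os.path.join(ROOT, 'results', MATRIC_NUM + '.txt'), 'w') as f: f.write(text)
```

Labels are compared as strings by the evaluation script, so writing them through `int(...)` avoids entries such as `1000#0.0`.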

# 1. Data Directories

In [0]:
# Data directories
image_dir = osp.join(ROOT, 'images')
main_dir = osp.join(ROOT,MATRIC_NUM)
lgb_dir = osp.join(ROOT,MATRIC_NUM,'lgb_model.txt')  # file path of the saved LightGBM model

# 2. Define Functions

In [0]:
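mean=[0.485, 0.456, 0.406]  # ImageNet channel means, applied later by denorm_img
std=[0.229, 0.224, 0.225]   # ImageNet channel stds, applied later by denorm_img
# Identity transform: images stay in [0, 1] so SpatialSmoothing (clip_values=(0,1)) sees raw pixels
normalize = transforms.Normalize(mean=[0,0,0], std=[1,1,1])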

class ImageFolderWithPaths(torchvision.datasets.ImageFolder):
    """Custom dataset that includes image file paths. Extends
    torchvision.datasets.ImageFolder
    """

    # override the __getitem__ method. this is the method that dataloader calls
    def __getitem__(self, index):
        # this is what ImageFolder normally returns 
        original_tuple = super(ImageFolderWithPaths, self).__getitem__(index)
        # the image file path
        path = self.imgs[index][0]
        # make a new tuple that includes original and the path
        tuple_with_path = (original_tuple + (path,))
        return tuple_with_path
    
def load_data(data_dir, batch_size = 25):
    data_loader = torch.utils.data.DataLoader(
        ImageFolderWithPaths(data_dir, transforms.Compose([
            transforms.Resize(128),
            transforms.CenterCrop(128),
            transforms.ToTensor(),
            normalize,])),
            batch_size=batch_size, 
            num_workers=0)
    return data_loader

def write_results(image_set, data_folder = None, def_labels = None, path_set = None):
    """Write '<image id>#<label>' lines to ROOT/results/<image_set>.txt.

    If image_set is the matriculation number, write the defended predictions
    def_labels for the images in path_set. If image_set is 'clean' or 'adv',
    write the ground-truth labels of ROOT/<data_folder>/clean_images or adv_images.
    """
    labels = def_labels
    if image_set != MATRIC_NUM:
        if image_set == 'clean':
            image_dir = osp.join(ROOT, data_folder, 'clean_images')
        elif image_set == 'adv':
            image_dir = osp.join(ROOT, data_folder, 'adv_images')
        else:
            raise ValueError('Unknown image_set: {}'.format(image_set))
        labels, path_set = [], []
        for _, clean_label, path in load_data(image_dir):  # ground-truth labels
            labels.extend(clean_label.tolist())
            path_set.extend(path)

    # Open the file once instead of re-opening it in append mode for every line.
    with open(osp.join(ROOT, 'results', '{}.txt'.format(image_set)), 'w') as the_file:
        for path, label in zip(path_set, labels):
            name = path.split('/')[-1].split('.')[0]  # e.g. '.../0001.png' -> '0001'
            the_file.write('{}#{}\n'.format(name, int(label)))

                
def eval_script():
    # Evaluation script
    import argparse
    import sys
    if sys.version_info < (3, 6):
        raise AssertionError("Please use Python 3.6+")

    usage = """

    This is the example evaluation script we will be using to calculate your accuracy.
    Your score will be calculated as the harmonic mean of the accuracies on clean and adversarial images, i.e.
    score = 2 / (1 / acc_clean + 1 / acc_adversarial)"""

    def harmonic_mean(x1, x2):
        eps = 1e-7
        x1 = x1 + eps
        x2 = x2 + eps
        return 2 / (1 / x1 + 1 / x2)

    def load_label(path):
        # Map each image id to its label string; 'with' closes the file afterwards.
        with open(path, 'r') as f:
            lines = f.read().splitlines()
        return {line.split('#')[0]: line.split('#')[1] for line in lines}

    p = argparse.ArgumentParser(usage=usage)
    required = p.add_argument_group('required arguments')
    required.add_argument('--pred-file', type=str, required=True, help='The output of your defense algorithm.')
    required.add_argument('--clean-label', type=str, required=False, help='The ground truth of clean images by CS5260 staff')
    required.add_argument('--adv-label', type=str, required=False, help='The ground truth of adversarial images by CS5260 staff')
    # Arguments are passed explicitly so argparse does not parse the Jupyter kernel's own sys.argv.
    args = p.parse_args(args=['--pred-file', osp.join(ROOT, 'results', '{}.txt'.format(MATRIC_NUM)),
                              '--clean-label', osp.join(ROOT, 'results', 'clean.txt'),
                              '--adv-label', osp.join(ROOT, 'results', 'adv.txt')])

    pred        = load_label(args.pred_file)
    clean_label = load_label(args.clean_label)
    model_label   = load_label(args.adv_label)
    num_pred  = len(pred)
    num_clean = len(clean_label)
    num_adv   = len(model_label)

    if num_adv + num_clean != num_pred:
        raise AssertionError(f'Number of your predictions {num_pred} does not match the number of labels {num_clean + num_adv}.')

    clean_correct = 0
    adv_correct = 0
    for k, v in pred.items():
        if clean_label.get(k) == v:
            clean_correct += 1
        elif model_label.get(k) == v:
            adv_correct += 1

    score = harmonic_mean(clean_correct / num_clean, adv_correct / num_adv)
    result = """
    Evaluation result:
        Clean:       {} / {} correct.
        Adversarial: {} / {} correct.
        Score:       {:.4f}.
    """.format(clean_correct,num_clean,adv_correct,num_adv,score)
    print(result)

def harmonic_mean(x1, x2):
    eps = 1e-7
    x1 = x1 + eps
    x2 = x2 + eps
    return 2 / (1 / x1 + 1 / x2)

def unique(target): # return unique label counts
    uniq_target = target.unique(sorted=True)
    unique_count = torch.stack([(target==item).sum() for item in uniq_target])
#     for i in range(len(uniq_target)):
#         print("Label {}: {}".format(uniq_target.numpy()[i],unique_count.numpy()[i]))
    return unique_count

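def denorm_img(image_set):
    """Apply per-channel ImageNet normalization (x - mean) / std to an NCHW batch.

    Despite its name, this normalizes rather than denormalizes. It is applied
    after pre-processing, so that smoothing operates on raw [0, 1] pixels.
    """
    img = deepcopy(image_set)
    for idx in range(img.shape[0]):
        for i in range(3):
            img[idx,i] = (img[idx,i]-mean[i])/std[i]
    return img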

def evaluate_def(clean_label, model_label, def_label):
    """Print the accuracy (%) of the undefended model and of the defense; return the latter."""
    num = len(model_label)
    model_good = sum(model_label[i] == clean_label[i] for i in range(num))
    def_good = sum(def_label[i] == clean_label[i] for i in range(num))
    model_acc = 100 * model_good / num
    def_acc = 100 * def_good / num
    print('                [{:.2f}%,{:.2f}%]'.format(model_acc, def_acc))
    return def_acc

def print_results(clean_label, model_label, pre_label, post_label):
    print("Pre processing: [Model , Defended]")
    pre_acc = evaluate_def(clean_label, model_label, pre_label) 
    print("Postprocessing: [Model , Defended]")
    post_acc = evaluate_def(clean_label, model_label, post_label)  
    return pre_acc, post_acc
    
def pre_process(pre_img, model, preprocess):

    model_label = model(denorm_img(pre_img)).max(axis=1)[1].numpy()    

    post_img,_ = preprocess(pre_img.numpy())
    preds = model(torch.from_numpy(denorm_img(post_img)).float())
    
    return model_label, post_img, preds

def post_process(preds, bst): 
    softmax = nn.Softmax(dim=-1)(preds)    
    def_label = torch.from_numpy(bst.predict(softmax)).max(dim=1)[1]
    return def_label   
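
`denorm_img` applies the per-channel normalization `(x - mean) / std` with Python loops. As a sketch (not the notebook's own code), the same computation can be written with NumPy broadcasting; `normalize_batch` is a hypothetical name, and the input is assumed to be an NCHW array of [0, 1] pixels.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_batch(x):
    """Vectorized (x - mean) / std per channel for an NCHW batch.

    Reshaping mean/std to (1, 3, 1, 1) lets NumPy broadcast them over the
    batch, height and width axes.
    """
    return (x - MEAN.reshape(1, 3, 1, 1)) / STD.reshape(1, 3, 1, 1)


def normalize_batch_loop(x):
    """Loop version mirroring denorm_img, used here only for comparison."""
    out = x.copy()
    for idx in range(out.shape[0]):
        for c in range(3):
            out[idx, c] = (out[idx, c] - MEAN[c]) / STD[c]
    return out


batch = np.random.rand(2, 3, 4, 4).astype(np.float32)
print(np.allclose(normalize_batch(batch), normalize_batch_loop(batch)))
```

The same broadcasting expression also works on PyTorch tensors if `MEAN` and `STD` are converted with `torch.tensor`.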

# 3. Code to Pre-train the LightGBM Model (Commented)

In [0]:
# start_global = time.time() 
# # Train post-processing

# def print_score(label, pred, printout = False):
#     num = int(len(label))
#     score = 0

#     for i in range(num):
#         if pred[i] == label[i]:
#             score += 1
            
#     if printout:
#         print('Score: ',score*100/num)        
#     return score*100/num

# # Load models
# model = load_model(osp.join(ROOT, 'model','model.pt'), 'cpu')
# model.eval()
# preprocess = SpatialSmoothing(window_size=6, channel_index=1, clip_values=(0,1))

# clean_labels = []
# data_loader = load_data(image_dir, batch_size = 25) # load data
# for pre_img, clean_label, path in data_loader:
#     model_label, post_img, preds = pre_process(pre_img, model, preprocess) # pre-process    
#     softmax = nn.Softmax(dim=-1)(preds)
#     # Store values
#     if len(clean_labels)==0:
#         softmaxes = softmax
#     else:
#         softmaxes = np.concatenate([softmaxes,softmax],axis=0)
#     clean_labels = np.concatenate([clean_labels,clean_label],axis=0)
    
# # Save results
# np.save(osp.join(main_dir,'lgb_xtrain'), softmaxes)
# np.save(osp.join(main_dir,'lgb_ytrain'), clean_labels)
 
# # Load pre-saved results
# softmaxes = np.load(osp.join(main_dir,'lgb_xtrain.npy'))
# clean_labels = np.load(osp.join(main_dir,'lgb_ytrain.npy'))

# # Split data for Validation
# xtrain, xtest, ytrain, ytest = train_test_split(softmaxes, clean_labels, test_size=0.4, random_state = 40)

# score = -50
# train_data = lgb.Dataset(xtrain, label=ytrain)

# # Parameter grid (narrowed after fine-tuning) searched for the best params
# learning_rate = [0.1,0.15,0.2,0.3]
# leaf_no_set = [21,41,61]
# num_round_set = [50,100,150]

# for lr in learning_rate:
#     for leaf_no in leaf_no_set:
#         for num_round in num_round_set:
#             param = {'num_leaves': leaf_no, 'objective': 'multiclass', 'num_class':4, 'learning_rate':lr}

#             bst = lgb.train(param, train_data, num_round)

#             model_label = np.argmax(xtest,axis=1)
#             post_label = torch.from_numpy(bst.predict(xtest)).max(dim=1)[1]
            
#             pre_score = print_score(ytest, model_label)
#             post_score = print_score(ytest, post_label)
            
#             if post_score-pre_score> score:
#                 # Update best params
#                 score = post_score-pre_score
#                 best_leaf = leaf_no
#                 best_round = num_round
#                 best_lr = lr

#                 # Save model
#                 bst.save_model(lgb_dir)

# # Predict whole dataset with best params
# param = {'num_leaves': best_leaf, 'objective': 'multiclass', 'num_class':4, 'learning_rate':best_lr}

# bst = lgb.train(param, train_data, best_round)
# model_label = np.argmax(xtest,axis=1)
# post_label = torch.from_numpy(bst.predict(xtest)).max(dim=1)[1]

# pre_score = print_score(ytest, model_label, printout=True)
# post_score = print_score(ytest, post_label, printout=True)

# print('Best parameters:\nLR {}, Leaf {}, Round {}, Score {}'.format(best_lr,best_leaf,best_round, score))
# print("Time taken: {:.2f}s".format(time.time()-start_global))
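
The nested loops above are an exhaustive grid search that keeps the setting with the largest gain of post-processed over raw accuracy. Below is a framework-free sketch of the same selection rule using `itertools.product` over the same grid. `evaluate` is a hypothetical callback standing in for "train a LightGBM booster and return `post_score - pre_score`", the `num_round` dictionary key is illustrative, and the search starts from `-inf` rather than the original `-50`.

```python
import itertools


def grid_search(evaluate, learning_rates, leaf_counts, round_counts):
    """Return (best_gain, best_params) over the full Cartesian product of the grid.

    evaluate(params) is a hypothetical callback: train a model with `params`
    and return the accuracy gain it achieves on the validation split.
    """
    best_gain, best_params = float('-inf'), None
    for lr, leaves, rounds in itertools.product(learning_rates, leaf_counts, round_counts):
        params = {'learning_rate': lr, 'num_leaves': leaves, 'num_round': rounds}
        gain = evaluate(params)
        if gain > best_gain:  # strict '>' keeps the first setting on ties
            best_gain, best_params = gain, params
    return best_gain, best_params


# Toy evaluate function, for illustration only: it prefers lr=0.2 and 41 leaves.
def toy_evaluate(p):
    return -abs(p['learning_rate'] - 0.2) - abs(p['num_leaves'] - 41) / 100


gain, params = grid_search(toy_evaluate, [0.1, 0.15, 0.2, 0.3], [21, 41, 61], [50, 100, 150])
print(gain, params)
```

With 4 × 3 × 3 = 36 combinations the exhaustive search stays cheap; a larger grid would call for randomized search instead.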

# 4. Defense Algorithm

In [8]:
###
start_global = time.time()

# ------------- Defense algo starts here ----------------
# Load models
model = load_model(osp.join(ROOT, 'model','model.pt'), 'cpu')
model.eval()
preprocess = SpatialSmoothing(window_size=6, channel_index=1, clip_values=(0,1))
postprocess = lgb.Booster(model_file=lgb_dir)

# Initialize sets
clean_labels = [] # Actual labels of the image set
model_labels = [] # Initial labels by model
def_labels = [] # Final labels after pre and post processes
predss = [] # Prediction results after pre process
path_set = [] # Image path names

# load data
data_loader = load_data(image_dir)

for pre_img, clean_label, path in data_loader:
    
    model_label, post_img, preds = pre_process(pre_img, model, preprocess) # pre-process
    def_label = post_process(preds, postprocess)                           # post-process
    
    # Store values
    clean_labels = np.concatenate([clean_labels,clean_label],axis=0)
    model_labels = np.concatenate([model_labels,model_label],axis=0)
    def_labels = np.concatenate([def_labels,def_label],axis=0)
    if len(predss)==0:
        predss = preds
        path_set = path
    else:
        predss = np.concatenate([predss,preds],axis=0)
        path_set = path_set + path

pre_acc, post_acc = print_results(clean_labels, model_labels, np.argmax(predss,axis=1), def_labels)

write_results(MATRIC_NUM,def_labels = def_labels, path_set = path_set)
# print("Time taken: {:.2f}s".format(time.time()-start_global))



Pre processing: [Model , Defended]
                [44.05%,62.56%]
Postprocessing: [Model , Defended]
                [44.05%,80.18%]
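
The defense cell chains two stages: `SpatialSmoothing` on the raw [0, 1] images, then a LightGBM booster that maps the model's softmax output to a corrected class. The post-processing stage can be sketched with NumPy alone. `predict_proba` is a hypothetical stand-in for `postprocess.predict`, which for a multiclass booster returns one probability row per sample, and the toy "second stage" below exists only to make the relabelling visible.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def post_process_sketch(logits, predict_proba):
    """Mirror of post_process: softmax -> second-stage classifier -> argmax.

    predict_proba is a hypothetical callable returning an (n_samples, n_classes)
    array, as a multiclass LightGBM Booster.predict does.
    """
    return predict_proba(softmax(logits)).argmax(axis=1)


# Toy second stage that swaps classes 0 and 1, to show the relabelling effect.
def swap_01(probs):
    return probs[:, [1, 0, 2, 3]]


logits = np.array([[3.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 5.0, 1.0]])
print(post_process_sketch(logits, swap_01))
```

Because the second stage sees the whole probability vector rather than just its argmax, it can learn systematic confusions introduced by the attack and smoothing, which is what the jump from the pre-processing to the post-processing accuracy above reflects.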


# 5. Evaluate Results (Commented)

In [0]:
# # Colab: Comment
# data_folder = 'clean_adv'
# write_results('clean',data_folder)
# write_results('adv',data_folder)
# eval_script()
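
`eval_script` scores a submission as the harmonic mean of clean and adversarial accuracy, score = 2 / (1 / acc_clean + 1 / acc_adversarial). A quick worked example with made-up accuracies shows why this rewards balance: two defenses with the same arithmetic mean of 0.70 receive different scores, and the lopsided one is penalized.

```python
def harmonic_mean(x1, x2, eps=1e-7):
    """Same formula as eval_script; eps avoids division by zero at 0 accuracy."""
    x1, x2 = x1 + eps, x2 + eps
    return 2 / (1 / x1 + 1 / x2)


balanced = harmonic_mean(0.70, 0.70)  # about 0.70
lopsided = harmonic_mean(0.95, 0.45)  # about 0.611, although the arithmetic mean is also 0.70
print('{:.4f} {:.4f}'.format(balanced, lopsided))
```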

# Credits

This Colab notebook was created for the CS5260 final project. Feel free to clone it, but please do not distribute it.

Last Edited: Mar-14-2020 20:00