<div style="width:100%; height:140px">
    <img src="https://www.kuleuven.be/internationaal/thinktank/fotos-en-logos/ku-leuven-logo.png/image_preview" width="300" height="auto" align="left">
</div>


KUL H02A5a Computer Vision: Group Assignment 1
---------------------------------------------------------------
Student numbers: <span style="color:red">r0585077, r0595762, r0654230, r0738552, r0829665</span>.

In this group assignment your team will delve into some deep learning applications for computer vision. The assignment will be delivered in the same groups from *Group assignment 0* and you start from this template notebook. The notebook you submit for grading is the last notebook you submit in the [Kaggle competition](https://www.kaggle.com/t/b7a2a8743bd842ca9ac93ae91cbc8d9f) prior to the deadline on **Tuesday 18 May 23:59**. Closely follow [these instructions](https://github.com/gourie/kaggle_inclass) for joining the competition, sharing your notebook with the TAs and making a valid notebook submission to the competition. A notebook submission not only produces a *submission.csv* file that is used to calculate your competition score, it also runs the entire notebook and saves its output as if it were a report. This way it becomes an all-in-one-place document for the TAs to review. As such, please make sure that your final submission notebook is self-contained and fully documented (e.g. provide strong arguments for the design choices that you make). Most likely, this notebook format is not appropriate to run all your experiments at submission time (e.g. the training of CNNs is a memory hungry and time consuming process; due to limited Kaggle resources). It can be a good idea to distribute your code otherwise and only summarize your findings, together with your final predictions, in the submission notebook. For example, you can substitute experiments with some text and figures that you have produced "offline" (e.g. learning curves and results on your internal validation set or even the test set for different architectures, pre-processing pipelines, etc). We advise you to first go through the PDF of this assignment entirely before you really start. Then, it can be a good idea to go through this notebook and use it as your first notebook submission to the competition. 
You can make use of the *Group assignment 1* forum/discussion board on Toledo if you have any questions. Good luck and have fun!

---------------------------------------------------------------
NOTES:
* This notebook is just a template. Please keep the five main sections, but feel free to adjust further in any way you please!
* Clearly indicate the improvements that you make! You can for instance use subsections like: *3.1. Improvement: applying loss function f instead of g*.


# 1. Overview
This assignment consists of *three main parts* for which we expect you to provide code and extensive documentation in the notebook:
* Image classification (Sect. 2)
* Semantic segmentation (Sect. 3)
* Adversarial attacks (Sect. 4)

In the first part, you will train an end-to-end neural network for image classification. In the second part, you will do the same for semantic segmentation. For these two tasks we expect you to put a significant effort into optimizing performance and as such competing with fellow students via the Kaggle competition. In the third part, you will try to find and exploit the weaknesses of your classification and/or segmentation network. For the latter there is no competition format, but we do expect you to put significant effort in achieving good performance on the self-posed goal for that part. Finally, we ask you to reflect and produce an overall discussion with links to the lectures and "real world" computer vision (Sect. 5). It is important to note that only a small part of the grade will reflect the actual performance of your networks. However, we do expect all things to work! In general, we will evaluate the correctness of your approach and your understanding of what you have done that you demonstrate in the descriptions and discussions in the final notebook.

## 1.1 Deep learning resources
If you did not yet explore this in *Group assignment 0 (Sect. 2)*, we recommend using the TensorFlow and/or Keras library for building deep learning models. You can find a nice crash course [here](https://colab.research.google.com/drive/1UCJt8EYjlzCs1H1d1X0iDGYJsHKwu-NO).

In [None]:
!pip install -U segmentation-models-pytorch albumentations --user 
!pip install albumentations==0.5.2
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0,2,3,4"


import sys
import numpy as np
import pandas as pd
import cv2
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, balanced_accuracy_score, log_loss, confusion_matrix
import albumentations as albu
import pickle
print(albu.__version__)

## 1.2 PASCAL VOC 2009
For this project you will be using the [PASCAL VOC 2009](http://host.robots.ox.ac.uk/pascal/VOC/voc2009/index.html) dataset. This dataset consists of colour images of various scenes with different object classes (e.g. animal: *bird, cat, ...*; vehicle: *aeroplane, bicycle, ...*), totalling 20 classes.

In [None]:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

In [None]:
# Loading the training data
train_df = pd.read_csv('/kaggle/input/kul-h02a5a-computervision-groupassignment1/train/train_set.csv', index_col="Id")
labels = train_df.columns[:]
print("The training set contains {} examples.".format(len(train_df)))

In [None]:
labels = train_df.columns[:]
print(labels)
train_df.head(1)

In [None]:
# Loading the test data
test_df = pd.read_csv('/kaggle/input/kul-h02a5a-computervision-groupassignment1/test/test_set.csv', index_col="Id")
#test_df["img"] = [np.load('/content/drive/My Drive/CV Assignment 2/Data/test/img/test_{}.npy'.format(idx)) for idx, _ in test_df.iterrows()]
#test_df["seg"] = [-1 * np.ones(img.shape[:2], dtype=np.int8) for img in test_df["img"]]
print("The test set contains {} examples.".format(len(test_df)))

# The test dataframe is similar to the training dataframe, but here the values are -1 --> your task is to fill in these as good as possible in Sect. 2 and Sect. 3; in Sect. 6 this dataframe is automatically transformed in the submission CSV!
test_df.head(1)

In [None]:
# Setup path variables
DATA_DIR = '/kaggle/input/kul-h02a5a-computervision-groupassignment1'
x_train_dir = os.path.join(DATA_DIR,'train', 'img')
y_train_dir = os.path.join(DATA_DIR,'train', 'seg')

print(x_train_dir)

do_test_split=False

if not do_test_split:
  x_test_dir = os.path.join(DATA_DIR,'test', 'img')
  y_test_dir = None
else:
  x_test_dir = x_train_dir 
  y_test_dir = y_train_dir

### Visualization

In [None]:
from skimage.color import label2rgb
import skimage
from matplotlib.colors import ListedColormap
import matplotlib.patches as mpatches
print(skimage.__version__)
COLORS=('red', 'blue', 'yellow', 'magenta', 
            'green', 'indigo', 'darkorange', 'cyan', 'pink', 
            'yellowgreen', 'darkmagenta', 'brown',
            'purple', 'darkviolet',"dodgerblue",
            "darkgoldenrod","lawngreen","crimson",
              "darkkhaki","darkgreen")



# helper function for data visualization
def visualize(**images):
    """Plot images in one row."""
    n = len(images)
    plt.figure(figsize=(16, 5))
    for i, (name, image) in enumerate(images.items()):
        plt.subplot(1, n, i + 1)
        plt.xticks([])
        plt.yticks([])
        plt.title(' '.join(name.split('_')).title())
        plt.imshow(image)
    plt.show()


# helper function for data visualization
def plot_mask(img,mask,mask_truth=None,score=None,labels=labels):
  # create colormap for the legend
  cmap1 = ListedColormap(COLORS).colors
  cl_nums1 = np.unique(mask)
  cl_nums1=cl_nums1[cl_nums1!=0]
  # Subtract one so it can act as index for the labels
  cl_nums1= [x - 1 for x in cl_nums1]
  # Subset the list of all colors to bind color to class, instead of cycling
  # through colors 
  cmap1 = [cmap1[i] for i in cl_nums1]
  # create empty colored rectangles labeled with the segmentation class names
  # for the legend 
  r1 = [mpatches.Rectangle((0,0), 1, 1, color=cmap1[i],label=labels[cl_nums1[i]]) for i in range(len(cl_nums1))]
  result1 = label2rgb(mask,img,alpha=0.5,bg_label=0,colors=cmap1,kind="overlay")
  if mask_truth is None:
    plt.figure(figsize=(16, 5))
    # Mask input image with binary mask
    plt.imshow(result1)
    if score is not None:
      # plt.title is a function, not an Axes title object
      plt.title("IoU="+str(score))
    plt.legend(handles=r1, loc=4, borderaxespad=0.)
    plt.show()
  else:
    # repeat for ground truth mask
    cmap2 = ListedColormap(COLORS).colors
    cl_nums2 = np.unique(mask_truth)
    cl_nums2=cl_nums2[cl_nums2!=0]
    cl_nums2= [x - 1 for x in cl_nums2]
    cmap2 = [cmap2[i] for i in cl_nums2]
    r2 =  [mpatches.Rectangle((0,0), 1, 1, color=cmap2[i],label=labels[cl_nums2[i]]) for i in range(len(cl_nums2))]
    result2 = label2rgb(mask_truth,img,alpha=0.5,bg_label=0,colors=cmap2,kind="overlay")
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    axes[0].imshow(result1)
    if score is not None:
      axes[0].title.set_text("IoU="+str(score))
    axes[0].legend(handles=r1, loc=4, borderaxespad=0.)
    axes[1].imshow(result2)
    axes[1].legend(handles=r2, loc=4, borderaxespad=0.)


def plot_confusion_matrix(y_true, y_pred):
  """Plots a confusion matrix given true and predicted labels. 
  Parameters
  ----------
  y_true : array
      True labels
  y_pred : array
      Predicted labels
  """

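  # compute confusion matrix from input true and prediction labels
  cmx = confusion_matrix(y_true, y_pred)
  
  # initialize plot, pass in confusion matrix, set labels on x and y axis
  fig, ax = plt.subplots(1,1)
  img = ax.imshow(cmx)
  # one tick per class that occurs, instead of a hard-coded three-class list
  class_ids = np.unique(np.concatenate([np.ravel(y_true), np.ravel(y_pred)]))
  ticks = list(range(len(class_ids)))
  label_list = [str(c) for c in class_ids]
  ax.set_xticks(ticks)
  ax.set_xticklabels(label_list)
  ax.set_yticks(ticks)
  ax.set_yticklabels(label_list)
  plt.title('Confusion Matrix')
  # sklearn puts true labels on the rows (y axis) and predictions on the columns (x axis)
  plt.xlabel('Predicted Labels')
  plt.ylabel('True Labels')
  
  # include numbers in the plot for better clarity on misclassifications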
  for r in range(cmx.shape[0]):
      for c in range(cmx.shape[1]):
          ax.text(c, r, cmx[r,c], ha="center", va="center", color='white', fontsize=20)
  fig.colorbar(img)

### Dataloader

A helper class for data extraction, transformation and preprocessing  
https://pytorch.org/docs/stable/data

In [None]:
from torch.utils.data import DataLoader
from torch.utils.data import Dataset as BaseDataset
from sklearn.model_selection import train_test_split

IMG_WIDTH=256
IMG_HEIGHT=256
class Dataset(BaseDataset):
    """PASCAL VOC dataset. Read images, apply augmentation and preprocessing transformations.
    
    Args:
        images_dir (str): path to images folder
        masks_dir (str): path to segmentation masks folder
        classes (list): names of the classes to extract from the segmentation mask
        all_classes (list): names of all classes, in mask-label order
        augmentation (albumentations.Compose): data transformation pipeline 
            (e.g. flip, scale, etc.)
        preprocessing (albumentations.Compose): data preprocessing 
            (e.g. normalization, shape manipulation, etc.)
        index_list (list): ids of the examples to load
        img_dims (tuple): (width, height) every image and mask is resized to
    
    """
        
    def __init__(
            self, 
            images_dir=None, 
            masks_dir=None, 
            classes=None, 
            all_classes=None,
            augmentation=None, 
            preprocessing=None,
            index_list=None,
            img_dims=(IMG_WIDTH,IMG_HEIGHT)
    ):
        self.all_classes=all_classes
        if classes is None:
          classes = all_classes
        self.ids = []#list(range(len(index_list)))
        # Load images and masks with progress updates printed
        self.images=[]
        self.masks_raw=[]
        counter=1
        for idx in index_list:
          if masks_dir is not None:
            mask_i = cv2.resize(np.load(masks_dir+'/train_{}.npy'.format(idx)),
                  img_dims, interpolation = cv2.INTER_NEAREST)
            found = False
            for cls in classes:
              if np.any(mask_i==all_classes.index(cls)+1):
                self.masks_raw.append(mask_i)
                self.ids.append(idx)
                found=True
                break
            if not found:
              continue
          if "train" in images_dir:
            self.images.append(cv2.resize(
                  np.load(images_dir+'/train_{}.npy'.format(idx)),
                  img_dims, interpolation = cv2.INTER_AREA))
          else:
            self.images.append(cv2.resize(
                  np.load(images_dir+'/test_{}.npy'.format(idx)),
                  img_dims, interpolation = cv2.INTER_AREA))
          counter+=1
          if (counter%100==0) or (counter==len(index_list)):
            print(counter,"/",len(index_list),"loaded.")
        #self.images=[np.load(images_dir+'/train_{}.npy'.format(idx)) for idx in index_list]
        #self.masks_raw=[np.load(masks_dir+'/train_{}.npy'.format(idx)) for idx in index_list]
        
        # convert str names to class values on masks
        self.class_values = [all_classes.index(cls.lower()) for cls in classes]
        
        self.augmentation = augmentation
        self.preprocessing = preprocessing
    
    def get_original_image(self,i):
      return self.images[i]


    def __getitem__(self, i):
        
        # read data
        image = self.images[i]
        if len(self.masks_raw)>0:
          mask = self.masks_raw[i]
          # extract certain classes from mask (e.g. cars)
          bg_values=list(range(len(self.all_classes)+1))
          for v in self.class_values:
            bg_values.remove(v+1)
          bg_mask=np.zeros(mask.shape)
          bg_mask[np.isin(mask,bg_values)]=1
          masks = [(mask == v+1) for v in self.class_values]
          mask = np.stack(masks, axis=-1).astype('float')
          #print(bg_mask)
          #print(mask.shape)
          bg_mask = np.expand_dims(bg_mask, 2)
          #print(bg_mask.shape)
          mask=np.concatenate([bg_mask,mask],axis=2)
        else:
          mask=None
        
        
        
        # apply augmentations
        if self.augmentation:
            if len(self.masks_raw)>0:
              sample = self.augmentation(image=image, mask=mask)
              image, mask = sample['image'], sample['mask']
            else:
              image = self.augmentation(image=image)['image']
        
        # apply preprocessing
        if self.preprocessing:
            if len(self.masks_raw)>0:
              sample = self.preprocessing(image=image, mask=mask)
              image, mask = sample['image'], sample['mask']
            else:
              image = self.preprocessing(image=image)['image']

            
        return image, mask
        
    def __len__(self):
        return len(self.images)

### Augmentations

In [None]:
def get_training_augmentation():
    train_transform = [

        albu.HorizontalFlip(p=0.5),

        albu.ShiftScaleRotate(scale_limit=(-0.5,1), rotate_limit=0, shift_limit=0.1, p=0.2, border_mode=0), #1

        #CopyPaste(blend=True, sigma=1, pct_objects_paste=0.8, max_paste_objects=1, p=1.), #pct_objects_paste is a guess
        # it's a dual transform so doesn't work

        albu.PadIfNeeded(min_height=320, min_width=320, always_apply=True, border_mode=0),
        albu.RandomCrop(height=320, width=320, always_apply=True),


        albu.IAAAdditiveGaussianNoise(p=0.1), #0.2
        albu.IAAPerspective(p=0.1), #0.5

        albu.OneOf(
            [
                albu.CLAHE(p=1),
                albu.RandomBrightness(p=1),
                albu.RandomGamma(p=1),
            ],
            p=0.2, #0.9
        ),

        albu.OneOf(
            [
                albu.IAASharpen(p=1),
                albu.Blur(blur_limit=3, p=1),
                albu.MotionBlur(blur_limit=3, p=1),
            ],
            p=0.2, #0.9
        ),

        
    ]
    return albu.Compose(train_transform)
'''

        albu.PadIfNeeded(min_height=320, min_width=320, always_apply=True, border_mode=0),
        albu.RandomCrop(height=320, width=320, always_apply=True),


        albu.IAAAdditiveGaussianNoise(p=0.2),
        albu.IAAPerspective(p=0.5),

        albu.OneOf(
            [
                albu.CLAHE(p=1),
                albu.RandomBrightness(p=1),
                albu.RandomGamma(p=1),
            ],
            p=0.9,
        ),

        albu.OneOf(
            [
                albu.IAASharpen(p=1),
                albu.Blur(blur_limit=3, p=1),
                albu.MotionBlur(blur_limit=3, p=1),
            ],
            p=0.9,
        ),

        albu.OneOf(
            [
                albu.RandomContrast(p=1),
                albu.HueSaturationValue(p=1),
            ],
            p=0.9,
        ),'''

def get_validation_augmentation():
    """Add paddings to make image shape divisible by 32"""
    test_transform = [
        albu.PadIfNeeded(384, 480)
    ]
    return albu.Compose(test_transform)


def to_tensor(x, **kwargs):
    return x.transpose(2, 0, 1).astype('float32')


def get_preprocessing(preprocessing_fn):
    """Construct preprocessing transform
    
    Args:
        preprocessing_fn (callable): data normalization function 
            (can be specific for each pretrained neural network)
    Return:
        transform: albumentations.Compose
    
    """
    
    _transform = [
        albu.Lambda(image=preprocessing_fn),
        albu.Lambda(image=to_tensor, mask=to_tensor),
    ]
    return albu.Compose(_transform)

## Create model and train

### Load pretrained segmentation model

We used EfficientNet-B4 as the encoder/backbone for extracting features from the images and DeepLabV3Plus as the decoder head for labeling pixels.
Both were chosen by trial and error after some theoretical deliberation. For the encoder we tried ResNeXt 50, MobileNet V2 and EfficientNet-B0 before choosing EfficientNet-B4 on the basis of accuracy and training speed. We tried these encoders because they have relatively few parameters, which we expected to make training easier on limited computational resources and less prone to overfitting on our relatively small dataset. Within the EfficientNet family we only tried B0 and B4, because the former has the fewest parameters and the latter seemed to offer the best accuracy-to-model-size ratio in the authors' original publication.

For the decoder we also tried FPN, but DeepLabV3Plus worked much better. In the end we trained our final model for ~9000 epochs. It reached its best accuracy after ~6000 epochs, after which neither more augmentation nor a lower learning rate gave further improvement.

In [None]:
import torch
import numpy as np
!pip show segmentation_models_pytorch
import segmentation_models_pytorch as smp

In [None]:
ENCODER = 'efficientnet-b4'
ENCODER_WEIGHTS = 'imagenet'
ALL_CLASSES = list(train_df.columns[:])
CLASSES = labels #
ACTIVATION = "softmax2d" # could be None for logits or 'softmax2d' for multiclass segmentation
DEVICE = 'cuda'

# create segmentation model with pretrained encoder
"""
model = smp.DeepLabV3Plus(
    encoder_name=ENCODER, 
    encoder_weights=ENCODER_WEIGHTS, 
    classes=len(CLASSES)+1, 
    activation=ACTIVATION,
)
"""
MODEL_DIR='../input/trial-efficientnetb4-3-tpth/'
model=torch.load(MODEL_DIR+'trial_efficientnetb4_3_T.pth')

# Preprocessing specific to the used backbone CNN
preprocessing_fn = smp.encoders.get_preprocessing_fn(ENCODER, ENCODER_WEIGHTS)

### Generate the train/validation split

We split the data into training and validation sets with a 9-to-1 ratio (`test_size=0.1` in the cell below). We also checked that all classes are present in both sets. Not only are no classes missing from either set, but the class distributions are also very similar.
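
The class-presence check described here can be sketched with NumPy alone. This is only an illustration under stated assumptions: the label matrix is random stand-in data rather than the PASCAL VOC labels, and the 10% hold-out mirrors the `test_size=0.1` used in the code cell of this section.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in multi-label matrix: 1000 images x 20 classes (hypothetical data).
n_images, n_classes = 1000, 20
y = (rng.random((n_images, n_classes)) < 0.15).astype(int)

# Shuffle the indices and hold out 10% for validation.
indices = rng.permutation(n_images)
n_valid = int(0.1 * n_images)
valid_idx, train_idx = indices[:n_valid], indices[n_valid:]

# Relative frequency of each class in both subsets.
train_freq = y[train_idx].mean(axis=0)
valid_freq = y[valid_idx].mean(axis=0)

# A class that never occurs in one subset cannot be learned or evaluated there.
missing_in_train = np.flatnonzero(y[train_idx].sum(axis=0) == 0)
missing_in_valid = np.flatnonzero(y[valid_idx].sum(axis=0) == 0)
max_gap = np.abs(train_freq - valid_freq).max()

print("missing in train:", missing_in_train, "missing in valid:", missing_in_valid)
print("largest frequency gap: {:.3f}".format(max_gap))
```

If a class turned out to be missing from one subset, a different random seed or a stratified split would be the usual remedy.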

In [None]:
indexes=np.array((range(len(train_df))))

X_train, X_valid= train_test_split(indexes, test_size=0.1, random_state=42)
# if it is desired to test on labeled data, this data is taken from the validation set
if do_test_split: 
  X_valid, X_test = train_test_split(X_valid, test_size=0.33, random_state=42)
else:
  X_test =np.array((range(len(test_df))))

print(X_train.shape)
print(X_valid.shape)
print(X_test.shape)

train_classes=train_df.iloc[X_train,:].sum(axis=0)/len(X_train)
valid_classes=train_df.iloc[X_valid,:].sum(axis=0)/len(X_valid)
x = np.arange(len(labels))  # the label locations
width = 0.35  # the width of the bars

fig, ax = plt.subplots(figsize=(20, 5))
rects1 = ax.bar(x - width/2, train_classes, width, label='Training')
rects2 = ax.bar(x + width/2, valid_classes, width, label='Validation')

ax.set_ylabel('Class relative frequencies')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()


fig.tight_layout()
plt.show()

### Load the images

The images are loaded into `Dataset` objects, which are then wrapped in `DataLoader` objects that handle batching. Preprocessing and augmentation are applied later, on-line, during training.
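
To make the idea of on-line augmentation concrete, here is a minimal sketch that uses NumPy only. `ToyDataset` and `iterate_batches` are hypothetical stand-ins for the notebook's `Dataset` class and PyTorch's `DataLoader`; they are not part of any library. The raw arrays stay untouched in memory and a random flip is drawn each time an item is requested.

```python
import numpy as np

class ToyDataset:
    """Minimal map-style dataset: keeps raw arrays, augments on access."""

    def __init__(self, images, masks, flip_prob=0.5, seed=0):
        self.images = images
        self.masks = masks
        self.flip_prob = flip_prob
        self.rng = np.random.default_rng(seed)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, i):
        image, mask = self.images[i], self.masks[i]
        # The same random decision is applied to image and mask so that
        # the pixel labels stay aligned with the pixels.
        if self.rng.random() < self.flip_prob:
            image, mask = image[:, ::-1], mask[:, ::-1]
        return image.copy(), mask.copy()

def iterate_batches(dataset, batch_size):
    """Yield stacked (images, masks) batches, like a bare-bones loader."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        items = [dataset[i] for i in range(start, stop)]
        images, masks = zip(*items)
        yield np.stack(images), np.stack(masks)

# Five tiny 2x2 RGB "images" with matching 2x2 label masks (made-up values).
images = [np.arange(12, dtype=np.float32).reshape(2, 2, 3) + k for k in range(5)]
masks = [np.array([[0, 1], [2, 0]]) for _ in range(5)]
dataset = ToyDataset(images, masks)
batch_shapes = [(x.shape, m.shape) for x, m in iterate_batches(dataset, batch_size=2)]
print(batch_shapes)
```

Because the flip is drawn at access time, every epoch sees a different variant of the same stored image.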

In [None]:
train_dataset = Dataset(
    x_train_dir, 
    y_train_dir, 
    augmentation=get_training_augmentation(), 
    preprocessing=get_preprocessing(preprocessing_fn),
    classes=CLASSES, #
    all_classes=ALL_CLASSES,
    index_list=X_train.tolist()
)

print("train set: ",len(train_dataset))
#for i in range(len(train_dataset)):
  #im,m = train_dataset[i]
  #print(m)

valid_dataset = Dataset(
    x_train_dir,
    y_train_dir,
    augmentation=None,#get_training_augmentation(),
    preprocessing=get_preprocessing(preprocessing_fn),
    classes=CLASSES,
    all_classes=ALL_CLASSES,
    index_list=X_valid.tolist()
)

print("val set: ",len(valid_dataset))

In [None]:
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True, num_workers=2, pin_memory=True)
valid_loader = DataLoader(valid_dataset, batch_size=1, shuffle=False, num_workers=2, pin_memory=True)

### Custom thresholding for model accuracy evaluation

We trained the model with the Jaccard loss and monitored the IoU as the accuracy metric.
When classifying pixels, every pixel has to belong to a class, because the background is also treated as a class! However, when evaluating the loss or the accuracy, we ignored the background pixels.
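
As a small worked example of an IoU that ignores the background channel, consider the NumPy sketch below. The function names and the toy masks are illustrative assumptions. The score is pooled over all non-background channels, which matches how the custom `iou` function in this notebook sums intersections and unions.

```python
import numpy as np

def one_hot(label_mask, n_channels):
    """Turn an (H, W) integer mask into an (n_channels, H, W) binary array."""
    return (np.arange(n_channels)[:, None, None] == label_mask[None]).astype(float)

def iou_ignoring_background(pred_labels, true_labels, n_channels, eps=1e-7):
    """IoU pooled over all channels except channel 0 (background)."""
    pr = one_hot(pred_labels, n_channels)[1:]
    gt = one_hot(true_labels, n_channels)[1:]
    intersection = (pr * gt).sum()
    union = pr.sum() + gt.sum() - intersection
    return (intersection + eps) / (union + eps)

# 0 = background, 1 and 2 = two hypothetical object classes.
true_labels = np.array([[0, 0, 1, 1],
                        [0, 2, 2, 1]])
pred_labels = np.array([[0, 1, 1, 1],
                        [0, 2, 0, 1]])

score = iou_ignoring_background(pred_labels, true_labels, n_channels=3)
print(round(score, 3))  # 0.667: intersection 4, union 6
```

The three background pixels that are predicted correctly do not raise the score, so a model cannot look good merely by predicting background everywhere.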

To better estimate IoU accuracy we decided to use a local threshold instead of a global one. Here, a local threshold means that for every pixel we set the class with the maximum predicted probability to 1 and all other classes' probabilities to 0. A global threshold would instead set all class probabilities above a certain value (e.g. 0.5) to 1 and the rest to 0. The problem with a global threshold is that every pixel can only belong to one class in the end: a threshold below 0.5 can assign several classes to the same pixel, while a threshold of 0.5 or above can leave a pixel unclassified, not even as background, especially with so many classes.

Unfortunately, the segmentation library only had global thresholding, so we implemented a custom IoU function with local thresholding. 
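
The difference between the two thresholding schemes can be seen on a handful of made-up pixels. The sketch below uses NumPy rather than PyTorch, and the probabilities and function names are illustrative assumptions, not values from our model.

```python
import numpy as np

# Class probabilities for three pixels over four channels
# (background + 3 classes). Values are made up; each row sums to 1.
probs = np.array([
    [0.40, 0.25, 0.20, 0.15],   # no class reaches 0.5
    [0.10, 0.70, 0.10, 0.10],   # one confident class
    [0.35, 0.35, 0.20, 0.10],   # two classes above a low threshold of 0.3
])

def global_threshold(p, threshold):
    """Mark every class whose probability exceeds a fixed value."""
    return (p > threshold).astype(int)

def local_threshold(p):
    """One-hot encode the arg-max channel of every pixel."""
    one_hot = np.zeros_like(p, dtype=int)
    one_hot[np.arange(p.shape[0]), p.argmax(axis=1)] = 1
    return one_hot

# Number of classes assigned to each pixel under each scheme.
labels_at_05 = global_threshold(probs, 0.5).sum(axis=1)
labels_at_03 = global_threshold(probs, 0.3).sum(axis=1)
labels_local = local_threshold(probs).sum(axis=1)
print(labels_at_05, labels_at_03, labels_local)
```

With a threshold of 0.5 the first and third pixels receive no label at all, with 0.3 the third pixel receives two, and only the arg-max rule assigns exactly one class to every pixel.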

In [None]:
# Dice/F1 score - https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient
# IoU/Jaccard score - https://en.wikipedia.org/wiki/Jaccard_index


def _take_channels(*xs, ignore_channels=None):
  if ignore_channels is None:
      return xs
  else:
      channels = [channel for channel in range(xs[0].shape[1]) if channel not in ignore_channels]
      xs = [torch.index_select(x, dim=1, index=torch.tensor(channels).to(x.device)) for x in xs]
      return xs

def _threshold(x, threshold=None):
    if threshold is not None:
        return (x > threshold).type(x.dtype)
    else:
        return x

def _threshold_local(x):
    x_max = torch.argmax(x,1,keepdim=True)
    # allocate on the same device and with the same dtype as the input instead of hard-coding 'cuda:0'
    x_ret = torch.zeros_like(x).scatter(1,x_max,1.0)
    #print(x_ret.shape)
    return x_ret

def iou(pr, gt, eps=1e-7, threshold=None, ignore_channels=None):
  pr = _threshold_local(pr)
  pr, gt = _take_channels(pr, gt, ignore_channels=ignore_channels)

  intersection = torch.sum(gt * pr)
  union = torch.sum(gt) + torch.sum(pr) - intersection + eps
  #print("intersection=",intersection,"union=",union,"gt=",torch.sum(gt),"pr=",torch.sum(pr))
  score = (intersection + eps) / union
  '''
  if score > 5:
    print(pr.shape)
    print("intersection=",intersection,"union=",union,"gt=",torch.sum(gt),"pr=",torch.sum(pr),"iou=",score)
    pr_sing = np.argmax(pr.cpu().numpy(),axis=1)
    gt_sing = np.argmax(gt.cpu().numpy(),axis=1)
    for j in range(pr.shape[1]):
      visualize(pr=pr_sing[j,:,:], gt=gt_sing[j,:,:])
    print(np.unique(pr_flat))
    pr_flat = pr_sing.flatten()
    gt_flat = gt_sing.flatten()
    plot_confusion_matrix(gt_flat, pr_flat)
    raise ValueError("done")
    '''
  return score

# This is the custom IoU metric used for the training
class IoU2(smp.utils.metrics.IoU):

  def forward(self, y_pr, y_gt):
    y_pr = self.activation(y_pr)
    return iou(
        y_pr, y_gt,
        eps=1e-7,
        threshold=self.threshold,
        ignore_channels=self.ignore_channels,
    )

loss = smp.utils.losses.JaccardLoss(activation=ACTIVATION,ignore_channels=[0])
metrics2 = [IoU2(activation=ACTIVATION,ignore_channels=[0])]
#metrics1 = [smp.utils.metrics.IoU(threshold=None,activation=ACTIVATION,ignore_channels=[0])]

optimizer = torch.optim.Adam([ 
    dict(params=model.parameters(), lr=1e-4),
])

### Epoch runners

The `TrainEpoch` and `ValidEpoch` classes are wrappers for running the training and evaluation epochs. Each call to `run` processes a single epoch, so they have to be called from a loop.
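
The loop around the epoch runners follows a simple pattern: run one training epoch, run one validation epoch, log both, and save a checkpoint when the validation score improves. The sketch below shows that pattern in isolation. `fit` and the two callables are hypothetical stand-ins that replay fixed scores; they are not part of `segmentation_models_pytorch`.

```python
def fit(run_train_epoch, run_valid_epoch, n_epochs, best_score=0.0, save_checkpoint=None):
    """Run epochs in a loop and checkpoint whenever validation IoU improves."""
    history = {"train": [], "valid": []}
    for epoch in range(n_epochs):
        train_log = run_train_epoch()
        valid_log = run_valid_epoch()
        history["train"].append(train_log["iou_score"])
        history["valid"].append(valid_log["iou_score"])
        # Keep only the model with the best validation score so far.
        if valid_log["iou_score"] > best_score:
            best_score = valid_log["iou_score"]
            if save_checkpoint is not None:
                save_checkpoint(epoch)
    return best_score, history

# Stand-ins that replay fixed scores instead of training a network.
train_scores = iter([0.20, 0.30, 0.35, 0.40])
valid_scores = iter([0.18, 0.25, 0.24, 0.27])
saved_epochs = []

best, history = fit(
    run_train_epoch=lambda: {"iou_score": next(train_scores)},
    run_valid_epoch=lambda: {"iou_score": next(valid_scores)},
    n_epochs=4,
    save_checkpoint=saved_epochs.append,
)
print(best, saved_epochs)
```

Passing a non-zero `best_score` lets a resumed run avoid overwriting an earlier, better checkpoint, which is what the `max_score` variable does in the training cell.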

In [None]:
train_epoch = smp.utils.train.TrainEpoch(
    model, 
    loss=loss, 
    metrics=metrics2, 
    optimizer=optimizer,
    device=DEVICE,
    verbose=False,
)

valid_epoch = smp.utils.train.ValidEpoch(
    model, 
    loss=loss, 
    metrics=metrics2, 
    device=DEVICE,
    verbose=False,
)

In [None]:
# Start new logs
#train_logs={"jaccard_loss":[],"iou_score":[]}
#valid_logs={"jaccard_loss":[],"iou_score":[]}
# Load logs from file
with open("../input/segmentation-logs/train_logs.pkl", "rb") as f:
    train_logs = pickle.load(f)
with open("../input/segmentation-logs/valid_logs.pkl", "rb") as f:
    valid_logs = pickle.load(f)

In [None]:
assert torch.cuda.is_available(), "No GPU available"
import time
import IPython



# train model
#max_score=0.4003615085446552
max_score=0.3871722652839199
#MODEL_DIR='/output/working'

#for j in range(5):
  #model = torch.load(MODEL_DIR+'trial4_T.pth')
  #create_epoch(model) ### This does not initialize train_epoch as maybe intended 
  #optimizer.param_groups[0]['lr'] = 1e-5
# assign valid_log for the first iteration print
valid_log={"iou_score":0}
out = display(IPython.display.Pretty('Start training'), display_id=True)
for i in range(8530, 8700):
    if i == -1:
        optimizer.param_groups[0]['lr'] = 1e-4
        print('Decrease decoder learning rate to 1e-4!')
            
    out.update(IPython.display.Pretty(
      'Epoch {it} training, previous validation score: {score:.4f}'.format(it=i,score=valid_log["iou_score"])))
    start = time.time()
    train_log = train_epoch.run(train_loader)
    end = time.time()
    out.update(IPython.display.Pretty(
      'Epoch {it} validation, training epoch took {dur:.2f}s'.format(it=i,dur=end-start)))
    valid_log = valid_epoch.run(valid_loader)
    # store loss and metric
    for s in ["jaccard_loss","iou_score"]:
        train_logs[s].append(train_log[s])
        valid_logs[s].append(valid_log[s])
    # save model if validation score improved
    if max_score < valid_log['iou_score']:
        max_score = valid_log['iou_score']
        torch.save(model, 'trial_efficientnetb4_3_T.pth')
        print('Model saved! New best:',max_score)
    
    

In [None]:
# Save logs to file
with open("train_logs.pkl","wb") as f:
    pickle.dump(train_logs,f)
with open("valid_logs.pkl","wb") as f:
    pickle.dump(valid_logs,f)

In [None]:
torch.save(model, 'trial_efficientnetb4_3_T.pth')

In [None]:
#print(train_logs)
#print(valid_logs)
# summarize history for accuracy
f,ax = plt.subplots(nrows=1,ncols=2,figsize=(20,5))
ax[0].plot(train_logs['jaccard_loss'])
ax[0].plot(valid_logs['jaccard_loss'])
ax[0].set_title('Loss progress')
ax[0].set_ylabel('Jaccard loss')
ax[0].set_xlabel('epoch')
ax[1].plot(train_logs['iou_score'])
ax[1].plot(valid_logs['iou_score'])
ax[1].set_title('Score progress')
ax[1].set_ylabel('IoU score')
ax[1].set_xlabel('epoch')
f.legend(['training set', 'validation set'], loc='upper right')
plt.suptitle('Model Performance Over Epochs',fontsize=20)
plt.show()

In [None]:
# load best saved checkpoint
best_model = torch.load(MODEL_DIR+'trial_efficientnetb4_3_T.pth')
# evaluate model on test set
test_vis_epoch = smp.utils.train.ValidEpoch(
    model=best_model,
    loss=loss,
    metrics=metrics2,
    device=DEVICE,
)

In [None]:
# validation dataset without transformations for retrieving best IoU
test_dataset_vis = Dataset(
    x_train_dir, y_train_dir,
    preprocessing=get_preprocessing(preprocessing_fn),
    classes=CLASSES, #,
    all_classes=ALL_CLASSES,
    index_list=X_valid
)
test_dataloader_vis = DataLoader(test_dataset_vis, batch_size=1, shuffle=False, num_workers=2)
vis_log=test_vis_epoch.run(test_dataloader_vis)
print(vis_log["iou_score"])
print(len(test_dataloader_vis))

In [None]:
for i in range(75):
  #n = np.random.choice(len(test_dataset_vis))
  n=i
  
  
  image, gt_mask = test_dataset_vis[n]
  image_vis = test_dataset_vis.get_original_image(n)

  image_vis = image_vis.astype('uint8')

  x_tensor = torch.from_numpy(image).to(DEVICE).unsqueeze(0).float()

  pr_mask = best_model.predict(x_tensor)
  activation=torch.nn.Softmax(dim=1)
  #pr_mask = activation(pr_mask)

  gt_mask=torch.from_numpy(gt_mask).to(DEVICE).unsqueeze(0)

  score=iou(pr_mask,gt_mask,threshold=None, ignore_channels=[0]).item()

  cl_mask=pr_mask.argmax(1)
  gt_mask=gt_mask.argmax(1)


  cl_mask = cl_mask.squeeze().cpu().numpy()
  gt_mask = gt_mask.squeeze().cpu().numpy()

  if score >0:
    #print(i)
    if do_test_split:
      visualize(
        image=image_vis, 
        ground_truth_mask=gt_mask, 
        predicted_mask=cl_mask)
    else:
      plot_mask(
        image_vis, 
        cl_mask,
        gt_mask,
        score=score,
        labels=CLASSES
      )

## 1.3 Your Kaggle submission
Your filled test dataframe (during Sect. 2 and Sect. 3) must be converted to a submission.csv with two rows per example (one for classification and one for segmentation) and with only a single prediction column (the multi-class/label predictions, run-length encoded). You don't need to edit this section. Just make sure to call this function at the right position in this notebook.
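
To see what the run-length encoding looks like, the sketch below encodes a five-element binary array and decodes it again. `rle_encode` repeats the logic of `_rle_encode` from this section, while `rle_decode` is a hypothetical helper added only for illustration. It assumes the output consists of pairs of a 1-based start position and a run length.

```python
import numpy as np

def rle_encode(binary):
    """Return 'start length start length ...' with 1-based start positions."""
    pixels = np.concatenate([[0], np.asarray(binary).flatten(), [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    return " ".join(str(x) for x in runs)

def rle_decode(rle, length):
    """Inverse of rle_encode for a flat array of the given length."""
    values = [int(x) for x in rle.split()]
    out = np.zeros(length, dtype=int)
    for start, run in zip(values[::2], values[1::2]):
        out[start - 1:start - 1 + run] = 1
    return out

flat = np.array([0, 1, 1, 0, 1])
encoded = rle_encode(flat)
decoded = rle_decode(encoded, len(flat))
print(encoded)  # 2 2 5 1
```

The string `2 2 5 1` reads as "a run of two ones starting at position 2, and a run of one starting at position 5".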

In [None]:
def _rle_encode(img):
    """
    Kaggle requires RLE encoded predictions for computation of the Dice score (https://www.kaggle.com/lifa08/run-length-encode-and-decode)

    Parameters
    ----------
    img: np.ndarray - binary img array
    
    Returns
    -------
    rle: String - running length encoded version of img
    """
    pixels = img.flatten()
    pixels = np.concatenate([[0], pixels, [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1
    runs[1::2] -= runs[::2]
    rle = ' '.join(str(x) for x in runs)
    return rle

def generate_submission(df):
    """
    Make sure to call this function once after you completed Sect. 2 and Sect. 3! It transforms and writes your test dataframe into a submission.csv file.
    
    Parameters
    ----------
    df: pd.DataFrame - filled dataframe that needs to be converted
    
    Returns
    -------
    submission_df: pd.DataFrame - df in submission format.
    """
    df_dict = {"Id": [], "Predicted": []}
    for idx, _ in df.iterrows():
        df_dict["Id"].append(f"{idx}_classification")
        df_dict["Predicted"].append(_rle_encode(np.array(df.loc[idx, labels])))
        df_dict["Id"].append(f"{idx}_segmentation")
        df_dict["Predicted"].append(_rle_encode(np.array([df.loc[idx, "seg"] == j + 1 for j in range(len(labels))])))
    
    submission_df = pd.DataFrame(data=df_dict, dtype=str).set_index("Id")
    submission_df.to_csv("submission.csv")
    return submission_df

# 2. Image classification
The goal here is simple: implement a classification CNN and train it to recognise all 20 classes (and/or background) using the training set and compete on the test set (by filling in the classification columns in the test dataframe).
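
Since an image can contain several classes at once, one common way to fill the classification columns is to turn per-class scores into a multi-hot vector. A minimal sketch, assuming the network outputs one logit per class and that a 0.5 threshold is acceptable; both `logits_to_multihot` and the threshold are illustrative assumptions, not prescribed by the assignment:

```python
import numpy as np

def logits_to_multihot(logits, threshold=0.5):
    # Independent sigmoid per class: classes are not mutually exclusive
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    # 1 where the class is predicted present, 0 otherwise
    return (probs >= threshold).astype(int)

# Toy batch: 2 images, 4 classes (the assignment itself has 20)
toy_logits = np.array([[2.0, -1.0, 0.3, -3.0],
                       [-0.5, 4.0, -2.0, 1.5]])
multihot = logits_to_multihot(toy_logits)
print(multihot)
```

The threshold is worth tuning on a validation set, since a single value rarely suits both frequent and rare classes.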

In [None]:
class RandomClassificationModel:
    """
    Random classification model: 
        - generates random labels for the inputs based on the class distribution observed during training
        - assumes an input can have multiple labels
    """
    def fit(self, X, y):
        """
        Adjusts the class ratio variable to the one observed in y. 

        Parameters
        ----------
        X: list of arrays - n x (height x width x 3)
        y: list of arrays - n x (nb_classes)

        Returns
        -------
        self
        """
        self.distribution = np.mean(y, axis=0)
        print("Setting class distribution to:\n{}".format("\n".join(f"{label}: {p}" for label, p in zip(labels, self.distribution))))
        return self
        
    def predict(self, X):
        """
        Predicts for each input a label.
        
        Parameters
        ----------
        X: list of arrays - n x (height x width x 3)
            
        Returns
        -------
        y_pred: list of arrays - n x (nb_classes)
        """
        np.random.seed(0)
        return [np.array([int(np.random.rand() < p) for p in self.distribution]) for _ in X]
    
    def __call__(self, X):
        return self.predict(X)
    
model = RandomClassificationModel()
model.fit(train_df["img"], train_df[labels])
test_df.loc[:, labels] = model.predict(test_df["img"])
test_df.head(1)

# 3. Semantic segmentation
The goal here is to implement a segmentation CNN that labels every pixel in the image as belonging to one of the 20 classes (and/or background). Use the training set to train your CNN and compete on the test set (by filling in the segmentation column in the test dataframe).
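
In a label map every pixel gets exactly one value, with 0 standing for background and 1 to 20 for the classes; this is how `generate_submission` reads the `seg` column. A minimal sketch, assuming a network that outputs a score map of shape `(num_classes + 1, height, width)`; the function names and the toy sizes are illustrative:

```python
import numpy as np

def scores_to_label_map(scores):
    # Pick the highest-scoring channel per pixel; channel 0 is background
    return np.argmax(scores, axis=0)

def label_map_to_class_masks(label_map, num_classes):
    # One binary mask per foreground class, mirroring `seg == j + 1`
    return np.array([label_map == j + 1 for j in range(num_classes)])

# Toy example: 2 foreground classes on a 2x2 image
toy_scores = np.array([
    [[0.9, 0.1], [0.2, 0.1]],    # background
    [[0.05, 0.8], [0.1, 0.2]],   # class 1
    [[0.05, 0.1], [0.7, 0.7]],   # class 2
])
label_map = scores_to_label_map(toy_scores)
class_masks = label_map_to_class_masks(label_map, num_classes=2)
print(label_map)
print(class_masks.astype(int))
```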

In [None]:
class RandomSegmentationModel:
    """
    Random segmentation model: 
        - generates random label maps for the inputs based on the class distributions observed during training
        - every pixel in an input can only have one label
    """
    def fit(self, X, Y):
        """
        Adjusts the class ratio variable to the one observed in Y. 

        Parameters
        ----------
        X: list of arrays - n x (height x width x 3)
        Y: list of arrays - n x (height x width)

        Returns
        -------
        self
        """
        self.distribution = np.mean([[np.sum(Y_ == i) / Y_.size for i in range(len(labels) + 1)] for Y_ in Y], axis=0)
        print("Setting class distribution to:\nbackground: {}\n{}".format(self.distribution[0], "\n".join(f"{label}: {p}" for label, p in zip(labels, self.distribution[1:]))))
        return self
        
    def predict(self, X):
        """
        Predicts for each input a label map.
        
        Parameters
        ----------
        X: list of arrays - n x (height x width x 3)
            
        Returns
        -------
        Y_pred: list of arrays - n x (height x width)
        """
        np.random.seed(0)
        return [np.random.choice(np.arange(len(labels) + 1), size=X_.shape[:2], p=self.distribution) for X_ in X]
    
    def __call__(self, X):
        return self.predict(X)
    
model = RandomSegmentationModel()
model.fit(train_df["img"], train_df["seg"])
test_df.loc[:, "seg"] = model.predict(test_df["img"])
test_df.head(1)

## Submit to competition
You don't need to edit this section. Just use it at the right position in the notebook. See the definition of this function in Sect. 1.3 for more details.

In [None]:
generate_submission(test_df)

# 4. Adversarial attack
For this part, your goal is to fool your classification and/or segmentation CNN, using an *adversarial attack*. More specifically, the goal is to build a CNN that perturbs test images in such a way that (i) they look unperturbed to humans; but (ii) the CNN classifies/segments these images in line with the perturbations.
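
One simple, well-known way to produce such perturbations is the fast gradient sign method (FGSM), which is a gradient-based attack rather than the perturbation-generating CNN mentioned above. The sketch below applies it to a toy logistic-regression classifier in NumPy so that it stays self-contained. The weights, the input and `epsilon` are made-up values for illustration, and with a real CNN the input gradient would come from backpropagation instead of the closed form used here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_logistic(x, y, w, b, epsilon):
    # For p = sigmoid(w.x + b), the gradient of the binary cross-entropy
    # loss with respect to the input x is (p - y) * w
    p = sigmoid(np.dot(w, x) + b)
    grad_x = (p - y) * w
    # Step of size epsilon per feature in the direction that increases the loss
    return x + epsilon * np.sign(grad_x)

# Hypothetical toy model and input (not taken from the notebook)
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([0.5, -0.5, 0.2])
y = 1.0  # true label

x_adv = fgsm_logistic(x, y, w, b, epsilon=0.8)
p_clean = sigmoid(np.dot(w, x) + b)
p_adv = sigmoid(np.dot(w, x_adv) + b)
print(p_clean, p_adv)
```

In this toy case the predicted probability of the true class drops below 0.5 after the attack. The `epsilon` used here is far too large to be imperceptible on real images, where much smaller steps are typical.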

In [None]:
!pip install efficientnet_pytorch
!pip install torchsummary
from efficientnet_pytorch import EfficientNet
from torchvision import transforms
from PIL import Image
from tqdm import tqdm
model = EfficientNet.from_pretrained('efficientnet-b4', num_classes=20)

In [None]:
def get_training_augmentation():
    train_transform = [

        albu.HorizontalFlip(p=0.5),

        albu.ShiftScaleRotate(scale_limit=(-0.5,1), rotate_limit=0, shift_limit=0.1, p=0.2, border_mode=0), #1

        #CopyPaste(blend=True, sigma=1, pct_objects_paste=0.8, max_paste_objects=1, p=1.), #pct_objects_paste is a guess
        # it's a dual transform so doesn't work

        albu.PadIfNeeded(min_height=320, min_width=320, always_apply=True, border_mode=0),
        albu.RandomCrop(height=320, width=320, always_apply=True),


        albu.IAAAdditiveGaussianNoise(p=0.1), #0.2
        albu.IAAPerspective(p=0.1), #0.5

        albu.OneOf(
            [
                albu.CLAHE(p=1),
                albu.RandomBrightness(p=1),
                albu.RandomGamma(p=1),
            ],
            p=0.2, #0.9
        ),

        albu.OneOf(
            [
                albu.IAASharpen(p=1),
                albu.Blur(blur_limit=3, p=1),
                albu.MotionBlur(blur_limit=3, p=1),
            ],
            p=0.2, #0.9
        ),

        
    ]
    return albu.Compose(train_transform)
'''

        albu.PadIfNeeded(min_height=320, min_width=320, always_apply=True, border_mode=0),
        albu.RandomCrop(height=320, width=320, always_apply=True),


        albu.IAAAdditiveGaussianNoise(p=0.2),
        albu.IAAPerspective(p=0.5),

        albu.OneOf(
            [
                albu.CLAHE(p=1),
                albu.RandomBrightness(p=1),
                albu.RandomGamma(p=1),
            ],
            p=0.9,
        ),

        albu.OneOf(
            [
                albu.IAASharpen(p=1),
                albu.Blur(blur_limit=3, p=1),
                albu.MotionBlur(blur_limit=3, p=1),
            ],
            p=0.9,
        ),

        albu.OneOf(
            [
                albu.RandomContrast(p=1),
                albu.HueSaturationValue(p=1),
            ],
            p=0.9,
        ),'''

def get_validation_augmentation():
    """Add paddings to make image shape divisible by 32"""
    test_transform = [
        albu.PadIfNeeded(384, 480)
    ]
    return albu.Compose(test_transform)


def to_tensor(x, **kwargs):
    return x.transpose(2, 0, 1).astype('float32')


def get_preprocessing(preprocessing_fn):
    """Construct preprocessing transform
    
    Args:
        preprocessing_fn (callable): data normalization function
            (can be specific for each pretrained neural network)
    Return:
        transform: albumentations.Compose
    
    """
    
    _transform = [
        albu.Lambda(image=preprocessing_fn),
        albu.Lambda(image=to_tensor, mask=to_tensor),
    ]
    return albu.Compose(_transform)

In [None]:
IMG_WIDTH=380
IMG_HEIGHT=380

class ClassificationDataset(BaseDataset):
    """Classification dataset. Loads images, applies augmentation and preprocessing transformations.
    
    Args:
        images_dir_path (str): path to the folder with the .npy images
        targets_path (str): path to the csv file with the targets
        index_list (list): indices of the examples to load (from the train/validation split)
        img_dims (tuple): (width, height) every image is resized to when loaded
        preprocessing (bool): whether to resize, convert to tensor and normalize 
            (ImageNet mean and std)
        augmentation (albumentations.Compose): data transformation pipeline 
            (e.g. flip, scale, etc.), or False to disable
    
    """
        
    def __init__(
                self, 
                images_dir_path=None,
                targets_path='/kaggle/input/kul-h02a5a-computervision-groupassignment1/train/train_set.csv',
                index_list=None,
                img_dims=(IMG_WIDTH,IMG_HEIGHT),
                preprocessing=True,
                augmentation=False
                
    ):
        # list of numpy tensors
        self.images=[]
        # load targets based on indices (from train/validation split) into a numpy array
        self.targets=np.take(pd.read_csv(targets_path,index_col="Id").to_numpy(),index_list, axis=0)
        
        # the directory name tells us which split (and thus which file prefix) is loaded
        prefix = "train" if "train" in images_dir_path else "test"
        # counter starts at 1 so the progress message reports the number of images loaded so far
        for counter, idx in enumerate(index_list, start=1):
            self.images.append(cv2.resize(
                  np.load('{}/{}_{}.npy'.format(images_dir_path, prefix, idx)),
                  img_dims, interpolation = cv2.INTER_AREA))
            if (counter%100==0) or (counter==len(index_list)):
                print(counter,"/",len(index_list),"loaded.")
        #self.images=[np.load(images_dir+'/train_{}.npy'.format(idx)) for idx in index_list]
        #self.masks_raw=[np.load(masks_dir+'/train_{}.npy'.format(idx)) for idx in index_list]
                
        self.augmentation = augmentation
        self.preprocessing = preprocessing
    
    def get_original_image(self,i):
        return self.images[i]


    def __getitem__(self, i): 
        # self.images is a list of numpy tensors
        image = self.images[i]
        # self.targets is a numpy array
        target=self.targets[i,:]
        
        # apply augmentations
        if self.augmentation:
            image = self.augmentation(image=image)['image']
        # apply preprocessing
        if self.preprocessing:
            # Preprocess image
            tfms = transforms.Compose([transforms.Resize(IMG_WIDTH), transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),])

            # no unsqueeze(0) here: the DataLoader adds the batch dimension itself
            image = tfms(Image.fromarray(image))
        return {"image": image,"targets": target}
        
    def __len__(self):
        return len(self.images)

In [None]:
def train(data_loader, model, optimizer, loss_fun, device):
    
    model.train()
    
    for data in tqdm(data_loader, position=0, leave=True, desc='Training'):
        inputs = data["image"]
        targets = data['targets']
        
        inputs = inputs.to(device, dtype=torch.float)
        targets = targets.to(device, dtype=torch.float)
        
        optimizer.zero_grad()
        outputs = model(inputs)
        # evaluate the loss on the predictions; the loss module itself has no backward()
        loss = loss_fun(outputs, targets)
        loss.backward()
        optimizer.step()

In [None]:
indexes=np.array((range(len(train_df))))

X_train, X_valid= train_test_split(indexes, test_size=0.1, random_state=42)
# if it is desired to test on labeled data, this data is taken from the validation set
if do_test_split: 
    X_valid, X_test = train_test_split(X_valid, test_size=0.33, random_state=42)
else:
    X_test =np.array((range(len(test_df))))

print(X_train.shape)
print(X_valid.shape)
print(X_test.shape)

In [None]:
train_dataset = ClassificationDataset(
    x_train_dir, 
    '/kaggle/input/kul-h02a5a-computervision-groupassignment1/train/train_set.csv',
    index_list=X_train.tolist(),
    preprocessing=True,
    augmentation=get_training_augmentation(),    
)

print("train set: ",len(train_dataset))
#for i in range(len(train_dataset)):
  #im,m = train_dataset[i]
  #print(m)

valid_dataset = ClassificationDataset(
    x_train_dir, 
    '/kaggle/input/kul-h02a5a-computervision-groupassignment1/train/train_set.csv',
    index_list=X_valid.tolist(),
    preprocessing=True,
    augmentation=False 
)

print("val set: ",len(valid_dataset))

In [None]:
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True, num_workers=2, pin_memory=True)
valid_loader = DataLoader(valid_dataset, batch_size=1, shuffle=False, num_workers=2, pin_memory=True)

In [None]:
optimizer = torch.optim.Adam([ 
    dict(params=model.parameters(), lr=1e-4),
])
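# the targets are multi-hot (an image can contain several classes), so use a per-class sigmoid + binary cross-entropy
loss_fun=torch.nn.BCEWithLogitsLoss()
# fall back to the CPU when no GPU is available
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'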

model.to(DEVICE)

for epoch in range(10):
    train(train_loader, model, optimizer, loss_fun, device=DEVICE)

# 5. Discussion
Finally, take some time to reflect on what you have learned during this assignment. Reflect and produce an overall discussion with links to the lectures and "real world" computer vision.