# DATA20001 Deep Learning - Group Project
## Image project

**Due Thursday, May 20, before 23:59.**

The task is to learn to assign the correct labels to a set of images.  The images are originally from a photo-sharing site and were released under Creative Commons licenses that allow sharing.  The training set contains 20 000 images. We have resized and cropped them to 128x128 to make the task a bit more manageable.

We're only giving you the code for downloading the data. The rest you'll have to do yourselves.

Some comments and hints particular to the image project:

- One image may belong to many classes in this problem, i.e., it's a multi-label classification problem. In fact there are images that don't belong to any of our classes, and you should also be able to handle these correctly. Pay careful attention to how you design the outputs of the network (e.g., what activation to use) and what loss function should be used.

- As the dataset is pretty imbalanced, don't focus too strictly on the outputs being probabilistic. (Meaning that the right threshold for selecting the label might not be 0.5.)

- Image files can be loaded as numpy matrices, for example using `imread` from `matplotlib.pyplot`. Most images are color, but a few are grayscale. You need to handle the grayscale ones somehow, as they have a different number of color channels (depth) than the color ones.

- In the exercises we used e.g., `torchvision.datasets.MNIST` to handle the loading of the data in suitable batches. Here, you need to handle the dataloading yourself.  The easiest way is probably to create a custom `Dataset`. [See for example here for a tutorial](https://github.com/utkuozbulak/pytorch-custom-dataset-examples).
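
As a rough illustration of the first two hints, the sketch below pairs one independent logit per class with `torch.nn.BCEWithLogitsLoss` and applies a per-class threshold at prediction time. The batch size, the random logits, and the threshold values are made up for illustration; this is one common setup for multi-label problems, not the required solution.

```python
import torch
from torch import nn

torch.manual_seed(0)

num_classes = 14          # the fourteen labels of this project
batch_size = 4            # arbitrary, for illustration only

# Stand-in for the raw network outputs (logits), one per class and image.
logits = torch.randn(batch_size, num_classes)

# Multi-hot targets: several entries of a row may be 1, or none at all.
targets = torch.zeros(batch_size, num_classes)
targets[0, [1, 3]] = 1.0  # image 0: bird and clouds
targets[1, 9] = 1.0       # image 1: people
# images 2 and 3 belong to no class

# BCEWithLogitsLoss applies the sigmoid internally, so it is fed logits.
criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, targets)

# At prediction time, apply the sigmoid and compare against a threshold.
# These thresholds are hypothetical; they would be tuned on validation data.
thresholds = torch.full((num_classes,), 0.5)
thresholds[0] = 0.3       # e.g. a lower threshold for a rare class
probabilities = torch.sigmoid(logits)
predictions = (probabilities > thresholds).int()
```

If the network already ends in a sigmoid layer, `nn.BCELoss` is the matching loss, since it expects probabilities rather than logits.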

In [2]:
# automatically reload dependencies and repository content so that kernel need not be restarted
%load_ext autoreload
%autoreload 2

## Get the data

In [None]:
import sys
import os
from os.path import join
from os.path import abspath
from os.path import split

import torch
import torchvision
from torchvision.datasets.utils import download_url
import zipfile

root_dir = os.getcwd()
if root_dir not in sys.path:
    sys.path.append(root_dir)
    
train_path = 'train'

data_folder_name = 'image-training-corpus+annotations'
DATA_FOLDER_DIR = os.path.abspath(os.path.join(root_dir, data_folder_name))

data_zip_name = 'dl2018-image-proj.zip'
DATA_ZIP_DIR = os.path.abspath(os.path.join(DATA_FOLDER_DIR, data_zip_name))

with zipfile.ZipFile(DATA_ZIP_DIR) as zip_f:
    zip_f.extractall(train_path)

The above code extracted the data files from the local zip archive into the `train` subdirectory.

The images can be found in `train/images`, and are named as `im1.jpg`, `im2.jpg` and so on until `im20000.jpg`.

The class labels, or annotations, can be found in `train/annotations` as `CLASSNAME.txt`, where CLASSNAME is one of the fourteen classes: *baby, bird, car, clouds, dog, female, flower, male, night, people, portrait, river, sea,* and *tree*.

Each annotation file is a simple text file that lists the images that depict that class, one per line. The images are listed with their number, not the full filename. For example `5969` refers to the image `im5969.jpg`.
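
The annotation format described above can be turned into a multi-hot label matrix with a few lines of NumPy. The following is only a sketch: `load_labels` and `to_three_channels` are hypothetical helper names, and the demo writes its own tiny annotation files into a temporary directory so that it runs without the real data.

```python
import os
import tempfile
import numpy as np

CLASSES = ["baby", "bird", "car", "clouds", "dog", "female", "flower",
           "male", "night", "people", "portrait", "river", "sea", "tree"]

def load_labels(annotation_dir, num_images, classes=CLASSES):
    """Return a (num_images, len(classes)) 0/1 matrix; row i is image i+1."""
    labels = np.zeros((num_images, len(classes)), dtype=np.int64)
    for column, name in enumerate(classes):
        path = os.path.join(annotation_dir, name + ".txt")
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line:  # skip empty lines
                    labels[int(line) - 1, column] = 1
    return labels

def to_three_channels(image):
    """Stack a grayscale (H, W) array into (H, W, 3); leave color images alone."""
    if image.ndim == 2:
        return np.stack([image] * 3, axis=-1)
    return image

# Tiny self-made example: image 2 is a bird, image 5 is a bird with clouds.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in CLASSES:
        with open(os.path.join(demo_dir, name + ".txt"), "w") as f:
            if name == "bird":
                f.write("2\n5\n")
            elif name == "clouds":
                f.write("5\n")
    demo_labels = load_labels(demo_dir, num_images=5)

gray = np.zeros((128, 128))
rgb = to_three_channels(gray)
```

Images that appear in no annotation file simply keep an all-zero row, which covers the "no class" case mentioned in the hints.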

## Your stuff goes here ...

In [3]:
import torch


device = None
if torch.cuda.is_available():
    print("Using GPU")
    device = torch.device("cuda:0")
else:
    print("Using CPU")
    device = torch.device("cpu")

Using GPU


In [4]:
from image_dataset import ImageDataset
from data_augmentation import DataAugmentation

dataAugmentation = DataAugmentation()
dataset = ImageDataset(dataAugmentation = dataAugmentation)


In [5]:
trainIds = range(1,19001)

In [5]:
from data_balancer import DataBalancer

dataBalancer = DataBalancer()

trainIds = dataBalancer.balanceData(trainIds, 20000-19000, trainIds)

In [6]:
trainDataset = ImageDataset(trainIds, dataAugmentation=dataAugmentation)
valDataset = ImageDataset(range(19001,20001))

In [18]:
from torch import nn
import math
import torch.nn.functional as F

torch.cuda.empty_cache()

class ReseptionNet(nn.Module):
    def __init__(self, config):
        super(ReseptionNet, self).__init__()
        
        channels = config["inChannels"]
        
        self.inceptions = nn.ModuleList([])
        dimensions = config["inDimensions"]
        for i, inception in enumerate(config["inceptions"]):
            for j in range(inception["amount"]):
                inceptionLayer = Inception(channels, dimensions, inception["config"])
                channels = inceptionLayer.outChannels
                dimensions = inceptionLayer.outDimensions
                self.inceptions.append(inceptionLayer)
                print("inception {} iteration {} layer output dimensions {} * {} * {} = {}".format(i+1, j+1, channels, dimensions[0], dimensions[1], channels*dimensions[0]*dimensions[1]))

                
        self.flatten = Flatten()
        self.linear = nn.Linear(channels*dimensions[0]*dimensions[1], config["outputs"])
        self.sigmoid = nn.Sigmoid()


    def forward(self, x):
        output = x
        for inception in self.inceptions:
            output = inception(output)
            
        output = self.flatten(output)
        output = self.linear(output)
        output = self.sigmoid(output)
        
        return output

    
class Inception(nn.Module):
    
    inceptionConfig = None
    outChannels = None
    outDimensions = None

    def __init__(self, inChannels, inDimensions, inceptionConfig):
        super(Inception, self).__init__()
        self.inceptionConfig = inceptionConfig
        
        self.outChannels = 0
        self.branches = nn.ModuleList([])
        branchDimensions = [
            self.updateDimensions(
                inDimensions,
                self.inceptionConfig["shortcut"]["padding"],
                self.inceptionConfig["shortcut"]["dilation"],
                self.inceptionConfig["shortcut"]["kernelSize"],
                self.inceptionConfig["shortcut"]["stride"]
            )
        ]
        for branch in self.inceptionConfig["branches"]:
            blocks = nn.ModuleList([])
            channels = inChannels
            dimensions = inDimensions
            for block in branch["blocks"]:
                convolution = block["convolution"]
                blocks.append(convolution(
                    channels,
                    math.ceil(channels*block["outputChannelMultiplier"]),
                    kernel_size = block["kernelSize"],
                    padding=block["padding"],
                    stride=block["stride"],
                    dilation=block["dilation"]
                ))
                channels = math.ceil(channels*block["outputChannelMultiplier"])
                dimensions = self.updateDimensions(dimensions, block["padding"], block["dilation"], block["kernelSize"], block["stride"])
            self.outChannels += channels
            self.branches.append(blocks)
            branchDimensions.append(dimensions)
            
        for dimensions in branchDimensions:
            if dimensions != branchDimensions[0]:
                print(branchDimensions)
                raise Exception("Dimensions must stay the same between all branches and shortcut in inceptions")
        
        self.outDimensions = branchDimensions[0]
            
        self.shortcut = self.inceptionConfig["shortcut"]["convolution"](
            inChannels,
            self.outChannels,
            kernel_size = self.inceptionConfig["shortcut"]["kernelSize"],
            padding = self.inceptionConfig["shortcut"]["padding"],
            stride = self.inceptionConfig["shortcut"]["stride"],
            dilation = self.inceptionConfig["shortcut"]["dilation"]
        )

    def forward(self, x):
        outputs = []
        for branch in self.branches:
            output = x
            for block in branch:
                output = block(output)
            outputs.append(output)
        
        output = torch.cat(outputs, 1)
        shortcut = self.shortcut(x)
        output = output + shortcut
        output = F.relu(output)

        return output
    
    def updateDimensions(self, dimensions, padding, dilation, kernelSize, stride):
        def dimensionalize(value):
            if type(value) is tuple:
                return value
            else:
                return (value, value)
        padding = dimensionalize(padding)
        dilation = dimensionalize(dilation)
        kernelSize = dimensionalize(kernelSize)
        stride = dimensionalize(stride)
        
        newHeight = (dimensions[0] + 2*padding[0] - dilation[0]*(kernelSize[0]-1)-1)//(stride[0])+1
        newWidth = (dimensions[1] + 2*padding[1] - dilation[1]*(kernelSize[1]-1)-1)//(stride[1])+1
        return (newHeight, newWidth)

class BasicConv2d(nn.Module):

    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, bias=False, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels, eps=0.001)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        return F.relu(x, inplace=True)
    
class AveragePooling(nn.Module):
    
    def __init__(self, in_channels, out_channels, **kwarks):
        super(AveragePooling, self).__init__()
        del kwarks["dilation"]
        self.kwarks = kwarks
        
    def forward(self, x):
        return F.avg_pool2d(x, **self.kwarks)
    
class MaxPooling(nn.Module):
    
    def __init__(self, in_channels, out_channels, **kwarks):
        super(MaxPooling, self).__init__()
        del kwarks["dilation"]
        self.kwarks = kwarks
        
    def forward(self, x):
        return F.max_pool2d(x, **self.kwarks)
    
    
class Flatten(nn.Module):
    
    def forward(self, x):
        N, C, H, W = x.size() # read in N, C, H, W
        return x.view(N, -1)
    
    
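class IdentityConv2d(nn.Module):
    
    def __init__(self, in_channels, out_channels, **kwargs):
        super(IdentityConv2d, self).__init__()
        def dimensionalize(value):
            if type(value) is tuple:
                return value
            else:
                return (value, value)
        # F.conv2d has no kernel_size argument; the size comes from the weight shape
        kernelSize = dimensionalize(kwargs.pop("kernel_size"))
        self.kwargs = kwargs
        # fixed (non-trainable) kernel with a single 1 at its center
        weights = torch.zeros(kernelSize)
        weights[kernelSize[0]//2, kernelSize[1]//2] = 1
        # F.conv2d expects weights shaped (out, in, kH, kW): one output channel that
        # sums the input channels and broadcasts when added to the branch outputs
        weights = weights.view(1, 1, kernelSize[0], kernelSize[1]).repeat(1, in_channels, 1, 1)
        # a buffer moves together with the module in model.to(device)
        self.register_buffer("weights", weights)
        
    def forward(self, x):
        return F.conv2d(x, self.weights, **self.kwargs)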
    

channelExploder = {
    "branches": [
        {
            "blocks": [
                {   
                    "convolution": BasicConv2d,
                    "outputChannelMultiplier": 5,
                    "kernelSize": 1,
                    "padding": 0,
                    "stride": 1,
                    "dilation": 1
                },
                {   
                    "convolution": MaxPooling,
                    "outputChannelMultiplier": 1,
                    "kernelSize": 2,
                    "padding": 0,
                    "stride": 2,
                    "dilation": 1
                }
            ]
        },
        {
            "blocks": [
                {
                    "convolution": BasicConv2d,
                    "outputChannelMultiplier": 5,
                    "kernelSize": 3,
                    "padding": 1,
                    "stride": 1,
                    "dilation": 1
                },
                {   
                    "convolution": MaxPooling,
                    "outputChannelMultiplier": 1,
                    "kernelSize": 2,
                    "padding": 0,
                    "stride": 2,
                    "dilation": 1
                }
            ]
        },
        {
            "blocks": [
                {
                    "convolution": BasicConv2d,
                    "outputChannelMultiplier": 5,
                    "kernelSize": 5,
                    "padding": 2,
                    "stride": 1,
                    "dilation": 1
                },
                {   
                    "convolution": MaxPooling,
                    "outputChannelMultiplier": 1,
                    "kernelSize": 2,
                    "padding": 0,
                    "stride": 2,
                    "dilation": 1
                }
            ]
        },
        {
            "blocks": [
                {
                    "convolution": BasicConv2d,
                    "outputChannelMultiplier": 5,
                    "kernelSize": 7,
                    "padding": 3,
                    "stride": 1,
                    "dilation": 1
                },
                {   
                    "convolution": MaxPooling,
                    "outputChannelMultiplier": 1,
                    "kernelSize": 2,
                    "padding": 0,
                    "stride": 2,
                    "dilation": 1
                }
            ]
        }
        
    ],
    "shortcut": {
        "convolution": IdentityConv2d,
        "kernelSize": 1,
        "padding": 0,
        "stride": 2,
        "dilation": 1
    }
}

basicInceptionConfig = {
    "branches": [
        {
            "blocks": [
                {   
                    "convolution": BasicConv2d,
                    "outputChannelMultiplier": 1,
                    "kernelSize": 1,
                    "padding": 0,
                    "stride": 1,
                    "dilation": 1
                },
                {   
                    "convolution": MaxPooling,
                    "outputChannelMultiplier": 1,
                    "kernelSize": 2,
                    "padding": 0,
                    "stride": 2,
                    "dilation": 1
                }
            ]
        },
        {
            "blocks": [
                {
                    "convolution": BasicConv2d,
                    "outputChannelMultiplier": 1,
                    "kernelSize": 3,
                    "padding": 1,
                    "stride": 1,
                    "dilation": 1
                },
                {   
                    "convolution": MaxPooling,
                    "outputChannelMultiplier": 1,
                    "kernelSize": 2,
                    "padding": 0,
                    "stride": 2,
                    "dilation": 1
                }
            ]
        },
        {
            "blocks": [
                {
                    "convolution": BasicConv2d,
                    "outputChannelMultiplier": 1,
                    "kernelSize": 5,
                    "padding": 2,
                    "stride": 1,
                    "dilation": 1
                },
                {   
                    "convolution": MaxPooling,
                    "outputChannelMultiplier": 1,
                    "kernelSize": 2,
                    "padding": 0,
                    "stride": 2,
                    "dilation": 1
                }
            ]
        },
        {
            "blocks": [
                {
                    "convolution": BasicConv2d,
                    "outputChannelMultiplier": 1,
                    "kernelSize": 7,
                    "padding": 3,
                    "stride": 1,
                    "dilation": 1
                },
                {   
                    "convolution": MaxPooling,
                    "outputChannelMultiplier": 1,
                    "kernelSize": 2,
                    "padding": 0,
                    "stride": 2,
                    "dilation": 1
                }
            ]
        }
        
    ],
    "shortcut": {
        "convolution": IdentityConv2d,
        "kernelSize": 1,
        "padding": 0,
        "stride": 2,
        "dilation": 1
    }
}

channelReducerConfig = {
    "branches": [
        {
            "blocks": [
                {   
                    "convolution": BasicConv2d,
                    "outputChannelMultiplier": 0.2,
                    "kernelSize": 1,
                    "padding": 0,
                    "stride": 1,
                    "dilation": 1
                }
            ]
        }
    ],
    "shortcut": {
        "convolution": IdentityConv2d,
        "kernelSize": 1,
        "padding": 0,
        "stride": 1,
        "dilation": 1
    }
}
    
config = {
    "inceptions": [
        {
            "config": channelExploder,
            "amount": 1,
        },
        {
            "config": basicInceptionConfig,
            "amount": 1,
        },
        {
            "config": channelReducerConfig,
            "amount": 1
        },
        {
            "config": basicInceptionConfig,
            "amount": 1,
        },
        {
            "config": channelReducerConfig,
            "amount": 1
        },
        {
            "config": basicInceptionConfig,
            "amount": 1,
        },
        {
            "config": channelReducerConfig,
            "amount": 1
        },
        {
            "config": basicInceptionConfig,
            "amount": 1,
        }
    ],
    "inChannels": 3,
    "inDimensions": (224, 224),
    "outputs": 14,
}

model = ReseptionNet(config)
model = model.to(device)

inception 1 iteration 1 layer output dimensions 60 * 112 * 112 = 752640
inception 2 iteration 1 layer output dimensions 240 * 56 * 56 = 752640
inception 3 iteration 1 layer output dimensions 48 * 56 * 56 = 150528
inception 4 iteration 1 layer output dimensions 192 * 28 * 28 = 150528
inception 5 iteration 1 layer output dimensions 39 * 28 * 28 = 30576
inception 6 iteration 1 layer output dimensions 156 * 14 * 14 = 30576
inception 7 iteration 1 layer output dimensions 32 * 14 * 14 = 6272
inception 8 iteration 1 layer output dimensions 128 * 7 * 7 = 6272
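
The spatial sizes printed above follow from the convolution output-size formula that `updateDimensions` implements. The standalone function below restates that formula for one dimension (`conv_output_size` is a hypothetical name) and works through the first inception: a 7x7 convolution with padding 3 keeps the size, the 2x2 max pooling with stride 2 halves it, and the 1x1 shortcut with stride 2 ends up at the same size.

```python
def conv_output_size(size, kernel_size, padding=0, stride=1, dilation=1):
    """Output length of a convolution or pooling along one dimension."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# 7x7 convolution, padding 3, stride 1: (224 + 6 - 6 - 1) // 1 + 1 = 224
same = conv_output_size(224, kernel_size=7, padding=3)

# 2x2 max pooling, stride 2: (224 - 1 - 1) // 2 + 1 = 112
pooled = conv_output_size(same, kernel_size=2, stride=2)

# 1x1 shortcut, stride 2: (224 - 1) // 2 + 1 = 112
shortcut = conv_output_size(224, kernel_size=1, stride=2)

print(same, pooled, shortcut)  # 224 112 112
```

Because every branch and the shortcut end at 112, the consistency check in `Inception.__init__` passes for this configuration.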


In [19]:
from train_model import train_model

model = train_model(
    model,
    trainDataset,
    valDataset,
    device,
    numberOfEpochs = 1
)

Epoch 1/1
----------
Training...

Progress: 0%
Progress: 10%
Progress: 20%
Progress: 30%
Progress: 40%
Progress: 50%
Progress: 60%
Progress: 70%
Progress: 80%
Progress: 90%
Progress: 100%

train Loss: 0.2833 Acc: 0.9251
Validating...

Progress: 0%
Progress: 10%
Progress: 20%
Progress: 30%
Progress: 40%
Progress: 50%
Progress: 60%
Progress: 70%
Progress: 80%
Progress: 90%
Progress: 100%

val Loss: 0.2280 Acc: 0.9287

Training complete in 34m 45s
Best val Acc: 0.928714


## Save your model

It might be useful to save your model if you want to continue your work later, or use it for inference later.

In [20]:
torch.save(model.state_dict(), 'deep_model.pkl')
torch.cuda.empty_cache()

The model file should now be visible in the "Home" screen of the jupyter notebooks interface.  There you should be able to select it and press "download".  [See more here on how to load the model back](https://github.com/pytorch/pytorch/blob/761d6799beb3afa03657a71776412a2171ee7533/docs/source/notes/serialization.rst) if you want to continue training later.
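
When a model saved on a GPU machine is loaded on a machine without one, passing `map_location` to `torch.load` avoids device errors. The sketch below shows the round trip with a small stand-in `nn.Linear` model and a temporary file; the file name `demo_model.pkl` is made up for illustration.

```python
import os
import tempfile
import torch
from torch import nn

model = nn.Linear(4, 2)  # stand-in for the real network

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_model.pkl")
    # Save only the parameters, not the whole Python object.
    torch.save(model.state_dict(), path)

    # Re-create the architecture first, then load the parameters into it.
    restored = nn.Linear(4, 2)
    state = torch.load(path, map_location=torch.device("cpu"))
    restored.load_state_dict(state)

restored.eval()  # switch to inference mode before predicting
```

The restored model has to be built with the same architecture (here the same `nn.Linear` sizes) as the one that was saved.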

In [21]:
model.load_state_dict(torch.load("deep_model.pkl"))
valDataset = ImageDataset(range(19001,20001))  # same validation ids as above: images 19001..20000

In [22]:
import eval_model

yHats, yTrues = eval_model.test_model(model, valDataset)
for metric in ['precision', 'recall', 'f1', 'accuracy']:
    print("{}: {}".format(metric, eval_model.get_metric(yTrues, yHats, metric)))

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same
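
The `RuntimeError` above is what PyTorch raises when the input batch lives on the CPU while the model's weights live on the GPU. The code of `eval_model.test_model` is not shown here, so the following is only a sketch of an evaluation loop that moves each batch to the model's device; `predict`, the stand-in model, and the random dataset are hypothetical.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def predict(model, dataset, batch_size=32):
    """Run the model over a dataset and return (outputs, targets) on the CPU."""
    device = next(model.parameters()).device  # wherever the weights live
    model.eval()
    outputs, targets = [], []
    with torch.no_grad():
        for inputs, labels in DataLoader(dataset, batch_size=batch_size):
            inputs = inputs.to(device)         # avoids the CPU/GPU mismatch
            outputs.append(model(inputs).cpu())
            targets.append(labels)
    return torch.cat(outputs), torch.cat(targets)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
demo_model = nn.Sequential(
    nn.Flatten(), nn.Linear(3 * 8 * 8, 14), nn.Sigmoid()
).to(device)
demo_data = TensorDataset(
    torch.rand(10, 3, 8, 8),                  # ten random "images"
    torch.randint(0, 2, (10, 14)).float(),    # random multi-hot labels
)
y_hat, y_true = predict(demo_model, demo_data, batch_size=4)
```

Moving the outputs back to the CPU keeps the later metric computation independent of where the model ran.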

## Download test set

The test set will be made available during the last week before the deadline and can be downloaded in the same way as the training set.

## Predict for test set

You should return your predictions for the test set in a plain text file.  The text file contains one row for each test set image.  Each row contains a binary prediction for each label (separated by a single space), 1 if it's present in the image, and 0 if not. The order of the labels is as follows (alphabetic order of the label names):

    baby bird car clouds dog female flower male night people portrait river sea tree

An example row could look like this if your system predicts the presence of a bird and clouds:

    0 1 0 1 0 0 0 0 0 0 0 0 0 0
    
The order of the rows should be according to the numeric order of the image numbers.  In the test set, this means that the first row refers to image `im20001.jpg`, the second to `im20002.jpg`, and so on.

If you have the prediction output matrix prepared in `y` you can use the following function to save it to a text file.

In [None]:
import numpy as np

np.savetxt('results.txt', y, fmt='%d')
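
As a quick check of the format, the sketch below turns a made-up matrix of predicted probabilities into binary rows and renders them as text in memory instead of writing a file. The probabilities and the 0.5 threshold are placeholders, not values from the project.

```python
import io
import numpy as np

# Two hypothetical test images with fourteen class probabilities each.
probabilities = np.zeros((2, 14))
probabilities[0, [1, 3]] = [0.9, 0.7]   # bird and clouds
probabilities[1, 9] = 0.8               # people

# Threshold to 0/1 predictions; 0.5 is only a placeholder.
y = (probabilities > 0.5).astype(int)

# Write to an in-memory buffer to inspect the rows without creating a file.
buffer = io.StringIO()
np.savetxt(buffer, y, fmt='%d')
rows = buffer.getvalue().splitlines()
print(rows[0])  # 0 1 0 1 0 0 0 0 0 0 0 0 0 0
```

Each row contains fourteen space-separated digits, and the number of rows equals the number of test images.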