# Hybrid Networks with Model-Based Operators and Deep Learning

When training neural networks, it is important that the training data follows a distribution similar to that of the test data. In this exercise we violate this assumption intentionally: the training and test data come from the same distribution, except that the illumination differs. We use this setup to examine how well the neural network generalizes. In addition, we incorporate model-based operators into the system: they first transform the data to an illumination-invariant space, and the result is then passed through the neural architecture. The aim is that the output of the transformation (the input of the neural network) follows the same distribution for both illuminations.

If training neural networks in PyTorch is new to you, this tutorial may be helpful:
https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html

In [None]:
# import dependencies
import os  
import math

import torch
import torch.nn as nn
import torch.nn.functional as F

import torchvision.transforms as transforms
import torchvision

import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [10, 10]

if torch.cuda.is_available():
    device = torch.device("cuda:0")
    is_cuda = True
else:
    device = torch.device("cpu")
    is_cuda = False
    
print(device, is_cuda)

# Dataset

The image dataset shows a simulated urban environment. It consists of one large image set, already split into a training and a validation set, in which all images share the same illumination, and a second, smaller image set with a different illumination.

(When training and evaluating machine learning models, the dataset is typically split into three sets.
The training set is used to fit the parameters of the model. The validation set is used to compare multiple hyperparameter settings (model architecture, learning rate, ...). The test set is used only for the final evaluation of the model, when nothing is adjusted anymore, to obtain an unbiased estimate of its performance.)
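
As a small illustration of such a three-way split, the following sketch divides a toy dataset into training, validation and test parts. The toy data, the 70/15/15 proportions and the seed are assumptions chosen for this example only; they are not part of the exercise data.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy dataset with 100 samples; the sample index serves as the "data".
toy_dataset = TensorDataset(torch.arange(100))

# Fixed seed so that the split is reproducible.
generator = torch.Generator().manual_seed(0)
train_part, val_part, test_part = random_split(
    toy_dataset, [70, 15, 15], generator=generator
)

print(len(train_part), len(val_part), len(test_part))
```

Each sample ends up in exactly one of the three parts, so the test part stays untouched while the model and its hyperparameters are chosen.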

Here we also have three sets, but with a different meaning. We do not want to tune hyperparameters but to illustrate how well a model generalizes across illumination scenarios, so we omit a test set.

In [None]:
#Load Dataset
path = #TODO Specify path of 'urban_driving_cl' folder

# TODO: load images as Datasets
#  use torchvision.datasets.ImageFolder()
#  specify a transform to convert the loaded images to Tensors (e.g. transforms.ToTensor())
train_set = # TODO: Create dataset with images from os.path.join(path,"driving_set1_cropped_train")
val_set1 = # TODO: Create dataset with images from os.path.join(path,"driving_set1_cropped_val")
val_set2 = # TODO: Create dataset with images from os.path.join(path,"driving_set2_cropped")


# TODO: Create the dataloaders that sample from corresponding dataset
#  use torch.utils.data.DataLoader()
#  set the batch size and shuffle (shuffle=True for the train set, and also for
#  the others if you want to see a good mix of all classes)
batch_size = 32
dataloader_train = # TODO
dataloader_val1 =  # TODO
dataloader_val2 =  # TODO


# visualize one batch from each set
for dl in [dataloader_train, dataloader_val1, dataloader_val2]:
    # arrange the images of the first batch in a grid
    grid_img = torchvision.utils.make_grid(next(iter(dl))[0], nrow=8)
    
    plt.imshow(grid_img.permute(1, 2, 0))
    plt.show()

# Models

### CNN
First we build our Convolutional Neural Network model.

**You can define your own model** or use the one described below. The input is expected to have a width and height of 64x64, while the number of input channels should be a parameter (as the transforms produce different numbers of output channels).

#### Possible Model Architecture:
* 3 Convolution Layers (nn.Conv2d) each with 64 kernels,
    - activated with ReLU (F.relu) 
    - followed by a 2x2 MaxPool (nn.MaxPool2d)
        - 1. 5x5 kernels -> output-shape (after maxpool) (batch_size, 64, 30x30)
        - 2. 3x3 kernels -> output-shape (after maxpool) (batch_size, 64, 14x14)
        - 3. 3x3 kernels -> output-shape (after maxpool) (batch_size, 64, 6x6)
* 2 Fully Connected Layers (nn.Linear), with 100 hidden units:
    - 64x6x6 input units to 100 hidden units
    - 100 hidden units to num_classes
    - the first fc is activated with ReLU
    - the second fc must not be activated, as nn.CrossEntropyLoss applies the (log-)softmax internally
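
The output shapes listed above can be checked with a little arithmetic. The sketch below assumes convolutions with stride 1 and no padding and non-overlapping 2x2 max pooling, which is consistent with the listed shapes; the helper names `conv_out` and `pool_out` are made up for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2):
    """Spatial output size of a non-overlapping max pooling along one dimension."""
    return size // kernel


size = 64          # input width and height
sizes = []         # spatial size after each conv + maxpool stage
for kernel in (5, 3, 3):
    size = pool_out(conv_out(size, kernel))
    sizes.append(size)

channels = 64      # number of kernels in the last convolution layer
flat_features = channels * size * size  # input units of the first fc layer

print(sizes, flat_features)
```

The number of flattened features is what the first fully connected layer has to accept as input.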


In [None]:
#DEFINE THE CNN
class Net(nn.Module):
    def __init__(self,in_c = 3, num_classes = 5):
        super().__init__()
        #expected input: b,c,64, 64
        #TODO
  
    def forward(self, x):
        #TODO
        return x

net = Net().to(device)
print(net)
#check if runs and has the correct output:
print(net(torch.zeros((16,3,64,64)).to(device)).shape, "== [16,5]?")

## Train Loop
Create one function that trains a model and one that evaluates its performance.
#### Training:
* Loop over the dataset
* Feed the inputs to the network
* Calculate the Loss with a criterion
* Run optimizer on parameters

#### Evaluation:
Calculate the accuracy of the model as the fraction of correct predictions, where a prediction is correct if the true class of the image receives the highest output score.
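
A minimal sketch of this accuracy computation on made-up scores (4 samples, 3 classes) is shown here. It also illustrates that the predicted class is the same whether it is taken from the raw scores or from the softmax probabilities, since softmax preserves the ordering of the scores.

```python
import torch
import torch.nn.functional as F

# Toy network outputs (raw scores) for 4 samples and 3 classes; values are made up.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0],
                       [1.5, 1.0, 0.0],
                       [-0.5, 2.5, 0.3]])
targets = torch.tensor([0, 2, 1, 1])

# The predicted class is the index of the highest score in each row.
predicted = logits.argmax(dim=1)

# Softmax is monotonic, so it does not change which class has the highest score.
predicted_from_probs = F.softmax(logits, dim=1).argmax(dim=1)

# Accuracy: fraction of samples whose prediction matches the target.
accuracy = (predicted == targets).sum().item() / targets.size(0)
print(predicted.tolist(), accuracy)
```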



In [None]:
import torch.optim as optim

def train(model, dataloader, criterion, optimizer):
    model.train(True)
    # get the inputs and targets; data is a list of [inputs, targets]
    for i, (inputs,targets) in enumerate(dataloader, 0):
        inputs = inputs.to(device)
        targets = targets.to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
            
def eval(model,dataloader):
    model.train(False)
    correct = 0
    total = 0
    # since we're not training, we don't need to calculate the gradients for our outputs
    with torch.no_grad():
        # TODO get the inputs and targets; data is a list of [inputs, targets]
        for ...
            # TODO send inputs and targets to device
            
            # calculate outputs by running images through the network
            # TODO
            
            # the class with the highest weight is what we choose as prediction
            _, predicted = torch.max(outputs.data, 1)
            total += targets.size(0)
            correct += (predicted == targets).sum().item()

    return correct/total
 
net = Net().to(device)

#criterion: the function that defines how the output and desired target are compared
criterion = nn.CrossEntropyLoss() 
#optimizer: an iterative method for optimizing the parameters based on the computed gradient
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9) 

# Check if the model learns:
# The accuracy should go above 90%
num_epochs = 5
for epoch in range(num_epochs): # loop over the dataset multiple times
    print("epoch %i /%i"%(epoch+1, num_epochs))
    train(net, dataloader_train, criterion,optimizer)
    print("Accuracy on set1 (validation set) is %f %%"%(eval(net, dataloader_val1)*100))

## Transformations
In the lecture, the rg transform and the LBP transform were presented. Both should be implemented here and combined with the neural network.

In [None]:
# Example implementation of a transformation,
# calculating the intensity of a batch of pytorch images

class Intensity(object):
    """ transform RGB image to Intensity Image"""
    def __call__(self, x):
        intensity = x.sum(dim =1)
        intensity = intensity.unsqueeze(1)/3.0
        return intensity
    
    def __str__(self):
        return "Intensity()"
    
intensity_transform = Intensity()
#And visualize
images = next(iter(dataloader_train))[0].to(device)
grid_img = torchvision.utils.make_grid(images, nrow=8)
grid_img_t = torchvision.utils.make_grid(intensity_transform(images), nrow=8)
plt.imshow(grid_img.permute(1, 2, 0).cpu().detach().numpy())
plt.show()
plt.imshow(grid_img_t.permute(1, 2, 0).cpu().detach().numpy())
plt.show()   

### rg Transform
$r= \frac{R}{R+G+B}$ $g= \frac{G}{R+G+B}$

Assuming Shafer’s dichromatic reflection model and neutral (i.e. white) light, this representation is invariant to shadows, light intensity and light direction, but it is not invariant to highlights.
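
The invariance to light intensity can be seen on a single pixel: scaling R, G and B by the same factor cancels out in the ratios. The sketch below uses plain Python on one made-up pixel; the function name `normalized_rg` and the `eps` guard against division by zero are assumptions of this example, not the batched implementation asked for below.

```python
import math


def normalized_rg(r, g, b, eps=1e-7):
    """Return (r, g) chromaticity of one RGB pixel; eps avoids division by zero."""
    total = r + g + b + eps
    return r / total, g / total


pixel = (0.6, 0.3, 0.1)                     # made-up RGB values
darker = tuple(0.5 * c for c in pixel)      # same surface at half the light intensity

rg_bright = normalized_rg(*pixel)
rg_dark = normalized_rg(*darker)
print(rg_bright, rg_dark)
```

Both pixels map to (almost exactly) the same rg values; the tiny difference comes only from the `eps` term.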

In [None]:
#rg
class NormalizedRG(object):
    """ transform RGB image to normalized RG"""
    def __call__(self, x):
        eps = 1e-7
        x = (x+eps)
        rgb = #TODO
        return rgb[:,0:2,:,:]
    
    def __str__(self):
        return "NormalizedRG()"
    
#And visualize
rg_transform = NormalizedRG()
images = next(iter(dataloader_train))[0].to(device)
grid_img = torchvision.utils.make_grid(images, nrow=8)
rg_images = rg_transform(images)
grid_img_r = torchvision.utils.make_grid([rg[0].unsqueeze(dim =0) for rg in rg_images], nrow=8)
grid_img_g = torchvision.utils.make_grid([rg[1].unsqueeze(dim =0) for rg in rg_images], nrow=8)
print("RGB")
plt.imshow(grid_img.permute(1, 2, 0).cpu().detach().numpy())
plt.show()
print("r")
plt.imshow(grid_img_r.permute(1, 2, 0).cpu().detach().numpy())
plt.show()
print("g")
plt.imshow(grid_img_g.permute(1, 2, 0).cpu().detach().numpy())
plt.show()

### LBP Transformation
* Works on a grayscale image, i.e. the intensity of the pixels
* Sample p neighbouring positions at distance r from the center pixel
* Threshold each neighbouring pixel against the center pixel
    * If neighbour >= center pixel => 1, else 0
    * Create a decimal number from the binary result
* Implemented with Conv2d convolutions so that it can run on the GPU

This model is invariant to monotonic illumination changes.
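
To make the steps concrete, the following plain-Python sketch computes the LBP code of the center pixel of one made-up 3x3 patch and repeats the computation after a monotonic brightness change. The helper names and the patch values are assumptions of this example; the neighbour ordering follows the same angle-based formula as the `positions` method in the class below.

```python
import math


def neighbor_positions(radius=1, points=8):
    """Neighbour coordinates on a circle around the patch center, ordered by angle."""
    positions = []
    for i in range(points):
        alpha = 2 * math.pi / points * i
        x = int(round(radius + radius * math.cos(alpha)))
        y = int(round(radius + radius * math.sin(alpha)))
        positions.append((x, y))
    return positions


def lbp_code(patch, radius=1, points=8):
    """LBP code of the center pixel: bit i is set if neighbour i >= center."""
    center = patch[radius][radius]
    code = 0
    for i, (x, y) in enumerate(neighbor_positions(radius, points)):
        if patch[x][y] >= center:
            code += 2 ** i
    return code


patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
# Monotonic illumination change: the order of the pixel values is preserved.
brighter = [[2 * v + 10 for v in row] for row in patch]

code = lbp_code(patch)
code_brighter = lbp_code(brighter)
print(code, code_brighter)
```

Because only the comparison with the center pixel matters, any change that preserves the order of the intensities leaves the code unchanged.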

In [None]:
class LBP(object):
    def __init__(self,radius=1, points=8):
        self.radius = radius
        self.points = points
        
        #FILTER TO CONVERT TO GRAYSCALE
        self.intensity = Intensity()
        
        #FILTER TO COMPARE WITH NEIGHBOURS
        self.size = self.radius*2+1
        self.neighbor_positions = self.positions()
        self.lbp_conv1 = nn.Conv2d(in_channels = 1, out_channels = self.points, kernel_size=self.size, padding =self.radius, bias = False)
        self.lbp_conv1.weight.data.fill_(0.0)
        for i, (w, h) in enumerate(self.neighbor_positions):
            self.lbp_conv1.weight.data[i,0,self.size//2,self.size//2]  = -1
            self.lbp_conv1.weight.data[i,0,w,h]= 1
        #print(self.lbp_conv1.weight.data)
        self.lbp_conv1.to(device)

        #FILTER TO CONVERT BINARY NUMBER(0,1 in channels) TO DECIMAL
        self.lbp_conv2 = nn.Conv2d(in_channels = self.points, out_channels = 1, kernel_size=1,bias = False)
        for i in range(self.points):
            self.lbp_conv2.weight.data[0,i,0,0] = 2**i
        #print(self.lbp_conv2.weight.data)
        self.lbp_conv2.to(device)
        # max decimal number (all bits set): 2**0 + ... + 2**(points-1) = 2**points - 1
        self.max_value = float(2**self.points - 1)

    def positions(self):
        mid  = self.radius
        positions =[]
        for i in range(self.points):
            #calculate angle and according position for every point
            alpha = 2*math.pi / self.points *i 
            x = int(round(mid + self.radius *math.cos(alpha)))
            y = int(round(mid + self.radius *math.sin(alpha)))
            positions.append((x,y))
        #print(positions)
        return positions

    def __call__(self,x):
        # convert to grayscale (using the intensity transformation)
        x = #TODO
        
        # compare the neighbouring pixels to the central pixel
        #  applying lbp_conv1 
        #  (creates, for every pixel and each neighbour point, the difference between the neighbour and the center)
        x = #TODO 
        
        # convert to binary: 1 if the difference is >= 0, else 0
        x[x >= 0] = #TODO
        x[x < 0] = #TODO
        
        # convert to decimal
        #  apply lbp_conv2
        x = #TODO

        x= x/self.max_value
        return x

    def __str__(self):
        return "LBP(radius_%i_points_%i)"%(self.radius, self.points)

In [None]:
lbp = LBP(1,8)
#And visualize
images = next(iter(dataloader_train))[0].to(device)
grid_img = torchvision.utils.make_grid(images, nrow=8)
grid_img_t = torchvision.utils.make_grid(lbp(images), nrow=8)
plt.imshow(grid_img.permute(1, 2, 0).cpu().detach().numpy())
plt.show()
plt.imshow(grid_img_t.permute(1, 2, 0).cpu().detach().numpy())
plt.show()

## Hybrid Net
A class combining the model-based transform and the CNN.
* The model-based transform is applied with gradient calculation disabled, as its weights are fixed and should not be trained
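
The effect of disabling gradient calculation for the fixed part can be checked on a toy example. In the sketch below, two small convolutions stand in for the fixed transform and the trainable network; the names `fixed` and `learned` and the input size are assumptions of this illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins: `fixed` plays the role of the model-based transform,
# `learned` the role of the trainable network.
fixed = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
learned = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)

x = torch.rand(2, 1, 8, 8)

# No computation graph is recorded for the fixed part.
with torch.no_grad():
    features = fixed(x)

# The trainable part runs with gradient tracking as usual.
loss = learned(features).sum()
loss.backward()

print(features.requires_grad, fixed.weight.grad is None, learned.weight.grad is not None)
```

Only the trainable part receives gradients, so an optimizer step would leave the fixed weights untouched.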


In [None]:
class HybridNet(nn.Module):
    def __init__(self, transform, cnn):
        # the parameter is named `cnn` so that it does not shadow the `torch.nn` alias
        super().__init__()
        self.transform = transform
        self.nn = cnn

    def forward(self, x):
        with torch.no_grad() :
            x = #TODO apply transform
        x = #TODO apply nn
        return x

    def __str__(self):
        info ="(model_based_transform): %s \n"%(str(self.transform))
        info += "(CNN): %s\n"%(str(self.nn))
        return info
        

# Run the experiments

In [None]:
def run(model, num_epochs=20):
    print("Train and evaluate model %s"%(str(model)))
    accurracys=[]
    
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    
    acc_val1 = eval(model, dataloader_val1)*100
    acc_val2 = eval(model, dataloader_val2)*100
    accurracys.append((acc_val1,acc_val2))

    print("epoch 0 /%i Acc(set1): %f %%, Acc(set2): %f %%"%( num_epochs, acc_val1, acc_val2))
    for epoch in range(num_epochs): # loop over the dataset multiple times
        #train model
        #TODO
        
        #calculate accuracies
        acc_val1 = #TODO
        acc_val2 = #TODO
        accurracys.append((acc_val1,acc_val2))
        print("epoch %i /%i Acc(set1): %f %%, Acc(set2): %f %%"%(epoch+1, num_epochs, acc_val1, acc_val2))
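    #plot accuracies over the epochs, one curve per validation set
    epochs = list(range(num_epochs+1))
    plt.plot(epochs, [a[0] for a in accurracys], label="set1 (validation)")
    plt.plot(epochs, [a[1] for a in accurracys], label="set2 (different illumination)")
    plt.xlabel("epoch")
    plt.ylabel("accuracy [%]")
    plt.ylim([0, 100])
    plt.legend()
    plt.show()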

### CNN Model

In [None]:
cnn_model = #TODO
run(cnn_model)

### Hybrid (NormalizedRG and CNN)

In [None]:
#Create a Hybrid of the normalized RG and CNN Model
rg_cnn_model = #TODO
run(rg_cnn_model)

### Hybrid (LBP and CNN)

In [None]:
#Create a Hybrid of LBP(1,8) and the CNN Model
lbp_cnn_model = #TODO
run(lbp_cnn_model)

# Conclusion
**TODO:** Write down your conclusion from the experiments in 2-4 sentences.