## Authors:
#### Daniel Stöckein (5018039), Alexander Triol (5018451)

In [4]:
import pandas as pd
import numpy as np
import torch
from torch import nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets, transforms
from matplotlib import pyplot as plt
%matplotlib inline
import time

In [5]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## 1. Load Data

In [6]:
mnist_train = datasets.FashionMNIST(
    root=r'..\datasets', 
    train=True, 
    download=True, 
    transform=transforms.ToTensor()
)

mnist_test = datasets.FashionMNIST(
    root=r'..\datasets', 
    train=False, 
    download=True, 
    transform=transforms.ToTensor()
)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ..\datasets/FashionMNIST/raw/train-images-idx3-ubyte.gz


Extracting ..\datasets/FashionMNIST/raw/train-images-idx3-ubyte.gz to ..\datasets/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ..\datasets/FashionMNIST/raw/train-labels-idx1-ubyte.gz


Extracting ..\datasets/FashionMNIST/raw/train-labels-idx1-ubyte.gz to ..\datasets/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ..\datasets/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


Extracting ..\datasets/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ..\datasets/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ..\datasets/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


Extracting ..\datasets/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to ..\datasets/FashionMNIST/raw



## 2. Preparing DataLoader

In [7]:
def dloaders(batch_size):
    train_loader = DataLoader(mnist_train, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(mnist_test, batch_size=batch_size, shuffle=False)
    return train_loader, test_loader
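As a quick sanity check on ``dloaders``: the number of batches per epoch is `ceil(num_samples / batch_size)`, because ``DataLoader`` keeps the last, smaller batch by default (`drop_last=False`). The plain-Python sketch below uses a hypothetical helper `num_batches` (not part of the notebook); it reproduces the `1200` and `235` batch counts that appear in the training logs further down.

```python
import math

def num_batches(num_samples, batch_size):
    """Batches per epoch for a DataLoader with drop_last=False (hypothetical helper)."""
    # the last batch is kept even if it holds fewer than batch_size samples
    return math.ceil(num_samples / batch_size)

print(num_batches(60000, 50))   # training set, batch size 50  → 1200
print(num_batches(60000, 256))  # training set, batch size 256 → 235
print(num_batches(10000, 50))   # test set, batch size 50      → 200
```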

## 3. Model definition

### 3.1 Dense Layer
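``DenseLayer`` implements a single layer inside a ``DenseBlock``. It applies, in order:
- batch normalization
- ReLU
- 3x3 convolution (``padding=1``, so height and width are preserved)
- concatenation of the convolution output with the layer input along the channel dimension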

In [8]:
class DenseLayer(nn.Module):
    
    def __init__(self, input_channels, output_channels):
        super(DenseLayer, self).__init__()
        
        self.layer = nn.Sequential(
            nn.BatchNorm2d(input_channels),
            nn.ReLU(),
            nn.Conv2d(input_channels, output_channels, kernel_size=3, padding=1)
        )
        
    def forward(self, x):
        out = self.layer(x)
        out = torch.cat((out, x), dim=1)
        return out

### 3.2 Dense Block
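``DenseBlock`` applies multiple ``DenseLayers`` in sequence. Each dense layer receives as input:
- the original block input concatenated with the outputs of all previous layers in the block, i.e. ``input_channels + layer_index * output_channels`` channels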

In [9]:
class DenseBlock(nn.Module):
    
    def __init__(self, input_channels, output_channels, num_layers):
        super(DenseBlock, self).__init__()
        
        layers = []
        for layer_index in range(num_layers): # e.g. 5 repetitions when num_layers = 5
            layers.append(
                DenseLayer(
                    input_channels + layer_index * output_channels,
                    output_channels
                )
            )
            
        # https://discuss.pytorch.org/t/append-for-nn-sequential-or-directly-converting-nn-modulelist-to-nn-sequential/7104
        self.block = nn.Sequential(*layers) 
        
    def forward(self, x):
        out = self.block(x)
        return out
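The channel bookkeeping in ``DenseBlock`` can be checked without PyTorch. A minimal plain-Python sketch, assuming the same arguments as the constructor above (``input_channels``, the per-layer ``output_channels``, i.e. the growth rate, and ``num_layers``); the helper name `dense_block_channels` is hypothetical. For ``DenseBlock(4, 32, 3)`` it yields per-layer inputs of 4, 36 and 68 channels, matching the ``BatchNorm2d`` sizes in the printed model below, and 100 output channels.

```python
def dense_block_channels(input_channels, output_channels, num_layers):
    """Return (per-layer input channels, block output channels) for a DenseBlock.

    Hypothetical helper mirroring the DenseBlock constructor above:
    every DenseLayer concatenates its output_channels new feature maps
    with its own input, so the channel count grows linearly.
    """
    per_layer = [input_channels + i * output_channels for i in range(num_layers)]
    block_out = input_channels + num_layers * output_channels
    return per_layer, block_out

per_layer, block_out = dense_block_channels(4, 32, 3)
print(per_layer)  # → [4, 36, 68]
print(block_out)  # → 100
```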

### 3.3 Transition Layer
Transition layer to perform downsampling: a 1x1 convolution reduces the number of channels, and average pooling (kernel 2, stride 2) halves the height and width.

In [10]:
class TransitionLayer(nn.Module):
    
    def __init__(self, input_channels, output_channels):
        super(TransitionLayer, self).__init__()

        self.transition = nn.Sequential(
            nn.BatchNorm2d(input_channels),
            nn.ReLU(),
            nn.Conv2d(input_channels, output_channels, kernel_size=1),
            nn.AvgPool2d(kernel_size=2, stride=2)
        )      
        
    def forward(self, x):
        out = self.transition(x)
        return out

### 3.4 Net
- Before the first DenseBlock, the inputs are passed through a 3x3 convolution (preserving spatial dimensions) with 4 output channels and an average pooling layer with stride 2 (no padding) to reduce the size.
- After that, the blocks should be built and transition layers applied.
- Finally, apply global average pooling, flatten the output and pass it as input to the FC layer.
- Some code snippets are adapted from:
  - https://amaarora.github.io/2020/08/02/densenets.html
  - https://towardsdatascience.com/simple-implementation-of-densely-connected-convolutional-networks-in-pytorch-3846978f2f36

In [11]:
class DenseNet(nn.Module):
    def __init__(self, input_channels, num_classes):
        super(DenseNet, self).__init__()
        
        output_channels = 4
        growth_rate = 32
        num_layers_per_block = [3, 3] # 2 blocks with 3 layers each
        
        self.net_input = nn.Sequential(
            nn.Conv2d(in_channels=input_channels, out_channels=output_channels, kernel_size=3),
            nn.AvgPool2d(kernel_size=3, stride=2)
        )
        
        blocks = []
        for block_index, num_layers in enumerate(num_layers_per_block):
            blocks.append(
                DenseBlock(output_channels, growth_rate, num_layers)
            )
            output_channels += num_layers * growth_rate 
            
            if block_index < len(num_layers_per_block) - 1:
                blocks.append(
                    TransitionLayer(output_channels, 4)
                )
                output_channels = 4
                
        self.blocks = nn.Sequential(*blocks)
        
        self.net_output = nn.Sequential(
            nn.BatchNorm2d(output_channels),
            nn.ReLU(),
            nn.AdaptiveMaxPool2d((1, 1)),
            nn.Flatten(),
            nn.Linear(output_channels, num_classes)
        )
        
    def forward(self, x):  
        out = self.net_input(x)
        out = self.blocks(out)
        out = self.net_output(out)
        return out
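To get a feeling for the size of this network, its trainable parameters can be counted by hand: a ``Conv2d(c_in, c_out, k)`` has `c_in * c_out * k * k + c_out` parameters (weights plus bias), a ``BatchNorm2d(c)`` has `2 * c` (scale and shift; the running statistics are buffers, not parameters), and a ``Linear(n_in, n_out)`` has `n_in * n_out + n_out`. The plain-Python sketch below applies these formulas to ``DenseNet(1, 10)`` with the hyperparameters hard-coded above; all function names are hypothetical. In the notebook itself, the same number should be obtainable with `sum(p.numel() for p in model.parameters())`.

```python
def conv_params(c_in, c_out, k):
    # weight tensor (c_out, c_in, k, k) plus one bias per output channel
    return c_in * c_out * k * k + c_out

def bn_params(c):
    # learnable scale (weight) and shift (bias) per channel
    return 2 * c

def linear_params(n_in, n_out):
    # weight matrix plus one bias per output
    return n_in * n_out + n_out

def densenet_params(input_channels=1, num_classes=10, stem_channels=4,
                    growth_rate=32, num_layers_per_block=(3, 3), transition_channels=4):
    """Count trainable parameters of the DenseNet defined above (hypothetical helper)."""
    total = conv_params(input_channels, stem_channels, 3)  # net_input conv; pooling has no parameters
    channels = stem_channels
    for block_index, num_layers in enumerate(num_layers_per_block):
        for _ in range(num_layers):
            # DenseLayer: BatchNorm2d -> ReLU -> 3x3 Conv2d, then concatenation
            total += bn_params(channels) + conv_params(channels, growth_rate, 3)
            channels += growth_rate
        if block_index < len(num_layers_per_block) - 1:
            # TransitionLayer: BatchNorm2d -> ReLU -> 1x1 Conv2d -> AvgPool2d
            total += bn_params(channels) + conv_params(channels, transition_channels, 1)
            channels = transition_channels
    # net_output: BatchNorm2d -> ReLU -> pooling -> Flatten -> Linear
    total += bn_params(channels) + linear_params(channels, num_classes)
    return total

print(densenet_params())  # → 64686
```

Most parameters sit in the 3x3 convolutions of the later dense layers, whose input width grows with every concatenation.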

## 4. Answers to questions

First, print the model to see what it looks like.

In [12]:
model = DenseNet(1, 10)
print(model)

DenseNet(
  (net_input): Sequential(
    (0): Conv2d(1, 4, kernel_size=(3, 3), stride=(1, 1))
    (1): AvgPool2d(kernel_size=3, stride=2, padding=0)
  )
  (blocks): Sequential(
    (0): DenseBlock(
      (block): Sequential(
        (0): DenseLayer(
          (layer): Sequential(
            (0): BatchNorm2d(4, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (1): ReLU()
            (2): Conv2d(4, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (1): DenseLayer(
          (layer): Sequential(
            (0): BatchNorm2d(36, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (1): ReLU()
            (2): Conv2d(36, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          )
        )
        (2): DenseLayer(
          (layer): Sequential(
            (0): BatchNorm2d(68, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            (1): ReLU()
            (2): Conv2d(68, 32, kern

### Dataset description
- training set: 60,000 images
- test set: 10,000 images
- number of channels: 1 (greyscale)
- height: 28px
- width: 28px
- features: 28 x 28 = 784 pixels
- outputs: 10 categories

https://github.com/zalandoresearch/fashion-mnist

In [13]:
print(mnist_train[1][0].shape)

torch.Size([1, 28, 28])


Test scenario that rebuilds the layers of the ``DenseNet`` defined above step by step, to inspect the intermediate shapes.

In [14]:
train_loader, test_loader = dloaders(batch_size=50) # data iters

first_conv = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3),
    nn.AvgPool2d(kernel_size=3, stride=2)
)

In [15]:
net_output = nn.Sequential(
    nn.BatchNorm2d(100),
    nn.ReLU(),
    nn.AdaptiveMaxPool2d((1, 1))
)

Pass one batch to the network and print out shapes

In [16]:
for batch_index, (features, labels) in enumerate(train_loader):
    print("input: ", features.shape, "\n")
    
    x = first_conv(features)
    print("After first conv: ", x.shape, "\n")
    
    block = DenseBlock(4, 32, 3)
    x = block(x)
    print("Out first block: ", x.shape)
    block = TransitionLayer(100, 4)
    x = block(x)
    print("After transition: ", x.shape, "\n")
    
    block = DenseBlock(4, 32, 3)
    x = block(x)
    print("Out second block: ", x.shape, "\n")
    
    x = net_output(x)
    print("input to FC: ", x.shape)
    x = torch.flatten(x, start_dim=1)
    print("flattened input to FC: ", x.shape)
    
    logits = model(features)
    break

input:  torch.Size([50, 1, 28, 28]) 

After first conv:  torch.Size([50, 4, 12, 12]) 

Out first block:  torch.Size([50, 100, 12, 12])
After transition:  torch.Size([50, 4, 6, 6]) 

Out second block:  torch.Size([50, 100, 6, 6]) 

input to FC:  torch.Size([50, 100, 1, 1])
flattened input to FC:  torch.Size([50, 100])


**4.1**: The spatial dimension of the first DenseBlock is ``12x12`` with ``4`` input channels and ``100`` output channels.

- Conv2d (no padding, stride 1): ``n - f + 1`` = 28 - 3 + 1 = 26, i.e. 26x26
- AveragePooling: ``floor((input + 2*pad - filter) / stride) + 1`` = floor((26 + 2*0 - 3) / 2) + 1 = floor(11.5) + 1 = 12, i.e. 12x12
- Source: https://stats.stackexchange.com/questions/223630/fractional-output-dimensions-of-sliding-windows-convolutions-pooling-etc-in

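**4.2**: The spatial dimension of the second DenseBlock is ``6x6``: the average pooling of the transition layer (kernel 2, stride 2) reduces 12x12 to floor((12 - 2) / 2) + 1 = 6, i.e. 6x6.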

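**4.3**: The input dimension of the final FC layer is ``100``: the second DenseBlock outputs 4 + 3 * 32 = 100 channels, which the adaptive pooling reduces to 100x1x1 before flattening.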
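The answers to 4.1 to 4.3 can also be derived analytically. A plain-Python sketch, assuming the layer hyperparameters of ``DenseNet`` as implemented above (unpadded 3x3 stem convolution, 3x3 average pooling with stride 2, two dense blocks of three layers with growth rate 32, a transition layer with 4 output channels and 2x2/stride-2 pooling). `out_size` implements the ``floor((input + 2*pad - filter) / stride) + 1`` formula from 4.1; `trace_shapes` is a hypothetical helper. It should reproduce the shapes printed by the step-by-step test above.

```python
def out_size(n, kernel, stride=1, pad=0):
    # floor((input + 2*pad - filter) / stride) + 1, valid for Conv2d and pooling layers
    return (n + 2 * pad - kernel) // stride + 1

def trace_shapes(size=28, stem_channels=4, growth_rate=32, num_layers=3, transition_channels=4):
    """Return (channels, height/width) at each stage of the DenseNet above (hypothetical helper)."""
    shapes = {}
    size = out_size(size, kernel=3)             # stem Conv2d, no padding: 28 -> 26
    size = out_size(size, kernel=3, stride=2)   # stem AvgPool2d: 26 -> 12
    shapes["block1_in"] = (stem_channels, size)
    channels = stem_channels + num_layers * growth_rate  # padded 3x3 convs keep the size
    shapes["block1_out"] = (channels, size)
    size = out_size(size, kernel=2, stride=2)   # transition AvgPool2d: 12 -> 6
    shapes["block2_in"] = (transition_channels, size)
    channels = transition_channels + num_layers * growth_rate
    shapes["block2_out"] = (channels, size)
    shapes["fc_in"] = channels                  # adaptive pooling to 1x1, then flatten
    return shapes

for stage, shape in trace_shapes().items():
    print(stage, shape)
```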

## 5. Metrics
- ``comp_accuracy`` computes the accuracy (in percent) of a model on a data loader and returns the tuple ``(correct, wrong, accuracy)``

In [17]:
def comp_accuracy(model, data_loader):
    correct = 0
    wrong = 0
    num_examples = 0
    
    # switch to eval mode (affects BatchNorm) if the model inherits from nn.Module
    if isinstance(model, nn.Module):
        model.eval()
    
    with torch.no_grad():
        for batch_index, (features, labels) in enumerate(data_loader):
            features = features.to(device)
            labels = labels.to(device)

            logits = model(features)
            _, predictions = torch.max(logits, dim=1) # keep only the index of the class with the highest logit

            num_examples += labels.size(0)

            correct += (predictions == labels).sum().float()
            wrong += (predictions != labels).sum().float()
            
        accuracy = correct / num_examples * 100      
        
    return correct, wrong, accuracy
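The accuracy formula in ``comp_accuracy`` (``correct / num_examples * 100``) can be checked by hand against the first training epoch logged below: 51068 correct out of 60000 training images. A plain-Python sketch; `accuracy_percent` is a hypothetical helper that mirrors the last step of ``comp_accuracy`` without tensors.

```python
def accuracy_percent(correct, num_examples):
    # same formula as comp_accuracy: share of correct predictions in percent
    return correct / num_examples * 100

correct, num_examples = 51068, 60000
wrong = num_examples - correct
print('Correct[{}] | Wrong[{}] | Accuracy[{:.2f}%]'.format(
    correct, wrong, accuracy_percent(correct, num_examples)))
# → Correct[51068] | Wrong[8932] | Accuracy[85.11%]
```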

## 6. Training

In [18]:
def fit(model, train_loader, epochs, learning_rate, loss_func=nn.CrossEntropyLoss(), opt_func=torch.optim.SGD):
    
    optimizer = opt_func(model.parameters(), learning_rate) # optimizer (SGD by default) over all model parameters
    
    model = model.to(device)
    
    start = time.time() # measure time
    
    for epoch in range(epochs):
        
        model = model.train()
              
        for batch_index, (features, labels) in enumerate(train_loader):
            
            # gpu usage if possible
            features = features.to(device)
            labels = labels.to(device)
            
            # 1. forward
            logits = model(features)

            # 2. compute objective function (softmax, cross entropy)
            cost = loss_func(logits, labels)

            # 3. reset the gradients from the previous step
            optimizer.zero_grad() 

            # 4. accumulate partial derivatives
            cost.backward() 

            # 5. step in the opposite direction of the gradient
            optimizer.step() 
            
            if not batch_index % 250:
                print ('Epoch: {}/{} | Batch {}/{} | Cost: {:.4f}'.format(
                    epoch+1,
                    epochs,
                    batch_index,
                    len(train_loader),
                    cost
                ))
        
        correct, wrong, accuracy = comp_accuracy(model, train_loader)
        print ('Training: Correct[{:.0f}] | Wrong[{:.0f}] | Accuracy[{:.2f}%]'.format(
            correct,
            wrong,
            accuracy
        ), '\n')
      
    # evaluate on the test set (note: uses the global test_loader, not a parameter of fit)
    correct, wrong, accuracy = comp_accuracy(model, test_loader)  
    print ('Test: Correct[{:.0f}] | Wrong[{:.0f}] | Accuracy[{:.2f}%]'.format(
        correct,
        wrong,
        accuracy
    ), '\n')
    
    end = time.time()
    print('Training time: {:.2f} seconds on {}'.format(
        end - start, 
        device
    ))    
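``fit`` prints the cost only when ``not batch_index % 250`` holds, i.e. when the batch index is a multiple of 250. The plain-Python sketch below (the helper name `logged_batches` is hypothetical) lists which batch indices are logged per epoch. This explains why the first attempt logs five cost lines per epoch (1200 batches of size 50), while the second attempt (235 batches of size 256) only logs batch 0.

```python
def logged_batches(num_batches, every=250):
    # batch indices for which `not batch_index % every` is True
    return [i for i in range(num_batches) if not i % every]

print(logged_batches(1200))  # → [0, 250, 500, 750, 1000]
print(logged_batches(235))   # → [0]
```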

## 7. Playground

### Attempt 1

In [19]:
batch_size = 50
epochs = 10
learning_rate = 0.01

train_loader, test_loader = dloaders(batch_size=batch_size) # data iters

In [20]:
model = DenseNet(1, 10)
fit(model, train_loader, epochs, learning_rate) # training

Epoch: 1/10 | Batch 0/1200 | Cost: 3.1749
Epoch: 1/10 | Batch 250/1200 | Cost: 0.5342
Epoch: 1/10 | Batch 500/1200 | Cost: 0.7364
Epoch: 1/10 | Batch 750/1200 | Cost: 0.3925
Epoch: 1/10 | Batch 1000/1200 | Cost: 0.4919
Training: Correct[51068] | Wrong[8932] | Accuracy[85.11%] 

Epoch: 2/10 | Batch 0/1200 | Cost: 0.4710
Epoch: 2/10 | Batch 250/1200 | Cost: 0.4769
Epoch: 2/10 | Batch 500/1200 | Cost: 0.5527
Epoch: 2/10 | Batch 750/1200 | Cost: 0.2541
Epoch: 2/10 | Batch 1000/1200 | Cost: 0.2001
Training: Correct[52215] | Wrong[7785] | Accuracy[87.03%] 

Epoch: 3/10 | Batch 0/1200 | Cost: 0.2712
Epoch: 3/10 | Batch 250/1200 | Cost: 0.3581
Epoch: 3/10 | Batch 500/1200 | Cost: 0.4133
Epoch: 3/10 | Batch 750/1200 | Cost: 0.4042
Epoch: 3/10 | Batch 1000/1200 | Cost: 0.3779
Training: Correct[53182] | Wrong[6818] | Accuracy[88.64%] 

Epoch: 4/10 | Batch 0/1200 | Cost: 0.2638
Epoch: 4/10 | Batch 250/1200 | Cost: 0.2775
Epoch: 4/10 | Batch 500/1200 | Cost: 0.3864
Epoch: 4/10 | Batch 750/1200 | Co

### Summary of Attempt 1
With a learning rate of 0.01, 10 epochs and a batch size of 50, our model reached an accuracy of about 89% on the test set of 10,000 samples. Training took 328 seconds on the GPU, compared with about 1400 seconds on the CPU in previous experiments. The training accuracy increased only slightly per epoch; the model already performed well after the first epoch (85.11% training accuracy).

With the same hyperparameters, the model from problem 1 reached about 82% on both the training and the test data.

### Attempt 2

In [21]:
batch_size = 256
epochs = 25
learning_rate = 0.1

train_loader, test_loader = dloaders(batch_size=batch_size) # data iters

In [22]:
fit(model, train_loader, epochs, learning_rate) # training

Epoch: 1/25 | Batch 0/235 | Cost: 0.2595
Training: Correct[52356] | Wrong[7644] | Accuracy[87.26%] 

Epoch: 2/25 | Batch 0/235 | Cost: 0.2926
Training: Correct[51623] | Wrong[8377] | Accuracy[86.04%] 

Epoch: 3/25 | Batch 0/235 | Cost: 0.3176
Training: Correct[52008] | Wrong[7992] | Accuracy[86.68%] 

Epoch: 4/25 | Batch 0/235 | Cost: 0.3194
Training: Correct[48938] | Wrong[11062] | Accuracy[81.56%] 

Epoch: 5/25 | Batch 0/235 | Cost: 0.3901
Training: Correct[54177] | Wrong[5823] | Accuracy[90.29%] 

Epoch: 6/25 | Batch 0/235 | Cost: 0.2424
Training: Correct[54168] | Wrong[5832] | Accuracy[90.28%] 

Epoch: 7/25 | Batch 0/235 | Cost: 0.2849
Training: Correct[52503] | Wrong[7497] | Accuracy[87.50%] 

Epoch: 8/25 | Batch 0/235 | Cost: 0.4506
Training: Correct[55061] | Wrong[4939] | Accuracy[91.77%] 

Epoch: 9/25 | Batch 0/235 | Cost: 0.1927
Training: Correct[50876] | Wrong[9124] | Accuracy[84.79%] 

Epoch: 10/25 | Batch 0/235 | Cost: 0.2504
Training: Correct[50124] | Wrong[9876] | Accurac

### Summary of Attempt 2
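The model was not able to reach a higher accuracy on the test data, even though we trained for more epochs. This is probably due to the larger batch size, which means fewer parameter updates per epoch (235 instead of 1200). Together with the higher learning rate of 0.1, the training accuracy also fluctuates strongly between epochs (e.g. 81.56% after epoch 4 vs. 91.77% after epoch 8). Note that ``fit`` was called on the model already trained in the first attempt, so this run continued from those weights rather than starting from scratch. Considering the longer training time, this attempt was worse than the first.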