# Embedded ML Lab - Exercise 0 - Intro Inference


We start with a neural network similar to the LeNet model from 1989 (https://en.wikipedia.org/wiki/LeNet). LeNet is designed to classify handwritten digits from the MNIST dataset (http://yann.lecun.com/exdb/mnist/), whose images have size 28x28. It outputs a vector of size 10 in which each entry is a score for one digit: the higher the score, the more likely it is that the input shows that digit. All conv layers use `stride=1` and `padding=0`.

<img src="src/lenet.png" alt="drawing" width="600"/>

<span style="color:green">Your Tasks:</span>
* <span style="color:green">Write the init code for the required modules to define LeNet (use the provided image to determine the number of input/output filters and the kernel sizes)</span>
    * <span style="color:green">Determine the output size of conv2 to determine the input size of fc1</span>
    * The output size of a Conv2d layer is given by $H_{\text{out}} = \left\lfloor \frac{H_{\text{in}} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel size} - 1) - 1}{\text{stride}} + 1 \right\rfloor$, where $\text{dilation} = 1$ by default
    * Here, `max_pool2d` with kernel size 2 halves the input size, rounding down: $H_{\text{out}} = \lfloor \frac{H_{\text{in}}}{2}\rfloor$
    * <span style="color:green">Use the following modules: `nn.Conv2d, nn.Linear`</span>
* <span style="color:green">Define the forward pass of LeNet, check the provided image for the flow of data through the modules and functions</span>
    * <span style="color:green">Use the following functions: `F.relu, F.max_pool2d, tensor.flatten`</span>
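As a worked example of the two formulas, the sketch below traces the height (equal to the width) of a 28x28 input through the network. It assumes 3x3 kernels in both conv layers, the values used in the solution below (check them against the image); the helper names `conv2d_out` and `maxpool2d_out` are illustrative only.

```python
import math

def conv2d_out(h_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output height/width of a Conv2d layer, following the formula above."""
    return math.floor((h_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1)

def maxpool2d_out(h_in, kernel_size=2):
    """Output size of a max pool whose stride equals its kernel size (no padding)."""
    return h_in // kernel_size

# Assumption: 3x3 kernels in conv1 and conv2
h = 28
h = conv2d_out(h, 3)    # conv1: 28 -> 26
h = maxpool2d_out(h)    # pool:  26 -> 13
h = conv2d_out(h, 3)    # conv2: 13 -> 11
h = maxpool2d_out(h)    # pool:  11 -> 5 (5.5 rounded down)
fc1_in = 16 * h * h     # 16 output channels of conv2, each 5 x 5
print(h, fc1_in)
```

With these assumptions the flattened feature vector has 400 entries, which is the `in_features` of `fc1`.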

In [1]:
import torch
torch.rand(1).to('cuda') #initialize cuda context (might take a while)
import torch.nn as nn
import torch.nn.functional as F
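The cell above assumes a CUDA-capable GPU: `.to('cuda')` raises an error on a CPU-only machine. A minimal sketch of a fallback, assuming you only want to initialize the CUDA context when a GPU is actually visible:

```python
import torch

# pick the GPU when one is visible, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    torch.rand(1, device=device)  # touching the GPU once initializes the CUDA context
print(device)
```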

In [2]:
class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        
        #---to-be-done-by-student---
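        self.conv1 = nn.Conv2d(1, 6, 3)    # 1x28x28 -> 6x26x26
        self.conv2 = nn.Conv2d(6, 16, 3)   # 6x13x13 -> 16x11x11
        self.fc1 = nn.Linear(400, 120)     # 16 * 5 * 5 = 400 features after the second pool
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)       # one score per digit 0-9
        #---end---------------------
        return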
    
    def forward(self,x):
        #---to-be-done-by-student---
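        tmp = self.conv1(x)                 # (N, 1, 28, 28) -> (N, 6, 26, 26)
        tmp = F.relu(tmp)
        tmp = F.max_pool2d(tmp, 2)          # -> (N, 6, 13, 13)
        tmp = self.conv2(tmp)               # -> (N, 16, 11, 11)
        tmp = F.relu(tmp)
        tmp = F.max_pool2d(tmp, 2)          # -> (N, 16, 5, 5)
        tmp = tmp.flatten(start_dim=1)      # keep the batch dim: -> (N, 400)
        tmp = self.fc1(tmp)
        tmp = F.relu(tmp)
        tmp = self.fc2(tmp)
        tmp = F.relu(tmp)
        tmp = self.fc3(tmp)                 # raw class scores (logits): (N, 10)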
        
        #---end---------------------
        return tmp
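To double-check the `fc1` input size, one can push a dummy batch through the convolution and pooling layers and inspect the shapes. This sketch re-creates only the two conv layers, with the 3x3 kernels used above (an assumption carried over from the solution), and also shows why `flatten(start_dim=1)` is used instead of a plain `flatten()`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumption: same conv configuration as the solution above
conv1 = nn.Conv2d(1, 6, 3)
conv2 = nn.Conv2d(6, 16, 3)

x = torch.zeros(4, 1, 28, 28)            # dummy batch of 4 images
x = F.max_pool2d(F.relu(conv1(x)), 2)    # -> (4, 6, 13, 13)
x = F.max_pool2d(F.relu(conv2(x)), 2)    # -> (4, 16, 5, 5)
print(x.shape)

# flatten(start_dim=1) keeps the batch dimension: one 400-vector per image
print(x.flatten(start_dim=1).shape)      # (4, 400)
# a plain flatten() would also merge the batch dimension into one long vector
print(x.flatten().shape)                 # (1600,)
```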

We can now create a new model instance.

In [3]:
net = LeNet()
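A quick sanity check for a new instance is its number of trainable parameters. The sketch below computes it by hand from the layer sizes in `__init__` (3x3 kernels, as in the solution above); `conv_params` and `linear_params` are illustrative helper names:

```python
def conv_params(c_in, c_out, k):
    # one k x k kernel per (input channel, output channel) pair, plus one bias per output channel
    return c_out * c_in * k * k + c_out

def linear_params(n_in, n_out):
    # weight matrix of size n_out x n_in, plus one bias per output
    return n_out * n_in + n_out

counts = {
    "conv1": conv_params(1, 6, 3),
    "conv2": conv_params(6, 16, 3),
    "fc1": linear_params(400, 120),
    "fc2": linear_params(120, 84),
    "fc3": linear_params(84, 10),
}
print(counts)
print(sum(counts.values()))
```

For the `net` instance above, `sum(p.numel() for p in net.parameters())` should report the same total, since `nn.Conv2d` and `nn.Linear` include a bias by default.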

We now load the state dict stored in `lenet.pt` into the model. These weights are already pretrained and should reach a high accuracy on MNIST images. Afterwards, we check whether the network correctly classifies our stored sample image.
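A state dict is an ordered mapping from parameter names, derived from attribute names such as `self.conv1`, to tensors. `load_state_dict` is strict by default, so the attribute names and layer shapes in `__init__` have to match the ones stored in `lenet.pt`. The round trip below uses a hypothetical one-layer module `Tiny` and an in-memory buffer instead of a file:

```python
import io
import torch
import torch.nn as nn

class Tiny(nn.Module):  # hypothetical stand-in for LeNet
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 2)

src, dst = Tiny(), Tiny()               # two independently initialized instances
print(list(src.state_dict().keys()))    # parameter names come from the attribute names

buffer = io.BytesIO()                   # in-memory replacement for a file such as lenet.pt
torch.save(src.state_dict(), buffer)
buffer.seek(0)
dst.load_state_dict(torch.load(buffer)) # strict: every key must match exactly

print(torch.equal(src.fc1.weight, dst.fc1.weight))
```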

<span style="color:green">Your Task:</span>
* <span style="color:green">Load the state dict `lenet.pt` from disk into the LeNet instance</span>
* <span style="color:green">Calculate the output of the network when feeding in the image</span>
    * Load the image from disk (`mnist_sample.pt`) into a tensor
    * Note that you need to expand the dimensions of the tensor, since the network expects an input of size $N \times 1 \times 28 \times 28$ but the image has size $28 \times 28$. You can add the two missing dimensions by indexing with **[None, None, :, :]**
    * Check if the image is detected correctly. The output with the highest value corresponds to the estimated class (you can use `torch.argmax`)
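The indexing trick and `argmax` can be tried on stand-in data. This sketch uses a random 28x28 tensor in place of `mnist_sample.pt` and made-up scores in place of the network output; it shows equivalent ways to add the dimensions and why `torch.argmax` without `dim` returns a single flattened index:

```python
import torch

image = torch.rand(28, 28)               # hypothetical stand-in for mnist_sample.pt

a = image[None, None, :, :]              # add batch and channel dims
b = image.unsqueeze(0).unsqueeze(0)      # equivalent
c = image.view(1, 1, 28, 28)             # also equivalent for a contiguous tensor
print(a.shape)
print(torch.equal(a, b) and torch.equal(a, c))

# made-up scores for one image, shape (1, 10)
logits = torch.tensor([[0.1, -1.0, 3.2, 0.5, 0.0, 0.2, 4.7, -0.3, 1.1, 0.9]])
print(torch.argmax(logits, dim=1))       # one predicted class per image in the batch
print(torch.argmax(logits))              # index into the flattened tensor, a 0-dim tensor
```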

In [4]:
#---to-be-done-by-student---
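net.load_state_dict(torch.load('./lenet.pt'))   # pretrained weights; keys must match the module names
image = torch.load('./mnist_sample.pt')         # single 28x28 image tensor
image = image[None,None,:,:]                    # add batch and channel dims -> 1x1x28x28
print(image.shape)
torch.argmax(net(image))                        # index of the highest score = predicted digit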
#---end---------------------

torch.Size([1, 1, 28, 28])


tensor(6)
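Since `forward` ends with `fc3` and no softmax, the network returns raw scores (logits) rather than probabilities. If probabilities are wanted, `torch.softmax` can be applied along the class dimension; because softmax is monotonic, the predicted class does not change. A sketch with made-up scores:

```python
import torch

# made-up logits for one image (shape 1 x 10), standing in for net(image)
logits = torch.tensor([[0.1, -1.0, 3.2, 0.5, 0.0, 0.2, 4.7, -0.3, 1.1, 0.9]])
probs = torch.softmax(logits, dim=1)   # non-negative entries, each row sums to 1
print(probs)
print(probs.argmax(dim=1))             # same class as logits.argmax(dim=1)
```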

Next, we want to determine the accuracy of the network using the full MNIST test data. Additionally, we want to measure the execution time for the network on the CPU as well as on the GPU.

* We first load the complete MNIST test set (10,000 images) and normalize it (zero-center and scale).
* We create a DataLoader, which can be iterated (e.g. with `enumerate`) and returns the data in chunks of 64 images, so-called batches. Each input batch is a tensor of size $64 \times 1 \times 28 \times 28$ (the last of the 157 batches holds only the remaining 16 images).
* The target tensor has size $64$ and contains the correct label of each image (e.g. if the image `inputs[8, :, :, :]` shows a two, the corresponding entry `targets[8]` is 2).
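The normalization and batching numbers can be checked by hand. This sketch assumes that `ToTensor` scales the 8-bit pixels to [0, 1] and uses the mean and standard deviation passed to `Normalize` in the next cell (the values commonly used for MNIST):

```python
import math

mean, std = 0.1307, 0.3081          # values passed to transforms.Normalize
# Normalize computes (x - mean) / std for every pixel
lo = (0.0 - mean) / std             # black pixel, roughly -0.42
hi = (1.0 - mean) / std             # white pixel, roughly 2.82
print(lo, hi)

n_images, batch_size = 10_000, 64
n_batches = math.ceil(n_images / batch_size)          # full batches plus one partial batch
last_batch = n_images - (n_batches - 1) * batch_size  # images in the final batch
print(n_batches, last_batch)
```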

<span style="color:green">Your Task:</span>
* <span style="color:green">For every batch load the data into the network.</span>
* <span style="color:green">Calculate the overall accuracy (ratio of correctly detected images to all images).</span>
* <span style="color:green">Calculate the overall execution time (forward pass) of the network on the CPU as well as on the GPU.</span>
    * <span style="color:green">For GPU calculations you have to load the network as well as the input to the GPU and bring the result back to the CPU for your accuracy calculations.</span>
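One way to structure the accuracy bookkeeping is sketched below with a made-up batch instead of real network output. Calling `.item()` turns the per-batch count into a Python number on the CPU, and summing `targets.numel()` also handles the smaller final batch; `count_correct` is an illustrative helper name:

```python
import torch

def count_correct(logits, targets):
    """Number of rows whose highest score matches the target label."""
    pred_class = logits.argmax(dim=1)             # (N, 10) -> (N,)
    return (pred_class == targets).sum().item()   # Python int, i.e. back on the CPU

# made-up batch of 4 images: the scores are chosen so that 3 of 4 predictions are right
targets = torch.tensor([7, 2, 1, 0])
logits = torch.zeros(4, 10)
logits[torch.arange(4), torch.tensor([7, 2, 1, 9])] = 1.0  # the last prediction is wrong

correct, total = 0, 0
correct += count_correct(logits, targets)  # in the real loop: once per batch
total += targets.numel()
print(correct / total)
```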

In [5]:
import torchvision
import time

test_data = torchvision.datasets.MNIST('.', train=False, download=True, transform=torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),                       # pixels -> floats in [0, 1]
    torchvision.transforms.Normalize((0.1307,), (0.3081,)),  # (x - mean) / std
]))

test_loader = torch.utils.data.DataLoader(test_data, batch_size=64, shuffle=False)
print(f"Number of test images: {len(test_data)}")
print(f"Number of batches: {len(test_loader)}")
inputs, targets = next(iter(test_loader))  # first batch only
print(f"Batch shape: {inputs.size()}")
print(f"Target (Labels): {targets[0:15]}")

Number of test images: 10000
Number of batches: 157
Batch shape: torch.Size([64, 1, 28, 28])
Target (Labels): tensor([7, 2, 1, 0, 4, 1, 4, 9, 5, 9, 0, 6, 9, 0, 1])


In [6]:
for dev in ['cpu','cuda']:
    device = torch.device(dev)
    correct_detected = 0
    # Note: the timer starts before the model is moved and stops after the last batch, so the
    # measured time also includes data loading, host-to-device copies and the accuracy bookkeeping.
    s = time.time()
    net.to(device)
    net.eval()
    for batch_idx, (inputs, targets) in enumerate(test_loader):
        #---to-be-done-by-student---
        targets = targets.to(device)
        inputs = inputs.to(device)
        pred = net(inputs)                  # (batch, 10) class scores
        pred_class = pred.argmax(dim=1)     # predicted digit per image
        correct_detected += (pred_class == targets).count_nonzero()
        #---end---------------------

    accuracy = correct_detected / len(test_data)
    print(f'({dev}) LeNet Accuracy is: {accuracy:.2%}')
    print(f'({dev}) Total time for forward pass: {round(time.time() - s, 4)}s')

(cpu) LeNet Accuracy is: 97.43%
(cpu) Total time for forward pass: 139.6275s
(cuda) LeNet Accuracy is: 97.43%
(cuda) Total time for forward pass: 82.4323s
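In the loop above, `s` is taken before `net.to(device)` and read after the last batch, so the reported times also include reading and normalizing the images in the DataLoader, host-to-device copies and the accuracy bookkeeping, and the forward passes run with autograd enabled. Because CUDA kernels are launched asynchronously, timing only the forward pass on the GPU also requires `torch.cuda.synchronize()`. A minimal sketch, using a hypothetical stand-in model and random batches instead of LeNet and `test_loader`:

```python
import time
import torch
import torch.nn as nn

def time_forward(model, batches, device):
    """Sum of forward-pass times only; data loading and copies are excluded."""
    model.to(device).eval()
    total = 0.0
    with torch.no_grad():                    # no autograd bookkeeping during inference
        for inputs in batches:
            inputs = inputs.to(device)
            if device.type == "cuda":
                torch.cuda.synchronize()     # finish the copy before starting the clock
            start = time.perf_counter()
            model(inputs)
            if device.type == "cuda":
                torch.cuda.synchronize()     # wait for the asynchronous kernels to finish
            total += time.perf_counter() - start
    return total

# hypothetical stand-ins for LeNet and test_loader
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
batches = [torch.rand(64, 1, 28, 28) for _ in range(10)]
for dev in ["cpu"] + (["cuda"] if torch.cuda.is_available() else []):
    t = time_forward(model, batches, torch.device(dev))
    print(f"({dev}) forward-only time: {t:.4f}s")
```

On the GPU the first batches may also include one-time setup costs, so running a few warm-up batches before timing is a common refinement.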
