# Neural Networks and Learning Machines
## Exercise 2 - Training and Validation
This exercise builds on the first one, which aimed to familiarize you with Jupyter notebooks and the PyTorch framework by training your first neural network.

In this exercise, your task is to properly validate the classifier and improve the training procedure to get a more reliable (and justifiable) score.

However, before starting that task, you should complete a more basic exercise with perceptrons to understand how units in artificial neural networks are evaluated and how changes of the parameters can make a neuron detect a specific input pattern.

In this exercise you will need to:

1. Define perceptrons by hand that predict the digits of a 7-segment display.
2. Implement a way to validate the performance of an MNIST classifier.
3. Rewrite the training protocol (the code) to avoid overfitting.
4. Produce a graph of the training and validation performance of the model over the number of training epochs.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import numpy
#from tqdm import tqdm
from tqdm.notebook import tqdm

# To set up, run:
# conda install -c conda-forge ipywidgets tqdm

### 7-segment prediction

A 7-segment display [https://en.wikipedia.org/wiki/Seven-segment_display] can display the different digits by turning its segments (A,B,C,D,E,F,G) on or off. Your task is to design ten different perceptrons that recognize the ten different digits (0,1,2,3, ... ,9) represented by a 7-segment display. The input to each perceptron will be a vector {A,B,C,D,E,F,G} where A is 1 if segment A is turned on and 0 otherwise (and the same for all the other segments).

This means that for each digit you should create a perceptron whose output is larger than 0 for that digit and below 0 for all other digits. You don't need to consider non-digit cases.

![Seven Segment Display](https://upload.wikimedia.org/wikipedia/commons/thumb/e/ed/7_Segment_Display_with_Labeled_Segments.svg/225px-7_Segment_Display_with_Labeled_Segments.svg.png)

For this task it is recommended to use numpy rather than PyTorch.

After completing this exercise you should understand how an artificial neural network unit (like the perceptron) produces one scalar output from multiple input values, and how the parameter values determine the relation between output and input values (amplitudes).
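
As a minimal sketch of that relation, the snippet below computes the output of a single unit as a weighted sum plus a bias, `w.dot(x) + b`. The names `DIGITS`, `make_perceptron` and `activation` are chosen here for illustration and are not part of the exercise, and the segment order (A,B,C,D,E,F,G) is assumed. The template-matching choice of weights and bias is only one possible construction; many other parameter values solve the task.

```python
import numpy as np

# Segment patterns in the assumed order (A, B, C, D, E, F, G) for the digits 0-9
DIGITS = np.array([
    [1, 1, 1, 1, 1, 1, 0],  # 0
    [0, 1, 1, 0, 0, 0, 0],  # 1
    [1, 1, 0, 1, 1, 0, 1],  # 2
    [1, 1, 1, 1, 0, 0, 1],  # 3
    [0, 1, 1, 0, 0, 1, 1],  # 4
    [1, 0, 1, 1, 0, 1, 1],  # 5
    [1, 0, 1, 1, 1, 1, 1],  # 6
    [1, 1, 1, 0, 0, 0, 0],  # 7
    [1, 1, 1, 1, 1, 1, 1],  # 8
    [1, 1, 1, 1, 0, 1, 1],  # 9
])


def make_perceptron(pattern):
    # +1 for segments that should be on, -1 for segments that should be off
    w = 2 * pattern - 1
    # The bias shifts the output so that only an exact match ends up above 0
    b = 0.5 - pattern.sum()
    return w, b


def activation(x, w, b):
    # One scalar output from seven inputs: weighted sum plus bias
    return w.dot(x) + b


# The perceptron built for "3" is positive for "3" and negative for "9"
w3, b3 = make_perceptron(DIGITS[3])
print(activation(DIGITS[3], w3, b3), activation(DIGITS[9], w3, b3))
```

With this construction, an exact match gives an output of 0.5, and every segment that differs from the target pattern lowers the output by 1, so all other digits end up below 0.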

In [None]:
### WRITE CODE FOR 10 DIFFERENT PERCEPTRONS THAT EACH RECOGNIZE A SPECIFIC DIGIT

# Example perceptron
# Note that this perceptron is missing something and can therefore not solve the given task
# Fix that something and then create ten different neurons that each recognize one digit

# The input x is a vector of the 7 segment values; if all 7 segments are 1, the input corresponds to an "8"

def perceptron(x, w):
    # Fire when the weighted sum reaches the threshold of 1.0. The small tolerance guards
    # against floating-point rounding (e.g. six times 1/6 can sum to slightly less than 1.0)
    return 1.0 if w.dot(x) >= 1.0 - 1e-9 else 0.0

    
def result(x):
    return [
        perceptron(x, numpy.array([1/6,1/6,1/6,1/6,1/6,1/6,-1])),
        perceptron(x, numpy.array([-1,1/2,1/2,-1,-1,-1,-1])),
        perceptron(x, numpy.array([1/5,1/5,-1,1/5,1/5,-1,1/5])),
        perceptron(x, numpy.array([1/5,1/5,1/5,1/5,-1,-1,1/5])),
        perceptron(x, numpy.array([-1, 1/4, 1/4, -1, -1, 1/4, 1/4])),
        perceptron(x, numpy.array([1/5,-1,1/5,1/5,-1,1/5,1/5])),
        perceptron(x, numpy.array([1/6,-1,1/6,1/6,1/6,1/6,1/6])),
        perceptron(x, numpy.array([1/3,1/3,1/3,-1,-1,-1,-1])),
        perceptron(x, numpy.array([1/7,1/7,1/7,1/7,1/7,1/7,1/7])),
        perceptron(x, numpy.array([1/6,1/6,1/6,1/6,-1,1/6,1/6]))
    ]

assert result(numpy.array([1,1,1,1,1,1,0])) == [1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]
assert result(numpy.array([0,1,1,0,0,0,0])) == [0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]
assert result(numpy.array([1,1,0,1,1,0,1])) == [0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]
assert result(numpy.array([1,1,1,1,0,0,1])) == [0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0]
assert result(numpy.array([0,1,1,0,0,1,1])) == [0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0]
assert result(numpy.array([1,0,1,1,0,1,1])) == [0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0]
assert result(numpy.array([1,0,1,1,1,1,1])) == [0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0]
assert result(numpy.array([1,1,1,0,0,0,0])) == [0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0]
assert result(numpy.array([1,1,1,1,1,1,1])) == [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0]
assert result(numpy.array([1,1,1,1,0,1,1])) == [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0]

### MNIST dataset preparation

In [None]:
# Number of samples in each mini-batch
batch_size = 1000

# Create a loader for the training set
mnist_train = datasets.MNIST('./', train=True, download=True, transform=transforms.ToTensor())
train_loader = DataLoader(mnist_train, batch_size=batch_size, shuffle=False)

# Create a loader for the validation set
mnist_validation = datasets.MNIST('./', train=False, download=True, transform=transforms.ToTensor())
validation_loader = DataLoader(mnist_validation, batch_size=batch_size, shuffle=False)


def plot_digit(data):
    # Transform the image into an appropriate shape for displaying
    data = data.view(28,28)
    plt.imshow(data, cmap='gray')
    plt.show()
    
images, labels = next(iter(train_loader))
plot_digit(images[0])

### The Network

Like in the previous exercise you'll need to implement an artificial neural network, select an optimizer, and select a loss function that results in low validation and training errors.

The units of the resulting artificial neural network function much like the perceptrons that you designed in the first part of this exercise. The differences are that other activation functions are used, that many units are connected, and that there are many more parameters, which are optimized numerically.
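
To make the connection to the perceptrons concrete, here is a small numpy sketch of one layer of units: each row of the weight matrix plays the role of one perceptron's weight vector, and ReLU replaces the hard threshold. The names `relu` and `dense_layer` and the toy values are assumptions made for illustration; in PyTorch the same computation corresponds to `nn.Linear` followed by `nn.ReLU`.

```python
import numpy as np


def relu(z):
    # ReLU activation: negative values are clipped to zero
    return np.maximum(z, 0.0)


def dense_layer(x, W, b):
    # Each row of W holds the weights of one unit; b holds one bias per unit
    return relu(W @ x + b)


# Hypothetical toy layer with 3 inputs and 2 units
W = np.array([[1.0, -1.0, 0.5],
              [0.0, 2.0, -1.0]])
b = np.array([-0.5, -1.0])
x = np.array([1.0, 2.0, 2.0])

print(dense_layer(x, W, b))  # prints [0. 1.]
```

The first unit has a weighted sum of -0.5, which ReLU clips to 0, while the second unit outputs 1.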

In [None]:
# This code initializes the neural network
network = nn.Sequential(
    nn.Linear(784, 784),
    nn.ReLU(),
    nn.Linear(784, 784),
    nn.ReLU(),
    nn.Linear(784, 784),
    nn.ReLU(),
    nn.Linear(784, 10)
)

# Initialize the optimizer
optimizer = optim.SGD(network.parameters(), lr=0.5)

# Initialize the loss function
loss_function = nn.MSELoss()

# An Embedding layer used for turning int into one-hot (0 -> [1,0,0,0,0,0,0,0,0,0], 5 -> [0,0,0,0,0,1,0,0,0,0])
to_onehot = nn.Embedding(10, 10) 
to_onehot.weight.data = torch.eye(10)

### Training the network
From here on, you will need to edit the code in order to complete the exercise.
You have been provided with a simple training code and no validation code at all. The goal of the exercise is to implement a validation procedure to evaluate how well the network learns to generalize, plot the performance of the network on the validation set over a number of training epochs, and then to improve the training code to minimize overfitting.
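
One common way to limit overfitting is early stopping: training ends once the validation score has not improved for a number of epochs. The exercise does not prescribe this technique, so treat the following as a hedged sketch. The class `EarlyStopping`, its `patience` parameter, and the example scores are all hypothetical and made up for illustration.

```python
class EarlyStopping:
    """Tracks the best validation score and signals when to stop training."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_score = None
        self.best_epoch = None
        self.epochs_without_improvement = 0

    def step(self, epoch, score):
        """Record a new validation score; return True if training should stop."""
        if self.best_score is None or score > self.best_score:
            self.best_score = score
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Usage with made-up validation scores (not real results)
scores = [0.80, 0.85, 0.88, 0.87, 0.88, 0.86, 0.87]
stopper = EarlyStopping(patience=3)
for epoch, score in enumerate(scores):
    if stopper.step(epoch, score):
        break

print(stopper.best_epoch, stopper.best_score, epoch)  # prints 2 0.88 5
```

In a real training loop, `score` would be the validation score computed after each epoch, and the parameters from the best epoch could be saved so they can be restored after stopping.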

In [None]:
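# Validation function: returns the fraction of correctly classified samples
def validate(iterable):
    nb = 0
    success = 0
    # Gradients are not needed for evaluation, so disable gradient tracking
    with torch.no_grad():
        for images, labels in iterable:
            images = images.view(-1,784)
            prediction = network(images)
            # The predicted class is the index of the largest output
            guess = torch.argmax(prediction, dim=-1)
            nb += labels.size(0)
            success += (guess == labels).sum().item()
    return success/nb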

In [None]:
# Decide the number of epochs to train for (one epoch is one complete run-through of the data)
epochs = 80


# Create a list to keep track of how the loss changes
losses = []
training_score = []
validation_score = []

# For each epoch

pbar_epoch = tqdm(range(epochs))
pbar_epoch.set_description("Epoch 0 : tscore=? & vscore=?")
pbar_batch = None

for epoch in pbar_epoch:
    # A variable for containing the sum of all batch losses for this epoch
    epoch_loss = 0
    
    # For each batch
    if(pbar_batch):
        pbar_batch.reset()
    else:
        pbar_batch = tqdm(total=len(train_loader))
        
    for batch_nr, (images, labels) in enumerate(train_loader):
        # Extract the labels and turn them into one-hot representation (note: not all loss functions need this)
        labels = to_onehot(labels)
        # Reshape the images to a single vector (28*28 = 784)
        images = images.view(-1,784)
        # Predict which class each digit in the batch belongs to
        prediction = network(images)
        # Calculate the loss of the prediction by comparing to the expected output
        loss = loss_function(prediction, labels)
        # Backpropagate the loss through the network to find the gradients of all parameters
        loss.backward()
        # Update the parameters along their gradients
        optimizer.step()
        # Clear stored gradient values
        optimizer.zero_grad()
        # Add the loss to the total epoch loss (item() turns a PyTorch scalar into a normal Python datatype)
        epoch_loss += loss.item()
        # Update the batch progress bar with the current loss
        pbar_batch.update()
        pbar_batch.set_description("Batch : loss {:1.5f}".format(loss.item()))
    
    #Append the epoch loss to the list of losses
    losses.append(epoch_loss)
    training_score.append(validate(train_loader))
    validation_score.append(validate(validation_loader))
    pbar_epoch.set_description("Epoch {} : tscore={:1.4f} & vscore={:1.4f}".format(epoch, training_score[-1],validation_score[-1]))

# Plot the training loss per epoch
plt.plot(range(1,epochs+1),losses)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.show()

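# Plot the validation (green) and training (red) scores per epoch
plt.plot(range(1,epochs+1),validation_score, 'g', label='Validation')
plt.plot(range(1,epochs+1),training_score, 'r', label='Training')
plt.xlabel('Epochs')
plt.ylabel('Score')
plt.legend()
plt.show()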

# Save score
import pickle
saveString = pickle.dumps((validation_score,training_score))
        

In [None]:
# Score saved 
import pickle

# Score 1, epoch 50, 1 HD and 0.5 as lossfunction
validation_score, training_score = pickle.loads((b'\x80\x03]q\x00(G?\xe9\xe0\r\x1bqu\x8eG?\xeb\\(\xf5\xc2\x8f\\G?\xec\x1f\xf2\xe4\x8e\x8arG?\xec\xb9#\xa2\x9cw\x9aG?\xed\x11\x9c\xe0u\xf6\xfdG?\xedP\xb0\xf2{\xb2\xffG?\xedo\x00h\xdb\x8b\xacG?\xed\x97$tS\x8e\xf3G?\xed\xc1\xbd\xa5\x11\x9c\xe0G?\xed\xd2\xf1\xa9\xfb\xe7mG?\xed\xef\x9d\xb2-\x0eVG?\xee\x0bx\x03F\xdc]G?\xee\x18\x93t\xbcj\x7fG?\xee\'RT`\xaaeG?\xee8\x86YJ\xf4\xf1G?\xeeFs\x81\xd7\xdb\xf5G?\xeeQ\xeb\x85\x1e\xb8RG?\xee_\xd8\xad\xab\x9fVG?\xeekP\xb0\xf2{\xb3G?\xeeu%F\n\xa6LG?\xee}Vl\xf4\x1f!G?\xee\x86YJ\xf4\xf0\xd8G?\xee\x8f\\(\xf5\xc2\x8fG?\xee\x9b\xa5\xe3S\xf7\xcfG?\xee\xa3\xd7\n=p\xa4G?\xee\xaad\xc2\xf87\xb5G?\xee\xafO\r\x84M\x01G?\xee\xb49X\x10bNG?\xee\xb8Q\xeb\x85\x1e\xb8G?\xee\xbcj~\xf9\xdb#G?\xee\xbd<6\x114\x05G?\xee\xc3\xc9\xee\xcb\xfb\x16G?\xee\xc7\xe2\x82@\xb7\x80G?\xee\xcep:\xfb~\x91G?\xee\xd0\xe5`A\x897G?\xee\xd1\xb7\x17X\xe2\x19G?\xee\xd8D\xd0\x13\xa9*G?\xee\xdc]c\x88e\x95G?\xee\xde\xd2\x88\xcep;G?\xee\xe4\x8e\x8aq\xdejG?\xee\xe7\xd5f\xcfA\xf2G?\xee\xe9x\xd4\xfd\xf3\xb6G?\xee\xeb\xed\xfaC\xfe]G?\xee\xeec\x1f\x8a\t\x03G?\xee\xef4\xd6\xa1a\xe5G?\xee\xf1\xa9\xfb\xe7l\x8bG?\xee\xf4\x1f!-w2G?\xee\xf7e\xfd\x8a\xda\xbaG?\xee\xf9\xdb"\xd0\xe5`G?\xee\xfd\xf3\xb6E\xa1\xcbe]q\x01(G?\xe9\x85A\xac+%\x00G?\xeb#\\\xb4\xc5\'uG?\xeb\xf2\x12\xd7s\x18\xfcG?\xecxl"h\t\xd5G?\xec\xd37\x91\xaeZcG?\xed\x18\xd9\\n\xdduG?\xedN\xc7\x9c\x9a\x8eEG?\xed}\xe2<Y\x05\rG?\xed\xa2\xbfks\xa4\xccG?\xed\xc4\xe1\x8d\x95\xc6\xeeG?\xed\xdc\xc6?\x14\x12\x06G?\xed\xf6+j\xe7\xd5gG?\xee\x0cI\xba^5?G?\xee\x1d}\xbfH\x7f\xccG?\xee,\xc8nQ\xa5\x9dG?\xee=\xb6\x8b\x89}3G?\xeeL/\x83{J#G?\xeeY\xd6\xc4U\xbe1G?\xeei!s^\xe4\x03G?\xeeu\x02R1l\xd1G?\xee\x80\x11y\xec\x9c\xbeG?\xee\x89\xe6\x0f\x04\xc7WG?\xee\x91\xaeZb\x93\xbbG?\xee\x99\x99\x99\x99\x99\x9aG?\xee\xa3n.\xb1\xc43G?\xee\xaa\xcd\x9e\x83\xe4&G?\xee\xb49X\x10bNG?\xee\xbbR\xe00\x0fKG?\xee\xc3\xc9\xee\xcb\xfb\x16G?\xee\xca4\xb3\xad\x88\xabG?\xee\xd0|\x84\xb5\xdc\xc6G?\xee\xd7-1I\xddRG?\xee\xdd\xba\xea\x04
\xa4cG?\xee\xe3S\xf7\xce\xd9\x17G?\xee\xe8\x84*\raYG?\xee\xec\xbf\xb1[W?G?\xee\xf0\xfb8\xa9M$G?\xee\xf5\x13\xcc\x1e\t\x8fG?\xee\xf8\xc3\x84\x07\x19\x88G?\xee\xfdg\xe6\xe0\xbb\xdfG?\xef\x01]\x86|>\xceG?\xef\x05S&\x17\xc1\xbeG?\xef\t\xd4\x95\x18*\x99G?\xef\x0cl\xae7n\xbbG?\xef\x0f\x90\x96\xbb\x98\xc8G?\xef\x13\xa9*0U2G?\xef\x18p\x80\xe31\x04G?\xef\x1b\x94ig[\x11G?\xef \xa1\xa7\xcc\xa9\xd9G?\xef$\x97Gh,\xc8e\x86q\x02.'))

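# Take the x-axis from the loaded scores, since their length may differ from the current value of epochs
saved_epochs = range(1, len(validation_score) + 1)
plt.plot(saved_epochs, validation_score, 'g', label='Validation')
plt.plot(saved_epochs, training_score, 'r', label='Training')
plt.xlabel('Epochs')
plt.ylabel('Score')
plt.legend()
plt.show()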