# Homework 2

As for homework 1, try to write your solution in a Jupyter Notebook trying to explain your choices using comments or (preferably) the markdown cells. For this homework, Python scripts will be accepted as well.


Reconstruct in PyTorch the first experiment in [Learning representations by back-propagating errors](https://www.nature.com/articles/323533a0) with learning rule in eq.8 (gradient descent without momentum) ([alternative link to paper](https://www.iro.umontreal.ca/~vincentp/ift3395/lectures/backprop_old.pdf)).

Try to be as close as possible to the original protocol, except for what regards the learning rule, and perhaps the random initialization method

1. Read the paper (don’t worry if you don’t understand the other experiments in detail, because our focus is on the first one)

2. Create the data, the model and everything is needed (do not use dataloaders if you don’t know yet how they work)

3. Train the model

4. Inspect the weights you obtained and check if they provide a solution to the problem

Compare the solution to the solution reported in the paper



## Point 2

We will reproduce the experiment in Fig. 1 of the paper.

We want to detect myrror symmetry in the input vectors. Since we have 6 nodes in the inuts units our vectors will have 6 elements in each vector, which will be $1$ or $0$. We will have $64$ possible input vectors.

Since we have two hidden units, we will have one layer with two nodes, both having the bias.

In [1]:
import torch

# Dataset handling
from torch.utils.data import TensorDataset
from torch.utils.data import DataLoader

# Permutation building
from itertools import product

We will generate all the 64 possible vectors long 6 of $0$ or $1$ digits.

In [2]:
X = torch.Tensor([item for item in product([0, 1], repeat=6)])

In [3]:
leng = 64

In [4]:
flag = 1
y = torch.zeros((leng, 1))
for j in range(leng):
    for i in range(3):
        if X[j][i] != X[j][5-i]:
            flag = 0
    if flag == 1:
        y[j] = 1
        print(X[j])
    flag = 1

y.T

tensor([0., 0., 0., 0., 0., 0.])
tensor([0., 0., 1., 1., 0., 0.])
tensor([0., 1., 0., 0., 1., 0.])
tensor([0., 1., 1., 1., 1., 0.])
tensor([1., 0., 0., 0., 0., 1.])
tensor([1., 0., 1., 1., 0., 1.])
tensor([1., 1., 0., 0., 1., 1.])
tensor([1., 1., 1., 1., 1., 1.])


tensor([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
         1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 1., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0.,
         0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])

In [5]:
train = TensorDataset(X, y)
trainloader = DataLoader(train, batch_size=leng, shuffle=False)

In [6]:
epsilon = 0.1  # learning rate

In [7]:
class MLP(torch.nn.Module):
    
    def __init__(self):
        super().__init__()
        self.layer1 = torch.nn.Linear(in_features =  6, out_features = 2, bias = True)
        self.layer2 = torch.nn.Linear(in_features =  2, out_features = 1, bias = True)
        
    def forward(self, X):
        out = self.layer1(X)
        out = torch.sigmoid(out)
        out = self.layer2(out)
        out = torch.sigmoid(out)
        return out

## Point 3

In [8]:
num_epochs = 1000

In [9]:
model = MLP()
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=epsilon)

In [10]:
class AverageMeter(object):
    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

In [11]:
def accuracy(y_hat, y):
    '''
    y_hat is the model output - a Tensor of shape (n x num_classes)
    y is the ground truth

    How can we implement this function?
    '''
    classes_prediction = y_hat.argmax(dim=1)
    match_ground_truth = classes_prediction == y # -> tensor of booleans
    correct_matches = match_ground_truth.sum()
    return (correct_matches / y_hat.shape[0]).item()

In [12]:
def train_epoch(model, dataloader, loss_fn, optimizer, loss_meter, accuracy_meter):
    for X, y in dataloader:
        # 1. reset the gradients previously accumulated by the optimizer
        #    this will avoid re-using gradients from previous loops
        optimizer.zero_grad()
        # 2. get the predictions from the current state of the model
        #    this is the forward pass
        y_hat = model(X)
        # 3. calculate the loss on the current mini-batch
        loss = loss_fn(y_hat, y)
        # 4. execute the backward pass given the current loss
        loss.backward()
        # 5. update the value of the params
        optimizer.step()
        # 6. calculate the accuracy for this mini-batch
        acc = accuracy(y_hat, y)
        # 7. update the loss and accuracy AverageMeter
        loss_meter.update(val=loss.item(), n=X.shape[0])
        accuracy_meter.update(val=acc, n=X.shape[0])

def train_model(model, dataloader, loss_fn, optimizer, num_epochs):
    model.train()
    for epoch in range(num_epochs):
        loss_meter = AverageMeter()
        accuracy_meter = AverageMeter()
        train_epoch(model, dataloader, loss_fn, optimizer, loss_meter, accuracy_meter)
        # now with loss meter we can print both the cumulative value and the average value
        print(f"Epoch {epoch+1} completed. Loss - total: {loss_meter.sum} - average: {loss_meter.avg}; Accuracy: {accuracy_meter.avg}")
    # we also return the stats for the final epoch of training
    return loss_meter.sum, accuracy_meter.avg

In [13]:
loss, acc = train_model(model, trainloader, loss_fn, optimizer, num_epochs)
print(f"Training completed - final accuracy {acc} and loss {loss}")

605957 - average: 0.10939176380634308; Accuracy: 56.0
Epoch 802 completed. Loss - total: 7.001069068908691 - average: 0.1093917042016983; Accuracy: 56.0
Epoch 803 completed. Loss - total: 7.001065731048584 - average: 0.10939165204763412; Accuracy: 56.0
Epoch 804 completed. Loss - total: 7.001063346862793 - average: 0.10939161479473114; Accuracy: 56.0
Epoch 805 completed. Loss - total: 7.001059055328369 - average: 0.10939154773950577; Accuracy: 56.0
Epoch 806 completed. Loss - total: 7.001055717468262 - average: 0.10939149558544159; Accuracy: 56.0
Epoch 807 completed. Loss - total: 7.0010528564453125 - average: 0.10939145088195801; Accuracy: 56.0
Epoch 808 completed. Loss - total: 7.001049518585205 - average: 0.10939139872789383; Accuracy: 56.0
Epoch 809 completed. Loss - total: 7.001046180725098 - average: 0.10939134657382965; Accuracy: 56.0
Epoch 810 completed. Loss - total: 7.00104284286499 - average: 0.10939129441976547; Accuracy: 56.0
Epoch 811 completed. Loss - total: 7.0010395050

## Point 4

In [14]:
model.state_dict()

OrderedDict([('layer1.weight',
              tensor([[ 0.3047,  0.2600, -0.0881, -0.2216, -0.3213,  0.4201],
                      [-0.3349,  0.1256, -0.3005,  0.3302,  0.4106,  0.1697]])),
             ('layer1.bias', tensor([ 0.1375, -0.2310])),
             ('layer2.weight', tensor([[-0.5246, -0.3130]])),
             ('layer2.bias', tensor([-1.4803]))])