## A.I. Assignment 5

## Learning Goals

By the end of this lab, you should be able to:
* Get more familiar with tensors in pytorch 
* Create a simple multilayer perceptron model with pytorch
* Visualise the parameters


### Task

Build a fully connected feed forward network that adds two bits. Determine the a propper achitecture for this network (what database you use for this problem? how many layers? how many neurons on each layer? what is the activation function? what is the loss function? etc)

Create at least 3 such networks and compare their performance (how accurate they are?, how farst they are trained to get at 1 accuracy?)

Display for the best one the weights for each layer.


In [100]:
import torch
import torch.nn as nn
from collections import OrderedDict
from datetime import datetime

In [101]:
# your code here
model1 = nn.Sequential(OrderedDict([
    ('hidden', nn.Linear(2, 32)),
    ('hidden_activation', nn.ReLU()),
    ('output', nn.Linear(32, 2)),
    ('output_activation', nn.Sigmoid())
]))

model2 = nn.Sequential(OrderedDict([
    ('hidden', nn.Linear(2,128)),
    ('hidden_act', nn.ReLU()),
    ('output', nn.Linear(128,2)),
    ('output_act', nn.Sigmoid())
]))
model3 = nn.Sequential(OrderedDict([
    ('hidden', nn.Linear(2,16)),
    ('hidden_activation', nn.Sigmoid()),
    ('output', nn.Linear(16,2)),
    ('hidden_activation', nn.Sigmoid())
]))
# The database will contain 1 and 0 because we must add two bits
# We are going to have 3 main layers: input layer, hidden layers, and output layer

# MODEL 1: Defines a feedforward neural network with: 2 neurons on input layer, 32 on the hidden layer, and 2 on the output layer
# MODEL 2&3: Defines a feedforward neural network with: 2 neurons on input layer, 8 and 16 on the hidden layer, 
# and 2 on the output layer


In [102]:
print(model1)
print(model2)
print(model3)

Sequential(
  (hidden): Linear(in_features=2, out_features=32, bias=True)
  (hidden_activation): ReLU()
  (output): Linear(in_features=32, out_features=2, bias=True)
  (output_activation): Sigmoid()
)
Sequential(
  (hidden_net): Linear(in_features=2, out_features=128, bias=True)
  (hidden_act): ReLU()
  (output_net): Linear(in_features=128, out_features=2, bias=True)
  (output_act): Sigmoid()
)
Sequential(
  (hidden_net): Linear(in_features=2, out_features=16, bias=True)
  (hidden_act): Sigmoid()
  (output_net): Linear(in_features=16, out_features=2, bias=True)
  (output_act): Sigmoid()
)


In [103]:
# In data_in we have all the possible combinations of 2 bits 1 and 0 
data_in = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)

print(data_in)

tensor([[0., 0.],
        [0., 1.],
        [1., 0.],
        [1., 1.]])


In [104]:
# contains all the possible results of adding 2 bits
data_target = torch.tensor([[0,0], [0,1], [0,1], [1,0]], dtype=torch.float32)
print(data_target)

tensor([[0., 0.],
        [0., 1.],
        [0., 1.],
        [1., 0.]])


In [105]:
# The loss function for each model will be the MSELoss function (Mean Squared Error).
# MSELoss() is a torch method that creates a criterion that measures the mean squared error between each element in the 
# input x and the target y
# L1Loss() aka Absolute Error Loss, is the absolute difference prediction and the actual value, calculated for each ex in a
# dataset

criterion1 = nn.BCEWithLogitsLoss()
criterion2 = nn.MSELoss()
criterion3 = nn.L1Loss()
optimizer1 = torch.optim.SGD(model1.parameters(), lr=0.01)
optimizer2 = torch.optim.SGD(model2.parameters(), lr=0.01)
optimizer3 = torch.optim.SGD(model3.parameters(), lr=0.01)
models = [model1, model2, model3]
criterions = [criterion1,criterion2,criterion3]
optimizers = [optimizer1,optimizer2,optimizer3]

In [106]:
def train(model, inputs,outputs, criterion, optimizer):
    for epoch in range(1000):
        optimizer.zero_grad()
        output = model(inputs)
        loss = criterion(output, outputs)
        loss.backward()
        optimizer.step()
        if (epoch + 1) % 100 == 0:
            print(f"Epoch [{epoch+1}/1000], Loss: {loss.item():.4f}")

In [107]:
# In the next part, we train each model
accuracies = []

for i in range(3):
    print("Model", i+1, ": ")
    start_time = datetime.now()
    train(models[i],data_in, data_target, criterions[i], optimizers[i])
    
    # next, we compute the accuracy
    outputs = models[i](data_in)
    predicted = (outputs >= 0.5).float()
    
    accuracy = (predicted == data_target).float().mean()
    end_time = datetime.now()
    start_time = datetime.strptime(start_time.strftime("%H:%M:%S:%f"),"%H:%M:%S:%f") 
    end_time = datetime.strptime(end_time.strftime("%H:%M:%S:%f"),"%H:%M:%S:%f")                                                                           
    
    if accuracy == 1:
        print("The time for model ", i+1, "to get accuracy 1 was", end_time - start_time)
        
    accuracies.append(accuracy)
    


Model 1 : 
Epoch [100/1000], Loss: 0.7814
Epoch [200/1000], Loss: 0.7697
Epoch [300/1000], Loss: 0.7596
Epoch [400/1000], Loss: 0.7510
Epoch [500/1000], Loss: 0.7438
Epoch [600/1000], Loss: 0.7378
Epoch [700/1000], Loss: 0.7327
Epoch [800/1000], Loss: 0.7285
Epoch [900/1000], Loss: 0.7249
Epoch [1000/1000], Loss: 0.7218
Model 2 : 
Epoch [100/1000], Loss: 0.2183
Epoch [200/1000], Loss: 0.2036
Epoch [300/1000], Loss: 0.1915
Epoch [400/1000], Loss: 0.1806
Epoch [500/1000], Loss: 0.1708
Epoch [600/1000], Loss: 0.1619
Epoch [700/1000], Loss: 0.1537
Epoch [800/1000], Loss: 0.1462
Epoch [900/1000], Loss: 0.1394
Epoch [1000/1000], Loss: 0.1329
The time for model  2 to get accuracy 1 was 0:00:00.608999
Model 3 : 
Epoch [100/1000], Loss: 0.4841
Epoch [200/1000], Loss: 0.4659
Epoch [300/1000], Loss: 0.4504
Epoch [400/1000], Loss: 0.4378
Epoch [500/1000], Loss: 0.4278
Epoch [600/1000], Loss: 0.4199
Epoch [700/1000], Loss: 0.4138
Epoch [800/1000], Loss: 0.4088
Epoch [900/1000], Loss: 0.4049
Epoch [

In [108]:
for accuracy in accuracies:
    print("Accuracy: {:.2f}%".format(accuracy.item()*100))

Accuracy: 62.50%
Accuracy: 100.00%
Accuracy: 62.50%


In [109]:
# Now we display the weights for the best model
best_model = models[accuracies.index(max(accuracies))]
print("The weights of the best model are: ", best_model[0].weight)

The weights of the best model are:  Parameter containing:
tensor([[ 0.5160,  0.0646],
        [-0.4942,  0.4318],
        [-0.6409,  0.0192],
        [ 0.1635, -0.5800],
        [-0.4655,  0.4045],
        [-0.1862, -0.4563],
        [-0.2012, -0.3554],
        [-0.0696, -0.4528],
        [-0.7029,  0.5946],
        [-0.2251,  0.3844],
        [-0.0258, -0.0346],
        [-0.1491,  0.6518],
        [ 0.4032, -0.2926],
        [ 0.4797,  0.5426],
        [ 0.4292, -0.3435],
        [-0.0076, -0.5901],
        [ 0.2310, -0.4164],
        [ 0.3410, -0.3926],
        [-0.4776, -0.1955],
        [ 0.6263,  0.6529],
        [ 0.4816, -0.5375],
        [ 0.6405,  0.5761],
        [ 0.3820,  0.1943],
        [-0.1223, -0.4320],
        [ 0.2894,  0.3067],
        [ 0.5735, -0.0145],
        [ 0.0650,  0.4184],
        [ 0.7086, -0.4989],
        [-0.3272,  0.4257],
        [ 0.4282, -0.6286],
        [-0.6567, -0.1736],
        [ 0.2152,  0.3790],
        [ 0.1376, -0.2591],
        [-0.2453, 