## A.I. Assignment 5

## Learning Goals

By the end of this lab, you should be able to:
* Get more familiar with tensors in pytorch 
* Create a simple multilayer perceptron model with pytorch
* Visualise the parameters


### Task

Build a fully connected feed forward network that adds two bits. Determine the a propper achitecture for this network (what database you use for this problem? how many layers? how many neurons on each layer? what is the activation function? what is the loss function? etc)

Create at least 3 such networks and compare their performance (how accurate they are?, how farst they are trained to get at 1 accuracy?)

Display for the best one the weights for each layer.


In [5]:
import torch
import torch.nn as nn
from collections import OrderedDict


In [52]:
# your code here
# For this problem of adding two bits, we can treat it as a binary classification problem where the input consists of two bits
# (binary digits) and the output is the sum of these two bits. We'll create three fully connected feedforward neural networks 
# and compare their performance.

# Hidden Layer:The hidden layer(s) of a neural network are intermediary layers between the input and output layers.
# Each neuron in the hidden layer receives input from the previous layer, applies a transformation (determined by its weights and
# activation function), and passes the result to the next layer.
# The hidden layer is responsible for learning and extracting features from the input data, transforming it into a form that is useful for making predictions.
# In the provided architecture, there is one hidden layer with 32 neurons. This means there are 32 units or nodes in this hidden layer, each performing its own weighted sum of inputs and applying an activation function to produce an output.
# Output Layer:The output layer is the final layer of the neural network, responsible for producing the network's output.

# Each neuron in the output layer typically represents a different class or prediction value.
# The number of neurons in the output layer depends on the nature of the problem. For example, in a binary classification problem like the one described (adding two bits), there are two possible outputs (0 or 1), so the output layer has 2 neurons.
# The output layer applies its own transformation to the information it receives from the previous layer(s), often using a different activation function than the hidden layers.
# In the provided architecture, the output layer has 2 neurons, each producing an output corresponding to one of the possible binary sums of the two input bits.
# In summary, the main difference between the hidden layer and the output layer lies in their roles within the neural network. The hidden layer processes and transforms the input data to extract useful features, while the output layer produces the final prediction or classification based on the processed information from the hidden layers.
model1 = nn.Sequential(OrderedDict([
    ('hidden_net', nn.Linear(2,32)), #2(input layer)-(32-32)hidden-2(output layer)-search pic of neural network
    ('hidden_act', nn.Sigmoid()),  
    ('output_net', nn.Linear(32,2)),
    ('output_act', nn.Sigmoid())  
]))
model2 = nn.Sequential(OrderedDict([
    ('hidden_net', nn.Linear(2,8)),
    ('hidden_act', nn.ReLU()),
    ('output_net', nn.Linear(8,2)),
    ('output_act', nn.Sigmoid())
]))
model3 = nn.Sequential(OrderedDict([
    ('hidden_net', nn.Linear(2,16)),
    ('hidden_act', nn.Sigmoid()),
    ('output_net', nn.Linear(16,2)),
    ('output_act', nn.Sigmoid())
]))

In [7]:
print(model1)
print(model2)
print(model3)

Sequential(
  (hidden_net): Linear(in_features=2, out_features=32, bias=True)
  (hidden_act): Sigmoid()
  (output_net): Linear(in_features=32, out_features=2, bias=True)
  (output_act): Sigmoid()
)
Sequential(
  (hidden_net): Linear(in_features=2, out_features=8, bias=True)
  (hidden_act): ReLU()
  (output_net): Linear(in_features=8, out_features=2, bias=True)
  (output_act): Sigmoid()
)
Sequential(
  (hidden_net): Linear(in_features=2, out_features=16, bias=True)
  (hidden_act): Sigmoid()
  (output_net): Linear(in_features=16, out_features=2, bias=True)
  (output_act): Sigmoid()
)


In [8]:
# your code here
data_in = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float) #add them, we get 0, 1, 1, 2
print(data_in)

tensor([[0., 0.],
        [0., 1.],
        [1., 0.],
        [1., 1.]])


In [9]:
# your code here
data_target = torch.tensor([[0, 0], [0, 1], [0, 1], [1, 0]], dtype=torch.float) #this is 0,1,1,2 in base 2 (1 0)=2=1+1 in base 10
print(data_target)

tensor([[0., 0.],
        [0., 1.],
        [0., 1.],
        [1., 0.]])


In [10]:
# your code here
criterion1 = nn.MSELoss() #mean squared error loss function
# MSE loss measures the mean squared difference between the predicted values and the target values. It's commonly used for 
#regression problems.
optimizer1 = torch.optim.Adam(model1.parameters(), lr=0.01)
#Adam is an optimization algorithm that adapts the learning rate for each parameter individually.
#Here, model1.parameters() provides the parameters (weights and biases) of the neural network model to be optimized, and
#lr=0.01 sets the learning rate for the optimizer to 0.01.-In traditional optimization algorithms like stochastic gradient 
# descent (SGD), a single learning rate is applied to update all parameters of the model. However, in many cases, different
# parameters might require different learning rates for effective training. Some parameters may need larger updates, while 
# others may need smaller updates. it starts from lr=0.01 and it can change it on each param
criterion2 = nn.CrossEntropyLoss()
optimizer2 = torch.optim.SGD(model2.parameters(), lr=0.01, momentum=0.9)
#stochastic gradient descent
criterion3 = nn.L1Loss()
optimizer3 = torch.optim.SGD(model3.parameters(), lr=0.01, momentum=0.9)

In [53]:
# your code here
# Train the model
def train(model, inputs, outputs, criterion, optimizer):
    for epoch in range(10000):
       
        optimizer.zero_grad()
        loss = criterion(model(inputs), outputs)
        loss.backward()
        optimizer.step()
        outputs = model(data_in)
        predicted = (outputs >=0.5).float()
        accuracy = (predicted == data_target).float().mean()
        if accuracy == 1:
            print(f'Training Accuracy: {accuracy.item()*100} in {epoch+1} epochs, loss: {loss}')
            break

In [54]:
# your code here
for model in [model1, model2, model3]:
    if model == model1:
        criterion = criterion1
        optimizer = optimizer1
    if model == model2:
        criterion = criterion2
        optimizer = optimizer2
    if model == model3:
        criterion = criterion3
        optimizer = optimizer3
    train(model, data_in, data_target, criterion, optimizer)
    outputs = model(data_in)
    predicted = (outputs >=0.5).float()
    print(predicted)
    accuracy = (predicted == data_target).float().mean()
    print(f'Training Accuracy: {accuracy.item()*100}')
# model(data_in): This syntax calls the model as if it were a function, passing the input data data_in to it. In PyTorch, this
# invokes the forward method of the model, which performs the forward pass computation through the neural network layers.
# The input data data_in propagates through the network, and the model generates predictions (output activations) for each input
# sample.

# outputs: This variable stores the output predictions generated by the neural network model for the input data data_in. The 
# outputs tensor typically contains real-valued numbers representing probabilities or scores for different classes or 
# categories. In the provided context, it likely contains the model's predictions for the binary sums of the input pairs
# of bits.
# predicted == data_target-This comparison operation compares each element of the predicted tensor (which contains the binary predictions generated by 
# the model) with the corresponding element of the data_target tensor (which contains the ground truth or true labels).
# This operation results in a new boolean tensor where each element is True if the prediction matches the target label and False
# otherwise.

# .float(): This method converts the boolean tensor obtained from the previous step into a float tensor. This conversion is
# necessary to perform arithmetic operations with these values.

# .mean(): This method calculates the mean (average) of all the elements in the tensor. Since the boolean values (True and False)
# are treated as 1.0 and 0.0 respectively after the conversion to float, taking the mean effectively computes the accuracy of the
# predictions.

# Specifically, the mean operation calculates the fraction of correct predictions out of the total number of predictions made. 
# If the predictions match the targets, the corresponding elements in the boolean tensor are 1.0, contributing to the numerator
# of the mean calculation. Otherwise, they are 0.0, contributing to the denominator.

tensor([[0., 1.],
        [0., 1.],
        [0., 1.],
        [0., 1.]])
Training Accuracy: 62.5
tensor([[0., 1.],
        [0., 1.],
        [0., 1.],
        [0., 1.]])
Training Accuracy: 62.5
tensor([[1., 0.],
        [1., 0.],
        [1., 0.],
        [1., 0.]])
Training Accuracy: 37.5


In [31]:
# your code here
for name, param in model1.named_parameters():
    print(name, param.data)

hidden_net.weight tensor([[-3.8360, -4.1430],
        [-3.7472, -3.7722],
        [-4.8880, -4.9714],
        [-1.9762, -2.6197],
        [-1.4855,  3.0303],
        [ 3.8673,  3.6824],
        [-2.4565, -2.5139],
        [ 3.7762,  3.5859],
        [-3.9446, -3.7539],
        [ 2.5841,  2.1582],
        [ 3.8317, -4.7734],
        [-1.0257, -0.7952],
        [-4.8148, -4.8131],
        [-4.7856,  4.0958],
        [-4.5492,  5.5350],
        [-3.4252, -3.2786],
        [-3.8055, -4.0219],
        [-3.3417, -3.3775],
        [ 4.2742,  4.3602],
        [-2.6796, -1.1075],
        [-3.7367, -3.6491],
        [-3.5608, -3.4065],
        [ 4.2890,  4.1696],
        [-2.1985, -2.6443],
        [-1.9615, -2.3075],
        [ 2.6682,  2.9772],
        [-4.2441, -4.2478],
        [-3.4053, -3.2821],
        [-3.7131, -3.5721],
        [-2.6542, -2.7966],
        [-3.2136, -3.1327],
        [ 5.5110, -4.7183]])
hidden_net.bias tensor([ 1.6048,  1.4834,  2.0299,  3.0857,  0.8781, -5.5808,  1.3373