# **Neural Net for XOR**

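`XOR` is a bitwise operation whose outputs are not linearly separable: no single straight line in the (x1, x2) plane separates the inputs that map to 1 from those that map to 0.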

| x1 | x2 | x1 XOR x2 |
|----|----|-----------|
| 0  | 0  | 0         |
| 0  | 1  | 1         |
| 1  | 0  | 1         |
| 1  | 1  | 0         |

A neural net with a hidden layer and a non-linear activation is well suited to learning this kind of task, since a purely linear model cannot fit it.
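The claim about linear separability can be checked with a small brute-force search. The sketch below is illustrative only: the helper `linearly_separable_on_grid` and the coarse grid of candidate weights are assumptions made for this example. It looks for a linear threshold rule `w1*x1 + w2*x2 + b > 0` that reproduces every row of a truth table.

```python
from itertools import product

# Truth tables as ((x1, x2), target) pairs
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def linearly_separable_on_grid(data, grid):
    """Return True if some (w1, w2, b) taken from the grid classifies all rows."""
    for w1, w2, b in product(grid, repeat=3):
        if all((w1 * x1 + w2 * x2 + b > 0) == bool(y) for (x1, x2), y in data):
            return True
    return False


# Candidate values from -4.0 to 4.0 in steps of 0.5 (hypothetical choice)
grid = [i / 2 for i in range(-8, 9)]

and_ok = linearly_separable_on_grid(AND_DATA, grid)
xor_ok = linearly_separable_on_grid(XOR_DATA, grid)
print(f"AND separable on grid: {and_ok}")
print(f"XOR separable on grid: {xor_ok}")
```

A grid search is not a proof, but here it agrees with the algebra: XOR would require `b <= 0`, `w1 + b > 0`, `w2 + b > 0` and `w1 + w2 + b <= 0` at the same time, which is impossible.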


## **Dataset**

In [1]:
import torch

In [2]:
x_train_tensor = torch.tensor(
    [[0,0],[0,1],[1,0],[1,1]]
).float()

y_train_tensor = torch.tensor(
    [0,1,1,0]
).view(4,1).float()

x_train_tensor.shape, y_train_tensor.shape

(torch.Size([4, 2]), torch.Size([4, 1]))

In [3]:
x_val_tensor = torch.clone(x_train_tensor)
y_val_tensor = torch.clone(y_train_tensor)

x_val_tensor.shape, y_val_tensor.shape

(torch.Size([4, 2]), torch.Size([4, 1]))
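The validation tensors are clones of the training tensors, so they hold the same values but do not share memory with them. A minimal sketch of that behaviour, with illustrative tensor names `a` and `b`:

```python
import torch

a = torch.tensor([[0.0, 1.0]])
b = torch.clone(a)  # independent copy with the same values

b[0, 0] = 5.0  # modifying the clone...

original_value = a[0, 0].item()  # ...leaves the source tensor untouched
cloned_value = b[0, 0].item()
print(original_value, cloned_value)
```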

## **Neural Net con PyTorch**
The `nn` module contains the operations and tools needed to train neural networks of any size and for any task.

* `nn.Module` --> base class for defining neural networks and losses. It registers the network's weights as parameters; their gradients are stored in each parameter's `.grad` attribute and are computed by autograd during backpropagation
* `optim` --> submodule that offers a wide choice of optimizers (e.g. SGD, Adam, etc.)
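To make the first bullet concrete, here is a small sketch using a single `nn.Linear` layer (itself an `nn.Module`). The layer size and the all-ones input are arbitrary choices made for illustration.

```python
import torch
import torch.nn as nn

layer = nn.Linear(in_features=2, out_features=1)

# Parameters registered by the module
names = [name for name, _ in layer.named_parameters()]
print(names)

# No gradient is stored before the first backward pass
grad_before = layer.weight.grad
print(grad_before)

# Autograd fills .grad when backward() is called on a scalar
out = layer(torch.ones(1, 2)).sum()
out.backward()
print(layer.weight.grad)
```

Since the output is `w . x + b` with `x = [1, 1]`, the gradient with respect to the weight is just the input, `[[1., 1.]]`.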



In [4]:
import torch.nn as nn
import torch.optim as optim

Neural network definition (MLP):
* input: N examples of dimension 2 ([x1, x2]) -- dim: Nx2
* hidden layer: a Linear layer with 2 input neurons and 2 output neurons, followed by a Sigmoid activation -- dim: 2x2
* output layer: a Linear layer with 2 input neurons and 1 output neuron -- dim: 2x1
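The dimensions in the list can be inspected directly. The sketch below builds two standalone layers that mirror the description (the variable names `hidden` and `output` are chosen for illustration). Note that PyTorch stores a Linear weight as `(out_features, in_features)`, so the 2x1 output layer appears with shape `(1, 2)`.

```python
import torch.nn as nn

hidden = nn.Linear(in_features=2, out_features=2)
output = nn.Linear(in_features=2, out_features=1)

shapes = {
    "hidden.weight": tuple(hidden.weight.shape),
    "hidden.bias": tuple(hidden.bias.shape),
    "output.weight": tuple(output.weight.shape),
    "output.bias": tuple(output.bias.shape),
}
for name, shape in shapes.items():
    print(f"{name}: {shape}")

# Total number of trainable values: (2*2 + 2) + (2*1 + 1)
n_params = sum(p.numel() for layer in (hidden, output) for p in layer.parameters())
print(f"trainable parameters: {n_params}")
```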

In [5]:
class Net(nn.Module):

  def __init__(self, input_dim: int = 2, output_dim: int = 1) -> None:

    super().__init__()
    hidden_out = 2
    self.hidden = nn.Linear(
        in_features=input_dim, out_features=hidden_out
    )
    self.output = nn.Linear(
        in_features=hidden_out, out_features=output_dim
    )
    self.activation = nn.Sigmoid()

  def forward(self, x: torch.Tensor) -> torch.Tensor:
    # x shape --> (N, 2)
    #   - N --> batch size
    #   - 2 --> [x1, x2]
    x = self.hidden(x) # [Nx2]
    x = self.activation(x)
    logits = self.output(x) # [Nx1]
    return logits
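The forward pass above amounts to two affine maps with a sigmoid in between. As a rough check, the sketch below recomputes the same quantity with explicit matrix products. It uses standalone layers rather than the `Net` class so that it runs on its own, and the seed is an arbitrary choice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
hidden = nn.Linear(2, 2)
output = nn.Linear(2, 1)
x = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

with torch.no_grad():
    # Same sequence of operations as in forward()
    module_out = output(torch.sigmoid(hidden(x)))

    # Explicit version: y = sigmoid(x W1^T + b1) W2^T + b2
    h = torch.sigmoid(x @ hidden.weight.T + hidden.bias)
    manual_out = h @ output.weight.T + output.bias

same = torch.allclose(module_out, manual_out, atol=1e-6)
print(f"manual and module outputs match: {same}")
```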

In [6]:
model = Net(2, 1)

Let's check that `Net` works correctly with inputs of different batch sizes

In [7]:
batch_size = 2
x = torch.rand(size=(batch_size, 2)) # e.g. [[0.5, 0.2], [0.1, 0.9]]
print("Input data info:")
print(f"\t- input dim: {x.shape}")
print(f"\t- input data: \n{x}")

Input data info:
	- input dim: torch.Size([2, 2])
	- input data: 
tensor([[0.0533, 0.8607],
        [0.0653, 0.5055]])


In [8]:
# forward pass
logits = model(x)
print("Logits data info:")
print(f"\t- logits dim: {logits.shape}")
print(f"\t- logits data: \n{logits}")

Logits data info:
	- logits dim: torch.Size([2, 1])
	- logits data: 
tensor([[-0.7958],
        [-0.8132]], grad_fn=<AddmmBackward0>)


Let's try it on the XOR train and val data

In [9]:
print("Input data info:")
print(f"\t- input dim: {x_train_tensor.shape}")
print(f"\t- input data: \n{x_train_tensor}")

Input data info:
	- input dim: torch.Size([4, 2])
	- input data: 
tensor([[0., 0.],
        [0., 1.],
        [1., 0.],
        [1., 1.]])


In [10]:
# forward pass
logits = model(x_train_tensor)
print("Logits data info:")
print(f"\t- logits dim: {logits.shape}")
print(f"\t- logits data: \n{logits}")

Logits data info:
	- logits dim: torch.Size([4, 1])
	- logits data: 
tensor([[-0.8362],
        [-0.7877],
        [-0.8664],
        [-0.8168]], grad_fn=<AddmmBackward0>)


### **Training Setup**

1) Optimizer: we will use [Stochastic Gradient Descent](https://towardsdatascience.com/stochastic-gradient-descent-clearly-explained-53d239905d31) with a learning rate of 0.01

2) Loss: Mean Squared Error: MSE = $\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2$

In [11]:
model = Net(2, 1)
learning_rate = 0.01
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
criterion = nn.MSELoss(reduction="mean")
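For plain SGD (no momentum or weight decay, as configured above), one optimizer step should reduce to `w <- w - lr * grad`. The sketch below checks this on a single small layer; the layer, the sample and the seed are illustrative assumptions, not part of the notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # arbitrary seed
layer = nn.Linear(2, 1)
learning_rate = 0.01
optimizer = optim.SGD(layer.parameters(), lr=learning_rate)

x = torch.tensor([[0.0, 1.0]])
y = torch.tensor([[1.0]])

loss = nn.MSELoss(reduction="mean")(layer(x), y)
loss.backward()

# Manual update computed before the optimizer modifies the weights
expected_weight = layer.weight.detach().clone() - learning_rate * layer.weight.grad

optimizer.step()
matches = torch.allclose(layer.weight.detach(), expected_weight)
print(f"optimizer.step() matches the manual update: {matches}")
```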

### **Training Loop**

In [12]:
model.train()
for epoch in range(1):
  # reset the gradients of every weight in every layer
  optimizer.zero_grad()
  # check the gradients
  print("Gradient check after zero_grad:")
  print(f"\t- hidden layer: {model.hidden.weight.grad}")
  print(f"\t- output layer: {model.output.weight.grad}")
  # forward pass
  logits = model(x_train_tensor)
  print(f"Logits: {logits}")
  # loss computation
  loss = criterion(logits, y_train_tensor)
  print(f"Loss: {loss:.4f}")
  # compute the gradients -- backprop
  loss.backward()
  # check the gradients
  print("Gradient check after backpropagation:")
  print(f"\t- hidden layer: {model.hidden.weight.grad}")
  print(f"\t- output layer: {model.output.weight.grad}")
  # optimizer step
  optimizer.step()

Gradient check after zero_grad:
	- hidden layer: None
	- output layer: None
Logits: tensor([[-0.3461],
        [-0.2486],
        [-0.3004],
        [-0.2030]], grad_fn=<AddmmBackward0>)
Loss: 0.8527
Gradient check after backpropagation:
	- hidden layer: tensor([[-0.1160, -0.1118],
        [ 0.0230,  0.0221]])
	- output layer: tensor([[-0.7678, -0.7830]])
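The loss printed above can be reproduced by hand from the printed logits and the XOR targets `[0, 1, 1, 0]`. The sketch below uses plain Python. Because the logits are shown rounded to four decimals, the result should only be expected to agree with the printed loss to about three decimals.

```python
# Logits as printed in the output above, and the XOR targets
logits = [-0.3461, -0.2486, -0.3004, -0.2030]
targets = [0.0, 1.0, 1.0, 0.0]

# MSE = mean of the squared differences
squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(targets, logits)]
mse = sum(squared_errors) / len(squared_errors)

print(f"squared errors: {[round(e, 4) for e in squared_errors]}")
print(f"MSE: {mse:.3f}")
```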


In [None]:
# restart from random weights
model = Net(2, 1)
learning_rate = 0.01
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
criterion = nn.MSELoss(reduction="mean")
n_epochs = 40000

In [None]:
for epoch in range(n_epochs):
  model.train()
  # forward pass
  logits = model(x_train_tensor)
  # loss computation
  loss = criterion(logits, y_train_tensor)
  # compute the gradients -- backprop
  loss.backward()
  # optimizer step
  optimizer.step()
  # reset the gradients of every weight in every layer
  optimizer.zero_grad()
  # log loss + validation
  if (epoch % int(0.05*n_epochs)) == 0:
    # validation (no gradients needed)
    model.eval()
    with torch.no_grad():
      y_preds = model(x_val_tensor)
      val_loss = criterion(y_preds, y_val_tensor)
    print(f"epoch [{epoch}/{n_epochs}], train_loss: {loss:.3f} - val_loss: {val_loss:.3f}")

epoch [0/40000], train_loss: 0.669 - val_loss: 0.646
epoch [2000/40000], train_loss: 0.251 - val_loss: 0.251
epoch [4000/40000], train_loss: 0.251 - val_loss: 0.251
epoch [6000/40000], train_loss: 0.250 - val_loss: 0.250
epoch [8000/40000], train_loss: 0.250 - val_loss: 0.250
epoch [10000/40000], train_loss: 0.250 - val_loss: 0.250
epoch [12000/40000], train_loss: 0.250 - val_loss: 0.250
epoch [14000/40000], train_loss: 0.250 - val_loss: 0.250
epoch [16000/40000], train_loss: 0.250 - val_loss: 0.250
epoch [18000/40000], train_loss: 0.249 - val_loss: 0.249
epoch [20000/40000], train_loss: 0.249 - val_loss: 0.249
epoch [22000/40000], train_loss: 0.248 - val_loss: 0.248
epoch [24000/40000], train_loss: 0.247 - val_loss: 0.247
epoch [26000/40000], train_loss: 0.245 - val_loss: 0.245
epoch [28000/40000], train_loss: 0.239 - val_loss: 0.239
epoch [30000/40000], train_loss: 0.227 - val_loss: 0.227
epoch [32000/40000], train_loss: 0.202 - val_loss: 0.202
epoch [34000/40000], train_loss: 0.161 
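The loop resets the gradients at every iteration because PyTorch accumulates gradients across `backward()` calls instead of overwriting them. A minimal sketch of that behaviour on a single scalar weight (the tensor `w` and the factor 3 are illustrative):

```python
import torch

w = torch.tensor([2.0], requires_grad=True)

# First backward pass: d(3*w)/dw = 3
(w * 3).sum().backward()
first = w.grad.clone()

# Second backward pass without resetting: the new gradient is added to the old one
(w * 3).sum().backward()
second = w.grad.clone()

# Resetting restores a clean state, which is what optimizer.zero_grad() does for a model
w.grad.zero_()
after_reset = w.grad.clone()

print(first.item(), second.item(), after_reset.item())
```

Without the reset, each update would use the sum of all gradients computed so far rather than the gradient of the current step.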

### **Inference**

In [None]:
model.train()
print(model.training)
model.eval()
print(model.training)

True
False
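`Net` contains only Linear and Sigmoid layers, which behave identically in both modes, so the flag does not change its outputs. It matters for layers such as `nn.Dropout`. The sketch below uses a standalone dropout layer, which is not part of the network in this notebook, purely to show the difference.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()
train_out = drop(x)  # each element is either zeroed or scaled by 1 / (1 - p) = 2

drop.eval()
eval_out = drop(x)  # dropout is a no-op in eval mode

print(f"train mode: {train_out}")
print(f"eval mode:  {eval_out}")
```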


In [17]:
input_data = [1., 1.] # target 0
x = torch.tensor([input_data], requires_grad=False)
logit = model(x)
y_pred = 1 if logit[0] > 0.5 else 0
print(f"Logit: {logit}")
print(f"[x1, x2] = [{int(input_data[0])}, {int(input_data[1])}]")
print(f"x1 XOR x2: {y_pred}")

Logit: tensor([[-0.1783]], grad_fn=<AddmmBackward0>)
[x1, x2] = [1, 1]
x1 XOR x2: 0


In [15]:
with torch.no_grad():
  input_data = [1., 1.] # target 0
  x = torch.tensor([input_data], requires_grad=False)
  logit = model(x)
  y_pred = 1 if logit[0] > 0.5 else 0
  print(f"Logit: {logit.item():.4f}")
  print(f"[x1, x2] = [{int(input_data[0])}, {int(input_data[1])}]")
  print(f"x1 XOR x2: {y_pred}")

Logit: -0.1783
[x1, x2] = [1, 1]
x1 XOR x2: 0
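The same thresholding can be applied to all four inputs in one batch. To keep the result deterministic, the sketch below does not use the trained model. Instead it loads hand-picked weights (an assumption made for this example) into an equivalent 2-2-1 architecture: the first hidden unit approximates OR, the second approximates AND, and the output is their difference.

```python
import torch
import torch.nn as nn

# Same architecture as Net, written with nn.Sequential so the block is self-contained
model = nn.Sequential(nn.Linear(2, 2), nn.Sigmoid(), nn.Linear(2, 1))

# Hand-picked (hypothetical) weights: h1 ~ OR, h2 ~ AND, output = h1 - h2
with torch.no_grad():
    model[0].weight.copy_(torch.tensor([[20.0, 20.0], [20.0, 20.0]]))
    model[0].bias.copy_(torch.tensor([-10.0, -30.0]))
    model[2].weight.copy_(torch.tensor([[1.0, -1.0]]))
    model[2].bias.zero_()

model.eval()
x = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
with torch.no_grad():
    outputs = model(x)  # shape [4, 1]
    y_pred = (outputs > 0.5).int().view(-1).tolist()

for (x1, x2), pred in zip(x.int().tolist(), y_pred):
    print(f"[x1, x2] = [{x1}, {x2}] -> x1 XOR x2: {pred}")
```

This also shows that two hidden neurons are enough to represent XOR; whether gradient descent finds such a solution depends on the initialization and on the number of epochs.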
