# **Neural Net for XOR**

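The `XOR` operation is a bitwise operation that is not linearly separable: no single straight line in the (x1, x2) plane separates the inputs mapped to 1 from those mapped to 0.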

| x1 | x2 | x1 XOR x2 |
|----|----|-----------|
| 0  | 0  | 0         |
| 0  | 1  | 1         |
| 1  | 0  | 1         |
| 1  | 1  | 0         |

A neural network with a nonlinear hidden layer is well suited to learning this kind of task, because the problem itself is nonlinear.
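
To make the nonlinearity concrete, the sketch below fits the best purely linear model (weights plus a bias) to the truth table with least squares. The tensor names `A`, `y`, `w`, `linear_preds` and `linear_mse` are illustrative and not part of the notebook. Under these assumptions, the linear fit can do no better than predicting roughly 0.5 for every input.

```python
import torch

# XOR inputs with an extra constant column of ones acting as the bias term.
A = torch.tensor([[0., 0., 1.],
                  [0., 1., 1.],
                  [1., 0., 1.],
                  [1., 1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

# Best linear fit in the least-squares sense: minimise ||A @ w - y||^2.
w = torch.linalg.lstsq(A, y).solution  # shape (3, 1): [w1, w2, b]

# Predictions of the linear model and its mean squared error.
linear_preds = A @ w
linear_mse = ((linear_preds - y) ** 2).mean().item()
print(linear_preds.flatten(), linear_mse)
```

The resulting error is about 0.25, which is the value a model reaches when it ignores the inputs and always outputs 0.5.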


## **Dataset**

In [None]:
import torch

In [None]:
x_train_tensor = torch.tensor(
    [[0,0],[0,1],[1,0],[1,1]]
).float()

y_train_tensor = torch.tensor(
    [0,1,1,0]
).view(4,1).float()

x_train_tensor.shape, y_train_tensor.shape

(torch.Size([4, 2]), torch.Size([4, 1]))

In [None]:
x_val_tensor = torch.clone(x_train_tensor)
y_val_tensor = torch.clone(y_train_tensor)

x_val_tensor.shape, y_val_tensor.shape

(torch.Size([4, 2]), torch.Size([4, 1]))

## **Neural Net con PyTorch**
The `nn` module contains the operations and tools needed to train neural networks of any size and for any task.

* `nn.Module` --> base class used to define neural networks and losses. It holds the network's weights as parameters; their gradients are stored on those parameters and are computed by autograd when `backward()` is called on the loss
* `optim` --> submodule offering a wide choice of optimizers (e.g. SGD, Adam, etc.)
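
As a minimal sketch of how the two pieces cooperate, the example below uses a single `nn.Linear` layer as a stand-in module (not the notebook's network) and a toy input/target pair chosen only for illustration. It shows that the module exposes its parameters, that `backward()` fills their `.grad` fields, and that the optimizer then updates them.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # fixed seed so the illustration is repeatable

# A single Linear layer is itself an nn.Module with two parameters.
layer = nn.Linear(in_features=2, out_features=1)
param_names = [name for name, _ in layer.named_parameters()]

# The optimizer receives the parameters it is responsible for updating.
optimizer = optim.SGD(layer.parameters(), lr=0.1)

# Toy example (assumed for illustration): one input and its target.
x = torch.tensor([[1.0, 0.0]])
target = torch.tensor([[1.0]])

weight_before = layer.weight.detach().clone()
loss = ((layer(x) - target) ** 2).mean()
loss.backward()                                # autograd fills .grad on each parameter
grad_filled = layer.weight.grad is not None
optimizer.step()                               # the optimizer applies the update
optimizer.zero_grad()                          # gradients are reset for the next step
weight_changed = not torch.equal(weight_before, layer.weight.detach())

print(param_names, grad_filled, weight_changed)
```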



In [None]:
import torch.nn as nn
import torch.optim as optim

Neural network definition (MLP):
* input: N examples of dimension 2 ([x1, x2]) -- dim: Nx2
* hidden layer: one Linear layer with 2 input neurons and 2 output neurons, followed by a sigmoid activation -- dim: 2x2
* output layer: one Linear layer with 2 input neurons and 1 output neuron -- dim: 2x1
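
The dimensions listed above can be checked with a quick sketch. It uses `nn.Sequential` as a compact stand-in with the same layer sizes (an assumption made for brevity, not the notebook's own class) and counts the trainable parameters: 2x2 weights plus 2 biases in the hidden layer, and 2x1 weights plus 1 bias in the output layer.

```python
import torch
import torch.nn as nn

# Stand-in network with the same layer sizes as the MLP described above.
mlp = nn.Sequential(
    nn.Linear(2, 2),   # hidden layer: 4 weights + 2 biases
    nn.Sigmoid(),      # nonlinearity, no parameters
    nn.Linear(2, 1),   # output layer: 2 weights + 1 bias
)

# Total number of trainable parameters.
n_params = sum(p.numel() for p in mlp.parameters())

# A batch of N=4 examples of dimension 2 produces an output of shape (4, 1).
batch = torch.zeros(4, 2)
out_shape = tuple(mlp(batch).shape)

print(n_params, out_shape)
```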

In [None]:
class Net(nn.Module):

  def __init__(self, input_dim: int = 2, output_dim: int = 1) -> None:

    super().__init__()
    hidden_out = 2
    self.hidden = nn.Linear(
        in_features=input_dim, out_features=hidden_out
    )
    self.output = nn.Linear(
        in_features=hidden_out, out_features=output_dim
    )
    self.activation = nn.Sigmoid()

  def forward(self, x: torch.Tensor) -> torch.Tensor:
    # x shape --> (N, 2)
    #   - N --> batch size
    #   - 2 --> [x1, x2]
    x = self.hidden(x) # [Nx2]
    x = self.activation(x)
    logits = self.output(x) # [Nx1]
    # logits = self.activation(x)
    return logits
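
Before training, it can help to confirm that a network of this shape is able to represent XOR at all. The sketch below sets the weights by hand instead of learning them; the specific values in `W1`, `b1`, `w2` and `b2` are assumptions chosen for illustration, so that one hidden unit behaves roughly like OR and the other roughly like NAND. Training is not expected to find these exact values.

```python
import torch

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

# Hand-picked hidden layer: row 0 approximates OR, row 1 approximates NAND.
W1 = torch.tensor([[20., 20.],
                   [-20., -20.]])
b1 = torch.tensor([-10., 30.])

# Hand-picked linear output layer: OR + NAND - 1 is close to XOR.
w2 = torch.tensor([[1.], [1.]])
b2 = torch.tensor([-1.])

hidden = torch.sigmoid(X @ W1.T + b1)   # shape (4, 2)
outputs = hidden @ w2 + b2              # shape (4, 1), no output activation

# Threshold at 0.5 to turn the raw outputs into 0/1 predictions.
hand_preds = (outputs > 0.5).int().flatten().tolist()
print(hand_preds)
```

With these values, the thresholded outputs reproduce the XOR column of the truth table, so a 2-2-1 network with a sigmoid hidden layer has enough capacity for the task.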

### **Training Setup**

1) Optimizer: we will use [Stochastic Gradient Descent](https://towardsdatascience.com/stochastic-gradient-descent-clearly-explained-53d239905d31) with a learning rate of 0.01

2) Loss: Mean Squared Error: $\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
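
The two ingredients above can be worked through by hand. The sketch below first computes the MSE from its definition and compares it with `nn.MSELoss`, using made-up predictions of 0.5 for every example. It then applies one plain gradient descent update, `w_new = w - lr * grad`, to a toy one-parameter loss; the toy loss and the names `w`, `lr` and `w_new` are assumptions introduced only for this illustration.

```python
import torch
import torch.nn as nn

# --- MSE by hand versus nn.MSELoss ---
y_hat = torch.tensor([[0.5], [0.5], [0.5], [0.5]])  # made-up predictions
y = torch.tensor([[0.], [1.], [1.], [0.]])          # XOR targets

manual_mse = ((y - y_hat) ** 2).sum() / y.numel()
builtin_mse = nn.MSELoss(reduction="mean")(y_hat, y)

# --- One gradient descent step by hand ---
w = torch.tensor([1.0], requires_grad=True)
lr = 0.01
toy_loss = ((2.0 * w - 1.0) ** 2).mean()  # gradient is 4 * (2w - 1), i.e. 4 at w = 1
toy_loss.backward()
with torch.no_grad():
    w_new = w - lr * w.grad               # 1.0 - 0.01 * 4.0

print(manual_mse.item(), builtin_mse.item(), w_new.item())
```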

In [None]:
model = Net(2, 1)
device = "cuda:0" if torch.cuda.is_available() else "cpu" # use the GPU when available, otherwise the CPU
model.to(device) # move model to the selected device
learning_rate = 0.01
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
criterion = nn.MSELoss(reduction="mean")
n_epochs = 50000

### **Training Loop**

In [None]:
for epoch in range(n_epochs):
  model.train()
  # forward pass
  logits = model(x_train_tensor.to(device))
  # loss computation
  loss = criterion(logits, y_train_tensor.to(device))
  # compute the gradients -- backprop
  loss.backward()
  # optimizer step
  optimizer.step()
  # reset the gradients of every weight in every layer to zero
  optimizer.zero_grad()
  # log loss + validation
  if (epoch % int(0.05*n_epochs)) == 0:
    # validation (no gradients needed)
    model.eval()
    with torch.no_grad():
      y_preds = model(x_val_tensor.to(device))
      val_loss = criterion(y_preds, y_val_tensor.to(device))
    print(f"epoch [{epoch}/{n_epochs}], train_loss: {loss.item():.3f} - val_loss: {val_loss.item():.3f}")

epoch [0/50000], train_loss: 0.344 - val_loss: 0.337
epoch [2500/50000], train_loss: 0.251 - val_loss: 0.251
epoch [5000/50000], train_loss: 0.250 - val_loss: 0.250
epoch [7500/50000], train_loss: 0.250 - val_loss: 0.250
epoch [10000/50000], train_loss: 0.249 - val_loss: 0.249
epoch [12500/50000], train_loss: 0.249 - val_loss: 0.249
epoch [15000/50000], train_loss: 0.248 - val_loss: 0.248
epoch [17500/50000], train_loss: 0.245 - val_loss: 0.245
epoch [20000/50000], train_loss: 0.241 - val_loss: 0.241
epoch [22500/50000], train_loss: 0.232 - val_loss: 0.232
epoch [25000/50000], train_loss: 0.215 - val_loss: 0.215
epoch [27500/50000], train_loss: 0.186 - val_loss: 0.186
epoch [30000/50000], train_loss: 0.137 - val_loss: 0.137
epoch [32500/50000], train_loss: 0.068 - val_loss: 0.068
epoch [35000/50000], train_loss: 0.016 - val_loss: 0.016
epoch [37500/50000], train_loss: 0.002 - val_loss: 0.002
epoch [40000/50000], train_loss: 0.000 - val_loss: 0.000
epoch [42500/50000], train_loss: 0.000

In [None]:
model

Net(
  (hidden): Linear(in_features=2, out_features=2, bias=True)
  (output): Linear(in_features=2, out_features=1, bias=True)
  (activation): Sigmoid()
)

### **Inference**

In [None]:
with torch.no_grad():
  input_data = [1., 1.] # target 0 (1 XOR 1 = 0)
  x = torch.tensor([input_data], requires_grad=False)
  logit = model(x.to(device))
  y_pred = 1 if logit[0] > 0.5 else 0
  print(f"Logit: {logit}")
  print(f"[x1, x2] = [{int(input_data[0])}, {int(input_data[1])}]")
  print(f"x1 XOR x2: {y_pred}")

tensor([[9.2268e-05]], device='cuda:0', grad_fn=<AddmmBackward0>)
[x1, x2] = [1, 1]
x1 XOR x2: 0


tensor([[0.9497]], device='cuda:0', grad_fn=<SigmoidBackward0>)
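
The single-example inference above can be extended to the whole truth table in one call. The helper `predict_xor` below is a hypothetical function, not part of the notebook, and the untrained `stand_in` network exists only to keep the sketch self-contained; in the notebook the trained `model` (with inputs moved to `device`) would be passed instead.

```python
from typing import List

import torch
import torch.nn as nn


def predict_xor(model: nn.Module, inputs: torch.Tensor, threshold: float = 0.5) -> List[int]:
    """Return 0/1 predictions for a batch of [x1, x2] inputs (hypothetical helper)."""
    model.eval()
    with torch.no_grad():  # inference only, no gradient tracking
        outputs = model(inputs)
    return (outputs > threshold).int().flatten().tolist()


# Untrained stand-in with the same layer sizes, used only so the sketch runs on its own.
stand_in = nn.Sequential(nn.Linear(2, 2), nn.Sigmoid(), nn.Linear(2, 1))

all_inputs = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
batch_preds = predict_xor(stand_in, all_inputs)
print(batch_preds)
```

Because the stand-in is untrained, its predictions are arbitrary; a trained network is expected to return the XOR column of the truth table.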