<a href="https://github.com/EmmanuelADAM/IntelligenceArtificiellePython/blob/master/summerSchool/NN_1_learn_AND.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Learning AND with PyTorch
## Illustration of the Importance of Bias

|a|b|a and b|
|:-:|:-:|:-:|
|0|0|0|
|0|1|0|
|1|0|0|
|1|1|1|

*Theoretically, a single-layer neural network without a bias cannot learn AND.*

Indeed, the layer consists of only 1 neuron (1 output), and its inputs are the values $a$ and $b$.<br>
With $w_a$ and $w_b$ the weights assigned to these values, and $f$ the sigmoid activation function (so $f(x) < 0.5$ only when $x < 0$), we would need:
 - $f(0) \searrow 0$ --> impossible: for the input $(0, 0)$ the neuron always outputs $f(0) = 0.5$, whatever the weights
 - $f(w_b) \searrow 0$ $\Rightarrow w_b < 0$
 - $f(w_a) \searrow 0$ $\Rightarrow w_a < 0$
 - $f(w_a + w_b) \nearrow 1$ $\Rightarrow w_a + w_b > 0$ --> conflict with the previous lines
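
The behaviour on the input $(0, 0)$ can be checked numerically. The sketch below is plain Python (no PyTorch needed); the helper `neuron_without_bias` and the weight pairs are hypothetical, chosen only for illustration.

```python
import math

def sigmoid(x):
    """Logistic function, the activation used in this notebook."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_without_bias(a, b, w_a, w_b):
    """Hypothetical helper: one sigmoid neuron with two weights and no bias."""
    return sigmoid(w_a * a + w_b * b)

# arbitrary weight pairs, chosen only for illustration
weight_pairs = [(-5.0, -5.0), (0.0, 0.0), (3.0, -2.0), (10.0, 10.0)]

# output for the input (0, 0) under each weight pair
zero_outputs = [neuron_without_bias(0, 0, w_a, w_b) for w_a, w_b in weight_pairs]
print(zero_outputs)  # every entry is 0.5
```

Whatever weights the optimizer picks, the first row of the truth table stays at 0.5, so the error on that row cannot be reduced.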

*Let's verify this...*

---
**Import Libraries**

In [41]:
# import the torch library
import torch
# import the Neural Network class
import torch.nn as nn
# import the optimizers class
import torch.optim as optim
import numpy as np

---

## Define Inputs and Expected Outputs

In [42]:
# a and b are the only inputs
inputs = np.array([[0,0],[0,1],[1,0],[1,1]], float)

# single output
outputs = np.array([[0],[0],[0],[1]], float)

In [43]:
# transform inputs and outputs into tensors
tensor_X = torch.tensor(inputs, dtype=torch.float32)
tensor_y = torch.tensor(outputs, dtype=torch.float32)

---
## 1. Version without BIAS

### 1.1. Choose the Network Model
***Here the layers are sequential***

In [44]:
model = nn.Sequential()

### 1.2. Define the Network Architecture
- Here, a single layer consisting of 1 output neuron,
- 2 input neurons (for each value),
- using the sigmoid as the activation function

In [45]:
# add a linear layer with 2 inputs and 1 output, without bias
model.add_module('single layer', nn.Linear(2, 1, bias=False))
# add a sigmoid activation function
model.add_module('sigmoid', nn.Sigmoid())
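
Written out, and assuming the usual definitions of `nn.Linear` (a weighted sum of its inputs) and `nn.Sigmoid` (the logistic function), the model built above computes:

$$
y = \sigma(w_a \, a + w_b \, b), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}
$$

Since $\sigma(0) = 0.5$, the input $(0, 0)$ always produces $0.5$ when there is no bias term to shift the weighted sum.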

---

### 1.3. Error Correction
Here, we specify that
 - the error correction algorithm is 'Adam',
 - the calculated error is the mean squared error (MSE). <br>
 $E = \frac{1}{n}\sum_{i=1}^{n}{(y^I_i - y_i)^2}$, where $y^I_i$ is the ideal expected output and $y_i$ the calculated output
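
As a worked example of this error measure, the sketch below computes it by hand in plain Python for a hypothetical network that answers 0.5 on every row of the AND table. The function name `mean_squared_error` is an illustrative choice, not a PyTorch API.

```python
def mean_squared_error(expected, computed):
    """Mean of the squared differences between expected and computed outputs."""
    n = len(expected)
    return sum((y_ideal - y) ** 2 for y_ideal, y in zip(expected, computed)) / n

expected = [0.0, 0.0, 0.0, 1.0]   # outputs of the AND truth table
computed = [0.5, 0.5, 0.5, 0.5]   # hypothetical network that always answers 0.5

print(mean_squared_error(expected, computed))  # 0.25
```

Each row contributes $(0.5)^2 = 0.25$, so the mean is also 0.25.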

In [46]:
# use the mean squared error function
criterion = nn.MSELoss()
# use the Adam optimizer
optimizer = optim.Adam(model.parameters(), lr=0.01)

---

### 1.4. Train the Network
- Here we only display the loss every 100 epochs,
- and we run 2000 training cycles (on a network this small, this should only take a few seconds)

In [47]:
# launch training for 2000 iterations
model.train()  # set the model to training mode

for epoch in range(2000):
    # calculate outputs and loss
    computed_outputs = model(tensor_X)
    loss = criterion(computed_outputs, tensor_y)

    # backpropagation and optimization
    # reset gradients to zero
    optimizer.zero_grad()
    # calculate gradients
    loss.backward()
    # update weights
    optimizer.step()

    # for demo, display the loss every 100 epochs
    if (epoch+1) % 100 == 0:
        print(f'Epoch [{epoch+1}/2000], Loss: {loss.item():.4f}')

Epoch [100/2000], Loss: 0.2500
Epoch [200/2000], Loss: 0.2500
Epoch [300/2000], Loss: 0.2500
Epoch [400/2000], Loss: 0.2500
Epoch [500/2000], Loss: 0.2500
Epoch [600/2000], Loss: 0.2500
Epoch [700/2000], Loss: 0.2500
Epoch [800/2000], Loss: 0.2500
Epoch [900/2000], Loss: 0.2500
Epoch [1000/2000], Loss: 0.2500
Epoch [1100/2000], Loss: 0.2500
Epoch [1200/2000], Loss: 0.2500
Epoch [1300/2000], Loss: 0.2500
Epoch [1400/2000], Loss: 0.2500
Epoch [1500/2000], Loss: 0.2500
Epoch [1600/2000], Loss: 0.2500
Epoch [1700/2000], Loss: 0.2500
Epoch [1800/2000], Loss: 0.2500
Epoch [1900/2000], Loss: 0.2500
Epoch [2000/2000], Loss: 0.2500


---

### 1.5. Verify the Network
Optional step: in general, ***we test the network on other examples***.
- Here, we don't have any, so we ask it to calculate the output for each example in the training set.

In [48]:
model.eval()  # set the model to evaluation mode

Sequential(
  (single layer): Linear(in_features=2, out_features=1, bias=False)
  (sigmoid): Sigmoid()
)

In [49]:
# calculate predictions from inputs
# use torch.no_grad() to disable gradient tracking during inference
with torch.no_grad():
    predictions = model(tensor_X)
    print("Predictions:")
    for i, pred in enumerate(predictions):
        print(f"Input: {inputs[i]}, Predicted: {pred.item():.2f}, Expected: {outputs[i][0]:.2f}")

# display the final loss
final_loss = criterion(predictions, tensor_y)
print(f'Final Loss: {final_loss.item():.4f}')

Predictions:
Input: [0. 0.], Predicted: 0.50, Expected: 0.00
Input: [0. 1.], Predicted: 0.50, Expected: 0.00
Input: [1. 0.], Predicted: 0.50, Expected: 0.00
Input: [1. 1.], Predicted: 0.50, Expected: 1.00
Final Loss: 0.2500


Large errors here: every prediction is stuck at 0.50, as expected without the bias...

In [50]:
# display the model weights
print("Weights:")
for name, param in model.named_parameters():
    print(f"{name}: {param.data.numpy()}")

Weights:
single layer.weight: [[-1.1913059e-08 -5.5206133e-08]]
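
The near-zero weights printed above are consistent with a coarse grid search over the two weights. The sketch below is plain Python, with an arbitrary search range of -3 to 3 chosen for illustration; the helper `mse_without_bias` is hypothetical. It looks for the weight pair that minimises the MSE of a bias-free sigmoid neuron on the AND table.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# AND truth table: ((a, b), expected output)
samples = [((0, 0), 0.0), ((0, 1), 0.0), ((1, 0), 0.0), ((1, 1), 1.0)]

def mse_without_bias(w_a, w_b):
    """Hypothetical helper: MSE of a bias-free sigmoid neuron on the AND table."""
    return sum((sigmoid(w_a * a + w_b * b) - y) ** 2 for (a, b), y in samples) / len(samples)

# coarse grid: weights from -3 to 3 in steps of 0.25 (arbitrary range, for illustration)
grid = [i / 4 for i in range(-12, 13)]
best_loss, best_w_a, best_w_b = min(
    (mse_without_bias(w_a, w_b), w_a, w_b) for w_a in grid for w_b in grid
)
print(best_loss, best_w_a, best_w_b)
```

On this grid the smallest loss is found at zero weights, which matches the plateau at 0.2500 seen during training.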


---
## 2. Version WITH BIAS

The table is then

|bias|a|b|a and b|
|:-:|:-:|:-:|:-:|
|1|0|0|0|
|1|0|1|0|
|1|1|0|0|
|1|1|1|1|

*Theoretically, learning AND with a single-layer neural network is then possible.*

Indeed, the layer consists of only 1 neuron (1 output), and its inputs are the values `bias`, `a`, and `b`.<br>
With $w_{bias}$, $w_a$, and $w_b$ the weights assigned to these values, and $f$ the sigmoid activation function, we need:
 - $f(w_{bias}) \searrow 0$ $\Rightarrow w_{bias} < 0$
 - $f(w_{bias} + w_b) \searrow 0$ $\Rightarrow w_b < -w_{bias}$
 - $f(w_{bias} + w_a) \searrow 0$ $\Rightarrow w_a < -w_{bias}$
 - $f(w_{bias} + w_a + w_b) \nearrow 1$ $\Rightarrow w_a + w_b > -w_{bias}$ --> no conflict: for example, any $w_a = w_b = w$ with $w < -w_{bias} < 2w$ satisfies all four lines
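
These conditions can be checked on a concrete set of weights. The sketch below is plain Python; the values $w_a = w_b = 5$ and $w_{bias} = -7.5$ are illustrative assumptions, not the trained weights.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# illustrative values only (not the trained weights)
w_a, w_b, w_bias = 5.0, 5.0, -7.5

rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
row_outputs = [sigmoid(w_bias + w_a * a + w_b * b) for a, b in rows]

for (a, b), output in zip(rows, row_outputs):
    print(f"a={a}, b={b} -> {output:.3f}")
```

The first three rows land below 0.5 and the last one above it, so a negative bias is enough to separate the rows of the AND table.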

*Let's verify this...*

---

### 2.1. Define the Network Architecture
- Here, a single layer consisting of 1 output neuron,
- 3 input neurons (2 containing the values + **a Bias** (always emitting the signal 1)),
- using the sigmoid as the activation function

In [51]:
model = nn.Sequential()
# add a linear layer with 2 inputs and 1 output, with bias (default)
model.add_module('single layer', nn.Linear(2, 1))
# add a sigmoid activation function
model.add_module('sigmoid', nn.Sigmoid())

# use the mean squared error function
criterion = nn.MSELoss()
# use the Adam optimizer
optimizer = optim.Adam(model.parameters(), lr=0.01)

---

### 2.2. Train the Network
We keep the same optimizer and the same loss calculation.

In [52]:
# launch training for 2000 iterations
model.train()  # set the model to training mode

for epoch in range(2000):
    # calculate outputs and loss
    computed_outputs = model(tensor_X)
    loss = criterion(computed_outputs, tensor_y)

    # backpropagation and optimization
    # reset gradients to zero
    optimizer.zero_grad()
    # calculate gradients
    loss.backward()
    # update weights
    optimizer.step()

    # for demo, display the loss every 100 epochs
    if (epoch+1) % 100 == 0:
        print(f'Epoch [{epoch+1}/2000], Loss: {loss.item():.4f}')

Epoch [100/2000], Loss: 0.1402
Epoch [200/2000], Loss: 0.0958
Epoch [300/2000], Loss: 0.0706
Epoch [400/2000], Loss: 0.0540
Epoch [500/2000], Loss: 0.0423
Epoch [600/2000], Loss: 0.0338
Epoch [700/2000], Loss: 0.0274
Epoch [800/2000], Loss: 0.0226
Epoch [900/2000], Loss: 0.0188
Epoch [1000/2000], Loss: 0.0159
Epoch [1100/2000], Loss: 0.0136
Epoch [1200/2000], Loss: 0.0117
Epoch [1300/2000], Loss: 0.0101
Epoch [1400/2000], Loss: 0.0089
Epoch [1500/2000], Loss: 0.0078
Epoch [1600/2000], Loss: 0.0069
Epoch [1700/2000], Loss: 0.0062
Epoch [1800/2000], Loss: 0.0055
Epoch [1900/2000], Loss: 0.0050
Epoch [2000/2000], Loss: 0.0045
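
To show what each training step does, here is a minimal sketch of the same loop in plain Python. It makes several simplifying assumptions: plain gradient descent stands in for Adam, the weights start at zero rather than at PyTorch's random initial values, and the learning rate and epoch count are arbitrary. The function name `train_and` is hypothetical, and the numbers it produces are not expected to match the run above.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0.0, 0.0, 0.0, 1.0]

def train_and(epochs=5000, lr=1.0):
    """Plain gradient descent on the MSE of one sigmoid neuron with a bias."""
    w_a = w_b = w_bias = 0.0  # assumption: zero start instead of random init
    n = len(inputs)
    for _ in range(epochs):
        grad_a = grad_b = grad_bias = 0.0
        for (a, b), y in zip(inputs, targets):
            p = sigmoid(w_a * a + w_b * b + w_bias)
            # derivative of the loss with respect to the weighted sum
            delta = 2.0 * (p - y) * p * (1.0 - p) / n
            grad_a += delta * a
            grad_b += delta * b
            grad_bias += delta  # the bias input is always 1
        w_a -= lr * grad_a
        w_b -= lr * grad_b
        w_bias -= lr * grad_bias
    return w_a, w_b, w_bias

w_a, w_b, w_bias = train_and()
predictions = [sigmoid(w_a * a + w_b * b + w_bias) for a, b in inputs]
loss = sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)
print(f"weights: {w_a:.2f}, {w_b:.2f}, bias: {w_bias:.2f}, loss: {loss:.4f}")
```

The bias gradient is the only one that accumulates over all four rows, which is why the bias can move away from zero even when the two weight gradients cancel out.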


---
### 2.3. Verify the Network
- We calculate the output for each example in the training set

In [53]:
model.eval()  # set the model to evaluation mode
# calculate predictions from inputs
# use torch.no_grad() to disable gradient tracking during inference
with torch.no_grad():
    predictions = model(tensor_X)
    print("Predictions:")
    for i, pred in enumerate(predictions):
        print(f"Input: {inputs[i]}, Predicted: {pred.item():.2f}, Expected: {outputs[i][0]:.2f}")

# display the final loss
final_loss = criterion(predictions, tensor_y)
print(f'Final Loss: {final_loss.item():.4f}')

Predictions:
Input: [0. 0.], Predicted: 0.00, Expected: 0.00
Input: [0. 1.], Predicted: 0.08, Expected: 0.00
Input: [1. 0.], Predicted: 0.07, Expected: 0.00
Input: [1. 1.], Predicted: 0.92, Expected: 1.00
Final Loss: 0.0045


**Almost perfect learning!!!**
- -> concrete demonstration of the effect of `Bias`!!

In [54]:
# display the model weights
print("Weights:")
for name, param in model.named_parameters():
    print(f"{name}: {param.data.numpy()}")

Weights:
single layer.weight: [[4.9143333 4.978275 ]]
single layer.bias: [-7.4425335]
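
The predictions can be recomputed by hand from these printed parameters. The sketch below is plain Python; it copies the weight and bias values shown above and applies the sigmoid to the weighted sum for each row.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# values copied from the weights printed above
w_a, w_b = 4.9143333, 4.978275
w_bias = -7.4425335

rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
recomputed = [sigmoid(w_a * a + w_b * b + w_bias) for a, b in rows]

for (a, b), p in zip(rows, recomputed):
    print(f"Input: ({a}, {b}), recomputed output: {p:.2f}")
```

The bias is negative and larger in magnitude than either weight alone, but smaller than their sum, so only the row $(1, 1)$ pushes the weighted sum above zero.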


---
### Usage
Let's test other values

In [55]:
tests = np.array([[0.5,0.2], [0.9, 0.7], [0.1, 0.9]], float)

In [56]:
# let's see how the model predicts the results for new inputs
# use torch.no_grad() to disable gradient tracking during inference
with torch.no_grad():
    predictions = model(torch.tensor(tests, dtype=torch.float32))

In [60]:
def print_predictions(inputs, outputs):
    for i in range(0, len(inputs)):
        print(f"If A is true at {inputs[i][0]*100:.2f}% and B true at {inputs[i][1]*100:.2f}%, having A and B true at the same time is possible at {outputs[i][0]*100:.2f}%")

In [61]:
print_predictions(tests, predictions)

If A is true at 50.00% and B true at 20.00%, having A and B true at the same time is possible at 1.82%
If A is true at 90.00% and B true at 70.00%, having A and B true at the same time is possible at 61.42%
If A is true at 10.00% and B true at 90.00%, having A and B true at the same time is possible at 7.79%


The neural network can **interpolate outputs** for inputs it never saw during training!
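
One way to read this interpolation: the neuron outputs 0.5 exactly where the weighted sum is zero, and moves smoothly towards 0 or 1 on either side. The sketch below is plain Python and reuses the weights and bias printed earlier; the helper `and_score` is a hypothetical name. It locates that crossing point along the diagonal $a = b$ and recomputes the three test inputs.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# weights and bias copied from the trained model's printed parameters
w_a, w_b, w_bias = 4.9143333, 4.978275, -7.4425335

def and_score(a, b):
    """Hypothetical helper: output of the trained neuron for inputs a and b."""
    return sigmoid(w_a * a + w_b * b + w_bias)

# along the diagonal a == b, the output is 0.5 where (w_a + w_b) * a + w_bias == 0
threshold = -w_bias / (w_a + w_b)
print(f"a = b = {threshold:.3f} gives {and_score(threshold, threshold):.2f}")

# the three test inputs used above
for a, b in [(0.5, 0.2), (0.9, 0.7), (0.1, 0.9)]:
    print(f"a={a}, b={b} -> {and_score(a, b) * 100:.2f}%")
```

Under these weights, both inputs need to be above roughly 0.75 (when equal) before the neuron leans towards "true".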

----