# Lab 2: Backpropagation for NXOR
---

Group 3 \
Alexandre Rodrigues: 75545 \
Tiago Granja: 79845 \
Diogo Silva: 79828

Import libraries

In [1]:
import numpy as np

---
1.

To compute the weight updates we first need to feed the inputs forward through the network. With our 2:1 architecture (two hidden neurons and one output neuron) we have 6 weights: 4 in the hidden layer, $w_{11}^{[1]}, w_{21}^{[1]}, w_{12}^{[1]}, w_{22}^{[1]}$, and 2 in the output layer, $w_1^{[2]}, w_2^{[2]}$. We also have 3 biases: $b_1^{[1]}, b_2^{[1]}$ in the first layer and $b_1^{[2]}$ in the second. With these we can write the weighted sums and outputs as follows:

$$
\begin{split}
z_1^{[1]} = x_1 w_{11}^{[1]} + x_2 w_{21}^{[1]} + b_1^{[1]} \\
y_1^{[1]} = S(z_1^{[1]}) \\
\\
z_2^{[1]} = x_1 w_{12}^{[1]} + x_2 w_{22}^{[1]} + b_2^{[1]} \\
y_2^{[1]} = S(z_2^{[1]}) \\
\\
z_1^{[2]} = y_1^{[1]} w_1^{[2]} + y_2^{[1]} w_2^{[2]} + b_1^{[2]} \\
y = S(z_1^{[2]})
\end{split}
$$

where $S$ is the sigmoid function, $S(z) = \frac{1}{1 + e^{-z}}$.
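The feedforward equations above can be sketched in vectorized form. This is a minimal illustration, not the implementation used later in this report: the function name `forward` and the array layout (`W1[i, j]` holding $w_{(i+1)(j+1)}^{[1]}$, the weight from input $i+1$ to hidden neuron $j+1$) are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, w2, b2):
    """Forward pass of the 2:1 network (hypothetical helper).

    x  : inputs (x1, x2), shape (2,)
    W1 : hidden weights, W1[i, j] connects input i to hidden neuron j
    b1 : hidden biases, shape (2,)
    w2 : output weights, shape (2,)
    b2 : output bias, scalar
    """
    z1 = x @ W1 + b1      # hidden weighted sums z_1^[1], z_2^[1]
    y1 = sigmoid(z1)      # hidden outputs y_1^[1], y_2^[1]
    z2 = y1 @ w2 + b2     # output weighted sum z_1^[2]
    y = sigmoid(z2)       # network output y
    return z1, y1, z2, y

# With every weight and bias set to zero, each sigmoid sees 0 and returns 0.5.
z1, y1, z2, y = forward(np.array([1.0, 0.0]),
                        np.zeros((2, 2)), np.zeros(2), np.zeros(2), 0.0)
print(y1, y)  # hidden outputs and network output are all 0.5
```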

#### Backpropagation

For backpropagation we use the squared error $E = \frac{1}{2}(y - t)^2$, where $t$ is the target output. Each weight is then updated by gradient descent with learning rate $\eta$:

$$
\begin{split}
w_{ij}^{[k]} = w_{ij}^{[k]} - \eta \frac{\partial E}{\partial w_{ij}^{[k]}}
\end{split}
$$

##### Outer layer

For the outer layer we can write:

$$
\begin{split}
\Delta w_{i}^{[2]} = \frac{\partial E}{\partial w_{i}^{[2]}} \\
\Delta w_{i}^{[2]} = \frac{\partial E}{\partial y}\frac{\partial y}{\partial z_1^{[2]}}\frac{\partial z_1^{[2]}}{\partial w_i^{[2]}} \\
\Delta w_{i}^{[2]} = (y - t) S'(z_1^{[2]})y_i^{[1]} \\
\end{split}
$$

where the last factor, $y_i^{[1]}$, is replaced by 1 when taking the partial derivative with respect to the bias $b_1^{[2]}$, and $S'$ is the derivative of $S$.
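As a worked illustration of the outer-layer expression (the numbers are hypothetical, chosen only to keep the arithmetic simple), recall that the sigmoid satisfies $S'(z) = S(z)(1 - S(z))$, so $S'(z_1^{[2]}) = y(1 - y)$. Suppose all weights and biases are zero, so that $y_1^{[1]} = y_2^{[1]} = y = 0.5$, and take a target $t = 1$ and a learning rate of $0.5$:

$$
\begin{split}
\Delta w_{i}^{[2]} = (y - t)\, y (1 - y)\, y_i^{[1]} = (0.5 - 1)(0.5)(0.5)(0.5) = -0.0625 \\
w_{i}^{[2]} = 0 - 0.5 \cdot (-0.0625) = 0.03125
\end{split}
$$

The gradient is negative because the output is below the target, so the update increases the weight.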

##### Inner (Hidden) layer

For the inner layer we can write:

$$
\begin{split}
\Delta w_{ij}^{[1]} = \frac{\partial E}{\partial w_{ij}^{[1]}} \\
\Delta w_{ij}^{[1]} = \frac{\partial E}{\partial y_j^{[1]}}\frac{\partial y_j^{[1]}}{\partial z_j^{[1]}}\frac{\partial z_j^{[1]}}{\partial w_{ij}^{[1]}} \\
\Delta w_{ij}^{[1]} = \frac{\partial E}{\partial y_j^{[1]}} S'(z_j^{[1]}) x_i
\end{split}
$$

Where:

$$
\begin{split}
\frac{\partial E}{\partial y_j^{[1]}} = \frac{\partial E}{\partial y}\frac{\partial y}{\partial z_1^{[2]}}\frac{\partial z_1^{[2]}}{\partial y_j^{[1]}} \\
\frac{\partial E}{\partial y_j^{[1]}} = (y - t) S'(z_1^{[2]}) w_j^{[2]}
\end{split}
$$

Concluding:

$$
\begin{split}
\Delta w_{ij}^{[1]} = (y - t) S'(z_1^{[2]}) w_j^{[2]} S'(z_j^{[1]}) x_i
\end{split}
$$

As in the outer layer, the last factor, $x_i$, is replaced by 1 when taking the partial derivative with respect to the hidden bias $b_j^{[1]}$, and $S'$ is the derivative of $S$.
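One way to sanity-check the derived expressions is to compare them against central finite differences of $E$. The sketch below does this for the hidden-layer weights only. The helper names (`loss`, `analytic_grads`, `numeric_grad_W1`), the random seed, and the chosen input and target are all assumptions made for illustration, and the array layout `W1[i, j]` (input $i$ to hidden neuron $j$) is likewise a choice of this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(params, x, t):
    """Squared error E = 0.5 * (y - t)^2 for one sample."""
    W1, b1, w2, b2 = params
    y1 = sigmoid(x @ W1 + b1)
    y = sigmoid(y1 @ w2 + b2)
    return 0.5 * (y - t) ** 2

def analytic_grads(params, x, t):
    """Gradients from the backpropagation formulas derived above."""
    W1, b1, w2, b2 = params
    y1 = sigmoid(x @ W1 + b1)
    y = sigmoid(y1 @ w2 + b2)
    delta2 = (y - t) * y * (1 - y)          # (y - t) S'(z_1^[2])
    grad_w2 = delta2 * y1                   # outer weights
    grad_b2 = delta2                        # outer bias (last factor is 1)
    delta1 = delta2 * w2 * y1 * (1 - y1)    # (y - t) S'(z_1^[2]) w_j^[2] S'(z_j^[1])
    grad_W1 = np.outer(x, delta1)           # entry [i, j] = delta1[j] * x[i]
    grad_b1 = delta1                        # hidden biases (last factor is 1)
    return grad_W1, grad_b1, grad_w2, grad_b2

def numeric_grad_W1(params, x, t, eps=1e-6):
    """Central finite differences of E with respect to each hidden weight."""
    W1, b1, w2, b2 = params
    grad = np.zeros_like(W1)
    for i in range(2):
        for j in range(2):
            W_plus = W1.copy()
            W_plus[i, j] += eps
            W_minus = W1.copy()
            W_minus[i, j] -= eps
            grad[i, j] = (loss((W_plus, b1, w2, b2), x, t)
                          - loss((W_minus, b1, w2, b2), x, t)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
params = (rng.normal(size=(2, 2)), rng.normal(size=2),
          rng.normal(size=2), rng.normal())
x, t = np.array([1.0, 1.0]), 1.0

max_diff = np.max(np.abs(analytic_grads(params, x, t)[0]
                         - numeric_grad_W1(params, x, t)))
print(f"largest |analytic - numeric| difference: {max_diff:.2e}")
```

If the formulas are right, the reported difference should be many orders of magnitude smaller than the gradients themselves.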

---
2.

In [2]:
def sigmoid(x: float) -> float:
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x: float) -> float:
    return sigmoid(x) * (1 - sigmoid(x))

In [3]:
class BinaryTwoOneNetwork:

    def __init__(self, learning_rate: float):
        self._learning_rate = learning_rate
        
        self._inner_weights = np.random.uniform(0.0, 0.1, (2, 2))
        self._inner_biases = np.random.uniform(0.0, 0.1, 2)
        self._outer_weights = np.random.uniform(0.0, 0.1, 2)
        self._outer_bias = np.random.uniform(0.0, 0.1)

        self._inner_sums = np.empty(2)
        self._inner_outputs = np.empty(2)
    
    def _feed_forward(self, x1: float, x2:float) -> float:

        # Inner layer
        self._inner_sums[0] = self._inner_weights[0][0] * x1 + self._inner_weights[1][0] * x2 + self._inner_biases[0]
        self._inner_outputs[0] = sigmoid(self._inner_sums[0])
        self._inner_sums[1] = self._inner_weights[0][1] * x1 + self._inner_weights[1][1] * x2 + self._inner_biases[1]
        self._inner_outputs[1] = sigmoid(self._inner_sums[1])

        # Outer layer
        self._outer_sum = self._outer_weights[0] * self._inner_outputs[0] + self._outer_weights[1] * self._inner_outputs[1] + self._outer_bias
        self._output = sigmoid(self._outer_sum)

        return self._output

    def _backpropagate(self, x1: float, x2:float, target: float) -> None:
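        # delta_output = dE/dz for the output neuron: (y - t) * S'(z)
        error = (self._output - target)
        delta_output = error * sigmoid_derivative(self._outer_sum)

        # Inner layer (updated first, so it still uses the pre-update outer weights)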
        for j in range(2):
            self._inner_weights[0][j] -= self._learning_rate * delta_output * self._outer_weights[j] * sigmoid_derivative(self._inner_sums[j]) * x1
            self._inner_weights[1][j] -= self._learning_rate * delta_output * self._outer_weights[j] * sigmoid_derivative(self._inner_sums[j]) * x2
            self._inner_biases[j] -= self._learning_rate * delta_output * self._outer_weights[j] * sigmoid_derivative(self._inner_sums[j])

        # Outer layer
        for i in range(2):
            self._outer_weights[i] -= self._learning_rate * delta_output * self._inner_outputs[i]
        self._outer_bias -= self._learning_rate * delta_output

    def predict(self, x1: float, x2: float) -> int:
        # Round the sigmoid output to the nearest class label (0 or 1)
        return round(self._feed_forward(x1, x2))

In [4]:
x = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])

target = np.array([[1],
                     [0],
                     [0],
                     [1]])

network = BinaryTwoOneNetwork(0.5)
iterations = 10000
for i in range(iterations):
    for j in range(4):
        network._feed_forward(x[j][0], x[j][1])
        network._backpropagate(x[j][0], x[j][1], target[j][0])

for i in range(4):
    x1 = x[i][0]
    x2 = x[i][1]
    xnor = network.predict(x1, x2)
    print(f"x1 = {x1}, x2 = {x2}")
    print(f"Expected: {target[i][0]}")
    print(f"Got: {xnor}\n")

x1 = 0, x2 = 0
Expected: 1
Got: 1

x1 = 0, x2 = 1
Expected: 0
Got: 0

x1 = 1, x2 = 0
Expected: 0
Got: 0

x1 = 1, x2 = 1
Expected: 1
Got: 1



---
3.

In [5]:
print(f"hidden w11 = {network._inner_weights[0][0]}")
print(f"hidden w21 = {network._inner_weights[1][0]}")
print(f"hidden bias1 = {network._inner_biases[0]}")
print(f"hidden w12 = {network._inner_weights[0][1]}")
print(f"hidden w22 = {network._inner_weights[1][1]}")
print(f"hidden bias2 = {network._inner_biases[1]}")
print(f"outer w1 = {network._outer_weights[0]}")
print(f"outer w2 = {network._outer_weights[1]}")
print(f"outer bias = {network._outer_bias}")

hidden w11 = -4.705923271599746
hidden w21 = -4.70891917233781
hidden bias1 = 6.9992134963942965
hidden w12 = -6.544397381740643
hidden w22 = -6.565696896610717
hidden bias2 = 2.6719085088955197
outer w1 = -9.659639429413177
outer w2 = 9.775593015536492
outer bias = 4.573062147398275


The obtained weights differ from those of the last assignment. Although the weights of the first hidden neuron match the ratio of the AND perceptron, the others do not match any of the previously calculated weights. The weights differ because the network tries to push its output toward exactly 1 or 0, while that output is produced by a continuous function (the sigmoid) whose values lie strictly between 0 and 1. This means that even when the rounded prediction is already correct, training keeps adjusting the weights to bring the output closer to the exact target.
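To make this concrete, the sketch below feeds the four inputs through the network using the weights printed in part 3, rounded to three decimals. The rounding and the helper name `raw_output` are assumptions made for this illustration; the code is separate from the `BinaryTwoOneNetwork` class.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights reported in part 3, rounded to three decimals (an assumption
# made here for readability; the trained values have more digits).
W1 = np.array([[-4.706, -6.544],    # w11, w12
               [-4.709, -6.566]])   # w21, w22
b1 = np.array([6.999, 2.672])       # hidden biases
w2 = np.array([-9.660, 9.776])      # outer weights
b2 = 4.573                          # outer bias

def raw_output(x):
    """Unrounded network output for one input pair (hypothetical helper)."""
    return sigmoid(sigmoid(x @ W1 + b1) @ w2 + b2)

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
outputs = np.array([raw_output(x) for x in inputs])

for x, y in zip(inputs, outputs):
    print(f"x = {x}, raw output = {y:.4f}, rounded = {round(float(y))}")
```

The rounded values reproduce the expected table, but the raw outputs stay a small distance away from 0 and 1, so the squared error is never exactly zero and its gradient keeps moving the weights.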