Logs
- [2024/04/15]   
  You do not need to restart this notebook when updating the scratch library

In [1]:
import numpy as np
import tqdm

from typing import List

from scratch.linear_algebra import LinearAlgebra as la
from scratch.linear_algebra import Vector
from scratch.gradient_descent import GradientDescent as gd

In [2]:
%load_ext autoreload
%autoreload 2 

An artificial neural network (or neural network for short) is a predictive  
model motivated by the way the brain operates.

## Perceptrons

A perceptron is a mathematical model of a single neuron with $n$ binary inputs.  
For example, a perceptron with 3 binary inputs can receive the following input vectors:
- [0, 0, 0]
- [0, 0, 1]
- [0, 1, 0]
- [0, 1, 1]
- [1, 0, 0]
- ...
- [1, 1, 1]
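The $2^n$ possible binary input vectors can be enumerated with `itertools.product` from the standard library. A minimal sketch; the helper name `binary_inputs` is hypothetical:

```python
from itertools import product
from typing import List

def binary_inputs(n: int) -> List[List[int]]:
  """All 2**n binary input vectors of length n, in lexicographic order."""
  return [list(bits) for bits in product([0, 1], repeat=n)]

inputs = binary_inputs(3)
print(len(inputs))                        # 8
print(inputs[0], inputs[1], inputs[-1])   # [0, 0, 0] [0, 0, 1] [1, 1, 1]
```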

Let us define a perceptron as a Python function:

In [3]:
def step_function(x: float) -> float:
  return 1.0 if x >= 0 else 0.0

def perceptron_output(weights: Vector, bias: float, x: Vector) -> float:
  """Returns 1 if the perceptron 'fires', 0 if not"""
  calculation = la.dot(weights, x) + bias
  return step_function(calculation)

Geometrically, the perceptron is defined by the hyperplane of points
$\mathbf{x}$ satisfying
$$
  \mathbf{w} \cdot \mathbf{x} + b = 0
$$
where $\mathbf{w}$ is the weight vector, $\mathbf{x}$ is the input vector,   
and $b$ is the bias term (a scalar). The perceptron simply tells which of the   
two half-spaces separated by this hyperplane the input lies in: it fires when   
$\mathbf{w} \cdot \mathbf{x} + b \ge 0$.

With the perceptron above, we can implement an AND gate:

In [4]:
and_weights = [2., 2]
and_bias = -3.

assert perceptron_output(and_weights, and_bias, [1, 1]) == 1
assert perceptron_output(and_weights, and_bias, [0, 1]) == 0
assert perceptron_output(and_weights, and_bias, [1, 0]) == 0
assert perceptron_output(and_weights, and_bias, [0, 0]) == 0
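To make the hyperplane picture concrete, the sketch below evaluates $\mathbf{w} \cdot \mathbf{x} + b$ for the AND weights above at each corner of the unit square. Only `[1, 1]` lands on the non-negative side of the line $2x_1 + 2x_2 - 3 = 0$. It uses a plain-Python `dot` helper (hypothetical, standing in for `la.dot`) so that it runs on its own.

```python
from typing import List

def dot(v: List[float], w: List[float]) -> float:
  """Plain-Python dot product (stand-in for la.dot)."""
  return sum(v_i * w_i for v_i, w_i in zip(v, w))

and_weights = [2., 2]
and_bias = -3.

values = []
for x in [[0, 0], [0, 1], [1, 0], [1, 1]]:
  value = dot(and_weights, x) + and_bias   # proportional to the signed distance from the line
  values.append(value)
  side = "fires (>= 0)" if value >= 0 else "silent (< 0)"
  print(x, value, side)
# [0, 0] -3.0 silent (< 0)
# [0, 1] -1.0 silent (< 0)
# [1, 0] -1.0 silent (< 0)
# [1, 1] 1.0 fires (>= 0)
```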

The same perceptron can also implement OR and NOT gates:

In [5]:
or_weights = [2., 2]
or_bias = -1.

assert perceptron_output(or_weights, or_bias, [1, 1]) == 1
assert perceptron_output(or_weights, or_bias, [0, 1]) == 1
assert perceptron_output(or_weights, or_bias, [1, 0]) == 1
assert perceptron_output(or_weights, or_bias, [0, 0]) == 0

In [6]:
not_weights = [-2.]
not_bias = 1.

assert perceptron_output(not_weights, not_bias, [0]) == 1
assert perceptron_output(not_weights, not_bias, [1]) == 0

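However, no choice of weights and bias lets this perceptron implement an XOR gate:   
XOR is not linearly separable, i.e., no single line puts `[0, 1]` and `[1, 0]` on one   
side and `[0, 0]` and `[1, 1]` on the other. To solve XOR we need more complicated neural networks.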
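A quick way to see this in practice is a brute-force search. The sketch below tries every weight/bias combination on a coarse grid and finds settings for AND but none for XOR. A grid search is only evidence, not a proof (the proof is the linear-separability argument); the helper names `fires` and `search` are hypothetical.

```python
from itertools import product
from typing import Dict, Optional, Tuple

Gate = Dict[Tuple[int, int], int]

AND_GATE: Gate = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR_GATE: Gate = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def fires(w1: float, w2: float, b: float, x1: int, x2: int) -> int:
  """Step-function perceptron with two inputs."""
  return 1 if w1 * x1 + w2 * x2 + b >= 0 else 0

def search(gate: Gate) -> Optional[Tuple[float, float, float]]:
  """Return the first (w1, w2, b) on the grid that reproduces `gate`, else None."""
  grid = [i / 2 for i in range(-8, 9)]   # -4.0, -3.5, ..., 4.0
  for w1, w2, b in product(grid, repeat=3):
    if all(fires(w1, w2, b, x1, x2) == y for (x1, x2), y in gate.items()):
      return (w1, w2, b)
  return None

print(search(AND_GATE))  # some (w1, w2, b) that implements AND
print(search(XOR_GATE))  # None
```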

## Feed-Forward Neural Networks

We want to build a neuron model from a perceptron followed by a nonlinear function.  
Since this function should represent whether the neuron is activated or not,
there are two options:
- `step_function`, which has the drawback of not being continuous (its derivative
  is zero everywhere except at 0, where it is undefined, so it gives no signal
  for gradient-based training)
- `sigmoid`, a smooth approximation of the step function that is continuous
  and differentiable

To keep the neuron model simple, we append 1 to the `inputs` vector and put   
the bias into the weights vector:
$$
  \text{neuron}(\mathbf{x};\mathbf{w}, b)
  = 
  \begin{bmatrix}
    w_1 & w_2 & \ldots & w_n & b
  \end{bmatrix}
  \begin{bmatrix}
    x_1 \\ x_2 \\ \vdots \\ x_n \\ 1
  \end{bmatrix}
$$
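This augmented-vector trick can be checked numerically: appending $b$ to $\mathbf{w}$ and 1 to $\mathbf{x}$ gives the same value as $\mathbf{w} \cdot \mathbf{x} + b$. A small NumPy sketch with arbitrary example numbers:

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])   # weights w_1..w_n (example values)
b = 0.25                          # bias (example value)
x = np.array([1.0, 3.0, -2.0])    # input (example values)

w_aug = np.append(w, b)           # [w_1, ..., w_n, b]
x_aug = np.append(x, 1.0)         # [x_1, ..., x_n, 1]

print(np.dot(w, x) + b)           # -6.25
print(np.dot(w_aug, x_aug))       # -6.25
```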

We then pass this linear combination through the sigmoid function to make   
the model nonlinear:

In [7]:
def sigmoid(t: float) -> float:
  return 1 / (1 + np.exp(-t))
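Two properties of the sigmoid matter later: for large $|t|$ it behaves like `step_function`, and its derivative has the closed form $\sigma'(t) = \sigma(t)\,(1 - \sigma(t))$, which is where the `output * (1 - output)` factors in the backpropagation code come from. A quick numerical check, using a central finite difference as the reference (`sigmoid_prime` is a hypothetical helper name):

```python
import numpy as np

def sigmoid(t: float) -> float:
  return 1 / (1 + np.exp(-t))

def sigmoid_prime(t: float) -> float:
  """Closed-form derivative: sigma(t) * (1 - sigma(t))."""
  s = sigmoid(t)
  return s * (1 - s)

# Far from 0, sigmoid is close to the step function.
print(sigmoid(-10.0), sigmoid(10.0))   # roughly 4.54e-05 and 0.99995

# The closed form agrees with a central finite difference.
h = 1e-5
for t in [-2.0, 0.0, 0.5, 3.0]:
  numeric = (sigmoid(t + h) - sigmoid(t - h)) / (2 * h)
  print(t, sigmoid_prime(t), numeric)
```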

In [8]:
def neuron_output(weights: Vector, inputs: Vector) -> float:
  # weights includes the bias term, inputs includes a 1
  return sigmoid(la.dot(weights, inputs))

We will represent a neural network as a list (layers) of lists (neurons) of  
vectors (weights).

In [9]:
def feed_forward(neural_network: List[List[Vector]],
                  input_vector: Vector) -> List[Vector]:
  """Feeds the input vector through the neural network. Returns the outputs of
  all layers (not just the last one) """
  outputs: List[Vector] = []

  for layer in neural_network:
    input_with_bias = input_vector + [1]              # Add a constant.
    output = [neuron_output(neuron, input_with_bias)  # Compute the output
              for neuron in layer]                    # for each neuron.
    outputs.append(output)                            # Add to results.

    # Then the input to the next layer is the output of this one
    input_vector = output
  
  return outputs
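The list-of-lists representation maps directly onto matrices: each layer is a matrix whose rows are neurons (bias in the last column), and a layer computes $\sigma(W [\mathbf{x}; 1])$. The NumPy sketch below is a restatement of `feed_forward` for illustration, not part of the `scratch` library (`feed_forward_np` is a hypothetical name); it uses the hand-picked XOR weights from the next cell as example input.

```python
from typing import List

import numpy as np

def sigmoid(t: np.ndarray) -> np.ndarray:
  return 1 / (1 + np.exp(-t))

def feed_forward_np(layers: List[np.ndarray], x: np.ndarray) -> List[np.ndarray]:
  """Each layer is an (n_neurons, n_inputs + 1) array; the last column is the bias.
  Returns the outputs of all layers, like feed_forward."""
  outputs = []
  for W in layers:
    x = sigmoid(W @ np.append(x, 1.0))   # append the constant 1, then apply W
    outputs.append(x)
  return outputs

xor_layers = [np.array([[20., 20, -30],     # hidden: `and` neuron
                        [20., 20, -10]]),   #         `or` neuron
              np.array([[-60., 60, -30]])]  # output neuron

for x in [[0, 0], [0, 1], [1, 0], [1, 1]]:
  print(x, feed_forward_np(xor_layers, np.array(x, dtype=float))[-1][0])
```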

In [10]:
xor_network = [                    # hidden layer
                [[20., 20, -30],   # - `and` neuron
                 [20., 20, -10]],  # - `or` neuron
                                   # output layer
                [[-60., 60, -30]]  # - `or` but not `and` neuron (2nd input but not 1st)
              ]

# feed_forward returns the outputs of all layers, so the [-1] gets the
# final output, and the [0] gets the value out of the resulting vector
assert 0.000 < feed_forward(xor_network, [0, 0])[-1][0] < 0.001
assert 0.999 < feed_forward(xor_network, [1, 0])[-1][0] < 1.000
assert 0.999 < feed_forward(xor_network, [0, 1])[-1][0] < 1.000
assert 0.000 < feed_forward(xor_network, [1, 1])[-1][0] < 0.001

The feed-forward network above can be understood with the following diagram:

<img src="./img-resources/neural-nets-feed-forward.png" width=600>
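As a worked example, here is the forward pass of `xor_network` for the input $[1, 0]$ (values rounded):

$$
\begin{aligned}
h_{\text{and}} &= \sigma(20 \cdot 1 + 20 \cdot 0 - 30) = \sigma(-10) \approx 4.54 \times 10^{-5} \\
h_{\text{or}}  &= \sigma(20 \cdot 1 + 20 \cdot 0 - 10) = \sigma(10) \approx 0.99995 \\
y &= \sigma(-60\, h_{\text{and}} + 60\, h_{\text{or}} - 30) \approx \sigma(29.99) \approx 1
\end{aligned}
$$

For $[1, 1]$ both hidden neurons are close to 1, so $y \approx \sigma(-60 + 60 - 30) = \sigma(-30) \approx 0$.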

## Backpropagation

In the feed-forward network above for the XOR gate, we set the network   
parameters by hand. Now we want the network to find its parameters automatically.   
This process is called _training_. To perform training, we need a training set.  
For the XOR gate, the training set is

| `input`  | `output` |
|----------|----------|
| `[0, 0]` | `[0]`    |
| `[0, 1]` | `[1]`    |
| `[1, 0]` | `[1]`    |
| `[1, 1]` | `[0]`    |

During training, we adjust the weights with the following algorithm:
1. Run `feed_forward` on an input vector to produce the outputs of all the   
   neurons in the network.
2. We know the target output, so we can compute a _loss_: half of the sum of   
   the squared errors. (The textbook does not include the factor of one half.   
   Without it, the formula for `output_deltas` in `sqerror_gradients` must   
   include a factor of 2.)
3. Compute the gradient of this loss with respect to the output neurons' weights.
4. "Propagate" the gradients and errors backward to compute the gradients with  
   respect to the hidden neurons' weights.
5. Take a gradient descent step.

See [`neural-nets.drawio`](./img-resources/neural-nets.drawio) for the derivation
of the formulas used in `sqerror_gradients`.
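For reference, a compact version of that derivation for one hidden layer with the half squared error. The notation is chosen here and may differ from the diagram: inputs $x_i$, hidden outputs $h_j = \sigma(a_j)$ with $a_j = \sum_i w_{ji} x_i$, network outputs $y_k = \sigma(z_k)$ with $z_k = \sum_j v_{kj} h_j$, targets $t_k$, and a constant 1 appended to $\mathbf{x}$ and $\mathbf{h}$ so the biases live in $w$ and $v$.

$$
\begin{aligned}
L &= \tfrac{1}{2} \sum_k (y_k - t_k)^2 \\
\delta_k &= \frac{\partial L}{\partial z_k} = y_k (1 - y_k)(y_k - t_k),
  & \frac{\partial L}{\partial v_{kj}} &= \delta_k \, h_j \\
\delta_j^{h} &= \frac{\partial L}{\partial a_j} = h_j (1 - h_j) \sum_k \delta_k \, v_{kj},
  & \frac{\partial L}{\partial w_{ji}} &= \delta_j^{h} \, x_i
\end{aligned}
$$

Here $\delta_k$ corresponds to `output_deltas`, $\delta_j^{h}$ to `hidden_deltas`, and the two weight gradients to `output_grads` and `hidden_grads` in `sqerror_gradients`.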

In [20]:
def sqerror_gradients(network: List[List[Vector]], input_vector: Vector,
                      target_vector: Vector) -> List[List[Vector]]:
  """Given a neural network (one hidden layer + one output layer), an input
  vector, and a target vector, make a prediction and compute the gradient of
  the (half) squared error loss with respect to the neuron weights"""

  # forward pass (assumes exactly one hidden layer)
  hidden_outputs, outputs = feed_forward(network, input_vector)

  # gradients with respect to output neuron pre-activation outputs
  output_deltas = [output * (1 - output) * (output - target)
                   for output, target in zip(outputs, target_vector)]

  # gradients with respect to output neuron weights (the appended 1 is the bias input)
  output_grads = [[output_deltas[i] * hidden_output
                   for hidden_output in hidden_outputs + [1]]
                  for i in range(len(network[-1]))]

  # gradients with respect to hidden neuron pre-activation outputs
  hidden_deltas = [hidden_output * (1 - hidden_output) *
                   la.dot(output_deltas, [n[i] for n in network[-1]])
                   for i, hidden_output in enumerate(hidden_outputs)]

  # gradients with respect to hidden neuron weights
  hidden_grads = [[hidden_deltas[i] * input_ for input_ in input_vector + [1]]
                  for i in range(len(network[0]))]

  return [hidden_grads, output_grads]
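A common way to test hand-derived gradients is to compare them with central finite differences of the loss. The sketch below is self-contained: it re-implements the forward pass and the same delta formulas with plain Python lists (the helpers `forward`, `loss`, and `backprop` are hypothetical stand-ins for `feed_forward` and `sqerror_gradients`), then checks every partial derivative numerically for a random 2-2-1 network.

```python
import math
import random
from typing import List

Network = List[List[List[float]]]  # layers -> neurons -> weights (bias last)

def sigmoid(t: float) -> float:
  return 1 / (1 + math.exp(-t))

def forward(net: Network, x: List[float]) -> List[List[float]]:
  """Outputs of all layers, appending a constant 1 before each layer."""
  outputs: List[List[float]] = []
  for layer in net:
    x_aug = x + [1.0]
    x = [sigmoid(sum(w * v for w, v in zip(neuron, x_aug))) for neuron in layer]
    outputs.append(x)
  return outputs

def loss(net: Network, x: List[float], t: List[float]) -> float:
  """Half sum of squared errors."""
  y = forward(net, x)[-1]
  return 0.5 * sum((y_k - t_k) ** 2 for y_k, t_k in zip(y, t))

def backprop(net: Network, x: List[float], t: List[float]) -> Network:
  """Same delta formulas as sqerror_gradients (one hidden layer)."""
  h, y = forward(net, x)
  out_deltas = [y_k * (1 - y_k) * (y_k - t_k) for y_k, t_k in zip(y, t)]
  out_grads = [[d * h_j for h_j in h + [1.0]] for d in out_deltas]
  hid_deltas = [h_j * (1 - h_j) *
                sum(d * neuron[j] for d, neuron in zip(out_deltas, net[1]))
                for j, h_j in enumerate(h)]
  hid_grads = [[d * x_i for x_i in x + [1.0]] for d in hid_deltas]
  return [hid_grads, out_grads]

random.seed(0)
net = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)],  # hidden: 2 neurons
       [[random.uniform(-1, 1) for _ in range(3)]]]                     # output: 1 neuron
x, t = [1.0, 0.0], [1.0]

grads = backprop(net, x, t)
eps = 1e-6
max_diff = 0.0
for li, layer in enumerate(net):
  for ni, neuron in enumerate(layer):
    for wi in range(len(neuron)):
      original = neuron[wi]
      neuron[wi] = original + eps
      plus = loss(net, x, t)
      neuron[wi] = original - eps
      minus = loss(net, x, t)
      neuron[wi] = original                     # restore the weight
      numeric = (plus - minus) / (2 * eps)
      max_diff = max(max_diff, abs(numeric - grads[li][ni][wi]))

print(max_diff)  # expected to be very small, well below 1e-7
```

If a delta formula were wrong (for example, a missing `(1 - h_j)` factor), `max_diff` would typically jump to the same order of magnitude as the gradients themselves.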

Let us train a network to learn XOR. We start by generating the training  
data and initializing the network with random weights:

In [13]:
seed = 24_04_25
rng = np.random.default_rng(seed)

# training data
xs = [[0., 0], [0., 1], [1., 0], [1., 1]]
ys = [[0.], [1.], [1.], [0.]]

# start with random weights
network = [                                           # hidden layer: 2 inputs -> 2 outputs
            [[rng.random() for _ in range(2 + 1)],    # - 1st hidden neuron
             [rng.random() for _ in range(2 + 1)]],   # - 2nd hidden neuron
                                                      # output layer: 2 inputs -> 1 output
            [[rng.random() for _ in range(2 + 1)]]    # - 1st output neuron
]
init_network = [[list(neuron) for neuron in layer] for layer in network]  # deep copy of the initial weights

learning_rate = 1.0     # if you use loss function 0.5*|| out - target ||^2
# learning_rate = 0.5   # if you use loss function || out - target ||^2

for epoch in tqdm.trange(20_000, desc="neural net for xor"):
  for x, y in zip(xs, ys):
    gradients = sqerror_gradients(network, x, y)

    # Take a gradient step for each neuron in each layer
    network = [[gd.gradient_step(neuron, grad, -learning_rate)
                for neuron, grad in zip(layer, layer_grad)]
                  for layer, layer_grad in zip(network, gradients)]



neural net for xor: 100%|██████████| 20000/20000 [00:01<00:00, 17035.30it/s]


In [14]:
# check that it learned XOR
assert feed_forward(network, [0, 0])[-1][0] < 0.01
assert feed_forward(network, [0, 1])[-1][0] > 0.99
assert feed_forward(network, [1, 0])[-1][0] > 0.99
assert feed_forward(network, [1, 1])[-1][0] < 0.01

In [15]:
for x1, x2 in xs:
  print(feed_forward(network, [x1, x2]))

[[0.027046326868200243, 0.9582945652249892], [0.008066210168484208]]
[[2.828055316890695e-05, 0.04268833549811431], [0.9907825583757307]]
[[0.9580605062851266, 0.9999295783071573], [0.992432080159848]]
[[0.02271307293959756, 0.9649808000792041], [0.007170531277044988]]


In [16]:
display(init_network)
network

[[[0.8748572217914731, 0.2333282645901471, 0.29831543557166096],
  [0.5298359741305657, 0.6033836721509632, 0.3441243903330605]],
 [[0.6883007269360694, 0.7843383033775261, 0.5413104665657402]]]

[[[6.711468244221999, -6.890522612714037, -3.5827852630319876],
  [6.426415021705044, -6.2447270565192365, 3.134523759418311]],
 [[10.883927950893137, -10.685196688788292, 5.133223033529587]]]

## Example: Fizz Buzz