# Chapter 1: Using neural nets to recognize handwritten digits

We start by defining two types of artificial neurons: perceptrons and sigmoid neurons.

## Perceptrons

A perceptron is a simple artificial neuron (a node in a neural network) that takes $n$ binary inputs $x_1, \dots, x_n$ and produces a single binary output. It attaches a weight $w_j$ to each input. If the weighted sum $\sum_j w_j x_j$ is greater than a given threshold, the neuron fires (outputs 1); otherwise it outputs 0.


In [4]:
import numpy as np

class perceptron:
    def fire(self, inputs, weights, threshold):
        return 1 if np.sum(inputs*weights) > threshold else 0

In [2]:
my_neuron = perceptron()
my_neuron.fire(
    inputs = np.array([1, 1]), 
    weights = np.array([0.6, 0.5]), 
    threshold = 1
)

1

By stacking such neurons in multiple layers, where each layer weighs up the outputs of the layer before it, a network can make increasingly complex and abstract decisions.
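As a small illustration of layering (a sketch, not part of the original notebook), the two-layer network below computes XOR. No single perceptron can do this, because the inputs that should give 1, (0, 1) and (1, 0), cannot be separated from (0, 0) and (1, 1) by one straight line. The weights and thresholds are hand-picked assumptions, and `perceptron_fire` and `two_layer_xor` are hypothetical helper names.

```python
import numpy as np

def perceptron_fire(inputs, weights, threshold):
    # Perceptron rule: output 1 if the weighted sum exceeds the threshold, else 0
    return 1 if np.dot(inputs, weights) > threshold else 0

def two_layer_xor(x1, x2):
    x = np.array([x1, x2])
    # Layer 1: an OR-like neuron and an AND-like neuron look at the raw inputs
    h_or = perceptron_fire(x, np.array([1, 1]), threshold=0.5)
    h_and = perceptron_fire(x, np.array([1, 1]), threshold=1.5)
    # Layer 2: decides based on layer 1's decisions; fires if OR is on but AND is off
    return perceptron_fire(np.array([h_or, h_and]), np.array([1, -1]), threshold=0.5)

xor_table = {(a, b): two_layer_xor(a, b) for a in (0, 1) for b in (0, 1)}
print(xor_table)  # {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
```

The second-layer neuron never sees the raw inputs; it only combines the "decisions" made by the first layer.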

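For notational convenience that will become clear later on, we replace the *threshold* with a *bias* $b \equiv -\text{threshold}$ and move it to the other side of the inequality, so the perceptron outputs 1 if $w\cdot x + b > 0$ and 0 otherwise. The bias measures how easy it is to get the neuron to fire: a large positive bias makes firing easy, while a large negative bias makes it difficult.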

In [3]:
class perceptron:
    def fire(self, inputs, weights, bias):
        return 1 if np.dot(inputs, weights) + bias > 0 else 0

In [4]:
my_new_neuron = perceptron()
my_new_neuron.fire(
    inputs = np.array([0, 0, 1]), 
    weights = np.array([2, 2, 6]), 
    bias = -5
)

1
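To see that nothing is lost by the switch, here is a quick check (a sketch with hypothetical helper names `fire_threshold` and `fire_bias`) that the threshold form and the bias form with $b = -\text{threshold}$ agree on every binary input for the weights used above.

```python
import itertools
import numpy as np

def fire_threshold(inputs, weights, threshold):
    return 1 if np.dot(inputs, weights) > threshold else 0

def fire_bias(inputs, weights, bias):
    return 1 if np.dot(inputs, weights) + bias > 0 else 0

w = np.array([2, 2, 6])
threshold = 5
# Enumerate all 2**3 binary input vectors and compare both formulations
agree = [
    fire_threshold(np.array(x), w, threshold) == fire_bias(np.array(x), w, -threshold)
    for x in itertools.product([0, 1], repeat=3)
]
print(all(agree))  # True
```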

Perceptrons can be configured to compute logical functions such as AND, OR, and NAND. For example, here is a NAND perceptron:

In [5]:
class nand_perceptron:
    weights = np.array([-2, -2])
    bias = 3
    def fire(self, inputs, weights=weights, bias=bias):
        return 1 if np.dot(inputs, weights) + bias > 0 else 0

In [6]:
nand_neuron = nand_perceptron()
nand_neuron.fire(inputs = np.array([1, 0]))

1
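The AND and OR cases work the same way. The weights and biases below are one possible hand-picked choice (many others work), and `fire_bias` is a hypothetical helper. The comprehension builds each gate's truth table for the inputs (0, 0), (0, 1), (1, 0), (1, 1).

```python
import itertools
import numpy as np

def fire_bias(inputs, weights, bias):
    # Perceptron rule in bias form: output 1 if w.x + b > 0, else 0
    return 1 if np.dot(inputs, weights) + bias > 0 else 0

# (weights, bias) pairs; hand-picked, not unique
gates = {
    "AND": (np.array([1, 1]), -1.5),   # fires only when both inputs are 1
    "OR": (np.array([1, 1]), -0.5),    # fires when at least one input is 1
    "NAND": (np.array([-2, -2]), 3),   # the NAND perceptron above
}

truth_tables = {
    name: [fire_bias(np.array(x), w, b) for x in itertools.product([0, 1], repeat=2)]
    for name, (w, b) in gates.items()
}
print(truth_tables)  # {'AND': [0, 0, 0, 1], 'OR': [0, 1, 1, 1], 'NAND': [1, 1, 1, 0]}
```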

The NAND perceptron is especially interesting, because NAND is universal for computation: any computation can be built up out of NAND gates alone. For comparison, here is a conventional NAND gate:

In [7]:
class classic_nand_gate:
    def fire(self, inputs):
        # Conventional Boolean NAND: False only when both inputs are true
        return not (inputs[0] and inputs[1])

In [8]:
my_nand_gate = classic_nand_gate()
my_nand_gate.fire(inputs = [True, True])

False

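For example, we can build a circuit that adds two bits (a half adder) out of five NAND gates: the sum bit is the XOR of the two inputs, and the carry bit is their AND. And since we can build NAND out of a perceptron, perceptrons are universal for computation too.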

In [9]:
def add_bits(bit1, bit2, type_of_gate):
    # Half adder built entirely from NAND gates; returns "<carry><sum>" as a string
    if type_of_gate == "classic":
        gate = classic_nand_gate()
    elif type_of_gate == "perceptron":
        gate = nand_perceptron()
    else:
        raise ValueError(f"unknown type_of_gate: {type_of_gate!r}")
    
    gate0_out = gate.fire(inputs=[bit1, bit2])            # NAND(a, b)
    gate1_out = gate.fire(inputs=[bit1, gate0_out])       # NAND(a, NAND(a, b))
    gate2_out = gate.fire(inputs=[gate0_out, bit2])       # NAND(NAND(a, b), b)
    gate3_out = gate.fire(inputs=[gate1_out, gate2_out])  # a XOR b
    gate4_out = gate.fire(inputs=[gate0_out, gate0_out])  # NOT NAND(a, b) = a AND b
    
    bit_sum = gate3_out
    carry_bit = gate4_out
    
    return str(int(carry_bit)) + str(int(bit_sum))

In [10]:
assert add_bits(1, 1, type_of_gate="classic") == '10'
assert add_bits(0, 1, type_of_gate="classic") == '01'
assert add_bits(1, 0, type_of_gate="classic") == '01'
assert add_bits(0, 0, type_of_gate="classic") == '00'

assert add_bits(1, 1, type_of_gate="perceptron") == '10'
assert add_bits(0, 1, type_of_gate="perceptron") == '01'
assert add_bits(1, 0, type_of_gate="perceptron") == '01'
assert add_bits(0, 0, type_of_gate="perceptron") == '00'
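The half adder handles only two single bits. As a sketch of how NAND gates keep composing (assuming the standard textbook full-adder construction; `nand`, `full_adder`, and `add_numbers` are hypothetical names, not part of the original notebook), two half adders plus one carry gate make a full adder, and chaining full adders gives a ripple-carry adder for multi-bit numbers in which every gate is the NAND perceptron.

```python
import numpy as np

def nand(a, b):
    # The NAND perceptron from above: weights (-2, -2), bias 3
    return 1 if np.dot([a, b], [-2, -2]) + 3 > 0 else 0

def full_adder(a, b, carry_in):
    # First half adder: a + b
    n1 = nand(a, b)
    s1 = nand(nand(a, n1), nand(n1, b))          # a XOR b
    # Second half adder: s1 + carry_in
    n2 = nand(s1, carry_in)
    s = nand(nand(s1, n2), nand(n2, carry_in))   # s1 XOR carry_in
    # carry_out = (a AND b) OR (s1 AND carry_in), which equals NAND(n1, n2)
    return s, nand(n1, n2)

def add_numbers(x, y, n_bits=8):
    # Ripple-carry adder: each bit position's carry feeds into the next position
    carry, total = 0, 0
    for i in range(n_bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total | (carry << n_bits)

print(add_numbers(200, 100))  # 300
```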

This isn't so exciting on its own: it just shows that perceptrons are another way of building NAND gates. What makes them interesting is that their weights and biases can be *tuned*. Because perceptrons can be configured into lower-level logical functions, which in turn combine into higher-level computations, we can devise learning algorithms that automatically adjust a network's weights and biases in response to training data, instead of wiring up a circuit by hand. In principle, we can then end up with a neural network that is trained to perform an arbitrary computation.

## Sigmoid neurons

If we want to train our neural network by slightly tuning the weights and biases of its neurons, we need small changes in the weights and biases to cause only small changes in the output. Perceptrons make this hard: because they can only output 0 or 1, a small change to a single perceptron's weights or bias can flip its output completely, which may then change the behaviour of the rest of the network drastically. There's no middle ground. That's why we'll benefit from using *sigmoid neurons* instead.

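Like a perceptron, a sigmoid neuron has weights $w$ and a bias $b$, but its inputs can take any value between 0 and 1, and its output is not 0 or 1 but $\sigma(w\cdot x + b)$, where $\sigma(z) = \frac{1}{1+e^{-z}}$ is the *sigmoid function*.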

(In this case $\sigma()$ is an example of an *activation function* $f()$. We'll see other examples later on.)

In [11]:
import math

class sigmoid_neuron:
    def fire(self, inputs, weights, bias):
        return 1 / (1 + math.exp(-np.dot(inputs, weights) - bias))

In [12]:
my_sig_neuron = sigmoid_neuron()
my_sig_neuron.fire(inputs=np.array([1, 2, 3]), weights=np.array([1, 1, 1]), bias=-10)

0.01798620996209156
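To see the difference concretely, here is a sketch (hypothetical helpers `step_fire` and `sigmoid_fire`, hand-picked values) of a neuron sitting exactly on its decision boundary, $w\cdot x + b = 0$. Nudging the bias by 0.001 flips the perceptron's output from 0 to 1, while the sigmoid neuron's output barely moves.

```python
import numpy as np

def step_fire(x, w, b):
    # Perceptron: jumps from 0 to 1 as soon as w.x + b becomes positive
    return 1 if np.dot(x, w) + b > 0 else 0

def sigmoid_fire(x, w, b):
    # Sigmoid neuron: a smooth function of w.x + b
    return 1 / (1 + np.exp(-(np.dot(x, w) + b)))

x = np.array([1.0, 1.0])
w = np.array([0.5, 0.5])
b = -1.0       # w.x + b = 0 exactly, so the neuron sits on its decision boundary
nudge = 0.001  # a tiny change to the bias

step_before, step_after = step_fire(x, w, b), step_fire(x, w, b + nudge)
sig_before, sig_after = sigmoid_fire(x, w, b), sigmoid_fire(x, w, b + nudge)
print(step_before, step_after)  # 0 1
print(sig_before, sig_after)    # 0.5, then a value just above 0.5
```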

### Exercises

#### Sigmoid neurons simulating perceptrons, part I

Suppose we take all the weights and biases in a network of perceptrons, and multiply them by a positive constant, $c>0$. Show that the behaviour of the network doesn't change.

In [13]:
my_perceptron = perceptron()
inputs = np.array([1, 2, 3])
weights = np.array([3, 3, 4])
bias = -3

for positive_constant in range(1, 101):
    original = my_perceptron.fire(inputs = inputs, weights = weights, bias = bias)
    modified = my_perceptron.fire(inputs = inputs, weights = positive_constant*weights, bias = positive_constant*bias)
    assert original == modified
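A slightly broader spot check (still not a proof) scales every weight and bias of a small, randomly generated network of perceptrons and compares its outputs on all binary inputs. The layer sizes, the random integer weights, and the helper names `perceptron_layer` and `network_output` are assumptions made for illustration. The values of $c$ are chosen so that the floating-point arithmetic stays exact and rounding cannot create a spurious sign change.

```python
import itertools
import numpy as np

def perceptron_layer(a, W, b):
    # One layer of perceptrons: row i of W and entry i of b define perceptron i
    return (W @ a + b > 0).astype(int)

def network_output(x, layers, c=1):
    # Scale every weight and bias by c, then run the layers in order
    a = np.asarray(x)
    for W, b in layers:
        a = perceptron_layer(a, c * W, c * b)
    return a

rng = np.random.default_rng(42)
sizes = [3, 4, 2]  # hypothetical network: 3 inputs, 4 hidden perceptrons, 2 outputs
layers = [
    (rng.integers(-3, 4, size=(n_out, n_in)), rng.integers(-3, 4, size=n_out))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

n_checks = 0
for x in itertools.product([0, 1], repeat=sizes[0]):
    original = network_output(x, layers)
    # 0.125 and 0.5 are powers of two, 3 and 1000 are integers: c * W and c * b stay exact
    for c in (0.125, 0.5, 3, 1000):
        assert np.array_equal(network_output(x, layers, c), original)
        n_checks += 1
print(n_checks, "checks passed")
```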

Numerical checks like this only sample a few cases. For a proof, note that a network of perceptrons is made up only of perceptrons, so if scaling by $c$ leaves every individual perceptron's output unchanged for any input, the network's overall behaviour cannot change either.

The rule for a perceptron is:

$$\text{output} = \begin{cases} 0 & \text{if } w\cdot x + b \le 0 \\ 1 & \text{if } w\cdot x + b > 0 \end{cases}$$

Let's expand out the dot product and see whether it's possible to change the sign of $cw\cdot x + cb$ just by varying $c$.

$$\begin{aligned}
cw\cdot x + cb &= c\sum_{j}w_{j}x_{j} + cb \\
&= cw_{1}x_{1} + cw_{2}x_{2} + \dots + cw_{n}x_{n} + cb \\
&= c(w_{1}x_{1} + w_{2}x_{2} + \dots + w_{n}x_{n} + b) \\
&= c(w\cdot x + b)
\end{aligned}$$

Since $c > 0$, $c(w\cdot x + b)$ always has the same sign as $w\cdot x + b$, and it is zero exactly when $w\cdot x + b$ is zero. So every perceptron produces the same output as before, and therefore the behaviour of the whole network of perceptrons doesn't change.

#### Sigmoid neurons simulating perceptrons, part II

Suppose we have the same setup as the last problem: a network of perceptrons. Suppose also that the overall input to the network of perceptrons has been chosen. We won't need the actual input value; we just need the input to have been fixed. Suppose the weights and biases are such that $w\cdot x+b \neq 0$ for the input $x$ to any particular perceptron in the network. Now replace all the perceptrons in the network by sigmoid neurons, and multiply the weights and biases by a positive constant $c>0$. Show that in the limit as $c\to\infty$ the behaviour of this network of sigmoid neurons is exactly the same as the network of perceptrons. How can this fail when $w\cdot x+b=0$ for one of the perceptrons?

So we're trying to show that this expression behaves exactly like a perceptron as $c\to\infty$:

$$\begin{aligned}
\lim_{c\to\infty} \sigma(cw\cdot x + cb) &= \lim_{c\to\infty} \sigma(c(w\cdot x + b)) \\
&= \lim_{c\to\infty} \frac{1}{1 + e^{c(-w\cdot x - b)}}
\end{aligned}$$

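If $w\cdot x + b > 0$, the exponent $c(-w\cdot x - b)$ tends to $-\infty$, so the exponential tends to 0 and the output tends to 1. If $w\cdot x + b < 0$, the exponent tends to $+\infty$, so the exponential grows without bound and the output tends to 0. This is exactly the behaviour of a perceptron, which outputs 1 if $w\cdot x + b > 0$ and 0 if $w\cdot x + b < 0$. Applying this layer by layer completes the argument: a layer's inputs converge to the perceptron network's values, and since $w\cdot x + b \neq 0$ at every perceptron, the weighted input computed from those slightly perturbed values keeps the same sign once $c$ is large enough, so the next layer's outputs converge as well.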

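This fails when $w\cdot x + b = 0$ for one of the perceptrons: then $\sigma(c(w\cdot x + b)) = \sigma(0) = \frac{1}{1 + e^{0}} = 0.5$ for every $c$, so the limit is 0.5, which is an impossible output for a perceptron (it would output 0).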
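The limit can also be watched numerically. One practical snag: the `sigmoid_neuron` class above uses `math.exp(-z)`, which raises `OverflowError` once its argument exceeds roughly 709, so large values of $c$ need a numerically stable form. The sketch below (hypothetical helpers `stable_sigmoid` and `perceptron_output`, hand-picked weights) rewrites $\sigma(z)$ so it only ever exponentiates $-|z|$, and shows the three cases: positive, negative, and zero $w\cdot x + b$.

```python
import numpy as np

def stable_sigmoid(z):
    # Logistic function that never overflows: it only exponentiates -|z| <= 0
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))
    # z >= 0: 1 / (1 + e^{-z});  z < 0: e^{z} / (1 + e^{z}), the same value rearranged
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

def perceptron_output(z):
    # Perceptron rule in bias form, applied to z = w.x + b
    return 1 if z > 0 else 0

w = np.array([0.5, -1.0, 0.25])
b = -0.5
cases = {
    "w.x + b > 0": np.array([1, 0, 1]),  # 0.5 + 0.25 - 0.5 = 0.25
    "w.x + b < 0": np.array([0, 1, 1]),  # -1.0 + 0.25 - 0.5 = -1.25
    "w.x + b = 0": np.array([1, 0, 0]),  # 0.5 - 0.5 = 0
}

limits = {}
for label, x in cases.items():
    z = np.dot(x, w) + b
    # Multiplying the weights and bias by c gives sigma(c * (w.x + b))
    limits[label] = [float(stable_sigmoid(c * z)) for c in (1, 10, 100, 10_000)]
    print(label, "perceptron:", perceptron_output(z), "sigmoid:", limits[label])
```

With $c = 10{,}000$ the first two cases reach exactly 1.0 and 0.0 in floating point, while the third stays at 0.5 for every $c$.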

## The architecture of neural networks

Neural networks in which the output from one layer is used as the input to the next, with no loops, are called *feedforward* neural networks: information is only ever fed forward, never back. They will be the focus of the book. (See also *recurrent* neural networks, which allow feedback loops within the network.)
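A feedforward pass is just the sigmoid rule applied one layer at a time: each layer turns the previous activations $a$ into $\sigma(Wa + b)$ using its own weight matrix $W$ and bias vector $b$. The sketch below assumes a 784-15-10 layout (28×28 input pixels, 15 hidden neurons, 10 output neurons) with random weights standing in for trained ones; `feedforward`, `layer_weights`, and `layer_biases` are hypothetical names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(a, layer_weights, layer_biases):
    # Each layer uses the previous layer's output as its input: a <- sigma(W a + b)
    for W, b in zip(layer_weights, layer_biases):
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
sizes = [784, 15, 10]  # assumed layer sizes: input, hidden, output
layer_weights = [rng.standard_normal((n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
layer_biases = [rng.standard_normal(n_out) for n_out in sizes[1:]]

x = rng.random(784)  # stand-in for a flattened image, pixel intensities in [0, 1)
output = feedforward(x, layer_weights, layer_biases)
print(output.shape)  # (10,)
```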

### Exercise

There is a way of determining the bitwise representation of a digit by adding an extra layer to the three-layer network above. The extra layer converts the output from the previous layer into a binary representation, as illustrated in the figure below. Find a set of weights and biases for the new output layer. Assume that the first 3 layers of neurons are such that the correct output in the third layer (i.e., the old output layer) has activation at least 0.99, and incorrect outputs have activation less than 0.01.

![](architecture_exercise.png)

Bitwise representations of the numbers from 0 to 9:

- 0: 0000
- 1: 0001
- 2: 0010
- 3: 0011
- 4: 0100
- 5: 0101
- 6: 0110
- 7: 0111
- 8: 1000
- 9: 1001
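Python's built-in `format(d, "04b")` produces these 4-bit strings directly. The `output_vector` entries in the check below list the same bits in the opposite order, least significant bit first; a short sketch of the conversion:

```python
# 4-bit binary strings for the digits 0-9, most significant bit first
bit_strings = {d: format(d, "04b") for d in range(10)}

# The same bits least significant first, matching the output_vector convention below
lsb_first = {d: [int(ch) for ch in reversed(s)] for d, s in bit_strings.items()}

print(bit_strings[1], lsb_first[1])  # 0001 [1, 0, 0, 0]
```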

In [87]:
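def new_output_layer(input_vector, weights_matrix, bias_vector):
    # Each row of weights_matrix is one output neuron. It fires (True) when its
    # weighted input plus bias is positive, i.e. it behaves like a perceptron.
    return weights_matrix @ input_vector + bias_vector > 0

# Row k has weight 2 for every digit whose binary representation includes 2**k,
# so the output vector lists the bits least significant first (1 -> [1, 0, 0, 0]).
weights_matrix = np.array([[0, 2, 0, 2, 0, 2, 0, 2, 0, 2],
                           [0, 0, 2, 2, 0, 0, 2, 2, 0, 0],
                           [0, 0, 0, 0, 2, 2, 2, 2, 0, 0],
                           [0, 0, 0, 0, 0, 0, 0, 0, 2, 2]])
bias_vector = np.array([-1, -1, -1, -1])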

In [88]:
mappings = {
    0: {
        "input_vector": np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
        "output_vector": np.array([0, 0, 0, 0])
    },
    1: {
        "input_vector": np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0]),
        "output_vector": np.array([1, 0, 0, 0])
    },
    2: {
        "input_vector": np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0]),
        "output_vector": np.array([0, 1, 0, 0])
    },
    3: {
        "input_vector": np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0]),
        "output_vector": np.array([1, 1, 0, 0])
    },
    4: {
        "input_vector": np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0]),
        "output_vector": np.array([0, 0, 1, 0])
    },
    5: {
        "input_vector": np.array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0]),
        "output_vector": np.array([1, 0, 1, 0])
    },
    6: {
        "input_vector": np.array([0, 0, 0, 0, 0, 0, 1, 0, 0, 0]),
        "output_vector": np.array([0, 1, 1, 0])
    },
    7: {
        "input_vector": np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0]),
        "output_vector": np.array([1, 1, 1, 0])
    },
    8: {
        "input_vector": np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 0]),
        "output_vector": np.array([0, 0, 0, 1])
    },
    9: {
        "input_vector": np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1]),
        "output_vector": np.array([1, 0, 0, 1])
    }
}

for digit in range(10):
    output = new_output_layer(
        input_vector=mappings[digit]["input_vector"], 
        weights_matrix=weights_matrix, 
        bias_vector=bias_vector
    )
    expected_output = mappings[digit]["output_vector"]
    assert (output == expected_output).all()
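The check above feeds the new layer exact one-hot vectors, but the exercise only guarantees that the correct old output neuron has activation at least 0.99 and the others less than 0.01. Under those conditions a set bit still gets weighted input at least $2(0.99) - 1 = 0.98$, and a clear bit at most $5 \cdot 2 \cdot 0.01 - 1 = -0.9$. The sketch below tests that worst case with sigmoid output neurons instead of the step rule. It rebuilds the same weight matrix from each digit's bits and, as an assumption not taken from the exercise, multiplies the weights and biases by 10 (the hypothetical `scale`) so the sigmoid outputs land close to 0 and 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Rebuild the 4x10 weight matrix above: entry (k, d) is 2 when bit k of digit d is 1
W = np.array([[2 * ((d >> k) & 1) for d in range(10)] for k in range(4)])
b = np.full(4, -1.0)
scale = 10.0  # hypothetical scaling factor that pushes sigmoid outputs towards 0 and 1

bit_outputs = {}
for d in range(10):
    # Worst case allowed by the exercise: correct neuron at 0.99, all others just below 0.01
    a = np.full(10, 0.0099)
    a[d] = 0.99
    bit_outputs[d] = sigmoid(scale * (W @ a + b))
    expected = np.array([(d >> k) & 1 for k in range(4)])  # least significant bit first
    assert np.array_equal(bit_outputs[d] > 0.5, expected.astype(bool))

print(np.round(bit_outputs[9], 4))
```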