So I wanted to understand how neural networks work for the upcoming data mining exam (in the next 8 hours).
I think the best way to understand neural networks is to build one.

This notebook is based on:

- The lecture slides on neural networks from the Data Mining course, from the textbook _Data Mining for Business Intelligence_ (Shmueli, Patel & Bruce).
- The lecture slides on neural networks from the Machine Learning course, from the textbook _Machine Learning_ (T. Mitchell, McGraw Hill, 1997).
- The homework worksheet from the Data Mining course.

In [1]:
from pylab import *

# Constructing the Network

A __neural network__ is made of neurons arranged in layers, with each layer connected to the next.
A neuron has inputs and outputs, and each connection has a weight associated with it. So let's first construct a neuron.

In [2]:
class Neuron:
    def __init__(self, value=None):
        self.value = value
        self.inputs = []
        self.outputs = []

A `Connection` class holds the weight between two neurons.

In [3]:
class Connection:
    def __init__(self, weight):
        self.weight = weight

Let's construct our network based on the exercise sheet.

In [4]:
fat    = Neuron()
salt   = Neuron()
output = Neuron()

layers = [
    [fat, salt],
    [Neuron(), Neuron(), Neuron()],
    [output],
]

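The `connect()` function links two neurons with a weighted connection. The `connect_layer()` function connects every neuron in layer `a` to every neuron in layer `b`. Each row `weights[j]` holds the bias weight for `b[j]` first, followed by one weight per neuron in `a`. The bias is modeled as an extra input neuron whose value is fixed at 1.0.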

In [5]:
def connect(a, b, weight):
    conn = Connection(weight)
    a.outputs.append((b, conn))
    b.inputs.append((a, conn))

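def connect_layer(a, b, weights):
    # weights[j] = [bias weight, weight from a[0], weight from a[1], ...]
    for j in range(len(b)):
        # The bias is a constant input of 1.0 with its own weight.
        connect(Neuron(1.0), b[j], weights[j][0])
        for i in range(len(a)):
            connect(a[i], b[j], weights[j][i + 1])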

Let's connect the input layer to the hidden layer...

In [6]:
connect_layer(layers[0], layers[1], [
    [-0.2, 0.02, 0.03],
    [0.3, 0.01, -0.01],
    [0.1, -0.01, 0.01],
])

...then on to the output layer.

In [7]:
connect_layer(layers[1], layers[2], [
    [0.015, 0.015, 0.03, 0.01],
])

# Forward Propagation

First, let's feed in the first training example:

In [8]:
fat.value = 0.2
salt.value = 0.9

To compute the value of a non-input neuron, we first take the weighted sum of its inputs.

In [9]:
def weighted_sum(inputs):
    return sum(input.value * connection.weight
        for (input, connection) in inputs)

In [10]:
weighted_sum(layers[1][0].inputs)

-0.169
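
Written out by hand for the first hidden neuron, whose inputs are the bias (value 1.0, weight -0.2), fat (value 0.2, weight 0.02), and salt (value 0.9, weight 0.03):

$$ s = (1.0)(-0.2) + (0.2)(0.02) + (0.9)(0.03) = -0.2 + 0.004 + 0.027 = -0.169 $$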

Then we pass it through an activation function $g(s)$.
For neural networks, it is common to use the logistic (_sigmoid_) function $\sigma$, which is defined as:

$$ g(s) = \sigma(s) = \frac{1}{1 + e^{-s}} $$

In [11]:
import math

def g(s):
    return 1 / (1 + math.exp(-s))

In [12]:
g(weighted_sum(layers[1][0].inputs))

0.4578502721432993

So, here's how to compute the value of a neuron with inputs:

In [13]:
def compute(neuron):
    neuron.value = g(weighted_sum(neuron.inputs))

We compute each neuron's value layer by layer, from the hidden layer through to the output layer.

In [14]:
def forward_propagation():
    for i in range(1, len(layers)):
        for neuron in layers[i]:
            compute(neuron)

In [15]:
forward_propagation()

Let's see what our network looks like.

In [16]:
def print_network():
    for layer in layers:
        print(', '.join('%.3f' % neuron.value for neuron in layer))

print_network()

0.200, 0.900
0.458, 0.573, 0.527
0.511
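
As a sanity check, the same forward pass can be written as plain arithmetic, without the `Neuron` and `Connection` classes. This is only a sketch: the names `sigmoid`, `layer_output`, `hidden_weights`, and `output_weights` are introduced here for illustration, and each weight row lists the bias weight first, matching the layout passed to `connect_layer()`.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Each row: [bias weight, weight for input 1, weight for input 2, ...]
hidden_weights = [
    [-0.2, 0.02, 0.03],
    [0.3, 0.01, -0.01],
    [0.1, -0.01, 0.01],
]
output_weights = [0.015, 0.015, 0.03, 0.01]

def layer_output(weight_row, inputs):
    # The bias input is fixed at 1.0, so its weight is added as-is.
    s = weight_row[0] + sum(w * x for w, x in zip(weight_row[1:], inputs))
    return sigmoid(s)

inputs = [0.2, 0.9]  # fat, salt
hidden = [layer_output(row, inputs) for row in hidden_weights]
result = layer_output(output_weights, hidden)

print(', '.join('%.3f' % h for h in hidden))
print('%.3f' % result)
```

The two printed lines should match the hidden-layer and output-layer rows shown above.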


# Backpropagation

The network outputs 0.511; we call this the output ($o$). The actual target ($t$) is 1, so we need to adjust the weights so that the network performs better.
First, we calculate the error terms, then we update the weights based on these error terms.

Note that these error terms are only valid for sigmoid units!
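
The reason is the factor $o(1-o)$ in the formulas below: it is the derivative of the sigmoid, $\sigma'(s) = \sigma(s)(1 - \sigma(s))$. The following sketch checks that identity numerically with a central difference. The helper names `sigmoid_derivative` and `numerical_derivative` and the step size `h` are choices made for this illustration.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_derivative(s):
    # Closed form: sigma'(s) = sigma(s) * (1 - sigma(s))
    o = sigmoid(s)
    return o * (1.0 - o)

def numerical_derivative(f, s, h=1e-6):
    # Central difference approximation of f'(s)
    return (f(s + h) - f(s - h)) / (2.0 * h)

for s in (-2.0, -0.169, 0.0, 1.5):
    analytic = sigmoid_derivative(s)
    numeric = numerical_derivative(sigmoid, s)
    print('%6.3f  %.6f  %.6f' % (s, analytic, numeric))
```

The two columns of derivatives should agree to several decimal places. A unit with a different activation function would need a different factor in place of $o(1-o)$.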

## Error term for output neuron

$$ \delta = o(1-o)(t-o) $$

In [17]:
def error_output(neuron, target):
    output = neuron.value
    return output * (1 - output) * (target - output)

error_output(output, 1.0)

0.1221706509801848

Let's create a dictionary holding the error terms.

In [18]:
error_terms = { }
error_terms[id(output)] = error_output(output, 1.0)

## Error term for hidden neuron

$$ \delta = o(1-o)\sum_{k \in \text{outputs}}{w_k\delta_k} $$

In [19]:
def error_hidden(neuron, error_terms):
    output = neuron.value
    return (output * (1 - output) *
        sum(connection.weight * error_terms[id(node)]
            for (node, connection) in neuron.outputs))

error_hidden(layers[1][-1], error_terms)

0.0003045540855178085
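
Plugging in the rounded values for this last hidden neuron ($o \approx 0.527$, outgoing weight $w = 0.01$, and output error term $\delta_k \approx 0.1222$) gives the same result by hand:

$$ \delta \approx 0.527 \times (1 - 0.527) \times (0.01 \times 0.1222) \approx 0.000305 $$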

## Recap: calculating the error terms

We go backwards from the output layer and calculate the error term for each neuron.

In [20]:
def calculate_error_terms(expected_outputs):
    error_terms = {}
    output_neurons = layers[-1]
    for i in range(len(output_neurons)):
        neuron = output_neurons[i]
        error_terms[id(neuron)] = error_output(neuron, expected_outputs[i])
    for k in range(-2, -len(layers), -1):
        for neuron in layers[k]:
            error_terms[id(neuron)] = error_hidden(neuron, error_terms)
    return error_terms

error_terms = calculate_error_terms([1.0])

Let's visualize them!

In [21]:
def print_network_with_errors(error_terms):
    def neuron_repr(neuron):
        value = '%.3f' % neuron.value
        error_term = error_terms.get(id(neuron))
        # Compare against None so an error term of exactly 0.0 still prints.
        if error_term is not None:
            return value + '(%f)' % error_term
        else:
            return value
    for layer in layers:
        print(', '.join(neuron_repr(neuron) for neuron in layer))

print_network_with_errors(error_terms)

0.200, 0.900
0.458(0.000455), 0.573(0.000897), 0.527(0.000305)
0.511(0.122171)


## Adjusting weight

$$ \Delta w_{i, j} = \eta \delta_j x_i  $$

Here $\eta$ is the learning rate, $\delta_j$ is the error term of the receiving neuron $j$, and $x_i$ is the value coming in from neuron $i$. The learning rate is usually a small number like 0.1, but for our purposes we'll learn very quickly with $\eta = 0.5$.

In [22]:
learning_rate = 0.5

def adjust_weight(neuron, error_terms):
    if id(neuron) not in error_terms: return
    error = error_terms[id(neuron)]
    for (input, connection) in neuron.inputs:
        connection.weight += learning_rate * error * input.value

def adjust_weights(error_terms):
    for layer in layers:
        for neuron in layer:
            adjust_weight(neuron, error_terms)

adjust_weights(error_terms)

Let's see the new weights!

In [23]:
def print_network_with_weights():
    def neuron_repr(neuron):
        return '%.3f[%s]' % (
            neuron.value,
            ','.join('%.3f' % connection.weight
                         for (_, connection) in neuron.inputs))
    for layer in layers:
        print(', '.join(neuron_repr(neuron) for neuron in layer))

print_network_with_weights()

0.200[], 0.900[]
0.458[-0.200,0.020,0.030], 0.573[0.300,0.010,-0.010], 0.527[0.100,-0.010,0.010]
0.511[0.076,0.043,0.065,0.042]
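
The new output-layer weights can be checked by hand with the update rule. The sketch below uses the rounded hidden values from the earlier printout, so it only agrees to about three decimal places. The names `delta_output`, `connections`, and `new_weights` are made up for this illustration.

```python
learning_rate = 0.5
delta_output = 0.1221706509801848  # error term of the output neuron

# (old weight, input value) for each connection into the output neuron;
# the first entry is the bias, whose input value is fixed at 1.0.
connections = [
    (0.015, 1.0),
    (0.015, 0.458),
    (0.03, 0.573),
    (0.01, 0.527),
]

# w_new = w_old + eta * delta * x
new_weights = [w + learning_rate * delta_output * x for (w, x) in connections]
print(','.join('%.3f' % w for w in new_weights))
```

The hidden-layer weights barely move because their error terms are a few hundred times smaller than the output neuron's.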


Let's also perform forward propagation one more time to see the updated output value.

In [24]:
forward_propagation()
output.value

0.5387254506541997

As you can see, the output changed from about $0.511$ to about $0.539$, moving a little closer to the target of $1$.

## Recap

To recap, here's the backpropagation algorithm:

In [25]:
def backpropagation(expected_outputs):
    error_terms = calculate_error_terms(expected_outputs)
    adjust_weights(error_terms)

# Repeat for each training example

In [26]:
def train(fat_score, salt_score, acceptance):
    fat.value = fat_score
    salt.value = salt_score
    forward_propagation()
    backpropagation([acceptance])

train(0.1, 0.1, 0.0)
print_network_with_weights()

0.100[], 0.100[]
0.451[-0.200,0.020,0.030], 0.575[0.299,0.010,-0.010], 0.525[0.099,-0.010,0.010]
0.539[0.009,0.013,0.027,0.007]


In [27]:
train(0.2, 0.4, 0.0)
print_network_with_weights()

0.200[], 0.400[]
0.454[-0.201,0.020,0.030], 0.574[0.299,0.010,-0.010], 0.525[0.099,-0.010,0.010]
0.508[-0.054,-0.016,-0.010,-0.026]


# Stay hungry stay foolish

Let's train it even more, say 2000 times, but not too much, lest the neural network overfit.

__Note__: We are using _case updating_, where we update the weights after each training example.
There is also _batch updating_ where error terms are first aggregated before updating the weights.
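
To make the difference concrete, here is a minimal sketch of both schemes for a single sigmoid unit rather than the full network. All names (`predict`, `gradient_step`, `case_update`, `batch_update`) are hypothetical, and summing the per-example changes (rather than averaging them) is an assumption of this sketch.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def predict(weights, inputs):
    # weights[0] is the bias weight; its input is fixed at 1.0.
    return sigmoid(weights[0] + sum(w * x for w, x in zip(weights[1:], inputs)))

def gradient_step(weights, inputs, target, eta):
    # Per-weight change eta * delta * x for one example.
    o = predict(weights, inputs)
    delta = o * (1 - o) * (target - o)
    return [eta * delta * x for x in [1.0] + list(inputs)]

def case_update(weights, data, eta=0.5):
    # Update the weights immediately after each example.
    weights = list(weights)
    for inputs, target in data:
        step = gradient_step(weights, inputs, target, eta)
        weights = [w + d for w, d in zip(weights, step)]
    return weights

def batch_update(weights, data, eta=0.5):
    # Sum the changes over all examples using the same starting weights,
    # then apply them once.
    total = [0.0] * len(weights)
    for inputs, target in data:
        step = gradient_step(weights, inputs, target, eta)
        total = [t + d for t, d in zip(total, step)]
    return [w + t for w, t in zip(weights, total)]

data = [([0.2, 0.9], 1.0), ([0.1, 0.1], 0.0), ([0.2, 0.4], 0.0)]
start = [0.0, 0.0, 0.0]
print(case_update(start, data))
print(batch_update(start, data))
```

With a single example the two functions give the same result. With several examples they differ, because case updating evaluates each later example with weights that have already moved.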

In [28]:
for i in range(2000):
    train(0.2, 0.9, 1.0)
    train(0.1, 0.1, 0.0)
    train(0.2, 0.4, 0.0)

print_network_with_weights()

0.200[], 0.400[]
0.390[0.612,0.060,-2.679], 0.673[2.739,0.271,-5.171], 0.619[2.252,0.214,-4.518]
0.044[4.673,-2.449,-5.671,-4.809]


# Validation

Let's see how well the network works now.

In [29]:
def acceptance(fat_score, salt_score):
    fat.value = fat_score
    salt.value = salt_score
    forward_propagation()
    return output.value

In [30]:
print(acceptance(0.2, 0.9))
print(acceptance(0.1, 0.1))
print(acceptance(0.2, 0.4))

0.945776394993
0.00239254370444
0.043855782356


As you can see, after training the model on each example a couple of thousand times, the outputs become very close to the targets we expected. We can use thresholding to turn each output into a True/False value.
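
Thresholding could look something like the sketch below. The function name `classify` and the cutoff of 0.5 are assumptions made for illustration; the scores are the three outputs printed above.

```python
def classify(score, threshold=0.5):
    # Scores at or above the threshold count as "accepted".
    return score >= threshold

scores = [0.945776394993, 0.00239254370444, 0.043855782356]
print([classify(s) for s in scores])
```

A different cutoff may be more appropriate depending on how costly each kind of mistake is.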