# Neural Networks #

In this notebook we are going to take a look at neural networks and construct one using Python 3.

Our goal is to build something that takes in some input and detects a pattern to produce an output. The problem is that, while the human brain is exceptional at pattern detection, this is no small task for a computer.

This notebook builds on a tutorial from Machine Learning Mastery: https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/ . We adapted it to use the Iris dataset.


In [15]:
import numpy as np
import pandas as pd
import matplotlib as mpl
from random import seed
from random import randrange
from random import random
from csv import reader
from math import exp

### Read in and clean data ###

The dataset we will be using in this notebook is the iris dataset.  It is a classic example that you should recognize from the lecture on decision trees.  It dates back to a paper from 1936 and contains one class that is linearly separable from the other two, while the other two are not linearly separable from each other.  This means that a single layer perceptron will NOT be able to categorize it properly, and we need a multi-layer perceptron.

We first need to clean and normalize our data.

In [2]:
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset

# Load the Iris dataset (described above) from a local CSV file.
irises = load_csv('bezdekIris.csv')

In [12]:
# Convert the columns to the right format
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())


def str_column_to_int(dataset, column):
    class_values = [row[column] for row in dataset]
    # Sort the unique labels so the label-to-integer mapping is the same on every run
    unique = sorted(set(class_values))
    lookup = dict()
    for i, value in enumerate(unique):
        lookup[value] = i
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup
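
As a quick illustration of what these two helpers do, here is a minimal sketch on a few hand-written rows. The rows in `toy` are hypothetical and only mimic the shape of the CSV (four measurements followed by a class name); this sketch sorts the class names so the mapping is deterministic.

```python
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

def str_column_to_int(dataset, column):
    # Sorting makes the label-to-index mapping deterministic
    unique = sorted(set(row[column] for row in dataset))
    lookup = {value: i for i, value in enumerate(unique)}
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

# Hypothetical rows: four measurements as strings, then a class name
toy = [
    ['5.1', '3.5', '1.4', '0.2', 'Iris-setosa'],
    ['7.0', '3.2', '4.7', '1.4', 'Iris-versicolor'],
    ['6.3', '3.3', '6.0', '2.5', 'Iris-virginica'],
]
for col in range(len(toy[0]) - 1):
    str_column_to_float(toy, col)
lookup = str_column_to_int(toy, len(toy[0]) - 1)
print(toy[0])
print(lookup)
```

After the conversion every row holds floats for the features and an integer class index in the last position, which is the layout the training code expects.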

In [13]:
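# Convert the feature columns to floats.
# Run this cell only once per load: re-running it on already-converted data raises
# AttributeError: 'float' object has no attribute 'strip'.
for i in range(len(irises[0]) - 1):
    str_column_to_float(irises, i)
# Convert the class column to integers
str_column_to_int(irises, len(irises[0]) - 1)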

In [5]:
def cross_validation_split(dataset, folds_number):
    dataset_split = list()
    dataset_copy = list(dataset)
    fold_size = int(len(dataset) / folds_number)
    for i in range(folds_number):
        fold = list()
        while len(fold) < fold_size:
            index = randrange(len(dataset_copy))
            fold.append(dataset_copy.pop(index))
        dataset_split.append(fold)
    return dataset_split

# Find the [min, max] pair of every column
def dataset_minmax(dataset):
    return [[min(column), max(column)] for column in zip(*dataset)]

# Rescale dataset columns to the range 0-1
def normalize_dataset(dataset, minmax):
    for row in dataset:
        for i in range(len(row)-1):
            row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0])
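
The cells shown here define the min-max helpers but never call them. Below is a minimal sketch of how they could be applied, assuming the data has already been converted to numbers. The `toy` rows are hypothetical; on the real data the equivalent calls would presumably be `dataset_minmax(irises)` followed by `normalize_dataset(irises, minmax)`.

```python
def dataset_minmax(dataset):
    return [[min(column), max(column)] for column in zip(*dataset)]

def normalize_dataset(dataset, minmax):
    for row in dataset:
        # The last column is the class label, so it is left untouched
        for i in range(len(row) - 1):
            row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0])

# Hypothetical rows: two features followed by an integer class label
toy = [[2.0, 10.0, 0], [4.0, 30.0, 1], [6.0, 20.0, 1]]
minmax = dataset_minmax(toy)
normalize_dataset(toy, minmax)
print(toy)  # [[0.0, 0.0, 0], [0.5, 1.0, 1], [1.0, 0.5, 1]]
```

Each feature now lies in the range 0-1, which keeps the sigmoid inputs in a region where its slope is not vanishingly small.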


## Feed forward and Back Propagate ##

These functions feed training data through the network and adjust the weights using stochastic gradient descent.
This version uses the sigmoid function as the transfer function because it is easy to differentiate, which gradient descent requires.

Why backpropagation does exactly what it does comes down to a bit of calculus. You can find a derivation of the math behind backpropagation by Ryan Harris on YouTube if you're curious, but the core idea is that each weight is changed according to how much it affects the error.

In [6]:
# Calculate the weighted sum of the inputs plus the bias (stored as the last weight)
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights)-1):
        activation += weights[i] * inputs[i]
    return activation

# Sigmoid transfer function
def transfer(x):
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid, expressed in terms of the sigmoid's output:
# if output = transfer(x), the slope at x is output * (1 - output)
def deriv(output):
    return output * (1 - output)
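
Note that the derivative helper takes the neuron's output, not the raw activation. A minimal sketch that checks this numerically, using `math.exp` instead of NumPy to stay self-contained (the test point `x` and step `h` are arbitrary choices):

```python
from math import exp

def transfer(x):
    return 1.0 / (1.0 + exp(-x))

def deriv(output):
    # Takes the sigmoid's output, not its input
    return output * (1.0 - output)

x = 0.5    # arbitrary test point
h = 1e-6   # small step for the finite-difference estimate
numeric = (transfer(x + h) - transfer(x - h)) / (2 * h)
analytic = deriv(transfer(x))
print(numeric, analytic)
```

The two printed values should agree to many decimal places, which is why the backpropagation code can reuse the stored `neuron['output']` instead of recomputing the activation.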

### Feed forward ###
Feeds inputs through the network and returns the output layer.

In [7]:
def feed_forward(network, inputs):
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs
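
To see the forward pass on something small enough to check by hand, here is a minimal sketch with a hypothetical two-layer network whose weights are chosen so that every activation is exactly zero. It uses `math.exp` rather than NumPy to stay self-contained.

```python
from math import exp

def activate(weights, inputs):
    activation = weights[-1]  # the bias is stored as the last weight
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

def transfer(x):
    return 1.0 / (1.0 + exp(-x))

def feed_forward(network, inputs):
    for layer in network:
        new_inputs = []
        for neuron in layer:
            neuron['output'] = transfer(activate(neuron['weights'], inputs))
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

# Hypothetical weights for illustration only
toy_network = [
    [{'weights': [1.0, -1.0, 0.0]}],                      # hidden layer: 1 neuron, 2 inputs + bias
    [{'weights': [2.0, -1.0]}, {'weights': [0.0, 0.0]}],  # output layer: 2 neurons, 1 input + bias
]
outputs = feed_forward(toy_network, [2.0, 2.0])
print(outputs)  # [0.5, 0.5], since sigmoid(0) = 0.5
```

Each neuron also keeps its own `'output'` entry, which the backward pass reads later.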

### Backward propagation ###
Propagate errors backwards through the network, one layer at a time.

In [8]:
def backward_propagate(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        
        # The error is computed differently for hidden layers and the output layer
        if i != len(network) - 1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * deriv(neuron['output'])
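
The two branches of this function can be written compactly. The symbols below are introduced here for illustration and do not appear in the notebook: $o_j$ is the stored output of neuron $j$, $t_j$ its expected value, $w_{kj}$ the weight connecting neuron $j$ to neuron $k$ in the next layer, and $\delta$ the value stored under `'delta'`.

```latex
\begin{align}
\delta_j^{\text{output}} &= (t_j - o_j)\, o_j (1 - o_j) \\
\delta_j^{\text{hidden}} &= \Big( \sum_{k} w_{kj}\, \delta_k \Big)\, o_j (1 - o_j)
\end{align}
```

In both cases the factor $o_j (1 - o_j)$ is the sigmoid derivative, and only the error term in front of it differs.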

### Adjusting weights ###
Update all the weights based on the propagated error.

In [9]:
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']
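
A minimal worked example of the update rule, assuming a single neuron whose `'delta'` has already been set by the backward pass. All numbers here are hypothetical and chosen so the arithmetic is easy to follow.

```python
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]  # drop the class label
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']  # bias update

# Hypothetical neuron: two input weights plus a bias, with a precomputed delta
neuron = {'weights': [0.5, -0.5, 0.1], 'output': 0.6, 'delta': 0.2}
toy_network = [[neuron]]
row = [1.0, 2.0, 0]  # two features and a class label
update_weights(toy_network, row, l_rate=0.5)
print([round(w, 3) for w in neuron['weights']])  # [0.6, -0.3, 0.2]
```

Each weight moves by `l_rate * delta * input`, so the weight attached to the larger input changes more, while the bias behaves like a weight on a constant input of 1.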

## Train network ##

This is where it all comes together. For every epoch, this function takes each training row in turn, feeds it forward, backpropagates the error and updates the weights accordingly.

In [10]:
def train_network(network, train, l_rate, n_epoch, expected_outputs):
    for epoch in range(n_epoch):
        sum_error = 0
        for row in train:
            outputs = feed_forward(network, row)
            expected = [0 for i in range(expected_outputs)]
            expected[row[-1]] = 1
            sum_error += sum([(expected[i] - outputs[i]) ** 2 for i in range(len(expected))])
            backward_propagate(network, expected)
            update_weights(network, row, l_rate)
        if epoch % 20 == 0:
            print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))
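
The training loop turns each integer class label into a one-hot target and accumulates a squared error per row. A minimal sketch of just that step, with a hypothetical label and hypothetical network outputs:

```python
n_outputs = 3
label = 2  # hypothetical integer class label from the last column of a row

# One-hot target: 1 at the index of the true class, 0 elsewhere
expected = [0 for _ in range(n_outputs)]
expected[label] = 1

outputs = [0.1, 0.2, 0.7]  # hypothetical outputs of the network for this row
row_error = sum((expected[i] - outputs[i]) ** 2 for i in range(n_outputs))
print(expected, round(row_error, 2))  # [0, 0, 1] 0.14
```

Summing this quantity over all rows gives the `error` value printed during training.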

# Initialize and test the algorithm #

Now that all the functionality is implemented, we can see how the back propagation algorithm does.

We initialize the network with one hidden layer as we need to learn to categorize non-linearly-separable data.  We are not limited to one layer, but for our purposes one hidden layer performs just fine.
The weights we start with are small random numbers, and the last weight in the "weights" list represents the bias of the node that the weight list belongs to.

To gain more intuition for the backpropagation algorithm, try experimenting with adding more nodes, hidden layers and changing the n_epoch (number of epochs: the number of times we iterate through the training data) and l_rate (learning rate) to see their impact on accuracy and speed.

The code prints the current epoch number, the learning rate and the summed squared error at every 20th epoch.
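
The network layout described above (one weight list per neuron, with the bias as the last entry) can be inspected with a minimal sketch like the following. The sizes 4, 5 and 3 match the values used later in this notebook; the `shapes` variable is introduced here only for illustration.

```python
from random import random, seed

def initialize_network(input_number, hidden_number, output_number):
    # Every neuron gets one weight per incoming connection plus one bias weight
    hidden_layer = [{'weights': [random() for _ in range(input_number + 1)]}
                    for _ in range(hidden_number)]
    output_layer = [{'weights': [random() for _ in range(hidden_number + 1)]}
                    for _ in range(output_number)]
    return [hidden_layer, output_layer]

seed(1)
network = initialize_network(4, 5, 3)

# (number of neurons, weights per neuron) for each layer
shapes = [(len(layer), len(layer[0]['weights'])) for layer in network]
print(shapes)  # [(5, 5), (3, 6)]
```

Adding hidden neurons changes both numbers: the hidden layer gains neurons, and every output neuron gains one extra weight.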

In [14]:
def initialize_network(input_number, hidden_number, output_number):
    network = list()
    hidden_layer = [{'weights': [random() for i in range(input_number + 1)]} for i in range(hidden_number)]
    network.append(hidden_layer)
    output_layer = [{'weights': [random() for i in range(hidden_number + 1)]} for i in range(output_number)]
    network.append(output_layer)
    return network

def test(dataset, n_folds, input_number, output_number):
    folds = cross_validation_split(dataset, n_folds)
    scores = list()
    for fold in folds:
        train_set = list(folds)
        train_set.remove(fold)
        train_set = sum(train_set, [])
        test_set = list()
        for row in fold:
            row_copy = list(row)
            test_set.append(row_copy)
            row_copy[-1] = None
        # Hyperparameters are hard-coded here: learning rate 0.3, 500 epochs, 5 hidden neurons
        predicted = bp(train_set, test_set, 0.3, 500, 5)
        actual = [row[-1] for row in fold]
        accuracy = measure_accuracy(actual, predicted)
        scores.append(accuracy)
    return scores

def bp(train, test, l_rate, n_epoch, n_hidden):
    # Infer the layer sizes from the training data
    input_number = len(train[0]) - 1
    output_number = len(set([row[-1] for row in train]))
    network = initialize_network(input_number, n_hidden, output_number)
    train_network(network, train, l_rate, n_epoch, output_number)
    # predict() runs the forward pass itself, so no separate feed_forward call is needed
    predictions = list()
    for row in test:
        predictions.append(predict(network, row))
    return predictions

def predict(network, row):
    outputs = feed_forward(network, row)
    return outputs.index(max(outputs))

def measure_accuracy(actual, predicted):
    correct = 0
    for i in range(len(actual)):
        if actual[i] == predicted[i]:
            correct += 1
    return correct / float(len(actual)) * 100

seed(1)
scores = test(irises, 5, 4, 3)
print('Scores: %s' % scores)
print('Mean Accuracy: %.3f%%' % (sum(scores) / float(len(scores))))

>epoch=0, lrate=0.300, error=100.072
>epoch=20, lrate=0.300, error=84.545
>epoch=40, lrate=0.300, error=44.385
>epoch=60, lrate=0.300, error=40.170
>epoch=80, lrate=0.300, error=19.624
>epoch=100, lrate=0.300, error=16.094
>epoch=120, lrate=0.300, error=15.406
>epoch=140, lrate=0.300, error=14.901
>epoch=160, lrate=0.300, error=13.317
>epoch=180, lrate=0.300, error=8.370
>epoch=200, lrate=0.300, error=7.308
>epoch=220, lrate=0.300, error=6.712
>epoch=240, lrate=0.300, error=14.484
>epoch=260, lrate=0.300, error=7.518
>epoch=280, lrate=0.300, error=8.001
>epoch=300, lrate=0.300, error=6.512
>epoch=320, lrate=0.300, error=6.411
>epoch=340, lrate=0.300, error=2.649
>epoch=360, lrate=0.300, error=12.856
>epoch=380, lrate=0.300, error=4.122
>epoch=400, lrate=0.300, error=5.786
>epoch=420, lrate=0.300, error=2.320
>epoch=440, lrate=0.300, error=2.389
>epoch=460, lrate=0.300, error=3.153
>epoch=480, lrate=0.300, error=7.198
>epoch=0, lrate=0.300, error=110.900
>epoch=20, lrate=0.300, error=44