<a href="https://colab.research.google.com/github/SummerLife/EmbeddedSystem/blob/master/MachineLearning/project/08_code_nn_from_scratch/coding_nn_with_bp_from_scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Coding a Neural Network with Backpropagation from scratch

- Initialize Network
- Forward Propagate
- Back Propagate Error
- Train Network
- Predict
- Seeds Dataset Case Study

## 1. Initialize Network

In [19]:
from random import seed
from random import random

# Initialize a network
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{"weights": [random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{"weights": [random() for i in range(n_hidden + 1)]} for i in range(n_outputs)]
    network.append(output_layer)
    return network

seed(1)
network = initialize_network(2,1,2)
for layer in network:
    print(layer)

[{'weights': [0.13436424411240122, 0.8474337369372327, 0.763774618976614]}]
[{'weights': [0.2550690257394217, 0.49543508709194095]}, {'weights': [0.4494910647887381, 0.651592972722763]}]


## 2. Forward Propagate

We can break forward propagation down into three parts:

1. Neuron Activation
2. Neuron Transfer
3. Forward Propagation

### 2.1 Neuron Activation

`activation = sum(weight_i * input_i) + bias`

In [20]:
# Calculate neuron activation for an input
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

### 2.2 Neuron Transfer

Once a neuron is activated, we need to transfer the activation to see what the neuron output actually is.

The sigmoid activation function looks like an S shape, it's also called the logistic function. It can take any input value and produce an number between 0 and 1 on an S-curve. It is also a function of which we can easily calculate the derivatice(slope) that we will need later when backpropagating error.

We can transfer an activation function using the sigmoed function as follows:

`output = 1 / (1 + e^(-activation))`

In [21]:
# Transfer neuron activation
from math import exp

def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

### 2.3 Forward Propagation

Forward propagating an input is straightforward.

We work through each layer of our network calculating the outputs for each neuron. All of the outputs from on layer become in puts to the neurons on the next layer.

In [22]:
# Forward propagate input to a network output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

### 2.4 Test out the forward propagation of out network



In [23]:
row = [1, 0, None]
output = forward_propagate(network, row)
print(output)

[0.6629970129852887, 0.7253160725279748]


## 3. Back Propagate Error

Error is calculated between the expected outputs and the outputs forward propagated from the network. These errors are then propagated backward through the netwoek from the output layer to the hidden layer, assigning blame for the error and updating weights as they go.

1. Transfer Derivative
2. Error Backpropagation

### 3.1 Transfer Derivative

We are using the sigmoid transfer function, the derivative of which can be calculated as follows:

```
derivative = output * (1.0 - output)
```

Using function named `transfer_derivative()` to implements this equation.

In [24]:
# Calculate the derivative of an neuron output
def transfer_derivative(output):
    return output * (1.0 - output)

### 3.2 Error Backpropagation

The error for a given neuron can be calculated as follows:

`error = (expected - output) * transfer_derivative(output)`


The error signal for a neuron in the hidden layer is calculated as the weighted error of each neuron in the output layer. Think of the error traveling back along the weights of the output layer to the neurons in the hidden layer.

The back-propagated error signal is accumulated and then used to determine the error for the neuron in the hidden layer, as follows:

`error = (weight_k * error_j) * transfer_derivative(output)`

Where error_j is the error signal from the jth neuron in the output layer, weight_k is the weight that connects the kth neuron to the current neuron and output is the output for the current neuron.


In [25]:
# Backpropagate error and store in neurons
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()

        if i != len(network) - 1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                    errors.append(error)
        else:
            # calculate the error signal from the neuron in the output layer
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])

        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

In [26]:
# test backpropagation of error
network = [[{'output': 0.7105668883115941, 'weights': [0.13436424411240122, 0.8474337369372327, 0.763774618976614]}],
          [{'output': 0.6213859615555266, 'weights': [0.2550690257394217, 0.49543508709194095]}, 
           {'output': 0.6573693455986976, 'weights': [0.4494910647887381, 0.651592972722763]}]]
expected = [0, 1]

backward_propagate_error(network, expected)

for layer in network:
    print(layer)

[{'output': 0.7105668883115941, 'weights': [0.13436424411240122, 0.8474337369372327, 0.763774618976614], 'delta': -0.007668854370284511}]
[{'output': 0.6213859615555266, 'weights': [0.2550690257394217, 0.49543508709194095], 'delta': -0.14619064683582808}, {'output': 0.6573693455986976, 'weights': [0.4494910647887381, 0.651592972722763], 'delta': 0.0771723774346327}]


## 4. Train Network

This part is broken down into two sectinons:

1. Update Weights
2. Tran Network

### 4.1 update weights

Once errors are calculated for each neuron in the network via the back propagation method above, they can be used to update weights.

Network weights are updated as follows:

`weight = weight + learning_rate * error * input`

Where weight is a given weight, learning_rate is a parameter that you must specify, error is the error calculated by the backpropagation procedure for the neuron and input is the input value that caused the error.

The same procedure can be used for updating the bias weight, except there is no input term, or input is the fixed value of 1.0.

Remember that the input for the output layer is a collection of outputs from the hidden layer.

In [27]:
# Update network weights with error
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]

        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]

        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']

### 4.2 Train Network


In [28]:
# Train a net work for a fixed number of epochs
def train_network(network, train, l_rate, n_epoch, n_outputs):
    for epoch in range(n_epoch):
        sum_error = 0
        for row in train:
            outputs = forward_propagate(network, row)
            expected = [0 for i in range(n_outputs)]
            expected[row[-1]] = 1
            sum_error += sum([(expected[i] - outputs[i])**2 for i in range(len(expected))])
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)
        print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))

### 4.3 Test Network

We will use 2 neurons in the hidden layer. It is binary classification problem (2 classes) so there will be two neurons in the output layer. The network will be trained for 20 epochs with learning rate of 0.5, which is high because we are training for so few iterations.

In [29]:
from math import exp
from random import seed
from random import random

# Test training backprop algorithm
seed(1)

dataset = [[2.7810836,2.550537003,0],
           [1.465489372,2.362125076,0],
           [3.396561688,4.400293529,0],
           [1.38807019,1.850220317,0],
           [3.06407232,3.005305973,0],
           [7.627531214,2.759262235,1],
           [5.332441248,2.088626775,1],
           [6.922596716,1.77106367,1],
           [8.675418651,-0.242068655,1],
           [7.673756466,3.508563011,1]]

n_inputs = len(dataset[0]) - 1
n_outputs = len(set([row[-1] for row in dataset]))
network = initialize_network(n_inputs, 2, n_outputs)
train_network(network, dataset, 0.5, 20, n_outputs)
for layer in network:
    print(layer)

>epoch=0, lrate=0.500, error=6.365
>epoch=1, lrate=0.500, error=5.557
>epoch=2, lrate=0.500, error=5.291
>epoch=3, lrate=0.500, error=5.262
>epoch=4, lrate=0.500, error=5.217
>epoch=5, lrate=0.500, error=4.899
>epoch=6, lrate=0.500, error=4.419
>epoch=7, lrate=0.500, error=3.900
>epoch=8, lrate=0.500, error=3.461
>epoch=9, lrate=0.500, error=3.087
>epoch=10, lrate=0.500, error=2.758
>epoch=11, lrate=0.500, error=2.468
>epoch=12, lrate=0.500, error=2.213
>epoch=13, lrate=0.500, error=1.989
>epoch=14, lrate=0.500, error=1.792
>epoch=15, lrate=0.500, error=1.621
>epoch=16, lrate=0.500, error=1.470
>epoch=17, lrate=0.500, error=1.339
>epoch=18, lrate=0.500, error=1.223
>epoch=19, lrate=0.500, error=1.122
[{'weights': [-0.9766426647918854, 1.0573043092399, 0.7999535671683315], 'output': 0.05429927062285241, 'delta': -0.0035328621774792703}, {'weights': [-1.2245133652927975, 1.4766900503308025, 0.7507113892487565], 'output': 0.03737569585208105, 'delta': -0.005989297622698788}]
[{'weights': 

## 5. Predict

We hace already seen how to forward-propagate an input pattern to get an output. This is all we need to do to make a prediction. We can use the output values themselves directly as the probability of a pattern belonging to each output class.

It may be more useful to turn this output back into a crisp class prediction. We can do this by selecting the class value with the larger probability. This is also called the [arg max function](https://en.wikipedia.org/wiki/Arg_max).

Below is a function named **predict()** that implements this procedure. It returns the index in the network output that has the largest probability. It assumes that class values have been converted to integers starting at 0.

In [30]:
# Make a prediction with a network
def predict(network, row):
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))

In [31]:
# Test making predictions with the network
dataset = [[2.7810836,2.550537003,0],
           [1.465489372,2.362125076,0],
           [3.396561688,4.400293529,0],
           [1.38807019,1.850220317,0],
           [3.06407232,3.005305973,0],
           [7.627531214,2.759262235,1],
           [5.332441248,2.088626775,1],
           [6.922596716,1.77106367,1],
           [8.675418651,-0.242068655,1],
           [7.673756466,3.508563011,1]]

network = [[{'weights': [-1.482313569067226, 1.8308790073202204, 1.078381922048799]}, 
            {'weights': [0.23244990332399884, 0.3621998343835864, 0.40289821191094327]}],
          [ {'weights': [2.5001872433501404, 0.7887233511355132, -1.1026649757805829]}, 
            {'weights': [-2.429350576245497, 0.8357651039198697, 1.0699217181280656]}]]

for row in dataset:
    prediction = predict(network, row)
    print('Expected=%d, Got=%d' % (row[-1], prediction))

Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1


## 6. Wheat Seeds Dataset

This section applies the Backpropagation algorithm to the wheat seeds dataset.

Input values vary in scale and need to be normalized to the range of 0 and 1. It is generally good practice to normalize input values to the range of the chosen transfer function, in this case, the sigmoid function that outpus calues between 0 and 1.

We will evaluate the algorithm ising k-fold cross-validation with 5 folds.

A new function named back_propagation() was developed to manage the application of the Backpropagation algorithm, first initializing a network, training it on the training dataset and then using the trained network to make predictions on a test dataset.

In [32]:
# Add some functions to process dataset in csvfile

from random import seed
from random import randrange
from random import random
from csv import reader
from math import exp

# Load a CSV file
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset

# Convert string column to float
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

# Convert string column to integer
def str_column_to_int(dataset, column):
    class_values = [row[column] for row in dataset]
    unique = set(class_values)
    lookup = dict()
    for i, value in enumerate(unique):
        lookup[value] = i
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

# Find the min and max values for each column
def dataset_minmax(dataset):
    minmax = list()
    stats = [[min(column), max(column)] for column in zip(*dataset)]
    return stats

# Rescale dataset columns to the range 0-1
def normalize_dataset(dataset, minmax):
    for row in dataset:
        for i in range(len(row)-1):
            row[i] = (row[i] - minmax[i][0]) / (minmax[i][1] - minmax[i][0])

# Split a dataset into k folds
def cross_validation_split(dataset, n_folds):
    dataset_split = list()
    dataset_copy = list(dataset)
    fold_size = int(len(dataset) / n_folds)
    for i in range(n_folds):
        fold = list()
        while len(fold) < fold_size:
            index = randrange(len(dataset_copy))
            fold.append(dataset_copy.pop(index))
        dataset_split.append(fold)
    return dataset_split

# Calculate accuracy percentage
def accuracy_metric(actual, predicted):
    correct = 0
    for i in range(len(actual)):
        if actual[i] == predicted[i]:
            correct += 1
    return correct / float(len(actual)) * 100.0

# Evaluate an algorithm using a cross validation split
def evaluate_algorithm(dataset, algorithm, n_folds, *args):
    folds = cross_validation_split(dataset, n_folds)
    scores = list()
    for fold in folds:
        train_set = list(folds)
        train_set.remove(fold)
        train_set = sum(train_set, [])
        test_set = list()
        for row in fold:
            row_copy = list(row)
            test_set.append(row_copy)
            row_copy[-1] = None
        predicted = algorithm(train_set, test_set, *args)
        actual = [row[-1] for row in fold]
        print("actual: {0}".format(actual))
        print("predic: {0}".format(predicted))
        accuracy = accuracy_metric(actual, predicted)
        scores.append(accuracy)
    return scores

In [33]:
# Backpropagation Algorithm With Stochastic Gradient Descent
def back_propagation(train, test, l_rate, n_epoch, n_hidden):
	n_inputs = len(train[0]) - 1
	n_outputs = len(set([row[-1] for row in train]))
	network = initialize_network(n_inputs, n_hidden, n_outputs)
	train_network(network, train, l_rate, n_epoch, n_outputs)
	predictions = list()
	for row in test:
		prediction = predict(network, row)
		predictions.append(prediction)
	return(predictions)

In [34]:
# Test Backprop on Seeds dataset
seed(1)

# load and prepare data
filename = '/content/drive/My Drive/data_set_to_train/wheat-seeds.csv'
dataset = load_csv(filename)
for i in range(len(dataset[0])-1):
    str_column_to_float(dataset, i)
 
# convert class column to integers
str_column_to_int(dataset, len(dataset[0]) -1)

# normalize input variables
minmax = dataset_minmax(dataset)
normalize_dataset(dataset, minmax)

# evaluate algorithm
n_folds = 5
l_rate = 0.3
n_epoch = 500
n_hidden = 5
scores = evaluate_algorithm(dataset, back_propagation, n_folds, l_rate, n_epoch, n_hidden)
print('Scores: %s' % scores)
print('Mean Accuracy: %.3f%%' % (sum(scores)/float(len(scores))))

>epoch=0, lrate=0.300, error=134.696
>epoch=1, lrate=0.300, error=113.344
>epoch=2, lrate=0.300, error=111.213
>epoch=3, lrate=0.300, error=109.484
>epoch=4, lrate=0.300, error=108.181
>epoch=5, lrate=0.300, error=106.915
>epoch=6, lrate=0.300, error=103.834
>epoch=7, lrate=0.300, error=96.623
>epoch=8, lrate=0.300, error=87.317
>epoch=9, lrate=0.300, error=79.185
>epoch=10, lrate=0.300, error=73.421
>epoch=11, lrate=0.300, error=69.657
>epoch=12, lrate=0.300, error=67.224
>epoch=13, lrate=0.300, error=65.616
>epoch=14, lrate=0.300, error=64.515
>epoch=15, lrate=0.300, error=63.729
>epoch=16, lrate=0.300, error=63.145
>epoch=17, lrate=0.300, error=62.699
>epoch=18, lrate=0.300, error=62.360
>epoch=19, lrate=0.300, error=62.106
>epoch=20, lrate=0.300, error=61.922
>epoch=21, lrate=0.300, error=61.792
>epoch=22, lrate=0.300, error=61.694
>epoch=23, lrate=0.300, error=61.595
>epoch=24, lrate=0.300, error=61.436
>epoch=25, lrate=0.300, error=61.064
>epoch=26, lrate=0.300, error=59.929
>epo