# Neural network for determining the sign of the sum of two numbers
### by Börge Göbel

In [None]:
import numpy as np
import matplotlib.pyplot as plt

In [None]:
import random

## 1. Prepare training and test data (typically loaded from file)

- Here we generate it

In [None]:
rangeData = 20                             # Numbers from [-rangeData,+rangeData]
lenData = 1000                             # How many pairs of numbers do we generate
testProportion = 0.3                       # 30% testing, 70% training 
testEnd = round(lenData * testProportion)  # How many pairs of numbers are used for testing

- Generate 1000 pairs of numbers as 1000 separate inputs for our network

In [None]:
dataIn = np.random.randint(-rangeData, rangeData+1, size=(lenData, 2))

- Generate the corresponding 1000 output values. These will be the sum of the two inputs.
- We do not tell the network that it is the sum. The network shall learn this by itself.

In [None]:
### CHANGE ###
dataOut = dataIn[:,0] + dataIn[:,1]

Sort them into categories [negative, positive]

-1 (negative) \\(\rightarrow \\) [1,0]

+1 (positive) \\(\rightarrow \\) [0,1]
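
The mapping above can be sketched as follows. The helper name `one_hot_sign` is hypothetical, and treating a sum of exactly 0 as "positive" is an assumption, since the text only defines two categories.

```python
import numpy as np

def one_hot_sign(sums):
    """Map each sum to [1, 0] (negative) or [0, 1] (positive).

    Assumption: a sum of exactly 0 is counted as 'positive',
    because only two categories are defined.
    """
    sums = np.asarray(sums)
    labels = np.zeros((sums.size, 2))
    labels[sums < 0, 0] = 1.0    # negative  -> [1, 0]
    labels[sums >= 0, 1] = 1.0   # otherwise -> [0, 1]
    return labels

example = one_hot_sign([-7, 12, 0])
print(example)
```

Applied to the array of sums, this would give one two-component row per input pair.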

- Adding a '1' element to each input pair (related to bias - more on this later)

In [None]:
dataIn = np.concatenate([np.ones([lenData,1]), dataIn], axis=1)

- The final data sets, with one example from each

In [None]:
testingIn   = dataIn[0:testEnd]
testingOut  = dataOut[0:testEnd]
trainingIn  = dataIn[testEnd:]
trainingOut = dataOut[testEnd:]

In [None]:
print( testingIn[0] )
print( testingOut[0] )
print( trainingIn[0] )
print( trainingOut[0] )

## 2. Setting up neural network

![Sign_network.png](Sign_network.png)

Input layer length: 3 (1 bias + 2 numbers)

Hidden layer length: 5 (5 neurons) \\( \rightarrow\\) We can change this value to improve performance

Output layer length: 2 (result categories - negative vs positive)

### 2.1 Initialize weights: Numbers in the range from -2 to 2

- We need a starting point for our weights. Let's select them randomly.
- Be careful about the dimension of the arrays:

    - weights[0] connects the input layer with the hidden layer
    - weights[1] connects the hidden layer with the output layer
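
A minimal sketch of weights with these dimensions, assuming the layer lengths 3, 5 and 2 listed above. The name `weights_sketch` and the seed are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen only for reproducibility

n_input, n_hidden, n_output = 3, 5, 2   # layer lengths from the text above

# Uniform random numbers in the range from -2 to 2
weights_sketch = [
    4 * rng.random((n_input, n_hidden)) - 2,    # input layer  -> hidden layer
    4 * rng.random((n_hidden, n_output)) - 2,   # hidden layer -> output layer
]

print(weights_sketch[0].shape, weights_sketch[1].shape)
```

With this layout, column `i` of `weights_sketch[0]` holds the three weights feeding hidden neuron `i`.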

In [None]:
### CHANGE ###
weights = 4 * np.random.random_sample(3) - 2

### 2.2 Activation function

- Typically a monotonic function that rescales a value to the range [0,1]
- Here we use the sigmoid function:

Activation function:
\\( a(x) = \frac{1}{1+\exp(-x)} \\)

Derivative:
\\( a'(x) = \frac{\exp(-x)}{\left[1+\exp(-x)\right]^2} \\)
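
Both formulas can be written as short functions. This is a sketch; the names `activation` and `activation_derivative` are assumptions. The derivative uses the equivalent form \\( a'(x) = a(x)\left[1-a(x)\right] \\), which follows from the expression above.

```python
import numpy as np

def activation(x):
    """Sigmoid a(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def activation_derivative(x):
    """a'(x) = exp(-x) / (1 + exp(-x))^2, written as a(x) * (1 - a(x))."""
    a = activation(x)
    return a * (1.0 - a)

x = np.array([-2.0, 0.0, 2.0])
print(activation(x))
print(activation_derivative(x))
```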

### 2.3 Calculate output of our neural network

The value of a neuron is given as the dot product of the two vectors: 
- weights 
- value of the neurons in the previous layer (including bias: value 1)

This value is then rescaled by the activation function.

First, calculate the hidden layer:
\\( h_i = a\left(w_{0i}^{(0)}x_0 + w_{1i}^{(0)}x_1 + w_{2i}^{(0)}x_2\right) \\)

Then, calculate the output layer:
\\( y_j = a\left(w_{0j}^{(1)}h_0 + w_{1j}^{(1)}h_1 + w_{2j}^{(1)}h_2 + w_{3j}^{(1)}h_3 + w_{4j}^{(1)}h_4 \right) \\)
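
The two steps can be sketched as one function, assuming the weights are stored as a list of two arrays with shapes (3, 5) and (5, 2). The name `forward`, the seed, and the example input are illustrative.

```python
import numpy as np

def activation(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w):
    """Return (hidden, output) for input x and weights w = [w0, w1].

    x:  shape (3,) or (n, 3), with the bias element 1 in the first position
    w0: shape (3, 5), w1: shape (5, 2)
    """
    hidden = activation(np.dot(x, w[0]))       # h_i
    output = activation(np.dot(hidden, w[1]))  # y_j
    return hidden, output

rng = np.random.default_rng(1)
w = [4 * rng.random((3, 5)) - 2, 4 * rng.random((5, 2)) - 2]
x = np.array([1.0, 7.0, -3.0])   # bias, first number, second number
hidden, output = forward(x, w)
print(hidden.shape, output.shape)
```

Because `np.dot` also accepts a 2D array of inputs, the same function can evaluate a whole data set at once.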

In [None]:
### CHANGE ###

def calculateOut(x,w):
    # x: input
    # w: weights
    return np.dot(x,w)

In [None]:
### CHANGE ###

testIndex = 10
calculateOut( trainingIn[testIndex], weights )

The calculation of one of these output values corresponds to:

\\( y_j = a\Big[w_{0j}^{(1)}a\left(w_{00}^{(0)}x_0 + w_{10}^{(0)}x_1 + w_{20}^{(0)}x_2\right) \\ \quad\quad + w_{1j}^{(1)}a\left(w_{01}^{(0)}x_0 + w_{11}^{(0)}x_1 + w_{21}^{(0)}x_2\right) \\ \quad\quad + w_{2j}^{(1)}a\left(w_{02}^{(0)}x_0 + w_{12}^{(0)}x_1 + w_{22}^{(0)}x_2\right) \\ \quad\quad + w_{3j}^{(1)}a\left(w_{03}^{(0)}x_0 + w_{13}^{(0)}x_1 + w_{23}^{(0)}x_2\right) \\ \quad\quad + w_{4j}^{(1)}a\left(w_{04}^{(0)}x_0 + w_{14}^{(0)}x_1 + w_{24}^{(0)}x_2\right) \Big] \\)

### 2.4 Functions: Calculate accuracy and individual error

### - Accuracy: 
What fraction of the outputs is predicted correctly? Only right or wrong counts here, not how far off a prediction is.
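
For two-component outputs, one possible way to count correct predictions is to compare the position of the largest output with the position of the 1 in the correct vector. This is a sketch under that assumption; `category_accuracy` is a hypothetical name and the numbers are made up for illustration.

```python
import numpy as np

def category_accuracy(predicted, correct):
    """Fraction of rows whose largest output matches the correct category.

    predicted: shape (n, 2), network outputs between 0 and 1
    correct:   shape (n, 2), rows [1, 0] or [0, 1]
    """
    predicted_class = np.argmax(predicted, axis=1)
    correct_class = np.argmax(correct, axis=1)
    return np.mean(predicted_class == correct_class)

predicted = np.array([[0.9, 0.2], [0.4, 0.7], [0.6, 0.5]])
correct = np.array([[1, 0], [0, 1], [0, 1]])
print(category_accuracy(predicted, correct))
```

In this example the first two rows are classified correctly and the third is not.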

In [None]:
### CHANGE ###

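def accuracy(testingIn,testingOut,weights):
    # Round each prediction and count how many differ from the correct value
    predicted = np.round( calculateOut( testingIn, weights ) )
    wrong = np.sum( np.abs( np.sign( predicted - testingOut ) ) )
    # Fraction of correct predictions among the samples passed in
    return 1 - wrong / len(testingOut)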

In [None]:
accuracy(testingIn,testingOut,weights)

- So far, output is random 

### - Error (better for learning): 
For each input, the two-component output vector is compared with the correct two-component output vector

\\( \Delta = (\vec{y}-\vec{Y})^2=\sum_j (y_j-Y_j)^2 \\)

\\( y_j \\): Predicted output of neuron \\( j \\) (Number between 0 and 1)

\\( Y_j \\): Correct result of neuron \\( j \\) (Number exactly 0 or 1)
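
As a quick check of the formula, assume a predicted output \\( \vec{y} = (0.8, 0.3) \\) and a correct output \\( \vec{Y} = (1, 0) \\) (values chosen only for illustration):

\\( \Delta = (0.8-1)^2 + (0.3-0)^2 = 0.04 + 0.09 = 0.13 \\)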

In [None]:
def error(predictedValues, correctValues):
    # Square each component first, then sum: sum_j (y_j - Y_j)^2
    return np.sum((predictedValues - correctValues)**2)

In [None]:
### CHANGE ###

error(calculateOut(trainingIn[testIndex], weights), trainingOut[testIndex])

Or as one long formula:

\\(\Delta = \sum_j \Big( a\Big[w_{0j}^{(1)}a\left(w_{00}^{(0)}x_0 + w_{10}^{(0)}x_1 + w_{20}^{(0)}x_2\right) \\ \quad\quad\quad\quad + w_{1j}^{(1)}a\left(w_{01}^{(0)}x_0 + w_{11}^{(0)}x_1 + w_{21}^{(0)}x_2\right) \\ \quad\quad\quad\quad + w_{2j}^{(1)}a\left(w_{02}^{(0)}x_0 + w_{12}^{(0)}x_1 + w_{22}^{(0)}x_2\right) \\ \quad\quad\quad\quad + w_{3j}^{(1)}a\left(w_{03}^{(0)}x_0 + w_{13}^{(0)}x_1 + w_{23}^{(0)}x_2\right) \\ \quad\quad\quad\quad + w_{4j}^{(1)}a\left(w_{04}^{(0)}x_0 + w_{14}^{(0)}x_1 + w_{24}^{(0)}x_2\right) \Big] -Y_j\Big)^2 \\)

### 2.5 Function: Calculate gradient (d Error / d weight)

- All derivatives with respect to the individual weights (Use chain rule)

\\( \frac{\partial }{\partial w_{ij}^{(1)}}\Delta = 2(y_j-Y_j) \cdot a'\left(w_{0j}^{(1)}h_0 + w_{1j}^{(1)}h_1 + w_{2j}^{(1)}h_2 + w_{3j}^{(1)}h_3 + w_{4j}^{(1)}h_4 \right)\cdot a\left(w_{0i}^{(0)}x_0 + w_{1i}^{(0)}x_1 + w_{2i}^{(0)}x_2\right)\\)

\\( \frac{\partial }{\partial w_{ki}^{(0)}}\Delta = \sum_j 2(y_j-Y_j) \cdot a'\left(w_{0j}^{(1)}h_0 + w_{1j}^{(1)}h_1 + w_{2j}^{(1)}h_2 + w_{3j}^{(1)}h_3 + w_{4j}^{(1)}h_4 \right)\cdot w_{ij}^{(1)} \cdot a'\left(w_{0i}^{(0)}x_0 + w_{1i}^{(0)}x_1 + w_{2i}^{(0)}x_2\right)\cdot x_k\\)
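
The two derivatives can be sketched for a single input as follows, with a finite-difference comparison as a sanity check. The names are hypothetical. The weight range -1 to 1 and the small input values are assumptions made so that the sigmoid does not saturate in this check.

```python
import numpy as np

def activation(x):
    return 1.0 / (1.0 + np.exp(-x))

def activation_derivative(x):
    a = activation(x)
    return a * (1.0 - a)

def error(y, Y):
    return np.sum((y - Y) ** 2)

def forward(x, w):
    hidden = activation(np.dot(x, w[0]))
    return activation(np.dot(hidden, w[1]))

def gradient_two_layer(x, w, Y):
    """Gradient of the error for ONE input x, as [dError/dw0, dError/dw1]."""
    z_hidden = np.dot(x, w[0])       # shape (5,)
    hidden = activation(z_hidden)    # h_i
    z_out = np.dot(hidden, w[1])     # shape (2,)
    y = activation(z_out)            # y_j

    # 2 (y_j - Y_j) a'(...) for each output neuron j
    delta_out = 2 * (y - Y) * activation_derivative(z_out)
    # sum over j of delta_out_j * w1[i, j], times a'(...) of hidden neuron i
    delta_hidden = np.dot(w[1], delta_out) * activation_derivative(z_hidden)

    grad_w1 = np.outer(hidden, delta_out)   # entry [i, j]: h_i * delta_out_j
    grad_w0 = np.outer(x, delta_hidden)     # entry [k, i]: x_k * delta_hidden_i
    return [grad_w0, grad_w1]

rng = np.random.default_rng(2)
w = [rng.uniform(-1, 1, (3, 5)), rng.uniform(-1, 1, (5, 2))]
x = np.array([1.0, 0.5, -0.25])
Y = np.array([0.0, 1.0])
grads = gradient_two_layer(x, w, Y)

# Numerical check of one entry of dError/dw0 by central finite differences
eps = 1e-6
w_plus = [w[0].copy(), w[1].copy()]
w_minus = [w[0].copy(), w[1].copy()]
w_plus[0][1, 2] += eps
w_minus[0][1, 2] -= eps
numeric = (error(forward(x, w_plus), Y) - error(forward(x, w_minus), Y)) / (2 * eps)
print(grads[0][1, 2], numeric)
```

The two printed numbers should agree closely if the chain rule has been applied correctly.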

In [None]:
### CHANGE ###

def gradient(x,w,correctValues):
    return 2 * (calculateOut(x,w) - correctValues) * x

In [None]:
gradient(trainingIn[testIndex], weights, trainingOut[testIndex])

## 3. Training: Use gradient descent to change weights to minimize the error

Repeat the following process many times:
- Select an input pair (index)
- Calculate the gradient of the error 
- Change the weights according to 

\\( w_\mathrm{new} = w_\mathrm{old} - \mathrm{learningRate}\cdot \mathrm{gradient}\\)
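
The loop described above might look like this for a two-layer network. It is a sketch with hypothetical names (`train`, `gradient_two_layer`). The usage example starts from all-zero weights and a learning rate of 0.1 only so that the result is deterministic; neither value is a recommendation.

```python
import numpy as np

def activation(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w):
    hidden = activation(np.dot(x, w[0]))
    return hidden, activation(np.dot(hidden, w[1]))

def error(y, Y):
    return np.sum((y - Y) ** 2)

def gradient_two_layer(x, w, Y):
    hidden, y = forward(x, w)
    # a'(z) = a(z) * (1 - a(z)), so the derivatives follow from the activations
    delta_out = 2 * (y - Y) * y * (1 - y)
    delta_hidden = np.dot(w[1], delta_out) * hidden * (1 - hidden)
    return [np.outer(x, delta_hidden), np.outer(hidden, delta_out)]

def train(inputs, targets, w, learning_rate, steps, rng):
    """Stochastic gradient descent: one randomly chosen input per step."""
    w = [w[0].copy(), w[1].copy()]   # leave the caller's weights untouched
    errors = []
    for _ in range(steps):
        index = rng.integers(len(inputs))                        # select an input
        grads = gradient_two_layer(inputs[index], w, targets[index])
        w = [w[0] - learning_rate * grads[0],                    # w_new = w_old
             w[1] - learning_rate * grads[1]]                    #   - rate * gradient
        errors.append(error(forward(inputs[index], w)[1], targets[index]))
    return w, errors

rng = np.random.default_rng(3)
x = np.array([[1.0, 4.0, -9.0]])   # bias, first number, second number
Y = np.array([[1.0, 0.0]])         # the sum is negative -> [1, 0]
w_start = [np.zeros((3, 5)), np.zeros((5, 2))]
w_new, errors = train(x, Y, w_start, learning_rate=0.1, steps=1, rng=rng)
print(error(forward(x[0], w_start)[1], Y[0]), errors[0])
```

After this single step the error for the selected input is smaller than before the update.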

In [None]:
### CHANGE ###

learningRate = 0.001
steps = 10000

# keep a history of errors and weights for the plots below
errorList = [error(calculateOut(trainingIn[testIndex], weights), trainingOut[testIndex])]
weightList = [weights]

In [None]:
### CHANGE ###

for i in range(steps):
    # pick random input
    index = np.random.randint(lenData-testEnd)
    # update weights (go along opposite gradient)
    weights = weights - learningRate*gradient(trainingIn[index], weights, trainingOut[index])
    weightList.append( weights )
    # calculate new error
    er = error(calculateOut(trainingIn[index], weights), trainingOut[index])
    errorList.append( er )

In [None]:
errorList

In [None]:
print(er)
print(weights)

In [None]:
plt.ylim([0,1])
plt.scatter(range(steps+1),errorList)

In [None]:
plt.plot(range(steps+1),np.log(errorList))

## 4. Application to test data set (new data)

In [None]:
testingOut

In [None]:
### CHANGE ###

np.round(calculateOut( testingIn, weights ))

In [None]:
### CHANGE ###

calculateOut( testingIn, weights )

In [None]:
accuracy(testingIn,testingOut,weights)