# Basics of neural networks 1

### Making neural nets from scratch in Python

The following is meant to be an entry-level tutorial on programming neural networks from scratch in Python 3. It also serves as my personal notes.

In theory anyone could follow along, but to fully understand what’s going on I recommend some prior understanding of neural nets (such as the different layers and how they relate), along with some coding knowledge (data structures in particular) and basic math.

### Example 1: Basic ANN - Artificial Neural Network (Neurons: 3 input, 8 hidden, 1 output)

An experienced coder might disapprove of the variable naming in the first example. The verbose names are chosen to make each element's purpose clear, and the notation will be simplified in due time.

In [1]:
# Relevant imports and helper functions
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Expects x to already be a sigmoid output s = sigmoid(z); then ds/dz = s * (1 - s)
    return x * (1 - x)
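
One detail that is easy to miss: sigmoid_derivative is written in terms of the sigmoid's output, not its input. The following is a small sanity check of that convention, comparing it against a finite-difference estimate. The names `z`, `eps`, `numeric` and `analytic` are only illustrative and do not appear elsewhere in the tutorial.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(s):
    # s is assumed to already be a sigmoid output
    return s * (1 - s)

z = np.linspace(-4, 4, 9)   # a few sample pre-activation values
eps = 1e-6

# Central finite-difference estimate of d(sigmoid)/dz at each z
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

# Note that we pass the activation sigmoid(z), not z itself
analytic = sigmoid_derivative(sigmoid(z))

max_diff = np.max(np.abs(numeric - analytic))
print("Largest difference:", max_diff)
```

This is why the training loops below call sigmoid_derivative on the layer activations (for example output_layer) rather than on the raw dot products.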

In [2]:
# Simple Neural Network (Neurons: 3 input, 8 hidden, 1 output) training

input_neurons = np.array([[0, 0, 0],
                          [0, 0, 1],
                          [0, 1, 0],
                          [0, 1, 1],
                          [1, 0, 0],
                          [1, 0, 1],
                          [1, 1, 0],
                          [1, 1, 1]])

output_neurons = np.array([[0],
                           [1],
                           [1],
                           [0],
                           [1],
                           [1],
                           [1],
                           [1]])


# Generating random initial link values (weights), scaled to the range [-1, 1)
synapse_InputToHidden = np.random.random((3, 8)) * 2 - 1
synapse_HiddenToOutput = np.random.random((8, 1)) * 2 - 1


for i in range(100000):  # Running 100,000 iterations to reduce the error margin and thereby fit the network
    
    input_layer = input_neurons
    hidden_layer = sigmoid(np.dot(input_layer, synapse_InputToHidden))
    output_layer = sigmoid(np.dot(hidden_layer, synapse_HiddenToOutput))
    
    # Gradient descent (more specifically a "backward propagation of errors" - Backpropagation)
    errormargin_OutputLayer = output_neurons - output_layer
    delta_OutputLayer = errormargin_OutputLayer * sigmoid_derivative(output_layer)
    errormargin_HiddenLayer = delta_OutputLayer.dot(synapse_HiddenToOutput.T)
    delta_HiddenLayer = errormargin_HiddenLayer * sigmoid_derivative(hidden_layer)
    
    synapse_InputToHidden += input_layer.T.dot(delta_HiddenLayer)
    synapse_HiddenToOutput += hidden_layer.T.dot(delta_OutputLayer)
    
    # Prints error margin for output layer every 10,000 iterations
    if i % 10000 == 0:
        print("Current error margin: {}".format(str(np.mean(abs(errormargin_OutputLayer)))))

Current error margin: 0.47904316518173573
Current error margin: 0.008083540778178439
Current error margin: 0.005549242118860935
Current error margin: 0.004470801050511097
Current error margin: 0.0038397882626866006
Current error margin: 0.0034140780056581247
Current error margin: 0.003102349007059884
Current error margin: 0.002861541103004262
Current error margin: 0.0026683658190178915
Current error margin: 0.0025089899231507213


(Notice how our error margin continuously decreases as the network is optimized.)
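
To make the backpropagation step in the loop above easier to follow, here is a sketch of a single pass with the array shapes written out. It uses the same training data as Example 1 but shorter, purely illustrative names (`X`, `y`, `w1`, `w2`, `grad_w1`, `grad_w2`), and the seed is an arbitrary choice.

```python
import itertools
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(s):
    return s * (1 - s)

# All 8 combinations of three binary inputs, in the same row order as input_neurons
X = np.array(list(itertools.product([0, 1], repeat=3)))      # (8, 3)
y = np.array([[0], [1], [1], [0], [1], [1], [1], [1]])       # (8, 1)

np.random.seed(0)  # arbitrary seed, only for repeatability of this sketch
w1 = np.random.random((3, 8)) * 2 - 1    # input -> hidden
w2 = np.random.random((8, 1)) * 2 - 1    # hidden -> output

# Forward pass
hidden = sigmoid(np.dot(X, w1))          # (8, 3) . (3, 8) -> (8, 8)
output = sigmoid(np.dot(hidden, w2))     # (8, 8) . (8, 1) -> (8, 1)

# Backward pass
output_error = y - output                                    # (8, 1)
output_delta = output_error * sigmoid_derivative(output)     # (8, 1)
hidden_error = output_delta.dot(w2.T)                        # (8, 1) . (1, 8) -> (8, 8)
hidden_delta = hidden_error * sigmoid_derivative(hidden)     # (8, 8)

# Weight updates have exactly the shape of the matrices they are added to
grad_w1 = X.T.dot(hidden_delta)          # (3, 8) . (8, 8) -> (3, 8)
grad_w2 = hidden.T.dot(output_delta)     # (8, 8) . (8, 1) -> (8, 1)

for name, arr in [("hidden", hidden), ("output", output),
                  ("grad_w1", grad_w1), ("grad_w2", grad_w2)]:
    print(name, arr.shape)
```

Adding these updates directly to the synapses, as the training loop does, amounts to gradient descent on half the summed squared error with a step size of 1.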

The final output layer is printed as well as our trained synapses in order to get a feel for what our network looks like.

In [3]:
# Printing the final output layer
print(output_layer)

[[0.00458214]
 [0.9958229 ]
 [0.99592321]
 [0.00429531]
 [0.9999992 ]
 [0.99954541]
 [0.99927963]
 [0.99931008]]


In [4]:
# Printing the weights of our two synapses
print(synapse_InputToHidden)
print(synapse_HiddenToOutput)

[[ 1.66092836 -1.94255878  0.87375664  3.62548372  0.54571907 -1.63315135
   1.74861516 -4.09367732]
 [-0.57432746 -0.44115475  4.22910483  3.80172052  0.20225282 -5.02637533
   3.36078733  7.52943744]
 [ 0.26149031  1.39593606  4.10617118 -7.68907505 -0.55498175 -4.92374221
   3.23223058 -3.70801771]]
[[ -0.68252249]
 [ -4.28008792]
 [  6.44939477]
 [ 10.65714543]
 [ -1.17421928]
 [-13.22731424]
 [  3.90985808]
 [-12.41425713]]


Let's see what happens if we send an input through our trained neural net. The output should be close to 0, since this input is identical to the first row of our training data.

In [5]:
new_input = np.array([0, 0, 0])

new_output = sigmoid(np.dot(sigmoid(np.dot(new_input, synapse_InputToHidden)), synapse_HiddenToOutput))

print(new_output)

[0.00458212]


Pretty close! It is worth mentioning that it is possible to mess up the structure of the synapses (e.g. try training the system with output_neurons = 0, 0, 0, 0, 1, 1, 1, 1). The reason for this problem will be investigated later on.
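
The nested call used for the prediction above gets hard to read as more layers are added. One possible way to unroll it is a small helper that feeds the input through each synapse in turn. `predict` is a hypothetical name, and the weights below are untrained stand-ins with the shapes from Example 1, so the printed values are not meaningful predictions.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def predict(inputs, synapses):
    """Forward pass: apply each weight matrix followed by the sigmoid (hypothetical helper)."""
    layer = np.asarray(inputs, dtype=float)
    for synapse in synapses:
        layer = sigmoid(np.dot(layer, synapse))
    return layer

# Untrained stand-in weights; in the notebook you would pass the trained synapses instead
np.random.seed(0)
synapse_InputToHidden = np.random.random((3, 8)) * 2 - 1
synapse_HiddenToOutput = np.random.random((8, 1)) * 2 - 1
synapses = [synapse_InputToHidden, synapse_HiddenToOutput]

single = predict([0, 0, 0], synapses)                 # one sample  -> shape (1,)
batch = predict([[0, 0, 0], [1, 1, 1]], synapses)     # two samples -> shape (2, 1)

print(single.shape, batch.shape)
```

The same helper works unchanged for the deeper network in Example 2, since it only depends on the list of synapses it is given.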

### Example 2: Deep Neural Network (Neurons: 4 input, 8 hidden, 16 hidden, 8 hidden, 2 output)
(Note: A DNN - Deep Neural Network is an ANN with multiple layers between the input and output layers.)

We reuse the sigmoid and sigmoid_derivative functions from Example 1.

In [6]:
# Deep Neural Network (Neurons: 4 input, 8 hidden, 16 hidden, 8 hidden, 2 output)

input_neurons = np.array([[0, 0, 0, 0],
                          [0, 0, 0, 1],
                          [0, 0, 1, 0],
                          [0, 0, 1, 1],
                          [0, 1, 0, 0],
                          [0, 1, 0, 1],
                          [0, 1, 1, 0],
                          [0, 1, 1, 1]])

output_neurons = np.array([[1, 0],
                           [1, 1],
                           [1, 1],
                           [1, 0],
                           [1, 1],
                           [0, 1],
                           [0, 0],
                           [1, 1]])

# Seeding the random generator so that runs are reproducible and different networks can be compared
np.random.seed(1)

# Generating random initial synapse weights, scaled to the range [-1, 1)
synapse_1 = np.random.random((4, 8)) * 2 - 1
synapse_2 = np.random.random((8, 16)) * 2 - 1
synapse_3 = np.random.random((16, 8)) * 2 - 1
synapse_4 = np.random.random((8, 2)) * 2 - 1


for i in range(100000):  # Training

    layer_1 = input_neurons
    layer_2 = sigmoid(np.dot(layer_1, synapse_1))
    layer_3 = sigmoid(np.dot(layer_2, synapse_2))
    layer_4 = sigmoid(np.dot(layer_3, synapse_3))
    layer_5 = sigmoid(np.dot(layer_4, synapse_4))

    # Backpropagation
    layer_5_error = output_neurons - layer_5
    layer_5_delta = layer_5_error * sigmoid_derivative(layer_5)
    layer_4_error = layer_5_delta.dot(synapse_4.T)
    layer_4_delta = layer_4_error * sigmoid_derivative(layer_4)
    layer_3_error = layer_4_delta.dot(synapse_3.T)
    layer_3_delta = layer_3_error * sigmoid_derivative(layer_3)
    layer_2_error = layer_3_delta.dot(synapse_2.T)
    layer_2_delta = layer_2_error * sigmoid_derivative(layer_2)

    synapse_1 += layer_1.T.dot(layer_2_delta)
    synapse_2 += layer_2.T.dot(layer_3_delta)
    synapse_3 += layer_3.T.dot(layer_4_delta)
    synapse_4 += layer_4.T.dot(layer_5_delta)


    # Prints error margin
    if i % 10000 == 0:
        print("Current error margin: {}".format(str(np.mean(abs(layer_5_error)))))

Current error margin: 0.44711071583178513
Current error margin: 0.004957724608214143
Current error margin: 0.003368736657040206
Current error margin: 0.002699241313895848
Current error margin: 0.0023102055028723827
Current error margin: 0.00204909238051682
Current error margin: 0.0018586619092116137
Current error margin: 0.00171203849427689
Current error margin: 0.001594739285055455
Current error margin: 0.0014981884017274266


In [7]:
print(layer_5)

[[0.99836656 0.00211058]
 [0.99934177 0.99944946]
 [0.99877575 0.99800481]
 [0.99884066 0.00190899]
 [0.99845804 0.99853785]
 [0.00174104 0.99999543]
 [0.0023237  0.00171322]
 [0.99886795 0.99848799]]


Once again the network reproduces the training targets pretty accurately!

We can change the shapes of the synapses (that is, the number of neurons in each hidden layer), thereby creating different networks. We can then compare their error margins and so optimize the network's structure for this particular task. Here are the final error margins of 3 different networks:

(Neurons: 4 input, 8 hidden, 16 hidden, 8 hidden, 2 output): 0.00020308381653301406
<br>(Neurons: 4 input, 8 hidden, 8 hidden, 8 hidden, 2 output): 0.0002564333501877733
<br>(Neurons: 4 input, 16 hidden, 8 hidden, 8 hidden, 2 output): 0.00010261191080388188

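As expected, the bigger network (8, 16, 8 hidden neurons) seems to produce a lower error margin than the smaller one (8, 8, 8). Interestingly, keeping the same total number of hidden neurons but placing the majority of them in the first hidden layer (16, 8, 8) seems to yield even better predictions.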
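
Comparing architectures by hand means rewriting the layer and synapse lines each time. A minimal sketch of how the training loop from Example 2 could be generalized to an arbitrary list of layer sizes is shown below. The function `train` and its parameters `layer_sizes`, `iterations` and `seed` are hypothetical names, and the default of 10,000 iterations is an assumption chosen to keep the run short, so the printed errors will not match the figures listed above.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(s):
    return s * (1 - s)

def train(layer_sizes, inputs, targets, iterations=10000, seed=1):
    """Train a fully connected sigmoid network (hypothetical helper).

    Returns the mean absolute output error of the last iteration and the synapses.
    """
    np.random.seed(seed)
    synapses = [np.random.random((n_in, n_out)) * 2 - 1
                for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

    for _ in range(iterations):
        # Forward pass, keeping the activations of every layer
        layers = [inputs]
        for synapse in synapses:
            layers.append(sigmoid(np.dot(layers[-1], synapse)))

        # Backward pass: deltas[k] belongs to the layer produced by synapses[k]
        error = targets - layers[-1]
        delta = error * sigmoid_derivative(layers[-1])
        deltas = [None] * len(synapses)
        for k in reversed(range(len(synapses))):
            deltas[k] = delta
            if k > 0:
                delta = delta.dot(synapses[k].T) * sigmoid_derivative(layers[k])

        # Update all synapses only after every delta has been computed
        for k, synapse in enumerate(synapses):
            synapse += layers[k].T.dot(deltas[k])

    return np.mean(np.abs(error)), synapses

# Same training data as Example 2 (first input column is always 0)
input_neurons = np.array([[0, a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)])
output_neurons = np.array([[1, 0], [1, 1], [1, 1], [1, 0],
                           [1, 1], [0, 1], [0, 0], [1, 1]])

results = {}
for sizes in ([4, 8, 16, 8, 2], [4, 8, 8, 8, 2], [4, 16, 8, 8, 2]):
    final_error, _ = train(sizes, input_neurons, output_neurons)
    results[tuple(sizes)] = final_error
    print(sizes, final_error)
```

Because each call reseeds the generator, every architecture starts from a reproducible set of weights. Trying several values of `seed` per architecture would give a better idea of how much of the difference between networks is due to the structure rather than the initial weights.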