# Neural Net from scratch
I watched a short clip of Lex Fridman interviewing Andrej Karpathy, so here I am building a neural network from scratch and putting in time towards my 10,000 hours. By the time I reach those 10,000 hours, I may no longer be around :)

## Forward Prop

In [37]:
import numpy as np

num_inputs = 2
num_hidden_layers = 2
num_nodes_hidden_layer_1 = 6
num_nodes_hidden_layer_2 = 4
num_outputs = 1

# Weight matrix between the inputs and the first hidden layer.
# The first hidden layer has 6 units and we need one value per hidden unit,
# i.e. the dot product of this layer's weight matrix and the input vector
# should be a 6 x 1 vector.
# What we feed that matrix is our input layer:
# the input is 2 x 1, but we also need a bias unit, so for convenience
# we fold the bias into the input vector as a leading 1:
# [
#    1,
#    x1,
#    x2
# ]
# Now we feed the weight matrix a 3 x 1 input vector and want a 6 x 1 activations vector,
# hence our weight matrix needs to be 6 x 3:
# (6 x 3) dot (3 x 1) = 6 x 1

# To make the calculations easy to verify by hand, the weight matrices are all 1s for now;
# later we will randomize them
hidden_layer_1_weight_matrix = np.ones((num_nodes_hidden_layer_1, num_inputs + 1))

print(f'Hidden Layer 1 matrix: {hidden_layer_1_weight_matrix.shape}\n')
print(hidden_layer_1_weight_matrix)


print()
# Similarly, we feed this 6 x 1 activations vector plus a bias unit (so a 7 x 1 vector)
# into the weight matrix for the second hidden layer.
# The second hidden layer has 4 nodes, so it needs to produce a 4 x 1 activations vector, hence
# the weight matrix for the second hidden layer is 4 x 7: (4 x 7) dot (7 x 1) = 4 x 1

hidden_layer_2_weight_matrix = np.ones((num_nodes_hidden_layer_2, num_nodes_hidden_layer_1 + 1))

print(f'Hidden Layer 2 matrix: {hidden_layer_2_weight_matrix.shape}\n')
print(hidden_layer_2_weight_matrix)

print()
# And again, we feed the 4 x 1 activations vector plus a bias unit (so 5 x 1)
# into the weight matrix for the output layer.
# The output layer is a single node (1 x 1), so
# the weight matrix for the output layer is 1 x 5:
# (1 x 5) dot (5 x 1) = 1 x 1

output_layer_weight_matrix = np.ones((num_outputs, num_nodes_hidden_layer_2 + 1))

print(f'Output Layer matrix: {output_layer_weight_matrix.shape}\n')
print(output_layer_weight_matrix)



Hidden Layer 1 matrix: (6, 3)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]

Hidden Layer 2 matrix: (4, 7)

[[1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1.]]

Output Layer matrix: (1, 5)

[[1. 1. 1. 1. 1.]]
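
The three shapes above all follow the same rule: (nodes in this layer) x (nodes in the previous layer + 1 for the bias). As a rough sketch of that rule, the shapes can be derived from a single list of layer sizes. The names `layer_sizes` and `init_ones_weights` are made up for this illustration and are not used elsewhere in the notebook.

```python
import numpy as np

# Layer sizes from this notebook: 2 inputs, hidden layers of 6 and 4 nodes, 1 output.
layer_sizes = [2, 6, 4, 1]

def init_ones_weights(layer_sizes):
    """Hypothetical helper: one (n_out, n_in + 1) matrix of 1s per pair of adjacent layers."""
    return [
        np.ones((n_out, n_in + 1))  # the extra column multiplies the bias unit
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

weights = init_ones_weights(layer_sizes)
shapes = [w.shape for w in weights]
print(shapes)  # [(6, 3), (4, 7), (1, 5)]
```

Changing the list (say, adding a third hidden layer) would give the matching shapes without redoing the dimension bookkeeping by hand.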


### Actual Forward Prop
##### Now that we have all the matrices, let's see if we can turn a sample input vector into an output.
##### We will feed a forward prop function our input vector (2x1) and we should end up with a 1x1 output

In [38]:
def forward_prop(input_vector, hidden_layer_1_weight_matrix, hidden_layer_2_weight_matrix, 
                 output_layer_weight_matrix):
    
    # add the bias unit to the input
    input_vector_plus_bias = np.vstack([np.array([1]), input_vector])
    layer_1_activations_vector = np.dot(hidden_layer_1_weight_matrix, input_vector_plus_bias)
    
    print(f'\nhidden_layer_1_weight_matrix shape: {hidden_layer_1_weight_matrix.shape}')
    print(f'input_vector_plus_bias shape: {input_vector_plus_bias.shape}')
    print(f'layer_1_activations_vector shape: {layer_1_activations_vector.shape}\n')
    print(layer_1_activations_vector)
    
    # add the bias unit to the layer_1_activations_vector
    layer_1_activations_vector_plus_bias = np.vstack([np.array([1]), layer_1_activations_vector])
    layer_2_activations_vector = np.dot(hidden_layer_2_weight_matrix, layer_1_activations_vector_plus_bias)
    
    print(f'\nhidden_layer_2_weight_matrix shape: {hidden_layer_2_weight_matrix.shape}')
    print(f'layer_1_activations_vector_plus_bias shape: {layer_1_activations_vector_plus_bias.shape}')
    print(f'layer_2_activations_vector: {layer_2_activations_vector.shape}\n')
    print(layer_2_activations_vector)
    
    
    # add the bias unit to the layer_2_activations_vector
    layer_2_activations_vector_plus_bias = np.vstack([np.array([1]), layer_2_activations_vector])
    output_vector = np.dot(output_layer_weight_matrix, layer_2_activations_vector_plus_bias)
    
    print(f'\noutput_layer_weight_matrix shape: {output_layer_weight_matrix.shape}')
    print(f'layer_2_activations_vector_plus_bias shape: {layer_2_activations_vector_plus_bias.shape}')
    print(f'output_vector shape: {output_vector.shape}\n')
    
    return output_vector

sample_input = np.array([
    [2],
    [2]
])
print(f'sample_input shape: {sample_input.shape}')
output_vector = forward_prop(sample_input, hidden_layer_1_weight_matrix, hidden_layer_2_weight_matrix, 
                 output_layer_weight_matrix)


print(output_vector)



sample_input shape: (2, 1)

hidden_layer_1_weight_matrix shape: (6, 3)
input_vector_plus_bias shape: (3, 1)
layer_1_activations_vector shape: (6, 1)

[[5.]
 [5.]
 [5.]
 [5.]
 [5.]
 [5.]]

hidden_layer_2_weight_matrix shape: (4, 7)
layer_1_activations_vector_plus_bias shape: (7, 1)
layer_2_activations_vector: (4, 1)

[[31.]
 [31.]
 [31.]
 [31.]]

output_layer_weight_matrix shape: (1, 5)
layer_2_activations_vector_plus_bias shape: (5, 1)
output_vector shape: (1, 1)

[[125.]]
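
Because every weight is 1, each node is just 1 (the bias) plus the sum of whatever feeds into it, so the numbers above can be checked by hand. A small sketch of that arithmetic, using plain Python integers rather than the matrices:

```python
x1, x2 = 2, 2

# Each of the 6 nodes in hidden layer 1: bias + x1 + x2
h1 = 1 + x1 + x2

# Each of the 4 nodes in hidden layer 2: bias + the 6 identical layer-1 values
h2 = 1 + 6 * h1

# The single output node: bias + the 4 identical layer-2 values
out = 1 + 4 * h2

print(h1, h2, out)  # 5 31 125
```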


Seems to check out, but the activation function is the identity function, so the whole network is just a linear (affine) function of the input, no matter how many layers we stack.
To make it non-linear, we have to pass each layer's activations through a non-linear function.
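
To see how linear it is, here is a sketch showing that the three weight matrices can be multiplied together into a single 1 x 3 matrix that gives the same output. The helpers `with_bias`, `augment` and `linear_forward` are invented for this illustration, and the weights are random values from a seeded generator rather than the ones used in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, just for repeatability
W1 = rng.random((6, 3))
W2 = rng.random((4, 7))
W3 = rng.random((1, 5))

def with_bias(v):
    """Stack a 1 on top of a column vector."""
    return np.vstack([np.ones((1, 1)), v])

def linear_forward(x):
    """Forward prop with identity activations, as in the cell above."""
    a1 = W1 @ with_bias(x)
    a2 = W2 @ with_bias(a1)
    return W3 @ with_bias(a2)

def augment(W):
    """Prepend the row [1, 0, ..., 0] so the bias unit is carried through to the next layer."""
    top = np.zeros((1, W.shape[1]))
    top[0, 0] = 1.0
    return np.vstack([top, W])

# One (1 x 3) matrix that does the work of all three layers
W_collapsed = W3 @ augment(W2) @ augment(W1)

x = np.array([[2.0], [2.0]])
same = np.allclose(linear_forward(x), W_collapsed @ with_bias(x))
print(W_collapsed.shape, same)
```

If the two results agree, the extra layers add parameters but no extra expressive power, which is the motivation for putting a non-linearity between them.
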
##### The sigmoid is a good one to start learning with

In [39]:
def sigmoid(this_vector):
    return 1/(1 + np.exp(-this_vector))

The sigmoid outputs numbers in the range (0, 1). For this exercise we want a network that can predict a number in any range, so we will not run the output activation through the sigmoid.
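
A quick sketch of that squashing behaviour, using a handful of sample values picked only for illustration:

```python
import numpy as np

def sigmoid(this_vector):
    return 1 / (1 + np.exp(-this_vector))

z = np.array([[-10.0], [-1.0], [0.0], [1.0], [10.0]])
s = sigmoid(z)

print(s)
# Large negative inputs land close to 0, large positive inputs close to 1,
# and sigmoid(0) is exactly 0.5.
```

A target like the 125 from the all-1s example could never come out of a sigmoid, which is why the output node is left linear here.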

We will squish all the hidden activations through the sigmoid though. Here's the updated forward_prop function:

In [41]:
def forward_prop(input_vector, hidden_layer_1_weight_matrix, hidden_layer_2_weight_matrix, 
                 output_layer_weight_matrix):
    
    input_vector_plus_bias = np.vstack([np.array([1]), input_vector])
    layer_1_activations_vector = np.dot(hidden_layer_1_weight_matrix, input_vector_plus_bias)   
    layer_1_activations_vector = sigmoid(layer_1_activations_vector)
    
    layer_1_activations_vector_plus_bias = np.vstack([np.array([1]), layer_1_activations_vector])
    layer_2_activations_vector = np.dot(hidden_layer_2_weight_matrix, layer_1_activations_vector_plus_bias)
    layer_2_activations_vector = sigmoid(layer_2_activations_vector)
    
    layer_2_activations_vector_plus_bias = np.vstack([np.array([1]), layer_2_activations_vector])
    output_vector = np.dot(output_layer_weight_matrix, layer_2_activations_vector_plus_bias)
    
    return output_vector

sample_input = np.array([
    [2],
    [2]
])

# As promised, let's randomize the weight matrices instead of using all 1s
# (np.random.random draws uniformly from [0, 1))
hidden_layer_1_weight_matrix = np.random.random((num_nodes_hidden_layer_1, num_inputs + 1))
hidden_layer_2_weight_matrix = np.random.random((num_nodes_hidden_layer_2, num_nodes_hidden_layer_1 + 1))
output_layer_weight_matrix = np.random.random((num_outputs, num_nodes_hidden_layer_2 + 1))

output_vector = forward_prop(sample_input, hidden_layer_1_weight_matrix, hidden_layer_2_weight_matrix, 
                 output_layer_weight_matrix)


print(output_vector)

[[2.23936876]]
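
The exact number will differ on every run because the weights are random and unseeded. As a possible next step, here is a sketch of the same forward pass written as a loop over a list of weight matrices, with a seeded generator so the result is repeatable. The name `forward_prop_layers`, the `layer_sizes` list and the seed are assumptions made for this example, not part of the notebook above.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_prop_layers(input_vector, weight_matrices):
    """Hypothetical generalisation of forward_prop to any number of layers."""
    activations = input_vector
    last = len(weight_matrices) - 1
    for i, W in enumerate(weight_matrices):
        # add the bias unit, then apply this layer's weights
        activations_plus_bias = np.vstack([np.ones((1, 1)), activations])
        activations = W @ activations_plus_bias
        if i != last:
            # hidden layers go through the sigmoid; the output layer stays linear
            activations = sigmoid(activations)
    return activations

rng = np.random.default_rng(42)  # seed chosen arbitrarily
layer_sizes = [2, 6, 4, 1]
weights = [
    rng.random((n_out, n_in + 1))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

sample_input = np.array([[2], [2]])
output_vector = forward_prop_layers(sample_input, weights)
print(output_vector.shape)  # (1, 1)
```

With weights in [0, 1) and sigmoid outputs in (0, 1), the output here is a bias weight plus four products that are each below 1, so it always falls between 0 and 5.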
