# Neural networks from scratch (part 1)
In these notebooks I follow along with the course on YouTube.
I find it a great way to refresh my understanding of neural networks.

If you want to do this yourself, you can find the course here:

https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3

## Video 1: The neuron

A neuron multiplies each input by its corresponding weight and then sums these products (= the dot product of inputs and weights). It then adds a bias to this value to produce its output. If we stop here, the neuron is just a linear function (output = weights · inputs + bias), which severely limits what it can learn. Therefore, the neuron's output is additionally passed through an activation function (activation functions are discussed in more depth in later videos).
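Written out as an equation, assuming n inputs $x_i$ with weights $w_i$, a bias $b$ and an activation function $f$ (symbols chosen here for illustration):

$$
\text{output} = f\left(\sum_{i=1}^{n} w_i x_i + b\right) = f(\mathbf{w} \cdot \mathbf{x} + b)
$$

Without an activation function, $f$ is simply the identity and the output is the plain weighted sum plus bias.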


In [1]:
import numpy as np
import matplotlib.pyplot as plt

A hard-coded implementation of what a single neuron does (without an activation function).
In practice, a neuron is usually some kind of object that keeps track of its weights and bias, since these need to change for the neuron to learn.
Here we just go through the computation manually.

In [2]:
input_vector = [3.2, 7.2, 0.3]
weights = [0.63, 1.2, 0.93]
bias = 3

output = input_vector[0] * weights[0] + input_vector[1] * weights[1] + input_vector[2] * weights[2] + bias
output

13.935
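Writing out every product by hand only works for a fixed number of inputs. Here is a minimal plain-Python sketch that handles any number of inputs by pairing each input with its weight. The function name neuron_output is hypothetical and was chosen for illustration:

```python
def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias (no activation function)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # zip pairs each input with its weight; sum adds up the products
    return sum(x * w for x, w in zip(inputs, weights)) + bias


single = neuron_output([3.2, 7.2, 0.3], [0.63, 1.2, 0.93], 3)
print(single)  # should match the 13.935 computed above
```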

## Video 2: Layers
A single neuron can only learn so much, which is why we combine multiple neurons. A group of neurons that all receive the same inputs in the same step is called a layer, and multiple layers combined make up the neural network.
So continuing with the previous example we can make a layer with three neurons like this:

In [3]:
# Output of previous layer
input_vector = [3.2, 7.2, 0.3]

# Neuron 1
weights_1 = [0.63, 1.2, 0.93]
bias_1 = 3

# Neuron 2
weights_2 = [1.4, -0.7, 2.1]
bias_2 = -1.23

# Neuron 3
weights_3 = [-1.2, 0.1, 3.5]
bias_3 = 0.23

# Output of this layer
output = [
    input_vector[0] * weights_1[0] + input_vector[1] * weights_1[1] + input_vector[2] * weights_1[2] + bias_1,
    input_vector[0] * weights_2[0] + input_vector[1] * weights_2[1] + input_vector[2] * weights_2[2] + bias_2,
    input_vector[0] * weights_3[0] + input_vector[1] * weights_3[1] + input_vector[2] * weights_3[2] + bias_3
]

output

[13.935, -1.1600000000000006, -1.8399999999999994]
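The three hard-coded expressions above all follow the same pattern, so a loop can replace them. Below is a minimal plain-Python sketch. The name layer_output is hypothetical, and the per-neuron weights are grouped into one list of rows:

```python
def layer_output(inputs, weights, biases):
    """Output of a layer: one weighted sum plus bias per neuron (no activation)."""
    outputs = []
    # Each neuron has its own row of weights and its own bias
    for neuron_weights, neuron_bias in zip(weights, biases):
        total = neuron_bias
        for x, w in zip(inputs, neuron_weights):
            total += x * w
        outputs.append(total)
    return outputs


input_vector = [3.2, 7.2, 0.3]
weights = [
    [0.63, 1.2, 0.93],   # neuron 1
    [1.4, -0.7, 2.1],    # neuron 2
    [-1.2, 0.1, 3.5],    # neuron 3
]
biases = [3, -1.23, 0.23]

layer_out = layer_output(input_vector, weights, biases)
print(layer_out)  # close to [13.935, -1.16, -1.84], up to floating-point rounding
```

The bias is added first here, so the last digits can differ slightly from the hand-written version because floating-point addition is not associative.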

## Video 3: The dot product
The previous parts spell out the computations term by term, but they can be simplified a lot by using vectors and matrices.
All the weights of a layer are stored together (one row per neuron), and so are its biases. With vector operations we can then easily compute the layer's output vector for a given input vector.

In [7]:
# We keep the same input vector as before
input_vector = [3.2, 7.2, 0.3]

# The weights of the three neurons are now combined in a single matrix (one row per neuron)
weights = [
    [0.63, 1.2, 0.93],
    [1.4, -0.7, 2.1],
    [-1.2, 0.1, 3.5]
]

# The same is done with the biases
biases = [3, -1.23, 0.23]

# Now we could do the same computation as before through a for-loop, but that's not very clean
# We can use the dot product, for instance through numpy
output = np.dot(weights, input_vector) + biases

output

array([13.935, -1.16 , -1.84 ])
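The following sketch makes explicit what np.dot does here. When its first argument is 2-D and its second is 1-D, it takes the dot product of each row with the vector. The sketch also shows the @ operator for NumPy arrays, which computes the same matrix-vector product as np.dot for these 2-D/1-D shapes:

```python
import numpy as np

input_vector = np.array([3.2, 7.2, 0.3])
weights = np.array([
    [0.63, 1.2, 0.93],
    [1.4, -0.7, 2.1],
    [-1.2, 0.1, 3.5],
])
biases = np.array([3, -1.23, 0.23])

# 2-D dot 1-D: each row of weights is dotted with input_vector
via_dot = np.dot(weights, input_vector) + biases

# The same computation written out row by row
via_rows = np.array([np.dot(row, input_vector) for row in weights]) + biases

# For NumPy arrays, @ performs the same matrix-vector product
via_matmul = weights @ input_vector + biases

print(via_dot.shape)  # (3,)
print(np.allclose(via_dot, via_rows), np.allclose(via_dot, via_matmul))  # True True
```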

### Intermezzo 1: Shapes
The shape defines the structure of an array. A (4,) shape means we have a vector with 4 elements; a (2, 4) shape means we have a matrix of 2 rows with 4 columns each. The shape has to be homologous: within each dimension there must be the same number of elements. For example, a 2 by 4 shape is basically a list containing 2 lists of exactly 4 elements each, no more, no less.
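A small pure-Python helper that computes the shape of a nested list makes the homologous requirement concrete. list_shape is a hypothetical name used only for illustration. NumPy performs its own check when you call np.array, and recent NumPy versions refuse ragged nested lists unless dtype=object is given:

```python
def list_shape(nested):
    """Return the shape of a nested list, or raise ValueError if it is not homologous.

    Anything that is not a list is treated as a scalar with shape ().
    """
    if not isinstance(nested, list):
        return ()
    if len(nested) == 0:
        return (0,)
    # Every element must itself have the same shape
    inner_shapes = {list_shape(item) for item in nested}
    if len(inner_shapes) != 1:
        raise ValueError(f"not homologous: inner shapes {sorted(inner_shapes)}")
    return (len(nested),) + inner_shapes.pop()


print(list_shape([1, 2, 3, 4]))                  # (4,)
print(list_shape([[1, 2, 3, 4], [5, 6, 7, 8]]))  # (2, 4)
try:
    list_shape([[1, 2, 3, 4], [5, 6, 7]])        # second row is one element short
except ValueError as err:
    print(err)
```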

A tensor is an object that can be represented as an array. In deep learning frameworks we work with tensors in the array format.

Coming back to shapes: in the previous cell we took the dot product of weights and input_vector, not the other way around.
This case is somewhat special: weights has a shape of (3, 3) and input_vector has a shape of (3,). In this order, NumPy takes the dot product of each row of weights (which also has a shape of (3,)) with input_vector.
Doing it the other way around still returns something, but it won't be correct:

In [8]:
# What if we turn it around?
output = np.dot(input_vector, weights) + biases
output

array([14.736, -2.4  , 19.376])

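Basically, each row of weights was multiplied by one input value and the resulting rows were summed up, which amounts to using the transposed weights. This only ran because weights happens to be square. If we remove one neuron from the example, it no longer works: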

In [9]:
# We keep the same input vector as before
input_vector = [3.2, 7.2, 0.3]

# Remove the third neuron
weights = [
    [0.63, 1.2, 0.93],
    [1.4, -0.7, 2.1]
]

# The same is done with the biases
biases = [3, -1.23]

# What if we turn it around now?
output = np.dot(input_vector, weights) + biases

output

ValueError: shapes (3,) and (2,3) not aligned: 3 (dim 0) != 2 (dim 0)

The error message shows the problem: the 3 elements of input_vector have to line up with the rows of weights, of which there are now only 2. If we put weights first again, we get the correct behavior:

In [10]:
# Putting weights first
output = np.dot(weights, input_vector) + biases
output

array([13.935, -1.16 ])
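The rule behind these results comes from np.dot. With a 2-D first argument and a 1-D second argument, the last axis of the first must match the length of the second. With a 1-D first argument and a 2-D second argument, the length of the first must match the number of rows of the second. The following sketch checks both cases and shows that the reversed order is equivalent to using the transposed weights:

```python
import numpy as np

input_vector = np.array([3.2, 7.2, 0.3])
weights = np.array([
    [0.63, 1.2, 0.93],
    [1.4, -0.7, 2.1],
])  # shape (2, 3): 2 neurons with 3 weights each

# (2, 3) dot (3,) -> (2,): the last axis of weights matches the length of input_vector
correct = np.dot(weights, input_vector)
print(correct.shape)  # (2,)

# (3,) dot (2, 3) fails: 3 inputs vs. 2 rows
try:
    np.dot(input_vector, weights)
except ValueError as err:
    print("ValueError:", err)

# Reversing the arguments equals using the transpose: np.dot(x, W) == np.dot(W.T, x),
# so with the square 3x3 weights from before the reversed order effectively used weights.T
square = np.array([
    [0.63, 1.2, 0.93],
    [1.4, -0.7, 2.1],
    [-1.2, 0.1, 3.5],
])
reversed_equals_transposed = np.allclose(np.dot(input_vector, square), np.dot(square.T, input_vector))
print(reversed_equals_transposed)  # True
```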


## Video 4: Batches
In previous parts we only computed the output of one layer for a single input. To speed things up, we can compute the outputs for many inputs in parallel.
With a GPU we can run lots of these computations at the same time, which speeds up the network considerably.

Aside from processing speed, batches are useful for another reason. When training the neural network, we are trying to find the weights and biases that best fit the training data. If we take a single sample (one input-output pair), we can make the network fit that one relation perfectly, but that does not mean it will do any better on the next sample. The network needs to learn to generalize, so it has to find the weights and biases that best fit all the training samples together. To do this, we can adjust the weights and biases to better fit a batch of training data at a time. (The specifics of how the network learns come in later lessons.)

In [6]:
# We now have a batch of 4 input vectors
input_batch = [
    [3.2, 7.2, 0.3],
    [-1.2, 3.4, 0.5],
    [2.8, -4.3, -0.1],
    [-5.1, 2.3, -0.4]
]

# The weights of the three neurons remain exactly the same
weights = [
    [0.63, 1.2, 0.93],
    [1.4, -0.7, 2.1],
    [-1.2, 0.1, 3.5]
]

# The biases also remain the same
biases = [3, -1.23, 0.23]

# Producing the output of the layer is now a matrix multiplication of the input batch and the weights
# Mind the shapes: input_batch is (4, 3) and weights is (3, 3); np.dot(weights, input_batch) gives a shape mismatch,
# and np.dot(input_batch, weights) only runs because weights is square, giving the wrong result (as seen earlier)
# Transposing weights lines up each sample with each neuron's weight row: (4, 3) x (3, 3) -> (4, 3)
output = np.dot(input_batch, np.array(weights).T) + biases

output

array([[ 13.935,  -1.16 ,  -1.84 ],
       [  6.789,  -4.24 ,   3.76 ],
       [ -0.489,   5.49 ,  -3.91 ],
       [  2.175, -10.82 ,   5.18 ]])
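Each row of the batch result should equal the single-input computation from before. The following check confirms this. It also shows that adding biases relies on NumPy broadcasting: the shape-(3,) vector is added to every row of the (4, 3) result:

```python
import numpy as np

input_batch = np.array([
    [3.2, 7.2, 0.3],
    [-1.2, 3.4, 0.5],
    [2.8, -4.3, -0.1],
    [-5.1, 2.3, -0.4],
])
weights = np.array([
    [0.63, 1.2, 0.93],
    [1.4, -0.7, 2.1],
    [-1.2, 0.1, 3.5],
])
biases = np.array([3, -1.23, 0.23])

# Whole batch at once: (4, 3) x (3, 3) -> (4, 3); biases is broadcast over the 4 rows
batch_output = input_batch @ weights.T + biases

# One sample at a time, as in the earlier single-input cells
per_sample = np.array([np.dot(weights, sample) + biases for sample in input_batch])

print(batch_output.shape)                     # (4, 3)
print(np.allclose(batch_output, per_sample))  # True
```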

These are the basics of a single layer. Next, we can turn this into object-oriented code to make it easier to reuse (see notebook 2).
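As a rough preview of that idea, a layer object could bundle its weights and biases with a method that computes the output. This is not the code from notebook 2, whose design may differ. DenseLayer and forward are hypothetical names chosen here:

```python
import numpy as np


class DenseLayer:
    """Hypothetical minimal layer that stores its weights and biases and computes outputs.

    Weights keep the one-row-per-neuron layout used in the cells above.
    """

    def __init__(self, weights, biases):
        self.weights = np.asarray(weights, dtype=float)  # shape (n_neurons, n_inputs)
        self.biases = np.asarray(biases, dtype=float)    # shape (n_neurons,)
        if self.weights.shape[0] != self.biases.shape[0]:
            raise ValueError("need exactly one bias per neuron")

    def forward(self, inputs):
        """Accepts a single input (n_inputs,) or a batch (batch_size, n_inputs)."""
        return np.dot(np.asarray(inputs, dtype=float), self.weights.T) + self.biases


layer = DenseLayer(
    weights=[[0.63, 1.2, 0.93], [1.4, -0.7, 2.1], [-1.2, 0.1, 3.5]],
    biases=[3, -1.23, 0.23],
)
single = layer.forward([3.2, 7.2, 0.3])                      # shape (3,)
batch = layer.forward([[3.2, 7.2, 0.3], [-1.2, 3.4, 0.5]])  # shape (2, 3)
print(single)
print(batch.shape)  # (2, 3)
```

Because the weights are stored one row per neuron, forward has to transpose them. Storing them as (n_inputs, n_neurons) instead would avoid the transpose.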