# Neural networks from scratch (part 2)
In these notebooks I follow along with a course on YouTube.
I find it a great way to refresh my understanding of neural networks.

If you want to do this yourself, you can find the course here:

https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3

In [9]:
import numpy as np
import math

## Video 4 (continued): neural network in object-oriented code
The first notebook covered the basic mathematics behind neurons, layers and batches. Building on that knowledge, we can turn this into object-oriented code that makes it easy to create neural networks.

In [5]:
class Dense_Layer:
    def __init__(self, input_size, output_size):
        # We need something like this
        # self.weights = [[np.random.random() for _ in range(output_size)] for _ in range(input_size)]
        # self.biases = [np.random.random() for _ in range(output_size)]
        # A bit cleaner:
        self.weights = np.random.randn(input_size, output_size)
        self.biases = np.zeros((1, output_size))
    def forward(self, inputs):
        return np.dot(inputs, self.weights) + self.biases

In [5]:
# For example make a network with two layers
layer1 = Dense_Layer(3,7)
layer2 = Dense_Layer(7, 2)

inputs = [1.1, -0.4, 3.1]

x = layer1.forward(inputs)
output = layer2.forward(x)

output

array([[-0.48563807,  0.87077944]])
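
The same forward pass also works on a batch of samples, which ties back to the batches from the first notebook. The sketch below is only a shape check: it uses standalone `weights` and `biases` arrays with the same shapes as `Dense_Layer(3, 7)`, and the seed and the batch values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make the run reproducible

# Same shapes as Dense_Layer(3, 7): weights are (input_size, output_size)
weights = rng.standard_normal((3, 7))
biases = np.zeros((1, 7))

# A batch of 4 samples with 3 features each (values are made up)
batch = np.array([
    [1.1, -0.4, 3.1],
    [0.5, 2.0, -1.0],
    [-1.5, 0.3, 0.8],
    [2.2, 1.7, -0.6],
])

# One row of outputs per sample in the batch
batch_output = np.dot(batch, weights) + biases
print(batch_output.shape)  # (4, 7)

# A single sample is broadcast against the (1, 7) biases
single_output = np.dot(batch[0], weights) + biases
print(single_output.shape)  # (1, 7)
```

This is also why the single-sample example above prints a nested array: the `(1, output_size)` bias shape turns the result into a batch of one.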

## Video 5: hidden layer activations
Without an activation function (or, equivalently, with the identity activation y = x), a layer is a linear function: the inputs are multiplied by the weights, summed up, and the bias is added. Stacking several such layers does not change this, because a composition of linear functions is itself linear, so the network as a whole can only learn linear functions. Most problems have nonlinear solutions, which is why we use nonlinear activation functions in the hidden layers. There are many possible functions, for instance: the step function (y = 0 if x < 0 and y = 1 if x > 0), the rectified linear unit or ReLU (y = 0 if x < 0 else y = x), the sigmoid (y = 1 / (1 + e^-x)) and many other variants.
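
A quick numerical check of the claim that stacked layers without activations stay linear. This is a minimal sketch: the layer sizes match the example network above, while the random weights, the nonzero biases and the seed are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make the run reproducible

# Two layers without activation: 3 -> 7 -> 2
w1 = rng.standard_normal((3, 7))
b1 = rng.standard_normal((1, 7))
w2 = rng.standard_normal((7, 2))
b2 = rng.standard_normal((1, 2))

x = np.array([[1.1, -0.4, 3.1]])

# Forward pass through both layers
two_layers = np.dot(np.dot(x, w1) + b1, w2) + b2

# The same mapping written as one layer:
# (x w1 + b1) w2 + b2 = x (w1 w2) + (b1 w2 + b2)
w_combined = np.dot(w1, w2)
b_combined = np.dot(b1, w2) + b2
one_layer = np.dot(x, w_combined) + b_combined

print(np.allclose(two_layers, one_layer))  # True
```

However many layers are stacked this way, they can always be folded into a single weight matrix and bias, so the extra layers add no expressive power.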

### ReLU
Changing the weights and bias of a neuron changes the shape of its activation. For instance, a negative weight mirrors the function around the y-axis, so the neuron becomes active for negative inputs instead. Changing the bias moves the activation point (the point where the function switches from y = 0 to y = x). Placing more neurons after one another opens up even more possibilities. For instance, two neurons can model a function where y = value1 if x < a, y = value2 if x > b and y = x for a < x < b. Combining more and more neurons allows the network to model increasingly complex nonlinear functions.
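
The effects described above can be tried out on a single input value. This is a minimal sketch with hand-picked weights and biases (all values here are made up, not taken from the course). The two-neuron part assumes a final linear output step after the second neuron to turn the result into the clamped function.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

x = np.linspace(-3, 3, 7)  # [-3, -2, -1, 0, 1, 2, 3]

# Single neuron: relu(weight * x + bias)
default = relu(1.0 * x + 0.0)    # switches on at x = 0
mirrored = relu(-1.0 * x + 0.0)  # negative weight: active for x < 0 instead
shifted = relu(1.0 * x - 2.0)    # bias of -2 moves the activation point to x = 2

# Two neurons in series, followed by a linear output step (assumption)
a, b = -1.0, 2.0
hidden1 = relu(1.0 * x - a)               # zero for x < a, rises afterwards
hidden2 = relu(-1.0 * hidden1 + (b - a))  # zero for x > b
clamped = -1.0 * hidden2 + b              # y = a below a, y = x in between, y = b above b

print(clamped)
print(np.allclose(clamped, np.clip(x, a, b)))  # True
```

Here `value1` and `value2` from the text are simply `a` and `b`; other choices of weights in the output step would scale or shift those plateaus.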

In [2]:
inputs = [1.1, -0.4, 3.1]
output = []

for i in inputs:
    if i > 0:
        output.append(i)
    else:
        output.append(0)
        
    # Alternative: output.append(max(0, i))

print(output)

[1.1, 0, 3.1]


In [3]:
class ReLU:
    def forward(self, inputs):
        return np.maximum(0, inputs)

In [8]:
# For example make a network with two layers with ReLU activations
layer1 = Dense_Layer(3,7)
layer2 = Dense_Layer(7, 2)
relu = ReLU()

inputs = [1.1, -0.4, 3.1]

x = layer1.forward(inputs)
x = relu.forward(x)
x = layer2.forward(x)
output = relu.forward(x)

output

array([[0.        , 0.57219852]])

## Video 6: Softmax activation
When building a classifier we want the output to represent a probability distribution over the classes: every output is a value between 0 and 1, and together they sum to 1. The outputs then represent how certain the network is that the input belongs to each class. This takes two steps. First the inputs are exponentiated (y = e^x), which makes every value positive while keeping the information about how the values relate to each other. Then the values are normalized so that they sum to 1. We can implement this as follows:

In [10]:
inputs = [1.1, -0.4, 3.1]

exponentiated_values = []

for i in inputs:
    exponentiated_values.append(math.e ** i)

exponentiated_values

[3.0041660239464334, 0.6703200460356393, 22.197951281441632]

In [11]:
normalized_base = sum(exponentiated_values)
normalized_values = []

for value in exponentiated_values:
    normalized_values.append(value / normalized_base)

normalized_values

[0.11611453467414118, 0.02590865471740153, 0.8579768106084572]

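Of course this can be cleaned up a bit. The two steps can easily be combined in a single function. We also need to watch out for exploding values during the exponentiation step: e^x overflows for large x. Subtracting the largest input of each sample first keeps every exponent at or below 0, and it does not change the result because the common factor cancels out during normalization.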

In [None]:
class Softmax:
    def forward(self, inputs):
        # Prevent exploding values
        inputs = inputs - np.max(inputs, axis=1, keepdims=True)
        # Exponentiation
        exponentiated_values = np.exp(inputs)
        # Normalization
        return exponentiated_values / np.sum(exponentiated_values, axis=1, keepdims=True)
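
To see what the max subtraction buys us, the sketch below compares a naive softmax with a standalone `softmax` function that mirrors `Softmax.forward` above. The offset of 1000 is a made-up value, chosen only to force an overflow.

```python
import numpy as np

def softmax(inputs):
    # Mirrors Softmax.forward: shift, exponentiate, normalize per row
    shifted = inputs - np.max(inputs, axis=1, keepdims=True)
    exponentiated = np.exp(shifted)
    return exponentiated / np.sum(exponentiated, axis=1, keepdims=True)

small = np.array([[1.1, -0.4, 3.1]])
large = small + 1000.0  # same differences between values, much larger magnitude

# Naive version without the shift: exp overflows to inf, and inf / inf is nan
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(large) / np.sum(np.exp(large), axis=1, keepdims=True)

print(naive)           # [[nan nan nan]]
print(softmax(small))  # matches the step-by-step values above
print(softmax(large))  # same distribution as for the small inputs
```

Note that `axis=1` assumes a 2D batch of shape `(samples, classes)`, which is what the dense layers above produce.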