# Why we need activation functions

Reading: https://www.geeksforgeeks.org/activation-functions-neural-networks/

In [22]:
# layer 1
b1 = 7

w1 = 2
w2 = -1

# layer 2
b2 = 8

w3 = 0.2
w4 = 0.1

In [23]:
# input
x = 200

### Layer 1

- We multiply the input (x) by the weight of each "cell".
- Then we add up the results from all of the cells, plus the bias.
- We will feed this number to the layer 2 cells.

In [24]:
layer_1 = (x * w1) + (x * w2) + b1
layer_1

207

### Layer 2

- Layer 2 takes Layer 1 as input.
- Like layer 1, it multiplies the input and the weight for each cell, then adds the bias.
- Since it is the last layer, its output is also the output of the network

In [25]:
layer_2 = (layer_1 * w3) + (layer_1 * w4) + b2
print("Output of the NN:", layer_2)

Output of the NN: 70.10000000000001


### Chained layers:

In [53]:
def my_neural_net(x):
    layer_1 = (x * w1) + (x * w2) + b1
    layer_2 = (layer_1 * w3) + (layer_1 * w4) + b2
    return layer_2

In [54]:
my_neural_net(200)

70.10000000000001

### Alternate form for layer 1

Here we show that what looked like two parameters is actually just one:

In [5]:
layer_1_weight = w1 + w2
layer_1_weight

1

In [33]:
layer_1_output = x * layer_1_weight + b1
layer_1_output

207

### Alternate form for layer 2

Again, it's just one parameter

In [7]:
layer_2_weight = w3 + w4
layer_2_weight

0.30000000000000004

In [35]:
layer_2_output = (layer_1_output * layer_2_weight) + b2
layer_2_output

70.10000000000001

### Final equation

In [42]:
y = (x * layer_1_weight + b1) * layer_2_weight + b2
y

70.10000000000001

Which can be expressed like this:

In [43]:
# parentheses let the expression continue on the next line
y = ((layer_2_weight * layer_1_weight) * x
     + (layer_2_weight * b1 + b2))
y

70.10000000000001
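The step from the nested form to this one is just distributing the layer 2 weight over the parentheses. Writing $W_1$ for `layer_1_weight` and $W_2$ for `layer_2_weight` (shorthand introduced here only for readability):

$$
y = (x W_1 + b_1)\,W_2 + b_2 = (W_2 W_1)\,x + (W_2 b_1 + b_2)
$$

The first parenthesis acts as a single weight on $x$, and the second one as a single bias.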

Let's break down the parts of this equation:

In [44]:
final_weight = (layer_2_weight * layer_1_weight)
final_weight

0.30000000000000004

In [45]:
final_bias = (layer_2_weight * b1 + b2)
final_bias

10.100000000000001

In [46]:
y = final_weight * x + final_bias
y

70.10000000000001

So, in the end, the whole neural network was just a linear function! Stacking linear layers only ever produces another linear function, so we need a non-linear transformation between the layers if we want the neural net to be able to match complex patterns.
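As a quick sanity check, here is a minimal sketch comparing the two-layer network with the collapsed `weight * x + bias` form on a few inputs. It reuses the notebook's initial parameters; the name `collapsed_net` and the list of test inputs are assumptions made for this illustration.

```python
import math

# Same initial parameters as at the top of the notebook
b1, w1, w2 = 7, 2, -1
b2, w3, w4 = 8, 0.2, 0.1

def my_neural_net(x):
    layer_1 = (x * w1) + (x * w2) + b1
    layer_2 = (layer_1 * w3) + (layer_1 * w4) + b2
    return layer_2

# The whole network collapsed into one weight and one bias
final_weight = (w3 + w4) * (w1 + w2)
final_bias = (w3 + w4) * b1 + b2

def collapsed_net(x):  # hypothetical name, not from the notebook
    return final_weight * x + final_bias

# Compare both forms on a handful of arbitrary inputs
test_inputs = [-200, -1, 0, 1, 200]
matches = [
    math.isclose(my_neural_net(v), collapsed_net(v), rel_tol=1e-9, abs_tol=1e-9)
    for v in test_inputs
]
print(matches)
```

Every comparison comes out `True` (up to floating point rounding), whichever input we pick.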

(Back to slides)

## Activation functions

Here we define the ReLU function, which returns its input when the input is positive and 0 otherwise:

In [10]:
import numpy as np

def relu(X):
    return np.maximum(X, 0)

In [49]:
[relu(i) for i in range(-10, 11)]

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Let's update the neural net:

In [56]:
def my_neural_net_2(x):
    layer_1 = relu(x * w1) + relu(x * w2) + b1
    layer_2 = relu(layer_1 * w3) + relu(layer_1 * w4) + b2
    return layer_2

In [57]:
my_neural_net_2(200)

130.10000000000002
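To see that this version is no longer a single straight line, we can measure its slope on both sides of zero. This is a small sketch with the notebook's initial parameters; it uses Python's built-in `max` as a scalar stand-in for the NumPy-based `relu` so it runs without dependencies, and the sample points are arbitrary choices.

```python
# Same initial parameters as at the top of the notebook
b1, w1, w2 = 7, 2, -1
b2, w3, w4 = 8, 0.2, 0.1

def relu(v):
    # scalar stand-in for np.maximum(v, 0)
    return max(v, 0)

def my_neural_net_2(x):
    layer_1 = relu(x * w1) + relu(x * w2) + b1
    layer_2 = relu(layer_1 * w3) + relu(layer_1 * w4) + b2
    return layer_2

# Slope between two points on the negative side and two on the positive side
slope_left = (my_neural_net_2(-100) - my_neural_net_2(-200)) / 100
slope_right = (my_neural_net_2(200) - my_neural_net_2(100)) / 100

print("slope for x < 0:", slope_left)
print("slope for x > 0:", slope_right)
```

The slope is about -0.3 on the left and about 0.6 on the right, so the output bends at zero. A purely linear network would have the same slope everywhere.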

We now have a "device" capable of matching complex patterns, provided we find the right parameters. But how do we find them? So far, we just picked arbitrary numbers! Well, that's close to how neural nets start: with random parameters.

The trick lies in updating them over and over again so that the network behaves better and better every time.

To do that, we first need a way to measure how well the neural net is doing. That's what the Loss function does.

## Loss function

There are many loss functions (https://analyticsindiamag.com/loss-functions-in-deep-learning-an-overview/), and each one has its caveats. But in the end, they all compare a predicted value with a true value, and output how close or how far apart they are.

In [13]:
def mean_squared_error(pred, true):
    # with a single prediction there is nothing to average: this is the squared error
    return (pred - true)**2

In [14]:
true_y = 420

In [58]:
# loss score of the ReLU network's prediction
mean_squared_error(my_neural_net_2(x), true_y)

84042.00999999998
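The function above scores one prediction at a time. The "mean" in the name only shows up when several predictions are scored together. A minimal sketch, assuming we simply average the individual squared errors; `mean_squared_error_batch` is a hypothetical name, and the two predictions are the outputs of the two networks built earlier, reused purely as illustration:

```python
def squared_error(pred, true):
    # error for a single prediction
    return (pred - true) ** 2

def mean_squared_error_batch(preds, trues):  # hypothetical helper
    # average of the squared errors over all (prediction, target) pairs
    errors = [squared_error(p, t) for p, t in zip(preds, trues)]
    return sum(errors) / len(errors)

# Outputs of the linear net and the ReLU net, both scored against the target 420
preds = [70.1, 130.1]
trues = [420, 420]

batch_loss = mean_squared_error_batch(preds, trues)
print(batch_loss)
```

With a single prediction in each list, this gives the same number as the one-value version.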

With the random weights, this was the error. Now let's update the weights.

## Weight update

The "learning rate" defines how big a "jump" we take when updating the weights. A small learning rate arrives at a precise result, but needs a lot of iterations. A big learning rate arrives at a good result quickly, but might overshoot the best values.

In [62]:
learning_rate = 0.2

#### Update 1

In [60]:
# layer 1
b1 = b1 + learning_rate

w1 = w1 + learning_rate
w2 = w2 + learning_rate

# layer 2
b2 = b2 + learning_rate

w3 = w3 + learning_rate
w4 = w4 + learning_rate

In [61]:
layer_1 = relu(x * w1) + relu(x * w2) + b1
layer_2 = relu(layer_1 * w3) + relu(layer_1 * w4) + b2
mean_squared_error(layer_2, true_y)

9753.537599999987

The error has decreased! We are going in the right direction. Let's update the weights again.

#### Update 2

In [63]:
# layer 1
b1 = b1 + learning_rate

w1 = w1 + learning_rate
w2 = w2 + learning_rate

# layer 2
b2 = b2 + learning_rate

w3 = w3 + learning_rate
w4 = w4 + learning_rate

In [64]:
layer_1 = relu(x * w1) + relu(x * w2) + b1
layer_2 = relu(layer_1 * w3) + relu(layer_1 * w4) + b2
mean_squared_error(layer_2, true_y)

15510.211600000019

Now the error increased. We go back to the parameters we had after Update 1 and stay with those. The learning process has ended.
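The manual steps above can be wrapped in a loop. What follows is only a sketch of this notebook's simplified rule (add the learning rate to every parameter, keep the change while the error goes down, stop and roll back when it goes up). It is not how real training libraries update weights. The names `forward`, `train` and `max_steps` are assumptions made for this example, and `relu` uses the built-in `max` as a scalar stand-in for the NumPy version.

```python
def relu(v):
    # scalar stand-in for np.maximum(v, 0)
    return max(v, 0)

def forward(params, x):
    # params are ordered as b1, w1, w2, b2, w3, w4
    b1, w1, w2, b2, w3, w4 = params
    layer_1 = relu(x * w1) + relu(x * w2) + b1
    return relu(layer_1 * w3) + relu(layer_1 * w4) + b2

def squared_error(pred, true):
    return (pred - true) ** 2

def train(learning_rate, x=200, true_y=420, max_steps=1000):
    params = [7, 2, -1, 8, 0.2, 0.1]  # the notebook's starting values
    best_error = squared_error(forward(params, x), true_y)
    for step in range(1, max_steps + 1):
        # the notebook's naive rule: nudge every parameter by the learning rate
        candidate = [p + learning_rate for p in params]
        error = squared_error(forward(candidate, x), true_y)
        if error >= best_error:
            # the error stopped improving: keep the previous parameters
            return params, best_error, step - 1
        params, best_error = candidate, error
    return params, best_error, max_steps

_, error_big, steps_big = train(0.2)
_, error_small, steps_small = train(0.05)
print("learning rate 0.2: ", steps_big, "accepted updates, error", error_big)
print("learning rate 0.05:", steps_small, "accepted updates, error", error_small)
```

With a learning rate of 0.2 the loop accepts a single update, like the manual run above. With 0.05 it takes more updates and ends with a smaller error, which is the trade-off described in the Weight update section.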