## Chapter 4: Activation Functions

---

> Different activation functions suit different cases, so understanding how each one works helps us pick the best one for a given task.

> An **activation function** is applied to the output of a neuron (or layer of neurons) and modifies that output.

> If an activation function is nonlinear, it allows neural networks with multiple hidden layers to map nonlinear functions.

> In general, neural networks have two types of activation functions: those used in hidden layers and those used in the output layer. Usually, all hidden neurons use the same activation function, but this does not always have to be the case.


---------

### The Step Activation Function

> Recall that the purpose of an activation function is to mimic a neuron "firing" or "not firing" based on its input. The simplest version of this is the **step function**. In a single neuron, if `weights * inputs + bias` is greater than 0, the neuron fires (outputs 1); otherwise, it does not (outputs 0).
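A minimal sketch of a single neuron with a step activation, in plain Python. The weights, inputs, and biases are made-up values chosen only for illustration, and `step` and `neuron_output` are hypothetical helper names, not part of any library.

```python
def step(x):
    # Step activation: "fire" (1) if the input is greater than 0, otherwise 0
    return 1 if x > 0 else 0

def neuron_output(weights, inputs, bias):
    # weights * inputs + bias, computed as a dot product plus the bias
    z = sum(w * i for w, i in zip(weights, inputs)) + bias
    return step(z)

# Hypothetical values for illustration only
weights = [0.2, -0.5, 1.0]
inputs = [1.0, 2.0, 3.0]

print(neuron_output(weights, inputs, bias=0.0))   # z is about 2.2, so the neuron fires -> 1
print(neuron_output(weights, inputs, bias=-3.0))  # z is about -0.8, so it does not fire -> 0
```

Note that the output carries no information about how far `z` is from the threshold; an input of 0.01 and one of 1000 both produce 1.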

Summary:

    - Historically used in hidden layers.
    - Rarely chosen today.

<img src="images/step_function.png" width="550"/>


-----

### The Linear Activation Function

> A **linear function** is simply the equation of a line. As an activation function, it is defined as `f(x) = x` (or `y = x`), so it passes its input through unchanged.

Summary:

    - Usually applied to the last layer's output in a regression model.


> Recall that a regression model outputs a scalar value instead of a class.
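A minimal sketch of a linear activation written in the same style as the `Activation_ReLU` class used later in this chapter. The class name `Activation_Linear` is a hypothetical choice for illustration. Its forward pass returns the inputs unchanged, which is why it suits the output layer of a regression model whose target can be any real number.

```python
class Activation_Linear:
    # Hypothetical linear (identity) activation: f(x) = x

    def forward(self, inputs):
        # Pass the inputs through unchanged
        self.output = inputs


activation = Activation_Linear()
activation.forward([-2.5, 0.0, 4.2])
print(activation.output)  # [-2.5, 0.0, 4.2]
```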

<img src="images/linear_function.webp" width="550"/>


---------

### The Sigmoid Activation Function



> When it comes to optimizing weights and biases, the optimizer's job is often easier when activation functions are more granular and informative, producing a range of values rather than just 0 or 1. The original, more granular activation function used for neural networks was the **Sigmoid** activation function.

<img src="images/sigmoid_function.jpg" width="550"/>


> The function approaches 0 as its input approaches negative infinity, returns exactly 0.5 for an input of zero, and approaches 1 as its input approaches positive infinity.

> Because the Sigmoid function's output is bounded between 0 and 1, it is better suited for use in neural networks than a function whose output ranges from negative to positive infinity. It also adds nonlinearity to the model.
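The Sigmoid function is defined as `1 / (1 + e^(-x))`. The sketch below, using only Python's standard `math` module, checks the behaviour described above and contrasts it with the step function: two negative inputs that the step function both maps to 0 get clearly different Sigmoid outputs, which is the extra granularity the optimizer can use. The helper names and the separate branch for negative inputs (one common way to keep `math.exp` from overflowing) are choices made for this sketch.

```python
import math

def sigmoid(x):
    # Sigmoid: 1 / (1 + e^(-x)), written in two branches so that math.exp
    # never receives a large positive argument (avoids OverflowError)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def step(x):
    # Step function for comparison: 1 if x > 0, else 0
    return 1 if x > 0 else 0

print(sigmoid(0))    # exactly 0.5 at an input of zero
print(sigmoid(-20))  # very close to 0 for large negative inputs
print(sigmoid(20))   # very close to 1 for large positive inputs

# Near the threshold vs. far from it: the step function cannot tell these apart
for x in (-0.1, -5.0):
    print(x, step(x), round(sigmoid(x), 4))
```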

Summary:

    - Historically used in hidden layers.
    - Eventually replaced by the **Rectified Linear Unit (ReLU)** activation function.

---------

### The Rectified Linear Activation Function

> The rectified linear activation function is simpler than the sigmoid function. It is simply `y=x`, clipped at zero from the negative side. So if x is less than zero, y is zero, otherwise y is equal to x.

> This function can also be calculated as `max(0, x)`.

<img src="images/relu_function.png" width="550"/>

> ReLU is the most commonly used activation function in neural networks today, mainly because it is fast and efficient to compute.

> ReLU is quite close to being linear; however, it remains nonlinear because of the bend at zero, a simple yet very effective property.
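One way to see that ReLU is nonlinear despite looking almost like `y = x`: a linear function `f` must satisfy `f(a + b) = f(a) + f(b)`. This small plain-Python check, with arbitrary example values, shows ReLU breaking that rule once an input crosses the bend at zero, while it behaves exactly like `y = x` when all inputs are positive.

```python
def relu(x):
    # ReLU: max(0, x)
    return max(0, x)

a, b = -1, 2  # arbitrary example values on either side of zero

print(relu(a + b))        # relu(1) = 1
print(relu(a) + relu(b))  # 0 + 2 = 2, so relu(a + b) != relu(a) + relu(b)

# For inputs that are all positive, ReLU behaves exactly like y = x
print(relu(3 + 4) == relu(3) + relu(4))  # True
```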

--------

### Why Use Activation Functions?

> In most cases, in order for a neural network to fit a nonlinear function, it must contain two or more hidden layers which use a nonlinear activation function.
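Why the nonlinearity matters: without it, stacking layers gains nothing, because two linear layers applied one after another collapse into a single linear layer. The plain-Python sketch below uses one-neuron layers with made-up weights and biases (all values and helper names are hypothetical). It shows that two stacked linear layers equal one layer with weight `w2 * w1` and bias `w2 * b1 + b2`, and that inserting ReLU between them breaks a simple straight-line test: for any line, the output at the midpoint of two inputs equals the average of the two outputs.

```python
def layer(x, w, b):
    # One-neuron dense layer with no activation: w * x + b
    return w * x + b

def relu(x):
    # ReLU activation: max(0, x)
    return max(0.0, x)

# Hypothetical weights and biases for two stacked layers
w1, b1 = 2.0, -1.0
w2, b2 = -3.0, 0.5

def stacked_linear(x):
    # Two layers, no activation in between
    return layer(layer(x, w1, b1), w2, b2)

def stacked_relu(x):
    # Same two layers with ReLU applied to the first layer's output
    return layer(relu(layer(x, w1, b1)), w2, b2)

# The stacked linear layers collapse into a single layer
w_single, b_single = w2 * w1, w2 * b1 + b2
for x in (-2.0, 0.0, 0.5, 3.0):
    print(stacked_linear(x), layer(x, w_single, b_single))

def on_a_line(f, x1, x2):
    # A straight line's value at the midpoint equals the average of its end values
    return f((x1 + x2) / 2) == (f(x1) + f(x2)) / 2

print(on_a_line(stacked_linear, -2.0, 3.0))  # True
print(on_a_line(stacked_relu, -2.0, 3.0))    # False
```

Failing the midpoint test is enough to show the ReLU network is not a straight line; passing it at one pair of points, on its own, would not prove linearity.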

-----

### ReLU Activation Function Code


```python
inputs = [0, 2, -1, 3.3, -2.7, 1.1, 2.2, -100]

# Method 1: Using an if/else statement --
output = []
for i in inputs:
    if i > 0:
        output.append(i)
    else:
        output.append(0)

print(output)

# Method 2: Using the built-in max function --
output = []
for i in inputs:
    output.append(max(0, i))

print(output)

# Method 3: Using NumPy's element-wise maximum function --
import numpy as np

output = np.maximum(0, inputs)

print(output)
```

```
[0, 2, 0, 3.3, 0, 1.1, 2.2, 0]
[0, 2, 0, 3.3, 0, 1.1, 2.2, 0]
[0.  2.  0.  3.3 0.  1.1 2.2 0. ]
```


```python
import numpy as np

# Rectified linear activation class:
class Activation_ReLU:

    # Forward pass: element-wise max(0, x) over the inputs
    def forward(self, inputs):
        self.output = np.maximum(0, inputs)
```

```python
# Apply this activation function to the dense layer's output in our code:
import nnfs
from nnfs.datasets import spiral_data
from code.layers import Layer_Dense

nnfs.init()

# Create dataset with spiral data:
X, y = spiral_data(samples=100, classes=3)

# Create Dense layer with 2 input features and 3 output values:
dense1 = Layer_Dense(2, 3)

# Create ReLU activation (to be used with Dense layer):
activation1 = Activation_ReLU()

# Make a forward pass of our training data through this layer:
dense1.forward(X)

# Forward pass through activation function (takes in output from previous layer):
activation1.forward(dense1.output)

# Let's see output of the first few samples:
print(activation1.output[:5])
```

```
[[0.         0.         0.        ]
 [0.         0.00011395 0.        ]
 [0.         0.00031729 0.        ]
 [0.         0.00052666 0.        ]
 [0.         0.00071401 0.        ]]
```
