<img src="../imgs/CampQMIND_banner.png">

# Activation Functions
Author: Connor Winters

## Introduction <a name="introduction"></a>

Prior to diving into the specifics of activation functions, I will provide a quick refresher on some of the critical concepts of neural networks essential for understanding the role of activation functions.

Neural networks are a type of computational model, inspired by the human brain, that computes an output in response to a set of inputs.

A standard neural network architecture consists of a series of layers, each containing individual nodes. Each node receives signals through its connections from the preceding layer, along with a bias unit. Each connection is assigned a weight that determines how strongly that connection influences the node.

The value at any given node is computed as the weighted sum of its inputs (each incoming signal multiplied by the weight of its connection) plus the bias.

### $$Y = \sum_{i}{(weight_i \times input_i)} + bias$$
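
As a quick illustration of the weighted sum, here is a minimal sketch for a single node. The input values, weights and bias are hypothetical numbers chosen only for the example.

```python
import numpy as np

# Hypothetical values for a single node with three incoming connections.
inputs = np.array([0.5, -1.0, 2.0])
weights = np.array([0.4, 0.3, -0.2])
bias = 0.1

# Weighted sum of the inputs plus the bias: Y = sum(weight * input) + bias
y = np.dot(weights, inputs) + bias
print(y)  # approximately -0.4
```

The bias is added once, after the sum, rather than once per connection.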

The above representation in itself is not sufficient to represent complex relationships, but, with the addition of activation functions, neural networks become much more powerful.

Similar to neurons firing in the human brain, an activation function serves as a threshold to determine whether a signal is important enough to pass on to the neurons in subsequent layers. The activation function determines whether the incoming signal warrants the node firing and, if so, at what strength. Activation functions are a crucial factor in a neural network's ability to represent complex or non-linear relationships.

Some of the most common activation functions include the step function, linear function, sigmoid function, ReLU function and softmax function. Throughout this notebook, we will look at each of these functions; however, it is worth noting that there is an extensive list of alternative functions. Many of them operate very similarly and may be worth looking into, depending on the purpose and type of model used.


## Table of Contents

* 1. [Introduction](#introduction)
* 2. [Linear Function](#linfunc)
* 3. [Step Function](#stepfunc)
* 4. [Sigmoid Function](#sigmoid)
* 5. [Rectified Linear Unit (ReLU)](#relu)
* 6. [Softmax](#softmax)
* 7. [Code Implementation](#code)


## Linear Function <a name="linfunc"></a>

![image.png](attachment:image.png)

The linear activation function is represented by:
### $$y=mx$$
With $m = 1$, the linear activation function simply maps the input directly to the same output, which is akin to not having an activation function. As a result, the linear activation function is very limited in its uses. A neural network composed solely of linear activation functions is equivalent to a single linear model (in effect, linear regression) and cannot represent non-linear relationships regardless of how many layers and nodes are used. However, there are certain circumstances in which the linear activation function can be useful. One of the most common applications is regression problems: since the output values are unconstrained (meaning they can range from $-\infty$ to $\infty$), it can be very useful as the activation function for the output layer.
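
To see why stacking linear layers adds no expressive power, here is a minimal sketch that assumes a hypothetical two-layer network with identity activations. The layer sizes and random weights are made up for the example; the point is only that the two layers collapse into one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer network with linear (identity) activations.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Forward pass through both layers.
two_layer = W2 @ (W1 @ x + b1) + b2

# The same mapping expressed as a single linear layer.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layer, one_layer))  # True
```

The same algebra applies to any number of layers, so a non-linear activation function is needed somewhere for depth to matter.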

## Step-Function <a name="stepfunc"></a>

![image.png](attachment:image.png)

The step-function is a simple binary representation of an activation with discrete output values (typically 0 or 1). If the threshold is met, the node is activated; if not, it is not activated. The step function is generally only used in the output layer for binary classification problems and is not recommended for hidden layers. One major reason is that its derivative is always zero (except at the threshold, where it does not exist), which poses a major challenge for training your neural network with gradient descent.
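
The zero-derivative problem can be checked numerically. The sketch below estimates the slope of a step function with a central difference at a few hypothetical sample points away from the threshold; the helper name `step` and the offset `h` are choices made for this example only.

```python
import numpy as np

def step(x):
    # Outputs 1 where the input is above the threshold of 0, otherwise 0.
    return (x > 0).astype(int)

x = np.array([-2.0, -0.5, 0.5, 2.0])
h = 1e-4  # small offset for a central-difference estimate

# Away from the threshold the output does not change, so the slope is 0.
slope = (step(x + h) - step(x - h)) / (2 * h)
print(slope)  # [0. 0. 0. 0.]
```

Since gradient descent updates each weight in proportion to this slope, a slope of zero means the weights feeding a step-function node receive no update.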

## Sigmoid Function <a name="sigmoid"></a>

![image.png](attachment:image.png)


Similar to the step-function, the sigmoid function takes a real value as input and outputs a value between 0 and 1. However, unlike the step-function, the sigmoid function is continuously differentiable, which has made it one of the most popular activation functions.
The sigmoid function is calculated as:
### $$Y = \frac{1}{1 + e^{-x}}$$

##  Rectified Linear Unit (ReLU) <a name="relu"></a>

![Picture1.png](attachment:Picture1.png)

The ReLU function is calculated as:
### $$Y = \max(0, x)$$

The rectified linear unit (ReLU) activation function has become the default activation function for many practitioners, as it is very efficient and useful for many types of networks. ReLU avoids a few common problems with the sigmoid function: large inputs saturate at an output of 1.0, and the sigmoid is only really sensitive to changes in its input near its mid-point, where the output is around 0.5.

The ReLU function can also accelerate the training of deep neural networks compared to traditional activation functions such as the sigmoid. The derivative of ReLU is a constant 1 for positive inputs (and 0 for negative inputs), so it is very cheap to compute when calculating error terms during the training phase.
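
To make the comparison concrete, the sketch below evaluates the derivative of each function at a few hypothetical inputs. The helper names `sigmoid_grad` and `relu_grad` are introduced here for illustration; the sigmoid derivative uses the standard identity $s(x)(1 - s(x))$.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x)).
    s = sigmoid(x)
    return s * (1 - s)

def relu_grad(x):
    # 1 for positive inputs, 0 otherwise (the value at exactly 0 is a convention).
    return (x > 0).astype(float)

x = np.array([-10.0, -1.0, 0.5, 1.0, 10.0])
print(sigmoid_grad(x))  # at most 0.25, and close to 0 for inputs far from 0
print(relu_grad(x))     # [0. 0. 1. 1. 1.]
```

The sigmoid's gradient nearly vanishes once the input is far from 0, while ReLU's gradient stays at 1 for any positive input.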

##  Softmax <a name="softmax"></a>

The final activation function we will cover is the softmax activation function. Softmax is typically used on the last layer of the network and outputs probabilities that sum to one. This makes it very useful for multi-class classification problems. For an output layer with $K$ values $y_1, \dots, y_K$, the softmax is calculated as:
### $$Z(y_i) = \frac{e^{y_i}}{\sum_{j=1}^{K}{e^{y_j}}}$$
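
As a small worked example, the sketch below applies the softmax calculation to three hypothetical raw scores from a last layer. The score values are made up for illustration.

```python
import numpy as np

scores = np.array([1.0, 2.0, 3.0])  # hypothetical raw outputs of the last layer

# Exponentiate each score, then divide by the sum of all exponentials.
exp_scores = np.exp(scores)
probs = exp_scores / exp_scores.sum()

print(probs)        # approximately [0.090, 0.245, 0.665]
print(probs.sum())  # approximately 1.0
```

The largest score receives the largest probability, and every output is strictly between 0 and 1.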

## Code Implementation <a name="code"></a>

### Linear Function
Since inputs are mapped one-to-one, implementing a linear activation function from scratch requires no additional logic: the function simply returns its input.

In [1]:
import matplotlib.pyplot as plt
import numpy as np

In [3]:
def linear(x):
    return x

### Step-Function 

In [25]:
def stepFunction(x):
    # Compare against the threshold of 0; np.asarray lets plain Python numbers work too.
    y = np.asarray(x) > 0
    return y.astype(int)

### Sigmoid Function 

In [28]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

### ReLU

In [29]:
def relu(x):
    return np.maximum(0, x)

### Softmax

In [44]:
def softmax(x):
    # Subtracting the maximum does not change the result but avoids overflow in np.exp.
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()
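
The `np.max(x)` subtraction in the softmax above is worth a short demonstration. The sketch below repeats the same `softmax` definition so it runs on its own and compares it with a naive version on hypothetical large inputs; the naive calculation is included only to show the overflow.

```python
import numpy as np

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

x = np.array([1000.0, 1001.0, 1002.0])

# Naive version: np.exp(1000) overflows to inf, so inf / inf gives nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(x) / np.exp(x).sum()

print(naive)       # [nan nan nan]
print(softmax(x))  # same probabilities as for [1, 2, 3]
```

Note that this implementation sums over every element of `x`, so it assumes a single one-dimensional vector of scores rather than a batch.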

# Resources
- https://www.youtube.com/watch?v=s-V7gKrsels&ab_channel=CodeEmporium