## Activation Functions
Previously we were introduced to the step function. In this notebook I'll talk about the **Step**, **Sigmoid**, **Softmax** and **ReLU** activation functions, which decide how active a perceptron is.

![IMG_0672](https://user-images.githubusercontent.com/57009004/138087476-42259db4-8828-4ee8-8f8b-66c9071b7a97.jpg)



## Step Function

* The perceptron outputs a 1 or a 0 (active or not active).
* Used for binary classification.
* The function is discrete.

#### In Math:
$$f(x_i)= \begin{cases}
    1& \text{if } x_i \geq 0\\
    0& \text{otherwise} 
\end{cases}$$ 

#### In Code:

In [1]:
def step_function(x):
    if x >= 0: return 1
    else: return 0
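
The `step_function` above compares a single number, so it would not accept a whole array of inputs at once. Below is a minimal sketch of an element-wise variant, assuming NumPy is available; the name `step_function_vectorized` is my own and not part of any library.

```python
import numpy as np

def step_function_vectorized(x):
    # Hypothetical helper: applies the same rule as step_function
    # (1 if x >= 0, otherwise 0) to every element of x.
    return np.where(np.asarray(x) >= 0, 1, 0)

# Made-up inputs: one negative, zero, one positive.
print(step_function_vectorized([-2.0, 0.0, 3.5]))  # [0 1 1]
```

Note that an input of exactly 0 counts as active, matching the `>=` in the definition.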

## Sigmoid Logistic
A sigmoid function is any S-shaped curve, for example the logistic function, the hyperbolic tangent or the arctangent. The logistic function is the most common.
* Inputs should be normalized to roughly ±4, because the function squashes all values into the range between 0 and 1 and is already very close to those limits beyond that range.
* Used when we have 2 classes (cat, dog).
* Subject to vanishing gradients (a bad thing): the gradient approaches 0 for inputs of large magnitude.
* The function is continuous.
* The perceptron outputs a value between 0 and 1 (strength of activity).

#### In Math:
$$f(x_i)= \frac{1}{(1 + e^{-x_i})}$$ 

#### In Code:

In [2]:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
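
To see the squashing and the vanishing gradient in numbers, here is a small sketch. It repeats `sigmoid` so it runs on its own, and adds a helper `sigmoid_gradient` (a name I made up) based on the standard derivative of the logistic function, $f'(x) = f(x)(1 - f(x))$. The sample inputs are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_gradient(x):
    # Hypothetical helper: derivative of the logistic function, s * (1 - s).
    s = sigmoid(x)
    return s * (1 - s)

# Arbitrary sample inputs around the ±4 range mentioned above.
xs = np.array([-10.0, -4.0, 0.0, 4.0, 10.0])
print(np.round(sigmoid(xs), 4))           # outputs stay between 0 and 1
print(np.round(sigmoid_gradient(xs), 4))  # gradient shrinks away from 0
```

The gradient peaks at 0.25 at x = 0 and is already down to about 0.018 at x = ±4, which is why large inputs make learning slow.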

## Softmax
* The inputs need to be normalized.
* Used for multiple classes (cat, dog, bird).
* For each class the output is between 0 and 1, and the outputs of all classes add up to 1.
* The function is continuous.

#### In Math:
$$f(x_i)=\frac{e^{x_i}}{\sum_{j=1}^{k} e^{x_j}}, \text{ where } k \text{ is the number of classes}$$

#### In Code:

In [3]:
def softmax(x):
    expX = np.exp(x)
    return expX / expX.sum()
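
If the inputs are not normalized, `np.exp` can overflow for large values. One common workaround, sketched here as an assumption rather than as part of the function above, is to subtract the largest input before exponentiating. This does not change the result mathematically. The name `stable_softmax` and the scores are made up for illustration.

```python
import numpy as np

def stable_softmax(x):
    # Hypothetical variant: shifting by the maximum keeps np.exp from
    # overflowing and leaves the resulting probabilities unchanged.
    x = np.asarray(x, dtype=float)
    exp_x = np.exp(x - x.max())
    return exp_x / exp_x.sum()

# Made-up scores for three classes (cat, dog, bird).
scores = np.array([2.0, 1.0, 0.1])
probs = stable_softmax(scores)
print(probs, probs.sum())  # each value is between 0 and 1, the sum is 1
```

The class with the highest score gets the highest probability, and very large inputs such as `[1000.0, 1000.0]` still give finite outputs.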

## ReLU
* Inputs do not need to be normalized.
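* Mitigates vanishing gradients (the gradient is 1 for every positive input) but is subject to dying ReLU (a bad thing): a perceptron whose input stays negative outputs 0 and receives no gradient.
* Fast to compute and favored when the network has many layers.
* The perceptron outputs 0 for negative inputs and x itself otherwise, so the output ranges from 0 to x.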
* The function is continuous.

#### In Math:
$$f(x_i) = \max(0, x_i)$$

#### In Code:

In [4]:
def ReLU(x):
    return np.maximum(x, 0)
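
To make the gradient behaviour concrete, here is a minimal sketch. `relu` repeats the function above under a lowercase name so the block runs on its own, and `relu_gradient` is a helper I made up. Setting the gradient at exactly 0 to 0 is a convention I am assuming here.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def relu_gradient(x):
    # Hypothetical helper: 1 where the input is positive, 0 elsewhere.
    # The value at exactly 0 is set to 0 by convention in this sketch.
    return (np.asarray(x) > 0).astype(float)

# Arbitrary sample inputs, including a large positive one.
xs = np.array([-3.0, -0.5, 0.0, 0.5, 100.0])
print(relu(xs))           # negatives become 0, positives pass through
print(relu_gradient(xs))  # no gradient for the non-positive inputs
```

The gradient stays at 1 however large the positive input is, so it does not vanish. A perceptron that only ever sees negative inputs gets a gradient of 0 and stops updating, which is the dying ReLU problem.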