# Activation Functions 


<br>
An activation function is a function used in artificial neural networks which outputs a small value for small inputs, and a larger value if its inputs exceed a threshold. If the inputs are large enough, the activation function "fires", otherwise it does nothing. In other words, an activation function is like a gate that checks that an incoming value is greater than a critical number.

Activation functions are useful because they add non-linearities into neural networks, allowing the neural networks to learn powerful operations. If the activation functions were to be removed from a feedforward neural network, the entire network could be re-factored to a simple linear operation or matrix transformation on its input, and it would no longer be capable of performing complex tasks such as image recognition.
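The collapse described above can be checked numerically. The following is a minimal sketch in plain Python, assuming made-up weight matrices `W1` and `W2` and a hypothetical helper `matmul`. It illustrates that two stacked linear layers with no activation between them give the same result as one linear layer whose weights are the product of the two.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (illustrative helper)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Arbitrary example weights (assumptions, not taken from any trained model).
W1 = [[1, 2], [3, 4], [5, 6]]   # layer 1: 2 inputs -> 3 hidden units
W2 = [[1, 0, -1], [2, 1, 0]]    # layer 2: 3 hidden units -> 2 outputs
x = [[7], [-3]]                 # one input as a column vector

# Two linear layers applied one after the other, with no activation in between.
two_layers = matmul(W2, matmul(W1, x))

# The same input passed through a single layer with weights W2 * W1.
W_combined = matmul(W2, W1)
one_layer = matmul(W_combined, x)

print(two_layers == one_layer)
```

Placing a non-linear function between the two layers breaks this equivalence, which is what lets deeper networks represent more than a single matrix transformation.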

Well-known activation functions used in data science include the rectified linear unit (ReLU) function and the family of sigmoid functions, such as the logistic sigmoid function, the hyperbolic tangent, and the arctangent function. The sections below cover:

1. step function
2. sigmoid function
3. tanh function
4. relu function
5. leaky relu function
    
<br>
<br>

<div style="text-align:center"><img src='https://miro.medium.com/max/788/1*B3dckq_nbUlQruXA8qTSxg.png' style="background: #fff;"></div>

## 1. _step function_

Taking the concept of the activation function to first principles, a single neuron in a neural network followed by an activation function can behave as a logic gate.

Let us take the threshold step function as our activation function:

<img src='https://images.deepai.org/user-content/0431006986-thumb-6084.svg' style="background: #fff;"><br>

Mathematical definition of the threshold step function, one of the simplest possible activation functions

The binary step function is a threshold-based activation function: the neuron is activated when its input reaches a certain threshold and deactivated when the input is below it. In the graph below, the threshold is zero. As the name suggests, this activation function can be used for binary classification, but it cannot be used when there are multiple classes to deal with.


<br><img src='https://images.deepai.org/user-content/5853361052-thumb-1150.svg' style="background: #fff;">



The threshold step function may have been the first activation function, introduced by Frank Rosenblatt while he was modeling biological neurons in 1962.

And let us define a single layer neural network, also called a single layer perceptron, as:


<br>
<img src='https://images.deepai.org/user-content/6846233816-thumb-9076.svg' style="background: #fff;">
<br><img src='https://images.deepai.org/user-content/8031469793-thumb-4117.svg' style="background: #fff;">
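To make the logic-gate idea concrete, here is a minimal sketch of a single neuron followed by the threshold step function. The function names, the hand-picked weights and biases, and the convention that the step outputs 1 exactly at the threshold are all assumptions made for illustration.

```python
def step(z, threshold=0):
    """Threshold step function: 1 if z reaches the threshold, else 0."""
    return 1 if z >= threshold else 0

def perceptron(inputs, weights, bias):
    """Single neuron: weighted sum of the inputs plus a bias, then the step function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)

# Hand-picked parameters (assumed for illustration) that realise AND and OR.
AND = dict(weights=[1, 1], bias=-1.5)
OR = dict(weights=[1, 1], bias=-0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, " AND:", perceptron([a, b], **AND), " OR:", perceptron([a, b], **OR))
```

Only the bias differs between the two gates: it shifts how much total input is needed before the neuron fires.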


## 2. _sigmoid function_

A sigmoid function is a mathematical function with a characteristic S-shaped curve. There are a number of common sigmoid functions, such as the logistic function, the hyperbolic tangent, and the arctangent.<br>

In machine learning, the term _sigmoid function_ is normally used to refer specifically to the logistic function, also called the logistic sigmoid function.
All sigmoid functions have the property that they map the entire number line into a small range such as between 0 and 1, or -1 and 1, so one use of a sigmoid function is to convert a real value into one that can be interpreted as a probability.

<br><img src='https://images.deepai.org/user-content/9279272907-thumb-1675.svg' style="background: #fff;">

Graph showing the characteristic S-shape of the logistic sigmoid function
<br><br>

<img src="https://images.deepai.org/user-content/1375463140-thumb-9221.svg" style="background: #fff;">

<br>
<br>


In [1]:
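import math

def sigmoid(x):
    # Branch on the sign so math.exp never overflows for large negative x.
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)
    return z / (1 + z)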

In [2]:
num = [0.0005, 0.1, 0.5, 0.8, 1, 500, 1000, 10000]

for i in num:
    print(i, " : ", sigmoid(i))

0.0005  :  0.5001249999973958
0.1  :  0.52497918747894
0.5  :  0.6224593312018546
0.8  :  0.6899744811276125
1  :  0.7310585786300049
500  :  1.0
1000  :  1.0
10000  :  1.0


<br>
However, the sigmoid function saturates: for very high or very low inputs its output barely changes, so its gradient is close to zero. During back-propagation these tiny gradients leave the weights almost unchanged, and the neural network stops learning. This is known as the vanishing gradient problem.
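The effect can be seen by evaluating the derivative of the logistic sigmoid, which is `s(x) * (1 - s(x))`. The sketch below is illustrative only. The helper name `sigmoid_grad` and the sample inputs are assumptions, and `sigmoid` is redefined in a sign-aware form so the block runs on its own.

```python
import math

def sigmoid(x):
    # Sign-aware form so math.exp never receives a large positive argument.
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)
    return z / (1 + z)

def sigmoid_grad(x):
    # Derivative of the logistic sigmoid: s(x) * (1 - s(x)).
    s = sigmoid(x)
    return s * (1 - s)

for x in [-20, -5, 0, 5, 20]:
    print(x, " : ", sigmoid_grad(x))
```

The gradient peaks at 0.25 for an input of zero and shrinks rapidly toward zero in both directions, so a saturated neuron passes almost no error signal backwards.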
 

<br>

## 3. _tanh function_

Another common sigmoid function is the hyperbolic tangent (tanh). It maps any real-valued input to the range between -1 and 1.

<br>

<img src="https://images.deepai.org/user-content/1676244843-thumb-4292.svg" style="background: #fff;">
<br>

<img src="https://i0.wp.com/www.arshad-kazi.com/wp-content/uploads/2021/03/tanh.jpg?fit=512%2C284&ssl=1" style="background: #fff;">

<br>

Mathematical definition of the hyperbolic tangent

In [3]:
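def tanh(x):
    # Direct translation of the definition. math.exp overflows for |x| above
    # roughly 709, where the built-in math.tanh(x) is the safer choice.
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))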

In [4]:
num = [-500, -100, -50, -1, 0, 5, 100, 500]

for i in num:
    print(i, " : ", tanh(i))

-500  :  -1.0
-100  :  -1.0
-50  :  -1.0
-1  :  -0.7615941559557649
0  :  0.0
5  :  0.999909204262595
100  :  1.0
500  :  1.0



<br>
However, tanh suffers from the vanishing gradient problem, just like the sigmoid function.
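A short sketch of why this happens: the derivative of tanh is `1 - tanh(x)^2`, which approaches zero as the output saturates at -1 or 1. The helper name `tanh_grad` and the sample inputs are assumptions for illustration. The standard-library `math.tanh` is used here rather than the hand-written version.

```python
import math

def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)^2.
    return 1 - math.tanh(x) ** 2

for x in [-10, -2, 0, 2, 10]:
    print(x, " : ", tanh_grad(x))
```

The gradient equals 1 at an input of zero and becomes vanishingly small once the input is a few units away from it.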

<br>

## 4. _relu (rectified linear unit) function_

There are a number of widely used activation functions in deep learning today. One of the simplest is the rectified linear unit, or ReLU function, which is a piecewise linear function that outputs zero if its input is negative, and directly outputs the input otherwise:

<img src="https://images.deepai.org/user-content/1128419011-thumb-5497.svg" style="background: #fff;">

Mathematical definition of the ReLU Function

<img src="https://images.deepai.org/user-content/4015736703-thumb-4932.svg" style="background: #fff;">

<br>
Graph of the ReLU function, showing its flat gradient for negative x.

In [5]:
def relu(x):
    return max(0,x)

In [6]:
for i in num:
    print(i, " : ", relu(i))

-500  :  0
-100  :  0
-50  :  0
-1  :  0
0  :  0
5  :  5
100  :  100
500  :  500


<br>

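When the input is zero or negative, the function outputs zero, and for negative inputs its gradient is zero too. This hinders back-propagation: a neuron whose inputs stay negative receives no gradient and stops updating its weights. This is known as the dying ReLU problem.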
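The following sketch illustrates a "dead" neuron under stated assumptions: a hypothetical neuron with a single weight `w` and a large negative bias `b`, and the common convention that the ReLU gradient at exactly zero is taken to be 0.

```python
def relu(x):
    return max(0, x)

def relu_grad(x):
    # Slope of ReLU: 1 for positive inputs, 0 otherwise
    # (the value at exactly 0 is a convention; 0 is assumed here).
    return 1 if x > 0 else 0

# Hypothetical neuron parameters, chosen so the pre-activation is always negative.
w, b = 0.5, -10.0

for x in [-3, 0, 2, 8]:
    z = w * x + b               # pre-activation, negative for all of these inputs
    grad_w = relu_grad(z) * x   # chain rule: d relu(z) / d w
    print(x, " : output", relu(z), " gradient w.r.t. w", grad_w)
```

Every gradient is zero, so gradient descent has no signal with which to move `w` or `b` back into the active region.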

<br>

## 5. _leaky relu function_

Leaky ReLU mitigates the dying ReLU problem by giving negative inputs a small non-zero slope, which keeps gradients flowing during back-propagation. One flaw of Leaky ReLU is that the slope is predetermined rather than learned by the neural network.


<img src="https://www.i2tutorials.com/wp-content/media/2019/09/Deep-learning-25-i2tutorials.png" style="background: #fff;">

In [7]:
def leaky_relu(x):
    return max(0.1*x, x)

In [8]:
for i in num:
    print(i, " : ", leaky_relu(i))

-500  :  -50.0
-100  :  -10.0
-50  :  -5.0
-1  :  -0.1
0  :  0.0
5  :  5
100  :  100
500  :  500
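As a variation, the fixed slope can be exposed as a parameter. This is a sketch, not a reference implementation: the parameter name `alpha` and the helper `leaky_relu_grad` are assumptions, and the default of 0.1 simply matches the cell above.

```python
def leaky_relu(x, alpha=0.1):
    # alpha is the fixed slope applied to non-positive inputs.
    return x if x > 0 else alpha * x

def leaky_relu_grad(x, alpha=0.1):
    # Slope is 1 for positive inputs and alpha otherwise, so it is never exactly zero.
    return 1 if x > 0 else alpha

for x in [-500, -1, 0, 5]:
    print(x, " : ", leaky_relu(x), " gradient:", leaky_relu_grad(x))

# A different, equally arbitrary slope.
print(leaky_relu(-100, alpha=0.01))
```

Writing the function as an explicit branch rather than `max(alpha * x, x)` keeps it correct for any positive `alpha`, whereas the `max` form only works when the slope is below 1.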
