# **Non-linear Activation Functions & their Gradients**




<img src = https://miro.medium.com/max/786/1*R2e5_A-hD3VB3B1wv3qaQA.webp width = "400" height = "300" >
<img src = https://upscfever.com/upsc-fever/en/data/deeplearning/images/relu-leaky-relu-derivative.png width = "500" height = "250" >

## **Sigmoid Activation Function & its Derivative**
The sigmoid activation function takes any real value as input and returns a value strictly between 0 and 1, which is often interpreted as a probability. Its graph is S-shaped. It is non-linear, differentiable, and monotonic, with a fixed output range of (0, 1).
The equation of the sigmoid function is:
$$\sigma(x) = \frac {1}{1 + e^{-x}} $$

Derivative is: 

$$\sigma'(x) = \sigma(x)(1 - \sigma(x)) $$
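
The derivative formula follows from the chain rule applied to the definition. The steps below are a short sketch of that derivation, using only the symbols already defined above:

$$\sigma'(x) = \frac{d}{dx}\left(1 + e^{-x}\right)^{-1} = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)$$

The last step uses $1 - \sigma(x) = \frac{e^{-x}}{1 + e^{-x}}$. Multiplying the numerator and denominator of $\sigma(x)$ by $e^{x}$ gives the equivalent form $\sigma(x) = \frac{e^{x}}{e^{x} + 1}$, which is the form the code below evaluates.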


In [1]:
import numpy as np

In [2]:
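# Sigmoid activation function.
# exp(x) / (exp(x) + 1) is algebraically the same as 1 / (1 + exp(-x)).
# Note: np.exp(x) overflows for very large positive x.
def sigmoid(x):
    return np.exp(x) / (np.exp(x) + 1)

# Derivative of the sigmoid activation function: sigma(x) * (1 - sigma(x))
def sigmoid_prime(x):
    return sigmoid(x) * (1 - sigmoid(x))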

In [3]:
sigmoid(-10)

4.5397868702434395e-05

In [4]:
sigmoid_prime(-10)

4.5395807735951673e-05

In [5]:
sigmoid_prime(-1)

0.19661193324148185

In [6]:
sigmoid_prime(0)

0.25

In [7]:
sigmoid_prime(1)

0.19661193324148185

In [8]:
sigmoid_prime(10)

4.5395807735907655e-05

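The sigmoid derivative is always positive, so the sigmoid function is monotonically increasing. As the values above show, the derivative peaks at 0.25 at x = 0 and shrinks towards 0 as |x| grows.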
![Sigmoid function & its derivative](https://i.stack.imgur.com/QdlcW.jpg)
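
The `sigmoid` function above calls `np.exp(x)` directly, which overflows for very large positive inputs (for example `x = 1000`) and then yields `nan`. The following is a minimal sketch of one common way to avoid that, assuming NumPy is available; the names `stable_sigmoid` and `stable_sigmoid_prime` are hypothetical and not part of the original notebook.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that only ever exponentiates non-positive numbers (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # z = exp(-|x|) always lies in [0, 1], so it cannot overflow.
    z = np.exp(-np.abs(x))
    # x >= 0: 1 / (1 + exp(-x));  x < 0: exp(x) / (1 + exp(x)).
    return np.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z))

def stable_sigmoid_prime(x):
    """Derivative sigma(x) * (1 - sigma(x)), built on the stable sigmoid."""
    s = stable_sigmoid(x)
    return s * (1.0 - s)

xs = np.array([-1000.0, -10.0, 0.0, 10.0, 1000.0])
print(stable_sigmoid(xs))        # all values finite, between 0 and 1
print(stable_sigmoid_prime(xs))  # largest value (0.25) at x = 0
```

Because the helper works on NumPy arrays, it can be applied to a whole layer of pre-activations at once rather than one scalar at a time.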

## **Hyperbolic Tangent (Tanh) Activation Function & its Derivative**

The tanh function, also known as the hyperbolic tangent function, is a non-linear function that takes in a real number and returns a value between -1 and 1. It is differentiable everywhere, and its derivative can be written in terms of tanh itself. The function is defined as:
$$\tanh(x) = \frac {e^{(x)} - e^{(-x)}}{e^{(x)} + e^{(-x)}} $$

Derivative is: 

$$\tanh'(x) = 1 - \tanh^2(x) $$
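
The derivative can be checked with the quotient rule applied to the definition above. This is a brief sketch of that step, not a full derivation:

$$\tanh'(x) = \frac{\left(e^{x} + e^{-x}\right)^{2} - \left(e^{x} - e^{-x}\right)^{2}}{\left(e^{x} + e^{-x}\right)^{2}} = 1 - \left(\frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}\right)^{2} = 1 - \tanh^{2}(x)$$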


In [9]:
# tanh activation function
def tanh(x):
	return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
# Derivative of Tanh Activation Function
def tanh_derivative(x):
	return 1 - np.power(tanh(x), 2)

In [10]:
tanh_derivative(-10)

8.244614768671e-09

In [11]:
tanh_derivative(-1)

0.41997434161402614

In [12]:
tanh_derivative(0)

1.0

In [13]:
tanh_derivative(1)

0.41997434161402614

In [14]:
tanh_derivative(10)

8.244614768671e-09

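The tanh derivative is always positive, so the hyperbolic tangent function is monotonically increasing. As the values above show, the derivative peaks at 1 at x = 0 and shrinks towards 0 as |x| grows.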
![Tanh function and its derivative](https://www.researchgate.net/profile/Rajan-Chaudhari-4/publication/341172494/figure/fig1/AS:888014863601664@1588730650039/Activation-function-based-on-the-Tanh-function-tanhx-and-its-derivative-tanhx.png)
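
The hand-written `tanh` above evaluates `np.exp(x)` and `np.exp(-x)` directly, so it can overflow for large |x|. As a hedged alternative, the sketch below relies on NumPy's built-in `np.tanh` and compares the analytic derivative against a central finite difference. The names `tanh_derivative_np` and `central_difference`, and the step size `h`, are assumptions made for this illustration.

```python
import numpy as np

def tanh_derivative_np(x):
    # Hypothetical variant of tanh_derivative built on NumPy's own np.tanh.
    return 1.0 - np.tanh(x) ** 2

def central_difference(f, x, h=1e-5):
    # Numerical estimate of f'(x); h is an assumed step size.
    return (f(x + h) - f(x - h)) / (2.0 * h)

xs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
analytic = tanh_derivative_np(xs)
numeric = central_difference(np.tanh, xs)

# The two estimates should agree up to a small numerical error.
print(np.max(np.abs(analytic - numeric)))

# Large inputs stay finite because np.tanh saturates at +/-1.
print(tanh_derivative_np(1000.0))
```

A finite-difference comparison like this is a cheap sanity check whenever a derivative is implemented by hand.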


## **Rectified Linear Unit (ReLU) Activation Function & its Derivative**
ReLU is one of the most popular activation functions for the hidden layers of neural networks. The rectifier function has a graph that's a line with the negative part "rectified" to zero. A neuron with a rectifier attached is called a rectified linear unit. The formula is $\max(0, z)$, where $z = w \cdot x + b$.
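
Written piecewise, the function and its derivative look as follows. This is a sketch in the same notation; the name $\mathrm{ReLU}$ is introduced here only for readability:

$$\mathrm{ReLU}(z) = \max(0, z) = \begin{cases} z & z > 0 \\ 0 & z \le 0 \end{cases} \qquad \mathrm{ReLU}'(z) = \begin{cases} 1 & z > 0 \\ 0 & z < 0 \end{cases}$$

The derivative is not defined at $z = 0$, so implementations pick a value there by convention (commonly 0).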

In [15]:
# ReLU activation function
def relu(z):
  return max(0, z)
# Derivative of ReLU activation function
# (undefined at z = 0; this implementation returns 0 there)
def relu_derivative(z):
  return 1 if z > 0 else 0

In [16]:
relu(-10)

0

In [17]:
relu(-1)

0

In [18]:
relu(1)

1

In [19]:
relu(10)

10

In [20]:
relu_derivative(-10)

0

In [21]:
relu_derivative(-1)

0

In [22]:
relu_derivative(1)

1

In [23]:
relu_derivative(10)

1

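The ReLU derivative is never negative (0 for z < 0 and 1 for z > 0), so the ReLU activation function is monotonic (non-decreasing). Strictly speaking, the derivative is undefined at z = 0; the code above returns 0 there.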
<img src = https://sebastianraschka.com/images/faq/relu-derivative/relu_3.png width = "300" height = "300" >
<img src = https://i.stack.imgur.com/UtuWP.png width = "300" height = "300" >
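
The scalar `relu` above uses Python's built-in `max`, which raises an error when given a NumPy array with more than one element. A possible vectorized version is sketched below; the names `relu_vec` and `relu_derivative_vec` are hypothetical, and the derivative at `z == 0` is set to 0 to match the scalar convention used earlier.

```python
import numpy as np

def relu_vec(z):
    # Element-wise max(0, z) over an array of pre-activations.
    return np.maximum(0.0, z)

def relu_derivative_vec(z):
    # 1.0 where z > 0, else 0.0 (assumed convention: 0 at z == 0).
    return (np.asarray(z) > 0).astype(float)

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(relu_vec(z))
print(relu_derivative_vec(z))
```

For the sample array, the activation values are 0, 0, 0, 1, 10 and the derivative values are 0, 0, 0, 1, 1.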


## **Leaky Rectified Linear Unit (Leaky ReLU) Activation Function & its Derivative**
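One disadvantage of ReLU is that its derivative is zero whenever z is negative, so no gradient flows through the unit for those inputs. To overcome this, Leaky ReLU multiplies negative z values by a small constant $\alpha$ instead of setting them to zero, which lets a scaled-down negative signal pass through. Usually $\alpha$ is 0.01. The output of Leaky ReLU therefore ranges over all real numbers, while its derivative takes only the two values $\alpha$ and 1.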
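
In piecewise form, the function and its derivative can be sketched as follows; the name $\mathrm{LeakyReLU}$ is introduced here only for readability:

$$\mathrm{LeakyReLU}(z) = \begin{cases} z & z > 0 \\ \alpha z & z \le 0 \end{cases} \qquad \mathrm{LeakyReLU}'(z) = \begin{cases} 1 & z > 0 \\ \alpha & z < 0 \end{cases}$$

For example, with $\alpha = 0.01$ an input of $z = -10$ maps to $-0.1$ with slope $0.01$, whereas plain ReLU would map it to $0$ with slope $0$.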

In [24]:
# Leaky ReLU activation function
# (max(alpha * z, z) gives the Leaky ReLU only when 0 <= alpha <= 1)
def leakyrelu(z, alpha):
    return max(alpha * z, z)
# Derivative of the Leaky ReLU activation function (returns alpha at z = 0)
def leakyrelu_derivative(z, alpha):
    return 1 if z > 0 else alpha

In [25]:
leakyrelu(-10, 0.01)

-0.1

In [26]:
leakyrelu(-1, 0.01)

-0.01

In [27]:
leakyrelu(1, 0.01)

1

In [28]:
leakyrelu(10, 0.01)

10

In [29]:
leakyrelu_derivative(-10, 0.01)

0.01

In [30]:
leakyrelu_derivative(-1, 0.01)

0.01

In [31]:
leakyrelu_derivative(0, 0.01)

0.01

In [32]:
leakyrelu_derivative(1, 0.01)

1

In [33]:
leakyrelu_derivative(10, 0.01)

1

The Leaky ReLU derivative is always positive (for $\alpha > 0$), so the Leaky ReLU activation function is monotonically increasing.
![Leaky-ReLU & its Derivative](https://miro.medium.com/max/786/1*W6WncFaj5dBvqyrFw-jOdg.webp)
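
As with ReLU, the scalar functions above do not accept NumPy arrays. The sketch below shows one way to vectorize them with `np.where`. The names `leakyrelu_vec` and `leakyrelu_derivative_vec` and the default `alpha=0.01` are assumptions for this illustration; the derivative at `z == 0` is taken to be `alpha`, matching the scalar version.

```python
import numpy as np

def leakyrelu_vec(z, alpha=0.01):
    z = np.asarray(z, dtype=float)
    # Keep positive inputs unchanged, scale the rest by alpha.
    return np.where(z > 0, z, alpha * z)

def leakyrelu_derivative_vec(z, alpha=0.01):
    z = np.asarray(z, dtype=float)
    # Slope 1 for positive inputs, alpha otherwise (including z == 0).
    return np.where(z > 0, 1.0, alpha)

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(leakyrelu_vec(z))
print(leakyrelu_derivative_vec(z))
```

Selecting the branch explicitly with `np.where` does not depend on `alpha` being at most 1, unlike the `max(alpha * z, z)` shortcut.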


