In [7]:
import numpy as np

import plotly.express as px

In [8]:
x = np.arange(-10, 10, 0.1)

# ReLU
A comparatively recent activation function whose name stands for Rectified Linear Unit. The formula is deceptively simple: max(0, z). Despite its name and appearance, it’s not linear. It is piecewise linear, so it supplies the nonlinearity a network needs, as Sigmoid does, while being cheaper to compute and often easier to train.

Pros

1. It mitigates the vanishing gradient problem: for positive inputs the gradient is exactly 1, so it does not shrink as it passes back through the layer.
1. ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations.

Cons

1. One of its limitations is that it should generally be used only within the hidden layers of a neural network model.
1. Some neurons can be fragile during training and can die. A large weight update can push a neuron into a state where it never activates on any data point again. Put simply, ReLU can result in dead neurons.
1. In other words, for activations in the region x < 0, the gradient is 0, so the weights are not adjusted during gradient descent. Neurons that enter this state stop responding to variations in the error or input (because the gradient is 0, nothing changes). This is called the dying ReLU problem.
1. The range of ReLU is [0, inf), so activations are unbounded above and can blow up.
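The dying ReLU behaviour described in the list above can be sketched with a single neuron. This is a minimal illustration, not a training loop: the inputs, the weights `w`, and the bias `b` are hypothetical values chosen so that the pre-activation is negative for every sample, and the gradient at zero follows this notebook's convention.

```python
import numpy as np

def relu_np(z):
    # Element-wise max(0, z)
    return np.maximum(0.0, z)

def relu_grad_np(z):
    # Notebook convention: gradient is 1 for z >= 0 and 0 otherwise
    return (z >= 0).astype(float)

rng = np.random.default_rng(0)
inputs = rng.uniform(0.0, 1.0, size=(100, 3))  # hypothetical non-negative features

# Hypothetical weights and bias chosen so that z < 0 for every input
w = np.array([-1.0, -2.0, -0.5])
b = -0.1

z = inputs @ w + b          # pre-activations, all negative here
a = relu_np(z)              # activations, all zero

# Gradient of the activation with respect to the weights: dReLU/dz * input
grad_w = relu_grad_np(z)[:, None] * inputs

print(a.max())               # largest activation over the batch
print(np.abs(grad_w).max())  # largest gradient magnitude over the batch
```

Because every gradient entry is zero, a gradient-descent step would leave `w` unchanged, which is why such a neuron cannot recover on this data.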

$$
\mathrm{ReLU}(x) = \left\{
    \begin{array}{ll}
        x & \mbox{if } \ x \geq 0 \\
        0 & \mbox{if } \ x < 0 \\
    \end{array}
\right.
$$

In [9]:
def relu(x):
  return x if x >= 0 else 0

In [10]:
y = [relu(val) for val in x]
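As an aside, the element-by-element loop can also be written as one vectorized NumPy call. The sketch below assumes the same grid as `x`; the names `x_demo` and `relu_vectorized` are introduced here for illustration and are not part of the notebook.

```python
import numpy as np

x_demo = np.arange(-10, 10, 0.1)  # same grid as the notebook's x

def relu_vectorized(values):
    # np.maximum compares element-wise and broadcasts the scalar 0
    return np.maximum(0, values)

# Scalar definition applied in a loop, for comparison
y_loop = np.array([v if v >= 0 else 0 for v in x_demo])
y_vec = relu_vectorized(x_demo)

print(np.array_equal(y_loop, y_vec))
```

The vectorized form avoids a Python-level loop, which tends to matter once the arrays get large.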

In [11]:
px.line(x=x, y=y)

# Derivative of ReLU

$$
\frac{\partial \mathrm{ReLU}(x)}{\partial x} = \left\{
  \begin{array}{ll}
    1 & \mbox{if } \ x \geq 0 \\
    0 & \mbox{if } \ x < 0 \\
  \end{array}
\right.
$$

In [12]:
def derivate_relu(x):
  # Test the input itself: relu(x) is never negative, so checking it would always give 1
  return 1 if x >= 0 else 0
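One way to sanity-check the piecewise derivative is to compare it against a central finite difference. The sketch below assumes hypothetical sample points that stay away from x = 0, where ReLU has a kink and the derivative is not defined (assigning 1 there, as the formula above does, is a convention). The helper names `relu_np` and `relu_grad_np` are introduced only for this check.

```python
import numpy as np

def relu_np(z):
    # Element-wise max(0, z)
    return np.maximum(0.0, z)

def relu_grad_np(z):
    # Piecewise derivative: 1 for z >= 0, 0 for z < 0
    return np.where(z >= 0, 1.0, 0.0)

points = np.array([-3.0, -0.5, 0.5, 2.0])  # hypothetical points, kink excluded
h = 1e-6                                   # step size for the finite difference

# Central difference approximation of the slope at each point
numeric = (relu_np(points + h) - relu_np(points - h)) / (2 * h)
analytic = relu_grad_np(points)

print(np.allclose(numeric, analytic))
```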

In [13]:
y = [derivate_relu(val) for val in x]

In [14]:
px.line(x=x, y=y)

In [15]:
x = np.arange(-5, 5, 0.1)
y = [relu(val) for val in x]
dy = [derivate_relu(val) for val in x]

In [16]:
fig = px.line(x=x, y=y)
fig.add_scatter(x=x, y=dy, mode='lines')
fig.show()