The sigmoid function suffers from the problem of “vanishing gradients”. It flattens out at both ends, so its derivative approaches zero for large positive or negative inputs, and the weight updates computed during backpropagation become very small. This can slow training dramatically or cause the network to get stuck and stop learning altogether. For this reason, the sigmoid is increasingly being replaced by other non-linear functions such as the Rectified Linear Unit (ReLU).
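To make the flattening concrete, here is a minimal pure-Python sketch (the helper names `sigmoid` and `sigmoid_grad` are illustrative, not from any library) that evaluates the sigmoid derivative σ'(x) = σ(x)(1 − σ(x)). The derivative peaks at 0.25 when x = 0 and shrinks rapidly as |x| grows. Because backpropagation multiplies one such factor per layer, even the best case of 0.25 per layer shrinks the gradient geometrically with depth.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The derivative is largest at x = 0 and flattens out towards both ends
for x in [0.0, 2.0, 5.0, 10.0, -10.0]:
    print(f"x = {x:6.1f}  sigmoid'(x) = {sigmoid_grad(x):.6f}")

# Best case: chaining 10 sigmoid layers scales the gradient by at most 0.25 ** 10
best_case_10_layers = 0.25 ** 10
print(f"0.25 ** 10 = {best_case_10_layers:.2e}")
```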

## ReLU

An __activation function__ is a function applied to the output of a neural network layer before that output is passed as the input to the next layer. Activation functions are an essential part of neural networks because they introduce non-linearity. Without them, any stack of layers collapses into a single linear transformation of the input, so the whole network is no more expressive than a linear model such as logistic regression. One of the most widely used activation functions is the __Rectified Linear Unit (ReLU)__.
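The claim that a network without activations collapses into a single linear map can be checked with a few lines of pure Python. In the sketch below, the weights, the input, and the helpers `matmul` and `relu` are made up for illustration, and biases are omitted (with biases the composition is still one affine map). Two linear layers applied in sequence give the same result as one layer whose weights are the product of the two matrices, while a ReLU inserted between them breaks that equivalence.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def relu(M):
    """Element-wise max(0, v) over a matrix."""
    return [[max(0.0, v) for v in row] for row in M]

# Hypothetical weights for a tiny 2 -> 2 -> 2 network (biases omitted)
W1 = [[1.0, 2.0], [3.0, 4.0]]
W2 = [[0.5, -1.0], [2.0, 0.0]]
x = [[1.0], [-2.0]]  # input as a column vector

# Two linear layers applied one after another ...
two_layers = matmul(W2, matmul(W1, x))
# ... equal a single linear layer with weights W2 @ W1
one_layer = matmul(matmul(W2, W1), x)
# A ReLU between the layers makes the result no longer expressible this way
with_relu = matmul(W2, relu(matmul(W1, x)))

print(two_layers, one_layer, with_relu)
```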

- ReLU is defined as f(x) = max(0, x): positive inputs pass through unchanged, and negative inputs are mapped to zero.

- Computationally faster: ReLU needs only a comparison with zero, with no exponentials as in sigmoid or hyperbolic tangent, so it is very cheap to compute.

- Fewer vanishing gradients: In gradient-based training, the update to each parameter is proportional to the partial derivative of the error function with respect to that parameter. If this gradient becomes extremely small, the updates become ineffective and the network may stop training altogether. ReLU does not saturate in the positive direction (its derivative there is exactly 1), whereas activation functions like sigmoid and hyperbolic tangent saturate in both directions. Therefore, ReLU networks suffer less from vanishing gradients, which generally makes training more effective.
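A minimal pure-Python sketch of ReLU and its derivative follows; the helper names `relu` and `relu_grad` are illustrative. ReLU is not differentiable at x = 0, and this sketch assumes the common convention of using 0 there. The input vector is the same one used in the TensorFlow example that follows. The last lines show the property behind the vanishing-gradient argument: for a positive input the derivative is exactly 1 regardless of magnitude, so multiplying it across many layers does not shrink the gradient.

```python
import math

def relu(x: float) -> float:
    """ReLU: f(x) = max(0, x)."""
    return max(0.0, x)

def relu_grad(x: float) -> float:
    """Derivative of ReLU; undefined at x = 0, where this sketch uses 0 by convention."""
    return 1.0 if x > 0 else 0.0

inputs = [1.0, -0.5, 3.4, -2.1, 0.0, -6.5]
outputs = [relu(x) for x in inputs]
grads = [relu_grad(x) for x in inputs]
print(outputs)
print(grads)

# For a positive input the derivative is 1 no matter how large the input is,
# so the product over 10 layers stays exactly 1.0
chain = math.prod(relu_grad(50.0) for _ in range(10))
print(chain)
```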

```python
# Import TensorFlow; tf.compat.v1 keeps this graph-and-session (TF1-style) code
# runnable under TensorFlow 2.x, where tf.Session no longer exists
import tensorflow as tf
tf.compat.v1.disable_eager_execution()

# A constant vector of size 6
a = tf.constant([1.0, -0.5, 3.4, -2.1, 0.0, -6.5], dtype=tf.float32)

# Apply the ReLU function and store the result in 'b'
b = tf.nn.relu(a, name='ReLU')

# Start a TensorFlow session to evaluate the graph
with tf.compat.v1.Session() as sess:
    print('Input type:', a)
    print('Input:', sess.run(a))
    print('Return type:', b)
    print('Output:', sess.run(b))
```

```text
Input type: Tensor("Const:0", shape=(6,), dtype=float32)
Input: [ 1.  -0.5  3.4 -2.1  0.  -6.5]
Return type: Tensor("ReLU:0", shape=(6,), dtype=float32)
Output: [1.  0.  3.4 0.  0.  0. ]
```


## Leaky ReLU

The ReLU function suffers from what is called the “__dying ReLU__” problem. Since the slope of ReLU is zero for negative inputs, a neuron whose input stays negative receives no gradient, so its weights stop being updated and it is unlikely to recover. Such a neuron outputs zero for every input, rendering it useless. One solution to this problem is Leaky ReLU, which has a small slope α on the negative side, f(x) = max(αx, x) with a small α such as 0.01, so negative inputs still produce a small, non-zero gradient.
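The following pure-Python sketch illustrates why the negative-side slope matters. All names are illustrative, and the single neuron z = w·x + b, its weights, and its toy inputs are hypothetical. The large negative bias keeps z below zero for every input, so the gradient with respect to w is exactly zero under ReLU (a "dead" neuron). Under Leaky ReLU with α = 0.01 the gradient is small but non-zero, so training can still move the weight. The sketch assumes an upstream gradient of 1 and uses the derivative value α at x = 0 for Leaky ReLU by convention.

```python
def relu_grad(x: float) -> float:
    """Derivative of ReLU (0 used at x = 0 by convention)."""
    return 1.0 if x > 0 else 0.0

def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU: x for positive inputs, alpha * x otherwise."""
    return x if x > 0 else alpha * x

def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    """Derivative of Leaky ReLU (alpha used at x = 0 by convention)."""
    return 1.0 if x > 0 else alpha

# Hypothetical neuron z = w * x + b: the bias keeps z < 0 for every toy input
w, b = 0.5, -10.0
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

# d(activation)/dw = f'(z) * x, summed over the toy data
relu_dw = sum(relu_grad(w * x + b) * x for x in xs)
leaky_dw = sum(leaky_relu_grad(w * x + b) * x for x in xs)

print("ReLU gradient w.r.t. w:      ", relu_dw)
print("Leaky ReLU gradient w.r.t. w:", leaky_dw)
print([leaky_relu(x) for x in [1.0, -0.5, 3.4, -2.1, 0.0, -6.5]])
```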

```python
# Import TensorFlow; tf.compat.v1 keeps this graph-and-session (TF1-style) code
# runnable under TensorFlow 2.x, where tf.Session no longer exists
import tensorflow as tf
tf.compat.v1.disable_eager_execution()

# A constant vector of size 6
a = tf.constant([1.0, -0.5, 3.4, -2.1, 0.0, -6.5], dtype=tf.float32)

# Apply the Leaky ReLU function with slope 0.01 on the
# negative side and store the result in 'b'
b = tf.nn.leaky_relu(a, alpha=0.01, name='Leaky_ReLU')

# Start a TensorFlow session to evaluate the graph
with tf.compat.v1.Session() as sess:
    print('Input type:', a)
    print('Input:', sess.run(a))
    print('Return type:', b)
    print('Output:', sess.run(b))
```

```text
Input type: Tensor("Const_3:0", shape=(6,), dtype=float32)
Input: [ 1.  -0.5  3.4 -2.1  0.  -6.5]
Return type: Tensor("Leaky_ReLU_2:0", shape=(6,), dtype=float32)
Output: [ 1.    -0.005  3.4   -0.021  0.    -0.065]
```
