# Activation Functions 
---

## Why are they needed?

- If a neuron's output is unbounded, there is no clear point at which it should "fire" or stay silent. (Biological brains work by firing only some neurons at a time.)

- An activation function decides whether the weighted sum computed by a neuron is large enough for its output to be passed on to the following layers.

- Without activation functions, the network would only ever compute a **linear combination** of its inputs, no matter how many hidden layers there are. It would essentially reduce to linear regression.

- Activation functions introduce non-linearity into the network, which lets it model relationships that a purely linear model cannot.
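
The claim that stacked layers without activations stay linear can be checked with a minimal sketch. It assumes single-input, single-output "layers" of the form `w * x + b`; the function names are hypothetical and chosen only for illustration.

```python
def linear_layer(x, w, b):
    """One neuron with no activation function: a plain weighted sum."""
    return w * x + b


def two_linear_layers(x, w1, b1, w2, b2):
    """Feed the output of the first layer into the second, with no activation."""
    return linear_layer(linear_layer(x, w1, b1), w2, b2)


def collapsed_layer(x, w1, b1, w2, b2):
    """The same mapping written as a single linear layer.

    w2 * (w1 * x + b1) + b2 == (w2 * w1) * x + (w2 * b1 + b2)
    """
    return (w2 * w1) * x + (w2 * b1 + b2)


if __name__ == "__main__":
    for x in (-2, 0, 5):
        print(x, two_linear_layers(x, 2, 1, 3, -4), collapsed_layer(x, 2, 1, 3, -4))
```

Both functions return the same value for every input, so the extra layer adds no expressive power until a non-linear activation is placed between the layers.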

## Different Activation functions

### <ins>Binary Step Function</ins> -
![step function](https://www.saedsayad.com/images/ANN_Unit_step.png)
- Equivalent to setting a threshold value: if the neuron's weighted sum is greater than the threshold, then the neuron is activated; otherwise it is not.
- Can work for a *binary classifier*.

**Disadvantage** 

- The gradient of the step function is zero everywhere (and undefined at the threshold itself), so backpropagation has nothing to propagate, i.e., the weights and biases are not updated.
- A neuron is either activated or not activated; there are no intermediate values.
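
A minimal sketch of the step function and its gradient, assuming a default threshold of `0.0` and the common convention of treating the undefined gradient at the threshold as zero (both are assumptions made for illustration):

```python
def binary_step(z, threshold=0.0):
    """Return 1 if the weighted sum z exceeds the threshold, else 0."""
    return 1 if z > threshold else 0


def binary_step_grad(z, threshold=0.0):
    """Gradient of the step function.

    It is zero on both sides of the threshold. At the threshold itself the
    derivative is undefined; returning 0.0 there is a convention, not a fact.
    """
    return 0.0


if __name__ == "__main__":
    for z in (-1.5, 0.0, 0.3):
        print(z, binary_step(z), binary_step_grad(z))
```

Since the gradient is `0.0` for every input, a weight update of the form `w - lr * grad` leaves `w` unchanged.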

### <ins>Linear Function</ins> -
![Linear function](https://www.saedsayad.com/images/LinearFunction.png)

- Activation is proportional to the inputs.
- The derivative exists and is equal to the coefficient of the input.

**Disadvantage** 

- Although the gradient is not zero, it is a constant that does not depend on the input. The weights and biases are therefore updated during learning, but always by the same factor.
- Because the gradient is the same in every iteration, complex patterns in the data cannot be captured.
- The original problem remains: a stack of linear layers is itself a linear function, so the output stays a linear function of the input.
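
As a small illustration, assuming the activation has the form `a * z` with a hypothetical coefficient `a`, the gradient is the same constant for every input:

```python
def linear(z, a=1.0):
    """Linear activation: the output is proportional to the input."""
    return a * z


def linear_grad(z, a=1.0):
    """Derivative of a * z with respect to z: the constant a, independent of z."""
    return a


if __name__ == "__main__":
    for z in (-100.0, 0.0, 100.0):
        print(z, linear(z, a=2.0), linear_grad(z, a=2.0))
```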

### <ins>Sigmoid Function</ins> -
![Sigmoid](https://www.saedsayad.com/images/ANN_Sigmoid.png)

- One of the most popular choices.
- It is a non-linear function. Therefore, outputs of the neurons will be non-linear.
- It is continuously differentiable. 

**Disadvantage** 

- The gradient is significant only for inputs between roughly -3 and +3. Outside that range the curve flattens out, so the gradient is close to zero and the network effectively stops learning. (Vanishing gradient problem)
- The function is not symmetric around zero: its outputs lie between 0 and 1, so the outputs of all the neurons have the same (positive) sign.
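
The flattening can be seen numerically with a minimal sketch that uses only the standard `math` module. The branch on the sign of `z` is an implementation choice to avoid overflow in `math.exp`, not part of the definition.

```python
import math


def sigmoid(z):
    """Sigmoid 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_grad(z):
    """Derivative of the sigmoid: s * (1 - s), largest at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


if __name__ == "__main__":
    for z in (-10.0, -3.0, 0.0, 3.0, 10.0):
        print(f"z={z:6.1f}  sigmoid={sigmoid(z):.5f}  grad={sigmoid_grad(z):.5f}")
```

The gradient peaks at 0.25 for `z = 0` and shrinks towards zero as `|z|` grows, which is the vanishing gradient behaviour described above. Every output is strictly positive.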

### <ins>Tanh Function</ins> -
![tanh function](https://cdn-images-1.medium.com/freeze/max/1000/1*1It8846pzYayiC0G_7FIBA.png?q=20)

- Often preferred over the sigmoid function.
- Similar to sigmoid, but it is symmetric around the origin.
- Ranges from -1 to +1.
- Continuously differentiable. Its gradients are steeper than those of the sigmoid function.

**Disadvantage** 

- Same vanishing gradient problem as sigmoid function.
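
A minimal sketch, using the standard library's `math.tanh`, that shows both the steeper gradient at the origin and the same flattening for large inputs:

```python
import math


def tanh(z):
    """Hyperbolic tangent activation, with outputs between -1 and +1."""
    return math.tanh(z)


def tanh_grad(z):
    """Derivative of tanh: 1 - tanh(z)^2, equal to 1 at z = 0."""
    t = math.tanh(z)
    return 1.0 - t * t


if __name__ == "__main__":
    for z in (-10.0, -2.0, 0.0, 2.0, 10.0):
        print(f"z={z:6.1f}  tanh={tanh(z):+.5f}  grad={tanh_grad(z):.5f}")
```

The gradient at the origin is 1, compared with 0.25 for the sigmoid, but it still decays towards zero away from the origin.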

### <ins>ReLU Function</ins> -
![ReLU](https://cdn-images-1.medium.com/freeze/max/1000/1*TbZnkZYI5vwOQGUBd6uXAQ.png?q=20)
- Non-linear function. ReLU stands for **Rectified Linear Unit**.
- The advantage of ReLU over the other functions is that only some of the neurons are activated at any one time, because every negative input gives an output of exactly zero. This makes the network sparse and computationally efficient.

**Disadvantage** 
- The gradient for negative inputs is zero, so a neuron whose input stays negative stops being updated and can become a dead neuron.
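
A minimal sketch of ReLU and its gradient. Treating the gradient at exactly `z = 0` as zero is a convention assumed here, and the sample pre-activation values are made up for illustration.

```python
def relu(z):
    """Rectified Linear Unit: pass positive inputs through, clamp the rest to 0."""
    return max(0.0, z)


def relu_grad(z):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise (0 at z = 0 by convention)."""
    return 1.0 if z > 0 else 0.0


def count_active(pre_activations):
    """Number of neurons in a layer whose ReLU output is non-zero."""
    return sum(1 for z in pre_activations if relu(z) > 0)


if __name__ == "__main__":
    layer = [-2.0, 0.5, -0.1, 3.0]  # hypothetical weighted sums of four neurons
    print([relu(z) for z in layer])
    print(count_active(layer), "of", len(layer), "neurons active")
```

Neurons with negative inputs output zero and also receive a zero gradient, which is the source of both the sparsity and the dead-neuron risk.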

### <ins>Leaky ReLU Function</ins> -
![leaky relu](https://www.i2tutorials.com/wp-content/uploads/2019/09/Deep-learning-25-i2tutorials.png)
- Improved version of the ReLU function.
- In ReLU, the gradient for negative inputs is zero, which causes dead neurons. Leaky ReLU instead gives the output a small linear component of x for negative inputs.
- Because of this small linear component, the gradient is never exactly zero, so neurons do not die in the same way.
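
A minimal sketch, assuming a negative-side slope of `0.01`. That value is a commonly used default but is an assumption here, since the notes do not specify one.

```python
def leaky_relu(z, slope=0.01):
    """Leaky ReLU: identity for positive inputs, a small linear term otherwise."""
    return z if z > 0 else slope * z


def leaky_relu_grad(z, slope=0.01):
    """Gradient of Leaky ReLU: 1 for positive inputs, the small slope otherwise."""
    return 1.0 if z > 0 else slope


if __name__ == "__main__":
    for z in (-2.0, -0.5, 0.5, 2.0):
        print(z, leaky_relu(z), leaky_relu_grad(z))
```

The gradient on the negative side is small but non-zero, so the weights feeding such a neuron can still be updated.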

### <ins>Parameterised ReLU Function</ins> -
![parametric relu](https://linzhouhan.files.wordpress.com/2015/04/prelu.png)

- Another variant of the ReLU function.
- A new parameter acts as the slope of the negative part of the function.
- Unlike the fixed slope of Leaky ReLU, this parameter is trainable: it is learned along with the weights.
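
A minimal sketch of what "trainable" means here. The parameter name `alpha`, the learning rate, and the single gradient-descent step are all assumptions made for illustration, not a specific library's implementation.

```python
def prelu(z, alpha):
    """Parameterised ReLU: identity for positive inputs, alpha * z otherwise."""
    return z if z > 0 else alpha * z


def prelu_grad_alpha(z):
    """Derivative of the output with respect to alpha: z on the negative side, else 0."""
    return 0.0 if z > 0 else z


def sgd_step_alpha(alpha, z, upstream_grad, lr=0.1):
    """One gradient-descent update of alpha.

    upstream_grad is the (hypothetical) gradient of the loss with respect to
    the PReLU output, as supplied by backpropagation.
    """
    return alpha - lr * upstream_grad * prelu_grad_alpha(z)


if __name__ == "__main__":
    alpha = 0.25
    print(prelu(-4.0, alpha))               # negative input scaled by alpha
    print(sgd_step_alpha(alpha, -4.0, 1.0)) # alpha changes for a negative input
    print(sgd_step_alpha(alpha, 3.0, 1.0))  # alpha is untouched for a positive input
```

Only negative inputs contribute to the update of `alpha`, because the positive side of the function does not depend on it.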

Image Courtesy -

- https://www.saedsayad.com/artificial_neural_network.htm
- https://mc.ai/activation-functions-in-neural-networks/
- https://www.i2tutorials.com/explain-step-threshold-and-leaky-relu-activation-functions/
- https://linzhouhan.files.wordpress.com/2015/04/prelu.png