# Activation Functions

Artificial neural networks are the backbone of deep learning and of much of modern machine learning and artificial intelligence. A network consists of interconnected nodes called neurons. Each neuron has a set of weights, a bias, and an activation function. It computes a weighted sum of its inputs, adds the bias, and passes the result through the activation function. Without a non-linear activation, a stack of layers collapses into a single linear map, so the choice of activation function shapes both what a network can represent and how gradients flow through it during training. In this report, we look at the most commonly used activation functions: Step, Sigmoid, ReLU, SELU, ELU, and Tanh.
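
To make the description of a neuron concrete, here is a minimal sketch in plain Python. It assumes a single neuron with scalar inputs and an activation passed in as a callable. The function name `neuron` and the sample values are illustrative choices, not taken from any library.

```python
import math
from typing import Callable, Sequence


def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float,
           activation: Callable[[float], float]) -> float:
    """Compute activation(w . x + b) for one neuron (illustrative helper)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Pre-activation: weighted sum of the inputs plus the bias.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # The activation function turns the pre-activation into the neuron's output.
    return activation(z)


# Usage: the same neuron with two different activations.
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.2]
b = 0.1
print(neuron(x, w, b, math.tanh))
print(neuron(x, w, b, lambda z: max(0.0, z)))  # ReLU written as a lambda
```

Here the pre-activation is negative (about -0.4), so tanh returns a small negative number while ReLU returns 0.0.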

## Step Function:

The step function (also called the Heaviside step function) is the simplest activation function. It returns 1 if the input is greater than or equal to zero, and 0 otherwise. It was used in early perceptrons for binary classification, where each input is assigned to either class 0 or class 1.
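
A minimal sketch of the step function as described above, using the convention that an input of exactly zero maps to 1. The name `step` is our own choice for illustration.

```python
def step(x: float) -> float:
    """Heaviside step: 1 for x >= 0, otherwise 0 (convention: step(0) = 1)."""
    return 1.0 if x >= 0 else 0.0


# Threshold a few pre-activations into binary class labels.
labels = [step(z) for z in (-2.0, -0.1, 0.0, 0.3, 5.0)]
print(labels)
```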

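However, the step function has limitations. It is not differentiable at zero, and its derivative is zero everywhere else, so gradient-based optimization algorithms like backpropagation receive no signal about how to adjust the weights. Its output also jumps abruptly from 0 to 1 instead of changing smoothly, and it is constant on each side of zero, so small changes to the weights usually have no effect on the output at all.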

## Sigmoid Function:

The sigmoid (logistic) function, sigmoid(x) = 1 / (1 + e^(-x)), is a popular activation function used in neural networks. It maps any real input to a value strictly between 0 and 1. When it is used in the output layer of a binary classifier, its output can be interpreted as the probability that the input belongs to the positive class. The sigmoid is smooth, and its derivative, sigmoid(x) * (1 - sigmoid(x)), is cheap to compute from the output itself, which makes it suitable for gradient-based optimization algorithms like backpropagation.
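
A minimal sketch of the sigmoid and its derivative in plain Python. The split on the sign of `x` is one common way to avoid overflow in `math.exp` for large negative inputs; the helper names `sigmoid` and `sigmoid_grad` are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-x), written to avoid overflow in math.exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, e^-x can overflow; use the equivalent form e^x / (1 + e^x).
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid, computed from its own output: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)


print(sigmoid(0.0), sigmoid_grad(0.0))
```

Reusing the forward value `s` in the derivative is what makes the backward pass cheap: no extra exponential is needed.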

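However, the sigmoid function has limitations. Its derivative is at most 0.25 (at x = 0) and approaches zero when the input is large in magnitude, where the function saturates. During backpropagation these small derivatives are multiplied together layer by layer, so the gradient that reaches the early layers can become vanishingly small. This vanishing gradient problem makes it challenging to train deep neural networks that use the sigmoid function in their hidden layers.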
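
The following sketch illustrates the vanishing gradient numerically. It assumes a simplified chain in which each layer contributes one sigmoid derivative and the weights have magnitude at most 1, so the product of derivatives bounds the gradient. The helper `chain_upper_bound` is hypothetical and exists only for this illustration.

```python
import math


def sigmoid_grad(x: float) -> float:
    """sigma'(x) = sigma(x) * (1 - sigma(x)); fine for moderate x (exp overflows below about -709)."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def chain_upper_bound(depth: int) -> float:
    """Largest possible product of sigmoid derivatives over `depth` layers (0.25 each).

    Assumes weights of magnitude at most 1, so they cannot enlarge the product.
    """
    return 0.25 ** depth


# The derivative peaks at x = 0 and collapses as |x| grows (saturation).
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x = {x:5.1f}  sigmoid'(x) = {sigmoid_grad(x):.2e}")

# Even in the best case, the gradient shrinks geometrically with depth.
print(f"upper bound after 10 layers: {chain_upper_bound(10):.2e}")
```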

## ReLU Function:

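The Rectified Linear Unit (ReLU), defined as ReLU(x) = max(0, x), is a popular activation function used in deep learning. It maps any input to a value in [0, infinity): if the input is less than or equal to zero, the output is zero, and if the input is greater than zero, the output is equal to the input. Its derivative is not smooth: it is 0 for negative inputs, 1 for positive inputs, and undefined at zero, where implementations simply pick 0 or 1. Because the derivative is exactly 1 for positive inputs, gradients pass through active units without shrinking, which makes ReLU well suited to gradient-based optimization algorithms like backpropagation.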
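
A minimal sketch of ReLU and its derivative. Using 0 as the derivative at exactly zero is a convention we assume here (a common choice), not a mathematical necessity; the helper names are illustrative.

```python
def relu(x: float) -> float:
    """ReLU(x) = max(0, x)."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """Derivative of ReLU: 1 for x > 0, 0 for x < 0.

    At x = 0 the derivative is undefined; this sketch uses 0 there by convention.
    """
    return 1.0 if x > 0 else 0.0


print([relu(v) for v in (-3.0, 0.0, 2.5)])
print([relu_grad(v) for v in (-3.0, 0.0, 2.5)])
```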


The ReLU function has some advantages over other activation functions. Although it is linear on each side of zero, it is non-linear overall, which lets networks model complex relationships between inputs and outputs. It is also computationally cheap, needing only a comparison and no exponentials, which makes it well suited to large-scale neural networks.

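However, the ReLU function has limitations. Because both the output and the gradient are zero for negative inputs, a neuron whose pre-activation is negative for every input in the training data stops receiving gradient updates and never recovers. This is known as the dying ReLU problem. It is often triggered by a large learning rate or a large negative bias, and a network with many dead neurons effectively loses capacity. The SELU and ELU functions described later in this report address this by producing non-zero outputs and gradients for negative inputs.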
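
The sketch below shows why a dead ReLU neuron cannot recover. It assumes a hypothetical one-input neuron `relu(w * x + b)` and a loss that simply sums the outputs (so the upstream gradient of each output is 1). The helper `weight_gradient` and the input values are made up for illustration.

```python
def relu_grad(z: float) -> float:
    # Convention: derivative 0 at z = 0.
    return 1.0 if z > 0 else 0.0


def weight_gradient(w: float, b: float, inputs, upstream: float = 1.0):
    """Gradient of sum(relu(w * x + b)) with respect to (w, b) for one neuron.

    Hypothetical helper: assumes every output receives the same upstream gradient.
    """
    dw = db = 0.0
    for x in inputs:
        z = w * x + b
        g = upstream * relu_grad(z)  # zero whenever the pre-activation is not positive
        dw += g * x
        db += g
    return dw, db


inputs = [0.2, 0.5, 1.0, 1.5]  # hypothetical non-negative training inputs
print(weight_gradient(0.8, 0.1, inputs))   # healthy neuron: every z > 0
print(weight_gradient(0.8, -5.0, inputs))  # large negative bias: every z < 0
```

In the second case both gradients are exactly zero, so a gradient-descent step leaves `w` and `b` unchanged and the neuron stays dead.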

## SELU Function:

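The Scaled Exponential Linear Unit (SELU) is a scaled variant of the ELU function described below, and therefore also a relative of ReLU. It is defined as SELU(x) = lambda * x for x > 0 and SELU(x) = lambda * alpha * (e^x - 1) for x <= 0, with fixed constants lambda ≈ 1.0507 and alpha ≈ 1.6733. The function is continuous, and unlike ReLU it has a non-zero gradient for negative inputs, although its derivative still jumps at zero (from lambda * alpha on the left to lambda on the right).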

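The SELU function has some advantages over other activation functions. Like the others, it is non-linear, which helps in modeling complex relationships between inputs and outputs. Its main selling point is that it is self-normalizing: its two fixed constants are chosen so that if a neuron's pre-activations follow a standard normal distribution (zero mean, unit variance), its outputs also have zero mean and unit variance. In a feed-forward network with suitably initialized weights, activations therefore tend to stay close to zero mean and unit variance as they pass through many layers, which helps reduce the vanishing and exploding gradient problems without explicit normalization layers.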
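
A Monte Carlo sketch of the self-normalizing property: draw standard normal pre-activations, apply SELU, and check the mean and variance of the outputs. The constants are the standard SELU values to double precision; the seed, sample size, and helper names are arbitrary choices for this illustration.

```python
import math
import random

# Standard SELU constants (lambda and alpha).
LAMBDA = 1.0507009873554805
ALPHA = 1.6732632423543772


def selu(x: float) -> float:
    """SELU: lambda * x for x > 0, lambda * alpha * (e^x - 1) for x <= 0."""
    return LAMBDA * x if x > 0 else LAMBDA * ALPHA * (math.exp(x) - 1.0)


def mean_and_variance(values):
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


rng = random.Random(0)  # fixed seed so the run is reproducible
z = [rng.gauss(0.0, 1.0) for _ in range(100_000)]  # standard normal pre-activations
y = [selu(v) for v in z]
print("pre-activations:", mean_and_variance(z))
print("SELU outputs:   ", mean_and_variance(y))  # both should be close to (0, 1)
```

This checks only the single-layer fixed point; the multi-layer behavior additionally depends on the weight initialization assumptions discussed below.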


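However, the SELU function has limitations. Its self-normalizing property relies on assumptions: the inputs should be standardized to zero mean and unit variance, the weights should be initialized appropriately (LeCun normal initialization), and the argument was derived for fully connected feed-forward networks, so it may not carry over to other architectures. Additionally, it is more expensive to compute than ReLU because the negative branch requires an exponential, although this overhead is usually modest.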

## ELU Function:

The Exponential Linear Unit (ELU) function is a variation of the ReLU function. If the input is greater than zero, the output equals the input; if the input is less than or equal to zero, the output is alpha * (e^x - 1), where alpha > 0 is a hyperparameter (commonly 1). The output therefore lies in (-alpha, infinity): negative inputs produce negative outputs that saturate smoothly toward -alpha instead of being clipped to zero.

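The ELU function has some advantages over other activation functions. It is non-linear, which helps in modeling complex relationships between inputs and outputs. Because negative inputs still produce a non-zero gradient, ELU neurons cannot die the way ReLU neurons can, which reduces the dying ReLU problem. With its scale parameter set to 1 (the common default), ELU is also smooth at zero, since its derivative equals 1 on both sides, giving a smoother transition than ReLU's sharp corner. Its negative outputs also pull the mean activation closer to zero, which can speed up learning.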
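
A minimal sketch of ELU and its derivative, with `alpha` as a keyword argument defaulting to 1. The helper names are illustrative.

```python
import math


def elu(x: float, alpha: float = 1.0) -> float:
    """ELU: x for x > 0, alpha * (e^x - 1) for x <= 0."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def elu_grad(x: float, alpha: float = 1.0) -> float:
    """Derivative: 1 for x > 0, alpha * e^x for x <= 0 (never exactly zero)."""
    return 1.0 if x > 0 else alpha * math.exp(x)


# Negative inputs saturate toward -alpha instead of being clipped to 0.
print([round(elu(v), 4) for v in (-10.0, -1.0, 0.0, 1.0)])
# With alpha = 1, the derivatives just left and right of 0 agree (both close to 1).
print(elu_grad(-1e-9), elu_grad(1e-9))
```

Compared with the ReLU sketch, the only change is the negative branch, which is where the extra exponential cost comes from.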

However, the ELU function has limitations. For negative inputs it must evaluate an exponential, which makes it more computationally expensive than ReLU.

## Tanh Function:

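The Hyperbolic Tangent (Tanh) function is a popular activation function used in neural networks. It maps any input value to a value between -1 and 1, and it is a rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1. Because its output can be negative, it cannot be interpreted as a probability the way the sigmoid's output can. Its main advantage is that its output is zero-centered, which often makes optimization easier than with the sigmoid. Its derivative, 1 - tanh(x)^2, reaches a maximum of 1 at zero, but like the sigmoid, tanh saturates for inputs of large magnitude and suffers from the vanishing gradient problem.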
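
A short sketch that checks the sigmoid relationship numerically and evaluates the tanh derivative. It uses `math.tanh` from the standard library; `sigmoid` and `tanh_grad` are illustrative helpers, and the simple sigmoid form here is only meant for moderate inputs.

```python
import math


def sigmoid(x: float) -> float:
    # Simple form, adequate for the moderate inputs used below.
    return 1.0 / (1.0 + math.exp(-x))


def tanh_grad(x: float) -> float:
    """Derivative of tanh: 1 - tanh(x)^2, at most 1 (reached at x = 0)."""
    t = math.tanh(x)
    return 1.0 - t * t


# tanh is a rescaled, zero-centered sigmoid: tanh(x) = 2 * sigmoid(2x) - 1.
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    assert math.isclose(math.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0, abs_tol=1e-12)

print(tanh_grad(0.0), tanh_grad(5.0))  # 1.0 at the origin, near 0 once saturated
```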