#### About

> Activation functions

Activation functions are an essential part of deep neural networks because they introduce non-linearity into the model. Without them, a stack of layers would reduce to a single linear transformation and could not learn complex, non-linear patterns in data. Here we discuss some of the more commonly used activation functions, give their mathematical formulations, and provide a code example for each.
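
To make the role of non-linearity concrete, here is a minimal sketch showing that two linear layers without an activation collapse into one linear map, while inserting max(0, x) between them breaks that equivalence. The matrices `W1`, `W2` and the input `x` are arbitrary values chosen for illustration only.

```python
import numpy as np

# Hypothetical weights and input, picked by hand for illustration
W1 = np.array([[1, -1], [2, 0]])
W2 = np.array([[1, 1], [0, 1]])
x = np.array([1, 2])

# Two linear layers applied one after the other
two_layer = W2 @ (W1 @ x)

# The same result from a single combined matrix
collapsed = (W2 @ W1) @ x

# With a non-linearity in between, the result can no longer be written as (W2 @ W1) @ x
with_relu = W2 @ np.maximum(0, W1 @ x)

print(two_layer, collapsed, with_relu)
```

Here the first two results agree, while the third differs because the negative intermediate value is clipped before the second layer is applied.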


1. Sigmoid activation function

sigmoid(x) = 1 / (1 + exp(-x))


It maps any real input to the open interval (0, 1), so its output can be read as a probability, which makes it a common choice for the output layer in binary classification problems. However, it suffers from the vanishing gradient problem: the function saturates for inputs of large magnitude (positive or negative), the gradient there becomes very small, and training converges slowly.



In [6]:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([-2, -1, 0, 1, 2])
print(sigmoid(x))


[0.11920292 0.26894142 0.5        0.73105858 0.88079708]
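
The vanishing gradient behaviour can be checked numerically. The sketch below uses the standard identity sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)); the helper name `sigmoid_grad` and the sample inputs are chosen here for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s), which peaks at 0.25 when x = 0
    s = sigmoid(x)
    return s * (1 - s)

x = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])
grads = sigmoid_grad(x)
print(grads)
```

The gradient is largest at x = 0 and shrinks toward zero as the input moves away from it in either direction, which is what slows learning in saturated units.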


2. ReLU (Rectified Linear Unit) activation function

ReLU(x) = max(0, x)

It sets all negative input values to zero and passes positive input values through unchanged. ReLU is a popular choice of activation function because it is cheap to compute and its gradient is exactly 1 for every positive input, which helps alleviate the vanishing gradient problem.


In [7]:
import numpy as np

def relu(x):
    return np.maximum(0, x)

x = np.array([-2, -1, 0, 1, 2])
print(relu(x))


[0 0 0 1 2]
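
As a small illustration of why ReLU helps with vanishing gradients, the sketch below computes its derivative. The helper `relu_grad` is a name introduced here, and treating the derivative at exactly x = 0 as 0 is a common convention rather than a mathematical necessity.

```python
import numpy as np

def relu_grad(x):
    # 1 where the input is positive, 0 elsewhere (the value at x = 0 is a convention)
    return (x > 0).astype(float)

x = np.array([-2, -1, 0, 1, 2])
print(relu_grad(x))
```

Unlike the sigmoid, the gradient does not shrink as positive inputs grow, but it is exactly zero for every negative input.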


3. Leaky ReLU (LReLU) activation function

The Leaky ReLU activation function is a variant of the ReLU function that addresses the "dying ReLU" problem, where a ReLU neuron whose input is negative for every training example outputs zero, receives zero gradient, and may never recover.

LReLU(x) = max(α * x, x), where α is a small positive constant (typically around 0.01)


It introduces a slight slope for negative input values, allowing some gradient flow even for negative inputs.

In [8]:
import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)

x = np.array([-2, -1, 0, 1, 2])
print(leaky_relu(x))


[-0.02 -0.01  0.    1.    2.  ]
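
The following sketch compares the gradients of ReLU and Leaky ReLU for a unit whose pre-activations are all negative, which is the situation behind the "dying ReLU" problem. The helper names and the sample values in `pre_activations` are made up for illustration.

```python
import numpy as np

def relu_grad(x):
    # ReLU passes no gradient for non-positive inputs
    return np.where(x > 0, 1.0, 0.0)

def leaky_relu_grad(x, alpha=0.01):
    # Leaky ReLU keeps a small gradient of alpha for non-positive inputs
    return np.where(x > 0, 1.0, alpha)

# Hypothetical pre-activations of a unit that is negative on every example
pre_activations = np.array([-3.0, -1.5, -0.2])

print(relu_grad(pre_activations))
print(leaky_relu_grad(pre_activations))
```

With ReLU every gradient is zero, so the unit's weights stop updating; with Leaky ReLU a small gradient of size alpha still flows.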


4. Softmax activation function

The Softmax activation function is often used in multi-class classification problems because it produces a probability distribution over multiple classes.

softmax(x)_i = exp(x_i) / sum_j exp(x_j), for i in {1, 2, ..., K}, with the sum taken over j = 1, ..., K



where K is the number of classes and x_i is the input value (logit) for the i-th class. The normalization ensures that the output values are positive and sum to 1, so they can be interpreted as class probabilities.

In [9]:
import numpy as np

def softmax(x):
    exp_vals = np.exp(x)
    return exp_vals / np.sum(exp_vals)

x = np.array([1, 2, 3, 4, 5])
print(softmax(x))


[0.01165623 0.03168492 0.08612854 0.23412166 0.63640865]
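
One practical caveat: exponentiating raw inputs can overflow when the values are large. A common remedy, sketched below under the name `stable_softmax` (a name chosen here), is to subtract the maximum input before exponentiating. This does not change the result, because the shift cancels between numerator and denominator.

```python
import numpy as np

def stable_softmax(x):
    # Subtracting the maximum leaves the result unchanged but keeps exp() from overflowing
    shifted = x - np.max(x)
    exp_vals = np.exp(shifted)
    return exp_vals / np.sum(exp_vals)

# Inputs this large would overflow np.exp if used directly
x = np.array([1000.0, 1001.0, 1002.0])
probs = stable_softmax(x)
print(probs)
```

For moderate inputs such as `[1, 2, 3, 4, 5]`, this version returns the same probabilities as the direct formula.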


5. Parametric ReLU (PReLU) activation function

The Parametric ReLU (PReLU) activation function is similar to Leaky ReLU, but instead of using a fixed slope for negative inputs, it allows the slope to be learned during training. The PReLU function is defined by the following mathematical formula:


PReLU(x) = x, if x >= 0
         = α * x, if x < 0, where α is a learnable parameter

The compact form max(α * x, x) gives the same result only while α <= 1, so the piecewise definition is the safer one when α is learned.


This makes the PReLU function more flexible than the Leaky ReLU function because the slope for negative inputs is adapted to the data.

In [10]:
import numpy as np

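def prelu(x, alpha):
    # Piecewise form stays correct even if the learned alpha exceeds 1
    return np.where(x >= 0, x, alpha * x)


x = np.array([-2, -1, 0, 1, 2])
alpha = 0.01  # fixed here for the demo; in a network it is updated during training
print(prelu(x, alpha))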


[-0.02 -0.01  0.    1.    2.  ]
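
To illustrate what "learned during training" means for the slope, the sketch below computes the gradient of PReLU with respect to α and applies one gradient-descent step. The upstream gradient `upstream` and the step size `learning_rate` are hypothetical values, and a single α shared across all inputs is assumed.

```python
import numpy as np

def prelu(x, alpha):
    return np.where(x >= 0, x, alpha * x)

def prelu_grad_alpha(x):
    # d PReLU / d alpha is x for negative inputs and 0 otherwise
    return np.where(x >= 0, 0.0, x)

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
alpha = 0.01
upstream = np.ones_like(x)  # hypothetical gradient of the loss w.r.t. the outputs
learning_rate = 0.1         # hypothetical step size

# Chain rule: sum the per-element contributions to get the gradient for the shared alpha
grad_alpha = np.sum(upstream * prelu_grad_alpha(x))
alpha = alpha - learning_rate * grad_alpha
print(grad_alpha, alpha)
```

Only the negative inputs contribute to the update, since α has no effect on the output for non-negative inputs.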


6. GELU (Gaussian Error Linear Unit) activation function

The GELU activation function is a smooth alternative to the rectifier function. It weights each input by the standard normal cumulative distribution function Φ, so that GELU(x) = x * Φ(x), which behaves like ReLU for large |x| while remaining smooth around zero. In practice it is often computed with the following tanh approximation:

GELU(x) ≈ 0.5 * x * (1 + tanh(sqrt(2 / π) * (x + 0.044715 * x^3)))


The GELU function is smooth and differentiable everywhere, and because it does not saturate for large positive inputs, it is less prone to vanishing gradients than the sigmoid function.

In [11]:
import numpy as np

def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

x = np.array([-2, -1, 0, 1, 2])
print(gelu(x))


[-0.04540231 -0.15880801  0.          0.84119199  1.95459769]
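
The tanh expression is an approximation to the exact definition x * Φ(x), where Φ is the standard normal CDF. As a rough check, the sketch below compares the two on a grid, using `math.erf` from the Python standard library for the exact form. The function names and the grid are chosen here for illustration.

```python
import math
import numpy as np

def gelu_exact(x):
    # x * Phi(x), with Phi written in terms of the error function
    return np.array([0.5 * v * (1 + math.erf(v / math.sqrt(2))) for v in x])

def gelu_tanh(x):
    # The tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

x = np.linspace(-4, 4, 81)
max_diff = np.max(np.abs(gelu_exact(x) - gelu_tanh(x)))
print(max_diff)
```

On this grid the two versions should differ only by a small amount, which is why the cheaper tanh form is widely used.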


7. ELU (Exponential Linear Unit) activation function

The ELU activation function is another smooth variant of the rectifier function, designed to keep gradients flowing for negative inputs and so mitigate the vanishing gradient problem.

ELU(x) = x, if x >= 0
       = α * (exp(x) - 1), if x < 0, where α is a positive constant


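For negative inputs, the ELU function produces negative outputs that saturate smoothly at -α instead of being clipped to zero. This pushes the mean activation closer to zero and keeps a non-zero gradient for negative inputs, providing better gradient flow during training.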

In [12]:
import numpy as np

def elu(x, alpha=1.0):
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1))

x = np.array([-2, -1, 0, 1, 2])
print(elu(x))


[-0.86466472 -0.63212056  0.          1.          2.        ]
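
The sketch below illustrates two properties of ELU for negative inputs: the output approaches -α for very negative values, and the gradient stays positive rather than dropping to zero. The helper `elu_grad` and the sample inputs are introduced here for illustration.

```python
import numpy as np

def elu(x, alpha=1.0):
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1))

def elu_grad(x, alpha=1.0):
    # Derivative: 1 for x >= 0, alpha * exp(x) for x < 0
    return np.where(x >= 0, 1.0, alpha * np.exp(x))

x = np.array([-20.0, -5.0, -1.0, 0.0, 1.0])
print(elu(x))
print(elu_grad(x))
```

The gradient for negative inputs decays as the input becomes more negative, but it never reaches exactly zero as it does with ReLU.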
