In [1]:
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Activation Function

An activation function is a mathematical function applied to a neuron's output to introduce non-linearity into the model. This non-linearity is crucial because it allows the neural network to learn and model complex patterns in the data. Without an activation function, a neural network collapses to a single linear model, regardless of the number of layers it has, because a composition of linear maps is itself linear.
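The collapse of stacked linear layers can be checked numerically. The following is a minimal sketch, not part of the original notebook: the weights `W1`, `W2` and the input `x_in` are hypothetical values chosen by hand so the result is easy to verify.

```python
import numpy as np

# Hypothetical weights for a tiny 2-layer network without biases (illustrative values).
W1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])   # layer 1: 2 inputs -> 2 hidden units
W2 = np.array([[1.0, 1.0]])    # layer 2: 2 hidden units -> 1 output
x_in = np.array([1.0, 2.0])

# Without an activation, two layers equal the single linear map W2 @ W1.
two_layers = W2 @ (W1 @ x_in)
one_layer = (W2 @ W1) @ x_in
linear_collapses = np.allclose(two_layers, one_layer)

# With a non-linearity (ReLU here) between the layers, the equivalence breaks for this input.
with_relu = W2 @ np.maximum(0, W1 @ x_in)
relu_differs = not np.allclose(with_relu, one_layer)

print(linear_collapses, relu_differs)
```

Here the hidden pre-activations are `[-1, 1]`, so ReLU zeroes the first one and the output changes from 0 to 1, which no single matrix `W2 @ W1` reproduces for this pair of layers.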

## Sigmoid Activation Function

**Formula:**

$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$

**Range:**

$$
\sigma(x) \in (0, 1)
$$


In [None]:
# Sigmoid Activation Function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Generate input data (x is reused by the tanh, ReLU, Leaky ReLU, PReLU, and ELU cells)
x = np.linspace(-10, 10, 400)
y = sigmoid(x)

sns.set(style="white")
fig, ax = plt.subplots(figsize=(10, 6))

# Plot sigmoid function
sns.lineplot(x=x, y=y, ax=ax, color='green', label='Sigmoid Function')
plt.title('Sigmoid Activation Function')
plt.legend()


# Set the axes to cross at (0,0)
ax.spines['left'].set_position('zero')
ax.spines['bottom'].set_position('zero')
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')

plt.tight_layout()
plt.show()


**Uses**:

The sigmoid activation function is mainly used in the output layer for binary classification, where its output in (0, 1) can be read as a probability.



**Problems**:

Vanishing Gradient: For large positive or negative values of x, the gradient becomes very small, which can slow down learning.

Not Zero-Centered: Output values are always positive, which can lead to inefficient gradient updates.

Computationally Expensive: Requires the computation of the exponential function, which can be less efficient.
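The vanishing gradient can be made concrete with the derivative of the sigmoid, which is σ(x)(1 − σ(x)). The sketch below is an illustration rather than part of the original notebook; the helper `sigmoid_grad` and the sample points are assumptions chosen for the example.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: sigma(x) * (1 - sigma(x)), at most 0.25 (at x = 0).
    s = sigmoid(x)
    return s * (1 - s)

# Illustrative sample points moving away from zero.
points = np.array([0.0, 2.0, 5.0, 10.0])
grads = sigmoid_grad(points)

for p, g in zip(points, grads):
    print(f"x = {p:5.1f}  gradient = {g:.6f}")
```

The gradient peaks at 0.25 and shrinks rapidly as |x| grows, so several saturated sigmoid layers in a row multiply many small factors together during backpropagation.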

## Tanh Function



**Formula:**

$$
\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
$$

**Range:**

$$
\tanh(x) \in (-1, 1)
$$


In [None]:
def tanh(x):
    return np.tanh(x)

# Reuse the input grid x from the sigmoid cell
y = tanh(x)

# Create a seaborn style plot
sns.set(style="white")
fig, ax = plt.subplots(figsize=(10, 6))

# Plot tanh function
sns.lineplot(x=x, y=y, ax=ax, color='green', label='Tanh Function')
plt.title('Tanh Activation Function')
plt.legend()


# Set the axes to cross at (0,0)
ax.spines['left'].set_position('zero')
ax.spines['bottom'].set_position('zero')
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')

fig.tight_layout()
plt.show()


**Uses:**

Used in hidden layers of neural networks: it maps values into (-1, 1), which makes the output zero-centered.

Useful for regression tasks and certain types of recurrent neural networks.


**Problems**:

Vanishing Gradient: Like sigmoid, tanh suffers from vanishing gradients for large values of x, which can hamper learning in deep networks.

Computational Cost: Involves the computation of exponential functions, which can be computationally expensive.
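Both problems follow from tanh being a shifted, rescaled sigmoid: tanh(x) = 2σ(2x) − 1, with derivative 1 − tanh²(x). The following sketch checks this numerically; the grid `z` and the variable names are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Illustrative grid: the integers from -5 to 5.
z = np.linspace(-5, 5, 11)

# tanh is a shifted, rescaled sigmoid: tanh(z) = 2 * sigmoid(2z) - 1.
rescaled_sigmoid = 2 * sigmoid(2 * z) - 1
identity_holds = np.allclose(np.tanh(z), rescaled_sigmoid)

# Derivative: 1 - tanh(z)^2, which peaks at 1 (z = 0) and decays toward 0 for large |z|.
tanh_grad = 1 - np.tanh(z) ** 2

print(identity_holds)
print(np.round(tanh_grad, 4))
```

The peak gradient of 1 is four times the sigmoid's 0.25, which helps in practice, but the gradient still decays toward zero once the unit saturates.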

## ReLU Function



**Formula:**

$$
\text{ReLU}(x) = \max(0, x)
$$

**Range:**

$$
\text{ReLU}(x) \in [0, \infty)
$$



In [None]:
def relu(x):
    return np.maximum(0, x)

# Reuse the input grid x from the sigmoid cell
y = relu(x)

# Create a seaborn style plot
sns.set(style="white")
fig, ax = plt.subplots(figsize=(10, 6))

# Plot ReLU function
sns.lineplot(x=x, y=y, ax=ax, color='green', label='ReLU Function')
plt.title('ReLU Activation Function')
plt.legend()


# Set the axes to cross at (0,0)
ax.spines['left'].set_position('zero')
ax.spines['bottom'].set_position('zero')
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')



plt.tight_layout()
plt.show()


**Uses:**

Widely used in hidden layers of neural networks for deep learning due to its simplicity and efficiency.

Helps training converge by mitigating the vanishing gradient problem: the gradient is exactly 1 for every positive input, so it does not shrink as x grows.

**Problems:**

Dying ReLU Problem (dead neurons): If a neuron's input is always negative, it always outputs zero, receives zero gradient, and stops learning.

Not Zero-Centered: Outputs are always non-negative, which can impact learning dynamics.
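A dead neuron can be illustrated by looking at the ReLU gradient for a batch of negative pre-activations. This is a minimal sketch: the `relu_grad` helper, the convention of using gradient 0 at x = 0, and the sample values are assumptions for the example.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    # Assumed convention: gradient is 1 for x > 0 and 0 otherwise (including x = 0).
    return (x > 0).astype(float)

# Hypothetical pre-activations of one neuron across a batch, all negative.
pre_activations = np.array([-3.2, -0.7, -1.5, -0.1])

outputs = relu(pre_activations)     # every output is zero
grads = relu_grad(pre_activations)  # every gradient is zero, so no update flows back

print(outputs)
print(grads)
```

Because the gradient is zero for every sample, gradient descent cannot move the weights that feed this neuron, and it stays inactive unless its inputs change.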

## Leaky ReLU Function



**Formula:**

$$
\text{Leaky ReLU}(x) = \begin{cases}
x & \text{if } x > 0 \\
\alpha x & \text{if } x \leq 0
\end{cases}
$$

**Range:**

$$
\text{Leaky ReLU}(x) \in (-\infty, \infty)
$$

In [None]:
def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

# Reuse the input grid x; alpha is the fixed slope for negative inputs
alpha = 0.01
y = leaky_relu(x, alpha)

# Create a seaborn style plot
sns.set(style="white")
fig, ax = plt.subplots(figsize=(10, 6))

# Plot Leaky ReLU function
sns.lineplot(x=x, y=y, ax=ax, color='green', label='Leaky ReLU Function')
plt.title('Leaky ReLU Activation Function')
plt.legend()

# Set the axes to cross at (0,0)
ax.spines['left'].set_position('zero')
ax.spines['bottom'].set_position('zero')
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')


plt.tight_layout()
plt.show()


**Uses:**

A variant of ReLU designed to address the dying ReLU problem by allowing a small, non-zero gradient when x is negative.

Used in hidden layers of deep networks.

**Problems:**

Hyperparameter Sensitivity: The choice of α can affect performance and requires tuning.

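Can Still Suffer from Near-Dead Neurons: The gradient for negative inputs is never exactly zero, but with a very small α it is tiny, so such neurons can learn very slowly, although this is less likely than with standard ReLU.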
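The effect of α on the gradient can be sketched directly. The helper `leaky_relu_grad`, the sample inputs, and the α values below are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def leaky_relu_grad(x, alpha=0.01):
    # Slope is 1 for positive inputs and alpha otherwise.
    return np.where(x > 0, 1.0, alpha)

# Hypothetical pre-activations: two negative, one positive.
pre_activations = np.array([-3.2, -0.7, 1.5])

for a in (0.01, 0.1, 0.3):
    print(a, leaky_relu_grad(pre_activations, alpha=a))
```

Unlike ReLU, the gradient for negative inputs is α rather than 0, so the size of α directly controls how quickly a neuron with negative inputs can recover.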

## PReLU Function



**Formula:**

$$
\text{PReLU}(x) = \begin{cases}
x & \text{if } x > 0 \\
\alpha x & \text{if } x \leq 0
\end{cases}
$$

**Range:**

$$
\text{PReLU}(x) \in (-\infty, \infty)
$$



In [None]:
def prelu(x, alpha):
    return np.where(x > 0, x, alpha * x)

# Reuse the input grid x; alpha is fixed here only for plotting (in PReLU it is learned during training)
alpha = 0.01
y = prelu(x, alpha)

# Create a seaborn style plot
sns.set(style="white")
fig, ax = plt.subplots(figsize=(10, 6))

# Plot PReLU function
sns.lineplot(x=x, y=y, ax=ax, color='green', label='PReLU Function')
plt.title('PReLU Activation Function')
plt.legend() 

# Set the axes to cross at (0,0)
ax.spines['left'].set_position('zero')
ax.spines['bottom'].set_position('zero')
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')

fig.tight_layout()
plt.show()


**Uses:**

Similar to Leaky ReLU, but the α parameter is learned during training rather than fixed, which can improve performance in some cases.

**Problems:**

Increased Complexity: The model becomes more complex due to the additional learnable parameters.

Overfitting Risk: The added flexibility might lead to overfitting if not managed properly.
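What "learning α" means can be sketched with plain gradient descent on a toy problem. Everything here is an assumption for illustration: the toy inputs, the target slope of 0.25, the learning rate, and the helper `prelu_grad_alpha`. A real framework would compute this gradient automatically.

```python
import numpy as np

def prelu(x, alpha):
    return np.where(x > 0, x, alpha * x)

def prelu_grad_alpha(x):
    # d PReLU / d alpha is x for x <= 0 and 0 for x > 0.
    return np.where(x > 0, 0.0, x)

# Toy data (hypothetical): targets generated with a "true" negative slope of 0.25.
inputs = np.array([-2.0, -1.0, -0.5, 1.0, 2.0])
targets = prelu(inputs, 0.25)

alpha = 0.01          # same starting value as the plotting cell above
learning_rate = 0.1   # illustrative choice

for _ in range(200):
    error = prelu(inputs, alpha) - targets
    # Gradient of the mean squared error with respect to alpha (chain rule).
    grad = np.mean(2 * error * prelu_grad_alpha(inputs))
    alpha -= learning_rate * grad

print(alpha)
```

Only the negative inputs contribute to the gradient, so α is fitted purely from how the negative side of the function should behave.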


## ELU Function





**Formula:**

$$
\text{ELU}(x) = \begin{cases}
x & \text{if } x > 0 \\
\alpha (e^x - 1) & \text{if } x \leq 0
\end{cases}
$$

**Range:**

$$
\text{ELU}(x) \in (-\alpha, \infty)
$$


In [None]:
def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

# Reuse the input grid x; alpha sets the saturation level for negative inputs
alpha = 1.0
y = elu(x, alpha)

# Create a seaborn style plot
sns.set(style="white")
fig, ax = plt.subplots(figsize=(10, 6))

# Plot ELU function
sns.lineplot(x=x, y=y, ax=ax, color='green', label='ELU Function')
plt.title('ELU Activation Function')
plt.legend()

# Set the axes to cross at (0,0)
ax.spines['left'].set_position('zero')
ax.spines['bottom'].set_position('zero')
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')


fig.tight_layout()
plt.show()


**Uses:**

Addresses vanishing gradient issues and provides smoother gradients compared to ReLU.

Useful in deep networks: the negative outputs push mean activations closer to zero, which makes the output more nearly zero-centered than ReLU.

**Problems:**

Computational Cost: Involves the computation of exponential functions, which can be expensive.

Negative Output Issue: For negative inputs, the output saturates at -α, so a large α produces large-magnitude negative outputs.
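The saturation of ELU for negative inputs can be checked for a few values of α. This is a minimal sketch: the sample inputs and α values are assumptions, and the `np.minimum` guard is an addition that is not in the notebook's `elu`.

```python
import numpy as np

def elu(x, alpha=1.0):
    # np.minimum(x, 0) keeps np.exp from overflowing on the unused positive branch,
    # since np.where evaluates both branches before selecting.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0)) - 1))

# Illustrative inputs that move deeper into the negative region.
very_negative = np.array([-1.0, -5.0, -50.0])

results = {}
for a in (0.5, 1.0, 3.0):
    results[a] = elu(very_negative, alpha=a)
    print(a, results[a])
```

For each α the outputs approach -α and never go below it, which matches the range given above.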

## Softmax Function




**Formula:**

$$
\text{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}
$$

**Range:**

$$
\text{Softmax}(x_i) \in (0, 1), \quad \sum_{i} \text{Softmax}(x_i) = 1
$$

In [None]:
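def softmax(x):
    # Subtract the max along axis 0 for numerical stability; softmax is taken over axis 0
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    return e_x / e_x.sum(axis=0, keepdims=True)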

# Generate input data: two logits per column, the first varying and the second fixed at 1
x = np.linspace(-2, 2, 100)
x = np.vstack([x, np.ones_like(x)])
y = softmax(x)

# Create a seaborn style plot
sns.set(style="white")
fig, ax = plt.subplots(figsize=(10, 6))

# Plot the softmax probability of the first logit as that logit varies
ax.plot(x[0], y[0], color='green', label='Softmax Function')
plt.title('Softmax Activation Function')
plt.legend()


# Set the axes to cross at (0,0)
ax.spines['left'].set_position('zero')
ax.spines['bottom'].set_position('zero')
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')


fig.tight_layout()
plt.show()

**Uses:**

Used in the output layer of multi-class classification problems to produce a probability distribution over classes.

**Problems:**

Numerical Stability: Can suffer from numerical instability if not computed carefully, especially for very large or very small inputs.

Not Suitable for Hidden Layers: Its outputs are coupled across units and sum to 1, so it is used primarily in the output layer for classification rather than as a hidden-layer activation.
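The numerical stability issue can be reproduced by comparing a naive softmax with one that subtracts the maximum first. The function names `softmax_naive` and `softmax_stable` and the example logits are assumptions made for this sketch.

```python
import numpy as np

def softmax_naive(logits):
    e = np.exp(logits)
    return e / e.sum()

def softmax_stable(logits):
    # Subtracting the max leaves the result unchanged mathematically
    # but keeps every exponent <= 0, so np.exp cannot overflow.
    shifted = logits - np.max(logits)
    e = np.exp(shifted)
    return e / e.sum()

# Illustrative logits that are large enough to overflow float64 in np.exp.
logits = np.array([1000.0, 1001.0, 1002.0])

# Silence the overflow warnings so the failure shows up only in the values.
with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(logits)   # exp(1000) overflows to inf, and inf / inf is nan

stable = softmax_stable(logits)

print(naive)
print(stable, stable.sum())
```

The stable version gives the same probabilities as it would for the logits `[0, 1, 2]`, because softmax depends only on differences between logits.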

## Swish Function



**Formula:**

$$
\text{Swish}(x) = x \cdot \sigma(x) = x \cdot \frac{1}{1 + e^{-x}}
$$

**Range:**

$$
\text{Swish}(x) \in [m, \infty), \quad m \approx -0.278
$$


In [None]:
def swish(x):
    return x * sigmoid(x)

# Generate input data
x = np.linspace(-10, 10, 400)
y = swish(x)

# Create a seaborn style plot
sns.set(style="white")
fig, ax = plt.subplots(figsize=(10, 6))

# Plot Swish function
sns.lineplot(x=x, y=y, ax=ax, color='green', label='Swish Function')
plt.title('Swish Activation Function')
plt.legend()


# Set the axes to cross at (0,0)
ax.spines['left'].set_position('zero')
ax.spines['bottom'].set_position('zero')
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')

plt.tight_layout()
plt.show()




**Uses:**

Can improve training dynamics and performance in some deep learning models.

Smooth, non-monotonic activation function that is continuous and differentiable everywhere, unlike ReLU at x = 0.


**Problems:**

Computational Cost: Involves both multiplication and sigmoid computation, which can be more expensive than ReLU.

Less Proven: While it has shown benefits in some cases, it has a shorter track record than ReLU and is not as widely tested.
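Swish is not monotonic: it dips slightly below zero before rising. The sketch below locates that minimum on a fine grid and checks that the derivative is close to zero there. The helper `swish_grad` and the grid resolution are assumptions for the example.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def swish(x):
    return x * sigmoid(x)

def swish_grad(x):
    # Product rule: sigma(x) + x * sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s + x * s * (1 - s)

# Fine grid (step 0.001) over the same interval used in the plot.
grid = np.linspace(-10, 10, 20001)
values = swish(grid)
i_min = np.argmin(values)

print(f"minimum ~ {values[i_min]:.4f} at x ~ {grid[i_min]:.3f}")
print(f"gradient there ~ {swish_grad(grid[i_min]):.6f}")
```

Below this point the function rises back toward zero as x becomes more negative, so Swish is bounded below but unbounded above.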