# Implementation: Visualizing Activation Functions

**Goal**: Plot the most common activation functions and their derivatives to understand the "Vanishing Gradient" problem.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

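# Evaluate every function on 200 evenly spaced inputs in [-5, 5]
x = np.linspace(-5, 5, 200)

def sigmoid(x):
    # Squashes inputs into the range (0, 1)
    return 1 / (1 + np.exp(-x))

def tanh(x):
    # Squashes inputs into the range (-1, 1), centered at 0
    return np.tanh(x)

def relu(x):
    # Passes positive inputs through unchanged, clamps negatives to 0
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.1):
    # Like ReLU, but negative inputs keep a small slope alpha
    return np.where(x > 0, x, x * alpha)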

# Derivative of Sigmoid: f(x) * (1 - f(x))
def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)
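
The cell above defines a derivative only for Sigmoid. As a minimal sketch, the derivatives of the other three activations could be written as follows. The helper names `tanh_derivative`, `relu_derivative`, and `leaky_relu_derivative` are hypothetical (they are not part of the original notebook), and the value returned at exactly x = 0 for ReLU and Leaky ReLU is a convention chosen here, since the true derivative is undefined at that point.

```python
import numpy as np

def tanh_derivative(x):
    # d/dx tanh(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

def relu_derivative(x):
    # 1 for x > 0, 0 otherwise (assumption: 0 is used at x = 0)
    return np.where(x > 0, 1.0, 0.0)

def leaky_relu_derivative(x, alpha=0.1):
    # 1 for x > 0, alpha otherwise (assumption: alpha is used at x = 0)
    return np.where(x > 0, 1.0, alpha)

# Quick check at a negative, zero, and positive input
x_check = np.array([-4.0, 0.0, 4.0])
print("tanh'      :", tanh_derivative(x_check))
print("relu'      :", relu_derivative(x_check))
print("leaky_relu':", leaky_relu_derivative(x_check))
```

Like Sigmoid, the tanh derivative shrinks toward zero for inputs far from 0, while the ReLU variants stay at 1 for every positive input.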

## Visualizing Sigmoid vs ReLU Gradients

In [None]:
fig, ax = plt.subplots(1, 2, figsize=(14, 5))

# 1. Sigmoid and its Derivative
ax[0].plot(x, sigmoid(x), label="Sigmoid", color="blue")
ax[0].plot(x, sigmoid_derivative(x), label="Gradient (Derivative)", color="red", linestyle="--")
ax[0].set_title("Sigmoid: Gradient vanishes at ends!")
ax[0].legend()
ax[0].grid()

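# 2. ReLU and its Derivative
# The derivative is 1 for x > 0 and 0 for x < 0 (0 is used at x = 0 by convention)
ax[1].plot(x, relu(x), label="ReLU", color="green")
ax[1].plot(x, np.where(x > 0, 1.0, 0.0), label="Gradient (Derivative)", color="red", linestyle="--")
ax[1].set_title("ReLU: Gradient is 1 for x>0, 0 for x<0")
ax[1].legend()
ax[1].grid()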

plt.show()

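**Observation**: Look at the red dashed line in the Sigmoid plot.
*   The gradient peaks at 0.25 (at x = 0) and is almost **zero** when the input is large and positive (e.g., 4) or large and negative (e.g., -4). At x = ±4 it is about 0.018.
*   During backpropagation these small factors are multiplied layer by layer, so the weights in early layers essentially stop updating. This is the "Vanishing Gradient" problem.
*   ReLU avoids this for positive inputs, where its gradient is a constant 1. For negative inputs its gradient is 0; `leaky_relu` keeps a small slope (`alpha`) there instead.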
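
To put numbers on the observation above, the sketch below prints the Sigmoid gradient at a few inputs and then multiplies it across a stack of layers. The setup is a simplifying assumption made for illustration: a hypothetical chain of `n_layers` units, every unit sitting at Sigmoid's best case (x = 0), with the weights ignored so that only the activation gradients are compared.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

# Local gradient at the center and in the two saturated regions
for value in (0.0, 4.0, -4.0):
    print(f"sigmoid'({value:+.1f}) = {sigmoid_derivative(value):.4f}")

# Hypothetical chain of layers; weights are ignored (assumption)
n_layers = 10

# Best case for Sigmoid: every unit sits at x = 0, where the gradient is 0.25
sigmoid_chain = sigmoid_derivative(0.0) ** n_layers

# ReLU with positive inputs: every factor in the chain is 1
relu_chain = 1.0 ** n_layers

print(f"Sigmoid gradient after {n_layers} layers: {sigmoid_chain:.2e}")
print(f"ReLU gradient after {n_layers} layers:    {relu_chain:.2e}")
```

Even in its best case the Sigmoid chain shrinks by a factor of 4 per layer, while the ReLU chain is unchanged for positive inputs.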