**Need of activation functions**


Need of Activation Functions in Machine Learning
1. Activation Functions Introduce Non-Linearity

Without activation functions, a neural network is just a stack of linear transformations:

y = W2 * (W1 * x + b1) + b2


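No matter how many layers you add, the result is still linear: the expression above expands to (W2 * W1) * x + (W2 * b1 + b2), which is a single linear (strictly speaking, affine) layer with a combined weight matrix and bias. The network can therefore only model linear relationships.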

Activation functions let the network learn complex, non-linear patterns. For example, image recognition and natural language processing are highly non-linear problems.
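To make the linearity argument concrete, here is a minimal sketch with a tiny two-layer network. The weights are hand-picked for illustration (they are not from any trained model), and the helper names `linear_net`, `collapsed_net`, and `relu_net` are hypothetical. It checks that two stacked linear layers equal one combined layer, and that putting ReLU between them yields a function no single linear layer can represent (here, the absolute difference of the two inputs).

```python
import numpy as np

# Hand-picked illustrative weights (assumption, not from the original text).
W1 = np.array([[1.0, -1.0], [-1.0, 1.0]])
b1 = np.zeros(2)
W2 = np.array([[1.0, 1.0]])
b2 = np.zeros(1)

def linear_net(x):
    # Two layers stacked with no activation in between.
    return W2 @ (W1 @ x + b1) + b2

def collapsed_net(x):
    # The same map written as ONE layer: W = W2 W1, b = W2 b1 + b2.
    W = W2 @ W1
    b = W2 @ b1 + b2
    return W @ x + b

def relu_net(x):
    # Same weights, but with ReLU applied to the hidden layer.
    return W2 @ np.maximum(0, W1 @ x + b1) + b2

x = np.array([2.0, 1.0])
print(linear_net(x), collapsed_net(x))  # [0.] [0.]
print(relu_net(x))                      # [1.]  i.e. |2 - 1|
```

With these weights the purely linear network outputs 0 for every input, while the ReLU version computes |x1 - x2|, which is a non-linear function of the input.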

2. Activation Decides Neuron “Firing”

The idea is biologically inspired: a biological neuron either “fires” or it does not. Similarly, an activation function determines:

Whether a neuron should activate (pass information forward)

How strongly it activates

Examples:

ReLU outputs 0 for negative inputs → “don’t fire”

Positive inputs → pass the signal forward

3. Enables Gradient-Based Learning

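Sigmoid and tanh are differentiable everywhere. ReLU is differentiable everywhere except at 0, where implementations simply pick a fixed value for the gradient (commonly 0).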

Neural networks use backpropagation, which relies on gradients to update weights:

W := W - η * (∂L / ∂W)


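If the activation function isn’t differentiable (at least almost everywhere), or if its gradient is zero everywhere as with a hard step function, we cannot propagate useful gradients and train the network effectively.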
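The update rule above can be sketched for a single sigmoid neuron. This is a minimal illustration, assuming a squared-error loss, no bias term, and arbitrary values for the input, target, initial weights, and learning rate; the function names `loss` and `grad` are hypothetical. The gradient comes from the chain rule, which is exactly where the derivative of the activation enters.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(w, x, y):
    # Squared error of one sigmoid neuron (bias omitted for brevity).
    return 0.5 * (sigmoid(w @ x) - y) ** 2

def grad(w, x, y):
    # Chain rule: dL/dw = (a - y) * sigmoid'(z) * x, with sigmoid'(z) = a * (1 - a).
    a = sigmoid(w @ x)
    return (a - y) * a * (1 - a) * x

# Arbitrary illustrative values (assumptions).
x = np.array([0.5, -1.0])
y = 1.0
w = np.array([0.1, 0.2])
eta = 0.5  # learning rate

before = loss(w, x, y)
w_new = w - eta * grad(w, x, y)  # W := W - eta * dL/dW
after = loss(w_new, x, y)
print(after < before)  # True: one step reduces the loss
```

If the `a * (1 - a)` factor were unavailable (no usable derivative), there would be no way to tell how a change in `w` affects the loss.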

4. Controls Output Range

Some activation functions bound the output:

Sigmoid → (0, 1) (useful for probabilities)

Tanh → (-1, 1) (centered around 0, often helps optimization)

Softmax → each output in (0, 1), all outputs sum to 1 (used for multi-class classification)

This is crucial for interpreting outputs and stabilizing training.
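Softmax is the one function in this list that the code cells further down do not cover, so here is a minimal sketch for a 1-D vector of scores. The `logits` values are made up for illustration. Subtracting the maximum before exponentiating is a common stability trick; it does not change the result.

```python
import numpy as np

def softmax(z):
    # Shifting by the max leaves the output unchanged but avoids overflow in exp.
    shifted = z - np.max(z)
    e = np.exp(shifted)
    return e / np.sum(e)

logits = np.array([2.0, 1.0, 0.1])  # illustrative scores (assumption)
p = softmax(logits)
print(p)        # three values, each strictly between 0 and 1
print(p.sum())  # sums to 1 up to floating-point rounding
```

The largest score receives the largest probability, and the ordering of the inputs is preserved in the outputs.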

5. Helps Avoid Vanishing/Exploding Gradients

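Functions like ReLU, Leaky ReLU, and GELU help mitigate vanishing gradients in deep networks: their derivative does not saturate for large positive inputs (for ReLU it is exactly 1), whereas the sigmoid derivative never exceeds 0.25, so multiplying it across many layers shrinks the gradient quickly. Exploding gradients depend mainly on the weights and are usually handled with careful initialization, normalization, or gradient clipping rather than by the activation function alone.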

Choosing the right activation function improves training stability.
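A rough way to see the vanishing effect is to multiply the activation derivative across layers. The sketch below is a simplification: it ignores the weight matrices (which also scale the gradient) and assumes the same hypothetical pre-activation value `z` at every layer. The depth of 20 is arbitrary.

```python
import numpy as np

def sigmoid_derivative(z):
    s = 1 / (1 + np.exp(-z))
    return s * (1 - s)

def relu_derivative(z):
    return np.where(z > 0, 1.0, 0.0)

depth = 20  # number of layers (arbitrary choice)
z = 0.5     # hypothetical pre-activation, assumed identical at every layer

# Backpropagation multiplies one derivative factor per layer.
sigmoid_factor = sigmoid_derivative(z) ** depth
relu_factor = relu_derivative(z) ** depth

print(sigmoid_factor)  # far below 1e-10: the gradient has effectively vanished
print(relu_factor)     # 1.0: the gradient passes through unchanged
```

Because the sigmoid derivative peaks at 0.25, the sigmoid factor can never exceed 0.25 ** depth, no matter what `z` is.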

Summary

Without activation functions → network is just linear → cannot learn complex patterns.

With activation functions → a network with enough hidden units can approximate any continuous function on a bounded (compact) input domain to arbitrary accuracy (universal approximation theorem).

They also provide gradient flow, output scaling, and control neuron firing.

Analogy

Think of a neuron as a light bulb. Activation functions are the switch. Linear networks are like bulbs always dimming proportionally — everything looks the same. Activation functions allow the network to turn bulbs on/off or vary brightness non-linearly, creating rich patterns of light.

Sigmoid


In [5]:
import numpy as np

def sigmoid(x):
    """
    Calculates the sigmoid of the input x.

    Args:
        x: A scalar or a NumPy array.

    Returns:
        The sigmoid of x, a value between 0 and 1.
    """
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    """
    Calculates the derivative of the sigmoid function.

    Args:
        x: A scalar or a NumPy array.

    Returns:
        Derivative of sigmoid at x.
    """
    s = sigmoid(x)
    return s * (1 - s)

# Example usage:
x = np.array([-2, -1, 0, 1, 2])
print("Sigmoid:", sigmoid(x))
print("Sigmoid derivative:", sigmoid_derivative(x))


Sigmoid: [0.11920292 0.26894142 0.5        0.73105858 0.88079708]
Sigmoid derivative: [0.10499359 0.19661193 0.25       0.19661193 0.10499359]


Tanh


In [6]:
def tanh(x):
    """
    Calculates the hyperbolic tangent (tanh) of the input x.
    """
    return np.tanh(x)

def tanh_derivative(x):
    """
    Derivative of the tanh function.

    Returns values in the range (0, 1], with the maximum of 1 at x = 0.
    """
    return 1 - np.tanh(x)**2

# Example usage:
x = np.array([-2, -1, 0, 1, 2])
print("Tanh:", tanh(x))
print("Tanh derivative:", tanh_derivative(x))


Tanh: [-0.96402758 -0.76159416  0.          0.76159416  0.96402758]
Tanh derivative: [0.07065082 0.41997434 1.         0.41997434 0.07065082]


ReLU


In [7]:
def relu(x):
    """
    Rectified Linear Unit activation.
    """
    return np.maximum(0, x)

def relu_derivative(x):
    """
    Derivative of ReLU: 1 for positive values, 0 for negative values.

    ReLU is not differentiable at 0; this implementation returns 0 there
    by convention.
    """
    return np.where(x > 0, 1, 0)

# Example usage:
x = np.array([-2, -1, 0, 1, 2])
print("ReLU:", relu(x))
print("ReLU derivative:", relu_derivative(x))


ReLU: [0 0 0 1 2]
ReLU derivative: [0 0 0 1 1]


Softplus


In [8]:
def softplus(x):
    """
    Softplus activation: smooth version of ReLU.
    """
    # Equivalent to np.log(1 + np.exp(x)), but does not overflow for large x.
    return np.logaddexp(0, x)

def softplus_derivative(x):
    """
    Derivative of Softplus is sigmoid.
    """
    return sigmoid(x)

# Example usage:
x = np.array([-2, -1, 0, 1, 2])
print("Softplus:", softplus(x))
print("Softplus derivative:", softplus_derivative(x))


Softplus: [0.12692801 0.31326169 0.69314718 1.31326169 2.12692801]
Softplus derivative: [0.11920292 0.26894142 0.5        0.73105858 0.88079708]
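The derivative functions above can be sanity-checked against a central finite difference. This is a minimal sketch: the helper `numerical_derivative`, the step size `h`, and the test points are assumptions made for illustration, and the activations are re-declared so that the cell runs on its own. The test points deliberately avoid 0, where ReLU has a kink and the finite difference would not match the convention of returning 0.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def numerical_derivative(f, x, h=1e-5):
    # Central difference: (f(x + h) - f(x - h)) / (2h), error of order h^2.
    return (f(x + h) - f(x - h)) / (2 * h)

x = np.array([-2.0, -1.0, 0.5, 1.0, 2.0])  # no 0: ReLU is not differentiable there

# Each entry pairs an activation with its analytic derivative.
checks = {
    "sigmoid": (sigmoid, lambda v: sigmoid(v) * (1 - sigmoid(v))),
    "tanh": (np.tanh, lambda v: 1 - np.tanh(v) ** 2),
    "relu": (lambda v: np.maximum(0, v), lambda v: np.where(v > 0, 1.0, 0.0)),
    "softplus": (lambda v: np.log(1 + np.exp(v)), sigmoid),
}

results = {}
for name, (f, df) in checks.items():
    results[name] = np.allclose(numerical_derivative(f, x), df(x), atol=1e-6)
    print(name, results[name])
```

Each line should report `True`; a `False` would point to a mistake in the corresponding analytic derivative.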
