## Activation functions

Activation functions are foundational components of neural networks: they determine the output of each neuron and enable the network to learn complex, non-linear patterns from data. Without them, a neural network, no matter how deep, would behave like a linear model and could not represent non-linear relationships.

An activation function is a mathematical function applied to a neuron's (node's) pre-activation value, that is, the weighted sum of its inputs plus a bias. Its role is to:

- Transform the input signal into a form suitable for the next layer.

- Add **non-linearity** to the network, allowing it to learn complex mappings beyond linear relationships.

- Govern whether a neuron should "activate" (produce significant output) or not.

This non-linearity is crucial. Without activation functions, a stack of linear layers collapses into a single linear transformation, because a composition of linear (affine) maps is itself linear, so adding depth adds no expressive power.
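A small NumPy check of this collapse, using arbitrary hand-picked weights (hypothetical values, not taken from any trained model): two linear layers $W_2(W_1 x + b_1) + b_2$ give exactly the same result as one layer with $W = W_2 W_1$ and $b = W_2 b_1 + b_2$, while inserting the ReLU function $\max(0, x)$ between them breaks the equivalence.

```python
import numpy as np

# Hypothetical toy weights for two stacked layers (no activation in between).
W1 = np.array([[1.0, -2.0],
               [0.5,  1.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([[2.0, 1.0]])
b2 = np.array([0.5])

x = np.array([1.0, 3.0])

# Two linear layers applied one after the other.
h = W1 @ x + b1                # hidden pre-activation
y_stacked = W2 @ h + b2

# The same mapping expressed as ONE linear layer.
W = W2 @ W1
b = W2 @ b1 + b2
y_single = W @ x + b
print("Stacked:", y_stacked, "Single layer:", y_single)

# A non-linearity (ReLU here) between the layers breaks the equivalence.
y_relu = W2 @ np.maximum(0, h) + b2
print("With ReLU in between:", y_relu)
```

With these values both linear versions produce the same output, whereas the ReLU version differs because the negative hidden unit is zeroed out.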

#### Sigmoid Activation Function

The sigmoid function maps any real-valued input x to a value between 0 and 1. Because its output can be read as a probability, it is commonly used for binary classification.

$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$

Properties:
- Domain: (-∞, ∞)
- Range: (0, 1)
- S-shaped curve, centered at x = 0, where σ(0) = 0.5
- As x → +∞, σ(x) → 1; as x → -∞, σ(x) → 0
- Used for output neurons predicting probabilities
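- Drawback: prone to the vanishing gradient problem in deep networks. Its derivative is at most 0.25 (at x = 0) and approaches 0 for large |x|, so gradients shrink as they propagate back through many sigmoid layers, slowing or stalling learning in early layers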


```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid, applied element-wise to NumPy arrays.
    return 1 / (1 + np.exp(-x))

# Example:
inputs = np.array([-2, 0, 2])
outputs = sigmoid(inputs)
print("Sigmoid outputs:", outputs)

# Built-in implementations:
# TensorFlow: tf.nn.sigmoid
# PyTorch: torch.sigmoid
```

Output:

```
Sigmoid outputs: [0.11920292 0.5        0.88079708]
```
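To make the vanishing-gradient drawback concrete, here is a minimal NumPy sketch of the sigmoid's derivative, σ'(x) = σ(x)(1 − σ(x)). The helper name `sigmoid_grad` is our own, and the 10-layer bound ignores the weight matrices entirely, a simplifying assumption rather than a full analysis of backpropagation.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: sigma'(x) = sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s * (1 - s)

xs = np.array([-5.0, -2.0, 0.0, 2.0, 5.0])
print("Sigmoid gradients:", sigmoid_grad(xs))

# The maximum, 0.25, occurs at x = 0; for large |x| the gradient is close to 0.
print("Gradient at 0:", sigmoid_grad(0.0))  # → 0.25

# Simplified bound: if each of 10 stacked sigmoid layers scales the gradient
# by at most 0.25 (ignoring the weights), the product is at most 0.25 ** 10.
print("Bound after 10 layers:", 0.25 ** 10)
```

The bound of roughly 1e-6 after only ten layers illustrates why early layers in deep sigmoid networks can learn very slowly.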


#### Rectified Linear Unit (ReLU)

ReLU outputs zero for any negative input, and outputs the input directly if positive.

$$
f(x) = \max(0, x)
$$

Properties:
- Range: [0, ∞)
- Simple, computationally efficient
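- Gradient is exactly 1 for positive inputs, so it avoids the vanishing gradient problem there, enabling faster training of deep networks
- Can suffer from the dying ReLU problem: a neuron whose inputs stay negative always outputs zero and receives zero gradient, so its weights stop updating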


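```python
import numpy as np

def relu(x):
    # Element-wise max(0, x): negatives become 0, positives pass through unchanged.
    return np.maximum(0, x)

# Example:
inputs = np.array([-2, 0, 2])
outputs = relu(inputs)
print("ReLU outputs:", outputs)

# Built-in implementations:
# TensorFlow: tf.nn.relu
# PyTorch: torch.relu
```

Output:

```
ReLU outputs: [0 0 2]
```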
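A similar sketch of ReLU's gradient illustrates both the "no vanishing gradient for positive inputs" and the "dying ReLU" properties listed above. The helper `relu_grad` and the sample pre-activations are hypothetical; ReLU is not differentiable at exactly 0, and using 0 there is a convention assumed for this sketch.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    # 1 where x > 0, 0 elsewhere (gradient at x == 0 set to 0 by convention).
    return (x > 0).astype(float)

xs = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print("ReLU gradients:", relu_grad(xs))

# Positive inputs pass the gradient through unchanged (1.0), so stacking
# layers does not shrink it the way sigmoid does.

# Hypothetical pre-activations of one neuron across a batch: all negative.
dead_inputs = np.array([-4.0, -1.5, -0.2])
print("Outputs:", relu(dead_inputs))
print("Gradients:", relu_grad(dead_inputs))
# Every gradient is 0, so this neuron's weights receive no update ("dying ReLU").
```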


#### Other Common Activation Functions

1. Hyperbolic Tangent (Tanh)
   - Similar S-shape to sigmoid, but outputs values between -1 and 1:

     $$
     \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
     $$

   - Zero-centered outputs, which often lead to better convergence than sigmoid
   - Still suffers from vanishing gradients in deep networks

2. Leaky ReLU
   - Variant of ReLU that keeps a small, non-zero gradient for negative inputs to mitigate the dying ReLU problem:

     $$
     \text{LeakyReLU}(x) = \max(\alpha x, x)
     $$

     where α is a small constant with 0 < α < 1 (e.g., 0.01)

3. Softmax
   - Used in the output layer for multi-class classification
   - Converts raw logits into probabilities that sum to 1
   - Also called the normalized exponential function:

     $$
     \text{Softmax}(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}
     $$
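The three functions above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the function names and the example logits are our own, `alpha=0.01` follows the example value in the text, and the softmax subtracts the largest logit before exponentiating, a common numerical-stability trick that does not change the result because the factor $e^{-\max_j z_j}$ cancels in numerator and denominator.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    # Direct translation of the tanh formula; np.tanh is the more robust built-in.
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def leaky_relu(x, alpha=0.01):
    # For 0 < alpha < 1, max(alpha * x, x) is x when x >= 0 and alpha * x when x < 0.
    return np.maximum(alpha * x, x)

def softmax(z):
    # Shift by the largest logit so np.exp cannot overflow; the result is unchanged.
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

xs = np.array([-2.0, 0.0, 2.0])
print("Tanh outputs:", tanh(xs))
print("Matches np.tanh:", np.allclose(tanh(xs), np.tanh(xs)))
# tanh is a shifted, rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
print("Equals 2*sigmoid(2x) - 1:", np.allclose(tanh(xs), 2 * sigmoid(2 * xs) - 1))
print("Leaky ReLU outputs:", leaky_relu(xs))

logits = np.array([2.0, 1.0, 0.1])   # hypothetical raw scores for 3 classes
probs = softmax(logits)
print("Softmax outputs:", probs, "sum:", probs.sum())
print("Large logits stay finite:", softmax(np.array([1000.0, 1001.0])))
```

Without the shift, `np.exp(1000.0)` overflows to `inf` and the naive softmax returns `nan`; the shifted version returns the same probabilities as for logits `[0.0, 1.0]`.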

