# Activation Function Concepts

This notebook provides an overview of the activation functions implemented in this module, including their forward and backward pass formulas.

## Identity

### Forward Pass
The output is the same as the input, $x$.
$$ 
\text{Identity}(x) = x
$$

### Backward Pass
The derivative of the Identity function with respect to $x$ is always 1.
$$ 
\frac{d}{dx} \text{Identity}(x) = 1
$$
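As a minimal sketch of how these two formulas look in code, the snippet below uses hypothetical helper names (`identity_forward`, `identity_backward`) that are not taken from this module. The backward helper applies the chain rule: the upstream gradient is multiplied by the local derivative, which is 1.

```python
def identity_forward(x):
    """Return the input unchanged."""
    return x


def identity_backward(grad_output):
    """Chain rule with a local derivative of 1: the gradient passes through."""
    return grad_output * 1.0


y = identity_forward(3.5)
dx = identity_backward(-2.0)
```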

## ReLU (Rectified Linear Unit)

### Forward Pass
The ReLU function is defined as:
$$ 
\text{ReLU}(x) = \max(0, x) 
$$ 
where $x$ is the input.

### Backward Pass
The derivative of ReLU, $\text{ReLU}'(x)$, is given below. ReLU is not differentiable at $x = 0$; the value 0 is used there by convention:
$$ 
\text{ReLU}'(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \leq 0 \end{cases} 
$$
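The piecewise definition can be sketched in plain Python as follows. The function names and the list-of-floats interface are assumptions made for illustration; the module's actual implementation most likely operates on tensors.

```python
def relu_forward(xs):
    """Apply max(0, x) to each element of a list of floats."""
    return [max(0.0, x) for x in xs]


def relu_backward(xs, grad_output):
    """Pass the upstream gradient where x > 0 and block it where x <= 0."""
    return [g if x > 0 else 0.0 for x, g in zip(xs, grad_output)]


xs = [-2.0, 0.0, 3.0]
out = relu_forward(xs)                        # [0.0, 0.0, 3.0]
grads = relu_backward(xs, [1.0, 1.0, 1.0])    # [0.0, 0.0, 1.0]
```

Note that the backward helper needs the original input (or equivalently the sign of the output) to decide which gradients to pass through.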

## Sigmoid

### Forward Pass
The sigmoid function, $\sigma(x)$, is defined as:
$$ 
\sigma(x) = \frac{1}{1 + e^{-x}} 
$$ 
where $x$ is the input.

### Backward Pass
The derivative, $\sigma'(x)$, is calculated using the formula:
$$ 
\sigma'(x) = \sigma(x) (1 - \sigma(x)) 
$$ 
where $\sigma(x)$ is the activation value from the forward pass.
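A minimal scalar sketch of both passes is shown below, with hypothetical function names. The branch on the sign of $x$ is an addition that is not described in this notebook: it is algebraically equivalent to the formula above and only avoids overflow in `math.exp` for large negative inputs.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, written so that exp() only sees non-positive arguments."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # x < 0, so e is in (0, 1) and cannot overflow
    return e / (1.0 + e)


def sigmoid_backward(s, grad_output):
    """Gradient w.r.t. x, reusing the cached forward output s = sigmoid(x)."""
    return grad_output * s * (1.0 - s)


s = sigmoid(0.0)                 # 0.5
ds = sigmoid_backward(s, 1.0)    # 0.25
```

Because the derivative depends only on $\sigma(x)$, the backward pass can reuse the cached output and does not need to evaluate another exponential.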

## Softmax

### Forward Pass
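The Softmax function maps an input vector $x$ with elements $x_i$ to an output vector whose $i$-th element is:
$$ 
\text{Softmax}(x)_i = \frac{e^{x_i - \max(x)}}{\sum_{j} e^{x_j - \max(x)}} 
$$ 
Subtracting $\max(x)$ from every element keeps all exponents non-positive, which prevents overflow. The output is unchanged because the common factor $e^{-\max(x)}$ cancels between the numerator and the denominator.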
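The effect of the max shift can be illustrated with a short sketch. The `softmax` helper below is a hypothetical plain-Python version that works on a single list of floats; it is not the module's implementation.

```python
import math


def softmax(xs):
    """Softmax over a list of floats, shifted by max(xs) for stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # every exponent is <= 0
    total = sum(exps)
    return [e / total for e in exps]


a = softmax([1.0, 2.0, 3.0])
# Adding a constant to every input leaves the result unchanged, and the
# shift keeps exp() from overflowing on the larger values.
b = softmax([1001.0, 1002.0, 1003.0])
```

Without the shift, `math.exp(1003.0)` would raise an `OverflowError`, even though the mathematically correct result is the same as for `[1.0, 2.0, 3.0]`.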

### Backward Pass
Given the gradient of the loss $L$ with respect to the Softmax output $a$ (denoted $dA = \frac{\partial L}{\partial a}$), the backward pass computes the gradient of the loss with respect to the Softmax input $z$ (denoted $dZ = \frac{\partial L}{\partial z}$). Here $z$ is the same vector that the forward pass calls $x$. The calculation for each component $dZ_k$ simplifies to:
$$ 
dZ_k = a_k \left( dA_k - \sum_j dA_j \cdot a_j \right) 
$$ 
In vectorized form, $dZ = a \odot \left( dA - \text{sum}(dA \odot a) \right)$, where $\odot$ is element-wise multiplication and the sum runs over the Softmax dimension `dim` with `keepdim=True`, so that the result broadcasts against $dA$. In code, $a$ is the cached activation output `act`.
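The simplified formula can be sketched for a single vector as follows. Both helpers are hypothetical, list-based stand-ins; the example values for `z` and `dA` are arbitrary.

```python
import math


def softmax(zs):
    """Numerically stable softmax over a list of floats."""
    m = max(zs)
    exps = [math.exp(v - m) for v in zs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_backward(a, dA):
    """dZ_k = a_k * (dA_k - sum_j dA_j * a_j), given cached output a."""
    dot = sum(g * p for g, p in zip(dA, a))
    return [p * (g - dot) for p, g in zip(a, dA)]


z = [1.0, 2.0, 3.0]          # arbitrary example input
dA = [0.1, -0.2, 0.3]        # arbitrary upstream gradient
a = softmax(z)
dZ = softmax_backward(a, dA)
```

One useful sanity check is that the components of `dZ` sum to zero (up to rounding), because shifting every input by the same constant does not change the Softmax output.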

*Note: Softmax also has a `derivative` method that computes the full Jacobian matrix $J$, with elements $J_{ij} = \frac{\partial a_i}{\partial z_j}$:*
$$ 
J_{ij} = \begin{cases} a_i (1 - a_i) & \text{if } i = j \\ -a_i a_j & \text{if } i \neq j \end{cases} 
$$ 
where $a_i = \text{Softmax}(z)_i$.
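The sketch below builds the Jacobian for one vector and checks that multiplying it by the upstream gradient reproduces the simplified backward formula. The name `softmax_jacobian` is a hypothetical stand-in; the signature of the module's `derivative` method is not shown in this notebook.

```python
import math


def softmax(zs):
    """Numerically stable softmax over a list of floats."""
    m = max(zs)
    exps = [math.exp(v - m) for v in zs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(a):
    """J[i][j] = a_i * (1 - a_i) if i == j, else -a_i * a_j."""
    n = len(a)
    return [[a[i] * ((1.0 if i == j else 0.0) - a[j]) for j in range(n)]
            for i in range(n)]


a = softmax([0.5, -1.0, 2.0])   # arbitrary example input
dA = [0.1, -0.2, 0.3]           # arbitrary upstream gradient
J = softmax_jacobian(a)

# Chain rule through the full Jacobian: dZ_k = sum_i dA_i * J[i][k]
dZ_from_J = [sum(dA[i] * J[i][k] for i in range(3)) for k in range(3)]

# Simplified form: dZ_k = a_k * (dA_k - sum_j dA_j * a_j)
dot = sum(g * p for g, p in zip(dA, a))
dZ_direct = [p * (g - dot) for p, g in zip(a, dA)]
```

The two results agree up to rounding. The simplified form avoids materializing the $n \times n$ matrix, which is why it is the natural choice for the backward pass.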

## Tanh (Hyperbolic Tangent)

### Forward Pass
The tanh activation function is defined as:
$$ 
\text{tanh}(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} 
$$ 
where $x$ is the input.

### Backward Pass
The derivative, $\text{tanh}'(x)$, is calculated using the formula:
$$ 
\text{tanh}'(x) = 1 - \text{tanh}^2(x) 
$$ 
where $\text{tanh}(x)$ is the activation value from the forward pass.
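A minimal scalar sketch of both passes follows, using the standard library's `math.tanh` for the forward pass. The function names are hypothetical and not taken from this module.

```python
import math


def tanh_forward(x):
    """Hyperbolic tangent of a float."""
    return math.tanh(x)


def tanh_backward(t, grad_output):
    """Gradient w.r.t. x, reusing the cached forward output t = tanh(x)."""
    return grad_output * (1.0 - t * t)


t = tanh_forward(0.0)            # 0.0
dt = tanh_backward(t, 1.0)       # 1.0
```

As with Sigmoid, the derivative is expressed in terms of the forward output, so the backward pass only needs the cached activation.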