
# The SoftMax Derivative, Step-by-Step!!!

This notebook walks through the derivative of the SoftMax function step by step, following the StatQuest video on the topic. SoftMax converts a vector of raw scores (logits) into probabilities, and gradient-based training needs its derivative to backpropagate through it.



## SoftMax Function

The SoftMax function for a vector $z = [z_1, z_2, \ldots, z_n]$ of $n$ logits is defined, for each $i = 1, \ldots, n$, as:

$$
\sigma(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}
$$

Every output lies in $(0, 1)$ and the outputs sum to 1, so SoftMax turns arbitrary real-valued logits into a probability distribution.
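Because $e^{z_i + c} / \sum_j e^{z_j + c} = e^{z_i} / \sum_j e^{z_j}$ for any constant $c$ (the factor $e^{c}$ cancels), SoftMax is unchanged when the same number is added to every logit. Implementations commonly exploit this by subtracting $\max(z)$ before exponentiating, so no exponent is positive and `np.exp` cannot overflow. A small NumPy sketch, using deliberately large logits (an arbitrary illustrative choice), contrasting the direct formula with the shifted one:

```python
import numpy as np

z = np.array([1000.0, 1001.0, 1002.0])  # large logits chosen to force overflow

# Direct use of the definition: exp(1000) overflows to inf, and inf / inf gives nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(z) / np.sum(np.exp(z))

# Subtract max(z) first: the shift cancels in the ratio, so the result is
# mathematically identical, but every exponent is <= 0.
shifted = np.exp(z - np.max(z))
stable = shifted / np.sum(shifted)

print(naive)   # [nan nan nan]
print(stable)  # same values as the SoftMax of [1.0, 2.0, 3.0]
```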



## Derivatives of the SoftMax Function

### Case 1: $i = k$

The derivative of $\sigma(z_i)$ with respect to its own logit $z_i$ is

$$
\frac{\partial \sigma(z_i)}{\partial z_i} = \sigma(z_i) (1 - \sigma(z_i))
$$
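The Case 1 result follows from the quotient rule. Abbreviating the denominator as $S = \sum_j e^{z_j}$ (notation introduced here only for brevity), the numerator's derivative is $e^{z_i}$ and $\partial S / \partial z_i = e^{z_i}$, so:

$$
\frac{\partial \sigma(z_i)}{\partial z_i}
= \frac{e^{z_i} \cdot S - e^{z_i} \cdot e^{z_i}}{S^2}
= \frac{e^{z_i}}{S} - \left(\frac{e^{z_i}}{S}\right)^2
= \sigma(z_i) - \sigma(z_i)^2
= \sigma(z_i)\,\big(1 - \sigma(z_i)\big)
$$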

### Case 2: $i \neq k$

The derivative of $\sigma(z_i)$ with respect to a different logit $z_k$ is

$$
\frac{\partial \sigma(z_i)}{\partial z_k} = -\sigma(z_i) \cdot \sigma(z_k)
$$
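Case 2 also follows from the quotient rule. With $S = \sum_j e^{z_j}$ denoting the denominator, the numerator $e^{z_i}$ does not depend on $z_k$, so its derivative is 0, while $\partial S / \partial z_k = e^{z_k}$:

$$
\frac{\partial \sigma(z_i)}{\partial z_k}
= \frac{0 \cdot S - e^{z_i} \cdot e^{z_k}}{S^2}
= -\frac{e^{z_i}}{S} \cdot \frac{e^{z_k}}{S}
= -\sigma(z_i) \cdot \sigma(z_k)
$$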

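Collecting both cases for every pair of indices $(i, k)$ gives the Jacobian matrix $J$ of the SoftMax function, a square matrix with entries $J_{ik} = \partial \sigma(z_i) / \partial z_k$: the diagonal entries come from Case 1 and the off-diagonal entries from Case 2.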
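Using the Kronecker delta $\delta_{ik}$ (1 if $i = k$, else 0), both cases combine into $\partial \sigma(z_i) / \partial z_k = \sigma(z_i)\,(\delta_{ik} - \sigma(z_k))$, which in matrix form is $J = \operatorname{diag}(\sigma) - \sigma\sigma^\top$. A minimal vectorized NumPy sketch of that form, assuming a 1-D array of SoftMax probabilities as input; the name `softmax_jacobian` is hypothetical:

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; the result is unchanged.
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

def softmax_jacobian(s):
    """Jacobian J = diag(s) - s s^T for a 1-D array of SoftMax probabilities s."""
    s = np.asarray(s, dtype=float)
    return np.diag(s) - np.outer(s, s)

s = softmax(np.array([1.0, 2.0, 3.0]))
J = softmax_jacobian(s)
print(J)
print(J.sum(axis=1))  # each row sums to (numerically) zero
```

The outer product replaces the double loop with a single array operation, and it makes the symmetry $J = J^\top$ visible directly.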



```python
import numpy as np

def softmax(z):
    """
    Compute the SoftMax of a 1-D vector of logits z.
    """
    # Subtracting max(z) avoids overflow in np.exp and does not change the result.
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

def softmax_derivative(softmax_probs):
    """
    Compute the Jacobian matrix of the SoftMax function.

    Takes the SoftMax probabilities (not the logits) and returns the
    matrix whose entry [i, j] is d softmax_i / d z_j.
    """
    n = len(softmax_probs)
    jacobian = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                # Case 1 (diagonal): sigma_i * (1 - sigma_i)
                jacobian[i, j] = softmax_probs[i] * (1 - softmax_probs[i])
            else:
                # Case 2 (off-diagonal): -sigma_i * sigma_j
                jacobian[i, j] = -softmax_probs[i] * softmax_probs[j]
    return jacobian

# Example usage
z = np.array([1.0, 2.0, 3.0])  # Example logits
softmax_probs = softmax(z)
jacobian = softmax_derivative(softmax_probs)

print("SoftMax probabilities:", softmax_probs)
print("Jacobian matrix:")
print(jacobian)
```
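One way to confirm the analytic Jacobian is a central finite-difference check: nudge each logit $z_k$ by $\pm h$ and measure how every probability moves. A minimal sketch, assuming a step $h = 10^{-6}$ (an arbitrary choice) and a hypothetical helper named `numerical_jacobian`; it redefines `softmax` and writes `softmax_derivative` in vectorized form so that it runs on its own:

```python
import numpy as np

def softmax(z):
    exp_z = np.exp(z - np.max(z))  # shift for numerical stability
    return exp_z / np.sum(exp_z)

def softmax_derivative(s):
    # Analytic Jacobian: s_i * (1 - s_i) on the diagonal, -s_i * s_k elsewhere.
    return np.diag(s) - np.outer(s, s)

def numerical_jacobian(f, z, h=1e-6):
    """Central-difference estimate of J[i, k] = d f(z)_i / d z_k."""
    z = np.asarray(z, dtype=float)
    n = z.size
    J = np.zeros((f(z).size, n))
    for k in range(n):
        step = np.zeros(n)
        step[k] = h
        # Column k: how every output changes when only z_k is nudged.
        J[:, k] = (f(z + step) - f(z - step)) / (2 * h)
    return J

z = np.array([1.0, 2.0, 3.0])
analytic = softmax_derivative(softmax(z))
numeric = numerical_jacobian(softmax, z)
print(np.max(np.abs(analytic - numeric)))  # should be very small
```

Agreement between the two matrices is a quick sanity check on both the derivation and the code; a sign or index mistake in either case would show up as a large difference.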



## Example Calculation

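Given logits $z = [1.0, 2.0, 3.0]$, the exponentials are $e^1 \approx 2.7183$, $e^2 \approx 7.3891$ and $e^3 \approx 20.0855$, which sum to about $30.1929$. Dividing each exponential by that sum gives the SoftMax probabilities:

$$
\sigma(z) \approx [0.0900,\ 0.2447,\ 0.6652]
$$

Plugging these into the two cases fills in the Jacobian, whose entry in row $i$, column $k$ is the derivative of the $i$-th probability with respect to the $k$-th logit:

$$
J \approx
\begin{bmatrix}
 0.0819 & -0.0220 & -0.0599 \\
-0.0220 &  0.1848 & -0.1628 \\
-0.0599 & -0.1628 &  0.2227
\end{bmatrix}
$$

The diagonal entries are positive (raising a logit raises its own probability) and the off-diagonal entries are negative (it lowers every other probability). Every column sums to zero, because the probabilities always add up to 1 and so their total cannot change; the matrix is also symmetric, so every row sums to zero as well.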
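During backpropagation the full Jacobian is rarely built; what is usually needed is the gradient of a downstream loss $L$ with respect to the logits. Writing $g$ for the upstream gradient $\partial L / \partial \sigma$, the chain rule gives $J^\top g$, and since $J = \operatorname{diag}(\sigma) - \sigma\sigma^\top$ is symmetric this simplifies to $\sigma \odot \big(g - (g^\top \sigma)\,\mathbf{1}\big)$, which needs $O(n)$ work for $n$ logits instead of an $n \times n$ matrix. A minimal NumPy sketch; the function name `softmax_backward` and the values in `g` are hypothetical, chosen only for illustration:

```python
import numpy as np

def softmax(z):
    exp_z = np.exp(z - np.max(z))  # shift for numerical stability
    return exp_z / np.sum(exp_z)

def softmax_backward(s, g):
    """dL/dz = J^T g with J = diag(s) - s s^T, computed without forming J.

    s: SoftMax output, g: upstream gradient dL/ds (both 1-D arrays).
    """
    return s * (g - np.dot(g, s))

z = np.array([1.0, 2.0, 3.0])
s = softmax(z)
g = np.array([0.5, -1.0, 2.0])  # hypothetical upstream gradient dL/ds

fast = softmax_backward(s, g)
full = (np.diag(s) - np.outer(s, s)).T @ g  # explicit Jacobian, for comparison
print(fast)
print(full)
```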



## References

1. Goodfellow, Ian, et al. *Deep Learning*. MIT Press, 2016.
2. [StatQuest YouTube Channel](https://www.youtube.com/@statquest)
