# Softmax and Cross-Entropy

The softmax function is used with cross-entropy loss when training neural networks on multi-class classification problems.
This notebook explores both functions to build an understanding of their outputs during the forward and backward passes.

The softmax function is defined as

$$
p_i = \frac{\exp(y_i)}{\sum_{j=1}^{n} \exp(y_j)}.
$$

The cross-entropy loss is defined as

$$
L = -\sum_{i=1}^n \hat{y}_i \log(p_i),
$$

where $\mathbf{p}$ is the output of softmax, $\hat{\mathbf{y}}$ is the ground-truth label vector (typically one-hot), and $n$ is the number of classes.
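As a rough illustration of the definition above, the loss can be evaluated directly with NumPy. The helper name `cross_entropy` and the example vectors are assumptions made for this sketch, and the ground truth is assumed to be one-hot.

```python
import numpy as np


def cross_entropy(p, y_hat):
    '''Evaluates L = -sum_i y_hat_i * log(p_i) for one sample.

    Hypothetical helper for illustration; assumes every p_i > 0.
    '''
    p = np.asarray(p, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)

    return -np.sum(y_hat * np.log(p))


# Assumed example: three classes, the true class is the third one.
p = np.array([0.1, 0.2, 0.7])
y_hat = np.array([0.0, 0.0, 1.0])

# With a one-hot target the sum keeps only -log(p) of the true class.
loss = cross_entropy(p, y_hat)
print(loss)
```

With a one-hot target, the loss depends only on the probability assigned to the correct class and shrinks toward 0 as that probability approaches 1.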

In [1]:
import numpy as np


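def softmax(x):
    '''Implements the softmax function along axis 0.

    Overflows for large inputs; see softmax_stable.
    '''
    # Exponentiate every score.
    num = np.exp(x)

    # Normalize so the outputs sum to 1.
    return num / np.sum(num, axis=0, keepdims=True)


def softmax_stable(x):
    '''Implements a numerically stable version of softmax along axis 0.
    '''
    # Shift so the largest score is 0; exp() then never exceeds 1.
    num = np.exp(x - np.max(x))

    return num / np.sum(num, axis=0, keepdims=True)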

In [3]:
# Make up some scores -- class 3 has the highest
y = [10, 20, 30]

# Squashes the scores, normalizing the output to sum to 1.
p = softmax(y)

print(p)

[2.06106005e-09 4.53978686e-05 9.99954600e-01]


The original version of softmax is not numerically stable for large scores: in double precision, `np.exp` overflows to `inf` once its input exceeds roughly 709.
Consider the next example.

In [5]:
# Make up some scores -- class 3 has the highest
y = [150, 500, 800]

# exp(150) is already about 1.4e65, and exp(800) overflows float64 to inf.
p = softmax(y)

print(p)

[ 0.  0. nan]
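The `nan` can be traced to floating-point overflow. A small sketch, assuming NumPy's default float64 precision: `exp(800)` overflows to `inf`, so the two smaller scores divide a finite number by `inf` and give 0, while the largest divides `inf` by `inf`, which is `nan`.

```python
import numpy as np

# Largest finite float64, about 1.8e308; exp() overflows once its
# argument exceeds the log of this value (roughly 709.78).
limit = np.log(np.finfo(np.float64).max)

# Silence the overflow/invalid warnings so the values can be inspected.
with np.errstate(over='ignore', invalid='ignore'):
    num = np.exp(np.array([150.0, 500.0, 800.0]))
    ratio = num / np.sum(num)

print(limit)   # threshold above which exp() returns inf
print(num)     # the last entry is inf
print(ratio)   # finite / inf = 0, inf / inf = nan
```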




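To resolve this, subtract the largest value from every element of the input vector.
This does not change the result of softmax, because the common factor $\exp(-\max_j y_j)$ cancels between the numerator and the denominator. Every exponent is then at most 0, so `exp` cannot overflow, which makes the implementation numerically stable.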
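The invariance can be checked numerically on scores small enough that both versions are well behaved. This is only a sanity check, not a proof; the two functions are repeated from the first cell so the snippet runs on its own, and the test vector is an arbitrary choice.

```python
import numpy as np


def softmax(x):
    num = np.exp(x)
    return num / np.sum(num, axis=0, keepdims=True)


def softmax_stable(x):
    num = np.exp(x - np.max(x))
    return num / np.sum(num, axis=0, keepdims=True)


# Small scores, so the plain version does not overflow.
y = np.array([10.0, 20.0, 30.0])

p_plain = softmax(y)
p_stable = softmax_stable(y)

# Both versions agree up to floating-point rounding.
print(np.allclose(p_plain, p_stable))
```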

In [6]:
# Make up some scores -- class 3 has the highest
y = [150, 500, 800]

# Subtracting the maximum keeps every exponent at or below 0, so exp() cannot overflow.
p = softmax_stable(y)

print(p)

[5.11195195e-283 5.14820022e-131 1.00000000e+000]
