## Softargmax and cross-entropy

### Definitions

The cross-entropy between two discrete probability distributions $p$ and $q$ over the same sample space $\Omega$ is defined by:

\begin{equation*}
H(p, q) = E_p\left[-\log q \right] = -\sum_{x \in \Omega} p(x) \log q(x).
\end{equation*}
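As a quick numerical illustration of this definition, here is a minimal NumPy sketch. The helper name `cross_entropy_discrete` and the two distributions `p` and `q` are made up for this example and do not come from the notebook; the sketch also assumes the usual convention that terms with $p(x) = 0$ contribute nothing to the sum.

```python
import numpy as np

def cross_entropy_discrete(p, q):
    """H(p, q) = -sum_x p(x) * log q(x), skipping terms where p(x) = 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # convention: 0 * log(q) contributes 0
    return -np.sum(p[mask] * np.log(q[mask]))

p = np.array([0.5, 0.25, 0.25])  # illustrative reference distribution
q = np.array([0.25, 0.25, 0.5])  # illustrative model distribution

h_pq = cross_entropy_discrete(p, q)
h_pp = cross_entropy_discrete(p, p)  # H(p, p) is the entropy of p
print(f'H(p, q) = {h_pq}, H(p, p) = {h_pp}')
```

For these values $H(p, q)$ is larger than $H(p, p)$, which is the general pattern: the cross-entropy is smallest when $q$ matches $p$.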

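Similarly, the cross-entropy loss for a single example in a model with $M$ classes, where $y_c$ is the target probability of class $c$ and $\hat{y}_c$ is the predicted probability of class $c$, is given by: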

\begin{equation*}
L = -\sum_{c = 1}^M y_c \log \hat{y}_c.
\end{equation*}
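The code in the following sections starts from raw scores rather than probabilities, so it may help to state the softargmax (softmax) step explicitly. The symbol $z$ for the raw score vector is introduced here for illustration and is not used elsewhere in the notebook; the formula is the standard softmax definition:

\begin{equation*}
\hat{y}_c = \frac{e^{z_c}}{\sum_{j = 1}^M e^{z_j}}.
\end{equation*}

When the target $y$ is one-hot with $y_k = 1$, every term of the loss except the $k$-th vanishes:

\begin{equation*}
L = -\log \hat{y}_k.
\end{equation*}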

### Imports

In [1]:
import tensorflow as tf
import numpy as np

### Test Data

In [2]:
y_np = np.array([[0.0, 0.0, 1.0, 0.0],  # expected output (one-hot labels)
                 [0.0, 0.0, 0.0, 1.0],
                 [1.0, 0.0, 0.0, 0.0]])
y_hat_np = np.array([[0.4, 0.1, 2.5, 0],  # predicted output (raw scores; softmax is applied later)
                     [0.15, 2, 0.2, 47],
                     [3.2, 1.3, 1.7, 0.1]])
y = tf.convert_to_tensor(y_np)
y_hat = tf.convert_to_tensor(y_hat_np)

### Calculation by TensorFlow

In [6]:
# Note: this cell uses the TensorFlow 1.x graph-mode API (tf.Session, tf.log)
softmax_tf = tf.nn.softmax(y_hat)
# Method 1: apply the cross-entropy definition to the softmax output
cross_entropy_tf_1 = -tf.reduce_sum(y * tf.log(softmax_tf), axis=1)
# Method 2: fused op that takes the raw scores (logits) directly
cross_entropy_tf_2 = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=y_hat)

with tf.Session() as sess:
    # Evaluate all three tensors in a single run
    softmax_val, ce_1, ce_2 = sess.run([softmax_tf, cross_entropy_tf_1, cross_entropy_tf_2])
    print(f'softmax(y_hat) = {softmax_val}')
    print(f'Cross-entropy method #1 = {ce_1}')
    print(f'Cross-entropy method #2 = {ce_2}')

softmax(y_hat) = [[9.45420123e-02 7.00384453e-02 7.72046136e-01 6.33734060e-02]
 [4.50094310e-21 2.86251858e-20 4.73171139e-21 1.00000000e+00]
 [7.05343977e-01 1.05497325e-01 1.57383515e-01 3.17751836e-02]]
Cross-entropy method #1 = [ 0.25871097 -0.          0.34906968]
Cross-entropy method #2 = [0.25871097 0.         0.34906968]


### Manual Calculation

In [11]:
exp_y_hat_np = np.exp(y_hat_np)
softmax = exp_y_hat_np / exp_y_hat_np.sum(axis=1, keepdims=True)
print(f'softmax(y_hat) = {softmax}')
cross_entropy = - np.sum(y_np * np.log(softmax), axis=1)
print(f'Cross-entropy = {cross_entropy}')
# For each output vector, only one term of the sum is non-zero, so the calculation can be simplified
print('Simpler calculation:', -np.log(softmax[0, 2]), -np.log(softmax[1, 3]), -np.log(softmax[2, 0]))

softmax(y_hat) = [[9.45420123e-02 7.00384453e-02 7.72046136e-01 6.33734060e-02]
 [4.50094310e-21 2.86251858e-20 4.73171139e-21 1.00000000e+00]
 [7.05343977e-01 1.05497325e-01 1.57383515e-01 3.17751836e-02]]
Cross-entropy = [ 0.25871097 -0.          0.34906968]
Simpler calculation: 0.2587109686601284 -0.0 0.34906968436284574
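
The manual calculation above exponentiates the raw scores directly, which can overflow for large values. A common alternative is to subtract the row maximum first (the log-sum-exp trick) and work with log-probabilities throughout. The sketch below is one possible way to do that; the names `log_softmax`, `true_class`, `stable_cross_entropy` and `big` are introduced for this example only, and the test data is repeated so the block runs on its own.

```python
import numpy as np

# Same test data as in the notebook, repeated so this block is self-contained
y_np = np.array([[0.0, 0.0, 1.0, 0.0],
                 [0.0, 0.0, 0.0, 1.0],
                 [1.0, 0.0, 0.0, 0.0]])
y_hat_np = np.array([[0.4, 0.1, 2.5, 0],
                     [0.15, 2, 0.2, 47],
                     [3.2, 1.3, 1.7, 0.1]])

def log_softmax(logits):
    """Row-wise log-softmax, shifting by the row maximum to avoid overflow."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

# Pick the log-probability of the true class instead of multiplying by the one-hot labels
true_class = y_np.argmax(axis=1)
stable_cross_entropy = -log_softmax(y_hat_np)[np.arange(len(y_np)), true_class]
print(f'Stable cross-entropy = {stable_cross_entropy}')

# A score this large makes np.exp overflow in the naive formula, but not here
big = np.array([[1000.0, 0.0]])
print(f'log_softmax(big) = {log_softmax(big)}')
```

On the test data this should agree with the values printed above. The `-0.` that appears in the outputs is an IEEE 754 negative zero, produced by negating a result that is exactly `0.0`; it compares equal to `0`.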
