In [1]:
import numpy as np
import tensorflow as tf

$\text{logits} \rightarrow \operatorname{softmax}(\text{logits}) \rightarrow \operatorname{crossentropy}(\text{labels}, \text{softmax})$

## Softmax

[Wikipedia](https://en.wikipedia.org/wiki/Softmax_function)

The softmax function maps a vector of real numbers to a probability distribution over its elements: every output lies between 0 and 1, and the outputs sum to 1.

$\operatorname{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^n e^{x_j}}$

In this definition, $x_i$ is the $i$-th element of the input vector $x$, and $n$ is the length of the vector.

Here's an example Python function that calculates the softmax of an input vector `x`.

The `softmax()` function takes the input vector `x` as a NumPy array, computes the exponential of each element, sums the exponentials, and divides each exponential by that sum.

Note that this implementation assumes a one-dimensional NumPy array: `np.sum` without an `axis` argument sums over every element, so a 2-D batch would be normalized as a whole rather than row by row.

In [2]:
def softmax(x):
    """
    Calculate the softmax of an input vector.

    Parameters
    ----------
    x : numpy.ndarray
        Input vector.

    Returns
    -------
    numpy.ndarray
        Softmax of the input vector.
    """
    # Calculate the exponential of each element in the input vector
    exp_x = np.exp(x)

    # Calculate the sum of the exponential values
    sum_exp_x = np.sum(exp_x)

    # Calculate the softmax values by dividing each exponential value by the sum
    softmax_x = exp_x / sum_exp_x

    return softmax_x

In [3]:
x = np.array([1, 2, 3])
print(softmax(x))

[0.09003057 0.24472847 0.66524096]


In [4]:
np.exp(3)/np.sum(np.exp([1,2,3]))

0.6652409557748219
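The plain implementation above can overflow for large inputs, because `np.exp` of a large number exceeds the floating-point range. A common remedy is to subtract the maximum before exponentiating; softmax is unchanged when the same constant is added to every input, so the result is the same. The following is a minimal sketch of that idea. The name `stable_softmax` and its `axis` parameter are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def stable_softmax(x, axis=-1):
    """Softmax along `axis`, shifted by the maximum to avoid overflow."""
    x = np.asarray(x, dtype=float)
    # Subtracting the maximum makes the largest exponent exactly 0
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp_shifted = np.exp(shifted)
    # keepdims=True lets the sum broadcast back over each row
    return exp_shifted / np.sum(exp_shifted, axis=axis, keepdims=True)

print(stable_softmax([1, 2, 3]))
# Large inputs that would overflow np.exp without the shift
print(stable_softmax([1000, 1001, 1002]))
# A 2-D batch is normalized row by row
print(stable_softmax([[1, 2, 3], [1, 1, 1]]))
```

Because the shift turns both `[1, 2, 3]` and `[1000, 1001, 1002]` into `[-2, -1, 0]`, the first two calls return the same distribution.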

TensorFlow provides `tf.nn.softmax` to calculate softmax:

In [6]:
x = tf.constant([1.0, 2.0, 3.0])
print(tf.nn.softmax(x))

tf.Tensor([0.09003057 0.24472848 0.66524094], shape=(3,), dtype=float32)


## Cross-Entropy

[Wikipedia](https://en.wikipedia.org/wiki/Cross_entropy)

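Cross-entropy measures how far a predicted probability distribution is from the true one. It is widely used as the loss function for classification problems in machine learning and deep learning.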

$H(p, q) = - \sum_{i=1}^N p_i \log(q_i)$

In this equation, $p$ is the true probability distribution, $q$ is the predicted probability distribution, $N$ is the number of classes, $p_i$ is the true probability of the $i$-th class, and $q_i$ is the predicted probability of the $i$-th class.
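
For the one-hot labels common in classification, the sum collapses to a single term. As a short worked step, assume the true class has index $c$ (a symbol introduced here for illustration), so that $p_c = 1$ and $p_i = 0$ for $i \neq c$:

$H(p, q) = - \sum_{i=1}^N p_i \log(q_i) = -\log(q_c)$

The loss then depends only on the probability assigned to the true class: it is 0 when $q_c = 1$ and grows without bound as $q_c$ approaches 0.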

In [7]:
y_true = [[0, 0, 1]]
y_pred = [[0.1, 0.8, 0.1]]

cce = tf.keras.losses.CategoricalCrossentropy()
cce(y_true, y_pred).numpy()

2.3025851

In [8]:
def cross_entropy(y_true, y_pred):
    """
    Calculate the cross-entropy loss between two input vectors.

    Parameters
    ----------
    y_true : numpy.ndarray
        True probability distribution.
    y_pred : numpy.ndarray
        Predicted probability distribution.

    Returns
    -------
    float
        Cross-entropy loss between the two input vectors.
    """
    # Calculate the cross-entropy loss
    # np.log is the natural logarithm (base e):
    # https://numpy.org/doc/stable/reference/generated/numpy.log.html
    loss = -np.sum(y_true * np.log(y_pred))

    return loss

In [9]:
cross_entropy(y_true, y_pred)

2.3025850929940455

In [10]:
-(0*np.log(0.1) + 0*np.log(0.8) + 1*np.log(0.1))

2.3025850929940455
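
The NumPy version above breaks down when a predicted probability is exactly 0: `np.log(0)` is `-inf`, and `0 * -inf` evaluates to `nan`. One common workaround is to clip the predictions away from 0 before taking the logarithm. The sketch below illustrates this; the function name `clipped_cross_entropy` and the default `eps=1e-7` are assumptions chosen for illustration, not values from the original notebook.

```python
import numpy as np

def clipped_cross_entropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy with predictions clipped to [eps, 1] to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    # eps is an assumed lower bound; smaller values allow larger maximum losses
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0)
    return -np.sum(y_true * np.log(y_pred))

# Same inputs as before: clipping does not change the result
print(clipped_cross_entropy([[0, 0, 1]], [[0.1, 0.8, 0.1]]))
# The true class is predicted with probability 0: large but finite loss
print(clipped_cross_entropy([[0, 0, 1]], [[0.5, 0.5, 0.0]]))
```

With clipping, the worst-case loss per example is capped at `-log(eps)` instead of being infinite.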

In [11]:
y_true = [[0, 0, 1]]
y_pred = [[0.2, 0.3, 0.5]]

cce = tf.keras.losses.CategoricalCrossentropy()
cce(y_true, y_pred).numpy()

0.6931472
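
To tie the two sections together, the full pipeline from logits to loss can be computed in one step by working with log-probabilities directly, which avoids taking the logarithm of a very small softmax output. This is a minimal NumPy sketch under that assumption; `cross_entropy_from_logits` is a hypothetical helper, and it returns one loss per example rather than a batch average.

```python
import numpy as np

def cross_entropy_from_logits(y_true, logits):
    """Per-example cross-entropy computed from raw logits."""
    y_true = np.asarray(y_true, dtype=float)
    logits = np.asarray(logits, dtype=float)
    # Shift by the row maximum for numerical stability
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    # log(softmax(x)) = shifted - log(sum(exp(shifted)))
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))
    return -np.sum(y_true * log_probs, axis=-1)

# Logits [1, 2, 3] with the third class as the true label
losses = cross_entropy_from_logits([[0, 0, 1]], [[1.0, 2.0, 3.0]])
print(losses)
```

For these inputs the loss equals the negative log of the third softmax value computed earlier, `-np.log(0.66524096)`.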