## What is "soft" in softmax?

Let's say I have an array of numbers, call them keys, and a corresponding array of the same length, call them values. I want to select the value whose key is the largest.

In [1]:
import numpy as np

In [2]:
rng = np.random.default_rng()

In [3]:
def softmax(xs):
    return np.exp(xs) / np.sum(np.exp(xs))
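
One caveat about the definition above: `np.exp` overflows for large inputs, so the naive formula returns `nan` once the keys get big enough. The sketch below compares it with a shifted variant; `softmax_stable` is a name I made up for illustration. It relies on the fact that subtracting the same constant from every input leaves the softmax output unchanged.

```python
import numpy as np

def softmax_naive(xs):
    # Same formula as the cell above.
    return np.exp(xs) / np.sum(np.exp(xs))

def softmax_stable(xs):
    # Hypothetical helper: shift by the max so the largest exponent is exp(0) = 1.
    xs = np.asarray(xs, dtype=float)
    exps = np.exp(xs - np.max(xs))
    return exps / np.sum(exps)

small = np.array([1.0, 2.0, 3.0])
large = np.array([1000.0, 1001.0, 1002.0])

# On moderate inputs both versions agree.
print(np.allclose(softmax_naive(small), softmax_stable(small)))

# On large inputs only the shifted version stays finite.
print(softmax_stable(large))
with np.errstate(over="ignore", invalid="ignore"):
    print(softmax_naive(large))  # exp overflows to inf, and inf / inf is nan
```

The keys used in this notebook are small enough that the naive version works fine, so I keep it for the rest of the walkthrough.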

In [13]:
ks = np.array([4, 29, 2, 33, 49, 44, 21, 1, 90, 99])
vs = np.array([49, 51, 97, 48, 77, 65, 57, 28, 67, 16])

In [14]:
v = vs[np.argmax(ks)]
v

16

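But here $v$ is not usefully differentiable with respect to the keys, because $\operatorname{argmax}$ is piecewise constant: its gradient is zero almost everywhere and undefined at the points where the largest key changes. If I wanted to use $v$ inside an objective optimized by gradient descent, no gradient signal would reach the keys. The way around this limitation is to use softmax.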
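To make the problem concrete, here is a minimal sketch that nudges each key by a small amount and measures how the hard selection responds. `hard_select` and `eps` are names I introduce for illustration; the arrays are the same ones as above, cast to floats.

```python
import numpy as np

ks = np.array([4, 29, 2, 33, 49, 44, 21, 1, 90, 99], dtype=float)
vs = np.array([49, 51, 97, 48, 77, 65, 57, 28, 67, 16], dtype=float)

def hard_select(keys, values):
    # Hypothetical helper: value at the position of the largest key.
    return values[np.argmax(keys)]

eps = 1e-3
finite_diffs = []
for i in range(len(ks)):
    bumped = ks.copy()
    bumped[i] += eps  # tiny change to one key
    finite_diffs.append((hard_select(bumped, vs) - hard_select(ks, vs)) / eps)

print(finite_diffs)  # every entry is 0.0: small changes to the keys do nothing

# A change large enough to reorder the keys makes the output jump instead.
jumped = ks.copy()
jumped[8] += 10  # 90 becomes 100, which now beats 99
print(hard_select(jumped, vs))  # 67.0
```

So the output is either flat or jumps, and neither case gives an optimizer a useful slope to follow.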

$$
\begin{aligned}
\mathbf p &= \operatorname{softmax}(\mathbf k) \\
v &= \sum_i p_i v_i
\end{aligned}
$$

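When the largest key is well separated from the rest, this soft selection is almost equivalent to the hard selection:
$$
\begin{aligned}
j &= \operatorname{argmax}(\mathbf k) \\
v &= v_j
\end{aligned}
$$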
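
How close "almost" is depends on the gap between the largest key and the others. A minimal sketch with toy arrays I made up for illustration (`close_keys`, `far_keys`, `values`), and a hypothetical `soft_select` helper that wraps the weighted sum:

```python
import numpy as np

def softmax(xs):
    return np.exp(xs) / np.sum(np.exp(xs))

def soft_select(keys, values):
    # Hypothetical helper: softmax-weighted average of the values.
    return np.sum(softmax(keys) * values)

values = np.array([10.0, 20.0, 30.0])
close_keys = np.array([1.0, 2.0, 3.0])   # largest key leads by only 1
far_keys = np.array([1.0, 2.0, 30.0])    # largest key leads by 28

print(soft_select(close_keys, values))  # about 25.75, noticeably below 30
print(soft_select(far_keys, values))    # very close to 30
```

With closely spaced keys the result is a blend of several values; with a clear winner it is practically the winner's value.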

Let's see this in action:

In [15]:
v = np.sum(softmax(ks) * vs)
v

16.006293123375297
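
Unlike the hard selection, this soft version has a well-defined gradient with respect to the keys. Differentiating $v = \sum_i p_i v_i$ using $\partial p_i / \partial k_j = p_i(\delta_{ij} - p_j)$ gives $\partial v / \partial k_j = p_j (v_j - v)$. The sketch below checks that formula against central finite differences; `soft_select` and `eps` are names I introduce for illustration.

```python
import numpy as np

def softmax(xs):
    return np.exp(xs) / np.sum(np.exp(xs))

def soft_select(keys, values):
    # Hypothetical helper: softmax-weighted average of the values.
    return np.sum(softmax(keys) * values)

ks = np.array([4, 29, 2, 33, 49, 44, 21, 1, 90, 99], dtype=float)
vs = np.array([49, 51, 97, 48, 77, 65, 57, 28, 67, 16], dtype=float)

# Analytic gradient: dv/dk_j = p_j * (v_j - v)
p = softmax(ks)
v = soft_select(ks, vs)
analytic = p * (vs - v)

# Numerical gradient via central differences, one key at a time.
eps = 1e-5
numeric = np.zeros_like(ks)
for j in range(len(ks)):
    up, down = ks.copy(), ks.copy()
    up[j] += eps
    down[j] -= eps
    numeric[j] = (soft_select(up, vs) - soft_select(down, vs)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))
print(analytic[8], analytic[9])  # small but non-zero, with opposite signs
```

The gradients are tiny here because the largest key dominates, but they are not zero: raising the key 90 pulls $v$ toward 67, and raising the key 99 pulls it toward 16.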