### Notes on Softmax

#### References
- Wikipedia: https://en.wikipedia.org/wiki/Softmax_function
- Medium: https://medium.com/@hunter-j-phillips/a-simple-introduction-to-softmax-287712d69bac
- SingleStore blog: https://www.singlestore.com/blog/a-guide-to-softmax-activation-function/
- PyTorch documentation (`torch.nn.Softmax`): https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html

#### Notes
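The softmax function is used for classification in machine learning. Specifically, it turns a vector of raw scores into a probability distribution over 2 or more classes. Because each score is exponentiated before normalizing, larger scores receive a disproportionately large share of the probability and smaller scores are pushed toward zero. It generalizes the sigmoid (logistic) function, which covers the two-class case.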
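
To make the link to the sigmoid concrete, here is a minimal NumPy sketch, assuming two classes whose scores are $[z, 0]$: the first softmax output then equals the sigmoid $1/(1+e^{-z})$. The helpers `softmax` and `sigmoid` are illustrative names, not library functions.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Softmax of a 1-D array (illustrative helper)."""
    exps = np.exp(z)
    return exps / exps.sum()

def sigmoid(z: float) -> float:
    """Logistic sigmoid 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

# With two scores [z, 0], softmax reduces to the sigmoid of z:
# e^z / (e^z + e^0) = 1 / (1 + e^{-z})
for z in [-2.0, 0.0, 3.5]:
    two_class = softmax(np.array([z, 0.0]))
    print(z, two_class[0], sigmoid(z))  # the last two columns match
```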

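Softmax assigns a probability to each value: it exponentiates every entry and then normalizes, so that all outputs are positive and sum to one. For a matrix, the normalization is usually applied to each row, so that every row sums to one. The softmax function is given by the following equation: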

$$\sigma(\overrightarrow{z})_i=\frac{e^{z_i}}{\sum^K_{j=1}e^{z_j}}, \quad i = 1, \dots, K$$

Here
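- $\overrightarrow{z} = (z_1, \dots, z_K)$ is the input vector
- $z_i$ is its $i$-th component
- $K\ge{1}$ is the number of components of $\overrightarrow{z}$ (for classification, the number of classes)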
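
The definition translates almost line for line into NumPy. This is a minimal sketch, assuming a 1-D float input; the helper name `softmax` and the input `[1, 2, 3]` are chosen only for illustration.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Apply sigma(z)_i = e^{z_i} / sum_j e^{z_j} to a 1-D array z."""
    exps = np.exp(z)            # numerator for every component
    return exps / np.sum(exps)  # divide by the shared denominator

z = np.array([1.0, 2.0, 3.0])
probs = softmax(z)
print(probs)        # each entry lies in (0, 1), larger inputs get larger probabilities
print(probs.sum())  # the entries sum to 1 (up to floating-point rounding)
```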

<br>

In other words, each value is exponentiated and then divided by the sum of the exponentials of all values in the same row.
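
As a worked example, with an input chosen only for illustration, take $\overrightarrow{z} = [1, 2, 3]$. Values are rounded to three decimals; the largest input receives about two thirds of the probability:

$$\begin{aligned}
e^{1} &\approx 2.718,\quad e^{2}\approx 7.389,\quad e^{3}\approx 20.086,\quad \sum_{j=1}^{3} e^{z_j}\approx 30.193 \\
\sigma(\overrightarrow{z}) &\approx \left[\tfrac{2.718}{30.193},\;\tfrac{7.389}{30.193},\;\tfrac{20.086}{30.193}\right] \approx [0.090,\;0.245,\;0.665]
\end{aligned}$$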

<br>

For a single row vector $[x_1, \dots, x_n]$, this gives:

<br>

$\sigma\left(\begin{bmatrix} x_1,\; x_2,\; \dots,\; x_n\end{bmatrix}\right)=\begin{bmatrix}\frac{e^{x_1}}{\sum_{j=1}^{n}e^{x_j}},\; \frac{e^{x_2}}{\sum_{j=1}^{n}e^{x_j}},\; \dots,\; \frac{e^{x_n}}{\sum_{j=1}^{n}e^{x_j}}  \end{bmatrix}$

<br>

Extending this to an $m \times n$ matrix, the softmax is applied to each row independently, giving

<br>

$\begin{bmatrix}
    \frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_{j}e^{x_{1j}}} & \dots  & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\
    \frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_{j}e^{x_{2j}}} & \dots  & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\
    \vdots & \vdots & \vdots & \ddots & \vdots \\
    \frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_{j}e^{x_{mj}}} & \dots  & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}}
\end{bmatrix}$
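
The matrix form is simply the vector formula applied to each row. This sketch, using an illustrative 2x3 input and a hypothetical helper `softmax_rows`, checks that a vectorized row-wise computation matches looping over the rows; the uniform second row shows that equal inputs receive equal probabilities.

```python
import numpy as np

def softmax_rows(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax of a 2-D array: entry (i, k) is e^{x_ik} / sum_j e^{x_ij}."""
    exps = np.exp(x)
    # keepdims=True keeps the row sums as an (m, 1) column so they broadcast across each row
    return exps / exps.sum(axis=1, keepdims=True)

x = np.array([[1.0, 2.0, 3.0],
              [0.5, 0.5, 0.5]])

# Applying the 1-D formula to each row separately gives the same matrix
row_by_row = np.vstack([np.exp(row) / np.exp(row).sum() for row in x])
print(np.allclose(softmax_rows(x), row_by_row))  # True
print(softmax_rows(x).sum(axis=1))               # one sum per row, each equal to 1
```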

```python
# First, import the libraries and build an array
import numpy as np

# Build a 5x3 array of random integers in [0, 20); the values change on every run
array = np.random.randint(low = 0, high = 20, size = (5, 3))
print(array)
```

```
[[ 6  5  1]
 [ 2  7  5]
 [17 16 13]
 [ 4 11  8]
 [16 15 17]]
```


```python
# Compute the softmax of each row
exponent_array = np.exp(array)                                       # e^{x_ij} for every entry
exponential_sum = np.sum(exponent_array, axis = 1, keepdims = True)  # row sums, kept as a (5, 1) column
computed_softmax = exponent_array / exponential_sum                  # broadcasting divides each row by its own sum
print(computed_softmax)
```

```
[[7.27475157e-01 2.67623154e-01 4.90168905e-03]
 [5.89975040e-03 8.75600595e-01 1.18499655e-01]
 [7.21399184e-01 2.65387929e-01 1.32128870e-02]
 [8.67881295e-04 9.51747406e-01 4.73847131e-02]
 [2.44728471e-01 9.00305732e-02 6.65240956e-01]]
```
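
The direct `np.exp(array)` approach works for the small integers above, but in float64 `np.exp` overflows to `inf` once inputs exceed roughly 709, and `inf / inf` then gives `nan`. A common remedy, sketched here with a hypothetical helper `stable_softmax` that is not part of this notebook, subtracts each row's maximum before exponentiating. Because $e^{z_i - c}/\sum_j e^{z_j - c} = e^{z_i}/\sum_j e^{z_j}$ for any constant $c$, the result is unchanged; the input `[1000, 999, 995]` is chosen for illustration and shifts to the same values as the row `[6, 5, 1]` above.

```python
import numpy as np

def stable_softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax that subtracts each row's max before exponentiating."""
    shifted = x - x.max(axis=1, keepdims=True)  # largest entry in each row becomes 0
    exps = np.exp(shifted)                      # all values now lie in (0, 1]
    return exps / exps.sum(axis=1, keepdims=True)

big = np.array([[1000.0, 999.0, 995.0]])

# The naive version overflows: np.exp(1000) is inf, and inf / inf is nan
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(big) / np.exp(big).sum(axis=1, keepdims=True)
print(naive)               # contains nan
print(stable_softmax(big)) # same as the softmax of the row [6, 5, 1]

# On small inputs both versions agree
small = np.array([[6.0, 5.0, 1.0], [2.0, 7.0, 5.0]])
naive_small = np.exp(small) / np.exp(small).sum(axis=1, keepdims=True)
print(np.allclose(naive_small, stable_softmax(small)))  # True
```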


```python
# Inspect the shape and number of dimensions of the input array
print(array.shape)
print(array.ndim)
```

```
(5, 3)
2
```


```python
# Import PyTorch to use the PyTorch softmax
import torch

# Convert the np array to a torch tensor
tensor = torch.from_numpy(array).float()

# Check the tensor
print(tensor)
print(tensor.size())
print(tensor.dtype)

# Run the Softmax operation on the tensor
softmax_function = torch.nn.Softmax(dim=1)
softmax_operation = softmax_function(tensor)
print(softmax_operation)
```

```
tensor([[ 6.,  5.,  1.],
        [ 2.,  7.,  5.],
        [17., 16., 13.],
        [ 4., 11.,  8.],
        [16., 15., 17.]])
torch.Size([5, 3])
torch.float32
tensor([[7.2748e-01, 2.6762e-01, 4.9017e-03],
        [5.8998e-03, 8.7560e-01, 1.1850e-01],
        [7.2140e-01, 2.6539e-01, 1.3213e-02],
        [8.6788e-04, 9.5175e-01, 4.7385e-02],
        [2.4473e-01, 9.0031e-02, 6.6524e-01]])
```
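
Here `dim=1` tells `torch.nn.Softmax` to normalize along the second axis, i.e. across each row, which is why its output matches the NumPy computation that used `axis = 1`. The NumPy-only sketch below (no PyTorch required) contrasts this with normalizing along axis 0, where each column would sum to one instead. The input is the first three rows of the array printed above, and the helper `softmax` is illustrative, not a library function.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Softmax along the given axis (plays the role of dim in torch.nn.Softmax)."""
    # Subtracting the max along the axis avoids overflow and does not change the result
    exps = np.exp(x - x.max(axis=axis, keepdims=True))
    return exps / exps.sum(axis=axis, keepdims=True)

x = np.array([[6.0, 5.0, 1.0],
              [2.0, 7.0, 5.0],
              [17.0, 16.0, 13.0]])

rows = softmax(x, axis=1)  # like torch.nn.Softmax(dim=1): each row sums to 1
cols = softmax(x, axis=0)  # like torch.nn.Softmax(dim=0): each column sums to 1
print(rows.sum(axis=1))
print(cols.sum(axis=0))
```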
