## Selecting prediction value based on target

Suppose we have some activations. For example, here are random logits from a pretend model with 2 outputs applied to 5 items, normalized with softmax so that each row sums to 1:

In [1]:
import torch

cnt_outputs = 2
cnt_items = 5

torch.random.manual_seed(42)
logits = torch.randn(cnt_items, cnt_outputs) * 2
activations = torch.softmax(logits, dim=1)
logits, activations


(tensor([[ 0.6734,  0.2576],
         [ 0.4689,  0.4607],
         [-2.2457, -0.3727],
         [ 4.4164, -1.2760],
         [ 0.9233,  0.5347]]),
 tensor([[0.6025, 0.3975],
         [0.5021, 0.4979],
         [0.1332, 0.8668],
         [0.9966, 0.0034],
         [0.5959, 0.4041]]))

And we have some targets as values between 0 and `cnt_outputs-1`:

In [2]:
targets = torch.randint(0, cnt_outputs, (cnt_items,))
targets


tensor([1, 0, 1, 1, 1])

Here is how to use tensor indexing to extract, for each row, the probability that the model assigned to that row's target:

In [3]:
sel0 = activations[range(cnt_items), targets]
sel0


tensor([0.3975, 0.5021, 0.8668, 0.0034, 0.4041])

I'm not sure exactly what tensor indexing syntax allows in general, but above we are passing two index sequences of the same length:

- a range, which is a quasi-vector, spanning the count of items, which matches the first dimension of the predictions
- a vector of the same length with the right target for each row

The two are paired up element-wise: we take the value $n$ from the range and the value $m$ at the same position in the targets, and get the element at $(n, m)$ of the 2D predictions tensor. The result has one value per row, which is indeed like ordinary indexing done once per item.
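To convince myself of the pairing, here is a minimal sketch on made-up toy data (the names `probs`, `labels`, and `rows` are mine, not the notebook's tensors). It compares the indexing trick with an explicit loop and with `gather`, which I believe is the more explicit way to say the same thing:

```python
import torch

# Hypothetical toy data: 3 items, 2 outputs (not the notebook's tensors).
probs = torch.tensor([[0.9, 0.1],
                      [0.2, 0.8],
                      [0.6, 0.4]])
labels = torch.tensor([1, 1, 0])

# Row indices 0..2, playing the role of range(cnt_items).
rows = torch.arange(probs.shape[0])

# Integer-array indexing: pairs rows[i] with labels[i] for every i.
picked = probs[rows, labels]

# The same thing written as an explicit loop over the items.
looped = torch.stack([probs[i, labels[i]] for i in range(len(labels))])

# The same thing with gather: select one column index per row along dim=1.
gathered = probs.gather(1, labels.unsqueeze(1)).squeeze(1)

print(picked, looped, gathered)
```

All three should hold the same values, one per row.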

`nll_loss` does the same as the advanced indexing above, but flips the sign to negative:

In [None]:
sel1 = torch.nn.functional.nll_loss(
    activations,
    targets,
    reduction="none",  # so it doesn't take the mean of the output
)
assert torch.allclose(sel0, -sel1), (
    "Expecting target-based indexing and nll_loss to produce the same result but of inverse sign"
)

sel1


tensor([-0.3975, -0.5021, -0.8668, -0.0034, -0.4041])

But `nll_loss` doesn't take the log; it assumes you have already taken the log of the softmax. There's a function called `log_softmax` that combines `log` and `softmax` in a fast _and accurate_ way. Applying `log_softmax` and then `nll_loss` is what is called _cross-entropy loss_.

So PyTorch's `nn.CrossEntropyLoss` (or `F.cross_entropy`) is a combination of `log_softmax` followed by `nll_loss`, and it takes raw logits as input.
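A small sketch to check this, using made-up logits (`raw`, `labels`, and `extreme` are hypothetical values of mine). It first compares the two-step and one-step versions, then shows one case where I'd expect the naive `log(softmax(x))` to lose accuracy while `log_softmax` does not:

```python
import torch
import torch.nn.functional as F

# Hypothetical raw model outputs (logits) for 3 items and 2 classes.
raw = torch.tensor([[2.0, -1.0],
                    [0.5, 0.5],
                    [-3.0, 1.0]])
labels = torch.tensor([0, 1, 1])

# Two steps: log_softmax, then nll_loss picks the target column and negates it.
two_step = F.nll_loss(F.log_softmax(raw, dim=1), labels, reduction="none")

# One step: cross_entropy applied directly to the logits.
one_step = F.cross_entropy(raw, labels, reduction="none")
print(torch.allclose(two_step, one_step))

# Accuracy check with an extreme (made-up) pair of logits.
extreme = torch.tensor([[1000.0, 0.0]])
# Naive version: the small softmax entry underflows to 0, so its log is -inf.
naive = torch.log(torch.softmax(extreme, dim=1))
# Fused version: stays finite.
stable = F.log_softmax(extreme, dim=1)
print(naive, stable)
```

The second part is only one illustration of the "accurate" claim, not a full account of how `log_softmax` is implemented.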

## Logarithm

This is what a logarithm is:

In [None]:
from math import e, isclose, log

base = e
a = 5
y = base**a
# Compare with a tolerance: floating-point rounding can break an exact ==
isclose(a, log(y, base))


See how the log drops faster and faster as the probability gets closer to 0, so it amplifies differences between small values. It also separates values close to 1: even if 0.99 and 0.999 look pretty close, the latter leaves 10 times less room for being wrong (0.001 vs 0.01), and the log reflects that.

In [None]:
from fastbook import plot_function

plot_function(torch.log, min=0, max=1)
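To put numbers on the plot, here is a quick sketch using only the standard library. The probability values are ones I picked for illustration:

```python
from math import log

# Natural log of a few probabilities, from confident to very unlikely.
for p in (0.999, 0.99, 0.9, 0.1, 0.01, 0.001):
    print(f"p={p:<6} log(p)={log(p):.4f}")

# How much further from 0 is log(0.99) than log(0.999)?
ratio = log(0.99) / log(0.999)
print(ratio)
```

The ratio comes out at roughly 10, which matches the "10 times" intuition for 0.99 vs 0.999.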


The following relationship is key:

In [None]:
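from math import isclose

b = 12  # we already have an a, need a b for the thing below
# Compare with a tolerance: the two sides can differ by floating-point rounding
isclose(log(a * b, base), log(a, base) + log(b, base))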


The above shows that logarithms turn multiplication into addition. As a consequence, the logarithm increases linearly when the underlying signal increases exponentially or multiplicatively.
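A tiny sketch of that last point, with a made-up signal that is multiplied by 10 at every step. I use `log10` here only because it makes the numbers easy to read; any base should show the same constant step:

```python
from math import isclose, log10

# A signal that grows multiplicatively: each value is 10x the previous one.
values = [10**k for k in range(1, 6)]

# Its logarithm, and the difference between consecutive logs.
logs = [log10(v) for v in values]
steps = [later - earlier for earlier, later in zip(logs, logs[1:])]

print(values)
print(logs)
print(steps)
```

The differences in `steps` should all be the same, which is what "linear" means here.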