## Cross entropy

Cross-entropy loss is the negative log likelihood of the softmax of the predictions, evaluated at the target classes. To implement it efficiently and in a numerically stable way, we use the log-sum-exp trick to get the log of the softmax, then apply the negative log likelihood on top of it.

![The log-sum-exp trick](lse.png)
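
For reference, the log-sum-exp identity that the code in this section relies on can be written as below. The symbols $x_j$ (the activations) and $a$ (their maximum) are notation chosen here to mirror the variable names in the code.

$$
\log \sum_{j} e^{x_j} \;=\; a + \log \sum_{j} e^{x_j - a}, \qquad a = \max_j x_j
$$

$$
\log \operatorname{softmax}(x)_i \;=\; x_i - \log \sum_{j} e^{x_j}
$$

Because every exponent $x_j - a$ is at most zero, each term $e^{x_j - a}$ lies in $(0, 1]$ and the sum cannot overflow.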

In [12]:
import torch

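test_tensor = torch.tensor([5.0, 6.0, 8.0, 10.0, -20.0])

def log_softmax_exp(x: torch.Tensor) -> torch.Tensor:
    """Log-sum-exp of a 1-D tensor, shifted by the max for numerical stability."""
    a = x.max(dim=0)[0]  # largest element, so exp(x - a) is at most 1
    return a + torch.exp(x - a).sum().log()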

In [8]:
test_tensor

tensor([  5.,   6.,   8.,  10., -20.])

In [9]:
log_softmax_exp(test_tensor)

tensor(10.1488)
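
To see why the maximum is subtracted first, here is a small sketch comparing a naive log-sum-exp with the shifted one on large inputs. The names `naive_logsumexp`, `stable_logsumexp`, and `big` are illustrative and not part of the original notebook.

```python
import torch

def naive_logsumexp(x: torch.Tensor) -> torch.Tensor:
    # Direct translation of log(sum(exp(x))): overflows for large x.
    return x.exp().sum().log()

def stable_logsumexp(x: torch.Tensor) -> torch.Tensor:
    # Shift by the max so the largest exponent is exp(0) = 1.
    a = x.max()
    return a + (x - a).exp().sum().log()

big = torch.tensor([1000.0, 1001.0, 1002.0])

naive = naive_logsumexp(big)    # exp(1000.) is inf in float32, so this is inf
stable = stable_logsumexp(big)  # finite, slightly above 1002

print(naive, stable)
```

The naive version returns `inf` because `exp(1000.)` does not fit in float32, while the shifted version stays finite.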

In [13]:
def logsumexp(x):
    # Batched version: x has shape (batch, classes); reduce over the last dim.
    m = x.max(-1)[0]
    return m + (x - m[:, None]).exp().sum(-1).log()

In [16]:
test_tensor_unsqueezed = test_tensor.unsqueeze(0)
logsumexp(test_tensor_unsqueezed)

tensor([10.1488])
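
As a sanity check, the hand-written batched function can be compared against PyTorch's built-in `torch.logsumexp`. This is a minimal sketch; the variable names `ours` and `ref` are illustrative.

```python
import torch

def logsumexp(x: torch.Tensor) -> torch.Tensor:
    # Same batched implementation as above: reduce over the last dimension.
    m = x.max(-1)[0]
    return m + (x - m[:, None]).exp().sum(-1).log()

x = torch.tensor([[5.0, 6.0, 8.0, 10.0, -20.0]])  # shape (1, 5)

ours = logsumexp(x)               # one value per row, shape (1,)
ref = torch.logsumexp(x, dim=-1)  # built-in reference, shape (1,)

print(ours, ref)
```

Both return one value per row; passing `keepdim=True` to the built-in would keep the reduced dimension instead.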

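Our answer matches the single-vector version above. Subtracting the log-sum-exp from the raw activations gives the log of the softmax, that is, the log probability of each class. We now want to compute the negative log likelihood loss from those log probabilities.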

In [17]:
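def nll(input, targets):
    """
    Expects `input` to be log probabilities of shape (batch, classes) and
    `targets` to be the integer class index for each row.
    """
    # Pick each row's log probability at its target index, then average.
    return -input[range(targets.shape[0]), targets].mean()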
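
The indexing in `nll` pairs each row with its target column. The sketch below spells this out on made-up probabilities and compares the result with `torch.nn.functional.nll_loss`. The values and the names `log_probs`, `picked`, `loss`, and `ref` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Hypothetical per-class probabilities for two samples, converted to logs.
log_probs = torch.log(torch.tensor([[0.7, 0.2, 0.1],
                                    [0.1, 0.1, 0.8]]))
targets = torch.tensor([0, 2])

# Row i is paired with column targets[i]: entries (0, 0) and (1, 2).
rows = torch.arange(targets.shape[0])
picked = log_probs[rows, targets]

loss = -picked.mean()
ref = F.nll_loss(log_probs, targets)  # default reduction is "mean"

print(picked, loss, ref)
```

Here `picked` holds `log(0.7)` and `log(0.8)`, so the loss is the mean of their negatives.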

In [36]:
import torch

input = torch.tensor([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]])
result = input.logsumexp(-1, keepdim=True)
print(result)

tensor([[2.4076],
        [5.4076]])


In [31]:
def cross_entropy(input, target):
    return nll(input - input.logsumexp(-1,keepdim=True), target)
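
The expression `input - input.logsumexp(-1, keepdim=True)` is the log of the softmax. A minimal check of that claim against `torch.log_softmax`, using illustrative names `logits`, `manual`, and `ref`:

```python
import torch

logits = torch.tensor([[0.0, 1.0, 2.0],
                       [3.0, 4.0, 5.0]])

# keepdim=True keeps shape (2, 1) so the subtraction broadcasts per row.
manual = logits - logits.logsumexp(-1, keepdim=True)
ref = torch.log_softmax(logits, dim=-1)

# Exponentiating log probabilities should give rows that sum to 1.
row_sums = manual.exp().sum(-1)

print(manual, ref, row_sums)
```

Both rows give the same log probabilities, because the second row is the first shifted by a constant and softmax is invariant to such shifts.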

In [32]:
dummy_activation_outputs = torch.tensor([[0, 1, 2, -1], [0, 2.1, -1, 0]])
dummy_activation_outputs.shape

torch.Size([2, 4])

In [33]:
targets = torch.tensor([1, 2])

In [34]:
targets.shape

torch.Size([2])

In [35]:
cross_entropy(dummy_activation_outputs, targets)

tensor(2.3974)
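
As a final check, the hand-written loss can be compared with PyTorch's built-in `F.cross_entropy` on the same inputs. This is a sketch under the assumption that the default `reduction="mean"` is what the notebook intends. The parameter names `log_probs` and `logits` are renamed here for readability.

```python
import torch
import torch.nn.functional as F

def nll(log_probs: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # Mean negative log probability of the target class.
    return -log_probs[range(targets.shape[0]), targets].mean()

def cross_entropy(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # log softmax via log-sum-exp, followed by negative log likelihood.
    return nll(logits - logits.logsumexp(-1, keepdim=True), targets)

logits = torch.tensor([[0.0, 1.0, 2.0, -1.0],
                       [0.0, 2.1, -1.0, 0.0]])
targets = torch.tensor([1, 2])

ours = cross_entropy(logits, targets)
ref = F.cross_entropy(logits, targets)  # default reduction is "mean"

print(ours, ref)
```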