# Loss Function

We can create a probability distribution over the outputs by using the softmax function, which is defined as:

$$\frac{e^{x}}{\sum_{x'}e^{x'}}$$

We can then use that probability distribution and knowledge of the correct categories to evaluate the classifier.

A classifier with high confidence in an incorrect category is a poor classifier.  A classifier with low confidence in an incorrect category and high confidence in the correct category is a good classifier.
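To make this concrete, here is a small sketch in plain Python. The three probabilities are made-up illustrative values, not outputs of any model in this notebook. Each prediction is scored by the negative log of the probability assigned to the correct category, which is the per-example quantity that the cross-entropy loss measures.

```python
import math

# Hypothetical probabilities that three classifiers assign to the CORRECT category.
confidences = {
    "confident and right": 0.9,
    "unsure": 0.5,
    "confident and wrong": 0.1,
}

# Negative log-probability of the correct category: small when the classifier
# is confident and right, large when it is confident and wrong.
penalties = {name: -math.log(p) for name, p in confidences.items()}

for name, penalty in penalties.items():
    print(f"{name}: {penalty:.4f}")
```

The penalty grows as the probability of the correct category shrinks, which matches the informal notion of a good and a poor classifier.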

We show below how the softmax function is applied to the output values to turn them into probabilities, and how a measure of the incorrectness, the cross-entropy loss, is computed from them.

In [1]:
import torch
import torch.nn.functional as F

# One input example with three features
data = torch.tensor([[6.0, 2.0, 1.9]])

# Weight matrix: c1 = f1, c2 = f2 + f3
weights = torch.tensor([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

c = torch.mm(data, weights)
print("c1 and c2: " + str(c))

exp = torch.exp(c)
print("e to the power of c: " + str(exp))

# Normalize each row so that it sums to 1
soft = exp / torch.sum(exp, dim=1, keepdim=True)
print("e to the power of c normalized: " + str(soft))

# PyTorch's built-in softmax, applied across the classes (dim=1)
result = F.softmax(c, dim=1)
print("softmax: " + str(result))

c1 and c2: tensor([[ 6.0000,  3.9000]])
e to the power of c: tensor([[ 403.4288,   49.4025]])
e to the power of c normalized: tensor([[ 0.8909,  0.1091]])
softmax: tensor([[ 0.8909,  0.1091]])


Convince yourself that the output of the 'softmax' function is the same as 'e to the power of c normalized'.

nll_loss() takes the output of the model (the log of the softmax) together with the target class, i.e. the correct answer identified by its position, and returns the negative of the log-probability found at that position. For example, if the second class were the right answer, target[0] would be 1, and the function would return the negated second item of the log-softmax output.

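In [2]:
# Store the log-probabilities under a new name so that re-running this cell
# does not take the log of an already logged tensor (which produces nan).
log_probs = torch.log(result)
print("log(softmax): " + str(log_probs))

# The correct category
target = torch.tensor([0])

loss = F.nll_loss(log_probs, target)
print("Loss: {:.4f}".format(loss.item()))


log(softmax): tensor([[-0.1155, -2.2155]])
Loss: 0.1155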
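
The lookup that nll_loss() performs can also be reproduced by hand. The following is a minimal sketch, assuming a single example whose softmax probabilities are the rounded values printed earlier (0.8909 and 0.1091); the names `rows` and `manual_loss` are illustrative only.

```python
import torch
import torch.nn.functional as F

# Log-probabilities for one example with two classes
# (log of the rounded softmax output printed earlier).
log_probs = torch.log(torch.tensor([[0.8909, 0.1091]]))
target = torch.tensor([0])  # index of the correct class

# Pick the log-probability at the target index in each row, negate it,
# and average over the rows (here there is only one row).
rows = torch.arange(log_probs.shape[0])
manual_loss = -log_probs[rows, target].mean()

library_loss = F.nll_loss(log_probs, target)
print(manual_loss.item(), library_loss.item())
```

Both values should agree, since with its default settings nll_loss() averages the negated target entries over the batch.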


PyTorch also has a ready-made function that computes the cross-entropy directly from the neural network's outputs.
Convince yourself that the loss calculated by either method is the same.

In [3]:
loss = F.cross_entropy(c, target)
print("Loss: " + str(loss.item()))


Loss: 0.11551953107118607
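
One way to check the equivalence programmatically is sketched below. It assumes the same raw outputs as in the first cell and uses F.log_softmax, which computes the log of the softmax in one step and is generally more numerically stable than calling torch.log on the softmax output. The names `two_step` and `one_step` are illustrative.

```python
import torch
import torch.nn.functional as F

c = torch.tensor([[6.0, 3.9]])  # raw classifier outputs from the first cell
target = torch.tensor([0])      # index of the correct class

# Two-step route: log-softmax followed by the negative log-likelihood loss.
two_step = F.nll_loss(F.log_softmax(c, dim=1), target)

# One-step route: cross-entropy computed directly from the raw outputs.
one_step = F.cross_entropy(c, target)

print(two_step.item(), one_step.item())
print(torch.allclose(two_step, one_step))
```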


Try this for different values of input data.

You could try them in the following order, from easiest to hardest:

$\begin{bmatrix}6 & 2 & 1.9\end{bmatrix}$
$\begin{bmatrix}5 & 2 & 1.9\end{bmatrix}$
$\begin{bmatrix}4 & 2 & 1.9\end{bmatrix}$

You should see the loss increase as the input data points become harder to classify, that is, as $f_{1}$ and $f_{2} + f_{3}$ get closer together. Remember, we're still using the weight matrix for Toy Problem 2, and the decision the classifier has to make is whether $f_{1} < f_{2} + f_{3}$.
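
The experiment can be run in a single loop, as in the sketch below. It assumes the same weight matrix as in the first cell and that the first class is the correct one for all three inputs; the list `losses` is an illustrative name.

```python
import torch
import torch.nn.functional as F

weights = torch.tensor([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
target = torch.tensor([0])  # the first class is taken to be correct

losses = []
for f1 in (6.0, 5.0, 4.0):
    data = torch.tensor([[f1, 2.0, 1.9]])
    c = torch.mm(data, weights)  # c1 = f1, c2 = f2 + f3
    loss = F.cross_entropy(c, target).item()
    losses.append(loss)
    print(f"f1 = {f1}: loss = {loss:.4f}")
```

With two classes and the first one correct, the loss equals $\log(1 + e^{c_2 - c_1})$, so it grows as $c_1 = f_{1}$ approaches $c_2 = f_{2} + f_{3}$.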