In [1]:
import torch
from torch.autograd import Variable
torch.manual_seed(5)

<torch._C.Generator at 0x7ff6e7a1bf60>

In [2]:
input = Variable(torch.randn(3, 5))
target = Variable(torch.LongTensor(3).random_(5))

In [3]:
input

Variable containing:
 0.0590  0.3317  1.2978 -1.3694  0.2554
 0.9160  2.4308 -1.3641 -0.4327  0.0316
-0.7676 -0.6493  0.0933 -0.9304 -1.1471
[torch.FloatTensor of size 3x5]

In [4]:
target

Variable containing:
 3
 0
 2
[torch.LongTensor of size 3]

In [5]:
loss = torch.nn.CrossEntropyLoss()

In [6]:
output = loss(input, target)
print(output)

Variable containing:
 2.0616
[torch.FloatTensor of size 1]



From the code above, we see that `CrossEntropyLoss` accepts a matrix as its `input` argument. The matrix represents the output of a neural network for a mini-batch: each row corresponds to one input example and holds one raw (unnormalized) score per class. The `target` argument is a vector whose elements are the ground-truth class labels of the corresponding examples.
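As a minimal sketch of the same call in more recent PyTorch (assumed 0.4 or later, where `Variable` has been merged into `Tensor` and plain tensors can be passed directly), the expected shapes and types look like this. The names `logits` and `labels` are ours, not the notebook's, and the data is freshly generated:

```python
import torch

torch.manual_seed(0)                 # arbitrary seed for this sketch
N, C = 3, 5                          # mini-batch size, number of classes
logits = torch.randn(N, C)           # one row of raw scores per example
labels = torch.randint(0, C, (N,))   # one class index per example, in [0, C)

loss_fn = torch.nn.CrossEntropyLoss()  # default reduction averages over the batch
loss = loss_fn(logits, labels)

print(logits.shape, labels.shape, labels.dtype, loss.shape)
```

In these versions the loss comes back as a 0-dimensional tensor rather than the size-1 tensor shown in the outputs above; `loss.item()` turns it into a Python float.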

For each row of the `input` matrix, the loss first applies softmax to turn the scores into a probability for each class, then applies the multi-class cross-entropy formula, averaged over the $m$ examples in the mini-batch.

$$
\mathrm{crossentropy} = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K} y_{true}^{(i,k)}\log\left(y_{predict}^{(i,k)}\right)
$$

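Here, for the $i$-th example, $y_{true}^{(i)}$ is a $K$-element one-hot vector: if its ground truth is the $k$-th class, then $y_{true}^{(i,k)}$ is 1 and all other entries are 0, while $y_{predict}^{(i)}$ is the vector of softmax probabilities for that example. Because only the true-class entry of $y_{true}^{(i)}$ is nonzero, each example contributes just $-\log$ of the predicted probability of its true class.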
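The formula can be written out literally as a sketch, assuming a recent PyTorch (for `F.one_hot`). It computes the softmax by hand, builds one-hot targets, applies the double sum, and compares the result with `F.cross_entropy`. The data and the names `logits`, `labels`, `y_true`, `y_predict` are illustrative, not from the notebook:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)                 # arbitrary seed for this sketch
m, K = 3, 5
logits = torch.randn(m, K)           # raw scores, one row per example
labels = torch.randint(0, K, (m,))   # ground-truth class indices

# Softmax per row: exponentiate, then divide by the row sum
# (no max-shift here, so very large scores could overflow).
exp = torch.exp(logits)
y_predict = exp / exp.sum(dim=1, keepdim=True)

# One-hot targets: y_true[i, k] = 1 if labels[i] == k, else 0.
y_true = F.one_hot(labels, num_classes=K).float()

# -1/m * sum_i sum_k y_true[i, k] * log(y_predict[i, k])
ce = -(y_true * torch.log(y_predict)).sum() / m

print(ce.item(), F.cross_entropy(logits, labels).item())
```

Multiplying by the one-hot matrix zeroes every term except the true class, which is exactly why the manual calculation that follows only needs one entry per row.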

Now, let's compute the cross-entropy manually to gain more confidence in the explanation above.

In [7]:
import torch.nn.functional as F

In [8]:
prob = F.softmax(input, dim=1)   # class probabilities for each row
log_prob = torch.log(prob)
log_prob

Variable containing:
-1.9771 -1.7044 -0.7383 -3.4055 -1.7807
-1.8442 -0.3295 -4.1243 -3.1929 -2.7286
-1.7959 -1.6776 -0.9350 -1.9587 -2.1754
[torch.FloatTensor of size 3x5]

Since we have 3 examples with targets `[3, 0, 2]`, we pick the log-probability of the true class from each row, add them up, negate the sum, and divide by 3.
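As a hand check, the selected entries of `log_prob` above are row 0 / column 3, row 1 / column 0, and row 2 / column 2, so with the printed four-decimal values:

$$
-\frac{1}{3}\left(-3.4055 + (-1.8442) + (-0.9350)\right) = \frac{6.1847}{3} \approx 2.0616
$$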

In [9]:
crossentropy = -1/3.0*(log_prob[0,3]+log_prob[1,0]+log_prob[2,2])

In [10]:
crossentropy

Variable containing:
 2.0616
[torch.FloatTensor of size 1]

The result is consistent with the previous `output`.
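Picking entries by hand only works for this fixed batch. A sketch that generalizes the selection to any batch size, using advanced indexing or `torch.gather`, might look like this (assuming a recent PyTorch; `logits` and `target` here are freshly generated, not the notebook's values):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)                 # arbitrary seed for this sketch
logits = torch.randn(3, 5)
target = torch.randint(0, 5, (3,))
log_prob = torch.log(F.softmax(logits, dim=1))

n = log_prob.size(0)
# Advanced indexing: for each row i, take column target[i].
ce_index = -log_prob[torch.arange(n), target].mean()

# Same selection with gather along the class dimension (dim=1).
ce_gather = -log_prob.gather(1, target.unsqueeze(1)).mean()

print(ce_index.item(), ce_gather.item(), F.cross_entropy(logits, target).item())
```

`gather` needs the index tensor to have the same number of dimensions as `log_prob`, hence `target.unsqueeze(1)`, which turns the shape `(n,)` into `(n, 1)`.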

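There is another way to compute the cross-entropy, which is more precise and numerically stable than putting a softmax activation in the network and then taking the log of its probability output: when one score is much smaller than the largest score in its row, softmax can underflow to exactly 0, and $\log(0) = -\infty$. This method uses `torch.nn.LogSoftmax` as the activation function and `torch.nn.NLLLoss` as the loss function; `torch.nn.CrossEntropyLoss` combines these two steps in a single class.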
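The underflow can be seen with a single extreme score. The sketch below (assuming a recent PyTorch) compares `log(softmax(x))` with `F.log_softmax`, and adds `log_softmax_lse`, a hypothetical helper illustrating the log-sum-exp trick: subtract the row maximum before exponentiating, then subtract the log of the row sum. This is the standard stable formulation, not necessarily PyTorch's internal code:

```python
import torch
import torch.nn.functional as F

scores = torch.tensor([[0.0, -200.0]])  # float32 by default

# exp(-200) underflows to 0 in float32, so the log becomes -inf.
naive = torch.log(F.softmax(scores, dim=1))
stable = F.log_softmax(scores, dim=1)

# Hypothetical helper (not a PyTorch API): log-softmax via log-sum-exp.
def log_softmax_lse(x):
    shifted = x - x.max(dim=1, keepdim=True).values  # largest entry becomes 0
    return shifted - torch.log(torch.exp(shifted).sum(dim=1, keepdim=True))

print(naive)
print(stable)
print(log_softmax_lse(scores))
```

Here `naive[0, 1]` is `-inf`, while `stable[0, 1]` and the helper's result are both `-200`, the exact log-probability up to rounding.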

In [11]:
log_prob = F.log_softmax(input, dim=1)
log_prob

Variable containing:
-1.9771 -1.7044 -0.7383 -3.4055 -1.7807
-1.8442 -0.3295 -4.1243 -3.1929 -2.7286
-1.7959 -1.6776 -0.9350 -1.9587 -2.1754
[torch.FloatTensor of size 3x5]

In [12]:
crossentropy = F.nll_loss(log_prob, target)

In [13]:
crossentropy

Variable containing:
 2.0616
[torch.FloatTensor of size 1]

This also gives 2.0616, matching both results above.
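The cells above use the functional forms `F.log_softmax` and `F.nll_loss`. A minimal sketch with the module classes named earlier, `torch.nn.LogSoftmax` and `torch.nn.NLLLoss`, compared against `torch.nn.CrossEntropyLoss`, might look like this (assuming a recent PyTorch and freshly generated data):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                 # arbitrary seed for this sketch
logits = torch.randn(3, 5)
target = torch.randint(0, 5, (3,))

log_softmax = nn.LogSoftmax(dim=1)   # used as the network's final activation
nll = nn.NLLLoss()                   # expects log-probabilities as input
ce = nn.CrossEntropyLoss()           # expects raw scores as input

two_step = nll(log_softmax(logits), target)
one_step = ce(logits, target)
print(two_step.item(), one_step.item())
```

The choice between the two is about where the log-softmax lives: inside the model (then use `NLLLoss`) or inside the loss (then the model outputs raw scores and `CrossEntropyLoss` is used on its own).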