Class indices in the range [0, C) where C is the number of classes;
if ignore_index is specified, this loss also accepts this class index (this index may not necessarily be in the class range).
The unreduced (i.e. with reduction set to 'none') loss for this case can be described as:

$$ l(X, \mathbf{y})=L=\left\{l_{1}, \ldots, l_{N}\right\}^{\top}, \quad l_{n}=-\mathbf{w}_{\mathbf{y}_{n}} \log \frac{\exp \left(X_{n, \mathbf{y}_{n}}\right)}{\sum_{c=1}^{C} \exp \left(X_{n, c}\right)} $$

where $X$ is the input, $\mathbf{y}$ is the target, $\mathbf{w}$ is the weight, $C$ is the number of classes,
and $N$ spans the minibatch dimension as well as $ d_{1}, \ldots, d_{k}$ for the K-dimensional case.
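The formula above can be checked term by term against the built-in function. The following is a minimal sketch, not the library's implementation; the per-class weights `w` are illustrative values chosen for this example, and the class index is 0-based in code (the sum over $c$ runs over all $C$ classes).

```python
import torch
import torch.nn.functional as F

# Example data: 4 samples, 3 classes
X = torch.arange(12, dtype=torch.float32).reshape(4, 3)
y = torch.tensor([0, 2, 1, 2])
w = torch.tensor([1.0, 2.0, 0.5])  # illustrative per-class weights (assumption)

# l_n = -w[y_n] * (X[n, y_n] - log(sum_c exp(X[n, c])))
picked = X[torch.arange(X.shape[0]), y]   # X[n, y_n] for every n
log_norm = torch.logsumexp(X, dim=1)      # log of the softmax denominator
manual = -w[y] * (picked - log_norm)

builtin = F.cross_entropy(X, y, weight=w, reduction='none')
print(manual)
print(builtin)
```

Both printed tensors should agree up to floating-point error.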

* Input: $(N, C)$, where $C$ is the number of classes

* Target: $(N)$, where each value satisfies $0 \leq \text{targets}[i] \leq C-1$

* Output: If reduction is 'none', same shape as the target. Otherwise, scalar.
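The shape rules above also apply to the K-dimensional case. As a small sketch with made-up sizes (`N`, `C`, `d1` are arbitrary values picked for illustration), an input of shape $(N, C, d_1)$ pairs with a target of shape $(N, d_1)$:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, C, d1 = 2, 3, 5                      # arbitrary sizes for illustration
logits = torch.randn(N, C, d1)          # input: (N, C, d1)
target = torch.randint(0, C, (N, d1))   # target: (N, d1), values in [0, C-1]

per_element = nn.CrossEntropyLoss(reduction='none')(logits, target)
mean_loss = nn.CrossEntropyLoss()(logits, target)

print(per_element.shape)  # same shape as the target: torch.Size([2, 5])
print(mean_loss.shape)    # scalar: torch.Size([])
```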

In [75]:
import torch
import torch.nn.functional as F
import torch.nn as nn

In [76]:
x_input = torch.arange(12, dtype=torch.float32).reshape(4, 3)
y_target = torch.tensor([0, 2, 1, 2])
x_input

tensor([[ 0.,  1.,  2.],
        [ 3.,  4.,  5.],
        [ 6.,  7.,  8.],
        [ 9., 10., 11.]])

In [77]:
F.cross_entropy(x_input, y_target,
                # class weights: see NLLLoss
                weight=None)

tensor(1.1576)

In [78]:
nn.CrossEntropyLoss()(input=x_input,
                      target=y_target)  # Requirement: integer vector (min 0, max input.shape[1] - 1)

tensor(1.1576)

In [79]:
print(nn.CrossEntropyLoss(reduction='none')(x_input, y_target))
print(nn.CrossEntropyLoss(reduction='mean')(x_input, y_target))  # default is reduction='mean'; see BCELoss
print(nn.CrossEntropyLoss(reduction='sum')(x_input, y_target))

tensor([2.4076, 0.4076, 1.4076, 0.4076])
tensor(1.1576)
tensor(4.6304)


In [80]:
loss_sum = nn.CrossEntropyLoss(reduction='sum')(input=x_input,
                                                target=y_target)

loss_sum_double = nn.CrossEntropyLoss(reduction='sum')(input=x_input.repeat(2, 1),
                                                       target=y_target.repeat(2))
print(loss_sum)
print(loss_sum_double)  # twice loss_sum

tensor(4.6304)
tensor(9.2608)


In [81]:
# The loss for targets equal to 0 is set to 0 (in NLP tasks, ignore_index can be set to the padding index)
'''
ignore_index (int, optional) –
    Specifies a target value that is ignored and does not contribute to the input gradient.
    When size_average is True, the loss is averaged over non-ignored targets.
    Note that ignore_index is only applicable when the target contains class indices.
'''
nn.CrossEntropyLoss(reduction='none', ignore_index=0)(x_input,
                                                      y_target)

tensor([0.0000, 0.4076, 1.4076, 0.4076])
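With `reduction='mean'`, the ignored positions are also left out of the denominator, as the quoted docstring states ("averaged over non-ignored targets"). The sketch below re-creates the same `x_input` and `y_target` so it runs on its own; `manual_mean` is a name introduced here for the comparison.

```python
import torch
import torch.nn as nn

x_input = torch.arange(12, dtype=torch.float32).reshape(4, 3)
y_target = torch.tensor([0, 2, 1, 2])

per_sample = nn.CrossEntropyLoss(reduction='none', ignore_index=0)(x_input, y_target)
mean_ignored = nn.CrossEntropyLoss(reduction='mean', ignore_index=0)(x_input, y_target)

# Average over the 3 non-ignored targets only, not over all 4 samples
kept = y_target != 0
manual_mean = per_sample[kept].sum() / kept.sum()

print(mean_ignored)
print(manual_mean)
```

Dividing by 4 instead of 3 would give a smaller value, so the two printed numbers matching confirms which denominator is used.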

The cross-entropy loss above is computed in the following steps:

In [82]:
logsoftmax_output = F.log_softmax(x_input, dim=1)  # note: dim=1 (the class dimension)
logsoftmax_output

tensor([[-2.4076, -1.4076, -0.4076],
        [-2.4076, -1.4076, -0.4076],
        [-2.4076, -1.4076, -0.4076],
        [-2.4076, -1.4076, -0.4076]])

In [83]:
nlloss_output = F.nll_loss(logsoftmax_output, y_target)
print(nlloss_output)

tensor(1.1576)
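The NLL step itself only picks the log-probability of the target class in each row, negates it, and averages. A minimal sketch of that step, using `gather` as one possible way to do the indexing (not necessarily how the library does it internally):

```python
import torch
import torch.nn.functional as F

x_input = torch.arange(12, dtype=torch.float32).reshape(4, 3)
y_target = torch.tensor([0, 2, 1, 2])

log_probs = F.log_softmax(x_input, dim=1)

# Select log_probs[n, y_target[n]] for every row n
picked = log_probs.gather(1, y_target.unsqueeze(1)).squeeze(1)
manual_loss = -picked.mean()

print(manual_loss)
print(F.cross_entropy(x_input, y_target))
```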
