## 1. CrossEntropyLoss
### class torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')

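Purpose:

- For single-label classification, combines nn.LogSoftmax() and nn.NLLLoss() in one criterion to compute the loss. Used to train a classification problem with C classes.

    - The weight argument is a 1D Tensor of size C that assigns a weight to each class. It is useful when the training set is class-imbalanced.

    - input holds the raw, unnormalized score (logit) for each class. Do not apply softmax beforehand, because the criterion applies LogSoftmax internally.

    - The input Tensor has size (minibatch,C), or (minibatch,C,d1,d2,...,dK) with K≥1 for the K-dimensional case.

    - target holds class indices in the range [0,C−1], where C is the number of classes.

    $loss(score,target)=-\log\left(\frac{\exp(score[target])}{\sum_{j=0}^{C-1}\exp(score[j])}\right)=-score[target]+\log\left(\sum_{j=0}^{C-1}\exp(score[j])\right)$
    - With weight:
    
    $loss(score,target)=weight[target]\left(-score[target]+\log\left(\sum_{j=0}^{C-1}\exp(score[j])\right)\right)$
    - By default the losses are averaged over the minibatch. When weight is given, the average is a weighted one: the sum of the weighted losses divided by the sum of weight[target].

- Higher-dimensional inputs, such as 2D images, are also supported; the loss is then computed per element (per pixel).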
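
The formula above can be checked numerically. The following is a minimal sketch, assuming a recent PyTorch with `torch.logsumexp`; the tensor values and variable names are made up for illustration. It compares the built-in criterion with the LogSoftmax + NLLLoss combination and with the closed-form expression.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
score = torch.randn(4, 3)            # (minibatch=4, C=3) raw scores
target = torch.tensor([0, 2, 1, 2])  # class indices in [0, C-1]

# Built-in criterion with the default reduction='mean'.
ce = nn.CrossEntropyLoss()(score, target)

# The same computation as LogSoftmax followed by NLLLoss.
nll = nn.NLLLoss()(F.log_softmax(score, dim=1), target)

# Closed form: -score[target] + log(sum_j exp(score[j])), averaged over the batch.
picked = score.gather(1, target.unsqueeze(1)).squeeze(1)
manual = (-picked + torch.logsumexp(score, dim=1)).mean()

print(ce.item(), nll.item(), manual.item())
```

All three printed values should agree up to floating-point rounding.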

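Parameters:
- weight(Tensor, optional) - Per-class weight. If given, it must be a Tensor of size C; by default every class has weight 1.

- size_average(bool, optional) – Deprecated (see reduction). Defaults to True.

    - size_average=True: the losses are averaged over the minibatch, taking weight into account.

    - size_average=False: the losses are summed over the minibatch.

    - Ignored when reduce=False.

- ignore_index(int, optional) - A target value to ignore; positions with this value contribute nothing to the loss or to the gradient of input.

    - When size_average=True, the average is taken over the non-ignored targets only.

- reduce(bool, optional) - Deprecated (see reduction). Defaults to True.

    - reduce=True: the losses are averaged or summed over the minibatch, depending on size_average.

    - reduce=False: one loss per batch element is returned and size_average is ignored.
- reduction(string, optional) - Specifies the reduction applied to the output: 'none' | 'mean' | 'sum'.
    - reduction='none': no reduction; the loss is returned per element (per pixel for image inputs), so the output has the same size as target.
    - reduction='mean': the default; the sum of the per-element losses divided by the number of non-ignored elements (the batch size for (N,C) input), or by the sum of weight[target] over those elements when weight is given.
    - reduction='sum': the per-element losses are summed.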
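
To make the three reduction modes concrete, here is a small sketch with made-up scores and class weights. It assumes the weighted-mean behaviour described above, namely that 'mean' divides by the sum of weight[target] rather than by the batch size when weight is given.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
score = torch.randn(6, 3)                   # (N=6, C=3)
target = torch.tensor([0, 1, 2, 2, 1, 0])
weight = torch.tensor([1.0, 2.0, 0.5])      # illustrative per-class weights

per_sample = nn.CrossEntropyLoss(weight=weight, reduction='none')(score, target)
total = nn.CrossEntropyLoss(weight=weight, reduction='sum')(score, target)
mean = nn.CrossEntropyLoss(weight=weight, reduction='mean')(score, target)

# 'none' already returns the weighted per-sample losses, shape (N,).
# 'sum' adds them up; 'mean' divides that sum by the total weight of the targets.
expected_mean = per_sample.sum() / weight[target].sum()

print(per_sample.shape, total.item(), mean.item(), expected_mean.item())
```

Without a weight argument the same relation holds with every weight equal to 1, so 'mean' reduces to the plain average over the batch.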

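Input: input x, (N,C), where C=num_classes is the number of classes. Input: target y, (N), where each value satisfies 0≤targets[i]≤C−1.
Output: a scalar if reduce=True (reduction='mean' or 'sum'). If reduce=False (reduction='none'), the same size as target, (N).

Input: input x, (N,C,d1,d2,...,dK), with K≥1 for the K-dimensional case. Input: target y, (N,d1,d2,...,dK), with K≥1 for the K-dimensional case.

Output: a scalar if reduce=True (reduction='mean' or 'sum'). If reduce=False (reduction='none'), the same size as target, (N,d1,d2,...,dK), with K≥1 for the K-dimensional case.

**Note:** size_average and reduce are being deprecated; until they are removed, specifying either of them overrides reduction. Default value of reduction: 'mean'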
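
The shape rules can be verified with a short loop. This is only a sketch: the sizes N, C and the trailing dimensions are arbitrary choices, and the dictionary `out_shapes` is a hypothetical helper used to collect the results.

```python
import torch
import torch.nn as nn

criterion_none = nn.CrossEntropyLoss(reduction='none')
criterion_mean = nn.CrossEntropyLoss()  # reduction='mean'

N, C = 2, 3
out_shapes = {}
# Trailing dims d1..dK for the plain (N, C) case and for K = 1 and K = 2.
for dims in [(), (7,), (4, 5)]:
    score = torch.randn(N, C, *dims)            # (N, C, d1, ..., dK)
    target = torch.randint(0, C, (N, *dims))    # (N, d1, ..., dK)
    out_shapes[dims] = (
        tuple(criterion_none(score, target).shape),  # same size as target
        criterion_mean(score, target).dim(),         # 0 for a scalar
    )

for dims, (none_shape, mean_dim) in out_shapes.items():
    print(dims, '->', none_shape, 'scalar' if mean_dim == 0 else 'not scalar')
```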

### example:

In [70]:
import torch 
import torch.nn as nn
import numpy as np
import torch.nn.functional as F

In [71]:
loss1 = nn.CrossEntropyLoss(ignore_index=1)  # targets equal to 1 are excluded from the loss
# Per-pixel loss; the output has the same size as target
loss2 = nn.CrossEntropyLoss(ignore_index=1, reduction='none')

# input, [batch_size=5,num_classes=3,H,W]
input = torch.randn(5,3,3,4,requires_grad=True)
print(input.dtype)
print(input)

# target, [batch_size=5,H,W]
#target = torch.ones(5,3,4, dtype=torch.long)
target=torch.randint(0,3,size=(5,3,4),dtype=torch.long)
print(target.dtype)
print(target)

torch.float32
tensor([[[[ 5.7864e-01,  6.5832e-01,  9.5113e-01, -1.7745e-01],
          [-5.4101e-01, -1.3930e-01, -5.1329e-01,  7.7888e-01],
          [-8.7849e-02, -2.2284e+00, -6.6949e-01, -9.2986e-01]],

         [[ 6.8890e-01, -1.0171e+00,  9.6870e-01,  1.7296e-01],
          [-6.8021e-01,  1.4829e+00, -7.4961e-01,  1.6926e+00],
          [ 3.9351e-01,  1.0096e+00,  2.5396e-01,  4.4496e-01]],

         [[-1.6500e-01, -3.7595e-01, -1.1480e-01,  1.0660e+00],
          [-2.4347e-01,  3.8786e-01, -8.4753e-01, -7.6839e-02],
          [ 1.7897e+00,  8.6862e-01, -1.3892e-01, -6.2417e-01]]],


        [[[-7.3878e-01, -3.7049e-01,  1.8884e+00, -2.3619e+00],
          [-8.0387e-01,  2.5687e-01, -7.2418e-01,  1.1078e-01],
          [ 4.5610e-01, -5.4978e-01,  1.2423e+00, -7.2299e-01]],

         [[ 1.5842e+00, -5.1898e-01,  1.0901e-01, -2.1579e-01],
          [ 7.8062e-01,  5.3130e-01, -6.9257e-01, -1.7900e+00],
          [-1.2817e+00,  8.3419e-01, -7.6717e-01,  9.5915e-01]],

         [[ 1.

In [72]:
losses = loss1(input, target)

# The output is a scalar (0-dim) Tensor
print(losses.size())
print(losses)


# Scalar Tensor ==> 1-D Tensor of shape [1]
losses=torch.unsqueeze(losses,0)
print(losses)

# Mean of the 1-D Tensor ==> scalar Tensor again
print(losses.mean())


torch.Size([])
tensor(1.5094, grad_fn=<NllLoss2DBackward>)
tensor([1.5094], grad_fn=<UnsqueezeBackward0>)
tensor(1.5094, grad_fn=<MeanBackward1>)


In [73]:
losses = loss2(input, target)

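# The output is a Tensor with the same size as target
# The loss is 0 at positions whose target is the ignored class
print(losses.size())
print(losses)


# Tensor ==> Tensor with one extra leading dimension
losses=torch.unsqueeze(losses,0)
print(losses.size())
print(losses)


# Mean over all elements ==> scalar Tensor
# (ignored positions count as 0 here, so this is not the same value as loss1's mean)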
losses=losses.mean()
print(losses.size())
print(losses)


torch.Size([5, 3, 4])
tensor([[[0.9524, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 2.2215],
         [2.2144, 3.8839, 1.1215, 0.0000]],

        [[0.0000, 0.0000, 0.0000, 0.0000],
         [1.6710, 1.0882, 0.8814, 0.6828],
         [0.0000, 0.0000, 0.3532, 0.0000]],

        [[1.0213, 0.0000, 0.0000, 0.0000],
         [0.0000, 1.5105, 1.8683, 0.7398],
         [1.0131, 0.0000, 1.8099, 2.5796]],

        [[2.1416, 1.4608, 0.0000, 0.4400],
         [3.0803, 1.6433, 2.2815, 0.0000],
         [2.4229, 1.3022, 0.6022, 0.0000]],

        [[0.0000, 1.0876, 0.4740, 0.0000],
         [2.1615, 1.1699, 2.4651, 1.3632],
         [0.7605, 1.3165, 1.0446, 0.0000]]], grad_fn=<NllLoss2DBackward>)
torch.Size([1, 5, 3, 4])
tensor([[[[0.9524, 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000, 2.2215],
          [2.2144, 3.8839, 1.1215, 0.0000]],

         [[0.0000, 0.0000, 0.0000, 0.0000],
          [1.6710, 1.0882, 0.8814, 0.6828],
          [0.0000, 0.0000, 0.3532, 0.0000]],
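
The two cells above differ in how the ignored pixels enter the average. The sketch below uses freshly generated tensors with a fixed seed, not the notebook's own values. It illustrates that reduction='mean' with ignore_index averages over the non-ignored positions only, whereas calling `.mean()` on the reduction='none' output also counts the zeros.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
ignore_label = 1
score = torch.randn(5, 3, 3, 4)                 # (N, C, H, W)
target = torch.randint(0, 3, size=(5, 3, 4))    # (N, H, W)

mean_loss = nn.CrossEntropyLoss(ignore_index=ignore_label)(score, target)
pixel_loss = nn.CrossEntropyLoss(ignore_index=ignore_label,
                                 reduction='none')(score, target)

valid = target != ignore_label
masked_mean = pixel_loss[valid].mean()  # average over non-ignored pixels only
naive_mean = pixel_loss.mean()          # also averages in the zeros at ignored pixels

print(mean_loss.item(), masked_mean.item(), naive_mean.item())
```

The first two values should match; the third is smaller whenever at least one pixel is ignored.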

 

In [74]:
ignore_label=1

In [86]:
#[5, 3, 3, 4]
pred=F.softmax(input,dim=1)
#print(pred.size())

#[5*3*4=60]
pixel_losses=loss2(input,target).contiguous().view(-1)
#print(pixel_losses.size())

# Boolean mask of the positions in target that are not ignore_label
mask = target.contiguous().view(-1) != ignore_label
#print(mask)

# Replace ignore_label in target with 0 so every index is valid for gather ==> tmp_target
tmp_target = target.clone()
tmp_target[tmp_target == ignore_label] = 0
lll=tmp_target.unsqueeze(1)
print(lll)

print(pred.size())
print(pred)
pred = pred.gather(1, lll)
print(pred.size())
print(pred)

tensor([[[[0, 0, 0, 0],
          [0, 0, 0, 2],
          [0, 0, 2, 0]]],


        [[[0, 0, 0, 0],
          [2, 0, 0, 0],
          [0, 0, 0, 0]]],


        [[[0, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 2]]],


        [[[0, 0, 0, 0],
          [2, 2, 0, 0],
          [0, 0, 2, 0]]],


        [[[0, 0, 2, 0],
          [0, 0, 0, 2],
          [0, 0, 2, 0]]]])
torch.Size([5, 3, 3, 4])
tensor([[[[0.3858, 0.6482, 0.4233, 0.1699],
          [0.3109, 0.1289, 0.3991, 0.2552],
          [0.1092, 0.0206, 0.1916, 0.1584]],

         [[0.4308, 0.1214, 0.4309, 0.2411],
          [0.2705, 0.6527, 0.3151, 0.6364],
          [0.1767, 0.5242, 0.4826, 0.6265]],

         [[0.1834, 0.2304, 0.1458, 0.5890],
          [0.4186, 0.2184, 0.2857, 0.1085],
          [0.7140, 0.4552, 0.3258, 0.2151]]],


        [[[0.0559, 0.3935, 0.6591, 0.0742],
          [0.1382, 0.3368, 0.4142, 0.5052],
          [0.7306, 0.1171, 0.7024, 0.1460]],

         [[0.5709, 0.3392, 0.1112, 0.6347],
          [0.6

In [87]:
pred= pred.contiguous().view(-1,)[mask]
print(pred.size())
print(pred)

pred,ind=pred.contiguous().sort()
print(pred.size())
print(pred)
print(ind)

torch.Size([35])
tensor([0.3858, 0.1085, 0.1092, 0.0206, 0.3258, 0.1881, 0.3368, 0.4142, 0.5052,
        0.7024, 0.3601, 0.2208, 0.1544, 0.4772, 0.3631, 0.1637, 0.0758, 0.1175,
        0.2321, 0.6440, 0.0459, 0.1933, 0.1021, 0.0887, 0.2719, 0.5476, 0.3370,
        0.6225, 0.1152, 0.3104, 0.0850, 0.2558, 0.4674, 0.2681, 0.3518],
       grad_fn=<IndexBackward>)
torch.Size([35])
tensor([0.0206, 0.0459, 0.0758, 0.0850, 0.0887, 0.1021, 0.1085, 0.1092, 0.1152,
        0.1175, 0.1544, 0.1637, 0.1881, 0.1933, 0.2208, 0.2321, 0.2558, 0.2681,
        0.2719, 0.3104, 0.3258, 0.3368, 0.3370, 0.3518, 0.3601, 0.3631, 0.3858,
        0.4142, 0.4674, 0.4772, 0.5052, 0.5476, 0.6225, 0.6440, 0.7024],
       grad_fn=<SortBackward>)
tensor([ 3, 20, 16, 30, 23, 22,  1,  2, 28, 17, 12, 15,  5, 21, 11, 18, 31, 33,
        24, 29,  4,  6, 26, 34, 10, 14,  0,  7, 32, 13,  8, 25, 27, 19,  9])


In [88]:
pred.numel() - 1

34
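
The steps in cells 86 to 88 (softmax, gather the probability of the true class, drop ignored pixels, sort ascending) can be collected into one function. This is a sketch only: the function name `sorted_true_class_probs` is hypothetical, and reading these steps as the selection stage of online hard example mining is an assumption, since the notebook does not say what the sorted values are used for.

```python
import torch
import torch.nn.functional as F

def sorted_true_class_probs(score, target, ignore_label):
    """Return the true-class probabilities of non-ignored pixels, sorted ascending."""
    prob = F.softmax(score, dim=1)                          # (N, C, H, W)
    mask = target.reshape(-1) != ignore_label               # (N*H*W,) boolean
    safe_target = target.clone()
    safe_target[safe_target == ignore_label] = 0            # any valid index; masked out below
    true_prob = prob.gather(1, safe_target.unsqueeze(1))    # (N, 1, H, W)
    true_prob = true_prob.reshape(-1)[mask]                 # keep non-ignored pixels
    return true_prob.sort()                                 # (values, indices)

torch.manual_seed(0)
score = torch.randn(5, 3, 3, 4)
target = torch.randint(0, 3, size=(5, 3, 4))
probs, order = sorted_true_class_probs(score, target, ignore_label=1)
print(probs.size(), probs[0].item(), probs[-1].item())
```

The lowest values at the front of `probs` correspond to the pixels the model classifies least confidently.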

In [None]:
class CrossEntropy(nn.Module):
    def __init__(self, ignore_label=-1, weight=None):
        super(CrossEntropy, self).__init__()
        self.ignore_label = ignore_label  # e.g. 255
        self.criterion = nn.CrossEntropyLoss(weight=weight,  # per-class weights
                                             ignore_index=ignore_label)  # target value (e.g. 255) excluded from the loss

    def forward(self, score, target):
        '''

        :param score: model output Tensor: [bs,num_classes,128,256]
        :param target: labels Tensor: [bs,512,1024]
        :return: scalar loss Tensor
        '''
        ph, pw = score.size(2), score.size(3)  # 128,256
        h, w = target.size(1), target.size(2)  # 512,1024
        # If the spatial size of score differs from that of the labels (here it is smaller),
        # resample score to the label size. F.upsample is deprecated in favor of F.interpolate.
        if ph != h or pw != w:
            score = F.interpolate(
                    input=score, size=(h, w), mode='bilinear', align_corners=False)

        loss = self.criterion(score, target)

        return loss
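
A usage sketch for the same idea, written as a standalone function so it runs on its own. The function name `upsampled_cross_entropy`, the tensor sizes, and the choice of 255 as the ignored label are assumptions for illustration, not part of the class above.

```python
import torch
import torch.nn.functional as F

def upsampled_cross_entropy(score, target, ignore_label=-1, weight=None):
    """Resize score to the label resolution if needed, then apply cross entropy."""
    h, w = target.shape[1:]
    if tuple(score.shape[2:]) != (h, w):
        score = F.interpolate(score, size=(h, w), mode='bilinear',
                              align_corners=False)
    return F.cross_entropy(score, target, weight=weight,
                           ignore_index=ignore_label)

torch.manual_seed(0)
score = torch.randn(2, 4, 8, 16, requires_grad=True)   # low-resolution model output
target = torch.randint(0, 4, (2, 32, 64))              # full-resolution labels
target[0, :4, :4] = 255                                 # mark a patch as ignored

loss = upsampled_cross_entropy(score, target, ignore_label=255)
loss.backward()   # gradients flow back through the interpolation
print(loss.item(), score.grad.shape)
```

Upsampling the scores rather than downsampling the labels keeps the labels exact, at the cost of computing the loss at the larger resolution.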