This loss combines a Sigmoid layer and the BCELoss in one single class. It is more numerically stable than a plain Sigmoid followed by a BCELoss, because combining the two operations into one layer lets it use the log-sum-exp trick.
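The sketch below shows the difference. It uses illustrative values that are not from the original: one very confident but wrong logit, x = 200 with target 0, evaluated three ways. In float32, `torch.sigmoid(200.0)` rounds to exactly 1.0, so the hand-written formula takes log(0). `F.binary_cross_entropy` (the functional form of BCELoss) avoids the infinity only because PyTorch clamps its log outputs at -100. The fused function works from the logit directly and returns the true loss of about 200.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([200.0])  # a large logit (illustrative value)
y = torch.tensor([0.0])    # target 0, so the prediction is badly wrong

p = torch.sigmoid(x)       # rounds to exactly 1.0 in float32

# Hand-written BCE on probabilities: log(1 - p) = log(0) = -inf
naive = -(y * torch.log(p) + (1 - y) * torch.log(1 - p))

# BCELoss on probabilities: PyTorch clamps log outputs at -100
clamped = F.binary_cross_entropy(p, y)

# Fused Sigmoid + BCE on logits: computed stably from x itself
stable = F.binary_cross_entropy_with_logits(x, y)

print(naive)    # tensor([inf])
print(clamped)  # tensor(100.)
print(stable)   # tensor(200.)
```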

$$ l(\mathbf{x}, \mathbf{y}) = L = (l_1, l_2, \cdots, l_N)^T \qquad \text{if  reduction='none'}$$

$$ l_n = -w_n \left[ y_n \log \sigma(x_n) + (1 - y_n) \log\bigl(1 - \sigma(x_n)\bigr) \right] $$

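where $N$ is the batch size and $\sigma$ is the sigmoid function. For multi-dimensional input, $L$ has one entry per element. If `reduction` is not `'none'` (default `'mean'`), then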

$$
l(\mathbf{x}, \mathbf{y}) =
\begin{cases}
\operatorname{mean}(L), & \text{if reduction='mean'} \\
\operatorname{sum}(L), & \text{if reduction='sum'}
\end{cases}
$$
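Rewriting the logarithms in $l_n$ shows where the stability comes from. The identities $\log\sigma(x) = -\log(1+e^{-x})$ and $\log(1-\sigma(x)) = -\log(1+e^{x})$ mean the per-element loss never needs $\sigma(x_n)$ itself. The last line also keeps every exponent non-positive. This is one standard stable form; PyTorch's internal formula may be arranged differently.

$$
\begin{aligned}
l_n &= -w_n\left[y_n \log\sigma(x_n) + (1-y_n)\log\bigl(1-\sigma(x_n)\bigr)\right] \\
    &= w_n\left[y_n \log\left(1+e^{-x_n}\right) + (1-y_n)\log\left(1+e^{x_n}\right)\right] \\
    &= w_n\left[\log\left(1+e^{x_n}\right) - y_n x_n\right] \\
    &= w_n\left[\max(x_n, 0) - y_n x_n + \log\left(1+e^{-\lvert x_n\rvert}\right)\right]
\end{aligned}
$$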

Note that the targets $\mathbf{y}$ should be numbers between 0 and 1.
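Targets therefore do not have to be exactly 0 or 1; soft labels such as 0.25 are valid. The derivative of the unweighted per-element loss with respect to the logit is $\sigma(x) - y$. For a soft target, the loss is therefore smallest at $x = \log\frac{y}{1-y}$, where $\sigma(x) = y$. At that point the loss equals the binary entropy of $y$ rather than 0. A small sketch with an illustrative target value:

```python
import torch
import torch.nn.functional as F

y = torch.tensor([0.25])                       # a soft target (illustrative value)
x = torch.log(y / (1 - y)).requires_grad_()    # logit(0.25), so sigmoid(x) == 0.25

loss = F.binary_cross_entropy_with_logits(x, y)
loss.backward()

print(torch.sigmoid(x).item())  # ≈ 0.25
print(loss.item())              # ≈ 0.5623, the binary entropy of 0.25 (in nats), not 0
print(x.grad)                   # ≈ 0, since sigmoid(x) - y vanishes at the optimum
```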

* Input: $(*)$, where $*$ means any number of dimensions.

* Target: $(*)$, same shape as the input.

* Output: scalar. If `reduction` is `'none'`, then $(*)$, same shape as the input.
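The formulas and shape rules above can be checked with a small re-implementation. `bce_with_logits_sketch` is a hypothetical helper, not part of PyTorch. It uses the equivalent stable form $\max(x,0) - xy + \log(1+e^{-|x|})$ of $l_n$, which follows from $\log\sigma(x) = -\log(1+e^{-x})$. The 3-D input shape is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def bce_with_logits_sketch(x, y, weight=None, reduction="mean"):
    """Hypothetical helper re-implementing l_n and the reductions above.

    Uses the stable form max(x, 0) - x*y + log(1 + exp(-|x|)); an
    illustration, not PyTorch's actual implementation.
    """
    loss = x.clamp(min=0) - x * y + torch.log1p(torch.exp(-x.abs()))
    if weight is not None:
        loss = loss * weight  # w_n, broadcast against the per-element loss
    if reduction == "none":
        return loss           # same shape as the input, (*)
    if reduction == "mean":
        return loss.mean()    # scalar
    if reduction == "sum":
        return loss.sum()     # scalar
    raise ValueError(f"unknown reduction: {reduction!r}")

torch.manual_seed(0)
x = torch.randn(2, 3, 4)  # input of shape (*), here with three dimensions
y = torch.rand(2, 3, 4)   # target: same shape, values in [0, 1]

for mode in ("none", "mean", "sum"):
    ours = bce_with_logits_sketch(x, y, reduction=mode)
    ref = F.binary_cross_entropy_with_logits(x, y, reduction=mode)
    # shapes: (2, 3, 4) for 'none', () for 'mean' and 'sum'
    print(mode, tuple(ours.shape), torch.allclose(ours, ref, atol=1e-5))
```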

In [8]:
import torch
import torch.nn.functional as F
import torch.nn as nn

In [9]:
# enter.shape=(2, 3)
enter = torch.tensor([[0.5, 0.4, 0.3],
                      [0.3, 0.2, 0.5]])
# target.shape=(2, 3)
target = torch.tensor([[0., 1., 0.],
                       [1., 0., 0.]])

# Function that measures Binary Cross Entropy between target and output logits.
print(F.binary_cross_entropy_with_logits(enter, target, reduction='mean'))  # default reduction='mean'; see BCELoss
print(F.binary_cross_entropy_with_logits(enter, target, reduction='sum'))
print(F.binary_cross_entropy_with_logits(enter, target, reduction='none'))  # shape(2, 3)

tensor(0.7780)
tensor(4.6680)
tensor([[0.9741, 0.5130, 0.8544],
        [0.5544, 0.7981, 0.9741]])


In [10]:
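# The reduction argument has the same meaning as in BCELoss
nn.BCEWithLogitsLoss(reduction='none',
                     # weight: per-element (sample) weights, which can indirectly implement class weights;
                     # here it has the same shape as enter. The default weight=None is equivalent to all ones
                     weight=torch.ones_like(enter))(enter, target)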

tensor([[0.9741, 0.5130, 0.8544],
        [0.5544, 0.7981, 0.9741]])
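Besides `weight`, `nn.BCEWithLogitsLoss` also accepts a `pos_weight` argument. It rescales only the positive-target term, $-p_c\, y_n \log\sigma(x_n)$, which makes it the more direct way to weight classes (for example, under class imbalance). A minimal check that reuses the `enter` and `target` values above, with an illustrative `pos_weight` of 3 for every column:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

enter = torch.tensor([[0.5, 0.4, 0.3],
                      [0.3, 0.2, 0.5]])
target = torch.tensor([[0., 1., 0.],
                       [1., 0., 0.]])
pos_weight = torch.full((3,), 3.0)  # illustrative: one weight per column (class), broadcast over rows

builtin = nn.BCEWithLogitsLoss(reduction='none', pos_weight=pos_weight)(enter, target)

# Manual check: pos_weight scales only the y * log(sigmoid(x)) term;
# F.logsigmoid(-x) equals log(1 - sigmoid(x))
manual = -(pos_weight * target * F.logsigmoid(enter)
           + (1 - target) * F.logsigmoid(-enter))

print(torch.allclose(builtin, manual, atol=1e-6))
```

Entries whose target is 1 are tripled compared with the unweighted output, and entries whose target is 0 are unchanged.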

### Reproducing the BCEWithLogitsLoss results by hand

In [11]:
weight = torch.ones_like(enter)  # default weight (all ones)
weight

tensor([[1., 1., 1.],
        [1., 1., 1.]])
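With the all-ones default, every element's loss is unchanged. A non-default `weight` multiplies each element's loss before the reduction. The sketch below uses illustrative per-column weights of shape (3,), which broadcast across the rows of `enter`. The weight values are assumptions, not from the original.

```python
import torch
import torch.nn.functional as F

enter = torch.tensor([[0.5, 0.4, 0.3],
                      [0.3, 0.2, 0.5]])
target = torch.tensor([[0., 1., 0.],
                       [1., 0., 0.]])
weight = torch.tensor([1.0, 2.0, 0.5])  # illustrative per-column weights, broadcast to (2, 3)

weighted = F.binary_cross_entropy_with_logits(enter, target, weight=weight, reduction='none')
unweighted = F.binary_cross_entropy_with_logits(enter, target, reduction='none')

# Column j of the weighted loss is column j of the unweighted loss times weight[j]
print(torch.allclose(weighted, unweighted * weight, atol=1e-6))
```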

In [12]:
# torch.sigmoid corresponds to σ in the formula
step1 = (target * torch.log(torch.sigmoid(enter)) + (1 - target) * torch.log(1 - torch.sigmoid(enter)))  # per-element log-likelihood (the negative of the unweighted loss)
step1

tensor([[-0.9741, -0.5130, -0.8544],
        [-0.5544, -0.7981, -0.9741]])
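The hand-written `step1` is accurate for these small logits. For large logits, however, `torch.log(1 - torch.sigmoid(x))` reaches log(0). A stable manual variant replaces the two logarithms with `F.logsigmoid`, using $\log\sigma(x)$ = `logsigmoid(x)` and $\log(1-\sigma(x))$ = `logsigmoid(-x)`. This is a sketch; the large logit below is an illustrative value.

```python
import torch
import torch.nn.functional as F

enter = torch.tensor([[0.5, 0.4, 0.3],
                      [0.3, 0.2, 0.5]])
target = torch.tensor([[0., 1., 0.],
                       [1., 0., 0.]])

step1_naive = target * torch.log(torch.sigmoid(enter)) + (1 - target) * torch.log(1 - torch.sigmoid(enter))
step1_stable = target * F.logsigmoid(enter) + (1 - target) * F.logsigmoid(-enter)
print(torch.allclose(step1_stable, step1_naive, atol=1e-6))  # same values for small logits

# For a large logit with target 0, the stable form stays finite
big, zero = torch.tensor([200.0]), torch.tensor([0.0])
big_case = zero * F.logsigmoid(big) + (1 - zero) * F.logsigmoid(-big)
print(big_case)  # tensor([-200.])
```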

In [13]:
the_sum = -torch.sum(step1 * weight)  # final result; the leading minus sign comes from the formula for l_n
the_sum

tensor(4.6680)

In [14]:
the_mean = -torch.mean(step1 * weight)  # matches reduction='mean'
the_mean

tensor(0.7780)
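As a final check, the manual pipeline can be compared with the built-in function for all three reductions. This self-contained sketch reuses the notebook's tensors. It also shows that `'mean'` divides the sum by the number of elements (6 here), not by the batch size (2).

```python
import torch
import torch.nn.functional as F

enter = torch.tensor([[0.5, 0.4, 0.3],
                      [0.3, 0.2, 0.5]])
target = torch.tensor([[0., 1., 0.],
                       [1., 0., 0.]])
weight = torch.ones_like(enter)

# Same per-element log-likelihood as step1 above
step1 = target * torch.log(torch.sigmoid(enter)) + (1 - target) * torch.log(1 - torch.sigmoid(enter))
loss_elem = -(step1 * weight)  # l_n, including the leading minus sign from the formula

for mode, manual in (("none", loss_elem), ("sum", loss_elem.sum()), ("mean", loss_elem.mean())):
    builtin = F.binary_cross_entropy_with_logits(enter, target, weight=weight, reduction=mode)
    print(mode, torch.allclose(manual, builtin, atol=1e-6))

# 'mean' = sum / number of elements
print(loss_elem.sum() / enter.numel())  # tensor(0.7780)
```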