For tensors of the same shape $y_{\text{pred}}$ and $y_{\text{true}}$, where $y_{\text{pred}}$ is the input
and $y_{\text{true}}$ is the target, we define the pointwise KL-divergence as

$$ L(y_{\text{pred}}, y_{\text{true}}) = y_{\text{true}} \cdot \log \frac{y_{\text{true}}}{y_{\text{pred}}} = y_{\text{true}} \cdot \left(\log y_{\text{true}} - \log y_{\text{pred}}\right) $$

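To avoid underflow, the function expects `input` in log-space, so `input` already holds $\log y_{\text{pred}}$; `target` may also be given in log-space by setting `log_target=True`. To summarise, this function is roughly equivalent to computing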

```python
if not log_target: # default
    loss_pointwise = target * (target.log() - input)
else:
    loss_pointwise = target.exp() * (target - input)
```

and then reducing this result depending on the argument `reduction` as

```python
if reduction == "mean":  # default
    loss = loss_pointwise.mean()
elif reduction == "batchmean":  # mathematically correct
    loss = loss_pointwise.sum() / input.size(0)
elif reduction == "sum":
    loss = loss_pointwise.sum()
else:  # reduction == "none"
    loss = loss_pointwise
```
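To make the difference between the reductions concrete, here is a minimal dependency-free sketch on a hand-made batch. The values in `pred` and `true` are hypothetical and chosen only so that each row sums to 1. It is an illustration of the formulas above, not PyTorch's actual implementation.

```python
import math

# Hypothetical toy batch: 2 samples, 3 classes; every row is a probability distribution.
pred = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
true = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3]]

# Pointwise terms y_true * (log y_true - log y_pred), one per element.
pointwise = [
    [t * (math.log(t) - math.log(p)) for p, t in zip(p_row, t_row)]
    for p_row, t_row in zip(pred, true)
]

total = sum(sum(row) for row in pointwise)
batch_size = len(pointwise)
num_elements = sum(len(row) for row in pointwise)

loss_sum = total                     # reduction="sum"
loss_mean = total / num_elements     # reduction="mean": divides by every element
loss_batchmean = total / batch_size  # reduction="batchmean": divides by batch size only

# Summing each row gives the KL divergence of one sample.
per_sample_kl = [sum(row) for row in pointwise]

print(loss_sum, loss_mean, loss_batchmean, per_sample_kl)
```

In this sketch `loss_batchmean` is the average of the per-sample KL divergences, whereas `loss_mean` is smaller by a factor equal to the number of classes. That is why the snippet above labels `"batchmean"` as the mathematically correct choice.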

In [28]:
import torch.nn as nn
import torch.nn.functional as F
import torch

In [29]:
input = F.log_softmax(torch.randn(3, 5), dim=1)
target = F.softmax(torch.rand(3, 5), dim=1)

In [30]:
# KL divergence
F.kl_div(input=input, target=target, reduction='batchmean',
         log_target=False)  # default; target holds probabilities, not log-probabilities

tensor(0.5952)

In [31]:
kl_loss = nn.KLDivLoss(reduction="batchmean")
output_nn = kl_loss(input, target)
output_nn

tensor(0.5952)
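As a sanity check, the following sketch compares a manual computation with `F.kl_div` in both target conventions. The names `log_pred`, `manual`, `builtin` and `builtin_log` are illustrative, and the fixed seed is an assumption added only to make the run repeatable.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # assumption: fixed seed, only so the check is repeatable

log_pred = F.log_softmax(torch.randn(3, 5), dim=1)  # input: log-probabilities
target = F.softmax(torch.rand(3, 5), dim=1)         # target: plain probabilities

# Manual "batchmean": sum of pointwise terms divided by the batch size.
manual = (target * (target.log() - log_pred)).sum() / log_pred.size(0)

# Built-in with a probability target (log_target=False is the default).
builtin = F.kl_div(log_pred, target, reduction="batchmean", log_target=False)

# Built-in with a log-space target: pass target.log() together with log_target=True.
builtin_log = F.kl_div(log_pred, target.log(), reduction="batchmean", log_target=True)

print(manual, builtin, builtin_log)
```

All three values should agree up to floating-point error. Passing a probability target together with `log_target=True` would instead treat the probabilities as log-probabilities and give a different number.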