F.cross_entropy returns infs sometimes due to it summing the losses. #962

@aced125

Description

Hello,

F.cross_entropy sometimes returns inf because it sums the losses.

Findings:

The following SOMETIMES returns an inf loss (these are the default options in F.cross_entropy):

import torch.nn.functional as F
import torch as th

logits = th.randn(32, 1000)
labels = th.randint(low=0, high=1000, size=(32, ))
labels[18] = -100

loss = F.cross_entropy(logits, labels, ignore_index=-100, reduction="mean")
print(loss)

The following will NOT return inf:

loss = F.cross_entropy(logits, labels, ignore_index=-100, reduction="none")
loss = loss.mean()
print(loss)

The following SOMETIMES returns inf:

loss = F.cross_entropy(logits, labels, ignore_index=-100, reduction="none")
loss = loss.sum()
print(loss)
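
For reference, with ignore_index set, the built-in "mean" reduction divides the summed loss by the number of non-ignored targets (not by labels.numel()), so the "none" + .mean() variant above is not numerically the same quantity. A minimal sketch in fp32, reusing the logits and labels from the first snippet:

per_elem = F.cross_entropy(logits, labels, ignore_index=-100, reduction="none")
n_valid = labels.ne(-100).sum()  # number of targets that are not ignore_index

mean_builtin = F.cross_entropy(logits, labels, ignore_index=-100, reduction="mean")

# The built-in "mean" is the sum of per-element losses divided by the non-ignored count.
print(th.allclose(mean_builtin, per_elem.sum() / n_valid))  # True

# per_elem.mean() divides by all 32 elements, including the ignored one,
# so it is slightly smaller than the built-in "mean".
print(th.allclose(mean_builtin, per_elem.mean()))  # False in general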

The solution I am currently using for normal cross entropy in DeepSpeed:

loss = F.cross_entropy(logits, labels, reduction="none")  # per-element losses; targets equal to the default ignore_index (-100) contribute 0
numel = labels.numel()
numel_no_mask = labels.ne(-100).sum()
norm = numel_no_mask / numel  # fraction of targets that are not ignored

loss = loss.mean() / norm  # i.e. sum of losses divided by the number of non-ignored targets
print(loss)
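
In fp32 this should give the same value as the built-in reduction="mean" with ignore_index=-100, since loss.mean() / norm reduces to loss.sum() / numel_no_mask. A quick sanity check, reusing the tensors defined above:

reference = F.cross_entropy(logits, labels, ignore_index=-100, reduction="mean")
print(th.allclose(loss, reference))  # expected: True, up to floating-point rounding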
