Hello,

F.cross_entropy sometimes returns an inf loss, apparently because it sums the losses internally before reducing.

Findings:

The following SOMETIMES returns an inf loss (these are the default options of F.cross_entropy):
```python
import torch.nn.functional as F
import torch as th

logits = th.randn(32, 1000)
labels = th.randint(low=0, high=1000, size=(32,))
labels[18] = -100
loss = F.cross_entropy(logits, labels, ignore_index=-100, reduction="mean")
print(loss)
```

The following will NOT return inf:
```python
loss = F.cross_entropy(logits, labels, ignore_index=-100, reduction="none")
loss = loss.mean()
print(loss)
```

The following SOMETIMES returns inf:
```python
loss = F.cross_entropy(logits, labels, ignore_index=-100, reduction="none")
loss = loss.sum()
print(loss)
```

The solution I am currently using to do normal cross entropy in DeepSpeed:
```python
loss = F.cross_entropy(logits, labels, reduction="none")
numel = labels.numel()
numel_no_mask = labels.ne(-100).sum()
norm = numel_no_mask / numel
loss = loss.mean() / norm
print(loss)
```
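For what it's worth, a minimal full-precision sketch (outside DeepSpeed, seed and shapes are illustrative) checking that this workaround matches the built-in `reduction="mean"` with `ignore_index`: the plain mean divides by all 32 elements, so dividing it by `numel_no_mask / numel` restores the sum-over-valid / count-of-valid that the built-in computes.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(32, 1000)
labels = torch.randint(low=0, high=1000, size=(32,))
labels[18] = -100  # one ignored position

# Built-in path: mean over the non-ignored elements only.
builtin = F.cross_entropy(logits, labels, ignore_index=-100, reduction="mean")

# Workaround: per-element losses (ignored positions contribute 0 with the
# default ignore_index=-100), then rescale the plain mean so the effective
# denominator is the count of non-ignored elements.
loss = F.cross_entropy(logits, labels, reduction="none")
numel = labels.numel()
numel_no_mask = labels.ne(-100).sum()
workaround = loss.mean() / (numel_no_mask / numel)

print(torch.allclose(builtin, workaround))
```

In fp32 the two agree up to rounding; the point of the workaround is that the mean is taken before any rescaling, so no large intermediate sum is formed in low precision.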