
Feature Request for SoftmaxCrossEntropyLoss with Ignore labels #10799

Open
zhanghang1989 opened this issue May 3, 2018 · 7 comments

Comments

@zhanghang1989
Contributor

Although we can implement this using existing operators, the current implementation is inefficient and consumes a lot of memory. See the code:

from mxnet.gluon.loss import Loss, _apply_weighting, _reshape_like


class SoftmaxCrossEntropyLoss(Loss):
    """SoftmaxCrossEntropyLoss with ignore labels"""
    def __init__(self, axis=1, sparse_label=True, from_logits=False, weight=None,
                 batch_axis=0, ignore_label=-1, size_average=False, **kwargs):
        super(SoftmaxCrossEntropyLoss, self).__init__(weight, batch_axis, **kwargs)
        self._axis = axis
        self._sparse_label = sparse_label
        self._from_logits = from_logits
        self._ignore_label = ignore_label
        self._size_average = size_average

    def hybrid_forward(self, F, pred, label, sample_weight=None):
        if not self._from_logits:
            pred = F.log_softmax(pred, axis=self._axis)
        if self._sparse_label:
            if self._size_average:
                # Dense 0/1 mask over the whole label map; this is the memory hot spot.
                valid_label_map = (label != self._ignore_label).astype('float32')
                loss = -(F.pick(pred, label, axis=self._axis, keepdims=True) * valid_label_map)
            else:
                loss = -F.pick(pred, label, axis=self._axis, keepdims=True)
                # Zero out the loss at ignored positions.
                loss = F.where(label.expand_dims(axis=self._axis) == self._ignore_label,
                               F.zeros_like(loss), loss)
        else:
            label = _reshape_like(F, label, pred)
            loss = -F.sum(pred * label, axis=self._axis, keepdims=True)
        loss = _apply_weighting(F, loss, self._weight, sample_weight)
        if self._size_average:
            # Normalize by the number of valid (non-ignored) labels.
            return F.mean(loss, axis=self._batch_axis, exclude=True) * \
                valid_label_map.size / F.sum(valid_label_map)
        else:
            return F.mean(loss, axis=self._batch_axis, exclude=True)

When the number of channels/classes is very large, the valid_label_map becomes huge. Please let me know if there is a better solution, or whether someone could implement it in the backend the way PyTorch does (https://pytorch.org/docs/stable/nn.html?highlight=crossentropyloss#torch.nn.CrossEntropyLoss)?
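For comparison, a minimal sketch of the PyTorch counterpart, where ignore_index is handled in the backend kernel so no dense valid-label mask has to be materialized by the user (the shapes and values here are illustrative):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(ignore_index=-1)
pred = torch.randn(4, 10, requires_grad=True)   # (batch, num_classes)
label = torch.tensor([1, -1, 3, 5])             # -1 entries are ignored
loss = criterion(pred, label)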

@roywei
Member

roywei commented May 4, 2018

@nswamy could you help add the Feature request and Performance labels? Thanks

@chinakook
Contributor

F.SoftmaxOutput is a solution, but it does not support weighting.
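A minimal sketch of that route, assuming the symbol API (SoftmaxOutput supports ignore_label/use_ignore and 'valid' normalization, but no per-class weighting):

import mxnet as mx

data = mx.sym.Variable('data')     # e.g. (batch, num_classes, H, W) logits
label = mx.sym.Variable('label')   # e.g. (batch, H, W), with -1 for ignored pixels
out = mx.sym.SoftmaxOutput(data=data, label=label,
                           ignore_label=-1, use_ignore=True,
                           multi_output=True, normalization='valid')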

@szha
Member

szha commented May 22, 2018

@zhanghang1989 this does not work because valid_label_map.size is used. hybrid_forward doesn't support using shape information if the block is to be hybridized.

@zhanghang1989
Contributor Author

Agreed. This is just a PoC for the feature request :)

@szha
Member

szha commented May 23, 2018

For this feature, we will need two more sparse operators for best efficiency (see the sketch after the list):

  1. where(cond, x, y) in which cond is sparse, x/y can be scalar, and return value is dense.
  2. eq(x, y) where the return value is sparse.
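A rough dense sketch of the step those two operators would accelerate, using today's dense comparison and where(); the proposal would keep cond sparse and allow scalar x/y instead:

import mxnet as mx

label = mx.nd.array([2, -1, 0])          # -1 marks ignored positions
loss = mx.nd.array([0.7, 1.2, 0.3])      # per-position negative log-likelihood

cond = (label == -1)                                        # today: a dense mask is materialized
masked = mx.nd.where(cond, mx.nd.zeros_like(loss), loss)    # [0.7, 0.0, 0.3]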

@szha added the Sparse label May 23, 2018
@zhanghang1989
Contributor Author

Another component needed is an operator that returns the NDArray.size.
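Until such an operator exists, one hybridization-safe stand-in might be to derive the count at runtime (a sketch of a workaround, not the requested operator, and it still materializes a ones tensor):

import mxnet as mx

def valid_fraction_scale(F, valid_label_map):
    # total element count / number of valid labels, without reading .size,
    # so it also works after the block is hybridized
    total = F.sum(F.ones_like(valid_label_map))
    return total / F.sum(valid_label_map)

# usage inside hybrid_forward:
#   loss = loss * valid_fraction_scale(F, valid_label_map)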

@zhanghang1989
Contributor Author

zhanghang1989 commented Aug 9, 2018
