This repository has been archived by the owner on Nov 22, 2022. It is now read-only.

Warn about order of label_vocab for binary classification (#1435)
Summary:
Pull Request resolved: #1435

As Junteng reports:

> For example, suppose you have two possible labels in your training data, namely "0" and "1". If you specify label_vocab as ["0", "1"], then "0" gets mapped to 0, and "1" gets mapped to 1. On the other hand, if you specify label_vocab as ["1", "0"], then "0" gets mapped to 1, and "1" gets mapped to 0.
> Although this is not important for multi-class classification with negative log-likelihood loss, whether a label gets mapped to 0 or 1 matters in CosineEmbeddingLoss.
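A minimal sketch of the mapping described in the report, assuming (as the quote implies) that each label is numberized as its position in `label_vocab`. The helper `build_label_to_index` is hypothetical and is not PyText's API; it only illustrates why the two orderings produce swapped indices.

```python
from typing import Dict, List


def build_label_to_index(label_vocab: List[str]) -> Dict[str, int]:
    """Hypothetical helper: map each label to its position in label_vocab."""
    return {label: index for index, label in enumerate(label_vocab)}


# The same two labels, listed in different orders.
ordered = build_label_to_index(["0", "1"])
reversed_order = build_label_to_index(["1", "0"])

print(ordered)         # {'0': 0, '1': 1}
print(reversed_order)  # {'1': 0, '0': 1}
```

With a negative log-likelihood loss the swap only renames the classes, so the model can learn the same decision either way; the problem appears only when the loss gives the numeric value 0 or 1 a fixed meaning.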

Reviewed By: m3rlin45

Differential Revision: D22641684

fbshipit-source-id: f74c83ed3320286d394546cb6394fd34e7e65f04
jeanm authored and facebook-github-bot committed Aug 29, 2020
1 parent 7bf61b2 commit 49a45b7
Showing 1 changed file with 10 additions and 5 deletions.
15 changes: 10 additions & 5 deletions pytext/data/tensorizers.py
```diff
@@ -948,24 +948,29 @@ def sort_key(self, row):
 
 
 class LabelTensorizer(Tensorizer):
-    """Numberize labels. Label can be used as either input or target """
+    """Numberize labels. Label can be used as either input or target.
+    NB: if the labels are used as targets for binary classification with a loss
+    such as cosine distance, the order of the `label_vocab` *does* matter,
+    and it should be `[negative_class, positive_class]`.
+    """
 
     __EXPANSIBLE__ = True
 
     class Config(Tensorizer.Config):
         #: The name of the label column to parse from the data source.
         column: str = "label"
-        #: Whether to allow for unknown labels at test/prediction time
+        #: Whether to allow for unknown labels at test/prediction time.
         allow_unknown: bool = False
-        #: if vocab should have pad, usually false when label is used as target
+        #: Whether vocab should have pad, usually false when label is used as target.
         pad_in_vocab: bool = False
         #: The label values, if known. Will skip initialization step if provided.
         label_vocab: Optional[List[str]] = None
         #: File with the label values. This can be used when the label space is
         #: too large to specify these as a list. The file should not contain
-        #: a header
+        #: a header.
         label_vocab_file: Optional[str] = None
-        # Indicate if it can be used to generate input Tensors for prediction
+        # Indicate if it can be used to generate input Tensors for prediction.
         is_input: bool = False
 
     @classmethod
```
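To illustrate why the new docstring asks for `[negative_class, positive_class]`, here is a pure-Python sketch of the per-pair cosine embedding loss following the formula documented for `torch.nn.CosineEmbeddingLoss` (target +1 pulls a pair together, -1 pushes it apart). The conversion of a 0/1 label index to -1/+1 via `2 * index - 1` is an assumption about how such a loss consumes the tensorizer's output, not PyText's confirmed code; `cosine`, `cosine_embedding_loss`, and `index_to_sign` are illustrative names.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_embedding_loss(
    a: Sequence[float], b: Sequence[float], target: int, margin: float = 0.0
) -> float:
    """Per-pair loss in the form documented for torch.nn.CosineEmbeddingLoss."""
    sim = cosine(a, b)
    if target == 1:
        return 1.0 - sim  # similar pair: penalize low similarity
    return max(0.0, sim - margin)  # dissimilar pair: penalize high similarity


def index_to_sign(index: int) -> int:
    """Assumed conversion: label index 0 -> -1 (negative), 1 -> +1 (positive)."""
    return 2 * index - 1


# Two nearly identical embeddings whose raw label is "1" (the positive class).
x1, x2 = [1.0, 0.0], [0.9, 0.1]
raw_label = "1"

for label_vocab in (["0", "1"], ["1", "0"]):
    index = label_vocab.index(raw_label)
    loss = cosine_embedding_loss(x1, x2, index_to_sign(index))
    print(label_vocab, index_to_sign(index), round(loss, 4))
```

With the vocab in `[negative_class, positive_class]` order the similar pair receives a near-zero loss; with the order reversed, the same pair is treated as dissimilar and receives a loss close to 1, so training would push the embeddings in the wrong direction.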
