This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Implementation of Weighted CRF Tagger (handling unbalanced datasets) #341

Merged
merged 14 commits into allenai:main on Jul 14, 2022

Conversation

eraldoluis
Contributor

Closes allennlp issue #4619.

Depends on allennlp PR #5676

Changes proposed in this pull request:

  • I implemented and experimentally compared three sample weighting strategies for CrfTagger.
  • I added two parameters to CrfTagger: label_weights and weight_strategy.
  • The parameter label_weights is a Dict[str, float] with a mapping {label : weight} to be used in the loss function in order to give different weights for each token depending on its label.
  • The parameter weight_strategy can be None, 'emission', 'emission_transition' or 'lannoy'.
  • If label_weights is given and weight_strategy is None or 'emission', then the emission score of each tag is multiplied by the corresponding weight (as given by label_weights).
  • If weight_strategy is 'emission_transition', both the emission and transition scores of each tag are multiplied by the corresponding weight.
  • If weight_strategy is 'lannoy', then we use the strategy proposed by Lannoy et al. (2019).
  • An experimental comparison of these three strategies and a brief discussion of their differences are available here.
  • Tests were created to cover the new feature.
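
To make the 'emission' strategy described above concrete, here is a minimal sketch in plain Python. It is not the code from this PR: the helper names `build_weight_vector` and `scale_emissions`, the `label_to_index` mapping, and the choice to give unlisted labels a weight of 1.0 are all assumptions made for illustration. It only shows the idea of turning a {label : weight} mapping into a per-tag vector and multiplying each tag's emission score by its weight.

```python
from typing import Dict, List


def build_weight_vector(
    label_weights: Dict[str, float],
    label_to_index: Dict[str, int],
    default: float = 1.0,
) -> List[float]:
    """Turn a {label: weight} mapping into a list indexed by tag id.

    Assumption: labels missing from `label_weights` keep `default` (1.0),
    i.e. their scores are left unchanged.
    """
    weights = [default] * len(label_to_index)
    for label, weight in label_weights.items():
        if label not in label_to_index:
            raise ValueError(f"Unknown label in label_weights: {label!r}")
        weights[label_to_index[label]] = weight
    return weights


def scale_emissions(
    emissions: List[List[float]], weights: List[float]
) -> List[List[float]]:
    """Multiply the emission score of each tag by that tag's weight.

    `emissions` has one row per token and one column per tag.
    """
    return [[score * w for score, w in zip(row, weights)] for row in emissions]


# Hypothetical tag vocabulary and weights, for illustration only.
label_to_index = {"O": 0, "B-PER": 1, "I-PER": 2}
label_weights = {"O": 0.5, "B-PER": 2.0}

weights = build_weight_vector(label_weights, label_to_index)
scaled = scale_emissions([[1.0, 2.0, 3.0], [4.0, -1.0, 0.5]], weights)
print(weights)  # [0.5, 2.0, 1.0]
print(scaled)   # [[0.5, 4.0, 3.0], [2.0, -2.0, 0.5]]
```

The 'emission_transition' and 'lannoy' strategies also change how transitions or the loss are weighted, so they are not covered by this sketch.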

Member

epwalsh left a comment


LGTM!

CHANGELOG.md (outdated review thread, resolved)
epwalsh enabled auto-merge (squash) July 14, 2022 00:42
epwalsh merged commit 97df196 into allenai:main Jul 14, 2022
2 participants