Punctuation must be checked first.
dan-zeman committed Apr 5, 2018
1 parent 963858f commit 0178047
Showing 1 changed file, remove_sense_suffixes_from_lemmas.pl, with 7 additions and 7 deletions (14 changes).
```diff
@@ -22,18 +22,18 @@
     {
         @misc = split(/\|/, $f[9]);
     }
-    # Lemma should not contain a numerical suffix that disambiguates word senses.
-    # Such disambiguation, if desired, should go to the LId attribute in MISC.
-    if($form !~ m/\d/ && $lemma =~ m/(.*\D)-?\d+$/)
+    # Lemma of punctuation symbols should be the symbols themselves, as in most other treebanks.
+    if($form =~ m/^\pP+$/ && $lemma =~ m/\PP/)
     {
-        $f[2] = $1;
+        $f[2] = $form;
         @misc = grep {!m/^LId=/} (@misc);
         push(@misc, "LId=$lemma");
     }
-    # Lemma of punctuation symbols should be the symbols themselves, as in most other treebanks.
-    elsif($form =~ m/^\pP+$/ && $lemma =~ m/\PP/)
+    # Lemma should not contain a numerical suffix that disambiguates word senses.
+    # Such disambiguation, if desired, should go to the LId attribute in MISC.
+    elsif($form !~ m/\d/ && $lemma =~ m/(.*\D)-?\d+$/)
     {
-        $f[2] = $form;
+        $f[2] = $1;
         @misc = grep {!m/^LId=/} (@misc);
         push(@misc, "LId=$lemma");
     }
```
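Why the order matters: the two rules are joined by if/elsif, so only the first matching branch runs, and both conditions can hold for the same token. A punctuation-only form contains no digits, so if its lemma ends in a number, the old order stripped the suffix while the new order replaces the lemma with the form. The sketch below reproduces both orders with a hypothetical helper, `fix_lemma`, which is not part of the script; the token (form `,`, lemma `comma1`) is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script): applies the two lemma rules
# from the diff to one token, either in the old order (sense suffix first)
# or the new order (punctuation first). As in an if/elsif chain, only the
# first rule that matches is applied. Returns (new lemma, LId value or undef).
sub fix_lemma
{
    my ($form, $lemma, $punct_first) = @_;
    # Punctuation rule: form is all punctuation, lemma has a non-punctuation character.
    my $is_punct = ($form =~ m/^\pP+$/ && $lemma =~ m/\PP/);
    # Sense-suffix rule: form has no digits, lemma ends in digits preceded by a non-digit.
    my $stripped;
    if ($form !~ m/\d/ && $lemma =~ m/(.*\D)-?\d+$/)
    {
        $stripped = $1;
    }
    if ($punct_first)
    {
        return ($form, $lemma) if $is_punct;
        return ($stripped, $lemma) if defined($stripped);
    }
    else
    {
        return ($stripped, $lemma) if defined($stripped);
        return ($form, $lemma) if $is_punct;
    }
    # Neither rule matched: lemma unchanged, no LId added.
    return ($lemma, undef);
}

# Hypothetical token: a comma whose lemma carries a numeric suffix.
foreach my $punct_first (0, 1)
{
    my ($new_lemma, $lid) = fix_lemma(',', 'comma1', $punct_first);
    printf("%s: lemma=%s LId=%s\n",
        $punct_first ? 'new order' : 'old order', $new_lemma, $lid // '-');
}
```

Run as is, this prints `old order: lemma=comma LId=comma1` and then `new order: lemma=, LId=comma1`. In both orders the original lemma is preserved in LId; only the new order yields the punctuation symbol itself as the lemma.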

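The sense-suffix condition can also be tried in isolation. The sketch below wraps the two regexes from the diff in a hypothetical helper, `strip_sense_suffix`; the sample tokens are invented. One detail it makes visible: because `(.*\D)` is greedy and a hyphen is itself a non-digit, a hyphen before the number ends up in `$1` rather than being consumed by `-?`.

```perl
use strict;
use warnings;

# Hypothetical helper isolating the sense-suffix condition from the diff.
# Returns the lemma without its trailing number, or undef if the rule does not apply.
sub strip_sense_suffix
{
    my ($form, $lemma) = @_;
    # Tokens whose form contains a digit (e.g. "F16") are left alone.
    return undef if $form =~ m/\d/;
    # (.*\D) is greedy, so when the number is preceded by "-", the hyphen
    # itself satisfies \D and stays in $1; the optional -? then matches nothing.
    return $1 if $lemma =~ m/(.*\D)-?\d+$/;
    return undef;
}

foreach my $pair (['bank', 'bank1'], ['word', 'word-2'], ['F16', 'F16'], ['dog', 'dog'])
{
    my ($form, $lemma) = @{$pair};
    my $result = strip_sense_suffix($form, $lemma);
    printf("%-4s %-6s -> %s\n", $form, $lemma, $result // '(unchanged)');
}
```

For these inputs, `bank1` becomes `bank`, `word-2` becomes `word-` (hyphen retained), and the `F16` and `dog` tokens are left unchanged.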