Plural nouns not using singular lemma, possible pluralia tantum #33

Open
rhdunn opened this issue Nov 28, 2023 · 8 comments

Comments

@rhdunn

rhdunn commented Nov 28, 2023

These are instances of nouns (NNS) and proper nouns (NNPS) marked as plural (Number=Plur) where the lemma is the plural form. Each of these should, on a case-by-case basis, either:

  1. use the singular lemma form;
  2. use Number=Ptan to mark them as plurale tantum -- see also Ambiguous lemmatization of pluralia tantum UD_English-EWT#374.

nouns

ERROR: Sentence n01009054 token 15 -- NNS/Number=Plur lemma 'species' does not match lemma-exception applied to form 'species', expected 'specie'
ERROR: Sentence n01017013 token 3 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n01023034 token 21 -- NNS/Number=Plur lemma 'species' does not match lemma-exception applied to form 'species', expected 'specie'
ERROR: Sentence n01024016 token 8 -- NNS/Number=Plur lemma 'schoolchildren' does not match plural-common-noun applied to form 'schoolchildren', expected 'schoolchild'
ERROR: Sentence n01025025 token 19 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n01027041 token 9 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n01030008 token 19 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n01035013 token 22 -- NNS/Number=Plur lemma 'whereabouts' does not match plural-common-noun applied to form 'whereabouts', expected 'whereabout'
ERROR: Sentence n01043014 token 17 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n01049033 token 10 -- NNS/Number=Plur lemma 'grassroots' does not match plural-common-noun applied to form 'grassroots', expected 'grassroot'
ERROR: Sentence n01054011 token 12 -- NNS/Number=Plur lemma 'media' does not match lemma-exception applied to form 'media', expected 'medium'
ERROR: Sentence n01054017 token 2 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n01058052 token 9 -- NNS/Number=Plur lemma 'rights' does not match plural-common-noun applied to form 'rights', expected 'right'
ERROR: Sentence n01058052 token 20 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n01064096 token 3 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n01065073 token 16 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n01070020 token 1 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'People', expected 'person'
ERROR: Sentence n01079015 token 11 -- NNS/Number=Plur lemma 'jeans' does not match plural-common-noun applied to form 'jeans', expected 'jean'
ERROR: Sentence n01079065 token 3 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n01082014 token 3 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n01089007 token 8 -- NNS/Number=Plur lemma 'clothes' does not match plural-common-noun applied to form 'clothes', expected 'clothe'
ERROR: Sentence n01095004 token 11 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n01096006 token 15 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n01096040 token 9 -- NNS/Number=Plur lemma 'traps' does not match plural-common-noun applied to form 'traps', expected 'trap'
ERROR: Sentence n01101007 token 14 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'People', expected 'person'
ERROR: Sentence n01110008 token 12 -- NNS/Number=Plur lemma 'savings' does not match plural-common-noun applied to form 'savings', expected 'saving'
ERROR: Sentence n01119019 token 11 -- NNS/Number=Plur lemma 'pyjamas' does not match plural-common-noun applied to form 'pyjamas', expected 'pyjama'
ERROR: Sentence n01128021 token 6 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n01128025 token 6 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n01128033 token 9 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n01148029 token 11 -- NNS/Number=Plur lemma 'rises' does not match plural-common-noun applied to form 'rises', expected 'rise'
ERROR: Sentence n01149027 token 7 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n01150031 token 2 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n01150051 token 7 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence w01001049 token 21 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence w01005021 token 3 -- NNS/Number=Plur lemma 'troops' does not match plural-common-noun applied to form 'troops', expected 'troop'
ERROR: Sentence w01026024 token 4 -- NNS/Number=Plur lemma 'goods' does not match plural-common-noun applied to form 'goods', expected 'good'
ERROR: Sentence w01065018 token 2 -- NNS/Number=Plur lemma 'remains' does not match plural-common-noun applied to form 'remains', expected 'remain'
ERROR: Sentence w01067032 token 6 -- NNS/Number=Plur lemma 'defens' does not match plural-common-noun applied to form 'defenses', expected 'defense'
ERROR: Sentence w01075038 token 10 -- NNS/Number=Plur lemma 'earnings' does not match plural-common-noun applied to form 'earnings', expected 'earning'
ERROR: Sentence w01075038 token 20 -- NNS/Number=Plur lemma 'earnings' does not match plural-common-noun applied to form 'earnings', expected 'earning'
ERROR: Sentence w01086037 token 13 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence w01089017 token 15 -- NNS/Number=Plur lemma 'goods' does not match plural-common-noun applied to form 'goods', expected 'good'
ERROR: Sentence w01096055 token 20 -- NNS/Number=Plur lemma 'economics' does not match plural-common-noun applied to form 'economics', expected 'economic'
ERROR: Sentence w01112060 token 11 -- NNS/Number=Plur lemma 'rights' does not match plural-common-noun applied to form 'rights', expected 'right'
ERROR: Sentence w01119100 token 7 -- NNS/Number=Plur lemma 'rights' does not match plural-common-noun applied to form 'rights', expected 'right'
ERROR: Sentence w01131060 token 4 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence w01132056 token 14 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence w01132056 token 19 -- NNS/Number=Plur lemma 'defens' does not match plural-common-noun applied to form 'defenses', expected 'defense'
ERROR: Sentence w01140032 token 14 -- NNS/Number=Plur lemma 'emissions' does not match plural-common-noun applied to form 'emissions', expected 'emission'
ERROR: Sentence n02011003 token 15 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n02027021 token 27 -- NNS/Number=Plur lemma 'troops' does not match plural-common-noun applied to form 'troops', expected 'troop'
ERROR: Sentence n02042028 token 11 -- NNS/Number=Plur lemma 'seconds' does not match plural-common-noun applied to form 'seconds', expected 'second'
ERROR: Sentence n02074009 token 19 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n04001013 token 16 -- NNS/Number=Plur lemma 'rights' does not match plural-common-noun applied to form 'rights', expected 'right'
ERROR: Sentence n05002004 token 26 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n05004025 token 11 -- NNS/Number=Plur lemma 'rights' does not match plural-common-noun applied to form 'rights', expected 'right'
ERROR: Sentence n05010004 token 14 -- NNS/Number=Plur lemma 'media' does not match lemma-exception applied to form 'media', expected 'medium'
ERROR: Sentence w02002093 token 4 -- NNS/Number=Plur lemma 'grounds' does not match plural-common-noun applied to form 'grounds', expected 'ground'
ERROR: Sentence w02012042 token 13 -- NNS/Number=Plur lemma 'politics' does not match plural-common-noun applied to form 'politics', expected 'politic'
ERROR: Sentence w02014013 token 5 -- NNS/Number=Plur lemma 'means' does not match plural-common-noun applied to form 'means', expected 'mean'
ERROR: Sentence w02015086 token 16 -- NNS/Number=Plur lemma 'troops' does not match plural-common-noun applied to form 'troops', expected 'troop'
ERROR: Sentence w02017061 token 7 -- NNS/Number=Plur lemma 'clothes' does not match plural-common-noun applied to form 'clothes', expected 'clothe'
ERROR: Sentence w02019048 token 14 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence w03002055 token 13 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence w04001027 token 21 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence w04008006 token 2 -- NNS/Number=Plur lemma 'regards' does not match plural-common-noun applied to form 'regards', expected 'regard'
ERROR: Sentence w05010026 token 25 -- NNS/Number=Plur lemma 'troops' does not match plural-common-noun applied to form 'troops', expected 'troop'

proper nouns

ERROR: Sentence n01001011 token 13 -- NNPS/Number=Plur lemma 'States' does not match plural-proper-noun applied to form 'States', expected 'State'
ERROR: Sentence n01034060 token 13 -- NNPS/Number=Plur lemma 'States' does not match plural-proper-noun applied to form 'States', expected 'State'
ERROR: Sentence n01050014 token 21 -- NNPS/Number=Plur lemma 'Services' does not match plural-proper-noun applied to form 'Services', expected 'Service'
ERROR: Sentence n01054017 token 12 -- NNPS/Number=Plur lemma 'Nations' does not match plural-proper-noun applied to form 'Nations', expected 'Nation'
ERROR: Sentence n01069006 token 3 -- NNPS/Number=Plur lemma 'Democrats' does not match plural-proper-noun applied to form 'Democrats', expected 'Democrat'
ERROR: Sentence n01110022 token 12 -- NNPS/Number=Plur lemma 'Savings' does not match plural-proper-noun applied to form 'Savings', expected 'Saving'
ERROR: Sentence n01110022 token 14 -- NNPS/Number=Plur lemma 'Investments' does not match plural-proper-noun applied to form 'Investments', expected 'Investment'
ERROR: Sentence n01130025 token 24 -- NNPS/Number=Plur lemma 'States' does not match plural-proper-noun applied to form 'States', expected 'State'
ERROR: Sentence n01131007 token 5 -- NNPS/Number=Plur lemma 'Floridians' does not match plural-proper-noun applied to form 'Floridians', expected 'Floridian'
ERROR: Sentence n01136006 token 8 -- NNPS/Number=Plur lemma 'Lovings' does not match plural-proper-noun applied to form 'Lovings', expected 'Loving'
ERROR: Sentence n01143009 token 2 -- NNPS/Number=Plur lemma 'Australians' does not match plural-proper-noun applied to form 'Australians', expected 'Australian'
ERROR: Sentence n01145034 token 23 -- NNPS/Number=Plur lemma 'Mexicans' does not match plural-proper-noun applied to form 'Mexicans', expected 'Mexican'
ERROR: Sentence w01003051 token 5 -- NNPS/Number=Plur lemma 'Islands' does not match plural-proper-noun applied to form 'Islands', expected 'Island'
ERROR: Sentence w01003051 token 11 -- NNPS/Number=Plur lemma 'Maldives' does not match plural-proper-noun applied to form 'Maldives', expected 'Maldive'
ERROR: Sentence w01005020 token 15 -- NNPS/Number=Plur lemma 'Balkans' does not match plural-proper-noun applied to form 'Balkans', expected 'Balkan'
ERROR: Sentence w01005021 token 14 -- NNPS/Number=Plur lemma 'Paeonians' does not match plural-proper-noun applied to form 'Paeonians', expected 'Paeonian'
ERROR: Sentence w01005023 token 10 -- NNPS/Number=Plur lemma 'Balkans' does not match plural-proper-noun applied to form 'Balkans', expected 'Balkan'
ERROR: Sentence w01010044 token 17 -- NNPS/Number=Plur lemma 'Slavs' does not match plural-proper-noun applied to form 'Slavs', expected 'Slav'
ERROR: Sentence w01018029 token 10 -- NNPS/Number=Plur lemma 'Cameroons' does not match plural-proper-noun applied to form 'Cameroons', expected 'Cameroon'
ERROR: Sentence w01020019 token 18 -- NNPS/Number=Plur lemma 'Phoenicians' does not match plural-proper-noun applied to form 'Phoenicians', expected 'Phoenician'
ERROR: Sentence w01020020 token 11 -- NNPS/Number=Plur lemma 'Romans' does not match plural-proper-noun applied to form 'Romans', expected 'Roman'
ERROR: Sentence w01029049 token 2 -- NNPS/Number=Plur lemma 'Americans' does not match plural-proper-noun applied to form 'Americans', expected 'American'
ERROR: Sentence w01029049 token 5 -- NNPS/Number=Plur lemma 'Americas' does not match plural-proper-noun applied to form 'Americas', expected 'America'
ERROR: Sentence w01030092 token 2 -- NNPS/Number=Plur lemma 'Alps' does not match plural-proper-noun applied to form 'Alps', expected 'Alp'
ERROR: Sentence w01030093 token 17 -- NNPS/Number=Plur lemma 'Alps' does not match plural-proper-noun applied to form 'Alps', expected 'Alp'
ERROR: Sentence w01030096 token 32 -- NNPS/Number=Plur lemma 'Alps' does not match plural-proper-noun applied to form 'Alps', expected 'Alp'
ERROR: Sentence w01033067 token 11 -- NNPS/Number=Plur lemma 'Melanesians' does not match plural-proper-noun applied to form 'Melanesians', expected 'Melanesian'
ERROR: Sentence w01037080 token 23 -- NNPS/Number=Plur lemma 'Americas' does not match plural-proper-noun applied to form 'Americas', expected 'America'
ERROR: Sentence w01039032 token 13 -- NNPS/Number=Plur lemma 'Aborigines' does not match plural-proper-noun applied to form 'Aborigines', expected 'Aborigine'
ERROR: Sentence w01045002 token 9 -- NNPS/Number=Plur lemma 'Caribs' does not match plural-proper-noun applied to form 'Caribs', expected 'Carib'
ERROR: Sentence w01045003 token 29 -- NNPS/Number=Plur lemma 'Antilles' does not match plural-proper-noun applied to form 'Antilles', expected 'Antille'
ERROR: Sentence w01045005 token 6 -- NNPS/Number=Plur lemma 'Europeans' does not match plural-proper-noun applied to form 'Europeans', expected 'European'
ERROR: Sentence w01058013 token 4 -- NNPS/Number=Plur lemma 'Games' does not match plural-proper-noun applied to form 'Games', expected 'Game'
ERROR: Sentence w01060041 token 2 -- NNPS/Number=Plur lemma 'Khitans' does not match plural-proper-noun applied to form 'Khitans', expected 'Khitan'
ERROR: Sentence w01068027 token 13 -- NNPS/Number=Plur lemma 'Hills' does not match plural-proper-noun applied to form 'Hills', expected 'Hill'
ERROR: Sentence w01069040 token 20 -- NNPS/Number=Plur lemma 'Netherlands' does not match plural-proper-noun applied to form 'Netherlands', expected 'Netherland'
ERROR: Sentence w01069094 token 26 -- NNPS/Number=Plur lemma 'Netherlands' does not match plural-proper-noun applied to form 'Netherlands', expected 'Netherland'
ERROR: Sentence w01072065 token 16 -- NNPS/Number=Plur lemma 'Romans' does not match plural-proper-noun applied to form 'Romans', expected 'Roman'
ERROR: Sentence w01076054 token 13 -- NNPS/Number=Plur lemma 'Ottomans' does not match plural-proper-noun applied to form 'Ottomans', expected 'Ottoman'
ERROR: Sentence w01079077 token 14 -- NNPS/Number=Plur lemma 'Jews' does not match plural-proper-noun applied to form 'Jews', expected 'Jew'
ERROR: Sentence w01080131 token 37 -- NNPS/Number=Plur lemma 'Athenians' does not match plural-proper-noun applied to form 'Athenians', expected 'Athenian'
ERROR: Sentence w01080131 token 39 -- NNPS/Number=Plur lemma 'Spartans' does not match plural-proper-noun applied to form 'Spartans', expected 'Spartan'
ERROR: Sentence w01085007 token 3 -- NNPS/Number=Plur lemma 'States' does not match plural-proper-noun applied to form 'States', expected 'State'
ERROR: Sentence w01085008 token 10 -- NNPS/Number=Plur lemma 'States' does not match plural-proper-noun applied to form 'States', expected 'State'
ERROR: Sentence w01085008 token 21 -- NNPS/Number=Plur lemma 'Philippines' does not match plural-proper-noun applied to form 'Philippines', expected 'Philippine'
ERROR: Sentence w01096055 token 12 -- NNPS/Number=Plur lemma 'Allies' does not match plural-proper-noun applied to form 'Allies', expected 'Ally'
ERROR: Sentence w01125034 token 13 -- NNPS/Number=Plur lemma 'Commons' does not match plural-proper-noun applied to form 'Commons', expected 'Common'
ERROR: Sentence w01125035 token 10 -- NNPS/Number=Plur lemma 'Tories' does not match plural-proper-noun applied to form 'Tories', expected 'Tory'
ERROR: Sentence w01130100 token 14 -- NNPS/Number=Plur lemma 'Stealers' does not match plural-proper-noun applied to form 'Stealers', expected 'Stealer'
ERROR: Sentence w01130101 token 24 -- NNPS/Number=Plur lemma 'Records' does not match plural-proper-noun applied to form 'Records', expected 'Record'
ERROR: Sentence w01130102 token 8 -- NNPS/Number=Plur lemma 'Humblebums' does not match plural-proper-noun applied to form 'Humblebums', expected 'Humblebum'
ERROR: Sentence w01135036 token 12 -- NNPS/Number=Plur lemma 'Muppets' does not match plural-proper-noun applied to form 'Muppets', expected 'Muppet'
ERROR: Sentence w01150044 token 3 -- NNPS/Number=Plur lemma 'Powers' does not match plural-proper-noun applied to form 'Powers', expected 'Power'
ERROR: Sentence w01150045 token 17 -- NNPS/Number=Plur lemma 'Powers' does not match plural-proper-noun applied to form 'Powers', expected 'Power'
ERROR: Sentence w01150047 token 22 -- NNPS/Number=Plur lemma 'Powers' does not match plural-proper-noun applied to form 'Powers', expected 'Power'
ERROR: Sentence n03002003 token 6 -- NNPS/Number=Plur lemma 'States' does not match plural-proper-noun applied to form 'States', expected 'State'
ERROR: Sentence n05008018 token 10 -- NNPS/Number=Plur lemma 'Americans' does not match plural-proper-noun applied to form 'Americans', expected 'American'
ERROR: Sentence w02005027 token 13 -- NNPS/Number=Plur lemma 'Franciscans' does not match plural-proper-noun applied to form 'Franciscans', expected 'Franciscan'
ERROR: Sentence w02007061 token 2 -- NNPS/Number=Plur lemma 'Ottomans' does not match plural-proper-noun applied to form 'Ottomans', expected 'Ottoman'
ERROR: Sentence w02013076 token 9 -- NNPS/Number=Plur lemma 'Socialists' does not match plural-proper-noun applied to form 'Socialists', expected 'Socialist'
ERROR: Sentence w03005012 token 14 -- NNPS/Number=Plur lemma 'Ages' does not match plural-proper-noun applied to form 'Ages', expected 'Age'
ERROR: Sentence w03005013 token 12 -- NNPS/Number=Plur lemma 'Greeks' does not match plural-proper-noun applied to form 'Greeks', expected 'Greek'
ERROR: Sentence w03006024 token 4 -- NNPS/Number=Plur lemma 'Ages' does not match plural-proper-noun applied to form 'Ages', expected 'Age'
ERROR: Sentence w03006024 token 7 -- NNPS/Number=Plur lemma 'Christians' does not match plural-proper-noun applied to form 'Christians', expected 'Christian'
ERROR: Sentence w03010096 token 9 -- NNPS/Number=Plur lemma 'Remis' does not match plural-proper-noun applied to form 'Remis', expected 'Remi'
ERROR: Sentence w04007021 token 19 -- NNPS/Number=Plur lemma 'Austrians' does not match plural-proper-noun applied to form 'Austrians', expected 'Austrian'
ERROR: Sentence w04007037 token 17 -- NNPS/Number=Plur lemma 'Isles' does not match plural-proper-noun applied to form 'Isles', expected 'Isle'
ERROR: Sentence w05001045 token 17 -- NNPS/Number=Plur lemma 'Jews' does not match plural-proper-noun applied to form 'Jews', expected 'Jew'
ERROR: Sentence w05002014 token 3 -- NNPS/Number=Plur lemma 'Andes' does not match plural-proper-noun applied to form 'Andes', expected 'Ande'
ERROR: Sentence w05002032 token 3 -- NNPS/Number=Plur lemma 'Andes' does not match plural-proper-noun applied to form 'Andes', expected 'Ande'
ERROR: Sentence w05010023 token 28 -- NNPS/Number=Plur lemma 'Caesarians' does not match plural-proper-noun applied to form 'Caesarians', expected 'Caesarian'
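
A list this long is easier to triage when grouped by lemma. The following is a minimal sketch, assuming every line follows the message shape shown above; the regular expression and the helper name `tally_lemmas` are made up for illustration and are not part of the validator itself.

```python
import re
from collections import Counter

# Pattern inferred from the log lines in this issue (an assumption);
# the validator may emit other message shapes that this does not match.
ERROR_RE = re.compile(
    r"ERROR: Sentence (?P<sent>\S+) token (?P<tok>\d+) -- "
    r"(?P<xpos>\w+)/Number=Plur lemma '(?P<lemma>[^']+)' does not match "
    r"(?P<rule>[\w-]+) applied to form '(?P<form>[^']+)', "
    r"expected '(?P<expected>[^']+)'"
)


def tally_lemmas(log_text):
    """Count how often each (xpos, lemma, expected) triple is reported."""
    counts = Counter()
    for line in log_text.splitlines():
        m = ERROR_RE.match(line.strip())
        if m:
            counts[(m["xpos"], m["lemma"], m["expected"])] += 1
    return counts


sample = """\
ERROR: Sentence n01009054 token 15 -- NNS/Number=Plur lemma 'species' does not match lemma-exception applied to form 'species', expected 'specie'
ERROR: Sentence n01017013 token 3 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
ERROR: Sentence n01025025 token 19 -- NNS/Number=Plur lemma 'people' does not match lemma-exception applied to form 'people', expected 'person'
"""

# Most frequent lemmas first, so the bulk cases can be decided once.
for (xpos, lemma, expected), n in tally_lemmas(sample).most_common():
    print(f"{n:3d}  {xpos}  {lemma} -> {expected}")
```

Grouping this way lets a frequent case such as 'people' be decided once rather than sentence by sentence.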
@AngledLuffa
Contributor

There are a lot of instances of species in EWT and GUM which are lemmatized as species, such as here. I don't see any particular feature flagging it as a non-standard plural lemma, either.

# sent_id = GUM_interview_ants-60
# s_prominence = 2
# s_type = decl
# speaker = NickBos
# transition = smooth-shift
# text = However, of course this is proof in just one species.
1       However however ADV     RB      _       7       advmod  7:advmod        Discourse=adversative-concession:134->123:3:dm-however-1040|MSeg=How-ever|SpaceAfter=No
2       ,       ,       PUNCT   ,       _       1       punct   1:punct _
3       of      of      ADP     IN      _       7       advmod  7:advmod        _
4       course  course  NOUN    NN      Number=Sing     3       fixed   3:fixed _
5       this    this    PRON    DT      Number=Sing|PronType=Dem        7       nsubj   7:nsubj Entity=(171-abstract-giv:act-cf1*-1-ana)
6       is      be      AUX     VBZ     Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   7       cop     7:cop   _
7       proof   proof   NOUN    NN      Number=Sing     0       root    0:root  Entity=(182-abstract-new-cf3-1-sgl)
8       in      in      ADP     IN      _       11      case    11:case _
9       just    just    ADV     RB      _       10      advmod  10:advmod       Entity=(6-animal-giv:inact-cf2-3-coref
10      one     one     NUM     CD      NumForm=Word|NumType=Card       11      nummod  11:nummod       _
11      species species NOUN    NN      Number=Sing     7       nmod    7:nmod:in       Entity=6)|SpaceAfter=No
12      .       .       PUNCT   .       _       7       punct   7:punct _

whereabouts and grassroots also seem like they are naturally plural words.

media, used in any context other than measuring the speed of light through jello, is again only ever really used in the plural sense, and is lemmatized that way in EWT and GUM.

I've never heard of single jeans (would that just be one leg?), and in EWT they lemmatize it with the plural, but they also lemmatize pants as the plural whereas GUM lemmatizes pants to pant, so that probably deserves a separate issue. In fact, a similar case has already been addressed once before: UniversalDependencies/UD_English-EWT#147

new issue created:
UniversalDependencies/UD_English-EWT#476

clothes and pyjamas are similarly weird to lemmatize as a singular IMO

schoolchildren and rights are pretty unambiguous. Human Rights or the like is frequently lemmatized to Right in both EWT and GUM, for example.

traps -> trap is also cut & dry IMO. Same with rises -> rise and seconds -> second

remains I kinda want to keep the plural, which is what EWT does, as opposed to making it singular such as in GUM

UniversalDependencies/UD_English-EWT#477

economics and regards almost certainly will also stay plural, right? means also. I think grounds might be a bit less clear

UniversalDependencies/UD_English-EWT#478

@martinpopel
Member

There are many more such words, some questionable. I've analyzed the Morpha lexicon, BNC annotations and several other resources when building this lemmatizer. Perhaps there are better resources today, but maybe it can still be of some use.

@rhdunn
Author

rhdunn commented Nov 28, 2023

For "species" -- as with other words such as "glasses" -- it should be annotated as a plurale tantum:

Some nouns appear only in the plural form even though they denote one thing (semantic singular); some tagsets mark this distinction.

Hence me linking to the plurale tantum issue in EWT.

Note: My validator applies plural stemming rules to Number=Plur nouns and proper nouns. For collective and plurale tantum tokens, it expects the lemma to be the lower case form for nouns and the capitalized form for proper nouns.
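
To make that rule concrete, here is a rough sketch of the kind of check described. It is not the validator's real code: the stemmer `naive_singular`, the `EXCEPTIONS` table (filled in from the 'expected' values in the log above), and the function `expected_lemma` are all assumptions for illustration.

```python
# Irregular plurals, taken from the 'expected' values in the log above.
EXCEPTIONS = {"people": "person", "media": "medium"}


def naive_singular(form):
    """Very rough English plural stemmer (an assumption, not the real rules)."""
    lower = form.lower()
    if lower.endswith("ies") and len(form) > 3:
        return form[:-3] + "y"            # Allies -> Ally
    if lower.endswith(("xes", "ches", "shes", "sses")):
        return form[:-2]                  # boxes -> box
    if lower.endswith("s") and not lower.endswith("ss"):
        return form[:-1]                  # defenses -> defense
    return form


def expected_lemma(form, xpos, feats):
    """Return the lemma a check like the one described would expect."""
    proper = xpos in ("NNP", "NNPS")
    number = feats.get("Number")
    if number in ("Ptan", "Coll"):
        # Plurale tantum / collective: the lemma keeps the surface form.
        return form[:1].upper() + form[1:] if proper else form.lower()
    if number == "Plur":
        base = EXCEPTIONS.get(form.lower()) or naive_singular(form)
        return base if proper else base.lower()
    return form if proper else form.lower()


print(expected_lemma("jeans", "NNS", {"Number": "Plur"}))
print(expected_lemma("jeans", "NNS", {"Number": "Ptan"}))
```

With Number=Plur the plural lemma is rejected in favour of a stemmed form; switching the feature to Number=Ptan makes the plural lemma the expected one, which is the second of the two options proposed at the top of this issue.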

@AngledLuffa
Contributor

That makes sense, but I've never seen the plurale tantum feature on a word in either EWT or GUM. I'd prefer to wait for the larger treebanks to settle on a standard before implementing that here

@rhdunn
Author

rhdunn commented Nov 28, 2023

I've created an issue in the docs repo to ask about adding Number=Ptan linked above.

@amir-zeldes

Discussed with @nschneid earlier, I think we'd both be willing to use Ptan. I've posted it elsewhere too, but the GUM validator also maintains a list of acceptable xpos=NNS where form=lemma. Here again for convenience:

https://github.com/amir-zeldes/gum/blob/master/_build/utils/validate.py#L725-L735

I can trivially assign Ptan to exactly the same items that the validator accepts, but it would be nice to have a definitive list/guidelines for consistency across corpora.
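
As a sketch of what that assignment could look like on a CoNLL-U line: the allowlist below is illustrative only (a few words mentioned in this thread, not the contents of the GUM validator list), and `mark_ptan` is a hypothetical helper rather than code from either repository.

```python
# Illustrative allowlist only; the real GUM list lives in validate.py.
PTAN_LEMMAS = {"jeans", "clothes", "pyjamas", "whereabouts"}


def mark_ptan(conllu_line, allowlist=PTAN_LEMMAS):
    """Set Number=Ptan on NNS tokens whose lemma equals the (lowercased) form."""
    cols = conllu_line.rstrip("\n").split("\t")
    if len(cols) != 10 or not cols[0].isdigit():
        # Comments, blank lines, multiword ranges and empty nodes pass through.
        return conllu_line
    form, lemma, xpos, feats = cols[1], cols[2], cols[4], cols[5]
    if xpos == "NNS" and lemma in allowlist and form.lower() == lemma:
        pairs = {}
        if feats != "_":
            pairs = dict(f.split("=", 1) for f in feats.split("|"))
        pairs["Number"] = "Ptan"
        # Keep features sorted by name, ignoring case.
        cols[5] = "|".join(
            f"{k}={v}" for k, v in sorted(pairs.items(), key=lambda kv: kv[0].lower())
        )
        newline = "\n" if conllu_line.endswith("\n") else ""
        return "\t".join(cols) + newline
    return conllu_line


token = "11\tjeans\tjeans\tNOUN\tNNS\tNumber=Plur\t7\tobj\t7:obj\t_"
print(mark_ptan(token))
```

Driving the change from one shared allowlist keeps the feature assignment and the validator exception in sync, which would also make it easier to compare lists across corpora.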

@nschneid
Contributor

See UniversalDependencies/docs#999 for the suggested definition of Ptan (but not a full lexical list)

@AngledLuffa
Contributor

Where are we with this at the moment? @rhdunn would you recheck the list of words which are not properly featurized / lemmatized, and we'll take a look at fixing those up?
