Corrected tokens with LEMMA reflecting the CorrectForm, not the FORM #41

Open
rhdunn opened this issue Nov 28, 2023 · 2 comments
rhdunn commented Nov 28, 2023

The following tokens have CorrectForm=in and are correctly marked up as Typo=Yes.

ERROR: Sentence n01104011 token 20 -- IN lemma 'in' does not match lowercase-form applied to form 'is', expected 'is'
ERROR: Sentence n01125009 token 15 -- IN lemma 'in' does not match lowercase-form applied to form 'is', expected 'is'

According to the UD "Wrong Morphology or Syntax" guidelines, the main fields should reflect the source text ("is") and not the corrected text ("in"). As such:

  1. The LEMMA should be "is" if the surface POS is IN, but should be "be" if it is a verb.
  2. A CorrectLemma=in should be added to reflect the corrected form.
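
As a rough illustration of the two points above, here is a minimal Python sketch that flags tokens whose LEMMA follows CorrectForm rather than FORM. The token lines are illustrative only: the ID and FORM come from the validator output, but the UPOS value (ADP) is an assumption, HEAD/DEPREL/DEPS are left as `_`, and the helper names `parse_pairs` and `lemma_follows_correction` are hypothetical, not part of any UD tool. The "proposed" line takes the IN reading from point 1; the verb reading would use a different lemma and POS.

```python
def parse_pairs(field):
    """Parse a CoNLL-U FEATS or MISC column ("A=b|C=d" or "_") into a dict."""
    if field == "_":
        return {}
    return dict(item.split("=", 1) for item in field.split("|") if "=" in item)


def lemma_follows_correction(line):
    """Return True if a Typo=Yes token has a LEMMA matching CorrectForm, not FORM."""
    cols = line.rstrip("\n").split("\t")
    form, lemma = cols[1], cols[2]
    feats = parse_pairs(cols[5])  # Typo=Yes is a feature (FEATS column)
    misc = parse_pairs(cols[9])   # CorrectForm lives in the MISC column
    if feats.get("Typo") != "Yes" or "CorrectForm" not in misc:
        return False
    correct = misc["CorrectForm"].lower()
    return lemma.lower() == correct and lemma.lower() != form.lower()


# Annotation as reported by the validator (UPOS assumed, other columns blanked).
current = "20\tis\tin\tADP\tIN\tTypo=Yes\t_\t_\t_\tCorrectForm=in"
# Annotation as proposed in this issue, for the case where the surface POS stays IN.
proposed = "20\tis\tis\tADP\tIN\tTypo=Yes\t_\t_\t_\tCorrectForm=in|CorrectLemma=in"

for label, line in (("current", current), ("proposed", proposed)):
    print(label, lemma_follows_correction(line))
```

The check is deliberately narrow: it only reports the mismatch described here and does not decide which POS reading is right.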

I haven't confirmed this, but I suspect that the POS fields need to reflect that "is" is a verb, in which case it would need CorrectUPOS and CorrectXPOS per [1] as well.

[1] https://universaldependencies.org/misc.html#correctfeature

@AngledLuffa
Contributor

There are quite a few instances of lemmas having the corrected lemma in EWT, and I'm a little shy about deviating from the standard they set there.

rhdunn commented Nov 28, 2023

I'm planning on raising similar issues in the other treebanks. Maybe this is something that needs discussion in a docs issue?

My rationale for this issue is that it deviates from UD guidelines per the linked typo and MISC annotation docs.
