Handling Missing Annotations on certain sentence #27

Closed
aashish-kumar opened this issue Jul 1, 2021 · 5 comments

Comments

@aashish-kumar

I am not able to generate M2 files when some annotators' annotations are missing for certain sentences. Setting the corrected sentence equal to the original (orig == annotated) has side effects. Am I missing something?

@chrisjbryant
Owner

Hi, I'm not sure what you mean exactly. Can you give an example?

@aashish-kumar
Author

Example (the kind of .m2 file I want to generate, assuming two annotators, where some sentences are annotated by only one of them):
```
S This is a tet sentence.
A 3 3|||...|||0
A 3 3|||...|||1

S This is second tet sentence.
A 3 3|||...|||0

S This is the third tet sentence.
A 4 4|||...|||1
```

@chrisjbryant
Owner

Aha, right, OK. In some files (mainly NUCLE and CoNLL), whenever an annotator is "missing", the implication is that they made no changes to the sentence. So in your example above, Annotator 1 thought the second sentence was already correct, and Annotator 0 thought the third sentence was already correct.

If it makes things easier, you can add a noop edit for each "missing" annotator to explicitly indicate that they made no changes to the sentence; e.g.
```
S This is second tet sentence.
A 3 3|||...|||0
A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||1

S This is the third tet sentence.
A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0
A 4 4|||...|||1
```
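As a rough illustration of the noop workaround, here is a minimal Python sketch. `add_noops` is a hypothetical helper, not part of ERRANT. It assumes the number of annotators is known in advance, that sentence blocks are separated by blank lines as in standard M2 files, and that each edit's annotator ID is its last `|||` field. It is only valid under the convention described above, where a missing annotator means "no changes".

```python
# Hypothetical helper (not part of ERRANT): make "missing" annotators explicit
# by adding ERRANT-style noop edits. Assumes blank-line-separated M2 blocks.
NOOP = "A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||{}"


def annotator_id(edit_line: str) -> int:
    """The annotator ID is the last |||-separated field of an edit line."""
    return int(edit_line.rsplit("|||", 1)[1])


def add_noops(m2_text: str, n_annotators: int) -> str:
    """Return m2_text with a noop edit for every annotator absent from a block."""
    out_blocks = []
    for block in m2_text.strip().split("\n\n"):
        lines = block.strip().split("\n")
        source, edits = lines[0], lines[1:]
        present = {annotator_id(e) for e in edits}
        for coder in range(n_annotators):
            if coder not in present:
                edits.append(NOOP.format(coder))
        # Stable sort: groups edits by annotator ID, keeping each annotator's order.
        edits.sort(key=annotator_id)
        out_blocks.append("\n".join([source] + edits))
    return "\n\n".join(out_blocks) + "\n"


if __name__ == "__main__":
    example = (
        "S This is second tet sentence.\n"
        "A 3 3|||...|||0\n"
        "\n"
        "S This is the third tet sentence.\n"
        "A 4 4|||...|||1\n"
    )
    print(add_noops(example, 2))
```

The sort runs after the noops are appended, so each noop ends up next to the edits of the other annotators in ID order rather than always at the end of the block.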

@aashish-kumar
Author

I think the case you mentioned can already be handled with the existing errant_parallel tool by setting the correction equal to the source sentence.

In my case, not all annotators annotated all of the sentences.
Continuing the example, the second sentence was not annotated by Annotator 1, i.e. we do not know whether Annotator 1 considers the second sentence correct or not. In such cases, a noop edit will not work.
I was wondering whether we could insert a symbol in the correction file to indicate that the annotation is missing.

@chrisjbryant
Owner

Ah, I see. ERRANT was never designed for this: it generally assumes that all annotators annotate the same number of sentences, since otherwise correct sentences are indistinguishable from unannotated ones.

There's no easy way around this; I wouldn't want to add a special symbol for missing annotations because it would affect other datasets as well.

The only other option I can think of is to change the annotator IDs based on how many annotations there are; e.g.
```
S Sentence1
A 2 2|||...|||0 - Annotator 0
A 2 3|||...|||1 - Annotator 1
A 4 6|||...|||2 - Annotator 2

S Sentence2
Missing - Annotator 0
A 4 5|||...|||0 - Annotator 1
A -1 -1|||...|||1 - Annotator 2

S Sentence3
A 2 2|||...|||0 - Annotator 0
Missing - Annotator 1
A 2 3|||...|||1 - Annotator 2
```

In Sentence1, all 3 annotators saw the sentence and made edits.
In Sentence2, Annotator 0 missed the sentence, Annotator 1 made an edit and became Annotator 0, and Annotator 2 saw the sentence but made no edits and became Annotator 1.
In Sentence3, Annotator 0 made an edit and stayed as Annotator 0, Annotator 1 missed the sentence, and Annotator 2 made an edit and became Annotator 1.

This kind of structure will work in ERRANT, but note that the M2 annotator IDs will no longer refer to a specific annotator (if that's important, which it usually isn't).
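The renumbering scheme above can be sketched in Python as follows. The input format is an assumption for illustration, not an ERRANT API: for each sentence, one entry per original annotator, `None` if the annotator never saw the sentence, an empty list if they saw it but made no edits, otherwise their edit lines without the trailing annotator ID. `to_m2_block` is a hypothetical name.

```python
from typing import List, Optional

# Hypothetical input shape (not an ERRANT API), one entry per original annotator:
#   None  -> the annotator never saw this sentence
#   []    -> the annotator saw it but made no edits
#   [...] -> the annotator's edit lines, without the trailing annotator ID
Annotations = List[Optional[List[str]]]

# Standard ERRANT noop edit, minus the annotator ID field.
NOOP = "A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-"


def to_m2_block(source: str, annotations: Annotations) -> str:
    """Build one M2 block, renumbering annotators so missing ones are skipped."""
    lines = ["S " + source]
    new_id = 0
    for edits in annotations:
        if edits is None:
            # Missing annotator: write nothing and do not use up an ID.
            continue
        # An annotator who made no edits still gets an explicit noop line.
        for edit in edits or [NOOP]:
            lines.append(f"{edit}|||{new_id}")
        new_id += 1
    return "\n".join(lines)


if __name__ == "__main__":
    blocks = [
        to_m2_block("Sentence2", [None, ["A 4 5|||..."], []]),
        to_m2_block("Sentence3", [["A 2 2|||..."], None, ["A 2 3|||..."]]),
    ]
    print("\n\n".join(blocks))
```

Missing annotators never consume an ID, so the IDs in each block run from 0 with no gaps, as in the Sentence2 and Sentence3 example.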
