Doccano is duplicating the words displayed for some unknown reason #1105

Closed
milos-cuculovic opened this issue Dec 10, 2020 · 11 comments · Fixed by #1583
Labels
bug Something isn't working

Comments

@milos-cuculovic

milos-cuculovic commented Dec 10, 2020

I imported a pre-labeled dataset into doccano and asked some colleagues to check and fix the annotations. In some cases, the sentences displayed in doccano show one annotated word twice. See the following screenshots:

  1. I open doccano; the word "should" is duplicated:
    https://ibb.co/3WKKd2m

  2. I remove both duplicate annotations; "should" is now shown only once:
    https://ibb.co/SJnV1mL

  3. When I try to annotate "should", the label selection window does not pop up.

  4. I annotate something before the word "should":
    https://ibb.co/nzB28bx

  5. The annotation is placed at the wrong position:
    https://ibb.co/7QpWvjR

If I remove all annotations and redo them, everything looks OK.

Your Environment

  • Operating System: Ubuntu 20.04
  • Python Version Used: 3.8.5
  • When you install doccano: 4 weeks ago
  • How did you install doccano (Heroku button etc): Docker
@issue-label-bot issue-label-bot bot added the bug Something isn't working label Dec 10, 2020
@issue-label-bot

Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.91. Please mark this comment with 👍 or 👎 to give our bot feedback!


@Hironsan
Member

Could you please show me an example of pre-labeled data? It helps me a lot to investigate the problem.

@milos-cuculovic
Author

milos-cuculovic commented Dec 10, 2020

Hi @Hironsan, sure, here you go:

{ "text": "2, The main text under the \u2018results and discussion\u2019 is unnecessarily wordy. Shorten the text and minimize the discussion and reference to literature in this section as the \u2018results and discussion\u2019 should focus on the results obtained by the authors. In some cases it is difficult to understand whether the findings are original, or already reported in the literature.", "labels": [[12,16,"LOCATION"], [197,203,"MODAL"]]}

@Hironsan
Member

Thank you for your response.

I uploaded the data, but I couldn't find the duplication.

Do you have pre-labeled data in which a word is duplicated? Thanks.

@milos-cuculovic
Author

milos-cuculovic commented Dec 10, 2020

That example is the pre-labelled text as I imported it into doccano. After labelling, it somehow ended up with duplicates.
Here is another one I just exported from doccano that has duplicates:

{"id": 6111, "text": "10. The“Adsorption (2018) 24:691” article should be acknowledged , it discussed the same material and separation but from a different approach. Similarly,“Chem. Mater., 2018, 30 (2), 447-455” also discussed the same material for a novel application, which should be briefly incorporated  into the introduction about the target materials.", "meta": {}, "annotation_approver": "xxx@yyy.com", "labels": [[7, 33, "LOCATION"], [51, 64, "ACTION"], [42, 48, "MODAL"], [113, 116, "TRIGGER"], [42, 48, "MODAL"], [52, 64, "ACTION"], [4, 41, "CONTENT"], [154, 191, "CONTENT"], [256, 262, "MODAL"], [274, 286, "ACTION"], [297, 309, "LOCATION"]]}

@milos-cuculovic
Author

https://ibb.co/BN9Fzyj

@Hironsan
Member

Hironsan commented Dec 10, 2020

Thank you.

I looked at the data and found [42, 48, "MODAL"] is duplicated.

I have a hypothesis. Did you turn on the share annotation option when you created the project?
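
The duplicated [42, 48, "MODAL"] entry mentioned above can also be found mechanically. This is a minimal sketch, assuming the export uses the JSON shape posted in this thread (a `labels` list of `[start, end, label]` triples). The helper name `find_duplicate_labels` is hypothetical and not part of doccano, and the sample record is a shortened copy of the exported one with the text omitted.

```python
import json
from collections import Counter


def find_duplicate_labels(record):
    """Return the (start, end, label) spans that occur more than once."""
    counts = Counter(tuple(span) for span in record["labels"])
    return [span for span, n in counts.items() if n > 1]


# Shortened copy of the exported record from this thread (text field omitted).
line = (
    '{"id": 6111, "labels": [[7, 33, "LOCATION"], [51, 64, "ACTION"], '
    '[42, 48, "MODAL"], [113, 116, "TRIGGER"], [42, 48, "MODAL"], '
    '[52, 64, "ACTION"]]}'
)
record = json.loads(line)
print(find_duplicate_labels(record))  # [(42, 48, 'MODAL')]
```

Only exact duplicates are reported. The near-duplicates [51, 64, "ACTION"] and [52, 64, "ACTION"] differ in their start offset, so this check does not flag them.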

@milos-cuculovic
Author

Collaborative annotation is now checked, but I do not remember if I checked it when I created the project and imported the dataset, or afterwards. Shall I disable it and try to export again?

@milos-cuculovic
Author

OK, the good news is that once I disable "Collaborative annotation", I no longer see the duplicates in doccano. However, when I export the annotations from doccano, the duplicates are still in the JSON file.

@Hironsan
Member

How to reproduce the problem

  • There are two annotators: annotator A and annotator B.
  • The share annotation option is on.
  • Annotator A annotates the word "should":

image

  • Annotator B also annotates the word "should":

image

  • Once we refresh the app, we can see the duplication.

image

Problem

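  • The SequenceAnnotation model has a unique constraint.
  • The constraint includes user, so different users can each annotate the same document with the same label and offsets.
  • This is undesirable when share annotation is enabled, because both users' annotations are then shown together and the word appears twice.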
class SequenceAnnotation(Annotation):
    ...
    class Meta:
        unique_together = ('document', 'user', 'label', 'start_offset', 'end_offset')
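
To make the effect of that constraint concrete, here is a minimal sketch in plain Python that models it with a list of rows. It is not Django's implementation and not doccano code: the `insert` helper, the row dictionaries, and the user names "A" and "B" are hypothetical stand-ins. Only the five field names come from the `unique_together` tuple above.

```python
def unique_key(row):
    # Mirrors the fields listed in unique_together above.
    return (
        row["document"],
        row["user"],
        row["label"],
        row["start_offset"],
        row["end_offset"],
    )


def insert(table, row):
    """Append row unless a row with the same unique key already exists."""
    if any(unique_key(existing) == unique_key(row) for existing in table):
        return False
    table.append(row)
    return True


table = []
by_a = {"document": 6111, "user": "A", "label": "MODAL",
        "start_offset": 42, "end_offset": 48}
by_b = dict(by_a, user="B")  # same span and label, different user

print(insert(table, by_a))  # True
print(insert(table, by_b))  # True: the user differs, so the key is new
print(insert(table, by_a))  # False: exact repeat by the same user
print(len(table))           # 2
```

Because `user` is part of the key, the second insert is accepted, and a view that shows all users' rows for the document would display the span twice.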

Solution idea

I can't come up with anything right now. I will think about a solution.

@milos-cuculovic
Author

Thanks @Hironsan! The main issue I have now is that even after disabling the share annotation option, the view is fixed for user B, but the exported JSON still contains both annotations. To work around the problem, I am querying the PostgreSQL db directly with a WHERE clause user_id = user B. Looking forward to a cleaner solution.
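
For anyone who only has the exported file, a post-processing step is another possible stopgap. This is a minimal sketch, assuming the export contains one JSON object per line with a `labels` list of `[start, end, label]` triples as in the record posted earlier. The function names `dedupe_labels` and `dedupe_export` are hypothetical, not doccano APIs.

```python
import json


def dedupe_labels(record):
    """Return a copy of record with exact duplicate label spans removed."""
    seen = set()
    unique = []
    for span in record["labels"]:
        key = tuple(span)
        if key not in seen:  # keep the first occurrence, preserve order
            seen.add(key)
            unique.append(span)
    return {**record, "labels": unique}


def dedupe_export(lines):
    """Deduplicate labels in every non-empty JSON line of an export."""
    return [
        json.dumps(dedupe_labels(json.loads(line)), ensure_ascii=False)
        for line in lines
        if line.strip()
    ]


sample = {"id": 1, "labels": [[42, 48, "MODAL"], [51, 64, "ACTION"],
                              [42, 48, "MODAL"]]}
for cleaned in dedupe_export([json.dumps(sample)]):
    print(cleaned)
```

Unlike the per-user SQL filter, this keeps annotations from every user and only drops exact repeats, so spans that differ by one offset stay in the output.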

2 participants