Multi-label text classification export issues: same classes but in different orders #1466

Closed
xegulon opened this issue Aug 4, 2021 · 0 comments · Fixed by #1558
Labels
bug Something isn't working enhancement Improvement on existing feature

Comments

xegulon commented Aug 4, 2021

How to reproduce the behaviour

We are two annotators on a multi-label classification project. When I export the annotations, some examples that my co-annotator and I labeled identically show those labels in a different order in the exported CSV:

Annotator 1:

| text | labels |
| --- | --- |
| example 1 | label1#label2#label3 |

Annotator 2:

| text | labels |
| --- | --- |
| example 1 | label2#label3#label1 |

This makes it harder to compare our annotations using these CSVs.
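Until the export itself sorts labels, the comparison can be made order-insensitive on the consumer side. The following is a minimal sketch, not doccano code: it assumes each export is a CSV with `text` and `labels` columns and that labels are joined by `#`, as in the tables above. The helper names (`normalize_labels`, `load_annotations`, `disagreements`) and the sample data are hypothetical.

```python
import csv
import io
from typing import Dict, List


def normalize_labels(cell: str, sep: str = "#") -> str:
    """Sort the labels in one CSV cell so that order no longer matters."""
    labels = [label.strip() for label in cell.split(sep) if label.strip()]
    return sep.join(sorted(labels))


def load_annotations(csv_text: str) -> Dict[str, str]:
    """Map each example text to its normalized label string.

    Assumes a header row with `text` and `labels` columns.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["text"]: normalize_labels(row["labels"]) for row in reader}


def disagreements(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Return the texts that both annotators labeled, but differently."""
    return sorted(text for text in a.keys() & b.keys() if a[text] != b[text])


# Hypothetical exports from two annotators (inline strings instead of files).
annotator_1 = (
    "text,labels\n"
    "example 1,label1#label2#label3\n"
    "example 2,label1\n"
)
annotator_2 = (
    "text,labels\n"
    "example 1,label2#label3#label1\n"
    "example 2,label2\n"
)

diff = disagreements(load_annotations(annotator_1), load_annotations(annotator_2))
print(diff)
```

With the sample data, `example 1` is treated as an agreement despite the different label order, and only `example 2` is reported. Sorting alphabetically is an arbitrary but stable choice; comparing sets of labels would work equally well.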

Your Environment

  • Operating System: Debian
  • Python Version Used: Don't know, I pulled the latest version from Docker Hub
  • When you install doccano: 3 days ago
  • How did you install doccano (Heroku button etc): Docker
Hironsan added the bug and enhancement labels Nov 12, 2021
Hironsan added a commit that referenced this issue Nov 12, 2021
Hironsan added a commit that referenced this issue Nov 12, 2021
ghontolux added a commit to ghontolux/doccano that referenced this issue Jan 25, 2022
* Fixing Data Annotation Issues

When uploading datasets, the code uses a `bulk_create` to upload
Examples and Labels. It then filters the data from the database
based on when it was created. However, [Django doesn't enforce
the list order when calling filter](https://stackoverflow.com/questions/7163640/what-is-the-default-order-of-a-list-returned-from-a-django-filter-call)
unless ordering is specified. The previous behavior mismatched labels
and examples. When this was shown in the UI, the data would show labels
for incorrect examples (i.e. a label for message #2 would be shown on
message #1). This fix enforces that the data is returned in the order
it was inserted so that the data, label pair is as expected.

* move later to copy files in Dockerfile.prod

* fix client-side types about comment as backend returns

* add annotation link in commentList page

* Add admin interface for AutoLabelingConfigs. Solves doccano#1423

Thanks to @uklft for the idea.

* Sort imports

* Return a Response with a status if the task is not yet ready.

* Remove unneeded query

Bulk create returns the created objects in the same order as they have been added.

In Postgres, the query was wrong, because ordering was not guaranteed.

* Remove unneeded import

* removing debugging statement

* iss1348: fix colors when importing labels

Signed-off-by: Dimid Duchovny <dimidd@localize.city>

* Updated various dependency and image versions

* Python version pinning fix

* update cloudformation template to modify the sample env file, now that
all the config params are stored in environment variables as per commit
5728636

* show a check button for annotators

* filter by role in the confirm API

* add a property to the ExampleState model

* separate confirm status for each role or user

* fix flake8

* fix TestExampleStateConfirmCollaborative

* fix isort

* move ExampleSerializer tests to test_document.py

* add tests

* Sequence labelling: fix background color in dark mode

* add confirmed count to statistics api response

* receive confirmed count value in frontend statistics models

* make progress data per role

* show progress of each role

* not display legend of bar-chart

* Increase the allowed max length for uploaded dataset filepath

* Bump django from 3.2.4 to 3.2.5

Bumps [django](https://github.com/django/django) from 3.2.4 to 3.2.5.
- [Release notes](https://github.com/django/django/releases)
- [Commits](django/django@3.2.4...3.2.5)

---
updated-dependencies:
- dependency-name: django
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Add EntityEditor

* Fix flake8 warnings

* Update Dockerfiles

* Add v-annotator

* Update ner demo

* Update sequence labeling page

* Support RTL in sequence labeling

* Update index.md

* Update package

* Add fields to SequenceLabelingProject

* Update serializer in ProjectDetail

* Enable to handle allowOverlapping and graphemeMode option in sequence labeling page

* Enable to create project with allowOverlapping and graphemeMode option

* Remove unused import

* Update v-annotator to fix the problem

The problem occurred when the user changes the state of RTL.
Once the state changes, the entities visually disappear.

* Show shortcut key on menu

* Add explanation for nested mode

* Add explanation for grapheme mode

* Update shortcut on menu

* Update package version

* Enable to pass grapheme-mode to EntityEditor.vue

* Add explanation for project creation

* Support doccano init on windows

* Fix cli

* Add dependency, fix doccano#1481

* Update cli, fix doccano#1408

* Add explanation on create user, close doccano#1410

* Update faq, close doccano#1496

* Remove old tests

* Update test config

* Update components, fix doccano#1541

* Add test for FormGuideline component

* Update the name of test case

* Apply linter

* Update eslint config

* Update docker-compose.dev.yml, fix doccano#1536

* Change example id from auto field to uuid field

* Update import method of urls

* Add test cases for ingest classification data

* Move test data

* Rename classification.jsonl

* Fix CoNLLDataset

* Add test cases for ingesting sequence labeling data

* Refactor test_tasks.py

* Move test data

* Add test cases for ingesting seq2seq data

* Update test cases for ingesting data to check mapping

* Improve error handling for jsonl parser

* Improve error handling for json parser

* Improve error handling for excel parser

* Add csv test case

* Add conll test case

* Change doc/example id type from number to string

* Update order of examples

* Revert primary key change

* Add migration file

* Update task queue command to support windows

* Create FUNDING.yml

* Update README.md

* Update compose files, fix doccano#1546

* Update CsvWriter, fix doccano#1497

* Sort exported labels, fix doccano#1466

* Add keyboard shortcut back to accept button

* Add how to use PostgreSQL

* Assign label colors automatically

* Add a test case for generating color function

* Fix typo: injest -> ingest

* Add PostgreSQL related env in docker compose mode

* Update README.md

* Add a validator to the text field

* Enable to ingest lines without errors even if an exception occurs during parsing

* Fix TextLineDataset to raise exception

* Enable to delete relation if one of the entities are deleted

* Update Span model

* Add a migration

* Refactor CoNLLDataset

* Enable returning the line number where an exception occurred

* Update Cleaner to change the error message by the project type

* Install mdi font

* Set icons locally

* Support offline font

* Remove font awesome script

* Add a demo image to show it in offline environment

* Fix speech to text demo

* Remove unused scripts

* Update publish-image.yml

* Enable to list all labels

* Fix unique constraint

* Add clean up after closing menu

* Update the way of clean up selected items

* Wrap by nexttick

* Update Dockerfile to change the default value of DEBUG, fix doccano#1457

* Update cleanup method

* Update unique constraint of Span

* Handle unique constraint exception

* Add try/catch to update/delete method

* Show number of deleting rows only in confirm dialog, resolve doccano#1077

* Speed up fetching comment

Co-authored-by: zanussbaum <zanussbaum@gmail.com>
Co-authored-by: youichiro <cinnamon416@gmail.com>
Co-authored-by: ayanamizuta <ayanamizuta832@gmail.com>
Co-authored-by: Roland Szabo <rolisz@gmail.com>
Co-authored-by: Dimid Duchovny <dimidd@localize.city>
Co-authored-by: rcarew@xelerance.com <rcarew@xelerance.com>
Co-authored-by: Dale Evans <dale.evans@mycanadapayday.com>
Co-authored-by: Colin Darie <colin@darie.eu>
Co-authored-by: Yosua Michael M <yosua.maranatha@grabtaxi.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Hironsan <light.tree.1.13@gmail.com>
Co-authored-by: Hiroki Nakayama <hiroki.nakayama.py@gmail.com>
Co-authored-by: Talha Oz <oztalha@users.noreply.github.com>
Co-authored-by: Fynn Schmitt-Ulms <fynnsu@outlook.com>
Co-authored-by: Zader Zheng <yumaoshu@gmail.com>
Co-authored-by: Gerhard Haß <gerhard.hass@neofonie.de>