doccano init causes a ModuleNotFoundError for chardet #1481
Labels
bug
ghontolux added a commit to ghontolux/doccano that referenced this issue on Oct 28, 2021
* Fixing Data Annotation Issues
  When uploading datasets, the code uses a `bulk_create` to upload Examples and Labels. It then filters the data from the database based on when it was created. However, [Django doesn't enforce the list order when calling filter](https://stackoverflow.com/questions/7163640/what-is-the-default-order-of-a-list-returned-from-a-django-filter-call) unless ordering is specified. The previous behavior mismatched labels and examples. When this was shown in the UI, the data would show labels for incorrect examples (i.e. a label for message #2 would be shown on message #1). This fix enforces that the data is returned in the order it was inserted so that the data, label pair is as expected.
* move later to copy files in Dockerfile.prod
* fix client-side types about comment as backend returns
* add annotation link in commentList page
* Add admin interface for AutoLabelingConfigs. Solves doccano#1423 Thanks to @uklft for the idea.
* Sort imports
* Return a Response with a status if the task is not yet ready.
* Remove unneeded query
  Bulk create returns the created objects in the same order as they have been added. In Postgres, the query was wrong, because ordering was not guaranteed.
* Remove unneed import
* removing debugging statement
* iss1348: fix colors when importing labels
  Signed-off-by: Dimid Duchovny <dimidd@localize.city>
* Updated various dependency and image versions
* Python version pinning fix
* update cloudformation template to modify the sample env file, now that all the config params are stored in environment variables as per commit 5728636
* show a check button for annotators
* filter by role in the confirm API
* add a property to the ExampleState model
* separate confirm status for each role or user
* fix flake8
* fix TestExampleStateConfirmCollaborative
* fix isort
* move ExampleSerializer tests to test_document.py
* add tests
* Sequence labelling: fix background color in dark mode
* add confirmed count to statistics api response
* receive confirmed count value in frontend statistics models
* make progress data per role
* show progress of each role
* not display legend of bar-chart
* Increase the allowed max length for uploaded dataset filepath
* Bump django from 3.2.4 to 3.2.5
  Bumps [django](https://github.com/django/django) from 3.2.4 to 3.2.5.
  - [Release notes](https://github.com/django/django/releases)
  - [Commits](django/django@3.2.4...3.2.5)
  ---
  updated-dependencies:
  - dependency-name: django
    dependency-type: direct:production
  ...
  Signed-off-by: dependabot[bot] <support@github.com>
* Add EntityEditor
* Fix flake8 warnings
* Update Dockerfiles
* Add v-annotator
* Update ner demo
* Update sequence labeling page
* Support RTL in sequence labeling
* Update index.md
* Update package
* Add fields to SequenceLabelingProject
* Update serializer in ProjectDetail
* Enable to handle allowOverlapping and graphemeMode option in sequence labeling page
* Enable to create project with allowOverlapping and graphemeMode option
* Remove unused import
* Update v-annotator to fix the problem
  The problem occurred when the user changes the state of RTL. Once the state changes, the entities are visually disappeared.
* Show shortcut key on menu
* Add explanation for nested mode
* Add explanation for grapheme mode
* Update shortcut on menu
* Update package version
* Enable to pass grapheme-mode to EntityEditor.vue
* Add explanation for project creation
* Support doccano init on windows
* Fix cli
* Add dependency, fix doccano#1481
* Update cli, fix doccano#1408
* Add explanation on create user, close doccano#1410
* Update faq, close doccano#1496

Co-authored-by: zanussbaum <zanussbaum@gmail.com>
Co-authored-by: youichiro <cinnamon416@gmail.com>
Co-authored-by: ayanamizuta <ayanamizuta832@gmail.com>
Co-authored-by: Roland Szabo <rolisz@gmail.com>
Co-authored-by: Dimid Duchovny <dimidd@localize.city>
Co-authored-by: rcarew@xelerance.com <rcarew@xelerance.com>
Co-authored-by: Dale Evans <dale.evans@mycanadapayday.com>
Co-authored-by: Colin Darie <colin@darie.eu>
Co-authored-by: Yosua Michael M <yosua.maranatha@grabtaxi.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Hironsan <light.tree.1.13@gmail.com>
Co-authored-by: Hiroki Nakayama <hiroki.nakayama.py@gmail.com>
Co-authored-by: Talha Oz <oztalha@users.noreply.github.com>
Co-authored-by: Gerhard Haß <gerhard.hass@neofonie.de>
ghontolux added a commit to ghontolux/doccano that referenced this issue on Nov 1, 2021
ghontolux added a commit to ghontolux/doccano that referenced this issue on Nov 11, 2021
ghontolux added a commit to ghontolux/doccano that referenced this issue on Jan 25, 2022
How to reproduce the behaviour
Create a fresh virtualenv in which to test, then install the latest release of doccano from PyPI (v1.4.1) with `pip install doccano`.
Now run `doccano init`. This results in a set of long stack traces, all rooted at doccano/backend/api/views/upload/dataset.py:L7, which is `import chardet`.
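To confirm that the traceback really comes from a missing package rather than a broken install, one can ask the interpreter whether the module is resolvable at all. The sketch below is not part of doccano; `missing_modules` is a hypothetical helper name, and it only assumes the standard-library `importlib.util.find_spec`. Run it inside the same virtualenv that produced the error.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment.

    Hypothetical helper for illustration; it is not part of doccano.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "chardet" is the module named in the traceback; extend the list as needed.
    missing = missing_modules(["chardet"])
    print("missing:", missing if missing else "none")
```

If `chardet` shows up as missing, the package was simply never installed into the environment, which matches the `ModuleNotFoundError` in the title.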
Running `pip install chardet` resolves the issue; `doccano init` then completes successfully and I'm able to run the app.
Your Environment
Installed with `pip install doccano` into a clean Python virtualenv.
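The `pip install chardet` workaround reported above only helps if pip installs into the same virtualenv that runs doccano. A minimal sketch of one way to make that explicit is to invoke pip through the running interpreter (`sys.executable -m pip`). The function names and the injectable `runner` parameter are assumptions made for illustration, not anything doccano provides.

```python
import subprocess
import sys
from typing import Callable, List


def pip_install_command(package: str) -> List[str]:
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


def install(package: str, runner: Callable[..., object] = subprocess.run) -> None:
    """Install `package` into the current environment.

    `runner` defaults to subprocess.run and is injectable so the command
    can be inspected without actually touching the network.
    """
    runner(pip_install_command(package), check=True)
```

Calling `install("chardet")` from the affected virtualenv would then be equivalent to the manual workaround, without depending on which `pip` happens to be first on `PATH`.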