uploader rotates images and not annotations #674

Open
geezacoleman opened this issue Apr 20, 2022 · 5 comments
@geezacoleman
Collaborator

Just tried uploading a new dataset whose images were annotated in portrait orientation. I guess the uploader removes all EXIF data, including the orientation, so the images now display as landscape; however, the annotations are still portrait and don't line up in the visualiser.

Options seem to be either:

  • prevent any portrait uploads
  • allow portrait uploads but ensure orientation is known and annotations converted appropriately
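The second option means mapping each annotation through the same rotation as the pixels. Below is a minimal sketch for COCO-style `[x, y, width, height]` boxes. It assumes only the rotation-only Orientation values 1, 3, 6 and 8 (mirrored values are not handled), and `rotate_bbox`, `bbox_to_upright`, `bbox_to_stored` and `ORIENTATION_TO_CW_TURNS` are hypothetical names, not Weed-AI code.

```python
# Hypothetical table: 90-degree clockwise turns needed to show the stored
# pixels upright for each rotation-only EXIF Orientation value.
# Mirrored orientations (2, 4, 5, 7) are deliberately not handled.
ORIENTATION_TO_CW_TURNS = {1: 0, 3: 2, 6: 1, 8: 3}


def _cw_turns(orientation):
    if orientation not in ORIENTATION_TO_CW_TURNS:
        raise ValueError(f"unsupported EXIF orientation: {orientation}")
    return ORIENTATION_TO_CW_TURNS[orientation]


def rotate_bbox(bbox, width, height, cw_turns):
    """Rotate a COCO-style [x, y, w, h] box together with its image.

    `width` and `height` are the image size before rotation. Returns the
    rotated box and the image size after rotation.
    """
    x, y, w, h = bbox
    for _ in range(cw_turns % 4):
        # One 90-degree clockwise turn maps a point (px, py) to
        # (height - py, px) and swaps the image's width and height.
        x, y, w, h = height - y - h, x, h, w
        width, height = height, width
    return [x, y, w, h], (width, height)


def bbox_to_upright(bbox, stored_width, stored_height, orientation):
    """Box on the stored (EXIF-less) pixels -> box on the upright view."""
    return rotate_bbox(bbox, stored_width, stored_height, _cw_turns(orientation))


def bbox_to_stored(bbox, upright_width, upright_height, orientation):
    """Box drawn on the upright view -> box on the stored pixels."""
    inverse = (4 - _cw_turns(orientation)) % 4
    return rotate_bbox(bbox, upright_width, upright_height, inverse)


# A 200x100 (landscape) stored image with Orientation 6 is shown upright
# as 100x200 (portrait).
print(bbox_to_upright([10, 20, 30, 40], 200, 100, 6))
# ([40, 10, 40, 30], (100, 200))
```

For annotations drawn on the upright (portrait) view, as in this dataset, `bbox_to_stored` would give coordinates matching the stored landscape pixels; `bbox_to_upright` goes the other way.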
@spikelynch spikelynch self-assigned this Apr 25, 2022
@spikelynch
Collaborator

I'll have a look at this. The release we're currently working on includes changes to the code that removes the EXIF data. There may be a third option: retain the EXIF orientation data while removing everything else.
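A minimal sketch of that third option with Pillow, assuming JPEG input; `keep_only_orientation` is a hypothetical helper, not the uploader's actual code. Re-saving a JPEG re-encodes it, so this approach is lossy.

```python
from io import BytesIO

from PIL import Image

ORIENTATION_TAG = 0x0112  # EXIF "Orientation" (tag 274)


def keep_only_orientation(jpeg_bytes: bytes) -> bytes:
    """Re-save a JPEG keeping only its EXIF Orientation tag (if any)."""
    with Image.open(BytesIO(jpeg_bytes)) as img:
        orientation = img.getexif().get(ORIENTATION_TAG)
        minimal = Image.Exif()
        if orientation is not None:
            minimal[ORIENTATION_TAG] = orientation
        out = BytesIO()
        # The EXIF block is passed explicitly, so GPS, camera make,
        # timestamps and every other original tag are not written.
        img.save(
            out,
            format="JPEG",
            exif=minimal.tobytes() if orientation is not None else b"",
        )
    return out.getvalue()
```

The trade-off compared with rotating the pixels is that every downstream viewer (and every annotation tool) then has to honour the Orientation flag consistently.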

@spikelynch
Collaborator

Hi @geezacoleman - can you send me one of the original images which got rotated? I'm assuming it was the Broadleaf Grass Weeds dataset here: https://weed-ai.sydney.edu.au/datasets/ff19a895-dd0c-447e-9499-2ea898a376f6

spikelynch pushed a commit that referenced this issue Apr 26, 2022
the image according to an Orientation tag (see issue #674)
@geezacoleman
Collaborator Author

Sent an image to you earlier today @spikelynch - thanks for looking into this.

I'll re-upload either the corrected landscape dataset or, if you'd like to test the new functionality, the portrait one.

@geezacoleman
Collaborator Author

I uploaded the rotated images and the annotations, but weirdly enough some images have been rotated in the opposite direction even though they are all landscape.

E.g. this is what Weed-AI shows:

[Screenshot: the image as displayed in Weed-AI]

and this is how it appears in Explorer:

[Screenshot: the same image as displayed in Explorer]

It seems to be inconsistent, so I'm assuming it must be the EXIF data from when I collected the images overhead that's causing the issue. Because the camera is pointing straight down, could any subtle tilt in one direction change the EXIF orientation?

@spikelynch
Collaborator

I think you're right: the device decides which way is "up" based on a very small angle and sets the EXIF orientation accordingly. (Different devices handle this differently: some phones rotate the bitmap itself, others set the EXIF flag.)

The PIL function I'm using to handle it (ImageOps.exif_transpose) reads the Orientation tag in the EXIF and transforms the image so that it looks the right way up without an EXIF flag.

The first example you sent had an EXIF Orientation of 6, which is a 90-degree rotation, so once the EXIF was removed it displayed as landscape rather than portrait.
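A rough sketch of the approach described here: bake the orientation into the pixels with `ImageOps.exif_transpose`, then save without any EXIF. `upright_copy_without_exif` is a hypothetical stand-in; the project's `copy_without_exif` may differ in detail (for example, in how it handles non-JPEG formats).

```python
from io import BytesIO

from PIL import Image, ImageOps


def upright_copy_without_exif(jpeg_bytes: bytes) -> bytes:
    """Rotate pixels according to EXIF Orientation, then drop the EXIF."""
    with Image.open(BytesIO(jpeg_bytes)) as img:
        # exif_transpose applies the rotation/flip the Orientation tag asks
        # for; with no tag (or Orientation 1) it returns an unrotated copy.
        upright = ImageOps.exif_transpose(img)
    out = BytesIO()
    # An empty exif argument means no EXIF block is written at all.
    upright.save(out, format="JPEG", exif=b"")
    return out.getvalue()
```

With Orientation 6 a 4x2 stored image comes out as 2x4, and with Orientation 3 the size is unchanged while the content is turned 180 degrees, which is why both the portrait and the "upside-down" cases should be covered by the same fix.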

The example above looks like EXIF Orientation 3 (180 degrees), but the fix should work for that as well.
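To check how consistent the orientation flags are across an overhead-collected dataset, something like the following could list the Orientation value for each image. It is a sketch assuming a flat directory of JPEGs; `orientation_report` and `summarise` are hypothetical names.

```python
from collections import Counter
from pathlib import Path

from PIL import Image

ORIENTATION_TAG = 0x0112  # EXIF "Orientation" (tag 274)
JPEG_SUFFIXES = {".jpg", ".jpeg"}


def orientation_report(image_dir) -> dict:
    """Map each JPEG filename in `image_dir` to its EXIF Orientation value.

    Images without an Orientation tag are reported as 1 (no rotation needed).
    """
    report = {}
    for path in sorted(Path(image_dir).iterdir()):
        if not path.is_file() or path.suffix.lower() not in JPEG_SUFFIXES:
            continue
        with Image.open(path) as img:
            report[path.name] = img.getexif().get(ORIENTATION_TAG, 1)
    return report


def summarise(report: dict) -> Counter:
    """Count how many images carry each Orientation value."""
    return Counter(report.values())
```

Calling `summarise(orientation_report("path/to/images"))` on a dataset shot straight down would be expected to show a mix of values (1, 3, 6, 8) if the camera was picking "up" from tiny tilts.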

We won't be deploying the new release until next week, but if you want to email me the 180-degree example I can test it locally.

hlydecker added a commit that referenced this issue Aug 4, 2022
* change cvat basepath to env variable

* add mystatic volume in dev and expose 8080 for cvat admin

* Initial version of cvat_copy api endpoint

* Added a temporary component to test the endpoint which copies images from
CVAT

* Fixed the data path

* init version of stub dataset upload

* Took out the manual cvat-copy component

* Took out the CVAT item from the uploader drop-down

* copy_cvat puts images in {user}/{upload_id}/images

* add /cvat-static to traefik rule

* Added material-ui autocomplete to frontend dependencies.
Updated docker-compose-dev.yml to use a local image of the cvat server to keep it in version sync with the frontend

* Adding cvat as a submodule - removing cvat/ path

* committing before deiniting - botched git submodules

* Removed first attempt at submodule

* revert uploader zip images

* add spinner for loading copy cvat

* CVAT as submodule

* add cypress test for cvat uploader

* submodule changes

* init cypress test for cvat creation and upload

* add DJANGO_AUTH_DEFAULT_GROUPS in docker-compose

* new version of cvat

* fix a cvat loading issue

* add other tests back

* Updated cvat submodule branch

* Added git submodule update to frontend.yml.
Changed prod docker-compose.yml to build our own.
Updated search/cvat commit to match the branch in .gitmodules

* Added CVAT basepath and default groups to docker-compose.yml, made sure that
the cvat storage was available to the weedai django container, put cvat
config settings in the environment for CI

* Fixed linting issues

* Added HTTPS_HOST and CVAT_HOST to workflow env

* Fixed path prefix for cvat-ui static files in prod docker-compose.yml

* Added a timeout to allow cvat to load, enabled electron logging

* improved wait-for-cvat-ui behaviour

* Don't try to login to weedai on return from cvat

* Reverting to waiting for 20s rather than waiting for the registration field

* Increasing wait time after importing cvat dataset

* cypress.get with a timeout to wait for CVAT login field to appear.
Checking out the submodules via the action config rather than a separate command

* Corrected syntax of submodules option

* Fix bug from uploader_helper (#615)

* fix uploader issue and black

* take scheme from request to helper

* Tweaked the random password generator so that it always has numeric characters as cvat requires this

* Set cvat default user groups in the docker-compose rather than .env
Documented CVAT_BASEPATH in the README.md
Fixed some old references to CVAT_HOST in the traefik debug sections

* Removed CVAT_DEFAULT_AUTH_GROUPS and CVAT_HOST from ci setup

* Tried clearing the weedai_sessionid cookie before returning to weedai

* More tweaks on the tests, bug fix for +10 tasks

* Extended the timeout for waiting for cvat registration to appear. Added a couple of extra clicks on the Apply button, which for reasons I don't understand, is allowing the stepper to proceed in Cypress.

* re-add CVAT dataset type

* syntax fix

* update test to new format selection UI

* Half-baked commit of ocfl deposit

* Cvat uploader in stepper was missing 'select-type' step

* Refactor of repo/deposit.py, tests are now starting to fail usefully

* add toggle to switch between dataset and editing page

* fix upload bug

* not showing edit toggle when in specific dataset summary

* add select for editing prompt

fix margin

* fix uploader issue and black

* take scheme from request to helper

* connect front and back for editing

fix the display issue because of the key

* make it more functional

fixes

* modernise UI for selecting annotation type (#618)

* modernise UI for selecting annotation type

* flake8 lint

* keep upload_id for editing

* handle submit for overwriting editing

* flake8

* remove more cvat stuffs

* more ocfl deposit groundwork

* manual fixes

* rename payload to preset

* Working out how to work with the test fixtures

* rewrite_outputs now uses the new repository interface to get fixture data

* Basic deposit and extract is passing

* Multiple datasets is passing

* Lint

* Fixed typo in weedcoco.json fixture which only showed up because duplicate checking isn't working.

* Duplicate image checking is passing

* check-in-repo test is working

* Made ocfl add and update operations atomic

* Creating zipfiles

* Added weedcoco.repo.deposit with API close to original. Adapted celery task to pass in version metadata based on the user.

* Added ocfl-py as weedcoco requirement

* Ported some of the indexing code to ocfl

* Added parameter types to docstrings for ocfl classes

* security check on ocfl identifiers

Co-authored-by: Joel Nothman <joel.nothman@gmail.com>

* fix indentation

* remove edit toggle and show editability on the dataset page

* import shutil

* change to edit icon and put it aside name

* fix

* change to tooltip

* Refactored the Repository and RepositoryDataset objects with a lot of @properties plus more code review cleanups

* Cleaned up old RepositoryDataset.path

* Test for thumbnailing added to test_deposit

* Convert pathlib.Paths to str before writing thumbnail images

* Explicitly convert temp dir in test_deposit.py to pathlib so that is_file() works

* ocfl is now in repository/ocfl because that works nicer with the ocfl-py library, tidied up a few more bugs

* Updated the indexing code to use the new ocfl deposit. Re-implemented
compress_to_download to use ocfl and be more consistent about how it
uses temp directories.

* include annotations in the image hash

* rename test images names

* change weedcoco.json in the test

* fix duplicate images test

* remove print

* Some logging in the celery reindex task to understand what's going on
with metadata

* add to mapping redis

* add redis module

* Updated remove_entity_local_record to use the Repository class

* Made redeposit_dataset work with ocfl

* Tidying up more bugs arising from the edit/ocfl merge

* image mapping init

* editing_init extracts last ocfl version into uploads/USER/DATASET

* changes on deposit

* check existing images with both original and hash file names

* flake8

* fix images dir not existing

* fix the weedcoco upload

* manage the consequence of different mode (upload, edit, admin)

* Added a field for head version to the Dataset table

* Versioned zipfiles with rollbacks

* More thorough testing of ocfl versioning

* Fixed rewrite_outputs - was using the original ocfl API which had
changed. Added sample outputs for ocfl versioning tests.

* Regenerated sample versioned datasets, versioning tests are passing

* Some of the new test fixtures which I forgot to add

* Added tests which fetch and extract versioned zipfiles (which fail)

* fix exception

* show notice email content in logger for DEBUG mode

* Fixed bugs in zipfile generation, added a dataset.update method which
ensures that a dataset's inventory gets refreshed after a new ocfl
version is added

* handle validation error in the frontend

* First working ocfl inspection / migration tool

* Added a command-line option to initialise an ocfl repo in an empty
directory

* add mode args for reindex_dataset task

* Validation for objects and repositories

* Improved text outputs for dry runs, fixed a bug in ingest

* fix

* fix traceback

* don't allow removing individual image

* show missing images when going to images section

* Fixed bug in getting head version from ocfl for dataset summary

* fix missing images when no new annotations being uploaded during editing

* flake8

* Remove test for unreferenced images

* Building the upload dir with extract_original_images, which remaps
image files and the weedcoco using redis

* fix import

* rename all images to original ones and change names in the weedcoco

* check original_name

* add extract function to deposit.py

* use extraction from deposit.py

* flake8

* fixes

* add deposit of zipfile with original image names

* missing method

* remove identical checks on weedcoco.json

* extract_original_images now only rewrites the weedcoco.json if it's
been passed a redis URL. This means that the tests can use
assert_files_equal to test contents of zipfiles again.

* Allowed Repository.deposit to pass through redis_url so that zipfiles
will be generated with remapped files

* Writing hash mappings to redis before generating zipfiles

* Tidied up extract_original_images so that it only loads the weedcoco
and images if it has a redis_url, and pretty-prints the rewritten JSON

* fix cvat cypress test

* Using repository.dataset(id) to get a RepositoryDataset object, and
assumes that the REPOSITORY_DIR setting points to the ocfl root dir

* reindex_dataset had keyword arg 'upload_mode' and was being called with
'mode'. Renamed the arg to 'mode' for consistency.

* Adding logging to image remapping

* allow un-annotated images to be thumbnailed without bbox

* allow un-annotated images to be indexed

* fix

* reindex_dataset was rebuilding the zipfile without a redis URL - this
was overwriting the zipfile with remapped images which deposit had
already built, with one with hashed image names. Fixes #666

* Replacing the exif manipulation stuff with a straight copy to test my
theory about ocfl duplicates

* print duplicate images

* show image mapping and add test

* black

* add repo id to error message

* transfer upload_id to retrieve cvat task

* Test case for file changes on deposit

* fix thumbnails path

* try fixing remove index code

* Version with no exif stripping to test

* fix index query

* Revert "fix index query"

This reverts commit 14fe988.

* Moved the exif-stripping code out of migrate_images and into its own
weedcoco.utils method. It only gets called now at upload, via the
store_tmp_image and store_tmp_image_from_zip functions in django.

* Adding test fixtures for testing deduplication of hashed images

* fix on elasticsearch query

* Added more comprehensive tests for updating a dataset which adds one
image and checks hash mapping. --rewrite-deposit-truth now updates the
hash mappings as test fixtures.

* move removing index to un-dry env

* Removed ocfl debugging function

* add dry run to test_index

* click apply cvat task twice

* check upload_id length

* Fixed bug in tests - version 2 of updates didn't have a correct unhashed
reference dataset to compare to. Test is now building this as part of
--rewrite-deposit-truth.

* Adding new test fixtures (and one of the old ones I'd forgotten to add)

* remove double clicking apply

* change attempting apply button to twice

* resolve elasticsearch conflict to proceed

* Added PIL.ImageOps.exif_transpose to copy_without_exif, which orients
the image according to an Orientation tag (see issue #674)

* load eppo singleton (#559)

* fix email logic (#649)

* Editable Datasets and CVAT (#657)

* try integrating cvat

* dev works now

* remove it from production for now

* Revert "remove it from production for now"

This reverts commit 0cf769c.

* remove exposing port 80 from nginx

* add cvat usyd branch as submodule

* to have both http and https host

* update readme for cvat

* remove HTTPS_HOST from dev

* change cvat basepath to env variable

* add mystatic volume in dev and expose 8080 for cvat admin

* Initial version of cvat_copy api endpoint

* Added a temporary component to test the endpoint which copies images from
CVAT

* Fixed the data path

* init version of stub dataset upload

* Took out the manual cvat-copy component

* Took out the CVAT item from the uploader drop-down

* copy_cvat puts images in {user}/{upload_id}/images

* add /cvat-static to traefik rule

* Added material-ui autocomplete to frontend dependencies.
Updated docker-compose-dev.yml to use a local image of the cvat server to keep it in version sync with the frontend

* Adding cvat as a submodule - removing cvat/ path

* committing before deiniting - botched git submodules

* Removed first attempt at submodule

* revert uploader zip images

* add spinner for loading copy cvat

* CVAT as submodule

* add cypress test for cvat uploader

* submodule changes

* init cypress test for cvat creation and upload

* add DJANGO_AUTH_DEFAULT_GROUPS in docker-compose

* new version of cvat

* fix a cvat loading issue

* add other tests back

* Updated cvat submodule branch

* Added git submodule update to frontend.yml.
Changed prod docker-compose.yml to build our own.
Updated search/cvat commit to match the branch in .gitmodules

* Added CVAT basepath and default groups to docker-compose.yml, made sure that
the cvat storage was available to the weedai django container, put cvat
config settings in the environment for CI

* Fixed linting issues

* Added HTTPS_HOST and CVAT_HOST to workflow env

* Fixed path prefix for cvat-ui static files in prod docker-compose.yml

* Added a timeout to allow cvat to load, enabled electron logging

* improved wait-for-cvat-ui behaviour

* Don't try to login to weedai on return from cvat

* Reverting to waiting for 20s rather than waiting for the registration field

* Increasing wait time after importing cvat dataset

* cypress.get with a timeout to wait for CVAT login field to appear.
Checking out the submodules via the action config rather than a separate command

* Corrected syntax of submodules option

* Fix bug from uploader_helper (#615)

* fix uploader issue and black

* take scheme from request to helper

* Tweaked the random password generator so that it always has numeric characters as cvat requires this

* Set cvat default user groups in the docker-compose rather than .env
Documented CVAT_BASEPATH in the README.md
Fixed some old references to CVAT_HOST in the traefik debug sections

* Removed CVAT_DEFAULT_AUTH_GROUPS and CVAT_HOST from ci setup

* Tried clearing the weedai_sessionid cookie before returning to weedai

* More tweaks on the tests, bug fix for +10 tasks

* Extended the timeout for waiting for cvat registration to appear. Added a couple of extra clicks on the Apply button, which for reasons I don't understand, is allowing the stepper to proceed in Cypress.

* re-add CVAT dataset type

* syntax fix

* update test to new format selection UI

* Half-baked commit of ocfl deposit

* Cvat uploader in stepper was missing 'select-type' step

* Refactor of repo/deposit.py, tests are now starting to fail usefully

* add toggle to switch between dataset and editing page

* fix upload bug

* not showing edit toggle when in specific dataset summary

* add select for editing prompt

fix margin

* fix uploader issue and black

* take scheme from request to helper

* connect front and back for editing

fix the display issue because of the key

* make it more functional

fixes

* modernise UI for selecting annotation type (#618)

* modernise UI for selecting annotation type

* flake8 lint

* keep upload_id for editing

* handle submit for overwriting editing

* flake8

* remove more cvat stuffs

* more ocfl deposit groundwork

* manual fixes

* rename payload to preset

* Working out how to work with the test fixtures

* rewrite_outputs now uses the new repository interface to get fixture data

* Basic deposit and extract is passing

* Multiple datasets is passing

* Lint

* Fixed typo in weedcoco.json fixture which only showed up because duplicate checking isn't working.

* Duplicate image checking is passing

* check-in-repo test is working

* Made ocfl add and update operations atomic

* Creating zipfiles

* Added weedcoco.repo.deposit with API close to original. Adapted celery task to pass in version metadata based on the user.

* Added ocfl-py as weedcoco requirement

* Enable django access logs

* Ported some of the indexing code to ocfl

* Async unpacking of zip files with polling

* Stop raising exceptions in an unexceptional upload_status

* prod celery now needs tus_dir

* Added parameter types to docstrings for ocfl classes

* security check on ocfl identifiers

Co-authored-by: Joel Nothman <joel.nothman@gmail.com>

* fix indentation

* remove edit toggle and show editability on the dataset page

* import shutil

* change to edit icon and put it aside name

* fix

* change to tooltip

* rm trailing whitespace

* black

* Refactored the Repository and RepositoryDataset objects with a lot of @properties plus more code review cleanups

* Cleaned up old RepositoryDataset.path

* Test for thumbnailing added to test_deposit

* Convert pathlib.Paths to str before writing thumbnail images

* Explicitly convert temp dir in test_deposit.py to pathlib so that is_file() works

* ocfl is now in repository/ocfl because that works nicer with the ocfl-py library, tidied up a few more bugs

* Updated the indexing code to use the new ocfl deposit. Re-implemented
compress_to_download to use ocfl and be more consistent about how it
uses temp directories.

* include annotations in the image hash

* rename test images names

* change weedcoco.json in the test

* fix duplicate images test

* remove print

* Some logging in the celery reindex task to understand what's going on
with metadata

* add to mapping redis

* add redis module

* Updated remove_entity_local_record to use the Repository class

* Made redeposit_dataset work with ocfl

* Tidying up more bugs arising from the edit/ocfl merge

* image mapping init

* editing_init extracts last ocfl version into uploads/USER/DATASET

* changes on deposit

* check existing images with both original and hash file names

* flake8

* fix images dir not existing

* fix the weedcoco upload

* manage the consequence of different mode (upload, edit, admin)

* Added a field for head version to the Dataset table

* Versioned zipfiles with rollbacks

* More thorough testing of ocfl versioning

* Fixed rewrite_outputs - was using the original ocfl API which had
changed. Added sample outputs for ocfl versioning tests.

* Regenerated sample versioned datasets, versioning tests are passing

* Some of the new test fixtures which I forgot to add

* Added tests which fetch and extract versioned zipfiles (which fail)

* fix exception

* show notice email content in logger for DEBUG mode

* Fixed bugs in zipfile generation, added a dataset.update method which
ensures that a dataset's inventory gets refreshed after a new ocfl
version is added

* handle validation error in the frontend

* First working ocfl inspection / migration tool

* Added a command-line option to initialise an ocfl repo in an empty
directory

* add mode args for reindex_dataset task

* Validation for objects and repositories

* Improved text outputs for dry runs, fixed a bug in ingest

* fix

* fix traceback

* don't allow removing individual image

* show missing images when going to images section

* Fixed bug in getting head version from ocfl for dataset summary

* fix missing images when no new annotations being uploaded during editing

* flake8

* Remove test for unreferenced images

* Building the upload dir with extract_original_images, which remaps
image files and the weedcoco using redis

* fix import

* rename all images to original ones and change names in the weedcoco

* check original_name

* add extract function to deposit.py

* use extraction from deposit.py

* flake8

* fixes

* add deposit of zipfile with original image names

* missing method

* remove identical checks on weedcoco.json

* extract_original_images now only rewrites the weedcoco.json if it's
been passed a redis URL. This means that the tests can use
assert_files_equal to test contents of zipfiles again.

* Allowed Repository.deposit to pass through redis_url so that zipfiles
will be generated with remapped files

* Writing hash mappings to redis before generating zipfiles

* Tidied up extract_original_images so that it only loads the weedcoco
and images if it has a redis_url, and pretty-prints the rewritten JSON

* fix cvat cypress test

* Using repository.dataset(id) to get a RepositoryDataset object, and
assumes that the REPOSITORY_DIR setting points to the ocfl root dir

* reindex_dataset had keyword arg 'upload_mode' and was being called with
'mode'. Renamed the arg to 'mode' for consistency.

* Adding logging to image remapping

* allow un-annotated images to be thumbnailed without bbox

* allow un-annotated images to be indexed

* fix

* reindex_dataset was rebuilding the zipfile without a redis URL - this
was overwriting the zipfile with remapped images which deposit had
already built, with one with hashed image names. Fixes #666

* print duplicate images

* show image mapping and add test

* black

* add repo id to error message

* transfer upload_id to retrieve cvat task

* fix thumbnails path

* try fixing remove index code

* fix index query

* Revert "fix index query"

This reverts commit 14fe988.

* fix on elasticsearch query

* move removing index to un-dry env

* add dry run to test_index

* click apply cvat task twice

* check upload_id length

* remove double clicking apply

* change attempting apply button to twice

* resolve elasticsearch conflict to proceed

* Older versions of a dataset are available to download via a drop-down
underneath the yellow "download in weedcoco" button

* Add the amount of un-annotated images to weedcoco stat

* rephrase the error message

* change version tag to ocfl version

* set version_tag to be a number

* change version id to string

* update index by switching head to false

* try editing index with source

* another try update index

* change version to string

* add basic filter to frontend

* default to be latest version

* rename value to defaultValue

* Update the submodule hash for the limits/logstash_async fix

Co-authored-by: Elevn Li <zheng.li411@gmail.com>
Co-authored-by: Mike Lynch <mlynch@sydney.edu.au>
Co-authored-by: Elevn Li <51065213+ElevnLi@users.noreply.github.com>
Co-authored-by: Joel Nothman <joel.nothman@gmail.com>

* Rewrote some of the test fixture images, cleaned up an error which was
introduced during the merge

* OCFL repository migration script (#659)

* ocfl migration as manage.py subcommand

* Explicit relative import because it was breaking when called via
manage.py in django

* Moved the ocfl_migrate command to the correct place

* The ocfl_migration command initialises the ocfl repository if it
doesn't already exist

* tasks.migrate_to_ocfl reindexes each dataset after migrating, and the
command-line tool gives feedback on each dataset as it goes

* Added check for when an incoming weedcoco.json has nothing in common
with the dataset's images

* Fixed a migration bug I introduced when I merged a bugfix branch to
this one

* Moved the test for weedcoco.json compatibility with earlier versions
to the correct view

Co-authored-by: Mike Lynch <mlynch@sydney.edu.au>

* hack cypress test

* Add interval of checking status after submit (#644)

* add interval of checking status after submit

* resolve conflict

* Remove reliance on EPPO download by using GBIF API (#593)

* Remove reliance on EPPO download

Fixes #590

Resolves #559

* Ensure no error in upload_status

* Try to debug failed 'Set Form'

* Move logging statement

* This might set correct env

* More logging

* longer wait after Set Form

* Fix flake8 issue from newer version

* rm logging

* Update search/django/weedid/views.py

* fix: remove eppo cache path var

* flake8

Co-authored-by: Henry Lydecker <henry.lydecker@gmail.com>
Co-authored-by: Elevn Li <zheng.li411@gmail.com>

* Allow admin to remove dataset (upload and repository record, database entity, zipfile and elasticsearch index) (#680)

* backend module changes

* add more constraint

* add task and admin command

* change default host value

* hack cypress test

* flake8

* Allow arbitrary subcategorisation of valid categories by dataset contributors (#636)

* Allow for more fields in category editor

* Add possibility of subcategory to category name

* Adapt tests for addition of mapping summary on accordion summary

* fix

* finally found this bug..

* fix issues

* add subcat to test case

* fix test

* more fix on cypress test

* raise exception for category with the same name

* remove apply button

* frontend fix editing category

* finally found this bug

Co-authored-by: Elevn Li <zheng.li411@gmail.com>

* update readme for build and start (#682)

* add decorator

* remove HTTP_HOST

* another try

* upsize single image size to 30MB

* change back to HTTPS_HOST

* call check_image_zip if there is an error

* fix the MPO format out of PIL

* remove env variable for cvat from docker-compose

* remove apply button and use next as control for cvat tasks retrieval

* fix the inconsistency of missing image error for weedcoco and other uploads

* increase size limit from backend

* modify cypress test

* more fix on frontend

* check status code instead of status code

* remove the next hack

* config tus to restart when docker does (#689)

* Instructions hotfix (#691)

* Update README.md

* Add Django build

* Update README.md

* Add Annotation Tab (#690)

* Create annotate.md

* Update annotate.md

* Update annotate.md

Co-authored-by: Elevn Li <zheng.li411@gmail.com>
Co-authored-by: Mike Lynch <mlynch@sydney.edu.au>
Co-authored-by: Mike Lynch <mike@mikelynch.org>
Co-authored-by: Elevn Li <51065213+ElevnLi@users.noreply.github.com>
Co-authored-by: Joel Nothman <joel.nothman@gmail.com>
Co-authored-by: jnot4610 <jnot4610@ac02ztcxgmd6t.lan>
hlydecker added a commit that referenced this issue Aug 10, 2022
* Initial version of cvat_copy api endpoint

* Added a temporary component to test the endpoint which copies images from
CVAT

* Fixed the data path

* init version of stub dataset upload

* Took out the manual cvat-copy component

* Took out the CVAT item from the uploader drop-down

* copy_cvat puts images in {user}/{upload_id}/images

* add /cvat-static to traefik rule

* Added material-ui autocomplete to frontend dependencies.
Updated docker-compose-dev.yml to use a local image of the cvat server to keep it in version sync with the frontend

* Adding cvat as a submodule - removing cvat/ path

* committing before deiniting - botched git submodules

* Removed first attempt at submodule

* revert uploader zip images

* add spinner for loading copy cvat

* CVAT as submodule

* add cypress test for cvat uploader

* submodule changes

* init cypress test for cvat creation and upload

* add DJANGO_AUTH_DEFAULT_GROUPS in docker-compose

* new version of cvat

* fix a cvat loading issue

* add other tests back

* Updated cvat submodule branch

* Added git submodule update to frontend.yml.
Changed prod docker-compose.yml to build our own.
Updated search/cvat commit to match the branch in .gitmodules

* Added CVAT basepath and default groups to docker-compose.yml, made sure that
the cvat storage was available to the weedai django container, put cvat
config settings in the environment for CI

* Fixed linting issues

* Added HTTPS_HOST and CVAT_HOST to workflow env

* Fixed path prefix for cvat-ui static files in prod docker-compose.yml

* Added a timeout to allow cvat to load, enabled electron logging

* improved wait-for-cvat-ui behaviour

* Don't try to login to weedai on return from cvat

* Reverting to waiting for 20s rather than waiting for the registration field

* Increasing wait time after importing cvat dataset

* cypress.get with a timeout to wait for CVAT login field to appear.
Checking out the submodules via the action config rather than a separate command

* Corrected syntax of submodules option

* Fix bug from uploader_helper (#615)

* fix uploader issue and black

* take scheme from request to helper

* Tweaked the random password generator so that it always has numeric characters as cvat requires this

* Set cvat default user groups in the docker-compose rather than .env
Documented CVAT_BASEPATH in the README.md
Fixed some old references to CVAT_HOST in the traefik debug sections

* Removed CVAT_DEFAULT_AUTH_GROUPS and CVAT_HOST from ci setup

* Tried clearing the weedai_sessionid cookie before returning to weedai

* More tweaks on the tests, bug fix for +10 tasks

* Extended the timeout for waiting for cvat registration to appear. Added a couple of extra clicks on the Apply button, which for reasons I don't understand, is allowing the stepper to proceed in Cypress.

* re-add CVAT dataset type

* syntax fix

* update test to new format selection UI

* Half-baked commit of ocfl deposit

* Cvat uploader in stepper was missing 'select-type' step

* Refactor of repo/deposit.py, tests are now starting to fail usefully

* add toggle to switch between dataset and editing page

* fix upload bug

* not showing edit toggle when in specific dataset summary

* add select for editing prompt

fix margin

* fix uploader issue and black

* take scheme from request to helper

* connect front and back for editing

fix the display issue because of the key

* make it more functional

fixes

* modernise UI for selecting annotation type (#618)

* modernise UI for selecting annotation type

* flake8 lint

* keep upload_id for editing

* handle submit for overwriting editing

* flake8

* remove more cvat stuff

* more ocfl deposit groundwork

* manual fixes

* rename payload to preset

* Working out how to work with the test fixtures

* rewrite_outputs now uses the new repository interface to get fixture data

* Basic deposit and extract is passing

* Multiple datasets is passing

* Lint

* Fixed typo in weedcoco.json fixture which only showed up because duplicate checking isn't working.

* Duplicate image checking is passing

* check-in-repo test is working

* Made ocfl add and update operations atomic

* Creating zipfiles

* Added weedcoco.repo.deposit with API close to original. Adapted celery task to pass in version metadata based on the user.

* Added ocfl-py as weedcoco requirement

* Ported some of the indexing code to ocfl

* Added parameter types to docstrings for ocfl classes

* security check on ocfl identifiers

Co-authored-by: Joel Nothman <joel.nothman@gmail.com>

* fix indentation

* remove edit toggle and show editability on the dataset page

* import shutil

* change to edit icon and put it beside the name

* fix

* change to tooltip

* Refactored the Repository and RepositoryDataset objects with a lot of @properties plus more code review cleanups

* Cleaned up old RepositoryDataset.path

* Test for thumbnailing added to test_deposit

* Convert pathlib.Paths to str before writing thumbnail images

* Explicitly convert temp dir in test_deposit.py to pathlib so that is_file() works

* ocfl is now in repository/ocfl because that works nicer with the ocfl-py library, tidied up a few more bugs

* Updated the indexing code to use the new ocfl deposit. Re-implemented
compress_to_download to use ocfl and be more consistent about how it
uses temp directories.

* include annotations in the image hash

* rename test image names

* change weedcoco.json in the test

* fix duplicate images test

* remove print

* Some logging in the celery reindex task to understand what's going on
with metadata

* add to mapping redis

* add redis module

* Updated remove_entity_local_record to use the Repository class

* Made redeposit_dataset work with ocfl

* Tidying up more bugs arising from the edit/ocfl merge

* image mapping init

* editing_init extracts last ocfl version into uploads/USER/DATASET

* changes on deposit

* check existing images with both original and hash file names

* flake8

* fix images dir not existing

* fix the weedcoco upload

* manage the consequences of different modes (upload, edit, admin)

* Added a field for head version to the Dataset table

* Versioned zipfiles with rollbacks

* More thorough testing of ocfl versioning

* Fixed rewrite_outputs - was using the original ocfl API which had
changed. Added sample outputs for ocfl versioning tests.

* Regenerated sample versioned datasets, versioning tests are passing

* Some of the new test fixtures which I forgot to add

* Added tests which fetch and extract versioned zipfiles (which fail)

* fix exception

* show notice email content in logger for DEBUG mode

* Fixed bugs in zipfile generation, added a dataset.update method which
ensures that a dataset's inventory gets refreshed after a new ocfl
version is added

* handle validation error in the frontend

* First working ocfl inspection / migration tool

* Added a command-line option to initialise an ocfl repo in an empty
directory

* add mode args for reindex_dataset task

* Validation for objects and repositories

* Improved text outputs for dry runs, fixed a bug in ingest

* fix

* fix traceback

* don't allow removing individual image

* show missing images when going to images section

* Fixed bug in getting head version from ocfl for dataset summary

* fix missing images when no new annotations are uploaded during editing

* flake8

* Remove test for unreferenced images

* Building the upload dir with extract_original_images, which remaps
image files and the weedcoco using redis

* fix import

* rename all images to original ones and change names in the weedcoco

* check original_name

* add extract function to deposit.py

* use extraction from deposit.py

* flake8

* fixes

* add deposit of zipfile with original image names

* missing method

* remove identical checks on weedcoco.json

* extract_original_images now only rewrites the weedcoco.json if it's
been passed a redis URL. This means that the tests can use
assert_files_equal to test contents of zipfiles again.

* Allowed Repository.deposit to pass through redis_url so that zipfiles
will be generated with remapped files

* Writing hash mappings to redis before generating zipfiles

* Tidied up extract_original_images so that it only loads the weedcoco
and images if it has a redis_url, and pretty-prints the rewritten JSON

* fix cvat cypress test

* Using repository.dataset(id) to get a RepositoryDataset object, and
assumes that the REPOSITORY_DIR setting points to the ocfl root dir

* reindex_dataset had keyword arg 'upload_mode' and was being called with
'mode'. Renamed the arg to 'mode' for consistency.

* Adding logging to image remapping

* allow un-annotated images to be thumbnailed without bbox

* allow un-annotated images to be indexed

* fix

* reindex_dataset was rebuilding the zipfile without a redis URL - this
was overwriting the zipfile with remapped images, which deposit had
already built, with one that used hashed image names. Fixes #666

* Replacing the exif manipulation stuff with a straight copy to test my
theory about ocfl duplicates

* print duplicate images

* show image mapping and add test

* black

* add repo id to error message

* transfer upload_id to retrieve cvat task

* Test case for file changes on deposit

* fix thumbnails path

* try fixing remove index code

* Version with no exif stripping to test

* fix index query

* Revert "fix index query"

This reverts commit 14fe988.

* Moved the exif-stripping code out of migrate_images and into its own
weedcoco.utils method. It only gets called now at upload, via the
store_tmp_image and store_tmp_image_from_zip functions in django.

* Adding test fixtures for testing deduplication of hashed images

* fix on elasticsearch query

* Added more comprehensive tests for updating a dataset which adds one
image and checks hash mapping. --rewrite-deposit-truth now updates the
hash mappings as test fixtures.

* move removing index to un-dry env

* Removed ocfl debugging function

* add dry run to test_index

* click apply cvat task twice

* check upload_id length

* Fixed bug in tests - version 2 of updates didn't have a correct unhashed
reference dataset to compare to. Test is now building this as part of
--rewrite-deposit-truth.

* Adding new test fixtures (and one of the old ones I'd forgotten to add)

* remove double clicking apply

* change apply button click attempts to two

* resolve elasticsearch conflict to proceed

* Added PIL.ImageOps.exif_transpose to copy_without_exif, which orients
the image according to an Orientation tag (see issue #674)

* load eppo singleton (#559)

* fix email logic (#649)

* Editable Datasets and CVAT (#657)

* try integrating cvat

* dev works now

* remove it from production for now

* Revert "remove it from production for now"

This reverts commit 0cf769c.

* remove exposing port 80 from nginx

* add cvat usyd branch as submodule

* to have both http and https host

* update readme for cvat

* remove HTTPS_HOST from dev

* change cvat basepath to env variable

* add mystatic volume in dev and expose 8080 for cvat admin

* Initial version of cvat_copy api endpoint

* Added a temporary component to test the endpoint which copies images from
CVAT

* Fixed the data path

* init version of stub dataset upload

* Took out the manual cvat-copy component

* Took out the CVAT item from the uploader drop-down

* copy_cvat puts images in {user}/{upload_id}/images

* add /cvat-static to traefik rule

* Added material-ui autocomplete to frontend dependencies.
Updated docker-compose-dev.yml to use a local image of the cvat server to keep it in version sync with the frontend

* Adding cvat as a submodule - removing cvat/ path

* committing before deiniting - botched git submodules

* Removed first attempt at submodule

* revert uploader zip images

* add spinner for loading copy cvat

* CVAT as submodule

* add cypress test for cvat uploader

* submodule changes

* init cypress test for cvat creation and upload

* add DJANGO_AUTH_DEFAULT_GROUPS in docker-compose

* new version of cvat

* fix a cvat loading issue

* add other tests back

* Updated cvat submodule branch

* Added git submodule update to frontend.yml.
Changed prod docker-compose.yml to build our own.
Updated search/cvat commit to match the branch in .gitmodules

* Added CVAT basepath and default groups to docker-compose.yml, made sure that
the cvat storage was available to the weedai django container, put cvat
config settings in the environment for CI

* Fixed linting issues

* Added HTTPS_HOST and CVAT_HOST to workflow env

* Fixed path prefix for cvat-ui static files in prod docker-compose.yml

* Added a timeout to allow cvat to load, enabled electron logging

* improved wait-for-cvat-ui behaviour

* Don't try to login to weedai on return from cvat

* Reverting to waiting for 20s rather than waiting for the registration field

* Increasing wait time after importing cvat dataset

* cypress.get with a timeout to wait for CVAT login field to appear.
Checking out the submodules via the action config rather than a separate command

* Corrected syntax of submodules option

* Fix bug from uploader_helper (#615)

* fix uploader issue and black

* take scheme from request to helper

* Tweaked the random password generator so that it always has numeric characters as cvat requires this

* Set cvat default user groups in the docker-compose rather than .env
Documented CVAT_BASEPATH in the README.md
Fixed some old references to CVAT_HOST in the traefik debug sections

* Removed CVAT_DEFAULT_AUTH_GROUPS and CVAT_HOST from ci setup

* Tried clearing the weedai_sessionid cookie before returning to weedai

* More tweaks on the tests, bug fix for +10 tasks

* Extended the timeout for waiting for cvat registration to appear. Added a couple of extra clicks on the Apply button, which, for reasons I don't understand, is allowing the stepper to proceed in Cypress.

* re-add CVAT dataset type

* syntax fix

* update test to new format selection UI

* Half-baked commit of ocfl deposit

* Cvat uploader in stepper was missing 'select-type' step

* Refactor of repo/deposit.py, tests are now starting to fail usefully

* add toggle to switch between dataset and editing page

* fix upload bug

* not showing edit toggle when in specific dataset summary

* add select for editing prompt

fix margin

* fix uploader issue and black

* take scheme from request to helper

* connect front and back for editing

fix the display issue because of the key

* make it more functional

fixes

* modernise UI for selecting annotation type (#618)

* modernise UI for selecting annotation type

* flake8 lint

* keep upload_id for editing

* handle submit for overwriting editing

* flake8

* remove more cvat stuff

* more ocfl deposit groundwork

* manual fixes

* rename payload to preset

* Working out how to work with the test fixtures

* rewrite_outputs now uses the new repository interface to get fixture data

* Basic deposit and extract is passing

* Multiple datasets is passing

* Lint

* Fixed typo in weedcoco.json fixture which only showed up because duplicate checking isn't working.

* Duplicate image checking is passing

* check-in-repo test is working

* Made ocfl add and update operations atomic

* Creating zipfiles

* Added weedcoco.repo.deposit with API close to original. Adapted celery task to pass in version metadata based on the user.

* Added ocfl-py as weedcoco requirement

* Enable django access logs

* Ported some of the indexing code to ocfl

* Async unpacking of zip files with polling

* Stop raising exceptions in an unexceptional upload_status

* prod celery now needs tus_dir

* Added parameter types to docstrings for ocfl classes

* security check on ocfl identifiers

Co-authored-by: Joel Nothman <joel.nothman@gmail.com>

* fix indentation

* remove edit toggle and show editability on the dataset page

* import shutil

* change to edit icon and put it beside the name

* fix

* change to tooltip

* rm trailing whitespace

* black

* Refactored the Repository and RepositoryDataset objects with a lot of @properties plus more code review cleanups

* Cleaned up old RepositoryDataset.path

* Test for thumbnailing added to test_deposit

* Convert pathlib.Paths to str before writing thumbnail images

* Explicitly convert temp dir in test_deposit.py to pathlib so that is_file() works

* ocfl is now in repository/ocfl because that works nicer with the ocfl-py library, tidied up a few more bugs

* Updated the indexing code to use the new ocfl deposit. Re-implemented
compress_to_download to use ocfl and be more consistent about how it
uses temp directories.

* include annotations in the image hash

* rename test image names

* change weedcoco.json in the test

* fix duplicate images test

* remove print

* Some logging in the celery reindex task to understand what's going on
with metadata

* add to mapping redis

* add redis module

* Updated remove_entity_local_record to use the Repository class

* Made redeposit_dataset work with ocfl

* Tidying up more bugs arising from the edit/ocfl merge

* image mapping init

* editing_init extracts last ocfl version into uploads/USER/DATASET

* changes on deposit

* check existing images with both original and hash file names

* flake8

* fix images dir not existing

* fix the weedcoco upload

* manage the consequences of different modes (upload, edit, admin)

* Added a field for head version to the Dataset table

* Versioned zipfiles with rollbacks

* More thorough testing of ocfl versioning

* Fixed rewrite_outputs - was using the original ocfl API which had
changed. Added sample outputs for ocfl versioning tests.

* Regenerated sample versioned datasets, versioning tests are passing

* Some of the new test fixtures which I forgot to add

* Added tests which fetch and extract versioned zipfiles (which fail)

* fix exception

* show notice email content in logger for DEBUG mode

* Fixed bugs in zipfile generation, added a dataset.update method which
ensures that a dataset's inventory gets refreshed after a new ocfl
version is added

* handle validation error in the frontend

* First working ocfl inspection / migration tool

* Added a command-line option to initialise an ocfl repo in an empty
directory

* add mode args for reindex_dataset task

* Validation for objects and repositories

* Improved text outputs for dry runs, fixed a bug in ingest

* fix

* fix traceback

* don't allow removing individual image

* show missing images when going to images section

* Fixed bug in getting head version from ocfl for dataset summary

* fix missing images when no new annotations are uploaded during editing

* flake8

* Remove test for unreferenced images

* Building the upload dir with extract_original_images, which remaps
image files and the weedcoco using redis

* fix import

* rename all images to original ones and change names in the weedcoco

* check original_name

* add extract function to deposit.py

* use extraction from deposit.py

* flake8

* fixes

* add deposit of zipfile with original image names

* missing method

* remove identical checks on weedcoco.json

* extract_original_images now only rewrites the weedcoco.json if it's
been passed a redis URL. This means that the tests can use
assert_files_equal to test contents of zipfiles again.

* Allowed Repository.deposit to pass through redis_url so that zipfiles
will be generated with remapped files

* Writing hash mappings to redis before generating zipfiles

* Tidied up extract_original_images so that it only loads the weedcoco
and images if it has a redis_url, and pretty-prints the rewritten JSON

* fix cvat cypress test

* Using repository.dataset(id) to get a RepositoryDataset object, and
assumes that the REPOSITORY_DIR setting points to the ocfl root dir

* reindex_dataset had keyword arg 'upload_mode' and was being called with
'mode'. Renamed the arg to 'mode' for consistency.

* Adding logging to image remapping

* allow un-annotated images to be thumbnailed without bbox

* allow un-annotated images to be indexed

* fix

* reindex_dataset was rebuilding the zipfile without a redis URL - this
was overwriting the zipfile with remapped images, which deposit had
already built, with one that used hashed image names. Fixes #666

* print duplicate images

* show image mapping and add test

* black

* add repo id to error message

* transfer upload_id to retrieve cvat task

* fix thumbnails path

* try fixing remove index code

* fix index query

* Revert "fix index query"

This reverts commit 14fe988.

* fix on elasticsearch query

* move removing index to un-dry env

* add dry run to test_index

* click apply cvat task twice

* check upload_id length

* remove double clicking apply

* change apply button click attempts to two

* resolve elasticsearch conflict to proceed

* Older versions of a dataset are available to download via a drop-down
underneath the yellow "download in weedcoco" button

* Add the number of un-annotated images to weedcoco stats

* rephrase the error message

* change version tag to ocfl version

* set version_tag to be a number

* change version id to string

* update index by switching head to false

* try editing index with source

* another try update index

* change version to string

* add basic filter to frontend

* default to be latest version

* rename value to defaultValue

* Update the submodule hash for the limits/logstash_async fix

Co-authored-by: Elevn Li <zheng.li411@gmail.com>
Co-authored-by: Mike Lynch <mlynch@sydney.edu.au>
Co-authored-by: Elevn Li <51065213+ElevnLi@users.noreply.github.com>
Co-authored-by: Joel Nothman <joel.nothman@gmail.com>

* Rewrote some of the test fixture images, cleaned up an error which was
introduced during the merge

* OCFL repository migration script (#659)

* ocfl migration as manage.py subcommand

* Explicit relative import because it was breaking when called via
manage.py in django

* Moved the ocfl_migrate command to the correct place

* The ocfl_migration command initialises the ocfl repository if it
doesn't already exist

* tasks.migrate_to_ocfl reindexes each dataset after migrating, and the
command-line tool gives feedback on each dataset as it goes

* Added check for when an incoming weedcoco.json has nothing in common
with the dataset's images

* Fixed a migration bug I introduced when I merged a bugfix branch to
this one

* Moved the test for weedcoco.json compatibility with earlier versions
to the correct view

Co-authored-by: Mike Lynch <mlynch@sydney.edu.au>

* hack cypress test

* Add interval of checking status after submit (#644)

* add interval of checking status after submit

* resolve conflict

* Remove reliance on EPPO download by using GBIF API (#593)

* Remove reliance on EPPO download

Fixes #590

Resolves #559

* Ensure no error in upload_status

* Try to debug failed 'Set Form'

* Move logging statement

* This might set correct env

* More logging

* longer wait after Set Form

* Fix flake8 issue from newer version

* rm logging

* Update search/django/weedid/views.py

* fix: remove eppo cache path var

* flake8

Co-authored-by: Henry Lydecker <henry.lydecker@gmail.com>
Co-authored-by: Elevn Li <zheng.li411@gmail.com>

* Allow admin to remove dataset (upload and repository record, database entity, zipfile and elasticsearch index) (#680)

* backend module changes

* add more constraints

* add task and admin command

* change default host value

* hack cypress test

* flake8

* Allow arbitrary subcategorisation of valid categories by dataset contributors (#636)

* Allow for more fields in category editor

* Add possibility of subcategory to category name

* Adapt tests for addition of mapping summary on accordion summary

* fix

* finally found this bug

* fix issues

* add subcat to test case

* fix test

* more fix on cypress test

* raise exception for category with the same name

* remove apply button

* frontend fix editing category

* finally found this bug

Co-authored-by: Elevn Li <zheng.li411@gmail.com>

* update readme for build and start (#682)

* add decorator

* remove HTTP_HOST

* another try

* increase the single-image size limit to 30MB

* change back to HTTPS_HOST

* call check_image_zip if there is an error

* fix the MPO format out of PIL

* remove env variable for cvat from docker-compose

* remove apply button and use next as control for cvat task retrieval

* fix the inconsistency of missing image error for weedcoco and other uploads

* increase size limit from backend

* modify cypress test

* more fix on frontend

* check status code instead of status code

* remove the next hack

* config tus to restart when docker does (#689)

* Create annotate.md

* Update annotate.md

* Update annotate.md

* Add annotation page to website

Co-authored-by: Mike Lynch <mlynch@sydney.edu.au>
Co-authored-by: Elevn Li <zheng.li411@gmail.com>
Co-authored-by: Mike Lynch <mike@mikelynch.org>
Co-authored-by: Elevn Li <51065213+ElevnLi@users.noreply.github.com>
Co-authored-by: Joel Nothman <joel.nothman@gmail.com>
Co-authored-by: jnot4610 <jnot4610@ac02ztcxgmd6t.lan>
Co-authored-by: twhi2311 <tim.white@sydney.edu.au>
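The commit that added `PIL.ImageOps.exif_transpose` to `copy_without_exif` (see issue #674) is what lets portrait photos line up with their annotations again: once the EXIF block is stripped, a viewer can no longer rotate the image on the fly, so the stored pixels themselves have to be put upright first. Below is a minimal, dependency-free sketch of the pixel transform each EXIF Orientation value calls for. It is not the `copy_without_exif` code, and the helper names (`to_upright`, `rotate_cw`, etc.) are hypothetical. Images are modelled as lists of pixel rows, and the orientation-to-transform table follows the one Pillow's `exif_transpose` uses (6 maps to a 90° clockwise rotation, 8 to a 90° counter-clockwise rotation, and so on).

```python
"""Sketch of the pixel transform implied by each EXIF Orientation value."""
from typing import Callable, Dict, List

Image = List[List[int]]  # rows of pixel values, top row first


def flip_lr(img: Image) -> Image:
    # Mirror horizontally.
    return [row[::-1] for row in img]


def flip_tb(img: Image) -> Image:
    # Mirror vertically.
    return [list(row) for row in img[::-1]]


def rotate_180(img: Image) -> Image:
    return [row[::-1] for row in img[::-1]]


def transpose(img: Image) -> Image:
    # Mirror across the main diagonal: pixel (x, y) -> (y, x).
    return [list(col) for col in zip(*img)]


def transverse(img: Image) -> Image:
    # Mirror across the anti-diagonal (transpose, then rotate 180 degrees).
    return rotate_180(transpose(img))


def rotate_cw(img: Image) -> Image:
    # Rotate 90 degrees clockwise.
    return [list(col) for col in zip(*img[::-1])]


def rotate_ccw(img: Image) -> Image:
    # Rotate 90 degrees counter-clockwise.
    return [list(col) for col in zip(*img)][::-1]


# Orientation tag value -> transform that yields the upright image.
# Same mapping as Pillow's exif_transpose (6 -> ROTATE_270, 8 -> ROTATE_90).
ORIENTATION_FIXES: Dict[int, Callable[[Image], Image]] = {
    2: flip_lr,
    3: rotate_180,
    4: flip_tb,
    5: transpose,
    6: rotate_cw,
    7: transverse,
    8: rotate_ccw,
}


def to_upright(img: Image, orientation: int) -> Image:
    """Return the pixels as an EXIF-aware viewer would show them.

    Orientation 1 (or a missing/unknown value) means no change.
    """
    fix = ORIENTATION_FIXES.get(orientation)
    return fix(img) if fix else [list(row) for row in img]
```

For example, a landscape image 3 pixels wide and 2 tall that carries Orientation 6 comes back from `to_upright` as a portrait image 2 pixels wide and 3 tall, which is the frame that annotations drawn in an EXIF-aware viewer refer to. Stripping EXIF without this step leaves the pixels landscape while the annotations stay portrait, which is the mismatch reported in this issue.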
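One commit in this log tweaks the random password generator so that it always includes numeric characters, because CVAT requires them. A minimal sketch of that idea using Python's standard `secrets` module, assuming a letters-and-digits alphabet and a hypothetical `random_password` helper (this is not the project's actual generator):

```python
import secrets
import string

# Assumed alphabet: ASCII letters plus digits.
ALPHABET = string.ascii_letters + string.digits


def random_password(length: int = 16) -> str:
    """Return a random password that contains at least one digit."""
    if length < 1:
        raise ValueError("length must be at least 1")
    while True:
        # Draw every character independently from a CSPRNG.
        password = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Reject and redraw until the digit requirement is met.
        if any(c.isdigit() for c in password):
            return password
```

Redrawing until a digit appears (rejection sampling) keeps every valid password equally likely, whereas forcing a digit into a fixed position would make the output slightly more predictable.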