Skip to content

Arbitrary depth support for zip imports#1099

Merged
subdavis merged 2 commits into
web/zip-uploadfrom
web/zip-upload-arbitrary-depth-support
Jan 3, 2022
Merged

Arbitrary depth support for zip imports#1099
subdavis merged 2 commits into
web/zip-uploadfrom
web/zip-upload-arbitrary-depth-support

Conversation

@subdavis
Copy link
Copy Markdown
Contributor

@subdavis subdavis commented Dec 28, 2021

  • Changes a bit of code related to import dataset folder discovery to support arbitrary depths and cases where the zipfile is constructed in an unusual way. Sometimes filenames doesn't list the directory as its own item, which was breaking stuff. See this comment for a better description of what was broken: Web Zip file importing #1062 (review)
  • Changes the integration test to parameterize the dataset rather than the user, so it's immediately clear what test case failed.
  • Replaces the early return statement in postprocess because it makes things easier and you can leave off the "cleanup" routine in the celery task. The server should be smart enough to detect a zip upload and not go around creating empty annotation files and aux folders when it knows that it's about to perform an import.

Now, even if there's an upload where a zip file contains multiple datasets at a/b/c/d and a/1/2, things should work out properly.

@subdavis subdavis requested a review from BryonLewis December 28, 2021 20:19
@subdavis subdavis changed the title Arbitrary depth support for imports Arbitrary depth support for zip imports Dec 28, 2021
@subdavis subdavis force-pushed the web/zip-upload-arbitrary-depth-support branch from cd4ad35 to 72c0305 Compare December 28, 2021 22:16
@subdavis subdavis mentioned this pull request Jan 3, 2022
Copy link
Copy Markdown
Collaborator

@BryonLewis BryonLewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making this work with arbitrary levels. I have a few questions and a comment based on using direct zip exports as imports without other datasets included.

Comment thread server/dive_server/crud_rpc.py
Comment thread server/dive_tasks/tasks.py Outdated
json=metdata,
)
for folderName, folderType in discovered_folders.items():
subFolderName = folderName if len(discovered_folders) > 1 else ''
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of the check for a single top level folder means that it imports anything from the exported Zip from the website as a subFolder instead of directly importing it in. I.E. - if you export a dataset as zip using the export all and then try importing it. Instead of treating it as the ./Import_name, it will treat it as ./Import_name/dataset_name meaning that it will always import the exported zips as a sub folder of the import. For multiple sub folders this makes sense but I'm not sure if we want this behavior for a single export zip dataset. I realize this update now allows arbitrary depth of the imports, but at the cost of a smooth export/import procedure for the zip exports. Just want to bring that up and see if it's worth it.

The idea would be to do a sum of the number of 'unstructured' and 'datatset' folders and if it is 1 the subFolderName = '' else it sets it to the current subfolder name?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines -650 to -656
# remove metadata
metdata = [constants.TypeMarker, constants.FPSMarker, constants.DatasetMarker]
gc.sendRestRequest(
"DELETE",
f"folder/{folderId}/metadata",
json=metdata,
)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removal of this means that top level folders have residual metadata that stick around including fps and a type: zip. The type may be fine as long as we don't ever use type for searching for data. The fps is relative for creating a single dataset import but is a little weird if you have multiple zip files that are being uploaded. Especially because if you set it to the video default rate the fps value will be -1. This may be okay and intended though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread server/tests/integration/conftest.py Outdated
Comment on lines +174 to +175
'name': 'singleDatasetImport',
'path': 'zipTestFiles/singleDatasetImport.zip',
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mistake on my part SingleDatasetImport not singleDatasetImport. I should probably be more consistent in my zip file names (maybe the idea was to keep dataset imports capitalized in the beginning).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@subdavis subdavis requested a review from BryonLewis January 3, 2022 16:06
@subdavis subdavis merged commit e90ce51 into web/zip-upload Jan 3, 2022
@subdavis subdavis deleted the web/zip-upload-arbitrary-depth-support branch January 3, 2022 16:25
BryonLewis added a commit that referenced this pull request Jan 3, 2022
* initial MVP of zip upload working

* mend

* adding in some failure conditions

* adding in integration testing

* updating tests

* source folder

* starting to address comments

* minor fixes

* removing zipmarker

* adding in multi-import zip  and dataset import zip

* gAdding in beginning of dataset import

* integration testing for multi zip files

* reviewing and simplifying

* Update docs/Web-Version.md

Co-authored-by: Brandon Davis <brandon.davis@kitware.com>

* addressing comments

* Arbitrary depth support for zip imports (#1099)

* Arbitrary depth support for imports

* Respond to comments

Co-authored-by: Brandon Davis <brandon.davis@kitware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants