Publish(): media files must not exist #216

Merged: 19 commits merged on Jul 14, 2022

frankenjoe (Collaborator) commented Jun 29, 2022

Closes #56

Allows publishing a new version of a database without first downloading all media files. For media files referenced in a table, it is sufficient that they exist in the dependencies of the previous version. In that case, they are assumed to be unchanged.
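The rule above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not audb's implementation: `classify_media()` and its arguments are hypothetical, and the previous version's dependencies are modelled as a plain set of relative file paths.

```python
from pathlib import Path


def classify_media(referenced, build_root, previous_deps):
    """Split media referenced in the tables into files to check and files to reuse.

    Hypothetical helper for illustration, not part of audb.
    referenced    -- relative paths of media files referenced in at least one table
    build_root    -- build folder of the new version
    previous_deps -- relative paths of media files in the previous version
    """
    to_check, reused, missing = [], [], []
    for file in sorted(referenced):
        if (Path(build_root) / file).exists():
            # present in the build folder: new or possibly altered
            to_check.append(file)
        elif file in previous_deps:
            # not present, but already published: assumed unchanged
            reused.append(file)
        else:
            # neither present nor published before: cannot be published
            missing.append(file)
    if missing:
        raise FileNotFoundError(f"media files not found: {missing}")
    return to_check, reused
```

In this sketch only the files in `to_check` would need a checksum; the `reused` files are neither downloaded nor hashed.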

This should significantly speed up publishing, since we

  1. don't have to download all media files
  2. don't have to calculate their checksum

Checksums are calculated only for media files that exist in the build folder, to check whether they were altered. If an altered media file shares its archive with other files, the missing files are automatically downloaded to the build folder so that the archive can be created.
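The checksum comparison and the archive completion can be sketched the same way. Again, this is a hypothetical sketch rather than audb's code: `plan_archives()` is not an audb function, the previous dependencies are modelled as a dict mapping each file to an `(archive, checksum)` pair, and MD5 is assumed as the checksum.

```python
import hashlib
from pathlib import Path


def md5(path, chunk_size=1 << 16):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_archives(to_check, build_root, deps):
    """Find altered files and the files needed to re-create their archives.

    Hypothetical helper for illustration, not part of audb.
    to_check -- files present in the build folder
    deps     -- {file: (archive, checksum)} of the previous version
    """
    # new files (not in deps) and files whose checksum changed count as altered
    altered = [
        f for f in to_check
        if f not in deps or md5(Path(build_root) / f) != deps[f][1]
    ]
    # archives that must be re-created because one of their files changed
    dirty = {deps[f][0] for f in altered if f in deps}
    # other members of those archives that are not in the build folder yet
    present = set(to_check)
    download = sorted(
        f for f, (archive, _) in deps.items()
        if archive in dirty and f not in present
    )
    return altered, download
```

Unchanged files and archives without altered members are left alone in this sketch, so nothing is downloaded for them.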


To download only the metadata to the build folder, the argument only_metadata was added to audb.load_to().


The usage section was updated accordingly.

codecov bot commented Jun 29, 2022

Codecov Report

Merging #216 (258badb) into master (99ef956) will not change coverage.
The diff coverage is 100.0%.

| Impacted Files | Coverage Δ |
|---|---|
| audb/core/load_to.py | 100.0% <100.0%> (ø) |
| audb/core/publish.py | 100.0% <100.0%> (ø) |

frankenjoe changed the title from "WIP: Publish only metadata" to "WIP: Publish without media files" on Jun 30, 2022
frankenjoe requested a review from hagenw on Jun 30, 2022 at 13:59
frankenjoe changed the title from "WIP: Publish without media files" to "Publish(): media files must not exist" on Jun 30, 2022
hagenw (Member) commented Jul 7, 2022

Cool, that idea is even better than I thought: instead of adding an argument to audb.publish(), you add one to audb.load_to(), which avoids downloading the data in the first place. And only_metadata is an argument users already know from audb.load(), so even better.

frankenjoe (Collaborator, Author) commented Jul 7, 2022

One great advantage is that the build folder now only needs to contain the media that is actually added or altered in that version, which makes it easier to spot the changes. I would even argue that it now makes sense to have a separate build folder for every version rather than the shared one we have used so far.

hagenw (Member) commented Jul 8, 2022

There is one part where I'm wondering whether this still works with your proposal: deleting files. Previously, you downloaded all files with audb.load_to() and could then delete media files and remove them from the tables to remove them from the database.
Is this now handled by simply removing them from the tables?

frankenjoe (Collaborator, Author) commented Jul 8, 2022

> Is this now handled by simply remove them from the tables?

Yes, it has always been handled this way. You can have as many media files in the build folder as you want; as long as a file is not referenced in at least one table, it is never uploaded.
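These deletion semantics can be sketched with a hypothetical `next_dependencies()` helper (not part of audb), assuming tables are modelled as a dict of media-path lists and the previous version's media as a set of paths:

```python
def next_dependencies(tables, previous_deps):
    """Derive the media of the new version from the tables alone.

    Hypothetical helper for illustration, not part of audb.
    tables        -- {table_id: iterable of referenced media paths}
    previous_deps -- set of media paths in the previous version
    """
    referenced = set()
    for files in tables.values():
        referenced.update(files)
    # files not referenced by any table never enter this set, so they are
    # not uploaded, no matter whether they exist in the build folder
    removed = previous_deps - referenced  # dropped from the database
    added = referenced - previous_deps  # new in this version
    return referenced, removed, added
```

Under these assumptions, removing every table row that references a file is enough to drop it from the next version.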

hagenw (Member) left a comment


Very cool, this will indeed be a big relief when updating databases, as you almost never change all media files.

Review threads (resolved): 7 on audb/core/publish.py, 3 on tests/test_publish.py.
frankenjoe and others added 3 commits on July 14, 2022 at 08:47, each co-authored by Hagen Wierstorf <hwierstorf@audeering.com>.
Further review threads (resolved): 4 on audb/core/publish.py.
hagenw (Member) commented Jul 14, 2022

I'm already excited to use this new feature :o)

hagenw merged commit 0e57e8d into master on Jul 14, 2022
hagenw deleted the publish-only-metadata branch on July 14, 2022 at 07:54
Linked issue closed by this pull request: Speed up publish when no media files were altered