(4.x.x) Deduplicating BLOB Store #2314

Open: adamretter wants to merge 21 commits into develop-4.x.x from adamretter:feature/binary-dedup-4.x.x

adamretter commented Nov 27, 2018

This PR adds a new Deduplicating BLOB Store to eXist-db. The design and new features are explained on my blog: https://blog.adamretter.org.uk/blob-deduplication/

Binaries of eXist-db 4.5.0 patched with this new BLOB Store can be downloaded for testing from: http://static.adamretter.org.uk/blob-dedup/

NOTE: This PR increments the storage format versions of the collections.dbx and Journal files, so a full Backup and Restore is required when upgrading from previous versions of eXist-db.

In addition this PR adds the following features:

  • Updates eXist-db to use the FasterXML Java UUID Generator.
  • For Binary Documents, the digest and file size are now available in the XML-RPC and XML:DB APIs.
  • For Binary Documents, the digest and file size now appear in the properties dialog of the Java Admin Client.
  • Adds Binary deduplication to database Backup and Restore. This is disabled by default, but can be enabled via a flag or command line option. The option is also exposed in the Java Admin Client as a checkbox.
  • Adds the XQuery function util:binary-doc-content-digest, which calculates a digest for a Binary Document. If the requested digest type matches the one used for Binary deduplication (e.g. BLAKE2B-256), the function simply retrieves the stored digest instead of recalculating it.
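
To illustrate the general idea behind a deduplicating store, and why a stored digest can be returned without recalculation, here is a minimal in-memory sketch in Java. It is not the code from this PR: the class name `DedupBlobStoreSketch` and all of its methods are hypothetical, the reference counting is an assumption about how such a store could work, and SHA-256 is used as a stand-in because BLAKE2B-256 is not available in the Java standard library.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory sketch of a content-addressed, deduplicating blob store. */
class DedupBlobStoreSketch {

    /** A stored blob together with the number of documents that reference it. */
    private static final class Entry {
        final byte[] data;
        int refCount = 1;

        Entry(final byte[] data) {
            this.data = data;
        }
    }

    // Maps hex digest -> stored blob, so identical content shares a single entry.
    private final Map<String, Entry> blobs = new HashMap<>();

    /** Hex digest of the content. ASSUMPTION: SHA-256 stands in for BLAKE2B-256. */
    static String digest(final byte[] data) {
        try {
            final MessageDigest md = MessageDigest.getInstance("SHA-256");
            final StringBuilder hex = new StringBuilder();
            for (final byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (final NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** Stores the content, or adds a reference if it is already present. Returns the blob id. */
    String add(final byte[] data) {
        final String id = digest(data);
        final Entry existing = blobs.get(id);
        if (existing != null) {
            existing.refCount++;
        } else {
            blobs.put(id, new Entry(data.clone()));
        }
        return id;
    }

    /** Returns a copy of the content for the given id, or null if the id is unknown. */
    byte[] get(final String id) {
        final Entry entry = blobs.get(id);
        return entry == null ? null : entry.data.clone();
    }

    /** Drops one reference; the content is deleted once no references remain. */
    void remove(final String id) {
        final Entry entry = blobs.get(id);
        if (entry != null && --entry.refCount == 0) {
            blobs.remove(id);
        }
    }

    /** Number of distinct blobs actually held. */
    int uniqueBlobCount() {
        return blobs.size();
    }

    public static void main(final String[] args) {
        final DedupBlobStoreSketch store = new DedupBlobStoreSketch();
        final String first = store.add("same bytes".getBytes(StandardCharsets.UTF_8));
        final String second = store.add("same bytes".getBytes(StandardCharsets.UTF_8));
        System.out.println("ids equal: " + first.equals(second));
        System.out.println("unique blobs: " + store.uniqueBlobCount());
    }
}
```

In this sketch the blob id is the digest itself, so a caller asking for a digest of the same type can be answered from the id alone; any other algorithm would require reading the content and hashing it again.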

NOTE: This PR first requires:

  1. #2305
  2. #2310

adamretter added some commits Nov 7, 2018

[feature] Increment the version number of the Journal file format as we now have the Blob Store instead of filesystem binary storage
[feature] Increment the version number of the collections.dbx file format as we now have the Blob Store instead of filesystem binary storage
[feature] Added XQuery function to retrieve a digest of a Binary Document util:binary-doc-content-digest($binary-resource, $algorithm)

adamretter force-pushed the adamretter:feature/binary-dedup-4.x.x branch from 0b7bb32 to 12b6754 Dec 14, 2018
