(4.x.x) Deduplicating BLOB Store #2314
0b7bb32 to 12b6754
@dizzzz @wolfgangmm could you please review?
@adamretter Yes, I will do.
@@ -57,6 +52,7 @@
JComboBox collections;
JTextField backupTarget;
JCheckBox deduplicateBlobs;
dizzzz (Member), Dec 26, 2018: This probably needs some documentation.
adamretter (Author, Member), Dec 26, 2018: Sure.
The proof is in the pudding; I am impressed by the work. The new backup parameter may need some documentation. I hope the performance is as good as or better than it was. I shared ideas on (1) the number of files in a directory [performance, inodes] and (2) compression of the blobs, and raised a concern regarding virus scanners on Windows hosts (corporate policy).
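One common way to address idea (1), the number of files in a single directory, is to fan blob files out into subdirectories named after a prefix of their content digest. This is a minimal sketch of that idea as raised in review, not necessarily the layout the PR uses. It assumes hex-encoded digests, and the FanOutPathSketch class is hypothetical.

```java
import java.nio.file.Path;

final class FanOutPathSketch {
    /**
     * Maps a hex-encoded content digest to root/ab/cd/abcd... (hypothetical layout).
     * The root and each first-level directory have at most 256 children
     * (two hex characters per level), and blob files are spread over
     * up to 65,536 leaf directories.
     */
    static Path blobPath(final Path root, final String hexDigest) {
        if (hexDigest.length() < 4) {
            throw new IllegalArgumentException("Digest too short for two fan-out levels: " + hexDigest);
        }
        return root.resolve(hexDigest.substring(0, 2))
                .resolve(hexDigest.substring(2, 4))
                .resolve(hexDigest);
    }
}
```

With two levels of two hex characters, even ten million blobs average roughly 150 files per leaf directory.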
One question: if a blob file is removed (e.g. by a virus scanner), is this detected? @adamretter
@dizzzz If the file is not available, an exception will be thrown and logged.
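The following is a minimal sketch of that behaviour. It assumes blobs are plain files on disk and uses a hypothetical BlobReadSketch class rather than eXist-db's actual BLOB Store API. A missing file surfaces as java.nio.file.NoSuchFileException when it is opened, and that exception is logged and rethrown to the caller.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.logging.Level;
import java.util.logging.Logger;

final class BlobReadSketch {
    private static final Logger LOG = Logger.getLogger(BlobReadSketch.class.getName());

    /**
     * Opens a blob file for reading (hypothetical helper).
     * If the file has been deleted externally, e.g. by a virus scanner,
     * the NoSuchFileException is logged and then rethrown.
     */
    static InputStream openBlob(final Path blobFile) throws IOException {
        try {
            return Files.newInputStream(blobFile);
        } catch (final NoSuchFileException e) {
            LOG.log(Level.SEVERE, "Blob file is missing: " + blobFile, e);
            throw e;
        }
    }
}
```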
@adamretter could you rebase? It all looks good to me code-wise, but I would like to test some more.
The two prerequisites are pulled in.
12b6754 to 15145e4
@wolfgangmm Okay, it is now rebased as you requested.
With approvals from Dannes and Wolfgang, we just need to address the failing AppVeyor tests.
Do the AppVeyor failures in the other two ports also seem harmless? They were about unresolved dependencies on
Looks like this needs to be rebased now that #2456 was merged first.
I don't understand; all PRs should be in right now. Is this a GitHub issue?
@dizzzz I am also unclear why this PR shows a conflict, while the already-merged associated ports (#2460 and #2461) do not. The conflict here, though, is certainly due to the switch in locations of the extensions in #2456, but that also had ports (#2463 and #2464). Resolving the conflict should be a matter of moving them over to the location that #2456 put them in, I guess.
@joewiz Yes, sorry, moving the files should do it. The AppVeyor failure is not a code problem but something in AppVeyor itself. We also had problems reported by Travis that were not there on subsequent runs.
…lves the copy self-deadlock issue
…rmat as we now have the Blob Store instead of filesystem binary storage
…we now have the Blob Store instead of filesystem binary storage
…-RPC and XML:DB APIs
…ment util:binary-doc-content-digest($binary-resource, $algorithm)
…he Java Admin Client
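One of the commits above adds util:binary-doc-content-digest($binary-resource, $algorithm). The sketch below illustrates the optimisation described for it: if the requested algorithm is the one used for deduplication, the digest recorded when the content was stored is returned instead of rehashing the content. It is a hypothetical illustration (the ContentDigestSketch class is invented here). It uses SHA-256 as a stand-in for BLAKE2B-256, which the standard JDK MessageDigest providers do not include.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ContentDigestSketch {
    /** Stand-in for the deduplication digest; the PR mentions BLAKE2B-256. */
    static final String DEDUP_ALGORITHM = "SHA-256";

    private final byte[] content;
    /** Digest computed once, when the content is stored. */
    private final byte[] storedDigest;
    /** How many times the content had to be rehashed; for illustration only. */
    private int recomputations = 0;

    ContentDigestSketch(final byte[] content) {
        this.content = content.clone();
        this.storedDigest = hash(DEDUP_ALGORITHM, this.content);
    }

    /** Returns the digest of the content using the requested algorithm. */
    byte[] contentDigest(final String algorithm) {
        if (DEDUP_ALGORITHM.equalsIgnoreCase(algorithm)) {
            // Fast path: the digest already exists, so the content is not read again.
            return storedDigest.clone();
        }
        // Slow path: any other algorithm requires hashing the full content.
        recomputations++;
        return hash(algorithm, content);
    }

    int recomputationCount() {
        return recomputations;
    }

    private static byte[] hash(final String algorithm, final byte[] data) {
        try {
            return MessageDigest.getInstance(algorithm).digest(data);
        } catch (final NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported digest algorithm: " + algorithm, e);
        }
    }
}
```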
15145e4 to 4ebaad2
@adamretter Will this help with the problem described in https://stackoverflow.com/q/54640371/659732? If so, we could chime in and say that the forthcoming release will include the feature.
@joewiz Unrelated, I am afraid.
@adamretter Ok, thanks!
This PR adds a new Deduplicating BLOB Store to eXist-db. The design and new features are explained in my blog post: https://blog.adamretter.org.uk/blob-deduplication/
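The blog post is the authoritative description of the design. As a rough illustration of the general technique only: a deduplicating store can key each BLOB by a digest of its content and keep a reference count. Identical binaries are then stored once and deleted only when the last document referencing them is removed. Everything below is an assumption for illustration, including the DedupBlobStoreSketch class, the in-memory maps standing in for on-disk storage, and SHA-256 standing in for BLAKE2B-256.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

final class DedupBlobStoreSketch {
    /** Blob content, keyed by the hex digest of that content. */
    private final Map<String, byte[]> blobs = new HashMap<>();
    /** How many documents currently reference each blob. */
    private final Map<String, Integer> refCounts = new HashMap<>();

    /** Stores the content only if it is not already present, and returns its content address. */
    String add(final byte[] content) {
        final String id = hexDigest(content);
        blobs.putIfAbsent(id, content.clone());
        refCounts.merge(id, 1, Integer::sum);
        return id;
    }

    /** Returns a copy of the content stored under the given address. */
    byte[] get(final String id) {
        final byte[] content = blobs.get(id);
        if (content == null) {
            throw new IllegalArgumentException("Unknown blob: " + id);
        }
        return content.clone();
    }

    /** Drops one reference; the content itself is deleted only when no references remain. */
    void remove(final String id) {
        final Integer count = refCounts.get(id);
        if (count == null) {
            throw new IllegalArgumentException("Unknown blob: " + id);
        }
        if (count == 1) {
            refCounts.remove(id);
            blobs.remove(id);
        } else {
            refCounts.put(id, count - 1);
        }
    }

    /** Number of distinct blobs physically stored. */
    int storedBlobCount() {
        return blobs.size();
    }

    /** SHA-256 as a stand-in for BLAKE2B-256, which the standard JDK providers do not include. */
    static String hexDigest(final byte[] content) {
        final byte[] hash;
        try {
            hash = MessageDigest.getInstance("SHA-256").digest(content);
        } catch (final NoSuchAlgorithmException e) {
            throw new IllegalStateException("Every Java platform must support SHA-256", e);
        }
        final StringBuilder hex = new StringBuilder(hash.length * 2);
        for (final byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Adding the same binary twice yields the same identifier and a single physical copy. That copy is freed only when the second remove() call drops the last reference.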
Binaries of eXist-db 4.5.0 patched with this new BLOB Store can be downloaded for testing from: http://static.adamretter.org.uk/blob-dedup/
NOTE: This PR increments the storage format versions of the collections.dbx and Journal files, so a full Backup and Restore is required when upgrading from previous versions of eXist-db.

In addition, this PR adds the following features:

- util:binary-doc-content-digest, which calculates a digest for a Binary Document. It is also optimised to just retrieve the existing digest if the requested digest type matches the one used for Binary deduplication (e.g. BLAKE2B-256).

NOTE: This PR first requires: