I thought a bit about how to optimize the chunks cache and just wanted to document one weird idea.
The issue with the chunks cache is that it needs to match the overall repository state (i.e. have up-to-date information about all chunks in all archives, including refcount, size, csize). When backing up multiple machines into the same repo, creating an archive from one machine invalidates the chunks caches on all the other machines, and they then need to resync their chunks cache with the repo, which is expensive.
So there is the idea to also store the chunk index in the repo, so that all out-of-sync clients can just fetch the index from the repo.
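A rough sketch of what that client-side flow could look like. All the names here (`FakeRepo`, `LocalCache`, `sync_chunks_cache`, the `state_id` field) are invented for illustration, not actual borg APIs:

```python
class FakeRepo:
    """Stand-in for a repository that also stores its chunk index (hypothetical)."""
    def __init__(self):
        self.state_id = 0        # changes whenever any client modifies the repo
        self.stored_index = {}   # chunk index persisted in the repo

class LocalCache:
    """Stand-in for a client's local chunks cache."""
    def __init__(self):
        self.state_id = -1
        self.index = {}

def sync_chunks_cache(repo, cache):
    """Fetch the index stored in the repo instead of rebuilding it locally."""
    if cache.state_id == repo.state_id:
        return False                       # cache already matches the repo
    cache.index = dict(repo.stored_index)  # cheap fetch instead of full resync
    cache.state_id = repo.state_id
    return True                            # had to resync

repo = FakeRepo()
repo.stored_index = {"c1": 2}
repo.state_id = 1            # another machine created an archive
cache = LocalCache()
assert sync_chunks_cache(repo, cache) is True   # stale -> fetch stored index
assert cache.index == {"c1": 2}
assert sync_chunks_cache(repo, cache) is False  # now in sync, nothing to do
```

The point of the sketch: the expensive per-client rebuild is replaced by one fetch of an index that the repo keeps current.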
So we need:
This pretty much sounds like we should just backup the index of repo A into a related, but separate borg repository A'. :-)
I don't think it's a weird idea at all.
I was thinking along similar lines, actually.
I was initially "what iffing" the idea of storing the chunks cache inside
I think similar fresh approaches can be used to optimise the size of other
On 8 December 2015 at 05:58, TW email@example.com wrote:
@RonnyPfannschmidt well, a segment holds 5 MB, so a 500 GB repo has 100,000 segments. That's just an example, but quite a realistic one. Of course, it can be more or less depending on how much data you have.
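The arithmetic behind those example numbers (segment and repo sizes are just the values quoted above, not fixed constants):

```python
# Back-of-the-envelope segment count for the example above.
SEGMENT_SIZE = 5_000_000          # ~5 MB per segment (example value)
REPO_SIZE = 500_000_000_000       # 500 GB repository (example value)

segments = REPO_SIZE // SEGMENT_SIZE
print(segments)  # 100000
```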
But I still don't see how your suggestion would be efficient. Just for comparison: the normal, uncached and quite slow repo rebuild goes through ALL archives and ALL files, using the chunk list stored in each file item's metadata to increment the counts. The item metadata are stored clustered together in a few segments (not together with the file content data).
BTW, an incremental "just add on top of what we already have" approach for the chunks cache only works as long as nothing is removed. If something is removed, the information about it is gone too, so we can't subtract it. (see the PRs)
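A minimal sketch of why deletions break the add-only approach, with the chunks cache modeled as a plain `{chunk_id: refcount}` dict (structure invented for illustration):

```python
def apply_additions(cache, new_chunks):
    """Merge the chunk references added by a new archive into the cache."""
    for chunk_id in new_chunks:
        cache[chunk_id] = cache.get(chunk_id, 0) + 1

# Incrementally applying *additions* works fine:
cache = {"c1": 2, "c2": 1}
apply_additions(cache, ["c1", "c3"])
assert cache == {"c1": 3, "c2": 1, "c3": 1}

# But when an archive is *deleted*, its chunk list is gone from the
# repo -- there is nothing left to subtract from the cache, so the
# refcounts silently drift out of sync.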
If each segment also tracks the removals, then the chunks index will match the current state after applying a segment. The main problem would be correcting the reference segment of the current state on a vacuum, which is quite hard, since segments get combined on vacuum.
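A sketch of that idea, assuming a hypothetical segment format where each entry records either an addition or a removal (the `("PUT", ...)`/`("DEL", ...)` tuples are invented, not borg's actual segment entry layout):

```python
def apply_segment(index, segment):
    """Replay one segment's PUT/DEL records onto the chunk index."""
    for op, chunk_id in segment:
        if op == "PUT":
            index[chunk_id] = index.get(chunk_id, 0) + 1
        elif op == "DEL":
            index[chunk_id] -= 1
            if index[chunk_id] == 0:
                del index[chunk_id]
    return index

# Because removals are in the log, replaying segments keeps the
# index consistent even after deletions:
index = {}
apply_segment(index, [("PUT", "c1"), ("PUT", "c2")])
apply_segment(index, [("DEL", "c2"), ("PUT", "c3")])
assert index == {"c1": 1, "c3": 1}
```

Vacuum is the hard part precisely because it rewrites and merges segments, so the "reference segment" an index was built up to no longer exists afterwards.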