Empty source file can make Recreate download all dblock files fruitlessly with huge delay #3747
This bug is part of the open issues Recreating broken DB takes forever [$25] #2889 and Database recreate runs forever #3668. Here, an empty source file leads to Transferring/Reading Entire dblocks instead of dindex for rebuild #1928. There are many forum reports; the topic Recreating database logic/understanding/issue/slow is a good overview and includes much technical analysis and material. It contains much of what I would have put in a GitHub issue had I not also been responding to a user post.
I tested 22.214.171.124 canary first to see whether it showed the problem (it did), then went to 126.96.36.199 beta because it is widely used, I know it better, and it avoids questions about canary regressions.
Beyond the initial test described in the forum, I also tried: back up a Linux ISO, recreate, add a 1-byte file, back up, recreate, add a 0-byte file, back up, recreate. Every recreate downloaded only dindex files until the 0-byte file entered the backup, at which point the recreate downloaded all of the dblock files after downloading the dindex files.
From a resource point of view, this is not just wasted downloads, but also wasted potential decryption, unzip, and database operations due to the missing block upload and the resulting inability to find its volume.
The failure of countMissingInformation to drop to 0 was mentioned in Repair is downloading dblock files (responding to @kenkendk), and the big Recreating database logic/understanding/issue/slow analysis expands on it at the SQL level below. Try that on
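Since the SQL itself isn't reproduced here, this is a minimal sketch of the idea behind a countMissingInformation-style check, using Python's sqlite3 against a toy stand-in table (the table name, columns, and query are my assumptions for illustration, not Duplicati's actual schema or SQL):

```python
import sqlite3

# Toy stand-in for Duplicati's local database; names are assumptions.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Block (ID INTEGER PRIMARY KEY, Hash TEXT, Size INTEGER, VolumeID INTEGER);
INSERT INTO Block VALUES (1, 'aaa', 1, 5);   -- normal block with a known volume
INSERT INTO Block VALUES (2, 'bbb', 0, -1);  -- empty-file block recorded with VolumeID -1
""")

# Any block whose volume is unknown keeps the count above 0,
# so recreate keeps downloading dblock files hoping to find it.
(missing,) = con.execute(
    "SELECT COUNT(*) FROM Block WHERE VolumeID < 0").fetchone()
print(missing)  # 1: the empty block never lets the count reach 0
```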
While I'm wondering whether StreamBlockSplitter's block-read loop needs to use a ForceStreamReadAsync() return of 0 as an EOF indicator, my guess is that it could do that earlier, where it gets the stream from the channel; however, I don't know the core design very well, and I don't know CoCoL very well either...
Regardless, the supplied test, logs, and dumps show an empty-file issue, even if my guess about the code is wrong.
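To make the EOF thought concrete, here is a sketch in Python rather than the actual C#/CoCoL code (the loop shape and names are my assumptions, not Duplicati's implementation): if a zero-length read is treated as end-of-stream before anything is emitted, an empty file produces no block at all.

```python
import io

BLOCK_SIZE = 4  # tiny block size just for demonstration

def split_blocks(stream):
    """Yield fixed-size blocks, treating a zero-length read as EOF.
    Because the EOF check happens before anything is emitted, an
    empty stream yields no blocks (no special empty-block record)."""
    while True:
        data = stream.read(BLOCK_SIZE)
        if len(data) == 0:  # a read of 0 bytes means end of stream
            return
        yield data

print(list(split_blocks(io.BytesIO(b""))))        # [] -> no empty block
print(list(split_blocks(io.BytesIO(b"abcdef"))))  # [b'abcd', b'ef']
```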
Steps to reproduce
Some DB screenshots are here as an attempted escalation from earlier forum posts which weren't quite as graphic.
Some logs and DB bug reports are here along with lots of explanation by me in forum topic Recreating database logic/understanding/issue/slow.
Here is a DB bug report after the length 0 backup (after ISO then length 1) and then the slow recreate:
Thanks @ts678 for the detailed description.
The log from a user also seems to indicate some queries for each
Great investigation @ts678. I can reproduce this. With a backup with 200 MB source, 50 MB block size, and 1 backup version, recreating the database takes
Thanks for the help and confirmation @warwickmm. If you get a profiling log (preferably also with --profile-all-sql-queries), you might find a long version of what I showed in the forum materials for length 0 (see "log of delete repair.txt"): it gets the dindex, does a SELECT series, then finishes with the INSERT you show. Recreate with length 0 then fruitlessly continues to process the dblock and do an INSERT like the one above. In comparison, the length-1 log shows that it does the dindex and never touches a dblock. Length-0 line sampler:
From reversing some complex INSERT text chunks using a text search, I think the execution we see is as follows:
The complex INSERT that takes a minute or two in your posted results from forum #6496 is sub-second for me, likely because my test is minimally sized. For a 2 GB dblock size, it is 14-15 hours in issue Database recreate runs forever #3668, which also proposes an exponential slowdown based on backup size. It's not clear how it scales with dblock size, but 2 GB is 40 times the default 50 MB, and 14 hours is more than 40 times some reports. On the other hand, in the testing for Database recreate performance, @JonMikelV's "older machine" did the complex INSERT in 37 minutes, so maybe this is one of those cases where fast hardware (possibly with an SSD) helps. The complex INSERT shows up quite a bit in reports that reached an SQL level of detail, for example the forum topics Slow Database Delete and Repair Queries and Re-generate database duration, which appear to show it was a regression, perhaps after 188.8.131.52. That sounds plausible because 184.108.40.206 is where the big concurrency rewrite came in: StreamBlockSplitter.cs wasn't even there before, and when looking further at the per-line history of its block loop, I found an interesting change committed into the concurrent_processing branch:
Duplicati/Library/Main/Operation/Backup/StreamBlockSplitter.cs: "This change changes the logic so that empty blocks are not stored at all." Reasons were given, but not specifically detailed. You can, however, see that the ability to restore an empty file without a block was special-cased; it still exists, and the release code changed in 220.127.116.11.
Because the special-case handling of empty files is already there and making backups, simply reverting to the old behavior isn't enough to fix recreate (and it's not clear, without a better understanding of the issue, which way is best moving forward), so maybe we need to add more special-casing for empty files to avoid dblock chases. There could sometimes still be legitimate reasons for a dblock chase (e.g. backend corruption), so making it faster is good, but keeping it from being slow in the normal case would be a huge improvement. I'm not sure whether that's best done in C# or SQL (e.g. countMissingInformation). I've been impolitely suggesting that Duplicati needs more SQL/DBA expertise among its support and development people, but I might just be thinking of my own situation...
If we assume that the countMissingInformation false positive is from
Someone should probably put this under a debugger (because I haven't recently); however, the additional Size check drove countMissingInformation (found at the top of this issue) to 0 in a DB browse that used to find 1.
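A sketch of the effect of that Size check, again against a toy sqlite3 stand-in (table, columns, and queries are my assumptions, not Duplicati's real schema or SQL): excluding zero-length blocks lets the missing-information count reach 0 even though the empty-file block still carries VolumeID -1.

```python
import sqlite3

# Toy stand-in for Duplicati's local database; names are assumptions.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Block (ID INTEGER PRIMARY KEY, Hash TEXT, Size INTEGER, VolumeID INTEGER);
INSERT INTO Block VALUES (1, 'aaa', 1, 5);   -- normal block, known volume
INSERT INTO Block VALUES (2, 'bbb', 0, -1);  -- empty-file block, VolumeID -1
""")

# Without the Size check, the empty block counts as missing information:
(before,) = con.execute(
    "SELECT COUNT(*) FROM Block WHERE VolumeID < 0").fetchone()
# With the Size check, zero-length blocks no longer count:
(after,) = con.execute(
    "SELECT COUNT(*) FROM Block WHERE VolumeID < 0 AND Size > 0").fetchone()
print(before, after)  # 1 0
```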
In other news, I verified that the empty-file backup reading back the dblock was in 18.104.22.168 and not 22.214.171.124, which had the normal chain of Path --> BlocksetEntry --> Block --> VolumeID --> Name. However, taking the name of the dblock and trying to open it in Windows File Explorer or 7-Zip showed the empty block stored wrong somehow. Comparing it to a File Explorer-created ZIP, I wonder whether declaring the Deflate method instead of the Store method, and then storing a packed size of 0, is valid. If I recall correctly, Duplicati sets the compression method per block (which is how it can decide not to compress blocks from already-compressed files and intermix those with compressed blocks). If there was really a problem with empty files in Duplicati ZIP files, it might explain the unit test failures that were fixed (or at least attempted) in Fixed some of the failing tests in concurrent_processing branch #2627 and then, shortly after, fixed more aggressively by not storing empty blocks at all. Recreate code apparently didn't get the memo on that: the all-dblock load is bad, and the creation of the block with VolumeID -1 arguably shouldn't have happened, as it gives us not one but two non-standard ways of representing an empty file, on top of the normal way. Writing code around that is harder. The Pectojin plan might work, and gives one non-standard way of representing a file. Helpful simplification, but
I'd have preferred that the 2017 fix (or maybe known fixes plus unknown bugs) hadn't special-cased the representation of an empty file. Who knows how many other spots needed the fix? The code and SQL need checking pervasively. Going back to the old representation, where an empty file is represented normally, might take a lot of doing, though.
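On the Deflate-versus-Store question, Python's zipfile module (used here as a stand-in for Duplicati's actual ZIP writer) makes the sizes easy to check: Store records a packed size of 0 for an empty entry, while a Deflate stream has a 2-byte minimum, so a Deflate entry claiming a packed size of 0 looks malformed.

```python
import io
import zipfile

def entry_sizes(method):
    """Write one empty entry with the given compression method and
    return its (uncompressed size, packed size) from the archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr(zipfile.ZipInfo("empty"), b"", compress_type=method)
    with zipfile.ZipFile(buf) as z:
        info = z.getinfo("empty")
        return info.file_size, info.compress_size

# Store keeps an empty entry at packed size 0; Deflate compresses even
# empty input to a 2-byte stream, so its packed size can never be 0.
print(entry_sizes(zipfile.ZIP_STORED))    # (0, 0)
print(entry_sizes(zipfile.ZIP_DEFLATED))  # (0, 2)
```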
Very valid concerns. I agree that it is logical to store empty blocks the old way, so that empty blocks and lost data don't look identical.
However, I'd also argue that even with the old implementation, it's still worth implementing the size checks. Checking will boost performance in both the restore and recreate functions because you potentially avoid downloading a bunch of unneeded volumes.
It's a shame it was broken this way, but I think the right way forward is identifying and fixing any remaining issues with the new implementation.
We've had these changes for a while and, while there are unknown factors, they seem to be working, with the exception of this issue.
Has anyone had a chance to test @ts678's suggestion of adding
I might have been a little ambitious today. I set up the test on a fairly complex 50 GB backup, and it was still going after 7+ hours with the fix, so I abandoned it.
A slightly less ambitious test with a 1.39GB ISO
backup with empty file:
The difference in backup time made me a little paranoid, so I ran all the scenarios twice just to be sure. But there's no question that it helps remarkably.