Upload throttle corrupts backup, especially OneDrive. Analyzed, with code proposed. #3787

ts678 opened this issue Jun 3, 2019 · 1 comment

  • I have searched open and closed issues for duplicates.

Environment info

  • Duplicati version: 2.0.4.18
  • Operating system: Windows 10 Version 1809
  • Backend: OneDrive or local folder

Description

The issue is that the upload throttle can lose data, seemingly by an internal buffer overwrite, resulting in a bad remote volume. The proposed fix is that ThrottledStream Read should advance its buffer offset as it reads through its underlying Stream.

This happens on volume uploads, so the damage is permanent, unlike the "Hash mismatch" download corruption (also apparently from a reused throttle buffer offset), where removing the download throttle resolves the issue. This problem is also more serious because it happens at reasonable throttle levels (at least on OneDrive), though, like the download case, it can even be reproduced with local files when the throttle is extremely low. It's not proven that the problems at the two speeds are identical, but the symptoms look the same, and they are the mirror image of the download case because this one comes from ThrottledStream Read, not Write. In both cases the buffer offset is not adjusted as data is read or written; here the bug is reading newer data onto the initial buffer offset, whereas in the other issue the initial buffer offset was written repeatedly to the output.

Both issues produce a file with the right size but scrambled content; however, the upload case may not be noticed immediately after backup, because a correct size is enough for the volume to be marked Verified in the RemoteVolume table. The default post-backup verification of one sample trio of files might catch the issue as a "Hash mismatch", but that isn't guaranteed, and verification can also be disabled. When enabled, its load-balanced sampling might favor checking new files.

Spotting the corruption is easy with a hex viewer: look for unexpected runs of NUL characters. Confirming the pattern predicted by the code theory was done with a test file whose content labels its own offsets. Generating that was easiest with a script, so I used Strawberry Perl on Windows. An example script:

# Emit 4 MiB of self-labeling text: each 16-byte line records its own offset
# (the \n becomes CRLF when the output is redirected to a file on Windows).
for ($offset = 0; $offset < 4 * 1024 * 1024; $offset += 16) {
	printf("OFFSET %07x\n", $offset);
}

This outputs text, but I redirected it into a file with a .pet extension so Duplicati would not compress it, keeping the dblock content readable.

To avoid confusion from Duplicati block management, set --blocksize=30MB, which also ensures that only one source-file content block fits in the default 50MB dblock. There are still some discontinuities because the deflate algorithm works in roughly 64KB blocks, but the expectation is that an original file offset should appear in the copy at only a slightly larger offset. The exact corruption varies with parameters, but a OneDrive test with a 4 MiB file (trying to stay within the GitHub upload limit) at 2 MB/sec (exceeding my actual upload capability) wound up writing the last 0x520 bytes of the ZIP file on top of its start, leaving NULs at the end of the file and obliterating the PK signature usually seen at the start. Similar obliteration of the AES header on an encrypted file can also occur, causing "Invalid header marker" errors.

Here's what a suspected end-overwrote-start looks like (from test results with Microsoft OneDrive v2):

000000 39 30 0d 0a 4f 46 46 53 45 54 20 30 33 66 66 63  90..OFFSET 03ffc  
000010 61 30 0d 0a 4f 46 46 53 45 54 20 30 33 66 66 63  a0..OFFSET 03ffc  
...
000500 59 34 54 37 39 75 49 44 59 3d 50 4b 05 06 00 00  Y4T79uIDY=PK....  
000510 00 00 03 00 03 00 ea 00 00 00 20 04 40 00 00 00  ......ê... .@...  
000520 46 53 45 54 20 30 30 30 30 34 33 30 0d 0a 4f 46  FSET 0000430..OF  
000530 46 53 45 54 20 30 30 30 30 34 34 30 0d 0a 4f 46  FSET 0000440..OF  

The 50 4b 05 06 "PK" signature marks the End of central directory record (EOCD), and where that record ends, the upload seamlessly resumes the original pattern of offset text sitting near its own file offset.
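For anyone wanting to check a dblock without a hex viewer, here is a rough sketch of the same inspection in C#. It is not part of Duplicati; the file name is just the uploaded dblock from the attached test data, and the heuristics (EOCD near the start, long trailing NUL run) only flag the suspected overwrite pattern:

using System;
using System.IO;

static class DblockEocdCheck
{
    static void Main()
    {
        // Uploaded dblock from the attached test data; substitute any suspect file.
        byte[] data = File.ReadAllBytes("b492c27886f0647a2b524ef22bf4b16e5.dblock.zip");

        // A healthy zip has its EOCD signature (50 4B 05 06) near the end,
        // so an occurrence near the start is a red flag.
        for (int i = 0; i <= data.Length - 4; i++)
        {
            if (data[i] == 0x50 && data[i + 1] == 0x4B && data[i + 2] == 0x05 && data[i + 3] == 0x06)
                Console.WriteLine($"EOCD signature at offset 0x{i:x} of 0x{data.Length:x}");
        }

        // A long run of trailing NULs suggests data was displaced elsewhere.
        int nulTail = 0;
        for (int i = data.Length - 1; i >= 0 && data[i] == 0; i--)
            nulTail++;
        Console.WriteLine($"Trailing NUL bytes: 0x{nulTail:x}");
    }
}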

The proposed code change is to advance the buffer offset in the Read loop. Possibly this can be done by changing Read offset here to offset + count - remaining (not tested).
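To make the proposal concrete, here is a minimal sketch of the Read-loop pattern being discussed. This is an illustration only, not the actual Duplicati ThrottledStream source; the names (baseStream, chunkSize) and the Thread.Sleep stand-in for the throttle delay are assumptions:

using System;
using System.IO;
using System.Threading;

static class ThrottledReadSketch
{
    // Reads up to 'count' bytes into 'buffer' in small, throttled chunks.
    static int ThrottledRead(Stream baseStream, byte[] buffer, int offset, int count, int chunkSize)
    {
        int remaining = count;
        while (remaining > 0)
        {
            int request = Math.Min(remaining, chunkSize);

            // Buggy pattern: reading into 'offset' on every pass makes each new
            // chunk overwrite the previous one at the start of the caller's range:
            //   int read = baseStream.Read(buffer, offset, request);

            // Proposed fix: advance the destination by the bytes read so far
            // (offset + count - remaining) so the chunks land one after another:
            int read = baseStream.Read(buffer, offset + count - remaining, request);

            if (read == 0)
                break; // underlying stream is exhausted

            remaining -= read;
            Thread.Sleep(50); // stand-in for the real throttle delay calculation
        }
        return count - remaining;
    }
}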

Possible sightings:

A new clue for the "Fatal Failed: 400 - bad_request: Sha1 did not match data received" error: that report found problems with Backblaze B2 and a local file (testing by @JonMikelV). jshipp also did much debugging, and Omnicef reported a throttle connection. Some backends have the upload send a hash so the received content can be integrity-checked (a small sketch of the idea follows the references below). The plan is similar to Duplicati keeping a hash of each remote volume, except the error can be detected immediately, whereas Duplicati can't verify its hash without downloading. Backblaze B2 wouldn't show any misbehavior for me, though, and I'm not sure why. Google Drive didn't either, but I didn't beat on it as extensively as Backblaze. I don't plan a huge survey; a fix may be easier...

Error 400 - bad_request
https://forum.duplicati.com/t/error-400-bad-request/2202 (see Omnicef's comments on throttling in that topic)

dblocks missing SharpAESCrypt header causing multiple other errors #3746
Seeking info. There are other cases of mysterious decryption failures. Could corruption be part of those?
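As a sketch of the hash-on-upload idea mentioned above (not any particular backend's API): the client hashes the temporary volume before sending it, and a backend that hashes what it received, as B2 does with SHA1, can reject a mismatch immediately. The file name below is the temporary dblock from the attached test data:

using System;
using System.IO;
using System.Security.Cryptography;

static class UploadHashSketch
{
    static void Main()
    {
        // Hash the local temporary volume before upload, for comparison
        // against whatever hash the backend reports for the received data.
        using (var sha1 = SHA1.Create())
        using (var stream = File.OpenRead("dup-2b1129b8-428b-406c-89f1-5a68d662bcdd"))
        {
            byte[] hash = sha1.ComputeHash(stream);
            Console.WriteLine(BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant());
        }
    }
}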

Steps to reproduce

  1. Back up to Microsoft OneDrive v2 with --throttle-upload not hugely above your upload line speed. Sufficiently enormous values will work fine; tiny ones might hit other issues and are unrealistic.
    Alternatively, back up to a local folder with an unrealistically low --throttle-upload such as 20 KB, simply to show that this isn't a OneDrive problem (an example command is sketched after this list). Or try a backend that I haven't tested here.
  • Actual result:
    Probably the default post-backup verification test says "Failed to process file" with a "Hash mismatch".
    A larger verification can be tried with --backup-test-samples=9999 or the test command.
  • Expected result:
    Good backup.
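For the local-folder variant, a command along these lines should reproduce it; the paths are placeholders, and while the options are the ones discussed in this issue (plus --no-encryption to keep the dblock readable), the exact spellings should be checked against the CommandLine help:

Duplicati.CommandLine.exe backup "file://C:\DuplicatiTest\target" "C:\DuplicatiTest\source" --throttle-upload=20KB --blocksize=30MB --no-encryption --asynchronous-concurrent-upload-limit=1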

Screenshots

Debug log

throttle_upload_corruption.zip

The above zip file has a "OneDrive without encryption" folder with "Temporary" and "Uploaded" folders containing the original dblock (copied out of Temp as dup-2b1129b8-428b-406c-89f1-5a68d662bcdd) and the uploaded dblock b492c27886f0647a2b524ef22bf4b16e5.dblock.zip. Some earlier, less realistic but more portable, tests using a local file destination with a very low --throttle-upload are in the folders "local_without_encryption" and "local_with_encryption". Those could be done with 1MiB_offset.pet, but the OneDrive test required 4MiB_offset.pet; the corruption is easier to get with larger files (and at lower throttles). There are some auxiliary files doing things like renaming the Temp file with .zip so as to tell which is which, or decrypting .aes files with SharpAESCrypt.exe (where possible) to compare against the unencrypted Temp version. Sometimes copying things out of Temp was done with a .bat file that just copied whatever it found. There are DB bug reports, mainly to show the backup structure and the Verified state.

@warwickmm added the bug label Jun 7, 2019

@ts678 (Author) commented Jun 7, 2019

This is old code. I verified the corruption (though maybe with more work) in old releases such as 2.0.4.5 and 2.0.3.5.

I forgot to mention that I used --asynchronous-concurrent-upload-limit=1 in the original test, so that such old releases could be tested in a comparable manner. The corruption seemed maybe easier to get at 4 concurrent uploads, though other things might have changed too. Regardless, 100 KB/sec was breaking some of my testing, so I tried editing the decompiled source of Read() and adding my proposed fix. It's working fine so far...

[image attachment]

ts678 added a commit to ts678/duplicati that referenced this issue Jun 24, 2019

Fix ThrottledStream Read buffer overwrite
Adjust buffer offset by amount read so far in the internal Read loop,
so newly read data goes after old data instead of going on top of it.

This fixes issue duplicati#3787 read buffer corruptions when throttle is used.