Problem with verification or repair when there are a lot of duplicate or null blocks and file/s are split into a lot of slices #36
I think the above may be enough to diagnose the issue, but if not, some more findings/notes:
About the middle point:
Got a question - when using the GUI, did you try all of these together in a single session?
Does the CL version fail if you don't explicitly disable AVX2?
Yikes. Since you're using roughly 1 GB in size, I can try this on all types of drives: Rusty, SATA SSD and NVMe SSD. Does the approx. 1 GB size matter? What if I make it exactly 1 GB (1073741824 bytes)? Of course, my CPU does support AVX2, but I could still use your exact CL parameters when testing the CL version of 1.3.1.8/7/6/5 (those are the 4 I have downloaded already, so I can easily test).
Thanks, NilEinne, for the bug report. I made a dummy file with null bytes and tested on my PC. I found a bug and fixed it. I made a sample version which can verify such a case. When there are too many identical CRCs, my implemented quick-sort function failed. So I replaced the function with qsort (C runtime library), and it seems to work now. I put the sample (par2j_sample_2021-06-15.zip) in the "MultiPar_sample" folder on OneDrive. Please test it with the files that gave wrong results.
No, I generally just reopened the PAR2, especially when changing settings, as I didn't know if they would take effect without a restart. I didn't really test much with the GUI once I realised what was going on with the CLI, since it didn't seem that useful, especially given the way the CLI would sometimes fail and sometimes work with everything exactly the same, just by rerunning the command.
I didn't test until now, but it doesn't fail beyond the same behaviour as reported above. The CLI is smart enough to recognise the lack of AVX2, as it should, and doesn't try to use it. This is on my A10-5800K; I didn't try the i5-3470. I actually expected this, but never changed the setting because I originally started by copying the command line from the GUI log, and when I looked into what the settings did I just kept the 512 one.

"Yikes. Since you're using roughly 1 GB in size, I can try this on all types of drives: Rusty, SATA SSD and NVMe SSD."
1073741824 bytes should be fine. Sorry I didn't explain this so well, but I'm fairly sure the size doesn't matter. I did test 11010101010 bytes, and I have now also tested 300000000. I'm sure you can go smaller if necessary. The only thing that matters is that nearly all blocks need to be null or duplicates of each other. You can do this with a big chunk of the file not duplicated, but it changes the midway point. (Except eventually the midway point goes beyond 32768 blocks and you won't have the problem.) The midway point for all blocks (except maybe the last) being duplicates seems to be slicing a file of any size into about 15957 blocks with par2j64 1.3.1.8.

The midway point is the most perplexing example, at least to me, given that you can just re-run the command and sometimes it will fail to verify and sometimes it will succeed. Yet it doesn't seem to depend on how many threads or the amount of memory PAR2 can use, which to my non-programmer eyes seem to be the obvious areas that would result in such inconsistent behaviour between runs.

BTW, when I tested 300000000 I confirmed the same thing for a null-byte file and a file with AD repeating. Finally, I did a new test where I generated a file with a 16-byte repeating pattern and confirmed the same behaviour when the size of the blocks was 16-byte aligned. I used this PowerShell script to create it, if anyone wants to do the same, but it's probably not necessary, as for this part it's no different from the others.
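The PowerShell script itself isn't reproduced in the thread; a portable equivalent for generating a file from a repeating 16-byte pattern might look like the following sketch (the filename and pattern bytes are arbitrary placeholders):

```python
def write_pattern_file(path, pattern: bytes, total_size: int, chunk=1 << 20):
    """Write `total_size` bytes of `pattern` repeated end to end.
    With a 16-byte pattern and a 16-byte-aligned block size, every
    PAR2 block ends up identical, reproducing the duplicate-block case."""
    block = pattern * (chunk // len(pattern))   # large buffer of repeats
    with open(path, "wb") as f:
        written = 0
        while written < total_size:
            n = min(len(block), total_size - written)
            f.write(block[:n])
            written += n
    return written

# Example: a 1 MiB file of a repeating 16-byte pattern
write_pattern_file("pattern.bin", bytes(range(16)), 1 << 20)
```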
Thanks! I tried the sample version with a few different files I had used for testing the problem, including the original disk image where I first encountered it. It seems to be fixed in both the 64-bit and 32-bit versions wherever it used to occur.
Ahh, sorting. The bane of any data. Lol. Glad you got it fixed, @Yutaka-Sawada. Should we just download the sample and replace the existing par2j executable files in an existing 1.3.1.8 install? As for file size, @NilEinne - I have 3x 1 TB NVMe SSDs, 2x 1 TB Rusty drives (currently in RAID 1) and a 960 GB SATA III SSD. Space is not an issue; I was just curious whether having a file at exactly 1 GB made a difference versus 'around' 1 GB. Obviously, it did not. Glad to know this is all fixed.
Yes, you do, if you have a recovery set with over 20000 identical blocks. Because I hadn't tested so many blocks of uniform file data, I could not find the bug before. As I changed only a sorting function, the sample is suitable for daily usage. Anyway, the next version will include the fix.
I have encountered a weird bug in MultiPar with verification or repair that happens when your file(s) have a lot of duplicate or null blocks and you create PAR2 files with a lot of blocks.
For a simple, non-real-world test, create a file filled with null bytes, e.g.:
fsutil file createnew <filename> 1000000000
(will likely require administrator privileges)
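As a cross-platform alternative to the Windows-only fsutil command, the same dummy file can be produced with a short Python sketch (the filename here is an arbitrary placeholder):

```python
def make_null_file(path, size):
    """Create a file of `size` null bytes by seeking just past the end
    and writing one byte; the OS fills the gap with zeros (sparsely,
    on filesystems that support it)."""
    with open(path, "wb") as f:
        if size > 0:
            f.seek(size - 1)
            f.write(b"\x00")

make_null_file("nullfile.bin", 1_000_000_000)
```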
Then create a PAR2 file splitting the "data" into the maximum number of blocks (32766). Next, create a PAR2 file splitting the data file into 5000 recovery blocks. You can verify both, and they'll both verify fine.
Then corrupt the data file slightly, i.e. change a few bytes somewhere to something besides 00. It can be right at the beginning/first block, although that may make it less clear what is going on. Possibly not the last block. I just use a hex editor. Try to verify, and you will find that with the 32766-block PAR2, the verification will break and won't continue after it finds the corrupt block. But with the 5000-block PAR2, it will verify and say it can be repaired/rejoined, as expected, since all blocks are the same.
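If you'd rather script the corruption than use a hex editor, a small sketch does it (the filename, offset, and replacement bytes below are arbitrary examples):

```python
def corrupt_bytes(path, offset, new_bytes):
    """Overwrite a few bytes of an existing file in place at `offset`."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(new_bytes)

# Demo on a small throwaway file of null bytes
with open("demo.bin", "wb") as f:
    f.write(b"\x00" * 1024)
corrupt_bytes("demo.bin", 100, b"\xAB\xCD\xEF")
```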
Since MultiPar does recognise null blocks, you may also want to create a file filled with some other byte; I used 10h. You will find the same behaviour. (I used the 00 example because it's fairly simple to create.)
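There is no fsutil one-liner for a non-null fill, but a short sketch covers it (filename, fill byte, and size are arbitrary examples):

```python
def make_filled_file(path, byte_value, size, chunk=1 << 20):
    """Write `size` copies of a single byte value (e.g. 0x10),
    buffering in `chunk`-sized writes."""
    buf = bytes([byte_value]) * chunk
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(buf[:n])
            remaining -= n

make_filled_file("filled.bin", 0x10, 1_000_000)
```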
If you try to repair, you will have the same problem with 32766 blocks. One exception: with the file filled with null bytes, repairing a second time will succeed, because the first repair creates a temporary "blank" file, so the second repair only has to rename it. This can't happen if the file is filled with some other byte, so repair is impossible there.
If you use a larger file, e.g. 11010101010 bytes (over 10x the previous), you will find the same problem. (I'm sure smaller works as well.) I also tried adding non-duplicate blocks into the file. You get the same problem even if the corruption is in the non-duplicate data (and you therefore need recovery blocks).
Additional details:
Most of my testing was with 1.3.1.8 but I also tried 1.3.0.7. Version 1.3.0.7 (really 1.3.0.6 since it's par2j/64 that's the problem) is worse than 1.3.1.8. par2j is better than par2j64 but both can have the problem. See my reply for details.
Most of my testing was with an AMD A10-5800K with 32GiB of RAM, but I also tried on an Intel i5-3470 with 24GiB RAM. Both computers were running Windows 10 x86-64.
I tried fiddling around with memory settings, e.g. 1/8 or 7/8, limiting the thread count down to a single thread, disabling or enabling the GPU, and finally changing verification levels. These made little or no difference to the problem. (I didn't try the SSD setting, in part because the problem occurs on 1.3.0.6 and in part because all my testing was on HDDs.)
The reason I found the bug weird: there is a middle point where the verification will sometimes work and sometimes fail. To be clear, I mean with the exact same files. Just repeat verification 10 times and you should find that sometimes it stops after finding the broken block, and sometimes it keeps going and confirms recovery is possible. (Or recovers, if you do it in recovery mode.) To be clear, this happens even when I use -lc513 to limit par2j64 to one thread, unless this doesn't completely eliminate multi-threading? See this file for a sample output: Sample of output for GitHub.txt
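To make the intermittent behaviour easier to count, repeated runs can be scripted. The sketch below just tallies exit codes of an arbitrary command over several runs; substitute your actual par2j64 verification command line (not shown here):

```python
import subprocess
import sys
from collections import Counter

def tally_exit_codes(cmd, runs=10):
    """Run `cmd` repeatedly and count each exit code, to see how often
    an identical verification run succeeds versus fails."""
    counts = Counter()
    for _ in range(runs):
        result = subprocess.run(cmd, capture_output=True)
        counts[result.returncode] += 1
    return counts

# Demo with a trivial command; replace with the real par2j64
# verification command line to reproduce the intermittent failures.
print(tally_exit_codes([sys.executable, "-c", "pass"], runs=3))
```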