Digit Compression may Produce Incorrect Digits #16
This is the first time in a very long time* that a bug of this severity has made it into a production build.
Under very specific circumstances, y-cruncher v0.7.6 will compress digits incorrectly, leaving a small number of wrong digits in the .ycd files.
This bug has been confirmed to be in the new Digit Viewer, so it only affects v0.7.6 (as well as the v0.7.7 beta). A fix has been implemented and is currently being tested. I should know within a couple of days whether it works. (Before the cause of the bug was found, there was only one known repro case. It takes about 48 hours to reproduce from scratch and 8 hours from the last checkpoint.)
A fix will be rolled out in December. I can't do it earlier since I'm away from home and I don't have access to my build box.
The effects of this bug are:
Mitigating factors are:
This bug may block a world record computation attempt. But it will not result in silent failure or undetected data corruption if output verification is enabled, so it is categorized as only the second-highest severity.
*There have been multiple bugs in the past with similar effects that only affected the 32-bit binaries (mostly from 32-bit integer overflows). But nobody uses the 32-bit binaries for world record attempts, so those bugs were never considered as severe.
This bug happens to be in one of the open-sourced portions of y-cruncher. The root cause is a subtle error in the sector and word alignment logic for edge cases.
The relevant code is here:
The compressed digit writer has the complicated job of doing both of the following:
Writes where a boundary splits a sector or a word must do a read-modify-write on that boundary instead of a simple write.
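To make the word-level half of this concrete, here is a toy sketch in C++ of a digit writer that packs digits into words and does a read-modify-write on any word a write only partially covers. The packing (19 decimal digits per 64-bit word) and the names `write_digits`, `pack_word`, `unpack_word` and `read_digit` are assumptions for illustration, not y-cruncher's actual code or .ycd layout. Sector-level read-modify-write follows the same pattern with a sector in place of a word.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical packing for illustration: 19 decimal digits per 64-bit word
// (10^19 - 1 fits in 64 bits). Not necessarily the real .ycd layout.
constexpr uint64_t kDigitsPerWord = 19;

// Split a packed word into its digits, least significant digit first.
std::vector<uint8_t> unpack_word(uint64_t w) {
    std::vector<uint8_t> d(kDigitsPerWord);
    for (uint64_t i = 0; i < kDigitsPerWord; ++i) {
        d[i] = static_cast<uint8_t>(w % 10);
        w /= 10;
    }
    return d;
}

// Inverse of unpack_word.
uint64_t pack_word(const std::vector<uint8_t>& d) {
    uint64_t w = 0;
    for (uint64_t i = kDigitsPerWord; i-- > 0;) w = w * 10 + d[i];
    return w;
}

// Read the digit at absolute position `pos`.
uint8_t read_digit(const std::vector<uint64_t>& words, uint64_t pos) {
    return unpack_word(words[pos / kDigitsPerWord])[pos % kDigitsPerWord];
}

// Write `digits` starting at digit position `offset`. Words the write fully
// covers are built from scratch; a word it only partially covers (a boundary)
// is read, modified in place, and written back so its other digits survive.
void write_digits(std::vector<uint64_t>& words, uint64_t offset,
                  const std::vector<uint8_t>& digits) {
    const uint64_t end = offset + digits.size();
    for (uint64_t wi = offset / kDigitsPerWord; wi * kDigitsPerWord < end; ++wi) {
        const uint64_t begin = wi * kDigitsPerWord;
        const uint64_t stop = begin + kDigitsPerWord;
        const bool partial = offset > begin || end < stop;
        std::vector<uint8_t> d = partial ? unpack_word(words[wi])  // read
                                         : std::vector<uint8_t>(kDigitsPerWord, 0);
        for (uint64_t p = std::max(offset, begin); p < std::min(end, stop); ++p)
            d[p - begin] = digits[p - offset];  // modify
        words[wi] = pack_word(d);               // write
    }
}
```

For example, writing 10 digits at position 15 touches digits 15 to 18 of word 0 and digits 0 to 5 of word 1. Neither word is fully covered, so both go through read-modify-write and keep their remaining digits.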
The bug arises when a write is not word-aligned, but the word on the boundary is sector-aligned. Because the write is not word-aligned, the writer must read the partial boundary word, modify it, and write it back. But when the boundary word is sector-aligned, the read-modify-write logic incorrectly elides the boundary read. The read is skipped, so the read-modify-write needed to handle the word misalignment never happens.
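The failure mode can be reproduced in a small toy model. This sketch contrasts a boundary-read rule that skips the read whenever the boundary word starts a sector, as described above, with a rule that reads any partially covered word. The geometry (19 digits per word, 4 words per sector) and all function names are hypothetical; this is not the actual Digit Viewer code, and a real writer reads whole sectors from disk rather than single words.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical geometry for illustration only: 19 decimal digits per
// 64-bit word and 4 words per "sector".
constexpr uint64_t kDigitsPerWord = 19;
constexpr uint64_t kWordsPerSector = 4;

// Digit i of a packed word (i = 0 is the least significant digit).
uint8_t get_digit(uint64_t w, uint64_t i) {
    while (i-- > 0) w /= 10;
    return static_cast<uint8_t>(w % 10);
}

// Return w with digit i replaced by d.
uint64_t set_digit(uint64_t w, uint64_t i, uint8_t d) {
    uint64_t scale = 1;
    for (uint64_t k = 0; k < i; ++k) scale *= 10;
    return w - get_digit(w, i) * scale + d * scale;
}

// Buggy rule, modeled on the description above: a partially covered word
// that happens to start a sector is (wrongly) not read back first.
bool needs_read_buggy(uint64_t wi, bool partial) {
    return partial && wi % kWordsPerSector != 0;
}

// Fixed rule: word misalignment alone decides; sector position is irrelevant.
bool needs_read_fixed(uint64_t /*wi*/, bool partial) { return partial; }

void write_digits(std::vector<uint64_t>& words, uint64_t offset,
                  const std::vector<uint8_t>& digits,
                  bool (*needs_read)(uint64_t, bool)) {
    const uint64_t end = offset + digits.size();
    for (uint64_t wi = offset / kDigitsPerWord; wi * kDigitsPerWord < end; ++wi) {
        const uint64_t begin = wi * kDigitsPerWord;
        const uint64_t stop = begin + kDigitsPerWord;
        const bool partial = offset > begin || end < stop;
        // Skipping the read means starting from an all-zero word, which
        // clobbers every digit of the word this write does not cover.
        uint64_t w = needs_read(wi, partial) ? words[wi] : 0;
        for (uint64_t p = std::max(offset, begin); p < std::min(end, stop); ++p)
            w = set_digit(w, p - begin, digits[p - offset]);
        words[wi] = w;
    }
}
```

In this model the buggy rule only loses data in the combination the issue describes: a write that is not word-aligned whose boundary word is sector-aligned (word 4 here). A misaligned write into word 5, which does not start a sector, behaves identically under both rules.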
The result is a corruption of the digits on that boundary word. This will show up as minor differences in the digit counts as well as a checksum mismatch.
The very purpose of the read-back verification (added in v0.7.5) is to catch these kinds of errors, both software bugs and hardware instability. And it seems to have worked perfectly.
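A minimal sketch of what such a read-back check can look like, assuming it compares per-digit counts and an order-sensitive checksum of the digits as computed against the same summary of the digits decompressed from the output files. FNV-1a stands in for the checksum, and `DigitStats`, `summarize` and `verify_read_back` are hypothetical names; this issue does not say which checksum y-cruncher actually uses.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Summary of a digit stream: how often each decimal digit occurs, plus an
// order-sensitive hash. FNV-1a (64-bit) is a stand-in chosen for illustration.
struct DigitStats {
    std::array<uint64_t, 10> counts{};
    uint64_t hash = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    bool operator==(const DigitStats& o) const {
        return counts == o.counts && hash == o.hash;
    }
};

DigitStats summarize(const std::vector<uint8_t>& digits) {
    DigitStats s;
    for (uint8_t d : digits) {
        ++s.counts[d];
        s.hash ^= d;
        s.hash *= 1099511628211ULL;  // FNV-1a 64-bit prime
    }
    return s;
}

// Read-back verification: the digits that were computed must summarize
// identically to the digits decompressed back out of the output files.
bool verify_read_back(const std::vector<uint8_t>& computed,
                      const std::vector<uint8_t>& read_back) {
    return summarize(computed) == summarize(read_back);
}
```

Digit counts alone would miss corruption that merely permutes digits, which is why the sketch also compares an order-sensitive checksum.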