Digit Compression may Produce Incorrect Digits #16

Closed
Mysticial opened this issue Nov 24, 2018 · 0 comments

@Mysticial (Owner) commented Nov 24, 2018

This is the first time in a very long time that a bug of this severity has made it into a production build.

Under very specific circumstances, y-cruncher v0.7.6 will incorrectly compress digits. The .ycd files may have a small number of digits that are wrong.

This bug has been confirmed to be in the new Digit Viewer, so it only affects v0.7.6 (as well as the v0.7.7 beta). A fix has been implemented and is currently being tested. I should know within a couple of days whether it works. (Before the cause of the bug was found, there was only one known repro case. It takes about 48 hours to reproduce from scratch and 8 hours from the last checkpoint.)

A fix will be rolled out in December. I can't do it earlier since I'm away from home and I don't have access to my build box.


The effects of this bug are:

  • Possible incorrect digits for computations when enabling compressed output.
  • Possible incorrect digits when using the Digit Viewer to compress digits.
  • When it manifests, it will do so very late in the computation.
  • It affects all binaries, both x86* and x64.

Mitigating factors are:

  • When the bug manifests in a computation, it will trigger a redundancy check failure with near 100% probability. (It needs to pass a 61-bit modular checksum.)
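Why "near 100%"? A hedged sketch, assuming the checksum treats the digits as one large base-10 integer reduced modulo the Mersenne prime 2^61 - 1 (the real y-cruncher definition may differ; `MOD` and `digit_checksum` are hypothetical names). Under that assumption, changing one digit at position k shifts the value by δ·10^k with 0 < |δ| < 10. The modulus is a prime that divides neither δ nor 10^k, so the checksum always changes. Arbitrary multi-digit corruption slips through only if the difference happens to be an exact multiple of the modulus, roughly a 1-in-2^61 chance.

```python
# Hypothetical sketch of a 61-bit modular checksum over a digit string.
# ASSUMPTION: the modulus is the Mersenne prime 2^61 - 1.
MOD = (1 << 61) - 1

def digit_checksum(digits: str, base: int = 10) -> int:
    """Interpret `digits` as a base-`base` integer and reduce it mod MOD."""
    acc = 0
    for ch in digits:
        acc = (acc * base + int(ch, base)) % MOD
    return acc

good = "14159265358979323846264338327950288419716939937510"
# Corrupt a single digit in the middle of the string.
bad = good[:20] + ("0" if good[20] != "0" else "1") + good[21:]

print(digit_checksum(good) == digit_checksum(bad))  # False
```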

This bug may block a world record computation attempt, but it will not result in silent failure or undetected data corruption as long as output verification is enabled. It is therefore categorized as only the second-highest severity.

*There have been multiple bugs in the past with similar effects that only affected the 32-bit binaries (mostly from 32-bit integer overflows). But nobody uses the 32-bit binaries for world record attempts, so those bugs were never considered as severe.



Details:

This bug happens to be in one of the open-sourced portions of y-cruncher. The root cause is a subtle error in the sector and word alignment logic for edge cases.

The relevant code is here:

The compressed digit writer has the complicated job of doing both of the following:

  • Sector alignment for raw disk access.
  • Word alignment for digits. (16 or 19 digits per 8-byte word)
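The 16 and 19 figures follow from the word size: 16^16 = 2^64, so exactly 16 hexadecimal digits fit in 8 bytes, while 10^19 < 2^64 < 10^20, so 19 decimal digits fit. The sketch below computes that and packs a digit string into 8-byte words. `pack_words` and its zero-padding of the final word are hypothetical and do not describe the real .ycd layout.

```python
import struct

def digits_per_word(radix: int, word_bits: int = 64) -> int:
    """Largest k such that every k-digit number in `radix` fits in `word_bits` bits."""
    k = 0
    while radix ** (k + 1) <= 2 ** word_bits:
        k += 1
    return k

def pack_words(digits: str, radix: int) -> bytes:
    """Pack a digit string into little-endian 8-byte words, k digits per word.
    Hypothetical layout: the last partial word is right-padded with zeros."""
    per = digits_per_word(radix)
    out = bytearray()
    for i in range(0, len(digits), per):
        chunk = digits[i:i + per].ljust(per, "0")
        out += struct.pack("<Q", int(chunk, radix))
    return bytes(out)

print(digits_per_word(10), digits_per_word(16))  # 19 16
```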

Writes where a boundary splits a sector or a word must do a read-modify-write on that boundary instead of a simple write.
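To make the two kinds of boundary concrete, here is a hypothetical helper that classifies both ends of a write covering digit positions [start, end). The names `boundary_needs`, `WORD_BYTES`, and `SECTOR_BYTES` are illustrative, and the 4096-byte sector size is an assumption.

```python
WORD_BYTES = 8
SECTOR_BYTES = 4096  # ASSUMPTION: typical sector size for raw disk I/O

def boundary_needs(start, end, digits_per_word, sector_bytes=SECTOR_BYTES):
    """Return (word_rmw_front, word_rmw_back, sector_rmw_front, sector_rmw_back)."""
    first_word, first_off = divmod(start, digits_per_word)
    last_word, last_off = divmod(end, digits_per_word)
    # A write that starts or ends mid-word must merge with existing digits.
    word_front = first_off != 0
    word_back = last_off != 0
    # Byte extent of the words the write touches (a partial word is written whole).
    byte_start = first_word * WORD_BYTES
    byte_end = (last_word + (1 if last_off else 0)) * WORD_BYTES
    # A write that starts or ends mid-sector must merge with existing sector data.
    sector_front = byte_start % sector_bytes != 0
    sector_back = byte_end % sector_bytes != 0
    return word_front, word_back, sector_front, sector_back

# 512 words = 4096 bytes, so word 512 starts a sector; start 5 digits into it.
print(boundary_needs(512 * 19 + 5, 512 * 19 + 100, 19))  # (True, True, False, True)
```

The word check and the sector check are independent: in the usage example the front boundary is mid-word yet the boundary word begins exactly on a sector, so a word read-modify-write is required even though no sector read-modify-write is.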

The bug arises when a write is not word-aligned, but the word on the boundary is sector-aligned. Because the write is not word-aligned, the writer must read the partial word, merge in the new digits, and write it back. But when the boundary word is sector-aligned, the read-modify-write logic incorrectly elides the boundary read, so the read-modify-write needed to handle the word misalignment never happens.
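The toy model below reproduces this failure mode under simplifying assumptions: digits are stored as characters, a "sector" is only 4 words, and `write_digits`, `demo`, and the `buggy` flag are hypothetical names, not y-cruncher code. It writes one digit string in two adjacent chunks split mid-word at a sector-aligned word, and reports which stored digits differ from the reference.

```python
DIGITS_PER_WORD = 19   # decimal digits per 8-byte word
WORDS_PER_SECTOR = 4   # tiny "sector" so the example stays small

def write_digits(disk, start, digits, buggy):
    """Write `digits` at digit offset `start` into `disk`, a list of words
    where each word is a list of DIGITS_PER_WORD characters."""
    end = start + len(digits)
    first_word = start // DIGITS_PER_WORD
    last_word = (end - 1) // DIGITS_PER_WORD
    for w in range(first_word, last_word + 1):
        lo = w * DIGITS_PER_WORD
        hi = lo + DIGITS_PER_WORD
        partial = start > lo or end < hi            # write splits this word
        sector_aligned = w % WORDS_PER_SECTOR == 0  # word starts a sector
        if partial and not (buggy and sector_aligned):
            word = list(disk[w])                    # read existing digits
        else:
            # A full overwrite needs no read. On the buggy path the read of a
            # partial, sector-aligned word is skipped and its old digits are lost.
            word = ["0"] * DIGITS_PER_WORD
        for pos in range(max(lo, start), min(hi, end)):
            word[pos - lo] = digits[pos - start]    # modify
        disk[w] = word                              # write back

def demo(buggy):
    """Write a digit string in two chunks split mid-word; return bad positions."""
    total_words = 8
    ref = "".join(str(i % 9 + 1) for i in range(DIGITS_PER_WORD * total_words))
    disk = [["0"] * DIGITS_PER_WORD for _ in range(total_words)]
    split = 4 * DIGITS_PER_WORD + 7  # mid-word; word 4 begins sector 1
    write_digits(disk, 0, ref[:split], buggy)
    write_digits(disk, split, ref[split:], buggy)
    stored = "".join("".join(word) for word in disk)
    return [i for i, (a, b) in enumerate(zip(ref, stored)) if a != b]

print(demo(buggy=True))   # [76, 77, 78, 79, 80, 81, 82]
print(demo(buggy=False))  # []
```

In this sketch the fix is to decide whether to read a boundary word from word alignment alone, independent of sector alignment. The corrupted positions are exactly the digits of the boundary word that belonged to the earlier write.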

The result is corruption of the digits in that boundary word. This will show up as minor differences in the digit counts as well as a checksum mismatch.

The very purpose of the read-back verification (added in v0.7.5) is to catch these kinds of errors: both software bugs and hardware instability. And it seems to have worked perfectly.
