This PR adds a checksum to each entry in the Journal. When the Journal is read back for recovery, each entry's checksum is recomputed and compared with the stored value. This lets us detect when the Journal file is corrupt.
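A minimal sketch of the write/verify idea. It assumes a hypothetical entry layout of `[int length][payload][long checksum]`, which is not necessarily the PR's on-disk format. To stay dependency-free it uses the JDK's `java.util.zip.CRC32` as a stand-in for the xxhash-64 the PR actually uses. The class and method names (`ChecksummedEntry`, `writeEntry`, `readEntry`) are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration; not the eXist-db Journal code.
final class ChecksummedEntry {

    /** Raised when an entry's stored checksum differs from the recomputed one. */
    static final class ChecksumMismatchException extends IOException {
        ChecksumMismatchException(String message) {
            super(message);
        }
    }

    // Stand-in for xxhash-64: the JDK's CRC32 keeps the sketch dependency-free.
    static long checksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    /** Assumed layout: [int length][payload bytes][long checksum]. */
    static void writeEntry(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length);
        out.write(payload);
        out.writeLong(checksum(payload));
    }

    /** Reads one entry and returns its payload only if the checksum matches. */
    static byte[] readEntry(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("corrupt entry length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload);
        long stored = in.readLong();
        long computed = checksum(payload);
        if (stored != computed) {
            throw new ChecksumMismatchException(String.format(
                    "checksum mismatch: stored=%016x computed=%016x", stored, computed));
        }
        return payload;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeEntry(new DataOutputStream(buf), "store page 42".getBytes(StandardCharsets.UTF_8));
        byte[] journal = buf.toByteArray();

        byte[] back = readEntry(new DataInputStream(new ByteArrayInputStream(journal)));
        System.out.println("intact entry: " + new String(back, StandardCharsets.UTF_8));

        journal[4] ^= 0x01; // flip one bit of the first payload byte (bytes 0-3 hold the length)
        try {
            readEntry(new DataInputStream(new ByteArrayInputStream(journal)));
        } catch (ChecksumMismatchException e) {
            System.out.println("corrupt entry: " + e.getMessage());
        }
    }
}
```

Writing the checksum as a fixed-size trailer keeps the writer streaming: the payload goes out first and the checksum is appended once it is known.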
I have used the xxhash-64 implementation from the lz4 project. In my testing, for typical journal-entry sizes, it was faster than lz4's xxhash-32, Java's Adler32, and OpenHFT's zero-allocation xxhash.
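For reference, lz4-java exposes xxhash-64 as `XXHash64`, obtained via `XXHashFactory.fastestInstance().hash64()`, whose `hash(byte[] buf, int off, int len, long seed)` returns a `long`. The block below is not that library's code. It is a compact, dependency-free transcription of the published XXH64 algorithm, included only to show what the per-entry checksum computes. The class name `XxHash64` is hypothetical, and real code should use the library.

```java
import java.nio.charset.StandardCharsets;

// Illustrative XXH64 transcription (hypothetical class); use lz4-java in production.
final class XxHash64 {
    private static final long P1 = 0x9E3779B185EBCA87L;
    private static final long P2 = 0xC2B2AE3D27D4EB4FL;
    private static final long P3 = 0x165667B19E3779F9L;
    private static final long P4 = 0x85EBCA77C2B2AE63L;
    private static final long P5 = 0x27D4EB2F165667C5L;

    // XXH64 reads input words in little-endian order.
    private static long readLong(byte[] b, int i) {
        long v = 0;
        for (int k = 7; k >= 0; k--) {
            v = (v << 8) | (b[i + k] & 0xFFL);
        }
        return v;
    }

    private static long readInt(byte[] b, int i) {
        return (b[i] & 0xFFL) | (b[i + 1] & 0xFFL) << 8
                | (b[i + 2] & 0xFFL) << 16 | (b[i + 3] & 0xFFL) << 24;
    }

    private static long round(long acc, long input) {
        acc += input * P2;
        acc = Long.rotateLeft(acc, 31);
        return acc * P1;
    }

    private static long mergeRound(long acc, long val) {
        acc ^= round(0, val);
        return acc * P1 + P4;
    }

    static long hash(byte[] data, int off, int len, long seed) {
        int end = off + len;
        int p = off;
        long h;
        if (len >= 32) {
            // Four parallel accumulators consume 32-byte stripes.
            long v1 = seed + P1 + P2, v2 = seed + P2, v3 = seed, v4 = seed - P1;
            int limit = end - 32;
            do {
                v1 = round(v1, readLong(data, p));
                v2 = round(v2, readLong(data, p + 8));
                v3 = round(v3, readLong(data, p + 16));
                v4 = round(v4, readLong(data, p + 24));
                p += 32;
            } while (p <= limit);
            h = Long.rotateLeft(v1, 1) + Long.rotateLeft(v2, 7)
                    + Long.rotateLeft(v3, 12) + Long.rotateLeft(v4, 18);
            h = mergeRound(h, v1);
            h = mergeRound(h, v2);
            h = mergeRound(h, v3);
            h = mergeRound(h, v4);
        } else {
            h = seed + P5;
        }
        h += len;
        // Remaining bytes: 8 at a time, then 4, then 1.
        while (p + 8 <= end) {
            h ^= round(0, readLong(data, p));
            h = Long.rotateLeft(h, 27) * P1 + P4;
            p += 8;
        }
        if (p + 4 <= end) {
            h ^= readInt(data, p) * P1;
            h = Long.rotateLeft(h, 23) * P2 + P3;
            p += 4;
        }
        while (p < end) {
            h ^= (data[p] & 0xFFL) * P5;
            h = Long.rotateLeft(h, 11) * P1;
            p++;
        }
        // Final avalanche spreads every input bit across the result.
        h ^= h >>> 33;
        h *= P2;
        h ^= h >>> 29;
        h *= P3;
        h ^= h >>> 32;
        return h;
    }

    public static void main(String[] args) {
        byte[] entry = "store page 42".getBytes(StandardCharsets.UTF_8);
        // lz4-java's XXHashFactory.fastestInstance().hash64().hash(entry, 0, entry.length, 0L)
        // should produce the same value.
        System.out.printf("%016x%n", hash(entry, 0, entry.length, 0L));
    }
}
```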
NOTE: This PR also increments the Journal file format version to version
dizzzz left a comment
Wow, serious stuff. I get the big picture, LGTM. I'm wondering: if a journal is not completely written due to, e.g., a JVM crash, will the loader be robust to this? Wouldn't it be safer to write the checksum before the larger payload data in
@dizzzz some replies to your comments:
No. At the moment, we just detect the failure during recovery and abort. However, by then recovery will only have been applied up to the entry whose checksum failed, which could leave the recovered data inconsistent. Consider this an improvement on what we had before, where we might ignore journal corruption and still report that recovery had completed.
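The abort-on-mismatch behaviour described above can be sketched as a replay loop that applies entries in order and stops at the first bad one, reporting how many entries were already applied. Everything here is hypothetical: the `[int length][payload][long checksum]` layout, the names `RecoverySketch`, `recover` and `redo`, and CRC32 as a stand-in for xxhash-64.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.zip.CRC32;

// Hypothetical recovery loop; names and entry layout are illustrative only.
final class RecoverySketch {

    /** Recovery stopped part-way; {@code replayed} entries had already been applied. */
    static final class RecoveryAbortedException extends IOException {
        final int replayed;

        RecoveryAbortedException(int replayed, String message) {
            super(message);
            this.replayed = replayed;
        }
    }

    // CRC32 stands in for xxhash-64 so the example needs no extra library.
    static long checksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    /** Builds an in-memory journal of [int length][payload][long checksum] entries. */
    static byte[] journalOf(List<byte[]> payloads) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            for (byte[] p : payloads) {
                out.writeInt(p.length);
                out.write(p);
                out.writeLong(checksum(p));
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen with in-memory streams
        }
    }

    /**
     * Replays entries in order, aborting at the first bad entry. Entries before it
     * have already been applied, so the recovered state may be inconsistent.
     */
    static int recover(byte[] journal, Consumer<byte[]> redo) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(journal));
        int replayed = 0;
        while (in.available() > 0) {
            int length = in.readInt();
            if (length < 0 || length > in.available()) {
                throw new RecoveryAbortedException(replayed, "corrupt length at entry " + replayed);
            }
            byte[] payload = new byte[length];
            in.readFully(payload);
            if (in.readLong() != checksum(payload)) {
                throw new RecoveryAbortedException(replayed, "checksum mismatch at entry " + replayed);
            }
            redo.accept(payload);
            replayed++;
        }
        return replayed;
    }

    public static void main(String[] args) {
        List<byte[]> payloads = Arrays.asList(
                "entry-0".getBytes(StandardCharsets.UTF_8),
                "entry-1".getBytes(StandardCharsets.UTF_8),
                "entry-2".getBytes(StandardCharsets.UTF_8));
        byte[] journal = journalOf(payloads);
        // Each entry is 4 + 7 + 8 = 19 bytes, so byte 23 is entry-1's first payload byte.
        journal[23] ^= 0x01;

        List<byte[]> applied = new ArrayList<>();
        try {
            recover(journal, applied::add);
        } catch (RecoveryAbortedException e) {
            System.out.println(e.getMessage() + "; entries already applied: " + e.replayed);
        } catch (IOException e) {
            System.out.println("recovery failed: " + e);
        }
    }
}
```

The `replayed` count in the exception makes the partial state explicit, which is exactly the inconsistency risk the reply points out.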
For future improvements, we have two other options which would be more robust:
I am not sure it makes much difference. If an entry is corrupted, its checksum will not match, regardless of whether the checksum is stored before or after the payload.
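A small experiment illustrating that claim, under stated assumptions. It uses two hypothetical layouts, `[length][checksum][payload]` and `[length][payload][checksum]`, with CRC32 standing in for xxhash-64. It flips every bit after the length field and truncates the entry at every possible length, which simulates a torn write after a crash. In both layouts the damaged entry is rejected: a flip changes either the payload or the stored checksum, and a truncated entry hits end-of-stream whichever field comes last. The length field itself is only bounds-checked here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical comparison of two entry layouts; CRC32 stands in for xxhash-64.
final class ChecksumPlacement {

    enum Layout { CHECKSUM_BEFORE_PAYLOAD, CHECKSUM_AFTER_PAYLOAD }

    static long checksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    /** Encodes [int length] followed by checksum and payload in the given order. */
    static byte[] frame(byte[] payload, Layout layout) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(payload.length);
            if (layout == Layout.CHECKSUM_BEFORE_PAYLOAD) out.writeLong(checksum(payload));
            out.write(payload);
            if (layout == Layout.CHECKSUM_AFTER_PAYLOAD) out.writeLong(checksum(payload));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen with in-memory streams
        }
    }

    /** True only if the whole entry can be read and its checksum matches. */
    static boolean isValid(byte[] frame, Layout layout) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame));
            int length = in.readInt();
            if (length < 0 || length > in.available()) return false;
            long stored = 0;
            if (layout == Layout.CHECKSUM_BEFORE_PAYLOAD) stored = in.readLong();
            byte[] payload = new byte[length];
            in.readFully(payload);
            if (layout == Layout.CHECKSUM_AFTER_PAYLOAD) stored = in.readLong();
            return stored == checksum(payload);
        } catch (IOException e) {
            return false; // includes EOFException: the entry was cut short (torn write)
        }
    }

    /** Flips every bit after the length field, then truncates at every shorter length. */
    static boolean detectsAllFlipsAndTruncations(byte[] payload, Layout layout) {
        byte[] intact = frame(payload, layout);
        for (int i = 4; i < intact.length; i++) {
            for (int bit = 0; bit < 8; bit++) {
                byte[] damaged = intact.clone();
                damaged[i] ^= (byte) (1 << bit);
                if (isValid(damaged, layout)) return false;
            }
        }
        for (int n = 0; n < intact.length; n++) {
            if (isValid(Arrays.copyOf(intact, n), layout)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] payload = "update page 7".getBytes(StandardCharsets.UTF_8);
        for (Layout layout : Layout.values()) {
            System.out.println(layout + ": every flip and truncation detected = "
                    + detectsAllFlipsAndTruncations(payload, layout));
        }
    }
}
```

CRC32 is guaranteed to catch any single-bit error. A 64-bit hash such as xxhash-64 catches random corruption with very high probability rather than by construction, but the placement argument is the same.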
The older journal code is a bit yucky and won't cope with payloads greater than 32 KB. In practice, the payloads written by eXist-db are always under 4 KB, which is the B-Tree page size.