Journal Checksums #2254
…corrupted Journal
"; transactId = " + transactId);
throw new LogException("Bad pointer to previous in entry: " + loggable.dump());
}

// update the checksum for the entry data and backLink
if (payload.hasArray()) {
dizzzz (Member), Nov 1, 2018
This looks almost identical to the code at line 211? I can imagine we want to avoid the cost of a call to a new method?
adamretter (Author, Member), Nov 2, 2018
I could abstract it, but I would need access to several local vars; as it was small, I just reproduced it.
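For illustration only, here is a minimal sketch of what such an abstraction might look like. It is not the code from this PR: the class and method names are hypothetical, and `java.util.zip.CRC32` is used purely as a stand-in so the example runs without dependencies. The helper feeds a `ByteBuffer` into a checksum, using the backing array when `hasArray()` is true and falling back to a copy otherwise.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

final class ChecksumHelperSketch {

    // Hypothetical helper: feeds the remaining bytes of buf into checksum
    // without moving buf's position.
    static void update(final Checksum checksum, final ByteBuffer buf) {
        if (buf.hasArray()) {
            // Heap buffer: read straight from the backing array, no copy.
            checksum.update(buf.array(), buf.arrayOffset() + buf.position(), buf.remaining());
        } else {
            // Direct or read-only buffer: copy via a duplicate so that
            // the caller's position and limit stay untouched.
            final byte[] copy = new byte[buf.remaining()];
            buf.duplicate().get(copy);
            checksum.update(copy, 0, copy.length);
        }
    }

    public static void main(final String[] args) {
        final byte[] data = "journal entry".getBytes(StandardCharsets.UTF_8);

        final CRC32 viaArray = new CRC32();
        update(viaArray, ByteBuffer.wrap(data));

        final ByteBuffer direct = ByteBuffer.allocateDirect(data.length);
        direct.put(data);
        direct.flip();
        final CRC32 viaCopy = new CRC32();
        update(viaCopy, direct);

        // Both paths should see the same bytes and agree on the value.
        System.out.println(viaArray.getValue() == viaCopy.getValue());
    }
}
```

Whether the extra method call is worth it on the journal's write path is a separate question; the sketch only shows that the two branches can sit behind one signature.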
Wow, serious stuff. I get the big picture, LGTM. I am wondering: if a journal is not completely written due to, e.g., a JVM crash, will the loader be robust against this? Wouldn't it be safer to write the checksum before the larger payload data in …
See my remark; I'd guess some longer stress testing might be needed here?
Looks good to me. I cannot remember having seen a corrupted journal yet, but it's hard to know, so having a check is better than not having one ;-)
@dizzzz some replies to your comments:
No. At the moment, we just detect the failure during recovery and abort. However, recovery will then only have been done as far as the checksum failure. This could lead to inconsistencies in the recovered data. Consider this an improvement on what we had, whereby previously we might ignore journal corruptions and report that recovery completed. For future improvements, we have two other options which would be more robust:
I am not sure it makes much difference. If there is a corruption, regardless of whether the checksum is before or after the payload, it will not match.
The older journal code is a bit yucky and won't cope with payloads greater than 32 KB. In reality the payloads written by eXist-db are always under 4 KB which is the B-Tree page size.
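The recovery behaviour described in these replies (replay entries in order, stop at the first entry whose checksum does not match or which is incomplete) can be sketched roughly as follows. This is an assumption-laden illustration, not eXist-db's journal code: the entry layout `[int length][payload][long checksum]`, the class name, and the use of `java.util.zip.CRC32` instead of xxhash-64 are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

final class JournalRecoverySketch {

    /** Entries replayed before recovery stopped, plus whether the whole journal was clean. */
    static final class Result {
        final List<byte[]> entries = new ArrayList<>();
        boolean complete = true;
    }

    static long checksumOf(final byte[] payload) {
        final CRC32 crc = new CRC32(); // stand-in for xxhash-64
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    // Hypothetical layout per entry: [int length][payload][long checksum]
    static byte[] writeJournal(final List<byte[]> payloads) {
        int size = 0;
        for (final byte[] p : payloads) {
            size += Integer.BYTES + p.length + Long.BYTES;
        }
        final ByteBuffer buf = ByteBuffer.allocate(size);
        for (final byte[] p : payloads) {
            buf.putInt(p.length).put(p).putLong(checksumOf(p));
        }
        return buf.array();
    }

    static Result recover(final byte[] journal) {
        final Result result = new Result();
        final ByteBuffer buf = ByteBuffer.wrap(journal);
        while (buf.hasRemaining()) {
            // A partially written tail (e.g. after a crash) shows up as
            // too few bytes for the header or for payload + checksum.
            if (buf.remaining() < Integer.BYTES) {
                result.complete = false;
                break;
            }
            final int len = buf.getInt();
            if (len < 0 || buf.remaining() < (long) len + Long.BYTES) {
                result.complete = false;
                break;
            }
            final byte[] payload = new byte[len];
            buf.get(payload);
            final long stored = buf.getLong();
            if (stored != checksumOf(payload)) {
                // Corruption detected: abort here, keeping what was replayed so far.
                result.complete = false;
                break;
            }
            result.entries.add(payload);
        }
        return result;
    }
}
```

In this sketch a flipped byte and a truncated tail both end recovery at the damaged entry, and the same would hold if the checksum were stored ahead of the payload, since the comparison only happens once both have been read.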
This PR adds a checksum to the Journal for each entry. The checksum of the entry is then compared when reading the Journal back for recovery. This enables us to ensure that the Journal file is not corrupt.
I have used the xxhash-64 algorithm from the lz4 project, which in my testing for the typical size of journal entries was faster than lz4's xxhash-32, Java's Adler32, and OpenHFT's no-allocation xxhash.
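For anyone wanting to repeat this kind of comparison, a rough timing harness might look like the sketch below. It is not the benchmark used for this PR: it only covers the JDK's own `Adler32` and `CRC32` so that it runs without dependencies, the 512-byte entry size is an assumed value under the 4 KB page size, and a naive `System.nanoTime()` loop is far less reliable than a proper harness such as JMH.

```java
import java.util.Random;
import java.util.function.Supplier;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

final class ChecksumTimingSketch {

    // Accumulates results so the JIT cannot discard the hashing work.
    static long sink;

    // Hashes data `iterations` times, creating a fresh Checksum each time,
    // and returns the elapsed time in nanoseconds.
    static long time(final Supplier<Checksum> factory, final byte[] data, final int iterations) {
        final long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            final Checksum c = factory.get();
            c.update(data, 0, data.length);
            sink ^= c.getValue();
        }
        return System.nanoTime() - start;
    }

    public static void main(final String[] args) {
        final byte[] entry = new byte[512]; // assumed entry size, under the 4 KB page size
        new Random(42).nextBytes(entry);

        // Warm-up runs so the timed runs are more likely to hit compiled code.
        time(Adler32::new, entry, 100_000);
        time(CRC32::new, entry, 100_000);

        System.out.println("Adler32 ns: " + time(Adler32::new, entry, 1_000_000));
        System.out.println("CRC32 ns:   " + time(CRC32::new, entry, 1_000_000));
    }
}
```

Other algorithms can be added by passing another `Supplier<Checksum>`, provided the implementation is available as a `java.util.zip.Checksum`.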
NOTE: This PR also increments the Journal file format version to version 3.