Do not msync the entire offset index file on every transaction #5018
kim
left a comment
I've thought about it many times, but at this point I'm not on board with moving away from the 1-tx-per-commit restriction.
A torn write in the middle of a commit is guaranteed to destroy commit.n transactions. A smaller number of transactions per commit results in a higher number of commits per write, and increases the chance that at least some transactions are recoverable.
The trouble is that we are rather prone to torn writes, not least because the writes are unaligned. Just advising users to use confirmed reads is not enough, I'd argue, because they have no way of even knowing how many transactions could potentially be lost -- outside of benchmarking scenarios, I at least would want to design my application such that it doesn't write too far ahead of the "uncertainty window" of the durability layer.
We can certainly do better by improving our I/O model and recovery mechanisms, but at this point I think we'd basically weaken durability guarantees, and I don't think this is a good idea.
The offset index changes look fine to me. I would suggest increasing Options::offset_index_interval_bytes if there is data showing that we're updating the index too often.
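To make the suggestion above concrete, here is a minimal sketch of how a byte interval could throttle index updates. It assumes the option means roughly "append an index entry once at least this many bytes have been written since the last entry"; the actual semantics are not shown in this conversation. IndexThrottle, on_commit and count_index_entries are hypothetical names, not part of the codebase.

```rust
/// Hypothetical helper: decides when an offset index entry should be appended.
/// Assumes an entry is written once `interval_bytes` have accumulated.
struct IndexThrottle {
    interval_bytes: u64,
    bytes_since_last_entry: u64,
}

impl IndexThrottle {
    fn new(interval_bytes: u64) -> Self {
        Self { interval_bytes, bytes_since_last_entry: 0 }
    }

    /// Record a commit of `commit_len` bytes.
    /// Returns true if an index entry (and hence an index flush) is due.
    fn on_commit(&mut self, commit_len: u64) -> bool {
        self.bytes_since_last_entry = self.bytes_since_last_entry.saturating_add(commit_len);
        if self.bytes_since_last_entry >= self.interval_bytes {
            self.bytes_since_last_entry = 0;
            true
        } else {
            false
        }
    }
}

/// Count how many index entries a stream of commits would produce.
fn count_index_entries(interval_bytes: u64, commit_lens: &[u64]) -> usize {
    let mut throttle = IndexThrottle::new(interval_bytes);
    commit_lens.iter().filter(|&&len| throttle.on_commit(len)).count()
}
```

Under this assumption, 100 commits of 512 bytes each produce 12 index entries with a 4096-byte interval and 3 with a 16384-byte interval, so raising the interval reduces how often the index is touched without changing the commit format.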
Understood @kim. Thank you for the detailed argument. I'll keep the offset index changes only.
3a85694 to a9c5ac6
Also pass a range to msync when writing entries to the segment offset index file, to be explicit and avoid flushing/examining unnecessary pages.
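As an illustration of what passing a range involves, the sketch below rounds a modified byte range outward to page boundaries, since msync on Linux rejects a start address that is not a multiple of the page size. This is not the code from the PR; page_aligned_range and its parameters are hypothetical, and a mapping library may already perform this rounding internally.

```rust
/// Hypothetical helper: expand the modified byte range `[offset, offset + len)`
/// to whole pages, clamped to `file_len`.
/// Returns `(aligned_offset, aligned_len)`; an empty range yields `(0, 0)`.
/// `page_size` must be a power of two, e.g. 4096.
fn page_aligned_range(
    offset: usize,
    len: usize,
    page_size: usize,
    file_len: usize,
) -> (usize, usize) {
    assert!(page_size.is_power_of_two());
    if len == 0 || offset >= file_len {
        return (0, 0);
    }
    let mask = page_size - 1;
    // Round the start down to the beginning of its page.
    let start = offset & !mask;
    // Clamp the end to the file, then round up to the next page boundary,
    // without going past the end of the mapped file.
    let end = offset.saturating_add(len).min(file_len);
    let end_aligned = (end.saturating_add(mask) & !mask).min(file_len);
    (start, end_aligned - start)
}
```

For a 16-byte index entry written at offset 4100 with 4096-byte pages, this yields a single page starting at 4096, rather than the whole index file.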
a9c5ac6 to 5bc263b
Description of Changes
msync the modified pages instead of the entire file.
API and ABI breaking changes
None
Expected complexity level and risk
1
Testing
Refactor