storage: timestampCache ignores txn IDs on timestamp collisions #9083
The timestamp cache avoids adding redundant spans to its interval tree. However, its definition of "redundant" is incomplete. A timestamp cache entry consists of a timestamp, a key span, and a transaction ID, but spans are considered redundant based solely on their timestamps and keys, without regard for the transaction ID.
The transaction ID exists to let a transaction read its own writes, or write after it has performed a read. This matters only for the most recent read or write on a given key, so as long as timestamps are globally unique, everything is fine. Problems arise when two transactions have the same timestamp. (Duplicate timestamps may sound rare, but various mechanisms can funnel transactions onto the same timestamp, especially in the presence of clock offsets between nodes.)
Consider two transactions executing
When the two transactions have different timestamps, the one with the higher timestamp effectively "owns" the newest span in the timestamp cache, thus guaranteeing that only this transaction will be allowed to commit. When they have the same timestamp, however, the one that reads second "steals" ownership of that span. If the first transaction has already performed its write, this could allow both transactions to write and commit.
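To make the steal concrete, here is a deliberately simplified Python model. It is hypothetical, not CockroachDB code: it keeps only the latest read per single key (no spans, no interval tree), and it models only the timestamp-cache check, not intents or commit. Under a "newest read wins, ties included" rule, both transactions pass the write check at the shared timestamp.

```python
# Hypothetical toy model, not CockroachDB code: one entry per key, no spans.
class NaiveReadCache:
    def __init__(self):
        self.entries = {}  # key -> (timestamp, txn_id) of the latest read

    def add_read(self, key, ts, txn_id):
        prev = self.entries.get(key)
        # Naive rule: a read at an equal or higher timestamp replaces the entry,
        # so on a timestamp tie the later reader takes over ownership.
        if prev is None or ts >= prev[0]:
            self.entries[key] = (ts, txn_id)

    def can_write(self, key, ts, txn_id):
        prev = self.entries.get(key)
        if prev is None:
            return True
        cached_ts, owner = prev
        # A write must land above the latest read unless the writer owns that read.
        return owner == txn_id or ts > cached_ts


cache = NaiveReadCache()
cache.add_read("k", 10, "txn1")
txn1_may_write = cache.can_write("k", 10, "txn1")  # txn1 owns the read
cache.add_read("k", 10, "txn2")                     # tie: txn2 "steals" the entry
txn2_may_write = cache.can_write("k", 10, "txn2")
print(txn1_may_write, txn2_may_write)  # True True
```

Both checks pass, which is exactly the hazard: each transaction believes it owns the newest read on the key.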
The solution is to split the baby: when considering a new span with the same timestamp as one already in the timestamp cache, the transaction ID for all shared parts of the span must be cleared, so neither transaction "owns" it. Then both transactions will be forced to restart when they attempt to write, since they will see a conflict with the existing read (which now appears to be non-transactional).
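A minimal sketch of the fix in a simplified per-key model (hypothetical, not CockroachDB code; it ignores the span splitting needed when two reads only partially overlap). On a same-timestamp tie between different transactions, the owner is cleared to `None`, so the read looks non-transactional and neither transaction can write at that timestamp without being pushed.

```python
# Hypothetical toy model, not CockroachDB code: one entry per key, no spans.
class TieClearingReadCache:
    def __init__(self):
        self.entries = {}  # key -> (timestamp, txn_id or None)

    def add_read(self, key, ts, txn_id):
        prev = self.entries.get(key)
        if prev is None or ts > prev[0]:
            # Strictly newer read: it takes ownership.
            self.entries[key] = (ts, txn_id)
        elif ts == prev[0] and prev[1] != txn_id:
            # Same timestamp, different reader: clear the owner so that
            # nobody can write underneath this read at this timestamp.
            self.entries[key] = (ts, None)
        # Otherwise (an older read, or the same owner again) nothing changes.

    def can_write(self, key, ts, txn_id):
        prev = self.entries.get(key)
        if prev is None:
            return True
        cached_ts, owner = prev
        return ts > cached_ts or (owner is not None and owner == txn_id)


cache = TieClearingReadCache()
cache.add_read("k", 10, "txn1")
cache.add_read("k", 10, "txn2")
print(cache.entries["k"])                # (10, None)
print(cache.can_write("k", 10, "txn1"))  # False: must be pushed, i.e. restart
print(cache.can_write("k", 10, "txn2"))  # False
print(cache.can_write("k", 11, "txn1"))  # True: above the shared read
```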
While working on tests for this, I've found other issues that don't require duplicate timestamps. Consider the "left partial overlap" scenario (similar arguments apply to many other branches of this switch statement). Once the old span has been truncated, future calls to
I don't think we can provide the documented semantics of GetMaxRead while retaining the memory savings of #4789. Fortunately, I'm not sure we have to. We use the timestamp cache to tell us whether a transaction can act at a certain timestamp, or if it needs to be pushed into the future. As long as the value returned by GetMaxRead is in the past, it doesn't matter when in the past it is.
I propose changing the interface of GetMaxRead (and GetMaxWrite) so that, instead of taking a transaction ID argument, they return the transaction ID of the latest span in the cache. The caller can then check whether that ID matches its own. It doesn't need to know about any older spans behind that one.
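A rough sketch of the caller side under the proposed interface, in Python for brevity. The helper `push_write_timestamp` is hypothetical: it takes the `(timestamp, txn_id)` pair that GetMaxRead would return and decides whether the write has to move. Integer timestamps and the `+ 1` push are simplifications of the real hybrid-logical-clock timestamps.

```python
from typing import Optional


def push_write_timestamp(write_ts: int, txn_id: str,
                         max_read_ts: int, max_read_txn: Optional[str]) -> int:
    """Return the timestamp a write may use, given the latest read on its span.

    Hypothetical helper; timestamps are plain integers for illustration.
    """
    if max_read_txn is not None and max_read_txn == txn_id:
        # The newest read is our own, so it cannot conflict with our write.
        return write_ts
    if write_ts <= max_read_ts:
        # Another transaction (or an ownerless read) is at or above us: push past it.
        return max_read_ts + 1
    return write_ts


print(push_write_timestamp(10, "txn1", 10, "txn1"))  # 10: own read
print(push_write_timestamp(10, "txn1", 10, None))    # 11: ownerless read
print(push_write_timestamp(12, "txn1", 10, "txn2"))  # 12: already above
```

Because only the newest entry's owner is consulted, the cache never has to remember which transaction sat behind it.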
As far as I can tell, there's no reason for GetMaxRead to look past the first span. I've made the refactor so that it returns a transaction ID rather than taking one, and it seems to be working. One thing was slightly tricky: when two non-overlapping cache entries share the same timestamp, we must report a nil txnID instead of using either entry's ID.
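A minimal sketch of the cache side of that tie rule, assuming a flat list of entries instead of the interval tree, integer timestamps, and half-open key spans `[start, end)`. `Entry`, `get_max_read`, and the `low_water` parameter are hypothetical names. The function returns the newest timestamp overlapping the query span, and reports `None` as the transaction ID when entries from different transactions tie at that timestamp.

```python
from typing import List, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    start: str              # inclusive start key
    end: str                # exclusive end key
    ts: int
    txn_id: Optional[str]   # None for non-transactional reads


def get_max_read(entries: List[Entry], start: str, end: str,
                 low_water: int = 0) -> Tuple[int, Optional[str]]:
    """Return (timestamp, txn_id) of the newest read overlapping [start, end)."""
    best_ts, best_txn = low_water, None  # the low-water mark has no owner
    for e in entries:
        if not (e.start < end and start < e.end):
            continue  # entry does not overlap the query span
        if e.ts > best_ts:
            best_ts, best_txn = e.ts, e.txn_id
        elif e.ts == best_ts and e.txn_id != best_txn:
            # Tie between different owners: nobody owns the newest read.
            best_txn = None
    return best_ts, best_txn


entries = [Entry("a", "c", 10, "txn1"), Entry("c", "e", 10, "txn2")]
print(get_max_read(entries, "a", "d"))  # (10, None): two owners tie at 10
print(get_max_read(entries, "a", "b"))  # (10, 'txn1')
print(get_max_read(entries, "f", "g"))  # (0, None): only the low-water mark
```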