storage: fix asserts around log truncations #43314
nvanbenschoten left a comment
Agreed. At this point, I'd feel more comfortable just removing it altogether. We'll need to backport this to other release branches as well, right?
Stop relying on `getQuorumIndex` for assertions, as it has subtle behaviour around the addition and removal of replicas in the raft group.

This change was prompted by a faulty assertion using `QuorumIndex`, where we previously asserted that `NewFirstIndex <= QuorumIndex`. This was incorrect. To see why, consider a Raft group with first-last indexes `[(100-112), (116-124), (116-124)]`. When a new replica is added, it starts off with a match index of 0, i.e. `[(0,0), (100-112), (116-124), (116-124)]`. Naively, we'd expect the quorum index to be 116, but it's 100. The `FirstIndex` at the leaseholder will be 116, and thus we have `QuorumIndex < FirstIndex <= NewFirstIndex`. We don't want to truncate past the "quorum index", but new replicas muddle our computation of this "quorum index".

Release note (bug fix): Faulty assertion could cause panics when a log truncation took place concurrently with a replica being added to a raft group.
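To make the arithmetic in the description concrete, here is a minimal sketch of a quorum-index computation. It is not the `getQuorumIndex` implementation in the storage package: `quorumIndex` is a hypothetical helper, and feeding it the lower bound of each range is an assumption made only because it reproduces the 116 and 100 quoted above.

```go
package main

import (
	"fmt"
	"sort"
)

// quorumIndex returns the highest index that a strict majority of replicas
// have reached. Hypothetical helper for illustration only; it is not the
// getQuorumIndex function discussed in this PR.
func quorumIndex(indexes []uint64) uint64 {
	if len(indexes) == 0 {
		return 0
	}
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]uint64(nil), indexes...)
	// Sort descending so that sorted[k-1] is the k-th highest index.
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	quorum := len(sorted)/2 + 1
	return sorted[quorum-1]
}

func main() {
	// Assumption: one index per replica, taken as the lower bound of each
	// range in the description.
	before := []uint64{100, 116, 116}
	// The newly added replica starts at index 0.
	after := []uint64{0, 100, 116, 116}

	const leaseholderFirstIndex = 116

	fmt.Println("quorum index before add:", quorumIndex(before))
	fmt.Println("quorum index after add: ", quorumIndex(after))
	// The old assertion needed NewFirstIndex <= QuorumIndex, and
	// NewFirstIndex is at least the leaseholder's FirstIndex.
	fmt.Println("assertion can hold:", leaseholderFirstIndex <= quorumIndex(after))
}
```

Adding a fourth replica raises the quorum size from 2 to 3, so the selected index drops to the third-highest value. That is how the computed quorum index can fall below the leaseholder's `FirstIndex` without anything being wrong with the log.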
irfansharif left a comment
No, this assertion was added to catch a malformed snapshot error that was partly caused by raft log entries being sent as part of the snapshot. That's not the case for subsequent releases, so it was not backported. The assertions on