backport-2.1: kv: try next replica on RangeNotFoundError #31250

This was hiding the output if the invocation itself failed, which is exactly when it is needed most.

Release note: None

Release note: None

@nvanbenschoten

Previously, if a Batch RPC came back with a RangeNotFoundError, we would immediately stop trying to send to more replicas, evict the range descriptor, and start a new attempt after a back-off. This new attempt could end up using the same replica, because DistSender doesn't aggressively shuffle the replicas. So if the RangeNotFoundError persisted for some amount of time, so would the unsuccessful retries of requests to that replica.

It turns out that such situations exist, and the election-after-restart roachtest spuriously hit one of them:

1. A new replica receives a preemptive snapshot and the ConfChange.
2. The cluster restarts.
3. The new replica is now in this state (returning RangeNotFoundError) until the range wakes up, which may not happen for some time.
4. The first request to the range runs into the problem described above.

@nvanbenschoten: I think there is an issue to be filed about the tendency of DistSender to get stuck in unfortunate configurations.

Fixes cockroachdb#30613.

Release note (bug fix): Avoid repeatedly trying a replica that was found to be in the process of being added.

Whenever a successful response is received from an RPC that we know has to contact the leaseholder to succeed, update the leaseholder cache.

The immediate motivation for this is to be able to land the preceding commits, which greatly exacerbated (as in, added a much faster failure mode to) the following test run:

```
make stress PKG=./pkg/sql/logictest TESTS=TestPlannerLogic/5node-dist/distsql_interleaved_join
```

However, the change is one we've wanted to make for a while; our caching, and in particular the eviction of leaseholders, has been deficient essentially ever since it was first introduced.

Touches cockroachdb#31068.

Release note: None
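The cache-update rule described above can be sketched as follows. This is a simplified illustration under stated assumptions, not CockroachDB's implementation: `RangeID`, `StoreID`, `LeaseholderCache`, and `onResponse` are hypothetical names, and whether a request "requires the lease" is passed in as a plain boolean rather than derived from the request type.

```go
package main

import "sync"

// RangeID and StoreID are hypothetical stand-ins for CockroachDB's identifiers.
type RangeID int64
type StoreID int32

// LeaseholderCache is a hypothetical, simplified map from a range to the
// store believed to hold its lease; it is not CockroachDB's actual type.
type LeaseholderCache struct {
	mu sync.Mutex
	m  map[RangeID]StoreID
}

// NewLeaseholderCache returns an empty cache.
func NewLeaseholderCache() *LeaseholderCache {
	return &LeaseholderCache{m: make(map[RangeID]StoreID)}
}

// Lookup returns the cached leaseholder for rid, if any.
func (c *LeaseholderCache) Lookup(rid RangeID) (StoreID, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	sid, ok := c.m[rid]
	return sid, ok
}

// Update records sid as the leaseholder for rid.
func (c *LeaseholderCache) Update(rid RangeID, sid StoreID) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[rid] = sid
}

// onResponse is called after every RPC. requiresLease is true when the
// request could only have succeeded if served by the leaseholder, so a
// successful reply shows that sentTo held the lease when it served the
// request, and the cache is updated accordingly. Failed requests and
// requests that any replica could serve leave the cache unchanged.
func onResponse(c *LeaseholderCache, rid RangeID, sentTo StoreID, requiresLease bool, err error) {
	if err != nil || !requiresLease {
		return
	}
	c.Update(rid, sentTo)
}
```

Restricting updates to lease-requiring requests matters in this sketch: a successful read served by a follower, for instance, says nothing about who holds the lease, so caching it would poison the cache.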

Commits on Oct 12, 2018