Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: queues swallow errors if requeueing #25191

Closed
benesch opened this issue Apr 30, 2018 · 1 comment · Fixed by #25245
Closed

storage: queues swallow errors if requeueing #25191

benesch opened this issue Apr 30, 2018 · 1 comment · Fixed by #25245
Assignees
Labels
A-kv-distribution Relating to rebalancing and leasing. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-productivity Severe issues that impede the productivity of CockroachDB developers.

Comments

@benesch
Copy link
Contributor

benesch commented Apr 30, 2018

If queue.process returns an error, that error will be swallowed if the item is slated to be requeued:

if item.requeue {

In the case of the split queue, every command processed on a too-large replica re-adds to the split queue, thus marking the replica for requeueing:

needsSplitBySize := r.needsSplitBySizeRLocked()

So, if a range that fails to split receives enough traffic, the error message indicating why it can't be split is never printed to the logs.

h/t to @nvanbenschoten for discovering this.

@benesch benesch added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-productivity Severe issues that impede the productivity of CockroachDB developers. A-kv-distribution Relating to rebalancing and leasing. labels Apr 30, 2018
@a-robinson
Copy link
Contributor

cc #25073

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue May 2, 2018
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue May 3, 2018
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue May 8, 2018
craig bot pushed a commit that referenced this issue May 8, 2018
25245: storage: add unsplittable ranges to split queue purgatory r=nvanbenschoten a=nvanbenschoten

Fixes #25191.

This makes the fix for #24215 trivial.

In #14654 we added a mechanism to double the max range size whenever a
split attempt found a range that was unsplittable. This prevented a
tight loop of split attempts. However, it didn't actually do anything to
help us find a split point to reduce the size of the range in the
future. This size doubling worked in practice, but it was a blunt
instrument that had strange effects (see #24215).

This change rips out this range size doubling and replaces it with a
split queue purgatory. This purgatory is used to house replicas that
are unsplittable, preventing them from getting into a tight loop.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig craig bot closed this as completed in #25245 May 8, 2018
andreimatei pushed a commit to andreimatei/cockroach that referenced this issue Aug 31, 2018
craig bot pushed a commit that referenced this issue Sep 2, 2018
29440: backport 2.0: storage: improve logging of queue errors r=andreimatei a=andreimatei

Backport of one commit from #25245, because seeing these errors in a 2.0 cluster would have really helped me.

Fixes #25191.

Release note: None


Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-distribution Relating to rebalancing and leasing. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-productivity Severe issues that impede the productivity of CockroachDB developers.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants