Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stability: could not GC completed transaction #8978

Closed
petermattis opened this issue Aug 31, 2016 · 9 comments
Closed

stability: could not GC completed transaction #8978

petermattis opened this issue Aug 31, 2016 · 9 comments
Milestone

Comments

@petermattis
Copy link
Collaborator

Run zerosum -n 4 -m 3 -c none (prerequisite, make build). After 30s, killzerosum`. The log files will have a mess of the following log message:

~/Development/go/src/github.com/cockroachdb/cockroach pmattis/clear-placeholder grep 'could not GC' cockroach-data/*/logs/stderr | head -1
cockroach-data/1/logs/stderr:W160831 10:25:22.188431 storage/intent_resolver.go:344  could not GC completed transaction: end key /Local/Range/"\x93"/"rdsc\x00" must be greater than start /Local/Range/"\x93"/RangeDescriptor

~/Development/go/src/github.com/cockroachdb/cockroach pmattis/clear-placeholder grep 'could not GC' cockroach-data/*/logs/stderr | wc -l
      88

Immediately before this error, we are performing a split:

cockroach-data/3/logs/stderr-I160831 10:26:16.887206 storage/replica_command.go:2213  store=2:2 range=82 [/Table/51/19128-/Table/51/19407): initiating a split of this range at key /Table/51/19350
cockroach-data/3/logs/stderr:W160831 10:26:16.897832 storage/intent_resolver.go:344  could not GC completed transaction: end key /Local/Range/"\xbb\xf7J\xb8"/"rdsc\x00" must be greater than start /Local/Range/"\xbb\xf7J\xb8"/RangeDescriptor

I'm not sure what the error means. Perhaps its innocuous, but it is certainly spammy. @tschottdorf?

Cc @cockroachdb/stability

@tbg
Copy link
Member

tbg commented Aug 31, 2016

These are probably external intent resolutions which reliably end up on the wrong range after splits/rebalances. I thought we fixed that (#7652 (comment)), but perhaps not fully so.

@tbg
Copy link
Member

tbg commented Aug 31, 2016

Ah. I think I know what's going on and it's not good.

We construct a GCRequest and give it a key range [key, key.Next). However, if these are range-local keys, weird things happen:

Take the two keys (the second being the .Next() of the first):

/Local/Range/"\x93"/RangeDescriptor < /Local/Range/"\x93"/"rdsc\x00"

Resolving their addresses (keys.MustAddr), both resolve to /Table/11, leaving an empty key-range.

The more general problem here is that while .Next() gives a lexicographically higher key, it doesn't know anything about addressing, and ideally we would want that to strictly increase as well.

We've talked about making GCRequest a single-key request, which would solve this issue. But it suggests that there could be subtleties elsewhere, preferably near invocations of .Next().

@petermattis
Copy link
Collaborator Author

Can you expand on what badness this specific problem could cause? Are we just failing to clean up the transaction record for a split?

@bdarnell
Copy link
Member

bdarnell commented Sep 1, 2016

We're definitely failing to clean up transaction records for splits and ChangeReplicas (at least for a while; I think the slow GC queue will eventually get them). That may be all that's going wrong, but this has revealed that Next() and Addr() don't work well together, so we should look at other places where we call Next() and see if any of them end up getting passed to Addr(). I did a quick pass and didn't see any obvious causes for concern.

@tbg
Copy link
Member

tbg commented Sep 1, 2016

I'll fix the immediate problem.

tbg added a commit to tbg/cockroach that referenced this issue Sep 1, 2016
The previous way of computing the key range for the GCRequest was incorrectly
making the assumption that `(roachpb.Key).Next` respected ordering on the
corresponding addressing-resolved keys.

This lead to intents on RangeDescriptors not being resolved during range splits
and perhaps rebalances, as observed in cockroachdb#8978.
@tbg
Copy link
Member

tbg commented Sep 1, 2016

#9036 fixes the immediate problem (and removes the annoying error messages), but I'm still suspicious about this in general.

@tbg tbg removed their assignment Sep 1, 2016
@petermattis petermattis added this to the Later milestone Feb 22, 2017
@petermattis
Copy link
Collaborator Author

This is old and might no longer be an issue. We should verify and close.

@petermattis petermattis modified the milestones: 1.1, Later Jun 1, 2017
@tbg
Copy link
Member

tbg commented Jun 1, 2017

cc #7880, which is referred in a comment from #9036. We really ought to be addressing #7880, I think, to restore sanity.

@bdarnell
Copy link
Member

bdarnell commented Jun 1, 2017

I don't think there's any work remaining here that's not captured in #7880, so I'm closing it.

@bdarnell bdarnell closed this as completed Jun 1, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants