-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage,sql: add a fast mode to CheckConsistency and expose via SQL #35861
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bonus points if you also add a function for checking the accuracy of the MVCCStats
.
Reviewed 3 of 13 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)
pkg/roachpb/api.proto, line 435 at r1 (raw file):
// log a diff of inconsistencies if such inconsistencies are found. bool with_diff = 2; bool quick = 3;
pls comment.
And also can we give this a more descriptive name like "consult only stats"? Elsewhere too.
pkg/roachpb/api.proto, line 439 at r1 (raw file):
// A CheckConsistencyResponse is the return value from the CheckConsistency() method. // If a replica finds itself to be inconsistent with its lease holder it will panic.
is this comment still the whole story?
pkg/roachpb/api.proto, line 460 at r1 (raw file):
} repeated Result result = 2 [(gogoproto.nullable) = false];
say something about how there'll be one entry for each range within the requested span? Will they be sorted?
pkg/roachpb/api.proto, line 1052 at r1 (raw file):
// used in logging a diff during checksum verification. bool snapshot = 4; bool quick = 5;
pls comment.
pkg/sql/logictest/testdata/logic_test/builtin_function, line 2081 at r1 (raw file):
# Ensure that privileged builtins work from DistSQL. statement ok SELECT crdb_internal.set_vmodule('foo=2') FROM foo
what happened to this test? You've left part of the comment.
pkg/sql/logictest/testdata/logic_test/builtin_function, line 2082 at r1 (raw file):
# Sanity-check crdb_internal.check_consistency works. query ITT SELECT (t).* FROM unnest(crdb_internal.check_consistency(true, '\x02'::bytea, '\xff'::bytea)) as t WHERE (t).range_id = 1
can you comment about the start and end keys and why the expected result is indeterminate
?
pkg/sql/sem/builtins/builtins.go, line 2802 at r1 (raw file):
Quick: quick, }) if err := ctx.Txn.DB().Run(ctx.Ctx(), &b); err != nil {
I'm assuming these request don't need to be transactional. If you have access to the db
, I'd use that instead of ctx.Txn.
pkg/sql/sem/builtins/builtins.go, line 2821 at r1 (raw file):
return arr, nil }, // Example usage:
put this in the Info
.
And pls comment the arguments and ret vals
pkg/storage/replica_consistency.go, line 75 at r1 (raw file):
// other replicas. When an inconsistency is detected and no diff was requested, // the consistency check will be re-run to collect a diff, which is then printed // before calling `log.Fatal`.
comment is stale
pkg/storage/replica_consistency.go, line 77 at r1 (raw file):
// before calling `log.Fatal`. func (r *Replica) CheckConsistency( ctx context.Context, args roachpb.CheckConsistencyRequest, fatalOnMismatch bool,
pls make that bool an enum
pkg/storage/replica_consistency.go, line 94 at r1 (raw file):
var inconsistencyCount int var missingCount int
what's this var? It doesn't seem to be used for anything.
pkg/storage/replica_consistency.go, line 124 at r1 (raw file):
res.Status = roachpb.CheckConsistencyResponse_RANGE_INCONSISTENT } else if !args.Quick { res.Status = roachpb.CheckConsistencyResponse_RANGE_CONSISTENT
I think we should return RANGE_CONSISTENT
even if args.Quick
. It's weird to have a flavor of the RPC for which the possible responses are inconsistent
and indeterminate
.
I think we should make INDETERMINATE
mean strictly that we failed contacting some replicas. Which btw, I don't really understand how it's handled... Did you intend to do something with missingCount
?
pkg/storage/replica_consistency.go, line 190 at r1 (raw file):
// No diff was printed, so we want to re-run with diff. // Note that this will call Fatal recursively in `CheckConsistency` (in the code above).
update comment
pkg/storage/storagepb/proposer_kv.proto, line 81 at r1 (raw file):
// should be saved so that a diff of divergent replicas can later be computed. bool save_snapshot = 2; bool quick = 3;
pls comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @tbg)
pkg/storage/replica_consistency.go, line 130 at r1 (raw file):
if !fatalOnMismatch { return resp, nil
how come we early return here and don't do the stats recomputation? What's the thinking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The \x02
sprinkled everywhere is some LocalMax
. It'd be cool if that was documented. And also if a null
input would translate to that, and a null
input for the end key would represent the end of the key space.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @tbg)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed the comments, RFAL!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)
pkg/roachpb/api.proto, line 435 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
pls comment.
And also can we give this a more descriptive name like "consult only stats"? Elsewhere too.
Done.
pkg/roachpb/api.proto, line 439 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
is this comment still the whole story?
Done.
pkg/roachpb/api.proto, line 460 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
say something about how there'll be one entry for each range within the requested span? Will they be sorted?
Done.
pkg/roachpb/api.proto, line 1052 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
pls comment.
Done.
pkg/sql/logictest/testdata/logic_test/builtin_function, line 2081 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
what happened to this test? You've left part of the comment.
Done. I didn't like that test, it called vmodule(foo=2) and didn't reset it. What the hell? Also unclear what it was testing.
pkg/sql/logictest/testdata/logic_test/builtin_function, line 2082 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
can you comment about the start and end keys and why the expected result is
indeterminate
?
Obsolete.
pkg/sql/sem/builtins/builtins.go, line 2802 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I'm assuming these request don't need to be transactional. If you have access to the
db
, I'd use that instead of ctx.Txn.
I'm using the DB. I don't know of a better way to get my fingers on it, if you know one please let me know.
pkg/sql/sem/builtins/builtins.go, line 2821 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
put this in the
Info
.And pls comment the arguments and ret vals
Done.
pkg/storage/replica_consistency.go, line 75 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
comment is stale
Done.
pkg/storage/replica_consistency.go, line 77 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
pls make that bool an enum
Done.
pkg/storage/replica_consistency.go, line 94 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
what's this var? It doesn't seem to be used for anything.
Done.
pkg/storage/replica_consistency.go, line 124 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I think we should return
RANGE_CONSISTENT
even ifargs.Quick
. It's weird to have a flavor of the RPC for which the possible responses areinconsistent
andindeterminate
.
I think we should makeINDETERMINATE
mean strictly that we failed contacting some replicas. Which btw, I don't really understand how it's handled... Did you intend to do something withmissingCount
?
Done.
pkg/storage/replica_consistency.go, line 130 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
how come we early return here and don't do the stats recomputation? What's the thinking?
The thinking is that this isn't something this method ought to be triggering in the first place.
pkg/storage/replica_consistency.go, line 190 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
update comment
Update how?
pkg/storage/storagepb/proposer_kv.proto, line 81 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
pls comment
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except it looks to me that a diff is not generated if a range is found to be inconsistent, right? Could we generate it? :)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)
docs/generated/sql/functions.md, line 939 at r2 (raw file):
stats_only should only be used...
stats_only is the cheap one, right? So why do you call this out to be expensive? Is that sentence missing a negation?
pkg/roachpb/api.proto, line 439 at r2 (raw file):
// // TODO(tbg): these semantics are an artifact of how consistency checks were // first implemented. The extra behavior here would move to the consistency
s/would/should
pkg/roachpb/api.proto, line 495 at r2 (raw file):
int64 range_id = 1 [(gogoproto.customname) = "RangeID", (gogoproto.casttype) = "RangeID"]; Status status = 2; string detail = 3;
does the detail include the diff, if there's a diff? If so, please say that in a comment.
pkg/sql/sem/builtins/builtins.go, line 2802 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I'm using the DB. I don't know of a better way to get my fingers on it, if you know one please let me know.
ugh yeah I guess the DB is not part of the EvalContext
. Whatevs.
pkg/sql/sem/builtins/builtins.go, line 2797 at r2 (raw file):
} kFrom := roachpb.Key(*args[1].(*tree.DBytes))
so do NULLs
work here, or would it be easy to make them work?
pkg/sql/sem/builtins/builtins.go, line 2845 at r2 (raw file):
}, Info: "Runs a consistency check on ranges touching the specified key range. " + "the start and end keys default to the minimum and maximum possible. " +
I'm not sure what "default to" means; maybe say if empty, ...
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the change so that when you do the full computation, a diff is generated. Generating a diff when we were only comparing the stats amounts to re-running another internal consistency check and is exactly the kind of work I want to move to higher layers in the future.
Hmm, I should return the start_key
in the result so that we can programmatically run deep checks on the individual ranges when the shallow check fails without bending over backwards. Done.
I'll merge this tomorrow morning, but feel free to bors it for me if TC is green
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)
docs/generated/sql/functions.md, line 939 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
stats_only should only be used...
stats_only is the cheap one, right? So why do you call this out to be expensive? Is that sentence missing a negation?
Good catch, done.
pkg/roachpb/api.proto, line 439 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
s/would/should
Done.
pkg/roachpb/api.proto, line 495 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
does the detail include the diff, if there's a diff? If so, please say that in a comment.
It does if WithDiff
was specified. Updated.
pkg/sql/sem/builtins/builtins.go, line 2802 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
ugh yeah I guess the DB is not part of the
EvalContext
. Whatevs.
Done.
pkg/sql/sem/builtins/builtins.go, line 2797 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
so do
NULLs
work here, or would it be easy to make them work?
I don't want to get into that.
pkg/sql/sem/builtins/builtins.go, line 2845 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I'm not sure what "default to" means; maybe say
if empty, ...
.
Done.
It has always been difficult to diagnose consistency check failures for to three reasons: 1. they're rare (and often go undetected) 2. they require the data directories to meaningfully investigate 3. failures often occur long after the likely triggers This PR works to address some of all of these. Note that none of this is for end user consumption at this point. We add a "quick" mode to CheckConsistency. In that mode, the hash of a replica is simply the hash of its RangeAppliedState. We know only of one incident in which a detected replica inconsistency did not include this key. This is because the consistently replicated MVCCStats are stored under this key, and because consistency failures are almost always caused by to an "atomic" failure to apply a batch. That is, when a write gets lost, it's very likely that on the replica losing the write the corresponding stats update will also be missing. The only known exception is one recent incident involving only time series data, where there is vague hope that the problem might instead have been caused by some problem specific to RocksDB merges. Having the quick mode available makes it feasible to run the consistency checks throughout our various roachtests, and to detect more inconsistencies close to when they happen, both of which will naturally improve our ability to get to the bottom of them. Consistency checks are usually run on a queue at low frequency, making it hard to predict when a given range will be checked. To make this more convenient, we add a sql builtin `crdb_internal.check_consistency` which does not trigger fatal node errors when it detects an inconsistency, and takes three parameters: 1. whether to use the stats only (bool), enabling the quick mode 2. start key (byte) (defaults to min possible) 3. end key (byte) (defaults to max possible) These three inputs roughly translate to a CheckConsistencyRequest{Key: key, EndKey: end_key, Mode: <mode>}. and the output is returned as a array of tuples, which can be consumed conveniently via the desired variant of ``` SELECT (t).* FROM unnest(crdb_internal.check_consistency(true, '\x02', '\x04')) as t WHERE (t).status != 'RANGE_CONSISTENT'; range_id | status | detail +----------+--------+--------+ (0 rows) ```
bors r=andreimatei |
35861: storage,sql: add a fast mode to CheckConsistency and expose via SQL r=andreimatei a=tbg It has always been difficult to diagnose consistency check failures for to three reasons: 1. they're rare (and often go undetected) 2. they require the data directories to meaningfully investigate 3. failures often occur long after the likely triggers This PR works to address some of all of these. Note that none of this is for end user consumption at this point. We add a "quick" mode to CheckConsistency. In that mode, the hash of a replica is simply the hash of its RangeAppliedState. We know only of one incident in which a detected replica inconsistency did not include this key. This is because the consistently replicated MVCCStats are stored under this key, and because consistency failures are almost always caused by to an "atomic" failure to apply a batch. That is, when a write gets lost, it's very likely that on the replica losing the write the corresponding stats update will also be missing. The only known exception is one recent incident involving only time series data, where there is vague hope that the problem might instead have been caused by some problem specific to RocksDB merges. Having the quick mode available makes it feasible to run the consistency checks throughout our various roachtests, and to detect more inconsistencies close to when they happen, both of which will naturally improve our ability to get to the bottom of them. Consistency checks are usually run on a queue at low frequency, making it hard to predict when a given range will be checked. To make this more convenient, we add a sql builtin `crdb_internal.check_consistency` which does not trigger fatal node errors when it detects an inconsistency, and takes three parameters: 1. whether to use quick mode (bool) 2. start key (byte) 3. end key (byte) These three inputs roughly translate to a CheckConsistencyRequest{Key: key, EndKey: end_key, Quick: quick}. and the output is returned as a array of tuples, which can be consumed conveniently via the desired variant of ``` SELECT (t).* FROM unnest( crdb_internal.check_consistency( true, e'\\x02'::BYTES, e'\\xff'::BYTES ) ) AS t; range_id | status | detail +----------+------------------+--------+ 1 | RANGE_CONSISTENT | 2 | RANGE_CONSISTENT | 3 | RANGE_CONSISTENT | 4 | RANGE_CONSISTENT | 5 | RANGE_CONSISTENT | 6 | RANGE_CONSISTENT | 7 | RANGE_CONSISTENT | 8 | RANGE_CONSISTENT | 9 | RANGE_CONSISTENT | 10 | RANGE_CONSISTENT | 11 | RANGE_CONSISTENT | 12 | RANGE_CONSISTENT | 13 | RANGE_CONSISTENT | 14 | RANGE_CONSISTENT | 15 | RANGE_CONSISTENT | 16 | RANGE_CONSISTENT | 17 | RANGE_CONSISTENT | 18 | RANGE_CONSISTENT | 19 | RANGE_CONSISTENT | 20 | RANGE_CONSISTENT | ``` The plan is to sprinkle this through some candidate roachtests in the hope that something will fall out. Release note: None Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Build succeeded |
In cockroachdb#35861, I made changes to the consistency checksum computation that were not backwards-compatible. When a 19.1 node asks a 2.1 node for a fast SHA, the 2.1 node would run a full computation and return a corresponding SHA which wouldn't match with the leaseholder's. Bump ReplicaChecksumVersion to make sure that we don't attempt to compare SHAs across these two releases. Fixes cockroachdb#37425. Release note (bug fix): Fixed a potential source of (faux) replica inconsistencies that can be reported while running a mixed v19.1 / v2.1 cluster. This error (in that situation only) is benign and can be resolved by upgrading to the latest v19.1 patch release. Every time this error occurs a "checkpoint" is created which will occupy a large amount of disk space and which needs to be removed manually (see <store directory>/auxiliary/checkpoints).
In cockroachdb#35861, I made changes to the consistency checksum computation that were not backwards-compatible. When a 19.1 node asks a 2.1 node for a fast SHA, the 2.1 node would run a full computation and return a corresponding SHA which wouldn't match with the leaseholder's. Bump ReplicaChecksumVersion to make sure that we don't attempt to compare SHAs across these two releases. Fixes cockroachdb#37425. Release note (bug fix): Fixed a potential source of (faux) replica inconsistencies that can be reported while running a mixed v19.1 / v2.1 cluster. This error (in that situation only) is benign and can be resolved by upgrading to the latest v19.1 patch release. Every time this error occurs a "checkpoint" is created which will occupy a large amount of disk space and which needs to be removed manually (see <store directory>/auxiliary/checkpoints).
It has always been difficult to diagnose consistency check failures for
to three reasons:
This PR works to address some of all of these. Note that none of this is
for end user consumption at this point.
We add a "quick" mode to CheckConsistency. In that mode, the hash of a
replica is simply the hash of its RangeAppliedState. We know only of one
incident in which a detected replica inconsistency did not include this
key. This is because the consistently replicated MVCCStats are stored
under this key, and because consistency failures are almost always
caused by to an "atomic" failure to apply a batch.
That is, when a write gets lost, it's very likely that on the replica
losing the write the corresponding stats update will also be missing.
The only known exception is one recent incident involving only time
series data, where there is vague hope that the problem might instead
have been caused by some problem specific to RocksDB merges.
Having the quick mode available makes it feasible to run the consistency
checks throughout our various roachtests, and to detect more
inconsistencies close to when they happen, both of which will naturally
improve our ability to get to the bottom of them.
Consistency checks are usually run on a queue at low frequency, making
it hard to predict when a given range will be checked. To make this
more convenient, we add a sql builtin
crdb_internal.check_consistency
which does not trigger fatal node errors when it detects an
inconsistency, and takes three parameters:
These three inputs roughly translate to a
and the output is returned as a array of tuples, which can be consumed
conveniently via the desired variant of
The plan is to sprinkle this through some candidate roachtests in the
hope that something will fall out.
Release note: None