Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage,sql: add a fast mode to CheckConsistency and expose via SQL #35861

Merged
merged 1 commit into from
Mar 27, 2019

Conversation

tbg
Copy link
Member

@tbg tbg commented Mar 18, 2019

It has always been difficult to diagnose consistency check failures for
to three reasons:

  1. they're rare (and often go undetected)
  2. they require the data directories to meaningfully investigate
  3. failures often occur long after the likely triggers

This PR works to address some of all of these. Note that none of this is
for end user consumption at this point.

We add a "quick" mode to CheckConsistency. In that mode, the hash of a
replica is simply the hash of its RangeAppliedState. We know only of one
incident in which a detected replica inconsistency did not include this
key. This is because the consistently replicated MVCCStats are stored
under this key, and because consistency failures are almost always
caused by to an "atomic" failure to apply a batch.
That is, when a write gets lost, it's very likely that on the replica
losing the write the corresponding stats update will also be missing.

The only known exception is one recent incident involving only time
series data, where there is vague hope that the problem might instead
have been caused by some problem specific to RocksDB merges.

Having the quick mode available makes it feasible to run the consistency
checks throughout our various roachtests, and to detect more
inconsistencies close to when they happen, both of which will naturally
improve our ability to get to the bottom of them.

Consistency checks are usually run on a queue at low frequency, making
it hard to predict when a given range will be checked. To make this
more convenient, we add a sql builtin crdb_internal.check_consistency
which does not trigger fatal node errors when it detects an
inconsistency, and takes three parameters:

  1. whether to use quick mode (bool)
  2. start key (byte)
  3. end key (byte)

These three inputs roughly translate to a

CheckConsistencyRequest{Key: key, EndKey: end_key, Quick: quick}.

and the output is returned as a array of tuples, which can be consumed
conveniently via the desired variant of

SELECT
	(t).*
FROM
	unnest(
		crdb_internal.check_consistency(
			true,
			e'\\x02'::BYTES,
			e'\\xff'::BYTES
		)
	)
		AS t;

  range_id |      status      | detail
+----------+------------------+--------+
         1 | RANGE_CONSISTENT |
         2 | RANGE_CONSISTENT |
         3 | RANGE_CONSISTENT |
         4 | RANGE_CONSISTENT |
         5 | RANGE_CONSISTENT |
         6 | RANGE_CONSISTENT |
         7 | RANGE_CONSISTENT |
         8 | RANGE_CONSISTENT |
         9 | RANGE_CONSISTENT |
        10 | RANGE_CONSISTENT |
        11 | RANGE_CONSISTENT |
        12 | RANGE_CONSISTENT |
        13 | RANGE_CONSISTENT |
        14 | RANGE_CONSISTENT |
        15 | RANGE_CONSISTENT |
        16 | RANGE_CONSISTENT |
        17 | RANGE_CONSISTENT |
        18 | RANGE_CONSISTENT |
        19 | RANGE_CONSISTENT |
        20 | RANGE_CONSISTENT |

The plan is to sprinkle this through some candidate roachtests in the
hope that something will fall out.

Release note: None

@tbg tbg requested review from a team March 18, 2019 14:58
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bonus points if you also add a function for checking the accuracy of the MVCCStats.

Reviewed 3 of 13 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)


pkg/roachpb/api.proto, line 435 at r1 (raw file):

  // log a diff of inconsistencies if such inconsistencies are found.
  bool with_diff = 2;
  bool quick = 3;

pls comment.
And also can we give this a more descriptive name like "consult only stats"? Elsewhere too.


pkg/roachpb/api.proto, line 439 at r1 (raw file):

// A CheckConsistencyResponse is the return value from the CheckConsistency() method.
// If a replica finds itself to be inconsistent with its lease holder it will panic.

is this comment still the whole story?


pkg/roachpb/api.proto, line 460 at r1 (raw file):

  }

  repeated Result result = 2 [(gogoproto.nullable) = false];

say something about how there'll be one entry for each range within the requested span? Will they be sorted?


pkg/roachpb/api.proto, line 1052 at r1 (raw file):

  // used in logging a diff during checksum verification.
  bool snapshot = 4;
  bool quick = 5;

pls comment.


pkg/sql/logictest/testdata/logic_test/builtin_function, line 2081 at r1 (raw file):

# Ensure that privileged builtins work from DistSQL.
statement ok
SELECT crdb_internal.set_vmodule('foo=2') FROM foo

what happened to this test? You've left part of the comment.


pkg/sql/logictest/testdata/logic_test/builtin_function, line 2082 at r1 (raw file):

# Sanity-check crdb_internal.check_consistency works.
query ITT
SELECT (t).* FROM unnest(crdb_internal.check_consistency(true, '\x02'::bytea, '\xff'::bytea)) as t WHERE (t).range_id = 1

can you comment about the start and end keys and why the expected result is indeterminate?


pkg/sql/sem/builtins/builtins.go, line 2802 at r1 (raw file):

					Quick: quick,
				})
				if err := ctx.Txn.DB().Run(ctx.Ctx(), &b); err != nil {

I'm assuming these request don't need to be transactional. If you have access to the db, I'd use that instead of ctx.Txn.


pkg/sql/sem/builtins/builtins.go, line 2821 at r1 (raw file):

				return arr, nil
			},
			// Example usage:

put this in the Info.

And pls comment the arguments and ret vals


pkg/storage/replica_consistency.go, line 75 at r1 (raw file):

// other replicas. When an inconsistency is detected and no diff was requested,
// the consistency check will be re-run to collect a diff, which is then printed
// before calling `log.Fatal`.

comment is stale


pkg/storage/replica_consistency.go, line 77 at r1 (raw file):

// before calling `log.Fatal`.
func (r *Replica) CheckConsistency(
	ctx context.Context, args roachpb.CheckConsistencyRequest, fatalOnMismatch bool,

pls make that bool an enum


pkg/storage/replica_consistency.go, line 94 at r1 (raw file):

	var inconsistencyCount int
	var missingCount int

what's this var? It doesn't seem to be used for anything.


pkg/storage/replica_consistency.go, line 124 at r1 (raw file):

		res.Status = roachpb.CheckConsistencyResponse_RANGE_INCONSISTENT
	} else if !args.Quick {
		res.Status = roachpb.CheckConsistencyResponse_RANGE_CONSISTENT

I think we should return RANGE_CONSISTENT even if args.Quick. It's weird to have a flavor of the RPC for which the possible responses are inconsistent and indeterminate.
I think we should make INDETERMINATE mean strictly that we failed contacting some replicas. Which btw, I don't really understand how it's handled... Did you intend to do something with missingCount?


pkg/storage/replica_consistency.go, line 190 at r1 (raw file):

	// No diff was printed, so we want to re-run with diff.
	// Note that this will call Fatal recursively in `CheckConsistency` (in the code above).

update comment


pkg/storage/storagepb/proposer_kv.proto, line 81 at r1 (raw file):

  // should be saved so that a diff of divergent replicas can later be computed.
  bool save_snapshot = 2;
  bool quick = 3;

pls comment

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)


pkg/storage/replica_consistency.go, line 130 at r1 (raw file):

	if !fatalOnMismatch {
		return resp, nil

how come we early return here and don't do the stats recomputation? What's the thinking?

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The \x02 sprinkled everywhere is some LocalMax. It'd be cool if that was documented. And also if a null input would translate to that, and a null input for the end key would represent the end of the key space.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)

@tbg tbg changed the title storage,sql: and a fast mode to CheckConsistency and expose via SQL storage,sql: add a fast mode to CheckConsistency and expose via SQL Mar 26, 2019
Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed the comments, RFAL!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)


pkg/roachpb/api.proto, line 435 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

pls comment.
And also can we give this a more descriptive name like "consult only stats"? Elsewhere too.

Done.


pkg/roachpb/api.proto, line 439 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

is this comment still the whole story?

Done.


pkg/roachpb/api.proto, line 460 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

say something about how there'll be one entry for each range within the requested span? Will they be sorted?

Done.


pkg/roachpb/api.proto, line 1052 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

pls comment.

Done.


pkg/sql/logictest/testdata/logic_test/builtin_function, line 2081 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

what happened to this test? You've left part of the comment.

Done. I didn't like that test, it called vmodule(foo=2) and didn't reset it. What the hell? Also unclear what it was testing.


pkg/sql/logictest/testdata/logic_test/builtin_function, line 2082 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

can you comment about the start and end keys and why the expected result is indeterminate?

Obsolete.


pkg/sql/sem/builtins/builtins.go, line 2802 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'm assuming these request don't need to be transactional. If you have access to the db, I'd use that instead of ctx.Txn.

I'm using the DB. I don't know of a better way to get my fingers on it, if you know one please let me know.


pkg/sql/sem/builtins/builtins.go, line 2821 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

put this in the Info.

And pls comment the arguments and ret vals

Done.


pkg/storage/replica_consistency.go, line 75 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

comment is stale

Done.


pkg/storage/replica_consistency.go, line 77 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

pls make that bool an enum

Done.


pkg/storage/replica_consistency.go, line 94 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

what's this var? It doesn't seem to be used for anything.

Done.


pkg/storage/replica_consistency.go, line 124 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think we should return RANGE_CONSISTENT even if args.Quick. It's weird to have a flavor of the RPC for which the possible responses are inconsistent and indeterminate.
I think we should make INDETERMINATE mean strictly that we failed contacting some replicas. Which btw, I don't really understand how it's handled... Did you intend to do something with missingCount?

Done.


pkg/storage/replica_consistency.go, line 130 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

how come we early return here and don't do the stats recomputation? What's the thinking?

The thinking is that this isn't something this method ought to be triggering in the first place.


pkg/storage/replica_consistency.go, line 190 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

update comment

Update how?


pkg/storage/storagepb/proposer_kv.proto, line 81 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

pls comment

Done.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except it looks to me that a diff is not generated if a range is found to be inconsistent, right? Could we generate it? :)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)


docs/generated/sql/functions.md, line 939 at r2 (raw file):

stats_only should only be used...

stats_only is the cheap one, right? So why do you call this out to be expensive? Is that sentence missing a negation?


pkg/roachpb/api.proto, line 439 at r2 (raw file):

    //
    // TODO(tbg): these semantics are an artifact of how consistency checks were
    // first implemented. The extra behavior here would move to the consistency

s/would/should


pkg/roachpb/api.proto, line 495 at r2 (raw file):

    int64 range_id = 1 [(gogoproto.customname) = "RangeID", (gogoproto.casttype) = "RangeID"];
    Status status = 2;
    string detail = 3;

does the detail include the diff, if there's a diff? If so, please say that in a comment.


pkg/sql/sem/builtins/builtins.go, line 2802 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I'm using the DB. I don't know of a better way to get my fingers on it, if you know one please let me know.

ugh yeah I guess the DB is not part of the EvalContext. Whatevs.


pkg/sql/sem/builtins/builtins.go, line 2797 at r2 (raw file):

				}

				kFrom := roachpb.Key(*args[1].(*tree.DBytes))

so do NULLs work here, or would it be easy to make them work?


pkg/sql/sem/builtins/builtins.go, line 2845 at r2 (raw file):

			},
			Info: "Runs a consistency check on ranges touching the specified key range. " +
				"the start and end keys default to the minimum and maximum possible. " +

I'm not sure what "default to" means; maybe say if empty, ....

Copy link
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change so that when you do the full computation, a diff is generated. Generating a diff when we were only comparing the stats amounts to re-running another internal consistency check and is exactly the kind of work I want to move to higher layers in the future.

Hmm, I should return the start_key in the result so that we can programmatically run deep checks on the individual ranges when the shallow check fails without bending over backwards. Done.

I'll merge this tomorrow morning, but feel free to bors it for me if TC is green

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)


docs/generated/sql/functions.md, line 939 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…
stats_only should only be used...

stats_only is the cheap one, right? So why do you call this out to be expensive? Is that sentence missing a negation?

Good catch, done.


pkg/roachpb/api.proto, line 439 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/would/should

Done.


pkg/roachpb/api.proto, line 495 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

does the detail include the diff, if there's a diff? If so, please say that in a comment.

It does if WithDiff was specified. Updated.


pkg/sql/sem/builtins/builtins.go, line 2802 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

ugh yeah I guess the DB is not part of the EvalContext. Whatevs.

Done.


pkg/sql/sem/builtins/builtins.go, line 2797 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

so do NULLs work here, or would it be easy to make them work?

I don't want to get into that.


pkg/sql/sem/builtins/builtins.go, line 2845 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'm not sure what "default to" means; maybe say if empty, ....

Done.

It has always been difficult to diagnose consistency check failures for
to three reasons:

1. they're rare (and often go undetected)
2. they require the data directories to meaningfully investigate
3. failures often occur long after the likely triggers

This PR works to address some of all of these. Note that none of this is
for end user consumption at this point.

We add a "quick" mode to CheckConsistency. In that mode, the hash of a
replica is simply the hash of its RangeAppliedState. We know only of one
incident in which a detected replica inconsistency did not include this
key. This is because the consistently replicated MVCCStats are stored
under this key, and because consistency failures are almost always
caused by to an "atomic" failure to apply a batch.
That is, when a write gets lost, it's very likely that on the replica
losing the write the corresponding stats update will also be missing.

The only known exception is one recent incident involving only time
series data, where there is vague hope that the problem might instead
have been caused by some problem specific to RocksDB merges.

Having the quick mode available makes it feasible to run the consistency
checks throughout our various roachtests, and to detect more
inconsistencies close to when they happen, both of which will naturally
improve our ability to get to the bottom of them.

Consistency checks are usually run on a queue at low frequency, making
it hard to predict when a given range will be checked. To make this more
convenient, we add a sql builtin `crdb_internal.check_consistency` which
does not trigger fatal node errors when it detects an inconsistency, and
takes three parameters:

1. whether to use the stats only (bool), enabling the quick mode
2. start key (byte) (defaults to min possible)
3. end key (byte)   (defaults to max possible)

These three inputs roughly translate to a

    CheckConsistencyRequest{Key: key, EndKey: end_key, Mode: <mode>}.

and the output is returned as a array of tuples, which can be consumed
conveniently via the desired variant of

```
SELECT (t).* FROM unnest(crdb_internal.check_consistency(true, '\x02', '\x04')) as t
WHERE (t).status != 'RANGE_CONSISTENT';

  range_id | status | detail
+----------+--------+--------+
(0 rows)
```
@tbg
Copy link
Member Author

tbg commented Mar 27, 2019

bors r=andreimatei

craig bot pushed a commit that referenced this pull request Mar 27, 2019
35861: storage,sql: add a fast mode to CheckConsistency and expose via SQL r=andreimatei a=tbg

It has always been difficult to diagnose consistency check failures for
to three reasons:

1. they're rare (and often go undetected)
2. they require the data directories to meaningfully investigate
3. failures often occur long after the likely triggers

This PR works to address some of all of these. Note that none of this is
for end user consumption at this point.

We add a "quick" mode to CheckConsistency. In that mode, the hash of a
replica is simply the hash of its RangeAppliedState. We know only of one
incident in which a detected replica inconsistency did not include this
key. This is because the consistently replicated MVCCStats are stored
under this key, and because consistency failures are almost always
caused by to an "atomic" failure to apply a batch.
That is, when a write gets lost, it's very likely that on the replica
losing the write the corresponding stats update will also be missing.

The only known exception is one recent incident involving only time
series data, where there is vague hope that the problem might instead
have been caused by some problem specific to RocksDB merges.

Having the quick mode available makes it feasible to run the consistency
checks throughout our various roachtests, and to detect more
inconsistencies close to when they happen, both of which will naturally
improve our ability to get to the bottom of them.

Consistency checks are usually run on a queue at low frequency, making
it hard to predict when a given range will be checked. To make this
more convenient, we add a sql builtin `crdb_internal.check_consistency`
which does not trigger fatal node errors when it detects an
inconsistency, and takes three parameters:

1. whether to use quick mode (bool)
2. start key (byte)
3. end key (byte)

These three inputs roughly translate to a

    CheckConsistencyRequest{Key: key, EndKey: end_key, Quick: quick}.

and the output is returned as a array of tuples, which can be consumed
conveniently via the desired variant of

```
SELECT
	(t).*
FROM
	unnest(
		crdb_internal.check_consistency(
			true,
			e'\\x02'::BYTES,
			e'\\xff'::BYTES
		)
	)
		AS t;

  range_id |      status      | detail
+----------+------------------+--------+
         1 | RANGE_CONSISTENT |
         2 | RANGE_CONSISTENT |
         3 | RANGE_CONSISTENT |
         4 | RANGE_CONSISTENT |
         5 | RANGE_CONSISTENT |
         6 | RANGE_CONSISTENT |
         7 | RANGE_CONSISTENT |
         8 | RANGE_CONSISTENT |
         9 | RANGE_CONSISTENT |
        10 | RANGE_CONSISTENT |
        11 | RANGE_CONSISTENT |
        12 | RANGE_CONSISTENT |
        13 | RANGE_CONSISTENT |
        14 | RANGE_CONSISTENT |
        15 | RANGE_CONSISTENT |
        16 | RANGE_CONSISTENT |
        17 | RANGE_CONSISTENT |
        18 | RANGE_CONSISTENT |
        19 | RANGE_CONSISTENT |
        20 | RANGE_CONSISTENT |
```

The plan is to sprinkle this through some candidate roachtests in the
hope that something will fall out.

Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
@craig
Copy link
Contributor

craig bot commented Mar 27, 2019

Build succeeded

@craig craig bot merged commit fcfd9b8 into cockroachdb:master Mar 27, 2019
tbg added a commit to tbg/cockroach that referenced this pull request May 21, 2019
In cockroachdb#35861, I made changes to the consistency checksum computation that
were not backwards-compatible. When a 19.1 node asks a 2.1 node for a
fast SHA, the 2.1 node would run a full computation and return a
corresponding SHA which wouldn't match with the leaseholder's.

Bump ReplicaChecksumVersion to make sure that we don't attempt to
compare SHAs across these two releases.

Fixes cockroachdb#37425.

Release note (bug fix): Fixed a potential source of (faux) replica
inconsistencies that can be reported while running a mixed v19.1 / v2.1
cluster. This error (in that situation only) is benign and can be
resolved by upgrading to the latest v19.1 patch release. Every time this
error occurs a "checkpoint" is created which will occupy a large amount
of disk space and which needs to be removed manually (see <store
directory>/auxiliary/checkpoints).
tbg added a commit to tbg/cockroach that referenced this pull request May 22, 2019
In cockroachdb#35861, I made changes to the consistency checksum computation that
were not backwards-compatible. When a 19.1 node asks a 2.1 node for a
fast SHA, the 2.1 node would run a full computation and return a
corresponding SHA which wouldn't match with the leaseholder's.

Bump ReplicaChecksumVersion to make sure that we don't attempt to
compare SHAs across these two releases.

Fixes cockroachdb#37425.

Release note (bug fix): Fixed a potential source of (faux) replica
inconsistencies that can be reported while running a mixed v19.1 / v2.1
cluster. This error (in that situation only) is benign and can be
resolved by upgrading to the latest v19.1 patch release. Every time this
error occurs a "checkpoint" is created which will occupy a large amount
of disk space and which needs to be removed manually (see <store
directory>/auxiliary/checkpoints).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants