
stats: use inconsistent scans #36500

Merged
merged 1 commit into cockroachdb:master on May 17, 2019

@RaduBerinde (Member) commented Apr 3, 2019

Automated statistics run with throttling (to reduce impact on running
workloads) and can take a long time. When this time exceeds the TTL,
scans start receiving errors.

This change makes CREATE STATISTICS ... AS OF SYSTEM TIME use a
"moving" timestamp that keeps up with the current time. When the
timestamp is more than 5 minutes in the past, we move it up by the
amount of time that passed since the scan (so that the new timestamp
is as much in the past as the original timestamp was). The comments in
TableReaderSpec describe this in more detail.

Fixes #35432.

Release note (bug fix): Automated table statistics no longer encounter
"batch timestamp must be after replica GC threshold" errors on
configurations with low TTL.

I am hesitant to add even more nuance to the fetcher code, but I could see no way around it. I plan to create a roachtest that reproduces #35432 and run it with and without this change.

@RaduBerinde RaduBerinde requested review from cockroachdb/distsql-prs as code owners Apr 3, 2019


@andreimatei (Member) left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @jordanlewis, @RaduBerinde, and @rytaft)


pkg/sql/distsqlrun/tablereader.go, line 93 at r1 (raw file):

	tr.limitHint = limitHint(spec.LimitHint, post)
	tr.maxResults = spec.MaxResults
	tr.inconsistentScanInterval = time.Duration(spec.InconsistentScanNanos) * time.Nanosecond

nit: time.Nanosecond is 1, so scratch it


pkg/sql/row/fetcher.go, line 496 at r1 (raw file):

		if now.Sub(txnTime) > 2*relativeAsOfTime {
			// Timestamp is too old; advance it.
			// We could in principle recalculate the timestamp on each batch but this

nit: I'd move this comment to the proto or the spec or wherever this all starts. I thought about it for a second when I was reading the commit message to come to the same conclusion.


pkg/sql/row/fetcher.go, line 501 at r1 (raw file):

		}
		ts := hlc.Timestamp{WallTime: txnTime.UnixNano()}
		err := db.Txn(ctx, func(ctx context.Context, txn *client.Txn) error {

for symmetry with the consistent case, I'd create a txn and call txn.Send, instead of calling db.Txn(). The difference is how retriable errors are handled, but that shouldn't matter here.
And actually, it might be worth it to reuse the same txn. I think you've asked me and I've told you to just create new transactions every time, but I didn't think we're talking about such a significant use case.

Or even better, it might be worth using non-transactional requests. Because there is a bunch of overhead to allocating transactions.
I think you can do something like

b := client.Batch{Timestamp: ...}
b.Scan(...)
return db.Run(b)

pkg/sql/row/kv_batch_fetcher.go, line 154 at r1 (raw file):

) (txnKVFetcher, error) {
	sendFn := func(ctx context.Context, ba roachpb.BatchRequest) (*roachpb.BatchResponse, error) {
		res, err := txn.Send(ctx, ba)

you'll be happy to know that I'm trying to make txn.Send return error, not pErr


pkg/sql/row/kv_batch_fetcher.go, line 161 at r1 (raw file):

	}
	return makeKVBatchFetcherWithSendFunc(
		sendFunc(sendFn), spans, reverse, useBatchLimit, firstBatchLimit, returnRangeInfo,

do you need the cast to sendFunc?

@andreimatei (Member) left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis, @RaduBerinde, and @rytaft)


pkg/sql/row/fetcher.go, line 501 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

for symmetry with the consistent case, I'd create a txn and call txn.Send, instead of calling db.Txn(). The difference is how retriable errors are handled, but that shouldn't matter here.
And actually, it might be worth it to reuse the same txn. I think you've asked me and I've told you to just create new transactions every time, but I didn't think we're talking about such a significant use case.

Or even better, it might be worth using non-transactional requests. Because there is a bunch of overhead to allocating transactions.
I think you can do something like

b := client.Batch{Timestamp: ...}
b.Scan(...)
return db.Run(b)

actually, let's get more opinions here, because cross-range non-transactional requests are something we've considered restricting in the past. Although I think that was more about the difficulty of ensuring atomicity of such batches, which doesn't matter for reads. (The implementation will wrap the batch in a transaction transparently when it crosses ranges, and that's pretty gross.)
@bdarnell what do you think?

@rytaft (Contributor) approved these changes Apr 3, 2019 and left a comment

:lgtm:

Reviewed 16 of 16 files at r1.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @jordanlewis and @RaduBerinde)


pkg/sql/distsql_plan_stats.go, line 89 at r1 (raw file):

			// It is checked elsewhere that the AS OF time is in the past; this guard
			// is just in case.
			deltaNanos = 1

You check later that this should be at least 1 microsecond. Should this be 1000?

@bdarnell (Member) left a comment

Why is this a user-visible option in the grammar? How would a user decide when to use INCONSISTENT or not? If auto stats use INCONSISTENT mode, shouldn't the explicit statement default to inconsistent too (so the new keyword would be CONSISTENT)? The consistent case will be extremely rare; is it worth even having the option?

Maybe we could make this automatic based on TTL: pick a timestamp and use it until we either finish a cycle or that timestamp's age is halfway to the TTL. That would be fully consistent in most practical configurations while falling back to inconsistent mode for short TTLs.

Reviewed 16 of 16 files at r1.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @andreimatei, @jordanlewis, and @RaduBerinde)


pkg/sql/row/fetcher.go, line 501 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

actually, let's get more opinions here. Cause cross-range non-transactional requests were something that we've considered restricting in the past. Although I think that was more about the difficulties in ensuring atomicity of such batches, but that doesn't matter for reads. (the implementation will wrap the batch into a transaction transparently when it's crossing ranges, and that's pretty gross).
@bdarnell what do you think?

I think I'd rather go through a transaction here for consistency with the regular use of AOST. (but one transaction and reuse it until we have to reset the timestamp instead of one per batch, if we're concerned about the allocation cost).

@RaduBerinde (Member, Author) commented Apr 3, 2019

It is in the grammar because the automatic stats code issues the command as a SQL statement. I wasn't planning on documenting the option (or the throttling option) for users.

When a user issues CREATE STATISTICS manually, they probably expect it to use a consistent snapshot of the table; at least this was the case until now, and I don't see a problem with providing that. Also, INCONSISTENT requires AS OF SYSTEM TIME, so it would feel strange if, by default, the statement could not be issued without AOST. Using an implicit AOST by default would be problematic in cases where users call it right after a schema change or similar.

Finally, there may be testing scenarios where we want consistent scans so I'd be hesitant to remove that possibility altogether.

@RaduBerinde (Member, Author) commented Apr 3, 2019

Maybe we could make this automatic based on TTL: pick a timestamp and use it until we either finish a cycle or that timestamp's age is halfway to the TTL. That would be fully consistent in most practical configurations while falling back to inconsistent mode for short TTLs.

I can look into something like this. The TTL could be changed during the operation, though, so there would be some difficulty there (I'm not sure how to plumb that to the DistSQL processor).

@bdarnell (Member) commented Apr 3, 2019

When a user issues a CREATE STATISTICS manually they probably expect it to use a consistent snapshot of the table.

I doubt that users would have expectations that are that detailed. (and if so, I think we should probably tamp down those expectations - we should give ourselves permission to make the stats as approximate as we want as long as it's useful to the optimizer. If you want consistent information, write your own query that computes the stats you want)

Using an implicit AOST by default would be problematic, in cases where users call it right after a schema change or similar.

We could solve that the same way we do for the auto stats job itself: sleep for enough time that our default AOST delay is in the past.

@jordanlewis (Member) left a comment

:lgtm: regarding the changes to the fetcher

Reviewable status: :shipit: complete! 3 of 0 LGTMs obtained (waiting on @andreimatei, @jordanlewis, and @RaduBerinde)

@RaduBerinde RaduBerinde force-pushed the RaduBerinde:inconsistent-scan-2 branch 3 times, most recently from 4b4acdb to e2513a9 May 14, 2019

@RaduBerinde (Member, Author) left a comment

I let this sit until the release and only now got back to working on it. Sorry for letting so much time pass.

I updated the change: I removed the special syntax; we now use the new mechanism whenever AS OF SYSTEM TIME is used. I also updated the mechanism: we now use a timestamp until it becomes 5 minutes old, at which point we move it up.

I plan to work on a roachtest with a large table in a low-TTL scenario, but it will be a separate PR.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @andreimatei, @bdarnell, and @rytaft)


pkg/sql/distsql_plan_stats.go, line 89 at r1 (raw file):

Previously, rytaft wrote…

You check later that this should be at least 1 microsecond. Should this be 1000?

Changed this code altogether.


pkg/sql/row/fetcher.go, line 496 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: I'd move this comment to the proto or the spec or wherever this all starts. I thought about it for a second when I was reading the commit message to come to the same conclusion.

The TableReaderSpec now describes exactly how the ts gets advanced.


pkg/sql/row/fetcher.go, line 501 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I think I'd rather go through a transaction here for consistency with the regular use of AOST. (but one transaction and reuse it until we have to reset the timestamp instead of one per batch, if we're concerned about the allocation cost).

Done.


pkg/sql/row/kv_batch_fetcher.go, line 161 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

do you need the cast to sendFunc?

Removed.

@rytaft (Contributor) approved these changes May 15, 2019 and left a comment

:lgtm:

Reviewed 19 of 19 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andreimatei and @RaduBerinde)


pkg/sql/distsqlrun/tablereader.go, line 54 at r2 (raw file):

	maxResults uint64

	// See TableReaderSpec.InconsistentScanNanos.

Update comment

@RaduBerinde RaduBerinde force-pushed the RaduBerinde:inconsistent-scan-2 branch from e2513a9 to 35a3029 May 15, 2019

@RaduBerinde (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @andreimatei and @rytaft)


pkg/sql/distsqlrun/tablereader.go, line 54 at r2 (raw file):

Previously, rytaft wrote…

Update comment

Done.

@andreimatei (Member) left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andreimatei and @rytaft)

@RaduBerinde RaduBerinde force-pushed the RaduBerinde:inconsistent-scan-2 branch from 35a3029 to eda1b92 May 16, 2019

@RaduBerinde RaduBerinde requested a review from cockroachdb/sql-ccl-prs as a code owner May 16, 2019

@RaduBerinde (Member, Author) commented May 16, 2019

Added a test. Thanks to @andreimatei and @tbg for helping me write it!

@rytaft (Contributor) approved these changes May 16, 2019 and left a comment

Nice test! :lgtm:

Reviewed 9 of 9 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andreimatei)

stats: use inconsistent scans

@RaduBerinde RaduBerinde force-pushed the RaduBerinde:inconsistent-scan-2 branch from eda1b92 to 5cc9a1a May 17, 2019

@RaduBerinde (Member, Author) commented May 17, 2019

Disabled the test under race (the initial data insertion can take a very long time in race mode).

@RaduBerinde (Member, Author) commented May 17, 2019

bors r+

craig bot pushed a commit that referenced this pull request May 17, 2019

Merge #36500
36500: stats: use inconsistent scans r=RaduBerinde a=RaduBerinde


Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
@craig (bot) commented May 17, 2019

Build succeeded

@craig craig bot merged commit 5cc9a1a into cockroachdb:master May 17, 2019

3 checks passed:
- GitHub CI (Cockroach): TeamCity build finished
- bors: Build succeeded
- license/cla: Contributor License Agreement is signed

@RaduBerinde RaduBerinde deleted the RaduBerinde:inconsistent-scan-2 branch Jun 11, 2019
