Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql, opt: support hint to disallow full scan, update coster to avoid full scans #71317

Merged
merged 3 commits into from
Oct 11, 2021

Conversation

rytaft
Copy link
Collaborator

@rytaft rytaft commented Oct 8, 2021

sql: partial index scans are exempt from disallow_full_table_scans

Release note (sql change): Fixed an oversight in which a full scan of a
partial index could be rejected due to the disallow_full_table_scans
setting. Full scans of partial indexes will no longer be rejected if
disallow_full_table_scans is true, since a full scan of a partial index
must be a constrained scan of the table.

opt: update coster to avoid full scans with disallow_full_table_scans

Updated the coster so that full table scans and full scans of non-partial
indexes are given a "huge cost" if disallow_full_table_scans is true and
stats are unavailable or the estimated row count is greater than
large_full_scan_rows. Since disallow_full_table_scans and
large_full_scan_rows can now affect the chosen plan, the optimizer now tracks
these settings in the memo to ensure cached plans are not stale.

Fixes #70795

Release note (sql change): The optimizer has been updated so that if
disallow_full_table_scans is true, it will never plan a full table scan with
an estimated row count greater than large_full_scan_rows. If no alternative
plan is possible, an error will be returned, just as it was before. However,
cases where an alternative plan is possible will no longer produce an error,
since the alternative plan will be chosen. As a result, users should see
fewer errors due to disallow_full_table_scans.

A side effect of this change is that if disallow_full_table_scans is set along
with statement-level hints such as an index hint, the optimizer will try to
avoid a full scan while also respecting the index hint. If this is not
possible, the optimizer will return an error and might not log the attempted
full table scan or update the sql.guardrails.full_scan_rejected.count metric.
If no index hint is used, the full scan will be logged and the metric updated.

opt,sql: support hint to disallow full scan

Release note (sql change): Added support for a new index hint,
NO_FULL_SCAN, which will prevent the optimizer from planning a
full scan for the specified table. The hint can be used in the
same way as other existing index hints. For example,
SELECT * FROM table_name@{NO_FULL_SCAN};. Note that a full scan
of a partial index may still be planned, unless NO_FULL_SCAN is
forced in combination with a specific partial index via
FORCE_INDEX=index_name.

@rytaft rytaft requested review from RaduBerinde, michae2 and a team October 8, 2021 15:14
@rytaft rytaft requested a review from a team as a code owner October 8, 2021 15:14
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@rytaft
Copy link
Collaborator Author

rytaft commented Oct 8, 2021

I'm going to add a few more tests, so hold off on reviewing for now.

Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I love to see this.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2 and @RaduBerinde)

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed just the first commit for now.

Reviewed 8 of 8 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner, @michae2, @RaduBerinde, and @rytaft)


-- commits, line 5 at r1:
I brought this up when disallow_full_table_scans was added and AFAIK this was intentional. @arulajmani Do you remember why we made the decision to have disallow_full_table_scans prevent full scans on partial indexes?


docs/generated/sql/bnf/stmt_block.bnf, line 2804 at r1 (raw file):

	| 'NO_INDEX_JOIN'
	| 'NO_ZIGZAG_JOIN'
	| 'NO_FULL_SCAN'

nit: seems like this change should not be in the first commit?


pkg/sql/opt/exec/execbuilder/relational.go, line 673 at r1 (raw file):

	// Save if we planned a full table/index scan on the builder so that the
	// planner can be made aware later. We only do this for non-virtual tables.
	if !tab.IsVirtualTable() && isUnfiltered {

nit: assignment to a variable seems unnecessary

@rytaft rytaft force-pushed the no-full-scan branch 2 times, most recently from f6811dd to fed1e6b Compare October 8, 2021 19:20
Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR! The other commits are RFAL now too.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @mgartner, @michae2, and @RaduBerinde)


-- commits, line 5 at r1:

Previously, mgartner (Marcus Gartner) wrote…

I brought this up when disallow_full_table_scans was added and AFAIK this was intentional. @arulajmani Do you remember why we made the decision to have disallow_full_table_scans prevent full scans on partial indexes?

Maybe it was intentional, but it seems very suboptimal to me -- why should a full scan on INDEX (b) WHERE b > 0 fail while a constrained scan of INDEX (b) succeeds?


docs/generated/sql/bnf/stmt_block.bnf, line 2804 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: seems like this change should not be in the first commit?

Oops good catch. Fixed


pkg/sql/opt/exec/execbuilder/relational.go, line 673 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: assignment to a variable seems unnecessary

The variable is used again in a later commit

Copy link
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! :lgtm_strong: I think many people will be really happy with this!

Reviewed 5 of 8 files at r1, 15 of 15 files at r6, 8 of 8 files at r7, 12 of 12 files at r8, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani, @mgartner, and @RaduBerinde)


pkg/sql/opt/exec/execbuilder/relational.go, line 665 at r8 (raw file):

Quoted 4 lines of code…
		// Normally a full scan of a partial index would be allowed with the
		// NO_FULL_SCAN hint (isUnfiltered is false for partial indexes), but if the
		// user has explicitly forced the partial index *and* used NO_FULL_SCAN, we
		// disallow the full index scan.

Just clarifying: is this because forcing the partial index without a matching predicate could lead to a full scan of the partial index plus a constrained scan of another index to include the rows not in the partial index? And that wouldn't be detected as a full scan?

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this will be useful! :lgtm:

Could you add some tests with UPDATE and DELETE? (we can have a filter like k IN (...) where the list is significantly larger than the table size - this kind of query would otherwise do a full table scan. There are some existing tests with hints in execbuilder/testdata/update and delete

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @arulajmani and @mgartner)

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 15 of 15 files at r6, 8 of 8 files at r7, 12 of 12 files at r8, all commit messages.
Reviewable status: :shipit: complete! 3 of 0 LGTMs obtained (waiting on @arulajmani)


-- commits, line 5 at r1:
I agree with you on this, just wanted to bring it to your attention. I searched for anything we wrote down about this decision but didn't find anything. My guess is that we just used the same logic from the `sql.log.slow_query.experimental_full_table_scans.enabled] setting added here.

Tthe docs say this about disallow_full_table_scans:

If set to on, all queries that have planned a full table or full secondary index scan will return an error message. This setting does not apply to internal queries, which may plan full table or index scans without checking the session variable.

Maybe we should clarify full non-partial secondary index scan?


pkg/sql/opt/exec/execbuilder/relational.go, line 673 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

The variable is used again in a later commit

Oops, sorry.

Release note (sql change): Fixed an oversight in which a full scan of a
partial index could be rejected due to the disallow_full_table_scans
setting. Full scans of partial indexes will no longer be rejected if
disallow_full_table_scans is true, since a full scan of a partial index
must be a constrained scan of the table.
Updated the coster so that full table scans and full scans of non-partial
indexes are given a "huge cost" if disallow_full_table_scans is true and
stats are unavailable or the estimated row count is greater than
large_full_scan_rows. Since disallow_full_table_scans and
large_full_scan_rows can now affect the chosen plan, the optimizer now tracks
these settings in the memo to ensure cached plans are not stale.

Release note (sql change): The optimizer has been updated so that if
disallow_full_table_scans is true, it will never plan a full table scan with
an estimated row count greater than large_full_scan_rows. If no alternative
plan is possible, an error will be returned, just as it was before. However,
cases where an alternative plan is possible will no longer produce an error,
since the alternative plan will be chosen. As a result, users should see
fewer errors due to disallow_full_table_scans.

A side effect of this change is that if disallow_full_table_scans is set along
with statement-level hints such as an index hint, the optimizer will try to
avoid a full scan while also respecting the index hint. If this is not
possible, the optimizer will return an error and might not log the attempted
full table scan or update the sql.guardrails.full_scan_rejected.count metric.
If no index hint is used, the full scan will be logged and the metric updated.
Release note (sql change): Added support for a new index hint,
NO_FULL_SCAN, which will prevent the optimizer from planning a
full scan for the specified table. The hint can be used in the
same way as other existing index hints. For example,
SELECT * FROM table_name@{NO_FULL_SCAN};. Note that a full scan
of a partial index may still be planned, unless NO_FULL_SCAN is
forced in combination with a specific partial index via
FORCE_INDEX=index_name.
Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

Could you add some tests with UPDATE and DELETE?

Done.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @arulajmani, @mgartner, @michae2, and @stbof)


-- commits, line 5 at r1:

Previously, mgartner (Marcus Gartner) wrote…

I agree with you on this, just wanted to bring it to your attention. I searched for anything we wrote down about this decision but didn't find anything. My guess is that we just used the same logic from the `sql.log.slow_query.experimental_full_table_scans.enabled] setting added here.

Tthe docs say this about disallow_full_table_scans:

If set to on, all queries that have planned a full table or full secondary index scan will return an error message. This setting does not apply to internal queries, which may plan full table or index scans without checking the session variable.

Maybe we should clarify full non-partial secondary index scan?

Good call. I assume this will get picked up from the release note, but cc @stbof for awareness that the docs on disallow_full_table_scans will need an update.


pkg/sql/opt/exec/execbuilder/relational.go, line 665 at r8 (raw file):

Previously, michae2 (Michael Erickson) wrote…
		// Normally a full scan of a partial index would be allowed with the
		// NO_FULL_SCAN hint (isUnfiltered is false for partial indexes), but if the
		// user has explicitly forced the partial index *and* used NO_FULL_SCAN, we
		// disallow the full index scan.

Just clarifying: is this because forcing the partial index without a matching predicate could lead to a full scan of the partial index plus a constrained scan of another index to include the rows not in the partial index? And that wouldn't be detected as a full scan?

Forcing the partial index without a matching predicate would fail with an error "index "index_name" is a partial index that does not contain all the rows needed to execute this query".

I just added this exception because I've seen examples from customer queries where CRDB was performing a full scan of the partial index when it really could have constrained it further. For example, if we have a partial index index (a) where a > 0 and a predicate a = 1, we should be able to use the partial index and constrain it to a: [/1 - /1]. This exception will allow us to hint more precisely in cases like this.

It also just seems intuitive that if you're using NO_FULL_SCAN with a specific index, you really want to avoid a full scan of that index.

Does this help clarify?

Copy link
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @arulajmani, @mgartner, @michae2, and @stbof)


pkg/sql/opt/exec/execbuilder/relational.go, line 665 at r8 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Forcing the partial index without a matching predicate would fail with an error "index "index_name" is a partial index that does not contain all the rows needed to execute this query".

I just added this exception because I've seen examples from customer queries where CRDB was performing a full scan of the partial index when it really could have constrained it further. For example, if we have a partial index index (a) where a > 0 and a predicate a = 1, we should be able to use the partial index and constrain it to a: [/1 - /1]. This exception will allow us to hint more precisely in cases like this.

It also just seems intuitive that if you're using NO_FULL_SCAN with a specific index, you really want to avoid a full scan of that index.

Does this help clarify?

Yes, thank you!

Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @arulajmani, @mgartner, @michae2, and @stbof)

@craig
Copy link
Contributor

craig bot commented Oct 11, 2021

Build succeeded:

@craig craig bot merged commit 10bdacc into cockroachdb:master Oct 11, 2021
@blathers-crl
Copy link

blathers-crl bot commented Oct 11, 2021

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from ed0af7e to blathers/backport-release-21.2-71317: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 21.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@davepacheco
Copy link

Thanks for doing this! I'm excited to check it out.

Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale)


-- commits, line 5 at r1:

@arulajmani Do you remember why we made the decision to have disallow_full_table_scans prevent full scans on partial indexes?

Just catching up on my mentions. I can't remember exactly how we ended up on this behaviour, but I think we were concerned about an unbounded partial index scan. This was probably the wrong way of looking at things now that I see the example Becca illustrated above so happy for things to change here!

@RaduBerinde
Copy link
Member

Adding docs-todo label - we need to document NO_FULL_SCAN in https://www.cockroachlabs.com/docs/v21.2/table-expressions#force-index-selection

@ghost ghost removed the docs-todo label Jan 26, 2022
@ghost
Copy link

ghost commented Jan 26, 2022

There is a docs ticket for this: cockroachdb/docs#12279. Actually looking into this more. This PR has several release notes and we had a period of time when some doc issues were not created.

Actually cockroachdb/docs#11936 is the doc ticket for NO_FULL_SCAN. We need some dev-infra work so that the generated doc issues reflect the commit message and not only the PR that generated the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

what exactly does disallow_full_table_scans do? / is it working correctly?
8 participants