sql, opt: implements index skip scan #39668

ridwanmsharif · 2019-08-14T17:30:46Z

This change plans index skip scans in the optimizer.
It adds exploration rules that produce scans which use the
loose index table reader instead of the regular table reader.
This allows the optimizer to create a far better plan (an
order of magnitude better) for some DISTINCT queries by
leveraging the index the scan is on. It will also make it
possible to plan index skip scans more often when the
prefix of an index is known to have a small number of distinct
values.
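The core idea behind a skip (loose) index scan can be sketched in a few lines. This is a hedged, in-memory model and not CockroachDB's reader: the index is assumed to be a sorted list of tuples, a "seek" is a binary search, and the names `seek_past` and `index_skip_scan` are hypothetical. Each iteration emits the first entry for the current prefix and then jumps past every entry sharing that prefix, so the work grows with the number of distinct prefixes rather than the number of rows.

```python
def seek_past(index, lo, prefix):
    """Return the first position >= lo whose key prefix sorts after `prefix`.

    Binary search stands in for a KV seek; `index` must be sorted ascending
    and index[lo] must start with `prefix`.
    """
    n = len(prefix)
    hi = len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if index[mid][:n] <= prefix:
            lo = mid + 1
        else:
            hi = mid
    return lo


def index_skip_scan(index, prefix_len):
    """Return (rows, seeks): the first entry for each distinct prefix of
    length `prefix_len`, and how many seeks were needed to find them."""
    rows, seeks, pos = [], 0, 0
    while pos < len(index):
        row = index[pos]
        rows.append(row)  # the whole entry is returned, not only the prefix
        pos = seek_past(index, pos, row[:prefix_len])
        seeks += 1
    return rows, seeks


index = [(a, b) for a in range(3) for b in range(1000)]  # 3,000 entries
rows, seeks = index_skip_scan(index, prefix_len=1)
print(rows, seeks)  # [(0, 0), (1, 0), (2, 0)] 3
```

Three seeks replace a 3,000-entry scan here; the gain shrinks as the number of distinct prefixes approaches the number of entries.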

cockroach-teamcity · 2019-08-14T17:30:55Z

This change is Reviewable.

RaduBerinde

I think that from the optimizer side, the operator should be semantically equivalent to a DistinctOn -> Scan complex, where:

- the distinct columns are a prefix of the index
- the ordering inside Distinct refers only to distinct columns
- the constraints in the scan only cover the distinct columns

Some of these could be relaxed a bit in the future if necessary, but with non-trivial work.
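The three conditions listed above can be written down as a single predicate. The sketch below is hypothetical (`skip_scan_applicable` and its plain-list inputs are not optimizer APIs); it models columns as names and treats the distinct columns as a set, so they must match the first k index columns in any order.

```python
def skip_scan_applicable(index_cols, distinct_cols, ordering_cols, constrained_cols):
    """Return True if DistinctOn -> Scan could be replaced by an index skip
    scan under the conservative rules proposed in review."""
    k = len(distinct_cols)
    allowed = set(distinct_cols)
    # 1. The distinct columns are exactly the first k columns of the index.
    if k == 0 or k > len(index_cols) or set(index_cols[:k]) != allowed:
        return False
    # 2. The ordering inside the DistinctOn refers only to distinct columns.
    if not set(ordering_cols) <= allowed:
        return False
    # 3. The scan's constraints only cover distinct columns.
    return set(constrained_cols) <= allowed


print(skip_scan_applicable(["a", "b", "c"], ["a"], ["a"], ["a"]))  # True
print(skip_scan_applicable(["a", "b", "c"], ["a"], ["b"], []))     # False
```

Relaxing rule 2 is exactly the case discussed later in the thread, where rows from several nodes must be merged by the extra ordering columns before picking one per group.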

The distsql physical planner should be in charge of adding distinct processors as needed to guarantee these semantics in distributed cases.

Separately, we should fix the restriction on indexJoin input nodes; now that the local execution code is gone, it should be doable. I'm surprised we got this far without fixing that.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @justinj)

ridwanmsharif

Hmm, those are good points. Right now the IndexSkipScan is equivalent to a DistinctOn -> Scan (we add the DistinctOn when building the exec plan). Also yes, the DistinctOn currently has to be on a prefix of the index.

> the ordering inside Distinct refers only to distinct columns

> the constraints in the scan only cover the distinct columns

The PR doesn't actually restrict either of these to just the distinct columns. Why exactly do we need these restrictions?
I added tests that check IndexSkipScans work sanely even when the ordering is on columns that are part of the index but not part of the DistinctOn. Same thing with the constraints: why do you think restricting them to the distinct columns is necessary?

> The distsql physical planner should be in charge of adding distinct processors as needed to guarantee these semantics in distributed cases.

I agree with this. Until then, though, the DistinctOn added in the exec build phase is a workaround I'm okay with.

RaduBerinde

> The PR doesn't actually restrict either of these to just the distinct columns. Why exactly do we need these restrictions?
> I added tests that check IndexSkipScans work sanely even when the ordering is on columns that are part of the index but not part of the DistinctOn. Same thing with the constraints: why do you think restricting them to the distinct columns is necessary?

They're not strictly speaking necessary (I did mention that we could relax them); they're just hard to think about and get right, IMO. If we have an ordering on other columns, we have to set up the distinct processors with ordered input synchronizers to select the correct row. There may be cases where we're not even selecting those other columns, but we would still need to get their values over to the distinct processor.
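Why an ordered synchronizer matters can be shown with a small model. In this hedged sketch, rows are assumed to be `(a, b, payload)` tuples, the query is assumed to be `DISTINCT ON (a) ... ORDER BY a, b`, and each node's stream is assumed to be sorted by `(a, b)` without the streams being ordered relative to each other; `distinct_on_first` is a hypothetical helper. `heapq.merge` plays the role of the ordered input synchronizer.

```python
import heapq


def distinct_on_first(rows, key):
    """DistinctOn semantics: keep the first row seen for each distinct key."""
    seen, out = set(), []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            out.append(row)
    return out


node_1 = [(1, 5, "n1"), (2, 7, "n1")]
node_2 = [(1, 2, "n2")]

# Wrong: concatenating the streams keeps whichever node happened to be read first.
unordered = distinct_on_first(node_1 + node_2, key=lambda r: r[0])

# Right: merge on (a, b) before the DistinctOn. This is why b has to reach the
# distinct stage even when the query does not project it.
merged = heapq.merge(node_1, node_2, key=lambda r: (r[0], r[1]))
ordered = distinct_on_first(merged, key=lambda r: r[0])

print(unordered)  # [(1, 5, 'n1'), (2, 7, 'n1')]
print(ordered)    # [(1, 2, 'n2'), (2, 7, 'n1')]
```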

Constraints may not be a problem; it's more that they normally come from filters, which, in general, we can't support on those other columns anyway.

This change adds support for index skip scans when only a proper subset of the output columns is part of the columns the index skip scan runs over. Release note: None

This change introduces index skip scans. To start with, we implement index skip scans when there is a DistinctOn over a prefix of the scanned index. Following this change, we can add more uses: when the DistinctOn has an index join as its input, and when there is no DistinctOn but the prefix of the index has very few distinct values and there is a constraint on the columns after that prefix. (The latter would need some more thought, but I think it's possible if we do an index skip scan, take the cross product of its results with the other bounded column values, and use this cross product to do a lookup join with the original table.) Release note: None
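The proposed cross-product plan can be sketched as follows. This is a hypothetical model, not the planned operator: the index is assumed to be a sorted list of `(a, b, payload)` tuples with a numeric `b`, the query is assumed to be `WHERE b IN (...)` with no constraint on `a`, and the lookup join is modeled as binary-search point lookups into the same list. `skip_scan_with_constraint` is an illustrative name.

```python
import bisect
import math


def skip_scan_with_constraint(index, b_values):
    """Answer `WHERE b IN b_values` over an index sorted on (a, b, ...)."""
    # Step 1: skip scan to enumerate the (few) distinct values of a.
    # (a, math.inf) sorts after every (a, b, ...) entry because b is numeric.
    a_values, pos = [], 0
    while pos < len(index):
        a = index[pos][0]
        a_values.append(a)
        pos = bisect.bisect_right(index, (a, math.inf), pos)
    # Step 2: cross product of the distinct a values with the bounded b values.
    keys = [(a, b) for a in a_values for b in sorted(b_values)]
    # Step 3: "lookup join": seek to each (a, b) key and read its entries.
    out = []
    for key in keys:
        i = bisect.bisect_left(index, key)
        while i < len(index) and index[i][:2] == key:
            out.append(index[i])
            i += 1
    return out


index = sorted((a, b, f"{a}-{b}") for a in ("east", "west") for b in range(1000))
print(skip_scan_with_constraint(index, {3, 1}))
# [('east', 1, 'east-1'), ('east', 3, 'east-3'), ('west', 1, 'west-1'), ('west', 3, 'west-3')]
```

The plan pays off only when the leading column has few distinct values; each one costs a skip seek plus one lookup per bounded `b` value.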

This change adds an exploration rule that allows index skip scans to be used when we have an index join underneath a `DistinctOn`. Release note: None
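With an index join underneath the `DistinctOn`, the skip scan can run over the narrower secondary index, and only the surviving rows need to be fetched from the primary index. In this hedged sketch the secondary index is assumed to be a sorted list of `(a, pk)` pairs with a numeric `pk`, the primary index is modeled as a dict from `pk` to the full row, and `distinct_on_via_index_join` is a hypothetical name.

```python
import bisect
import math


def distinct_on_via_index_join(secondary, primary):
    """DistinctOn(a) over IndexJoin(skip scan of a secondary index on a)."""
    out, pos = [], 0
    while pos < len(secondary):
        a, pk = secondary[pos]
        out.append(primary[pk])  # index join: one primary lookup per distinct a
        # (a, math.inf) sorts after every (a, pk) entry, so this skips group a.
        pos = bisect.bisect_right(secondary, (a, math.inf), pos)
    return out


primary = {1: (1, "red", 10), 2: (2, "blue", 20), 3: (3, "red", 30), 4: (4, "blue", 40)}
secondary = sorted((row[1], pk) for pk, row in primary.items())
print(distinct_on_via_index_join(secondary, primary))  # [(2, 'blue', 20), (1, 'red', 10)]
```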

Previously we were planning an index skip scan as a regular scan with some extra properties. This change separates it out into its own operator. This is good because, even though many rules that apply to scans are applicable here too, some of them are difficult to reason about, and a separate operator lets us audit those rules in isolation before applying them, rather than all at once now. It also helps when creating new rules in the future. Another important note: we need to do some careful work with IndexSkipScan so the exploration rules work correctly when the data is distributed over multiple nodes. When each node does an `IndexSkipScan` over its own data, the rows returned by different nodes might conflict, so we need a `DistinctOn` to resolve them. We add this `DistinctOn` node in the exec builder phase so the guarantees made by the operator always hold. Finally, this change tries to ensure that IndexSkipScan behaves sanely with various orderings. Release note: None
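One way such a conflict arises is when a span boundary falls inside a group of entries that share a prefix: both nodes then report that prefix. The sketch below is a hedged model under these assumptions: the index is a sorted list of `(a, b)` tuples split into adjacent spans read in key order, `local_skip_scan` uses a linear pass for brevity instead of real seeks, and both function names are hypothetical.

```python
def local_skip_scan(span):
    """First entry per distinct leading value within one node's sorted span.
    A linear pass keeps the sketch short; a real skip scan would seek."""
    out = []
    for row in span:
        if not out or out[-1][0] != row[0]:
            out.append(row)
    return out


def distinct_on(rows):
    """Final DistinctOn on the leading column, keeping the first row seen."""
    seen, out = set(), []
    for row in rows:
        if row[0] not in seen:
            seen.add(row[0])
            out.append(row)
    return out


index = [(1, "a"), (1, "b"), (2, "a"), (2, "b"), (2, "c"), (3, "a")]
# Two nodes own adjacent key spans; the split falls inside the a = 2 group.
spans = [index[:3], index[3:]]
partial = [row for span in spans for row in local_skip_scan(span)]
print(partial)               # [(1, 'a'), (2, 'a'), (2, 'b'), (3, 'a')]
print(distinct_on(partial))  # [(1, 'a'), (2, 'a'), (3, 'a')]
```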

ridwanmsharif

Yeah, that's fair. We could tighten the requirements for this rule a little more than we do in this PR and then loosen them later. For now, I've added support for orderings on columns that aren't part of the DistinctOn columns, along with tests exercising behavior in those situations.

I'll defer to the team from here! Thanks for letting me work on such cool projects!

This processor is not used. It was implemented for cockroachdb#24584, but the PR to start planning it stalled since Ridwan left, a year ago (cockroachdb#39668). This guy is marginally in my way, and, worse, it's dead code. Unless someone promises to finish that PR imminently :). Release note: None

51178: sql/rowexec: delete indexSkipTableReader r=andreimatei a=andreimatei Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>

andreimatei · 2020-07-09T20:59:47Z

#51178 deleted the indexSkipTableReader processor because it was lying around unused, so this PR will need to revert that deletion when the time comes.

cockroach-teamcity · 2020-11-17T20:28:15Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

Ridwan Sharif seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

The `indexSkipTableReader` was deleted in cockroachdb#51178 but the distsql spec has stuck around for a few versions. I think we can delete the spec now, too. If someone wants to dust off cockroachdb#39668 they can bring it back. Release note: None

110945: sql: plumb locking durability into Get/Scan/ReverseScan requests r=arulajmani,nvanbenschoten,yuzefovich a=michae2 **execinfrapb: remove IndexSkipTableReaderSpec** **sql: plumb locking durability into Get/Scan/ReverseScan requests** Building on top of #110201, plumb locking durability from the optimizer to creation of Get/Scan/ReverseScan requests in txnKVFetcher and txnKVStreamer. Also add a version gate for usage of guaranteed-durable locking. Fixes: #100194 Release note: None Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>

ridwanmsharif force-pushed the index-skip-scan branch 22 times, most recently from 140aa1f to 68ce923 on August 18, 2019 22:36

ridwanmsharif requested a review from justinj August 19, 2019 01:35

ridwanmsharif marked this pull request as ready for review August 19, 2019 01:35

ridwanmsharif requested a review from a team as a code owner August 19, 2019 01:35

ridwanmsharif force-pushed the index-skip-scan branch 4 times, most recently from b378d23 to 8cdcbd5 on August 19, 2019 03:57

RaduBerinde reviewed Aug 23, 2019

ridwanmsharif commented Aug 23, 2019

ridwanmsharif force-pushed the index-skip-scan branch from ba76073 to eb6cf14 on August 23, 2019 16:25

RaduBerinde reviewed Aug 23, 2019

Ridwan Sharif added 4 commits August 23, 2019 15:40

sql: allow index-skip-scans to specify the prefix length

4bbf8f1

opt: plan index skip scans on secondary indexes through an index join

a83b106

ridwanmsharif force-pushed the index-skip-scan branch from eb6cf14 to 66e78b5 on August 23, 2019 19:40

ridwanmsharif commented Aug 23, 2019

jordanlewis mentioned this pull request Dec 5, 2019

performance: aggregate seems to not fully take advantage of index #42728

Closed

RaduBerinde mentioned this pull request Apr 18, 2020

opt,distsql: index-skip-distinct #37725

Open

This was referenced Jul 8, 2020

sql/rowexec: delete indexSkipTableReader #51178

Merged

sql: loose index scan #24584

Open

tbg added the X-noremind label (Bots won't notify about PRs with X-noremind) May 6, 2021

michae2 mentioned this pull request Sep 20, 2023

sql: plumb locking durability into Get/Scan/ReverseScan requests #110945

Merged
