Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql, opt: implements index skip scan #39668

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

ridwanmsharif
Copy link

This change plans index-skip-scan in the optimizer.
It adds exploration rules to a scan but having them use the
loose index table reader instead of the regular table reader.
This will allow the optimizer to create a far better plan (an
order of magnitude better) for some Distinct queries by
leveraging the index the scan is on. This will also make it
possible to plan index-skip-scans more often when the
prefix of an index is known to have a small number of distinct
values.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ridwanmsharif ridwanmsharif force-pushed the index-skip-scan branch 22 times, most recently from 140aa1f to 68ce923 Compare August 18, 2019 22:36
@ridwanmsharif ridwanmsharif marked this pull request as ready for review August 19, 2019 01:35
@ridwanmsharif ridwanmsharif requested a review from a team as a code owner August 19, 2019 01:35
@ridwanmsharif ridwanmsharif force-pushed the index-skip-scan branch 4 times, most recently from b378d23 to 8cdcbd5 Compare August 19, 2019 03:57
Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that from the optimizer side, the operator should be semantically equivalent to a DistinctOn -> Scan complex, where:

  • the distinct columns are a prefix of the index
  • the ordering inside Distinct refers only to distinct columns
  • the constraints in the scan only cover the distinct columns

Some of these could be relaxed a bit in the future if necessary, but with non-trivial work.

The distsql physical planner should be in charge of adding distinct processors as needed to guarantee these semantics in distributed cases.

Separately, we should fix the restriction of indexJoin input nodes, now that the local execution code is gone it should be doable. I'm surprised we got this far without fixing that.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj)

Copy link
Author

@ridwanmsharif ridwanmsharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm those are good points. Right now the IndexSkipScan is equivalent to a DistinctOn - > Scan (we add the DistinctOn when building the exec plan). Also yes, the distinct on has to be on the prefix of the index currently.

  • the ordering inside Distinct refers only to distinct columns
  • the constraints in the scan only cover the distinct columns

The PR doesn't restrict either of these to just the distinct columns actually. Why do we need these restrictions exactly?
There are tests I added that tests IndexSkipScans work sanely even when the ordering is on columns that are part of the index but not part of the distinct on. Same thing with the constraints, why do you think the restriction on the distinct cols is necessary?

The distsql physical planner should be in charge of adding distinct processors as needed to guarantee these semantics in distributed cases.

I agree with this. Until this is true though, the DistinctOn added in the exec build phase is a workaround I'm okay with.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj)

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR doesn't restrict either of these to just the distinct columns actually. Why do we need these restrictions exactly?
There are tests I added that tests IndexSkipScans work sanely even when the ordering is on columns that are part of the index but not part of the distinct on. Same thing with the constraints, why do you think the restriction on the distinct cols is necessary?

They're not strictly speaking necessary (I did mention that we could relax them), they're just hard to think about and get right IMO. If we have an ordering on other columns, we have to set up the distinct processors with ordered input synchronizers to select the correct row. There may be cases where we're not even selecting those other columns, but we would need to get the values over to the distinct processor.

Constraints may not be a problem, it's more that they normally come from filters which in general can't support on those other columns anyway.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj)

Ridwan Sharif added 4 commits August 23, 2019 15:40
This change adds functionality to support index-skip-scans
when only a proper subset of the output columns are part
of the columns we do an index-skip scan over.

Release note: None
This change introduces index skip scans. To start with,
we implement index-skip scans when we have a DistinctOn
on a prefix of a scan.
Following this change, we can add more uses of this
when the DistinctOn has an index-join as its input,
when there is no DistinctOn but the prefix of the index
has very few distinct values and there is a constraint
on the columns after those columns (This would need
some more thought but I think it's possible if we
do an index skip scan and a cross product with the other
bounded column values and use this cross product to do a
lookup join with the original table).

Release note: None
This change adds an exploration rule that allows
index skip scans to be used when we have an index
join underneath a `DistinctOn`.

Release note: None
Previously we were planning an index skip scan as a regular scan
with some other properties. This change separates it out into its own
operator. This is good because even though a lot of rules that apply to
scans are applicable here too, some of them are difficult to reason
about and separating it into a separate operator allows us to audit
those rules in isolation before applying them instead of just doing it
now. This also helps when creating new rules in the future.

Another important note is, we need to do some careful work with
IndexSkipScan so the exploration rules works sanely when the data is
distributed over multiple nodes. This is important because when each
node does an `IndexSkipScan` over its own data, the rows returned by
different nodes might conflict and so we need a `DistinctOn` to resolve
them. and We add this DistinctOn node in the exec builder phase so the
guarantees made by the operator always hold.

Finally, this change tries to IndexSkipScan behaves sanely
when used with various different orderings.

Release note: None
Copy link
Author

@ridwanmsharif ridwanmsharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's fair. We could tighten the requirements for this rule a little more than we do in this PR and then loosen it later. For now, I've added support for orderings that aren't part of the DistinctOn columns as well and have tests exercising behavior in those situations.

I'll defer to the team from here! Thanks for letting me work on such cool projects!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @justinj)

andreimatei added a commit to andreimatei/cockroach that referenced this pull request Jul 8, 2020
This processor is not used. It was implemented for cockroachdb#24584, but the PR to
start planning it stalled since Ridwan left, a year ago (cockroachdb#39668).
This guy is marginally in my way, and, worse, it's dead code. Unless
someone promises to finish that PR imminently :).

Release note: None
craig bot pushed a commit that referenced this pull request Jul 9, 2020
51178: sql/rowexec: delete indexSkipTableReader r=andreimatei a=andreimatei

This processor is not used. It was implemented for #24584, but the PR to
start planning it stalled since Ridwan left, a year ago (#39668).
This guy is marginally in my way, and, worse, it's dead code. Unless
someone promises to finish that PR imminently :).

Release note: None

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
@andreimatei
Copy link
Contributor

#51178 deleted the indexSkipTableReader processor cause it was laying around unused, so this PR needs to revert the deletion when the time comes.

@cockroach-teamcity
Copy link
Member

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Ridwan Sharif seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@tbg tbg added the X-noremind Bots won't notify about PRs with X-noremind label May 6, 2021
michae2 added a commit to michae2/cockroach that referenced this pull request Sep 27, 2023
The `indexSkipTableReader` was deleted in cockroachdb#51178 but the distsql spec
has stuck around for a few versions. I think we can delete the spec now,
too. If someone wants to dust off cockroachdb#39668 they can bring it back.

Release note: None
michae2 added a commit to michae2/cockroach that referenced this pull request Sep 27, 2023
The `indexSkipTableReader` was deleted in cockroachdb#51178 but the distsql spec
has stuck around for a few versions. I think we can delete the spec now,
too. If someone wants to dust off cockroachdb#39668 they can bring it back.

Release note: None
michae2 added a commit to michae2/cockroach that referenced this pull request Sep 28, 2023
The `indexSkipTableReader` was deleted in cockroachdb#51178 but the distsql spec
has stuck around for a few versions. I think we can delete the spec now,
too. If someone wants to dust off cockroachdb#39668 they can bring it back.

Release note: None
michae2 added a commit to michae2/cockroach that referenced this pull request Oct 1, 2023
The `indexSkipTableReader` was deleted in cockroachdb#51178 but the distsql spec
has stuck around for a few versions. I think we can delete the spec now,
too. If someone wants to dust off cockroachdb#39668 they can bring it back.

Release note: None
craig bot pushed a commit that referenced this pull request Oct 2, 2023
110945: sql: plumb locking durability into Get/Scan/ReverseScan requests r=arulajmani,nvanbenschoten,yuzefovich a=michae2

**execinfrapb: remove IndexSkipTableReaderSpec**

The `indexSkipTableReader` was deleted in #51178 but the distsql spec has stuck around for a few versions. I think we can delete the spec now, too. If someone wants to dust off #39668 they can bring it back.

Release note: None

**sql: plumb locking durability into Get/Scan/ReverseScan requests**

Building on top of #110201, plumb locking durability from the optimizer to creation of Get/Scan/ReverseScan requests in txnKVFetcher and txnKVStreamer.

Also add a version gate for usage of guaranteed-durable locking.

Fixes: #100194

Release note: None

Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
THardy98 pushed a commit to THardy98/cockroach that referenced this pull request Oct 6, 2023
The `indexSkipTableReader` was deleted in cockroachdb#51178 but the distsql spec
has stuck around for a few versions. I think we can delete the spec now,
too. If someone wants to dust off cockroachdb#39668 they can bring it back.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
X-noremind Bots won't notify about PRs with X-noremind
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants