Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: fix selectivity estimates for index constraints #31937

Merged
merged 1 commit into from
Oct 30, 2018

Conversation

rytaft
Copy link
Collaborator

@rytaft rytaft commented Oct 26, 2018

Unlike the constraints found in Select and Join filters, an index
constraint may represent multiple conjuncts. Therefore, the selectivity
estimate for a Scan should account for the selectivity of each
constrained column in the index constraint. This commit fixes the
selectivity estimation in the optimizer to properly account for
each constrained column in a Scan.

Fixes #31929

Release note (bug fix): In some cases the optimizer was choosing
the wrong index for a scan because of incorrect selectivity
estimation. This estimation error has been fixed.

@rytaft rytaft requested a review from a team as a code owner October 26, 2018 20:16
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@RaduBerinde
Copy link
Member

LGTM! Thanks for the quick fix!

Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR! I suppose I should wait for two reviews since this will be backported?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got some concerns, but think we should proceed with the fix (and continue to ponder how we can address concerns longer-term).


var cols opt.ColSet
for i := 0; i < scan.Constraint.Columns.Count(); i++ {
for i := 0; i < scan.Constraint.ConstrainedColumns(sb.evalCtx); i++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use the for i, n := 0, scan.Constraint.ConstrainedColumns(sb.evalCtx) pattern. Otherwise, you're unnecessarily recomputing the constrained column count on each iteration.

@@ -2102,7 +2106,7 @@ func (sb *statisticsBuilder) selectivityFromDistinctCounts(
oldDistinct := inputStat.DistinctCount

if oldDistinct != 0 && newDistinct < oldDistinct {
selectivity *= newDistinct / oldDistinct
selectivity *= min(newDistinct/oldDistinct, unknownFilterSelectivity)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This min check will probably end up biting us later on, when we have more accurate distinct counts, and yet can never compute a selectivity > 1/3. I see why you have it here, and think it's right way to go to solve the short-term problem. But longer-term, it could be one of many small changes that seem harmless in isolation, but combine together to make the code fragile and filled with special cases.

Unlike the constraints found in Select and Join filters, an index
constraint may represent multiple conjuncts. Therefore, the selectivity
estimate for a Scan should account for the selectivity of each
constrained column in the index constraint. This commit fixes the
selectivity estimation in the optimizer to properly account for
each constrained column in a Scan.

Fixes cockroachdb#31929

Release note (bug fix): In some cases the optimizer was choosing
the wrong index for a scan because of incorrect selectivity
estimation. This estimation error has been fixed.
Copy link
Collaborator Author

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/opt/memo/statistics_builder.go, line 460 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

You should use the for i, n := 0, scan.Constraint.ConstrainedColumns(sb.evalCtx) pattern. Otherwise, you're unnecessarily recomputing the constrained column count on each iteration.

Done.


pkg/sql/opt/memo/statistics_builder.go, line 2109 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

This min check will probably end up biting us later on, when we have more accurate distinct counts, and yet can never compute a selectivity > 1/3. I see why you have it here, and think it's right way to go to solve the short-term problem. But longer-term, it could be one of many small changes that seem harmless in isolation, but combine together to make the code fragile and filled with special cases.

Based on our offline discussion, I've removed this change. It's not necessary for this particular fix, and it's not clear whether or not it's actually beneficial (as you pointed out).

craig bot pushed a commit that referenced this pull request Oct 30, 2018
31937: opt: fix selectivity estimates for index constraints r=rytaft a=rytaft

Unlike the constraints found in Select and Join filters, an index
constraint may represent multiple conjuncts. Therefore, the selectivity
estimate for a Scan should account for the selectivity of each
constrained column in the index constraint. This commit fixes the
selectivity estimation in the optimizer to properly account for
each constrained column in a Scan.

Fixes #31929

Release note (bug fix): In some cases the optimizer was choosing
the wrong index for a scan because of incorrect selectivity
estimation. This estimation error has been fixed.

Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Oct 30, 2018

Build succeeded

@craig craig bot merged commit 8af58bb into cockroachdb:master Oct 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: incorrect index chosen when using LIKE
4 participants