opt: fix selectivity estimates for index constraints #31937

rytaft · 2018-10-26T20:16:20Z

Unlike the constraints found in Select and Join filters, an index
constraint may represent multiple conjuncts. Therefore, the selectivity
estimate for a Scan should account for the selectivity of each
constrained column in the index constraint. This commit fixes the
selectivity estimation in the optimizer to properly account for
each constrained column in a Scan.

Fixes #31929

Release note (bug fix): In some cases the optimizer was choosing
the wrong index for a scan because of incorrect selectivity
estimation. This estimation error has been fixed.
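
For readers unfamiliar with the estimate being discussed, here is a minimal sketch of how a per-column selectivity can be combined across every constrained column of an index constraint. It is not the optimizer's code: `columnStats` and `scanSelectivity` are hypothetical names, and the sketch assumes the columns are statistically independent so that their selectivities can simply be multiplied.

```go
package main

import "fmt"

// columnStats is a hypothetical stand-in for per-column statistics.
// It is not the optimizer's real type.
type columnStats struct {
	oldDistinct float64 // distinct values before the constraint is applied
	newDistinct float64 // distinct values remaining after the constraint
}

// scanSelectivity multiplies the selectivity of every constrained column,
// assuming (for illustration only) that the columns are independent.
func scanSelectivity(cols []columnStats) float64 {
	selectivity := 1.0
	for _, c := range cols {
		// Skip columns with no statistics or with no reduction in distinct count.
		if c.oldDistinct != 0 && c.newDistinct < c.oldDistinct {
			selectivity *= c.newDistinct / c.oldDistinct
		}
	}
	return selectivity
}

func main() {
	// A made-up index constraint on two columns, such as a = 1 AND b = 2.
	cols := []columnStats{
		{oldDistinct: 100, newDistinct: 1},
		{oldDistinct: 10, newDistinct: 1},
	}
	fmt.Println("one column considered: ", scanSelectivity(cols[:1]))
	fmt.Println("all columns considered:", scanSelectivity(cols))
}
```

With these made-up numbers, considering both constrained columns yields a much smaller estimate than considering only one, which is the kind of difference that can change which index looks cheapest.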

cockroach-teamcity · 2018-10-26T20:16:26Z

This change is

RaduBerinde · 2018-10-26T23:10:13Z

LGTM! Thanks for the quick fix!

rytaft

TFTR! I suppose I should wait for two reviews since this will be backported?

Reviewable status: complete! 0 of 0 LGTMs obtained

andy-kimball

I've got some concerns, but think we should proceed with the fix (and continue to ponder how we can address concerns longer-term).

andy-kimball · 2018-10-30T14:48:18Z

pkg/sql/opt/memo/statistics_builder.go


 		var cols opt.ColSet
-		for i := 0; i < scan.Constraint.Columns.Count(); i++ {
+		for i := 0; i < scan.Constraint.ConstrainedColumns(sb.evalCtx); i++ {


You should use the for i, n := 0, scan.Constraint.ConstrainedColumns(sb.evalCtx) pattern. Otherwise, you're unnecessarily recomputing the constrained column count on each iteration.
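
To illustrate the point above, here is a small self-contained sketch. `constrainedColumns` is a hypothetical stand-in for any bound that costs something to compute; it is not the real `ConstrainedColumns` method. The counter shows how many times the bound is evaluated in each loop form.

```go
package main

import "fmt"

// calls counts how often the hypothetical bound function runs.
var calls int

// constrainedColumns is a hypothetical stand-in for a loop bound that is
// not free to compute.
func constrainedColumns() int {
	calls++
	return 3
}

// countConditionForm puts the call in the loop condition, so Go
// re-evaluates it before every iteration, including the final failed check.
func countConditionForm() int {
	calls = 0
	for i := 0; i < constrainedColumns(); i++ {
	}
	return calls
}

// countHoistedForm evaluates the bound once in the init statement.
func countHoistedForm() int {
	calls = 0
	for i, n := 0, constrainedColumns(); i < n; i++ {
	}
	return calls
}

func main() {
	fmt.Println("condition form calls:", countConditionForm()) // 4
	fmt.Println("hoisted form calls:  ", countHoistedForm())   // 1
}
```

The hoisted form is only safe when the bound does not change inside the loop body, which is the case when iterating over a fixed constraint.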

andy-kimball · 2018-10-30T14:54:37Z

pkg/sql/opt/memo/statistics_builder.go

@@ -2102,7 +2106,7 @@ func (sb *statisticsBuilder) selectivityFromDistinctCounts(
 		oldDistinct := inputStat.DistinctCount

 		if oldDistinct != 0 && newDistinct < oldDistinct {
-			selectivity *= newDistinct / oldDistinct
+			selectivity *= min(newDistinct/oldDistinct, unknownFilterSelectivity)


This min check will probably end up biting us later on, when we have more accurate distinct counts, and yet can never compute a selectivity > 1/3. I see why you have it here, and think it's the right way to go to solve the short-term problem. But longer-term, it could be one of many small changes that seem harmless in isolation, but combine together to make the code fragile and filled with special cases.
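
The concern can be made concrete with a small sketch. The constant below is an assumption: it is set to 1/3 because that is the ceiling named in the comment above, and `cappedSelectivity` is a hypothetical helper, not the function in `statistics_builder.go`.

```go
package main

import (
	"fmt"
	"math"
)

// assumedUnknownFilterSelectivity is assumed to be 1/3, the ceiling
// mentioned in the review comment. The real constant may differ.
const assumedUnknownFilterSelectivity = 1.0 / 3.0

// cappedSelectivity is a hypothetical helper that applies the proposed
// min check to a distinct-count ratio.
func cappedSelectivity(newDistinct, oldDistinct float64) float64 {
	return math.Min(newDistinct/oldDistinct, assumedUnknownFilterSelectivity)
}

func main() {
	// Accurate statistics say 90 of 100 distinct values survive the filter,
	// but the cap limits the estimate to the assumed 1/3 ceiling.
	fmt.Println("uncapped:", 90.0/100.0)
	fmt.Println("capped:  ", cappedSelectivity(90, 100))

	// A highly selective filter is unaffected by the cap.
	fmt.Println("capped:  ", cappedSelectivity(1, 100))
}
```

Under this assumption, a filter that keeps most of the distinct values is always estimated as keeping at most a third of the rows, no matter how accurate the distinct counts are.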

rytaft

TFTR!

bors r+

Reviewable status: complete! 0 of 0 LGTMs obtained

pkg/sql/opt/memo/statistics_builder.go, line 460 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

You should use the for i, n := 0, scan.Constraint.ConstrainedColumns(sb.evalCtx) pattern. Otherwise, you're unnecessarily recomputing the constrained column count on each iteration.

Done.

pkg/sql/opt/memo/statistics_builder.go, line 2109 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

This min check will probably end up biting us later on, when we have more accurate distinct counts, and yet can never compute a selectivity > 1/3. I see why you have it here, and think it's the right way to go to solve the short-term problem. But longer-term, it could be one of many small changes that seem harmless in isolation, but combine together to make the code fragile and filled with special cases.

Based on our offline discussion, I've removed this change. It's not necessary for this particular fix, and it's not clear whether or not it's actually beneficial (as you pointed out).

31937: opt: fix selectivity estimates for index constraints r=rytaft a=rytaft

Unlike the constraints found in Select and Join filters, an index constraint may represent multiple conjuncts. Therefore, the selectivity estimate for a Scan should account for the selectivity of each constrained column in the index constraint. This commit fixes the selectivity estimation in the optimizer to properly account for each constrained column in a Scan.

Fixes #31929

Release note (bug fix): In some cases the optimizer was choosing the wrong index for a scan because of incorrect selectivity estimation. This estimation error has been fixed.

Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>

craig · 2018-10-30T16:17:23Z

Build succeeded

GitHub CI (Cockroach)

rytaft requested review from RaduBerinde and andy-kimball October 26, 2018 20:16

rytaft requested a review from a team as a code owner October 26, 2018 20:16

rytaft force-pushed the constraint branch from 42f4b4a to 841498f Compare October 26, 2018 20:56

RaduBerinde added the backport-2.1.x label Oct 26, 2018

rytaft commented Oct 27, 2018

andy-kimball approved these changes Oct 30, 2018

rytaft force-pushed the constraint branch from 841498f to 8af58bb Compare October 30, 2018 15:55

rytaft commented Oct 30, 2018

craig bot merged commit 8af58bb into cockroachdb:master Oct 30, 2018

rytaft mentioned this pull request Oct 30, 2018

release-2.1: opt: fix selectivity estimates for index constraints #32011

Merged
