-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: fix selectivity estimates for index constraints #31937
Conversation
LGTM! Thanks for the quick fix! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR! I suppose I should wait for two reviews since this will be backported?
Reviewable status: complete! 0 of 0 LGTMs obtained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've got some concerns, but think we should proceed with the fix (and continue to ponder how we can address concerns longer-term).
|
||
var cols opt.ColSet | ||
for i := 0; i < scan.Constraint.Columns.Count(); i++ { | ||
for i := 0; i < scan.Constraint.ConstrainedColumns(sb.evalCtx); i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use the for i, n := 0, scan.Constraint.ConstrainedColumns(sb.evalCtx)
pattern. Otherwise, you're unnecessarily recomputing the constrained column count on each iteration.
@@ -2102,7 +2106,7 @@ func (sb *statisticsBuilder) selectivityFromDistinctCounts( | |||
oldDistinct := inputStat.DistinctCount | |||
|
|||
if oldDistinct != 0 && newDistinct < oldDistinct { | |||
selectivity *= newDistinct / oldDistinct | |||
selectivity *= min(newDistinct/oldDistinct, unknownFilterSelectivity) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This min
check will probably end up biting us later on, when we have more accurate distinct counts, and yet can never compute a selectivity > 1/3. I see why you have it here, and think it's right way to go to solve the short-term problem. But longer-term, it could be one of many small changes that seem harmless in isolation, but combine together to make the code fragile and filled with special cases.
Unlike the constraints found in Select and Join filters, an index constraint may represent multiple conjuncts. Therefore, the selectivity estimate for a Scan should account for the selectivity of each constrained column in the index constraint. This commit fixes the selectivity estimation in the optimizer to properly account for each constrained column in a Scan. Fixes cockroachdb#31929 Release note (bug fix): In some cases the optimizer was choosing the wrong index for a scan because of incorrect selectivity estimation. This estimation error has been fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/sql/opt/memo/statistics_builder.go, line 460 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
You should use the
for i, n := 0, scan.Constraint.ConstrainedColumns(sb.evalCtx)
pattern. Otherwise, you're unnecessarily recomputing the constrained column count on each iteration.
Done.
pkg/sql/opt/memo/statistics_builder.go, line 2109 at r1 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
This
min
check will probably end up biting us later on, when we have more accurate distinct counts, and yet can never compute a selectivity > 1/3. I see why you have it here, and think it's right way to go to solve the short-term problem. But longer-term, it could be one of many small changes that seem harmless in isolation, but combine together to make the code fragile and filled with special cases.
Based on our offline discussion, I've removed this change. It's not necessary for this particular fix, and it's not clear whether or not it's actually beneficial (as you pointed out).
31937: opt: fix selectivity estimates for index constraints r=rytaft a=rytaft Unlike the constraints found in Select and Join filters, an index constraint may represent multiple conjuncts. Therefore, the selectivity estimate for a Scan should account for the selectivity of each constrained column in the index constraint. This commit fixes the selectivity estimation in the optimizer to properly account for each constrained column in a Scan. Fixes #31929 Release note (bug fix): In some cases the optimizer was choosing the wrong index for a scan because of incorrect selectivity estimation. This estimation error has been fixed. Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
Build succeeded |
Unlike the constraints found in Select and Join filters, an index
constraint may represent multiple conjuncts. Therefore, the selectivity
estimate for a Scan should account for the selectivity of each
constrained column in the index constraint. This commit fixes the
selectivity estimation in the optimizer to properly account for
each constrained column in a Scan.
Fixes #31929
Release note (bug fix): In some cases the optimizer was choosing
the wrong index for a scan because of incorrect selectivity
estimation. This estimation error has been fixed.