Better cardinality estimation for queries with many predicates (especially those not matched to any indices) #8940

Merged
dyemanov merged 3 commits into master from work/optimizer-selectivity-backoff
Mar 12, 2026

@dyemanov
Member

@dyemanov dyemanov commented Mar 11, 2026

  • Refactor selectivity estimations
  • Apply exponential backoff to the selectivity combined from multiple booleans

In real-world SQL queries, conjuncts (ANDed booleans) are often interdependent, and the simple multiplication of selectivities that we have used so far results in a very low final selectivity value, causing the stream cardinality to be underestimated. To avoid this, apply an exponential backoff adjustment:

sel = sel1 * sqrt(sel2) * sqrt(sqrt(sel3)) * ... where sel1 is the least (best) selectivity and selN is the biggest (worst) one

I don't pretend this is the best approach possible, but it seems to work fine for MSSQL, so it's worth trying (actually, it was already tested in production -- with quite good results so far).
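
For illustration only, the formula above can be sketched in Python next to the plain product it replaces. This is not the code from this PR: the function names `naive_selectivity` and `backoff_selectivity` are hypothetical, and the sketch assumes that every predicate takes part in the backoff and that each selectivity lies in (0, 1].

```python
import math


def naive_selectivity(selectivities):
    """Plain product of selectivities; treats all predicates as independent."""
    return math.prod(selectivities)


def backoff_selectivity(selectivities):
    """Exponential backoff: sel1 * sqrt(sel2) * sqrt(sqrt(sel3)) * ...

    The values are sorted ascending first, so the lowest (best) selectivity
    is applied in full and each following one gets half the previous exponent.
    Assumption: all predicates participate; the real engine may differ.
    """
    result = 1.0
    for position, sel in enumerate(sorted(selectivities)):
        # Exponents are 1, 1/2, 1/4, 1/8, ...
        result *= sel ** (1.0 / (2 ** position))
    return result


if __name__ == "__main__":
    sels = [0.25, 0.01, 0.0625]  # made-up selectivities of three predicates
    print("naive:  ", naive_selectivity(sels))
    print("backoff:", backoff_selectivity(sels))
```

Because every factor after the first is raised to a power below 1, the combined value is never lower than the plain product for selectivities in (0, 1], which is what moves the estimated cardinality upwards.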

@dyemanov dyemanov merged commit 4d482a2 into master Mar 12, 2026
46 checks passed
@dyemanov dyemanov self-assigned this Mar 12, 2026
@dyemanov dyemanov deleted the work/optimizer-selectivity-backoff branch March 12, 2026 07:28
@mrotteveel
Member

@dyemanov For clarification, where you say boolean, do you mean predicate? Or are you literally talking about columns of type BOOLEAN?

@dyemanov
Member Author

Of course, I meant predicates. Inside the code, you may find conjunct or boolean meaning a predicate.

@mrotteveel mrotteveel changed the title from "Better cardinality estimation for queries with many booleans (especially those not matched to any indices)" to "Better cardinality estimation for queries with many predicates (especially those not matched to any indices)" Mar 15, 2026
@mrotteveel
Member

@dyemanov Another question, the title talks about cardinality, while the body talks about selectivity. Although those are somewhat related, they are different things. Looking at the code, I think this is actually about selectivity, right?

@mrotteveel
Member

And the mention of "low selectivity" vs "cardinality under-estimated" throws me off, because I assume a low selectivity implies an over-estimated cardinality, not an under-estimated cardinality.

@dyemanov
Member Author

> @dyemanov Another question, the title talks about cardinality, while the body talks about selectivity. Although those are somewhat related, they are different things. Looking at the code, I think this is actually about selectivity, right?

Internally, the fix is about selectivity. But from the user's point of view, only the cardinality difference is visible, i.e. changed selectivity affects cardinality, and changed cardinality affects cost.

> And the mention of "low selectivity" vs "cardinality under-estimated" throws me off, because I assume a low selectivity implies an over-estimated cardinality, not an under-estimated cardinality.

Lower selectivity means smaller estimated cardinality, thus under-estimation.

0.1 * 1000 = 100
0.001 (lower selectivity) * 1000 = 1

I think the confusion is about what selective means. Very selective means more unique values, which in turn implies a lower selectivity value, since it is calculated as 1 / <number of unique values>.
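
The two relations used in this reply (selectivity value = 1 / number of unique values, estimated cardinality = selectivity value * row count) can be sketched as follows. The function names are hypothetical and only mirror the arithmetic above; they are not taken from the Firebird code.

```python
def selectivity_from_distinct(distinct_values):
    """Selectivity value as described above: 1 / <number of unique values>."""
    if distinct_values <= 0:
        raise ValueError("distinct_values must be positive")
    return 1.0 / distinct_values


def estimated_cardinality(selectivity, row_count):
    """Estimated rows passing a predicate: selectivity value * input rows."""
    return selectivity * row_count


if __name__ == "__main__":
    rows = 1000
    # More unique values -> lower selectivity value -> smaller estimate.
    for distinct in (10, 1000):
        sel = selectivity_from_distinct(distinct)
        print(distinct, sel, estimated_cardinality(sel, rows))
```

A column with more unique values is "very selective" in everyday English, yet it yields the lower selectivity value and therefore the smaller cardinality estimate.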

@mrotteveel
Member

OK, clear. Now I need to think how to phrase that, because in English, low selectivity is the opposite of what in Firebird code is a "low selectivity value", so the release notes need to be understandable with the general meaning in mind, and the meaning in the context of Firebird code for those that think in its internals.

@mrotteveel mrotteveel added the rlsnotes60: yes Already added to the Firebird 6.0 release notes. (Do not add this to signal it should be added.) label Mar 16, 2026

Labels

component: engine
fix-version: 6.0 Alpha 1
rlsnotes60: yes Already added to the Firebird 6.0 release notes. (Do not add this to signal it should be added.)
type: improvement
