opt: calculate more accurate selectivity for disjunctions
The optimal plan for a query with a disjunctive filter is often a DistinctOn+UnionAll of the results of two index scans, when two separate indexes can satisfy either side of the disjunction. For example:

    SELECT * FROM t WHERE a = 'foo' OR b = 'foo' LIMIT 5

    limit
     ├── distinct-on
     │    └── union-all
     │         ├── scan t@a_idx
     │         │    └── constraint: /a: [/'foo' - /'foo']
     │         └── scan t@b_idx
     │              └── constraint: /b: [/'foo' - /'foo']
     └── 5

However, this optimal plan is not always chosen. A `constraint.Set` cannot be built for a disjunction involving multiple columns, so the statistics builder assumes a selectivity of 1/3 for the disjunction, which often results in an over-estimated row count.

When the DistinctOn is built during exploration, it is added to the same memo group as the Select that it replaces, so it shares the same row count estimate. Even though the UnionAll's cost is low because it produces a small subset of the table, the DistinctOn's cost is high because the coster assumes that the DistinctOn will produce 1/3 of the rows in the table. The cost of processing so many rows significantly inflates the overall cost of the plan, preventing the optimizer from choosing it.

This commit fixes the issue by attempting to build a constraint set for each side of a disjunction in a filter. By unioning the selectivities of these constraint sets, the statistics builder calculates a more accurate row count estimate for the filter. As a result, the cost of the DistinctOn is more accurate and the optimal plan is chosen.

This fix is only active when the cluster setting `sql.defaults.optimizer_improve_disjunction_selectivity.enabled` is enabled.

Informs #58744

Release note (performance improvement): A new cluster setting, `sql.defaults.optimizer_improve_disjunction_selectivity.enabled`, enables more accurate selectivity estimation of query filters with OR expressions. This improves query plans in some cases. The cluster setting is disabled by default.
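To make the effect of "unioning the selectivities" concrete, here is a minimal sketch of the arithmetic. It assumes the two sides of the disjunction are independent, so their selectivities combine by inclusion-exclusion as P(A OR B) = P(A) + P(B) - P(A)·P(B); this is an illustrative assumption, not necessarily the exact formula used in the commit. The helpers `unionSelectivity` and `estimateRows`, the table size, and the per-side selectivities are all hypothetical and are not part of the optimizer's API. Only the 1/3 fallback comes from the description above.

```go
package main

import "fmt"

// unionSelectivity is a hypothetical helper that combines the selectivities
// of the two sides of a disjunction (A OR B) using inclusion-exclusion,
// ASSUMING A and B are independent:
//
//	P(A OR B) = P(A) + P(B) - P(A)*P(B)
func unionSelectivity(selA, selB float64) float64 {
	return selA + selB - selA*selB
}

// estimateRows is a hypothetical helper that scales a table's row count by a
// filter selectivity to produce a row count estimate.
func estimateRows(tableRows, selectivity float64) float64 {
	return tableRows * selectivity
}

func main() {
	// Hypothetical table size.
	const tableRows = 1_000_000.0

	// Hypothetical per-side selectivities, standing in for values that would
	// be derived from constraint sets on a = 'foo' and b = 'foo'.
	selA := 0.001
	selB := 0.002

	// Fallback used when no constraint set can be built for the disjunction.
	fallback := estimateRows(tableRows, 1.0/3.0)

	// Estimate obtained by unioning the selectivity of each side.
	improved := estimateRows(tableRows, unionSelectivity(selA, selB))

	fmt.Printf("fallback estimate: %.0f rows\n", fallback)
	fmt.Printf("union estimate:    %.0f rows\n", improved)
}
```

With these made-up inputs the fallback estimates about 333,333 rows while the union estimates about 2,998 rows. A DistinctOn costed against the smaller figure is far cheaper, which is what allows the DistinctOn+UnionAll plan to win.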
Showing 5 changed files with 330 additions and 11 deletions.