release-23.2.4-rc: opt/memo: use virtual column stats in statistics builder #121329
Commits on Mar 29, 2024
opt: avoid ColSet allocations in statisticsBuilder.colStatJoin
This commit reduces allocations in `statisticsBuilder.colStatJoin`. Previously, it built intersections of two sets which, in some cases, were only checked for emptiness. It now uses the `ColSet.Intersects` method, which returns a boolean and does not build a new set.

Release note: None
Commit bd9d472
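The difference between building an intersection and only testing for overlap can be sketched as follows. This is a minimal illustration, not the CockroachDB code: `colSet`, `makeColSet`, `intersection`, and `intersects` are hypothetical names, and the set is map-backed purely to make the extra allocation visible.

```go
package main

import "fmt"

// colSet is a hypothetical, map-backed stand-in for a set of column IDs.
// It is not the real ColSet type.
type colSet map[int]struct{}

// makeColSet builds a set from the given column IDs.
func makeColSet(cols ...int) colSet {
	s := make(colSet, len(cols))
	for _, c := range cols {
		s[c] = struct{}{}
	}
	return s
}

// intersection allocates and returns a new set holding the shared columns.
func (s colSet) intersection(o colSet) colSet {
	res := make(colSet)
	for c := range s {
		if _, ok := o[c]; ok {
			res[c] = struct{}{}
		}
	}
	return res
}

// intersects reports whether the sets share at least one column. It stops
// at the first match and never builds a result set.
func (s colSet) intersects(o colSet) bool {
	for c := range s {
		if _, ok := o[c]; ok {
			return true
		}
	}
	return false
}

func main() {
	left := makeColSet(1, 2, 3)
	right := makeColSet(3, 4)

	// Before: build a whole set only to ask whether it is empty.
	before := len(left.intersection(right)) > 0
	// After: ask the question directly.
	after := left.intersects(right)

	fmt.Println(before, after) // true true
}
```

Both forms give the same answer; the second simply skips the intermediate set when only the yes/no result is needed.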
opt: add props.Statistics.VirtualCols
As of cockroachdb#118241 we now collect table statistics on virtual computed columns, but do not yet use them in statistics builder.

The difficulty with using these stats in statistics builder is that virtual computed columns are synthesized by various non-Scan expressions (Project, Select, etc). When calculating stats for these non-Scan expressions, we need to find the virtual column stats even though the virtual columns are not produced by the input to these expressions.

To solve this, we add a VirtualCols set to props.Statistics which holds all of the virtual columns that could be produced by the input to a group. Expressions that could synthesize virtual columns will look in this set to discover whether there are statistics for any of the scalar expressions they render. If there are, they will call colStatXXX using the virtual column ID as if the virtual column had originated from their input.

This commit adds VirtualCols but does not yet use it.

Note that we cannot currently pass VirtualCols up through set operations or with-scans, due to the column ID translation they use.

Informs: cockroachdb#68254
Epic: CRDB-8949
Release note: None
Commit ad152de
sql: add optimizer_use_virtual_computed_column_stats session variable
Informs: cockroachdb#68254
Epic: CRDB-8949

Release note (sql): Add new session variable `optimizer_use_virtual_computed_column_stats`. When this variable is enabled, the optimizer will make use of table statistics on virtual computed columns.
Commit 704f13c
opt/memo: extend OutputCols with VirtualCols in statistics builder
Throughout statistics builder we use OutputCols to determine which columns come from the input to an expression. We then typically call colStatXXX with those columns as part of statistics calculation.

In order to use statistics on virtual computed columns, we need to call colStatXXX on any virtual columns that could come from our input, even if they are not passed upward through OutputCols. To do this we extend OutputCols with the VirtualCols set we built in a previous commit.

This commit replaces almost all usages of OutputCols in statistics builder with a call to the helper function colStatCols, which returns a union of OutputCols and VirtualCols. This is enough to get the optimizer to use statistics on virtual computed columns in some simple plans. More complex plans will require matching the virtual column scalar expressions, which will be in the next PR. I've left some TODOs marking spots where this next PR will touch.

Informs: cockroachdb#68254
Epic: CRDB-8949
Release note: None
Commit 037c287
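A rough sketch of the union described in the commit above, under loud assumptions: `relProps`, `colSet`, and the lower-case `colStatCols` here are simplified, hypothetical stand-ins, and the real helper's signature and surrounding types are not shown in this listing.

```go
package main

import (
	"fmt"
	"sort"
)

// colSet is a hypothetical, map-backed stand-in for a set of column IDs.
type colSet map[int]struct{}

// makeColSet builds a set from the given column IDs.
func makeColSet(cols ...int) colSet {
	s := make(colSet, len(cols))
	for _, c := range cols {
		s[c] = struct{}{}
	}
	return s
}

// union returns a new set containing every column in s or o.
func (s colSet) union(o colSet) colSet {
	res := make(colSet, len(s)+len(o))
	for c := range s {
		res[c] = struct{}{}
	}
	for c := range o {
		res[c] = struct{}{}
	}
	return res
}

// ordered returns the members in ascending order, for stable printing.
func (s colSet) ordered() []int {
	cols := make([]int, 0, len(s))
	for c := range s {
		cols = append(cols, c)
	}
	sort.Ints(cols)
	return cols
}

// relProps is a hypothetical, pared-down view of an expression's properties.
type relProps struct {
	outputCols  colSet // columns the expression actually produces
	virtualCols colSet // virtual columns that could be produced by the input
}

// colStatCols returns the columns for which column statistics may be
// requested: the output columns extended with the virtual columns.
func colStatCols(p relProps) colSet {
	return p.outputCols.union(p.virtualCols)
}

func main() {
	// Assume column 1 is a stored column and column 2 is a virtual computed
	// column that the input does not output.
	p := relProps{outputCols: makeColSet(1), virtualCols: makeColSet(2)}
	fmt.Println(colStatCols(p).ordered()) // [1 2]
}
```

The point of the sketch is only that a caller asking "which columns can I get stats for?" sees the virtual column even though it is absent from the output columns.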
opt/memo: use virtual column stats for matching scalar expressions
As of cockroachdb#120668 we now use statistics on virtual computed columns in statistics builder. Simple queries that synthesize virtual columns in Project expressions already benefit, because they use the virtual column ID when synthesizing the virtual column.

Other expressions, however, do not directly use the virtual column ID when synthesizing a virtual column. This includes Select expressions, joins, constrained scans, and some Project expressions. For example, consider a query like the following:

```
CREATE TABLE ab (a INT PRIMARY KEY, b INT AS (a % 10) VIRTUAL, INDEX (b));
SELECT * FROM ab WHERE a % 10 > 3;
```

Even though the filter condition is in terms of `a`, we'd like to use the statistics on virtual computed column `b` since the expression matches.

In order to do this, we replace `a % 10` with `b` in a copy of the filter condition before doing any stats calculations. Then we perform our normal stats calculations, using `b`.

Fixes: cockroachdb#68254
Fixes: cockroachdb#110146
Epic: CRDB-8949
Release note: None
Commit 04643a3
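The substitution described in the commit above can be illustrated with a toy expression tree. Everything here is an assumption made for illustration: the `expr` types, the `replaceVirtualExprs` helper, and matching by printed form are hypothetical and much simpler than the optimizer's real scalar expressions.

```go
package main

import "fmt"

// expr is a hypothetical, minimal scalar expression.
type expr interface{ String() string }

// colRef refers to a column by name.
type colRef struct{ name string }

// constant is an integer literal.
type constant struct{ val int }

// binary is a binary operation such as `a % 10` or `b > 3`.
type binary struct {
	op          string
	left, right expr
}

func (c colRef) String() string   { return c.name }
func (c constant) String() string { return fmt.Sprintf("%d", c.val) }
func (b binary) String() string {
	return fmt.Sprintf("(%s %s %s)", b.left, b.op, b.right)
}

// replaceVirtualExprs returns a copy of e in which every subexpression that
// matches a virtual column's defining expression is replaced by a reference
// to that column. Matching by printed form is a simplification of this sketch.
func replaceVirtualExprs(e expr, computed map[string]string) expr {
	if col, ok := computed[e.String()]; ok {
		return colRef{name: col}
	}
	if b, ok := e.(binary); ok {
		return binary{
			op:    b.op,
			left:  replaceVirtualExprs(b.left, computed),
			right: replaceVirtualExprs(b.right, computed),
		}
	}
	return e
}

func main() {
	// WHERE a % 10 > 3
	filter := binary{
		op:    ">",
		left:  binary{op: "%", left: colRef{name: "a"}, right: constant{val: 10}},
		right: constant{val: 3},
	}
	// b INT AS (a % 10) VIRTUAL
	computed := map[string]string{"(a % 10)": "b"}

	rewritten := replaceVirtualExprs(filter, computed)
	fmt.Println(filter)    // ((a % 10) > 3)
	fmt.Println(rewritten) // (b > 3)
}
```

The original filter is left untouched; only the copy is expressed in terms of `b`, which is what lets the usual stats calculation find the statistics collected on `b`.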
opt/memo: fix optimizer_use_virtual_computed_column_stats
With optimizer_use_virtual_computed_column_stats set to false, constrained scans were still sometimes using stats on virtual computed columns. This commit adds a check to makeTableStatistics which prevents creation of any statistics referencing a virtual computed column, which is a stronger check than existed before.

With this check, the VirtualCols sets will always be empty when optimizer_use_virtual_computed_column_stats is false.

Informs: cockroachdb#68254
Epic: CRDB-8949
Release note: None
Commit bbfee8f
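As a hedged sketch of the kind of check the last commit describes, the following drops every statistic that references a virtual computed column when the setting is off. The `tableStat` type and the `filterStats` and `virtualColsWithStats` helpers are hypothetical; the actual logic in makeTableStatistics is not shown in this listing.

```go
package main

import "fmt"

// tableStat is a hypothetical, simplified statistic over a set of columns.
type tableStat struct {
	cols          []int
	distinctCount float64
}

// referencesVirtual reports whether the statistic covers any virtual column.
func referencesVirtual(s tableStat, isVirtual map[int]bool) bool {
	for _, c := range s.cols {
		if isVirtual[c] {
			return true
		}
	}
	return false
}

// filterStats returns the statistics the optimizer may use. When
// useVirtualColStats is false, any statistic that references a virtual
// computed column is dropped, including multi-column statistics.
func filterStats(stats []tableStat, isVirtual map[int]bool, useVirtualColStats bool) []tableStat {
	kept := make([]tableStat, 0, len(stats))
	for _, s := range stats {
		if !useVirtualColStats && referencesVirtual(s, isVirtual) {
			continue
		}
		kept = append(kept, s)
	}
	return kept
}

// virtualColsWithStats lists the virtual columns that appear in the kept
// statistics, without duplicates.
func virtualColsWithStats(stats []tableStat, isVirtual map[int]bool) []int {
	var cols []int
	seen := map[int]bool{}
	for _, s := range stats {
		for _, c := range s.cols {
			if isVirtual[c] && !seen[c] {
				seen[c] = true
				cols = append(cols, c)
			}
		}
	}
	return cols
}

func main() {
	// Assume column 1 is stored and column 2 is a virtual computed column.
	isVirtual := map[int]bool{2: true}
	stats := []tableStat{
		{cols: []int{1}, distinctCount: 1000},
		{cols: []int{2}, distinctCount: 10},
		{cols: []int{1, 2}, distinctCount: 1000},
	}

	off := filterStats(stats, isVirtual, false)
	on := filterStats(stats, isVirtual, true)
	fmt.Println(len(off), len(virtualColsWithStats(off, isVirtual))) // 1 0
	fmt.Println(len(on), len(virtualColsWithStats(on, isVirtual)))   // 3 1
}
```

Filtering at the point where statistics are created means nothing downstream needs its own check: with the setting off, no virtual column ever appears in the kept statistics.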