opt: do not create column statistics for unfiltered columns #121311
Labels: C-enhancement, T-sql-queries
When estimating row counts, the statistics builder builds column statistics for all columns in base relations that have collected stats, even if they are never used to estimate row counts in any parent expressions. This can cause excessive and unnecessary heap allocations, especially when base relations have many columns.
Large column IDs cause allocations when creating the `opt.ColSet` (cockroach/pkg/sql/opt/memo/statistics_builder.go, lines 675 to 679 in 579d42e) and when adding to the `props.ColStatsMap` (cockroach/pkg/sql/opt/memo/statistics_builder.go, line 716 in 579d42e).
I've seen these two account for 10% of allocated objects and 8.5% of allocated space in some workloads.
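To illustrate why large column IDs can be costly, here is a minimal sketch of a set that stores small members inline and falls back to a heap-allocated structure for larger ones. The `smallSet` type, its 64-element inline limit, and the `usesHeap` helper are hypothetical and chosen for illustration; this is not the actual `opt.ColSet` implementation, only an example of the general technique.

```go
package main

import "fmt"

// smallSet is a hypothetical, simplified integer set. It is NOT opt.ColSet;
// it only illustrates the "inline bitmap with heap fallback" technique.
type smallSet struct {
	small uint64           // members 0..63 are stored inline, no allocation
	large map[int]struct{} // allocated lazily for members outside 0..63
}

// Add inserts i into the set, allocating only if i does not fit inline.
func (s *smallSet) Add(i int) {
	if i >= 0 && i < 64 {
		s.small |= 1 << uint(i)
		return
	}
	if s.large == nil {
		s.large = make(map[int]struct{}) // heap allocation happens here
	}
	s.large[i] = struct{}{}
}

// Contains reports whether i is a member of the set.
func (s *smallSet) Contains(i int) bool {
	if i >= 0 && i < 64 {
		return s.small&(1<<uint(i)) != 0
	}
	_, ok := s.large[i]
	return ok
}

// usesHeap reports whether the set had to allocate its fallback storage.
func (s *smallSet) usesHeap() bool { return s.large != nil }

func main() {
	var lowIDs, highIDs smallSet
	lowIDs.Add(3)    // fits in the inline bitmap
	highIDs.Add(150) // a large column ID forces the fallback allocation
	fmt.Println(lowIDs.usesHeap(), highIDs.usesHeap())
}
```

Under these assumptions, a table with many columns pays the fallback allocation for every singleton set built for a high-numbered column, whether or not that column is ever filtered.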
I believe we can eliminate most of these allocations by only building column statistics for columns that are filtered in parent expressions. This will be a bit tricky to do while also caching table statistics in base relations, as is currently done in cockroach/pkg/sql/opt/memo/statistics_builder.go (lines 610 to 615 in 579d42e).
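One possible shape for this, sketched under heavy assumptions: keep the cached table-level statistics as they are, and build a per-column statistic only the first time a parent expression asks for it. All names below (`tableStats`, `relStats`, `colStat`, `estimateEqRows`, the `built` counter, and the placeholder fallback) are hypothetical and do not correspond to the real statistics builder API.

```go
package main

import "fmt"

// tableStats is a hypothetical cache of collected table-level statistics.
type tableStats struct {
	rowCount      float64
	distinctByCol map[int]float64 // collected distinct counts keyed by column ID
}

// colStat is a hypothetical per-column statistic.
type colStat struct {
	distinctCount float64
}

// relStats holds column statistics for a base relation, built on demand.
type relStats struct {
	table    *tableStats     // shared, cached table statistics
	colStats map[int]colStat // only columns that were actually requested
	built    int             // number of column statistics built so far
}

// colStat returns the statistic for col, building it on first use.
func (r *relStats) colStat(col int) (colStat, bool) {
	if cs, ok := r.colStats[col]; ok {
		return cs, true
	}
	d, ok := r.table.distinctByCol[col]
	if !ok {
		return colStat{}, false // no collected stats for this column
	}
	if r.colStats == nil {
		r.colStats = make(map[int]colStat)
	}
	cs := colStat{distinctCount: d}
	r.colStats[col] = cs
	r.built++
	return cs, true
}

// estimateEqRows estimates the rows matching "col = constant", assuming
// uniformly distributed values. Without stats it returns the full row
// count, which is a placeholder fallback for this sketch.
func estimateEqRows(r *relStats, col int) float64 {
	cs, ok := r.colStat(col)
	if !ok || cs.distinctCount <= 0 {
		return r.table.rowCount
	}
	return r.table.rowCount / cs.distinctCount
}

func main() {
	// A table with 200 columns, each with 50 distinct values.
	ts := &tableStats{rowCount: 1000, distinctByCol: map[int]float64{}}
	for col := 1; col <= 200; col++ {
		ts.distinctByCol[col] = 50
	}
	rel := &relStats{table: ts}
	rows := estimateEqRows(rel, 150) // the only filtered column
	fmt.Println(rows, rel.built)
}
```

In this sketch only the filtered column gets a statistic, while the other 199 columns cost nothing beyond the shared table-level cache. The hard part noted above, keeping that cache valid while column statistics are built lazily, is not addressed here.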
More discussion can be found in https://cockroachlabs.slack.com/archives/G01R3EYPQKA/p1711577752592149
Jira issue: CRDB-37179