opt: do not create column statistics for unfiltered columns #121311

Open
mgartner opened this issue Mar 28, 2024 · 0 comments
Labels
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-sql-queries: SQL Queries Team

Comments

Collaborator

mgartner commented Mar 28, 2024

When estimating row counts, the statistics builder builds column statistics for all columns in base relations that have collected stats, even if they are never used to estimate row counts in any parent expressions. This can cause excessive and unnecessary heap allocations, especially when base relations have many columns. Large column IDs cause allocations when creating the opt.ColSet here:

var cols opt.ColSet
var colOrd int
for i := 0; i < stat.ColumnCount(); i++ {
	colOrd = stat.ColumnOrdinal(i)
	col := tabID.ColumnID(colOrd)

And when adding to the props.ColStatsMap here:

colStat, ok := stats.ColStats.Add(cols)

I've seen these two account for 10% of allocated objects and 8.5% of allocated space in some workloads.
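
To illustrate why large IDs can be more expensive than small ones, here is a minimal sketch of a set that stores small values inline and spills larger ones to the heap. The `smallSet` type, its methods, and the 64-value inline threshold are all hypothetical and chosen for illustration; this is not the actual `opt.ColSet` implementation.

```go
package main

import "fmt"

// smallSet is a hypothetical, simplified integer set. It is NOT the real
// opt.ColSet; it only illustrates the inline-vs-heap trade-off.
type smallSet struct {
	small uint64           // bits for values 0..63, stored inline
	large map[int]struct{} // allocated lazily for all other values
}

// Add inserts v into the set. Values outside 0..63 force a heap allocation
// the first time one is added.
func (s *smallSet) Add(v int) {
	if v >= 0 && v < 64 {
		s.small |= uint64(1) << uint(v)
		return
	}
	if s.large == nil {
		s.large = make(map[int]struct{}) // heap allocation happens here
	}
	s.large[v] = struct{}{}
}

// Contains reports whether v is in the set.
func (s *smallSet) Contains(v int) bool {
	if v >= 0 && v < 64 {
		return s.small&(uint64(1)<<uint(v)) != 0
	}
	_, ok := s.large[v]
	return ok
}

// UsesHeap reports whether the set has spilled to its heap-backed storage.
func (s *smallSet) UsesHeap() bool { return s.large != nil }

func main() {
	var a, b smallSet
	a.Add(3)   // fits in the inline bitmap
	b.Add(300) // exceeds the inline range and spills to the heap
	fmt.Println(a.UsesHeap(), b.UsesHeap())
}
```

Under these assumptions, a table whose column IDs all exceed the inline range pays one allocation per single-column set, which would add up quickly when a set is built for every column with collected stats.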

I believe we can eliminate most of these allocations by only building column statistics for columns that are filtered in parent expressions. This will be a bit tricky to do while also caching table statistics in base relations, as is currently done:

func (sb *statisticsBuilder) makeTableStatistics(tabID opt.TableID) *props.Statistics {
	stats, ok := sb.md.TableAnnotation(tabID, statsAnnID).(*props.Statistics)
	if ok {
		// Already made.
		return stats
	}
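
One possible way to reconcile caching with on-demand column statistics is to cache the table-level statistics eagerly and fill in column statistics only when a parent expression asks for them. The sketch below is a rough illustration of that idea, not a proposed patch: every type and function in it (`lazyStatsBuilder`, `tableStats`, `colStatistic`, and so on) is a hypothetical stand-in rather than part of the optimizer.

```go
package main

import "fmt"

// All types below are hypothetical stand-ins, not the optimizer's real types.
type tableID int
type colID int

// colStat is a materialized column statistic.
type colStat struct{ distinctCount float64 }

// collectedStats simulates the stats that were collected for a table.
type collectedStats struct {
	rowCount float64
	distinct map[colID]float64
}

// tableStats is the cached per-table entry. The row count is set eagerly;
// column statistics are added only on request.
type tableStats struct {
	rowCount float64
	colStats map[colID]*colStat
}

type lazyStatsBuilder struct {
	collected map[tableID]collectedStats
	cache     map[tableID]*tableStats
	built     int // number of column statistics actually materialized
}

func newLazyStatsBuilder(collected map[tableID]collectedStats) *lazyStatsBuilder {
	return &lazyStatsBuilder{collected: collected, cache: make(map[tableID]*tableStats)}
}

// tableStatistics returns the cached entry for t, creating it without any
// column statistics if it does not exist yet.
func (b *lazyStatsBuilder) tableStatistics(t tableID) *tableStats {
	if s, ok := b.cache[t]; ok {
		return s // Already made.
	}
	s := &tableStats{rowCount: b.collected[t].rowCount, colStats: make(map[colID]*colStat)}
	b.cache[t] = s
	return s
}

// colStatistic materializes the statistic for column c of table t the first
// time it is requested and reuses the cached value afterwards.
func (b *lazyStatsBuilder) colStatistic(t tableID, c colID) (*colStat, bool) {
	s := b.tableStatistics(t)
	if cs, ok := s.colStats[c]; ok {
		return cs, true
	}
	d, ok := b.collected[t].distinct[c]
	if !ok {
		return nil, false // no stats were collected for this column
	}
	cs := &colStat{distinctCount: d}
	s.colStats[c] = cs
	b.built++
	return cs, true
}

func main() {
	b := newLazyStatsBuilder(map[tableID]collectedStats{
		1: {rowCount: 1000, distinct: map[colID]float64{1: 1000, 2: 10, 3: 50}},
	})
	// Only column 2 is filtered by a parent expression, so only it is built.
	b.colStatistic(1, 2)
	b.colStatistic(1, 2) // second request hits the cache
	fmt.Println(b.built)
}
```

In this sketch the cache is shared across all requests for the same table, so a column statistic requested by one parent expression is reused by others, while columns that are never filtered are never materialized.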

More discussion can be found in https://cockroachlabs.slack.com/archives/G01R3EYPQKA/p1711577752592149

Jira issue: CRDB-37179

mgartner added the C-enhancement and T-sql-queries labels on Mar 28, 2024
Projects
Status: 24.2 Release