release-23.2.4-rc: opt/memo: use virtual column stats in statistics builder #121329

Commits on Mar 29, 2024

  1. opt: avoid ColSet allocations in statisticsBuilder.colStatJoin

    This commit reduces allocations in `statisticsBuilder.colStatJoin`.
    Previously, it built the intersection of two sets even in cases where
    the result was only checked for emptiness. It now calls the
    `ColSet.Intersects` method, which returns a boolean and does not
    build a new set.
    
    Release note: None
    mgartner authored and michae2 committed Mar 29, 2024
    Commit: bd9d472
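
To illustrate the difference described in this commit, here is a minimal sketch. The `colSet` type below is a hypothetical, simplified stand-in and not CockroachDB's actual `ColSet` implementation; only the method name `Intersects` comes from the commit message. The point is that an emptiness check via `Intersection` has to build a result set first, while `Intersects` can return as soon as it finds a shared column.

```go
package main

// colSet is a hypothetical, simplified stand-in for the optimizer's ColSet:
// bit c%64 of word c/64 is set when column ID c is in the set.
type colSet []uint64

// makeColSet builds a set from non-negative column IDs.
func makeColSet(cols ...int) colSet {
	var s colSet
	for _, c := range cols {
		for len(s) <= c/64 {
			s = append(s, 0)
		}
		s[c/64] |= 1 << uint(c%64)
	}
	return s
}

// Intersection allocates and returns a new set holding the common columns.
func (s colSet) Intersection(o colSet) colSet {
	n := len(s)
	if len(o) < n {
		n = len(o)
	}
	out := make(colSet, n)
	for i := 0; i < n; i++ {
		out[i] = s[i] & o[i]
	}
	return out
}

// Empty reports whether no column is in the set.
func (s colSet) Empty() bool {
	for _, w := range s {
		if w != 0 {
			return false
		}
	}
	return true
}

// Intersects reports whether the sets share a column, without building a set.
func (s colSet) Intersects(o colSet) bool {
	for i := 0; i < len(s) && i < len(o); i++ {
		if s[i]&o[i] != 0 {
			return true
		}
	}
	return false
}
```

In this sketch, `!a.Intersection(b).Empty()` and `a.Intersects(b)` always give the same answer; the second form simply skips the intermediate set.
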
  2. opt: add props.Statistics.VirtualCols

    As of cockroachdb#118241 we now collect table statistics on virtual computed
    columns, but do not yet use them in statistics builder. The difficulty
    with using these stats in statistics builder is that virtual computed
    columns are synthesized by various non-Scan expressions (Project,
    Select, etc). When calculating stats for these non-Scan expressions, we
    need to find the virtual column stats even though the virtual columns
    are not produced by the input to these expressions.
    
    To solve this, we add a VirtualCols set to props.Statistics which holds
    all of the virtual columns that could be produced by the input to a
    group. Expressions that could synthesize virtual columns will look in
    this set to discover whether there are statistics for any of the scalar
    expressions they render. If there are, they will call colStatXXX using
    the virtual column ID as if the virtual column had originated from
    their input.
    
    This commit adds VirtualCols but does not yet use it.
    
    Note that we cannot currently pass VirtualCols up through set operations
    or with-scans, due to the column ID translation they use.
    
    Informs: cockroachdb#68254
    
    Epic: CRDB-8949
    
    Release note: None
    michae2 committed Mar 29, 2024
    Commit: ad152de
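
The propagation rule described in this commit can be sketched roughly as follows. Everything here is hypothetical and heavily simplified: the `expr` type, the string operator names, and the `virtualCols` function are illustrative only and do not match the real `props.Statistics` code. The sketch also assumes that any operator other than a scan, set operation, or with-scan simply passes up the union of its inputs' virtual columns, which the commit message does not spell out.

```go
package main

// colSet is a hypothetical bitmask of column IDs in the range 0..63.
type colSet uint64

// makeColSet builds a set from column IDs in the range 0..63.
func makeColSet(cols ...int) colSet {
	var s colSet
	for _, c := range cols {
		s |= 1 << uint(c)
	}
	return s
}

// Contains reports whether column ID c is in the set.
func (s colSet) Contains(c int) bool { return s&(1<<uint(c)) != 0 }

// expr is a hypothetical, much-simplified relational expression.
type expr struct {
	op     string // e.g. "scan", "select", "project", "join", "union", "with-scan"
	inputs []*expr
	// tableVirtualCols is used by scans only: the table's virtual computed
	// columns for which statistics exist.
	tableVirtualCols colSet
}

// virtualCols returns the virtual columns that could be produced by e.
func virtualCols(e *expr) colSet {
	switch e.op {
	case "scan":
		return e.tableVirtualCols
	case "union", "with-scan":
		// Column IDs are translated by these operators, so the set is dropped.
		return 0
	default:
		// Assumption: other operators pass up the union of their inputs' sets.
		var s colSet
		for _, in := range e.inputs {
			s |= virtualCols(in)
		}
		return s
	}
}
```

With this shape, a Project above a Select above a Scan still sees the scan's virtual columns, while anything above a set operation sees none.
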
  3. sql: add optimizer_use_virtual_computed_column_stats session variable

    Informs: cockroachdb#68254
    
    Epic: CRDB-8949
    
    Release note (sql): Add new session variable
    `optimizer_use_virtual_computed_column_stats`. When this variable is
    enabled, the optimizer will make use of table statistics on virtual
    computed columns.
    michae2 committed Mar 29, 2024
    Commit: 704f13c
  4. opt/memo: extend OutputCols with VirtualCols in statistics builder

    Throughout statistics builder we use OutputCols to determine which
    columns come from the input to an expression. We then typically call
    colStatXXX with those columns as part of statistics calculation.
    
    In order to use statistics on virtual computed columns, we need to call
    colStatXXX on any virtual columns that could come from our input, even
    if they are not passed upward through OutputCols. To do this we extend
    OutputCols with the VirtualCols set we built in a previous commit. This
    commit replaces almost all uses of OutputCols in statistics builder
    with a call to the helper function colStatCols, which returns the union
    of OutputCols and VirtualCols.
    
    This is enough to get the optimizer to use statistics on virtual
    computed columns in some simple plans. More complex plans will require
    matching the virtual column scalar expressions, which will come in the
    next PR. I've left some TODOs marking the spots that the next PR will
    touch.
    
    Informs: cockroachdb#68254
    
    Epic: CRDB-8949
    
    Release note: None
    michae2 committed Mar 29, 2024
    Commit: 037c287
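
As a rough illustration of the helper described in this commit, the sketch below shows a union of the two sets. The name `colStatCols` comes from the commit message, but its signature here, the `relProps` struct, and the bitmask `colSet` type are assumptions made for the example and are not the real optimizer types.

```go
package main

// colSet is a hypothetical bitmask of column IDs in the range 0..63.
type colSet uint64

// makeColSet builds a set from column IDs in the range 0..63.
func makeColSet(cols ...int) colSet {
	var s colSet
	for _, c := range cols {
		s |= 1 << uint(c)
	}
	return s
}

// Union returns the columns that are in either set.
func (s colSet) Union(o colSet) colSet { return s | o }

// relProps is a hypothetical holder for the two sets the commit refers to.
type relProps struct {
	OutputCols  colSet // columns actually produced by the expression
	VirtualCols colSet // virtual columns that could be synthesized above it
}

// colStatCols returns the columns for which column statistics may be
// requested from an input: its output columns plus its virtual columns.
// The signature is an assumption for this sketch.
func colStatCols(p relProps) colSet {
	return p.OutputCols.Union(p.VirtualCols)
}
```

For example, an input that outputs column 1 and could synthesize virtual column 2 yields `{1, 2}`, so a parent expression may ask for statistics on column 2 even though the input never produces it.
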
  5. opt/memo: use virtual column stats for matching scalar expressions

    As of cockroachdb#120668 we now use statistics on virtual computed columns in
    statistics builder. Simple queries that synthesize virtual columns in
    Project expressions already benefit, because they use the virtual column
    ID when synthesizing the virtual column.
    
    Other expressions, however, do not directly use the virtual column ID
    when synthesizing a virtual column. This includes Select expressions,
    joins, constrained scans, and some Project expressions.
    
    For example, consider a query like the following:
    
    ```
    CREATE TABLE ab (a INT PRIMARY KEY, b INT AS (a % 10) VIRTUAL, INDEX (b));
    SELECT * FROM ab WHERE a % 10 > 3;
    ```
    
    Even though the filter condition is in terms of `a`, we'd like to use
    the statistics on virtual computed column `b` since the expression
    matches.
    
    In order to do this, we replace `a % 10` with `b` in a copy of the
    filter condition before doing any stats calculations. Then we perform
    our normal stats calculations, using `b`.
    
    Fixes: cockroachdb#68254
    Fixes: cockroachdb#110146
    
    Epic: CRDB-8949
    
    Release note: None
    michae2 committed Mar 29, 2024
    Commit: 04643a3
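
The substitution step described in this commit can be sketched on a toy expression tree. This is only an illustration under stated assumptions: the `expr` type, the string-based matching, and the `rewrite` function are hypothetical and are not how the optimizer represents or compares scalar expressions. The sketch shows the idea of rewriting a copy of the filter so that `a % 10` becomes `b`, leaving the original filter untouched.

```go
package main

// expr is a hypothetical scalar expression: either a leaf (column name or
// constant literal, stored in name) or a binary operator with two args.
type expr struct {
	op   string
	name string
	args []*expr
}

// leaf builds a column reference or constant.
func leaf(name string) *expr { return &expr{name: name} }

// bin builds a binary operator expression.
func bin(op string, l, r *expr) *expr { return &expr{op: op, args: []*expr{l, r}} }

// String renders the expression; it doubles as a simple fingerprint.
func (e *expr) String() string {
	if len(e.args) == 0 {
		return e.name
	}
	return "(" + e.args[0].String() + " " + e.op + " " + e.args[1].String() + ")"
}

// rewrite returns a copy of e in which every subtree matching the defining
// expression of a virtual column is replaced by a reference to that column.
// The virtual map goes from expression fingerprint to column name.
func rewrite(e *expr, virtual map[string]string) *expr {
	if col, ok := virtual[e.String()]; ok {
		return leaf(col)
	}
	out := &expr{op: e.op, name: e.name}
	for _, a := range e.args {
		out.args = append(out.args, rewrite(a, virtual))
	}
	return out
}
```

Applied to the filter from the example query, with the fingerprint `(a % 10)` mapped to `b`, the rewritten copy renders as `(b > 3)` and the original still renders as `((a % 10) > 3)`.
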
  6. opt/memo: fix optimizer_use_virtual_computed_column_stats

    With optimizer_use_virtual_computed_column_stats set to false,
    constrained scans were still sometimes using stats on virtual computed
    columns. This commit adds a check to makeTableStatistics which prevents
    creation of any statistics referencing a virtual computed column, which
    is a stronger check than existed before.
    
    With this check, the VirtualCols sets will always be empty when
    optimizer_use_virtual_computed_column_stats is false.
    
    Informs: cockroachdb#68254
    
    Epic: CRDB-8949
    
    Release note: None
    michae2 committed Mar 29, 2024
    Commit: bbfee8f
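
The stronger check described in this commit might look roughly like the sketch below. The `tableStat` type and the `usableStats` function are hypothetical; the real check lives in `makeTableStatistics`, whose signature and internals are not shown in the commit message. The idea illustrated is that, when the session setting is off, any statistic that references a virtual computed column is dropped before statistics are built.

```go
package main

// tableStat is a hypothetical table statistic over one or more column IDs.
type tableStat struct {
	cols     []int
	rowCount uint64
}

// usableStats returns the statistics the optimizer may use. When
// useVirtualStats is false, every statistic that references a virtual
// computed column is skipped.
func usableStats(stats []tableStat, virtualCols map[int]bool, useVirtualStats bool) []tableStat {
	if useVirtualStats {
		return stats
	}
	var out []tableStat
	for _, st := range stats {
		refsVirtual := false
		for _, c := range st.cols {
			if virtualCols[c] {
				refsVirtual = true
				break
			}
		}
		if !refsVirtual {
			out = append(out, st)
		}
	}
	return out
}
```

Because no statistic on a virtual column survives this filter when the setting is off, nothing downstream has a virtual column to track, which matches the commit's statement that the VirtualCols sets are then always empty.
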