Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: statement bundles with expression indexes should have reproducible stats #68184

Open
mgartner opened this issue Jul 28, 2021 · 2 comments
Open
Labels
A-sql-optimizer SQL logical planning and optimizations. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team

Comments

@mgartner
Copy link
Collaborator

mgartner commented Jul 28, 2021

Expression indexes are implemented using inaccessible virtual computed columns. These columns are not shown shown in the schema.sql file of a statement bundle. But they are included in the stats-*.sql files.

When the schema.sql file is executed to try to reproduce a customer issue, the names of these inaccessible columns are dependent on the order of their associated indexes. This means that the names of the columns can be different from what they were in the customer's schema. This can create cases where the inaccessible columns' statistics refer to the wrong columns.

For example, suppose a customer created the following table and collected a statement bundle:

SET CLUSTER SETTING sql.stats.histogram_collection.enabled = true;
SET experimental_enable_expression_indexes = true;

CREATE TABLE t (a INT, INDEX i1 ((a + 10)), INDEX i2 ((a > 500)));

DROP INDEX i1;

CREATE INDEX i3 ON t ((a + 100));

INSERT INTO t SELECT generate_series(1, 1000);

ANALYZE t;

EXPLAIN ANALYZE (DEBUG) SELECT * FROM t;

The statement bundle would include a schema.sql like this:

CREATE TABLE t (
	a INT8 NULL,
	rowid INT8 NOT VISIBLE NOT NULL DEFAULT unique_rowid(),
	CONSTRAINT "primary" PRIMARY KEY (rowid ASC),
	INDEX i2 ((a > 500:::INT8) ASC),
	INDEX i3 ((a + 100:::INT8) ASC),
	FAMILY "primary" (a, rowid)
);

And a statistics file like this (I've abbreviated it here):

ALTER TABLE t INJECT STATISTICS '[
    {
        "columns": [
            "crdb_internal_idx_expr_1"
        ],
        "created_at": "2021-07-28 15:57:20.789126",
        "distinct_count": 1,
        "histo_col_type": "BOOL",
        "null_count": 1000,
        "row_count": 1000
    },
    {
        "columns": [
            "crdb_internal_idx_expr"
        ],
        "created_at": "2021-07-28 15:57:20.789126",
        "distinct_count": 1,
        "histo_col_type": "INT8",
        "null_count": 1000,
        "row_count": 1000
    }
]';

Note that the statistics list the type of crdb_internal_idx_expr_1 as BOOL. However, when you execute schema.sql, the crdb_internal_idx_expr_1 column is the virtual column backing i3's a + 100:::INT8 expression:

> SELECT * FROM (
  SELECT json_array_elements(
    crdb_internal.pb_to_json('cockroach.sql.sqlbase.Descriptor', descriptor, false)->'table'->'columns'
  ) AS desc FROM system.descriptor WHERE id = 't'::REGCLASS
) AS cols WHERE cols.desc->'name' = '"crdb_internal_idx_expr_1"';
                                                                                                desc
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  {"computeExpr": "a + 100:::INT8", "id": 4, "inaccessible": true, "name": "crdb_internal_idx_expr_1", "nullable": true, "type": {"family": "IntFamily", "oid": 20, "width": 64}, "virtual": true}

This will make debugging statement bundles that have expression indexes hard to debug. It will not be clear when the column names do not line up, and will likely lead to confusion when a customer issue cannot be reproduced with the statement bundle.

One proposed solution: for the statement bundle schema.sql file only, list the inaccessible columns as hidden virtual columns and print these column names as the index elements of expression indexes rather than their expression.

Jira issue: CRDB-8919

Epic CRDB-14510

@mgartner mgartner added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-optimizer SQL logical planning and optimizations. labels Jul 28, 2021
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Jul 28, 2021
@mgartner mgartner self-assigned this Aug 3, 2021
@mgartner
Copy link
Collaborator Author

As of #68312, we no longer collect statistics for inaccessible virtual columns that back expression indexes. If that changes, we'll need to address this issue. For now, I'll move this to the backlog.

@mgartner mgartner removed their assignment Aug 12, 2021
mgartner added a commit to mgartner/cockroach that referenced this issue Mar 29, 2022
Previous, an `ALTER TABLE ... INJECT STATS` command would return an
error if the given stats JSON included any columns that were not present
in the table descriptor. Statistics in statement bundles often include
dropped columns, so reproducing issues with a bundle required tediously
removing stats for these columns. This commit changes the stats
injection behavior so that a notice is issued for stats with
non-existent columns rather than an error. Any stats for existing
columns will be injected successfully.

Informs cockroachdb#68184

Release note (sql change): `ALTER TABLE ... INJECT STATISTICS ...` will
now issue notices when the given statistics JSON includes non-existent
columns, rather than resulting in an error. Any statistics in the JSON
for existing columns in be inject successfully.
mgartner added a commit to mgartner/cockroach that referenced this issue Mar 29, 2022
Previously, an `ALTER TABLE ... INJECT STATS` command would return an
error if the given stats JSON included any columns that were not present
in the table descriptor. Statistics in statement bundles often include
dropped columns, so reproducing issues with a bundle required tediously
removing stats for these columns. This commit changes the stats
injection behavior so that a notice is issued for stats with
non-existent columns rather than an error. Any stats for existing
columns will be injected successfully.

Informs cockroachdb#68184

Release note (sql change): `ALTER TABLE ... INJECT STATISTICS ...` will
now issue notices when the given statistics JSON includes non-existent
columns, rather than resulting in an error. Any statistics in the JSON
for existing columns will be injected successfully.
craig bot pushed a commit that referenced this issue Mar 30, 2022
78997: sql: ignore non-existent columns when injecting stats r=mgartner a=mgartner

Previously, an `ALTER TABLE ... INJECT STATS` command would return an
error if the given stats JSON included any columns that were not present
in the table descriptor. Statistics in statement bundles often include
dropped columns, so reproducing issues with a bundle required tediously
removing stats for these columns. This commit changes the stats
injection behavior so that a notice is issued for stats with
non-existent columns rather than an error. Any stats for existing
columns will be injected successfully.

Informs #68184

Release note (sql change): `ALTER TABLE ... INJECT STATISTICS ...` will
now issue notices when the given statistics JSON includes non-existent
columns, rather than resulting in an error. Any statistics in the JSON
for existing columns will be injected successfully.

Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
blathers-crl bot pushed a commit that referenced this issue Mar 30, 2022
Previously, an `ALTER TABLE ... INJECT STATS` command would return an
error if the given stats JSON included any columns that were not present
in the table descriptor. Statistics in statement bundles often include
dropped columns, so reproducing issues with a bundle required tediously
removing stats for these columns. This commit changes the stats
injection behavior so that a notice is issued for stats with
non-existent columns rather than an error. Any stats for existing
columns will be injected successfully.

Informs #68184

Release note (sql change): `ALTER TABLE ... INJECT STATISTICS ...` will
now issue notices when the given statistics JSON includes non-existent
columns, rather than resulting in an error. Any statistics in the JSON
for existing columns will be injected successfully.
blathers-crl bot pushed a commit that referenced this issue Mar 30, 2022
Previously, an `ALTER TABLE ... INJECT STATS` command would return an
error if the given stats JSON included any columns that were not present
in the table descriptor. Statistics in statement bundles often include
dropped columns, so reproducing issues with a bundle required tediously
removing stats for these columns. This commit changes the stats
injection behavior so that a notice is issued for stats with
non-existent columns rather than an error. Any stats for existing
columns will be injected successfully.

Informs #68184

Release note (sql change): `ALTER TABLE ... INJECT STATISTICS ...` will
now issue notices when the given statistics JSON includes non-existent
columns, rather than resulting in an error. Any statistics in the JSON
for existing columns will be injected successfully.
blathers-crl bot pushed a commit that referenced this issue Mar 30, 2022
Previously, an `ALTER TABLE ... INJECT STATS` command would return an
error if the given stats JSON included any columns that were not present
in the table descriptor. Statistics in statement bundles often include
dropped columns, so reproducing issues with a bundle required tediously
removing stats for these columns. This commit changes the stats
injection behavior so that a notice is issued for stats with
non-existent columns rather than an error. Any stats for existing
columns will be injected successfully.

Informs #68184

Release note (sql change): `ALTER TABLE ... INJECT STATISTICS ...` will
now issue notices when the given statistics JSON includes non-existent
columns, rather than resulting in an error. Any statistics in the JSON
for existing columns will be injected successfully.
fqazi pushed a commit to fqazi/cockroach that referenced this issue Apr 4, 2022
Previously, an `ALTER TABLE ... INJECT STATS` command would return an
error if the given stats JSON included any columns that were not present
in the table descriptor. Statistics in statement bundles often include
dropped columns, so reproducing issues with a bundle required tediously
removing stats for these columns. This commit changes the stats
injection behavior so that a notice is issued for stats with
non-existent columns rather than an error. Any stats for existing
columns will be injected successfully.

Informs cockroachdb#68184

Release note (sql change): `ALTER TABLE ... INJECT STATISTICS ...` will
now issue notices when the given statistics JSON includes non-existent
columns, rather than resulting in an error. Any statistics in the JSON
for existing columns will be injected successfully.
@github-actions
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-optimizer SQL logical planning and optimizations. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team
Projects
Status: Backlog
Development

No branches or pull requests

1 participant