Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/opt: prove partial index implication with in-exact atoms containing virtual computed columns #122352

Closed
cjireland opened this issue Apr 15, 2024 · 6 comments · Fixed by #123163
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team

Comments

@cjireland
Copy link

cjireland commented Apr 15, 2024

Describe the problem

Please describe the issue you observed, and any steps we can take to reproduce it:

--! The following works as expected
CREATE TABLE test_index_stored (
    a int primary key,
    b jsonb
);

ALTER TABLE test_index_stored ADD COLUMN c STRING AS (b->>'c') STORED;
ALTER TABLE test_index_stored ADD COLUMN d STRING AS (b->>'d') STORED;
ALTER TABLE test_index_stored ADD COLUMN e INT AS ((b->>'e')::INT) STORED;
CREATE INDEX ON test_index_stored(c, e) WHERE c IS NOT NULL AND e IS NOT NULL;

INSERT INTO test_index_stored VALUES (1, '{"c": "foo", "d": "bar", "e": 5}');

SELECT c, e FROM test_index_stored@test_index_stored_c_e_idx WHERE c = 'foo' AND e > 4;  --! works

--! The following fails:
CREATE TABLE test_index_virtual (
    a int primary key,
    b jsonb
);

ALTER TABLE test_index_virtual ADD COLUMN c STRING AS (b->>'c') VIRTUAL;
ALTER TABLE test_index_virtual ADD COLUMN d STRING AS (b->>'d') VIRTUAL;
ALTER TABLE test_index_virtual ADD COLUMN e INT AS ((b->>'e')::INT) VIRTUAL;
CREATE INDEX ON test_index_virtual(c, e) WHERE c IS NOT NULL AND e IS NOT NULL;

INSERT INTO test_index_virtual VALUES (1, '{"c": "foo", "d": "bar", "e": 5}');

SELECT c, e FROM test_index_virtual@test_index_virtual_c_e_idx WHERE c = 'foo' AND e > 4; --! fails
SELECT c, e FROM test_index_virtual@test_index_virtual_c_e_idx WHERE c is not null and e is not null; --! works
SELECT c, e FROM test_index_virtual@test_index_virtual_c_e_idx WHERE c is not null and e is not null and c = 'foo' and e > 4; --! fails

Spent quite some time investigating the problem, even tried to force the index scan. The exception was thrown here:

"index \"%s\" is a partial index that does not contain all the rows needed to execute this query",
, when the index that is actually needed mismatches the index in the flag.

Changed the code to force the scan the flags.index, the query worked, but empty result was returned, likely due to predicate matching.
It seemed to me that the predict added here can be further pruned:

b.addPartialIndexPredicatesForTable(tabMeta, outScope.expr)

and it might also have something to do with virtual columns, because they are always applied a projection

Expected behavior
The query should not fail.

Environment:

  • CockroachDB version 24.1 alpha, also have tested on 23.1.17 and 23.2.3
  • Server OS: [e.g. Linux/Distrib]
  • Client app [e.g. cockroach sql]

Jira issue: CRDB-37825

@cjireland cjireland added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Apr 15, 2024
@github-project-automation github-project-automation bot moved this to Triage in SQL Queries Apr 15, 2024
@dikshant dikshant added the T-sql-queries SQL Queries Team label Apr 15, 2024
@michae2
Copy link
Collaborator

michae2 commented Apr 15, 2024

I wonder if this is at all related to #112978?

@mgartner
Copy link
Collaborator

Here's a smaller reproduction, which I believe has the same underlying cause:

CREATE TABLE t (
  a JSON,
  b INT AS ((a->>'c')::INT) VIRTUAL,
  INDEX i (b) WHERE b IS NOT NULL
);

SELECT * FROM t@i WHERE b > 4;
-- ERROR: index "i" is a partial index that does not contain all the rows needed to execute this query
-- SQLSTATE: 42809

The problem is that the partial index implicator cannot prove that (a->>'c')::INT > 4 implies (a->>'c')::INT IS NOT NULL. We use constraints to prove implication of expressions that are not identical. A constraint can only constrain columns, and it's impossible to create a constraint for a given (a->>'c')::INT > 4.

I think to fix this we'll have to make the partial index implicator aware of virtual computed columns so that it can build constraints on those columns, instead of trying to bulid constraints on the columns referenced in the computed column expression. So in this case we'd build the constraint /b: [/4 - ] for (a->>'c')::INT > 4 and /b: [/NULL - ] for (a->>'c')::INT IS NOT NULL, making it trivial for the implicator to see that the former is contained by the latter, and thus the former implies the latter.

@mgartner mgartner changed the title ERROR: index "<myIndex>" is a partial index that does not contain all the rows needed to execute this query sql/opt: prove partial index implication with in-exact atoms containing virtual computed columns Apr 15, 2024
@cty123
Copy link
Contributor

cty123 commented Apr 16, 2024

During the opt phase, I checked that b.addPartialIndexPredicatesForTable(tabMeta, outScope.expr) was able to correctly add the partial index predicate and it's aware of the fact that the a is-not null. So my expectation was that, during the opt phase, the engine would simplify the query further to narrow down the constraints, but that wasn't the case.

@mgartner
Copy link
Collaborator

mgartner commented Apr 16, 2024

... it's aware of the fact that the a is-not null.

The a being the non-virtual column, correct? a being NOT NULL does not ensure that the expression a->>'c' is NOT NULL.

I'm fairly confident that b.addPartialIndexPredicatesForTable is not at fault, and improving the implicator logic here will be the correct solution.

@cty123
Copy link
Contributor

cty123 commented Apr 16, 2024

Sorry, I mixed up the column name here. I was referring to the column b in your example table t.

During the execution, I indeed see the predicate being added correctly, and it's aware that the column b in the index can't be null. It looks like the later optimization or the transformation should incorporate that information.

@mgartner
Copy link
Collaborator

When attempting the produce the partial index scan, the query plan being operated on looks like:

project
 ├── columns: a:1 b:2
 ├── immutable
 ├── select
 │    ├── columns: a:1
 │    ├── immutable
 │    ├── scan t
 │    │    ├── columns: a:1
 │    │    └── partial index predicates
 │    │         └── i: filters
 │    │              └── (a:1->>'c')::INT8 IS NOT NULL [outer=(1), immutable]
 │    └── filters
 │         └── (a:1->>'c')::INT8 > 4 [outer=(1), immutable]
 └── projections
      └── (a:1->>'c')::INT8 [as=b:2, outer=(1), immutable]

Notice that the filter b > 4 has been converted to (a:1->>'c')::INT8 > 4 and pushed below the projection of b. This is necessary in order to generate a constrained scan on any type of secondary index with a virtual computed column.

However, no filter constraint is generated for (a:1->>'c')::INT8 > 4, nor for (a:1->>'c')::INT8 IS NOT NULL because it is impossible to do so with respect to a. To prove that (a:1->>'c')::INT8 > 4 implies (a:1->>'c')::INT8 IS NOT NULL, we need the implicator to understand that the expression (a:1->>'c')::INT8 represents the virtual column b and we can temporarily generate a filter constraint w.r.t. that column to prove implication.

@mgartner mgartner self-assigned this Apr 26, 2024
@mgartner mgartner moved this from Triage to Active in SQL Queries Apr 26, 2024
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 26, 2024
The partial index implicator is now aware of computed columns and their
expressions. This allows implication to be proven in more cases. For
example, consider the table and query:

    CREATE TABLE (
      j JSONB,
      i INT AS ((j->>'x')::INT) VIRTUAL,
      INDEX (i) WHERE i IS NOT NULL
    );

    SELECT * FROM t WHERE i = 10;

Prior to this commit, the optimizer would not plan a constrained scan
over the partial index because the implicator could not prove that the
query filters, pushed into the projection as `(j->>'x')::INT = 10`,
imply the partial index predicate, built as
`(j->>'x')::INT IS NOT NULL`. To prove implication cases like this where
the expressions are not exact matches, the implicator must build
constraints to check if the predicate contains the filter. Constraints
cannot be built from these expressions, so verifying containment was
impossible.

Now that the implicator is aware of computed columns and their
expressions, it converts the filter and predicate into expressions
referencing virtual computed columns: `i = 10` and `i IS NOT NULL`.
Constraints can be built from expressions in this forms, containment can
be checked, and implication can be proven.

Fixes cockroachdb#122352

Release note (sql change): The optimizer can now plan constrained scans
over partial indexes in more cases, particularly on partial indexes with
predicates referencing virtual computed columns.
craig bot pushed a commit that referenced this issue Apr 30, 2024
123163: opt: improve partial index implication with virtual columns r=mgartner a=mgartner

#### opt: add Implicator benchmarks with virtual columns

Release note: None

#### opt: improve partial index implication with virtual columns

The partial index implicator is now aware of computed columns and their
expressions. This allows implication to be proven in more cases. For
example, consider the table and query:

    CREATE TABLE (
      j JSONB,
      i INT AS ((j->>'x')::INT) VIRTUAL,
      INDEX (i) WHERE i IS NOT NULL
    );

    SELECT * FROM t WHERE i = 10;

Prior to this commit, the optimizer would not plan a constrained scan
over the partial index because the implicator could not prove that the
query filters, pushed into the projection as `(j->>'x')::INT = 10`,
imply the partial index predicate, built as
`(j->>'x')::INT IS NOT NULL`. To prove implication cases like this where
the expressions are not exact matches, the implicator must build
constraints to check if the predicate contains the filter. Constraints
cannot be built from these expressions, so verifying containment was
impossible.

Now that the implicator is aware of computed columns and their
expressions, it converts the filter and predicate into expressions
referencing virtual computed columns: `i = 10` and `i IS NOT NULL`.
Constraints can be built from expressions in this forms, containment can
be checked, and implication can be proven.

Fixes #122352

Release note (sql change): The optimizer can now plan constrained scans
over partial indexes in more cases, particularly on partial indexes with
predicates referencing virtual computed columns.

#### opt: add session setting for virtual computed column implication

The `optimizer_prove_implication_with_virtual_computed_columns` has been
added which, when enabled, indicates that virtual computed columns
should be considered during partial index implication to try to prove
that a filter implies a predicate.

Release note: None


Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
@craig craig bot closed this as completed in 3aa3d6f Apr 30, 2024
@github-project-automation github-project-automation bot moved this from Active to Done in SQL Queries Apr 30, 2024
mgartner added a commit to mgartner/cockroach that referenced this issue May 1, 2024
The partial index implicator is now aware of computed columns and their
expressions. This allows implication to be proven in more cases. For
example, consider the table and query:

    CREATE TABLE (
      j JSONB,
      i INT AS ((j->>'x')::INT) VIRTUAL,
      INDEX (i) WHERE i IS NOT NULL
    );

    SELECT * FROM t WHERE i = 10;

Prior to this commit, the optimizer would not plan a constrained scan
over the partial index because the implicator could not prove that the
query filters, pushed into the projection as `(j->>'x')::INT = 10`,
imply the partial index predicate, built as
`(j->>'x')::INT IS NOT NULL`. To prove implication cases like this where
the expressions are not exact matches, the implicator must build
constraints to check if the predicate contains the filter. Constraints
cannot be built from these expressions, so verifying containment was
impossible.

Now that the implicator is aware of computed columns and their
expressions, it converts the filter and predicate into expressions
referencing virtual computed columns: `i = 10` and `i IS NOT NULL`.
Constraints can be built from expressions in this forms, containment can
be checked, and implication can be proven.

Fixes cockroachdb#122352

Release note (sql change): The optimizer can now plan constrained scans
over partial indexes in more cases, particularly on partial indexes with
predicates referencing virtual computed columns.
mgartner added a commit to mgartner/cockroach that referenced this issue May 2, 2024
The partial index implicator is now aware of computed columns and their
expressions. This allows implication to be proven in more cases. For
example, consider the table and query:

    CREATE TABLE (
      j JSONB,
      i INT AS ((j->>'x')::INT) VIRTUAL,
      INDEX (i) WHERE i IS NOT NULL
    );

    SELECT * FROM t WHERE i = 10;

Prior to this commit, the optimizer would not plan a constrained scan
over the partial index because the implicator could not prove that the
query filters, pushed into the projection as `(j->>'x')::INT = 10`,
imply the partial index predicate, built as
`(j->>'x')::INT IS NOT NULL`. To prove implication cases like this where
the expressions are not exact matches, the implicator must build
constraints to check if the predicate contains the filter. Constraints
cannot be built from these expressions, so verifying containment was
impossible.

Now that the implicator is aware of computed columns and their
expressions, it converts the filter and predicate into expressions
referencing virtual computed columns: `i = 10` and `i IS NOT NULL`.
Constraints can be built from expressions in this forms, containment can
be checked, and implication can be proven.

Fixes cockroachdb#122352

Release note (sql change): The optimizer can now plan constrained scans
over partial indexes in more cases, particularly on partial indexes with
predicates referencing virtual computed columns.
mgartner added a commit to mgartner/cockroach that referenced this issue May 2, 2024
The partial index implicator is now aware of computed columns and their
expressions. This allows implication to be proven in more cases. For
example, consider the table and query:

    CREATE TABLE (
      j JSONB,
      i INT AS ((j->>'x')::INT) VIRTUAL,
      INDEX (i) WHERE i IS NOT NULL
    );

    SELECT * FROM t WHERE i = 10;

Prior to this commit, the optimizer would not plan a constrained scan
over the partial index because the implicator could not prove that the
query filters, pushed into the projection as `(j->>'x')::INT = 10`,
imply the partial index predicate, built as
`(j->>'x')::INT IS NOT NULL`. To prove implication cases like this where
the expressions are not exact matches, the implicator must build
constraints to check if the predicate contains the filter. Constraints
cannot be built from these expressions, so verifying containment was
impossible.

Now that the implicator is aware of computed columns and their
expressions, it converts the filter and predicate into expressions
referencing virtual computed columns: `i = 10` and `i IS NOT NULL`.
Constraints can be built from expressions in this forms, containment can
be checked, and implication can be proven.

Fixes cockroachdb#122352

Release note (sql change): The optimizer can now plan constrained scans
over partial indexes in more cases, particularly on partial indexes with
predicates referencing virtual computed columns.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants