Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-23.2.5-rc: opt: improve partial index implication with virtual columns #123408

Merged

Conversation

mgartner
Copy link
Collaborator

@mgartner mgartner commented May 1, 2024

Backport 3/3 commits from #123163.

/cc @cockroachdb/release


opt: add Implicator benchmarks with virtual columns

Release note: None

opt: improve partial index implication with virtual columns

The partial index implicator is now aware of computed columns and their
expressions. This allows implication to be proven in more cases. For
example, consider the table and query:

CREATE TABLE (
  j JSONB,
  i INT AS ((j->>'x')::INT) VIRTUAL,
  INDEX (i) WHERE i IS NOT NULL
);

SELECT * FROM t WHERE i = 10;

Prior to this commit, the optimizer would not plan a constrained scan
over the partial index because the implicator could not prove that the
query filters, pushed into the projection as (j->>'x')::INT = 10,
imply the partial index predicate, built as
(j->>'x')::INT IS NOT NULL. To prove implication cases like this where
the expressions are not exact matches, the implicator must build
constraints to check if the predicate contains the filter. Constraints
cannot be built from these expressions, so verifying containment was
impossible.

Now that the implicator is aware of computed columns and their
expressions, it converts the filter and predicate into expressions
referencing virtual computed columns: i = 10 and i IS NOT NULL.
Constraints can be built from expressions in this forms, containment can
be checked, and implication can be proven.

Fixes #122352

Release note (sql change): The optimizer can now plan constrained scans
over partial indexes in more cases, particularly on partial indexes with
predicates referencing virtual computed columns.

opt: add session setting for virtual computed column implication

The optimizer_prove_implication_with_virtual_computed_columns has been
added which, when enabled, indicates that virtual computed columns
should be considered during partial index implication to try to prove
that a filter implies a predicate.

Release note: None


Release justification: Fix for limitation with partial indexes and
virtual computed columns that is gated behind a session setting.

The partial index implicator is now aware of computed columns and their
expressions. This allows implication to be proven in more cases. For
example, consider the table and query:

    CREATE TABLE (
      j JSONB,
      i INT AS ((j->>'x')::INT) VIRTUAL,
      INDEX (i) WHERE i IS NOT NULL
    );

    SELECT * FROM t WHERE i = 10;

Prior to this commit, the optimizer would not plan a constrained scan
over the partial index because the implicator could not prove that the
query filters, pushed into the projection as `(j->>'x')::INT = 10`,
imply the partial index predicate, built as
`(j->>'x')::INT IS NOT NULL`. To prove implication cases like this where
the expressions are not exact matches, the implicator must build
constraints to check if the predicate contains the filter. Constraints
cannot be built from these expressions, so verifying containment was
impossible.

Now that the implicator is aware of computed columns and their
expressions, it converts the filter and predicate into expressions
referencing virtual computed columns: `i = 10` and `i IS NOT NULL`.
Constraints can be built from expressions in this forms, containment can
be checked, and implication can be proven.

Fixes cockroachdb#122352

Release note (sql change): The optimizer can now plan constrained scans
over partial indexes in more cases, particularly on partial indexes with
predicates referencing virtual computed columns.
@mgartner mgartner requested a review from a team as a code owner May 1, 2024 20:48
Copy link

blathers-crl bot commented May 1, 2024

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Backports should only be created for serious
    issues
    or test-only changes.
  • Backports should not break backwards-compatibility.
  • Backports should change as little code as possible.
  • Backports should not change on-disk formats or node communication protocols.
  • Backports should not add new functionality (except as defined
    here).
  • Backports must not add, edit, or otherwise modify cluster versions; or add version gates.
  • All backports must be reviewed by the owning areas TL and one additional
    TL. For more information as to how that review should be conducted, please consult the backport
    policy
    .
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters. State changes must be further protected such that nodes running old binaries will not be negatively impacted by the new state (with a mixed version test added).
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
  • Your backport must be accompanied by a post to the appropriate Slack
    channel (#db-backports-point-releases or #db-backports-XX-X-release) for awareness and discussion.

Also, please add a brief release justification to the body of your PR to justify this
backport.

@blathers-crl blathers-crl bot added the backport Label PR's that are backports to older release branches label May 1, 2024
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Once the memo_test.go backport is fixed.

Reviewed 1 of 1 files at r1, 7 of 7 files at r2, 10 of 10 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/opt/memo/memo_test.go line 434 at r3 (raw file):

<<<<<<< HEAD
=======

Looks like this bit didn't backport cleanly

The `optimizer_prove_implication_with_virtual_computed_columns` has been
added which, when enabled, indicates that virtual computed columns
should be considered during partial index implication to try to prove
that a filter implies a predicate.

Release note: None
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, 7 of 7 files at r2, 10 of 10 files at r4, 10 of 10 files at r5, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)


pkg/sql/opt/partialidx/implicator.go line 541 at r2 (raw file):

			}
		}
		return im.f.Replace(e, replace)

This doesn't look like it will terminate if there are no matching computed columns. What am I missing?


pkg/sql/opt/partialidx/testdata/implicator/virtual line 5 at r2 (raw file):

# Atoms with no computed column references.
#
# These cases best simulate filter-predicate implication in practice, compared

I'm confused, these cases do seem to contain computed column references.


pkg/sql/opt/partialidx/testdata/implicator/virtual line 152 at r2 (raw file):

# Atoms with virtual computed columns referenced in the filter. This should
# never happen in practice because virtual computed column references should be
# replaced with their computed expression.

but isn't this replacing happening as part of implication?


pkg/sql/opt/partialidx/testdata/implicator/virtual line 180 at r2 (raw file):

# Atoms with virtual computed columns referenced in the predicate. This should
# never happen in practice because virtual computed column references should be
# replaced with their computed expression.

same question


pkg/sql/opt/partialidx/testdata/implicator/virtual line 208 at r2 (raw file):

# Atoms with virtual computed columns referenced in the filter and predicate.
# This should never happen in practice because virtual computed column
# references should be replaced with their computed expression.

ditto

Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @rytaft)


pkg/sql/opt/partialidx/implicator.go line 541 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

This doesn't look like it will terminate if there are no matching computed columns. What am I missing?

This is the standard pattern for replacement. im.f.Replace will call replace on each of e's children, not on e itself. So it will terminate.


pkg/sql/opt/partialidx/testdata/implicator/virtual line 5 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I'm confused, these cases do seem to contain computed column references.

They contain the computed columns' expression, but they do not reference the computed columns.


pkg/sql/opt/partialidx/testdata/implicator/virtual line 152 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

but isn't this replacing happening as part of implication?

No, implication replaces the virtual columns' expressions with virtual column references, e.g., (a->>'x')::INT is replaced with b during implication.

I tried to explain this in the commit message a bit, but let me know if you have suggestions to make it more clear:

The partial index implicator is now aware of computed columns and their
expressions. This allows implication to be proven in more cases. For
example, consider the table and query:

CREATE TABLE (
  j JSONB,
  i INT AS ((j->>'x')::INT) VIRTUAL,
  INDEX (i) WHERE i IS NOT NULL
);

SELECT * FROM t WHERE i = 10;

Prior to this commit, the optimizer would not plan a constrained scan
over the partial index because the implicator could not prove that the
query filters, pushed into the projection as (j->>'x')::INT = 10,
imply the partial index predicate, built as
(j->>'x')::INT IS NOT NULL. To prove implication cases like this where
the expressions are not exact matches, the implicator must build
constraints to check if the predicate contains the filter. Constraints
cannot be built from these expressions, so verifying containment was
impossible.


pkg/sql/opt/memo/memo_test.go line 434 at r3 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Looks like this bit didn't backport cleanly

Done.

@rytaft rytaft requested a review from DrewKimball May 2, 2024 18:25
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: thank you!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @mgartner)


pkg/sql/opt/partialidx/implicator.go line 541 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

This is the standard pattern for replacement. im.f.Replace will call replace on each of e's children, not on e itself. So it will terminate.

Ah right -- forgot how this worked


pkg/sql/opt/partialidx/testdata/implicator/virtual line 5 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

They contain the computed columns' expression, but they do not reference the computed columns.

Got it, thanks


pkg/sql/opt/partialidx/testdata/implicator/virtual line 152 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

No, implication replaces the virtual columns' expressions with virtual column references, e.g., (a->>'x')::INT is replaced with b during implication.

I tried to explain this in the commit message a bit, but let me know if you have suggestions to make it more clear:

The partial index implicator is now aware of computed columns and their
expressions. This allows implication to be proven in more cases. For
example, consider the table and query:

CREATE TABLE (
  j JSONB,
  i INT AS ((j->>'x')::INT) VIRTUAL,
  INDEX (i) WHERE i IS NOT NULL
);

SELECT * FROM t WHERE i = 10;

Prior to this commit, the optimizer would not plan a constrained scan
over the partial index because the implicator could not prove that the
query filters, pushed into the projection as (j->>'x')::INT = 10,
imply the partial index predicate, built as
(j->>'x')::INT IS NOT NULL. To prove implication cases like this where
the expressions are not exact matches, the implicator must build
constraints to check if the predicate contains the filter. Constraints
cannot be built from these expressions, so verifying containment was
impossible.

Yea the commit message is clear. I was just confused by this comment in the test. Maybe it would help to say where the references get replaced (i.e., does this just happen in optbuilder? is it a normalization rule?). But it's not too important, not sure it's worth the effort to open a new PR.

Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @rytaft)


pkg/sql/opt/partialidx/testdata/implicator/virtual line 152 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Yea the commit message is clear. I was just confused by this comment in the test. Maybe it would help to say where the references get replaced (i.e., does this just happen in optbuilder? is it a normalization rule?). But it's not too important, not sure it's worth the effort to open a new PR.

Good point. I'll clarify in a follow-up PR, but go ahead and merge these backports.

@mgartner mgartner merged commit 2329856 into cockroachdb:release-23.2.5-rc May 2, 2024
5 of 6 checks passed
@mgartner mgartner deleted the backport23.2.5-rc-123163 branch May 2, 2024 19:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport Label PR's that are backports to older release branches
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants