release-23.2.5-rc: opt: improve partial index implication with virtual columns #123408
Release note: None
The partial index implicator is now aware of computed columns and their expressions. This allows implication to be proven in more cases. For example, consider the table and query:

    CREATE TABLE t (
      j JSONB,
      i INT AS ((j->>'x')::INT) VIRTUAL,
      INDEX (i) WHERE i IS NOT NULL
    );

    SELECT * FROM t WHERE i = 10;

Prior to this commit, the optimizer would not plan a constrained scan over the partial index because the implicator could not prove that the query filters, pushed into the projection as `(j->>'x')::INT = 10`, imply the partial index predicate, built as `(j->>'x')::INT IS NOT NULL`. To prove implication cases like this where the expressions are not exact matches, the implicator must build constraints to check if the predicate contains the filter. Constraints cannot be built from these expressions, so verifying containment was impossible.

Now that the implicator is aware of computed columns and their expressions, it converts the filter and predicate into expressions referencing virtual computed columns: `i = 10` and `i IS NOT NULL`. Constraints can be built from expressions in this form, containment can be checked, and implication can be proven.

Fixes cockroachdb#122352

Release note (sql change): The optimizer can now plan constrained scans over partial indexes in more cases, particularly on partial indexes with predicates referencing virtual computed columns.
Thanks for opening a backport. Please check the backport criteria before merging:
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
Also, please add a brief release justification to the body of your PR to justify this |
Once the `memo_test.go` backport is fixed.
Reviewed 1 of 1 files at r1, 7 of 7 files at r2, 10 of 10 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)
pkg/sql/opt/memo/memo_test.go
line 434 at r3 (raw file):
<<<<<<< HEAD =======
Looks like this bit didn't backport cleanly
The `optimizer_prove_implication_with_virtual_computed_columns` session setting has been added. When enabled, it indicates that virtual computed columns should be considered during partial index implication to try to prove that a filter implies a predicate. Release note: None
591283d to 46acd12
Reviewed 1 of 1 files at r1, 7 of 7 files at r2, 10 of 10 files at r4, 10 of 10 files at r5, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)
pkg/sql/opt/partialidx/implicator.go
line 541 at r2 (raw file):
} } return im.f.Replace(e, replace)
This doesn't look like it will terminate if there are no matching computed columns. What am I missing?
pkg/sql/opt/partialidx/testdata/implicator/virtual
line 5 at r2 (raw file):
# Atoms with no computed column references. # # These cases best simulate filter-predicate implication in practice, compared
I'm confused, these cases do seem to contain computed column references.
pkg/sql/opt/partialidx/testdata/implicator/virtual
line 152 at r2 (raw file):
# Atoms with virtual computed columns referenced in the filter. This should # never happen in practice because virtual computed column references should be # replaced with their computed expression.
but isn't this replacing happening as part of implication?
pkg/sql/opt/partialidx/testdata/implicator/virtual
line 180 at r2 (raw file):
# Atoms with virtual computed columns referenced in the predicate. This should # never happen in practice because virtual computed column references should be # replaced with their computed expression.
same question
pkg/sql/opt/partialidx/testdata/implicator/virtual
line 208 at r2 (raw file):
# Atoms with virtual computed columns referenced in the filter and predicate. # This should never happen in practice because virtual computed column # references should be replaced with their computed expression.
ditto
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @rytaft)
pkg/sql/opt/partialidx/implicator.go
line 541 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
This doesn't look like it will terminate if there are no matching computed columns. What am I missing?
This is the standard pattern for replacement. `im.f.Replace` will call `replace` on each of `e`'s children, not on `e` itself. So it will terminate.
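The termination argument above can be illustrated with a toy sketch. This is not the optimizer's factory code: the `Expr` type, the `Replace` helper, and `Substitute` are hypothetical stand-ins written only to show why a callback that ends with `Replace(e, replace)` stops even when nothing matches.

```go
package main

import "strings"

// Expr is a hypothetical, minimal expression node used only for
// illustration. It is not the optimizer's expression type.
type Expr struct {
	Op       string
	Children []*Expr
}

// String renders an expression in prefix form, e.g. "eq(x, 10)".
func (e *Expr) String() string {
	if len(e.Children) == 0 {
		return e.Op
	}
	parts := make([]string, len(e.Children))
	for i, c := range e.Children {
		parts[i] = c.String()
	}
	return e.Op + "(" + strings.Join(parts, ", ") + ")"
}

// ReplaceFunc maps an expression to its replacement.
type ReplaceFunc func(e *Expr) *Expr

// Replace rebuilds e by applying replace to each child of e. It never
// applies replace to e itself, so each recursive step works on a strictly
// smaller subtree.
func Replace(e *Expr, replace ReplaceFunc) *Expr {
	if len(e.Children) == 0 {
		return e // leaf: no children to visit, so the recursion stops here
	}
	out := &Expr{Op: e.Op, Children: make([]*Expr, len(e.Children))}
	for i, c := range e.Children {
		out.Children[i] = replace(c)
	}
	return out
}

// Substitute returns a copy of root in which every leaf named from is
// swapped for a leaf named to.
func Substitute(root *Expr, from, to string) *Expr {
	var replace ReplaceFunc
	replace = func(e *Expr) *Expr {
		if len(e.Children) == 0 && e.Op == from {
			return &Expr{Op: to}
		}
		// No match: descend into the children. This still terminates when
		// nothing matches, because leaves have no children to visit.
		return Replace(e, replace)
	}
	return replace(root)
}
```

In this sketch, `Substitute` applied to `eq(x, 10)` with `from = "z"` finds no match, visits both leaves once, and returns an expression that still renders as `eq(x, 10)`.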
pkg/sql/opt/partialidx/testdata/implicator/virtual
line 5 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I'm confused, these cases do seem to contain computed column references.
They contain the computed columns' expression, but they do not reference the computed columns.
pkg/sql/opt/partialidx/testdata/implicator/virtual
line 152 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
but isn't this replacing happening as part of implication?
No, implication replaces the virtual columns' expressions with virtual column references, e.g., `(a->>'x')::INT` is replaced with `b` during implication.
I tried to explain this in the commit message a bit, but let me know if you have suggestions to make it more clear:
The partial index implicator is now aware of computed columns and their
expressions. This allows implication to be proven in more cases. For
example, consider the table and query:
    CREATE TABLE t (
      j JSONB,
      i INT AS ((j->>'x')::INT) VIRTUAL,
      INDEX (i) WHERE i IS NOT NULL
    );
    SELECT * FROM t WHERE i = 10;
Prior to this commit, the optimizer would not plan a constrained scan
over the partial index because the implicator could not prove that the
query filters, pushed into the projection as `(j->>'x')::INT = 10`,
imply the partial index predicate, built as
`(j->>'x')::INT IS NOT NULL`. To prove implication cases like this where
the expressions are not exact matches, the implicator must build
constraints to check if the predicate contains the filter. Constraints
cannot be built from these expressions, so verifying containment was
impossible.
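The direction of the substitution described in this comment (computed expression in, column reference out) can be shown with a small sketch. Everything here is an assumption made for illustration: the `Expr` type, the `ComputedCols` map, and the use of a rendered string as the matching key are hypothetical, and the real implicator works on optimizer expressions rather than strings.

```go
package main

import "strings"

// Expr is a hypothetical, minimal expression node used only for
// illustration.
type Expr struct {
	Op       string
	Children []*Expr
}

// String renders an expression in prefix form, e.g. "eq(b, 10)".
func (e *Expr) String() string {
	if len(e.Children) == 0 {
		return e.Op
	}
	parts := make([]string, len(e.Children))
	for i, c := range e.Children {
		parts[i] = c.String()
	}
	return e.Op + "(" + strings.Join(parts, ", ") + ")"
}

// ComputedCols maps the rendered form of a virtual computed column's
// expression to the name of that column. Matching on strings is a
// simplification chosen for this sketch.
type ComputedCols map[string]string

// RewriteToColumnRefs returns a copy of e in which every subtree equal to a
// computed column's expression is replaced by a reference to that column.
func RewriteToColumnRefs(e *Expr, cols ComputedCols) *Expr {
	if col, ok := cols[e.String()]; ok {
		return &Expr{Op: col}
	}
	if len(e.Children) == 0 {
		return e
	}
	out := &Expr{Op: e.Op, Children: make([]*Expr, len(e.Children))}
	for i, c := range e.Children {
		out.Children[i] = RewriteToColumnRefs(c, cols)
	}
	return out
}
```

With `b` registered as the column for `cast_int(fetch_text(a, 'x'))`, a filter modeled as `eq(cast_int(fetch_text(a, 'x')), 10)` is rewritten to `eq(b, 10)`, which mirrors the `(a->>'x')::INT` to `b` example given in the comment.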
pkg/sql/opt/memo/memo_test.go
line 434 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Looks like this bit didn't backport cleanly
Done.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @mgartner)
pkg/sql/opt/partialidx/implicator.go
line 541 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
This is the standard pattern for replacement. `im.f.Replace` will call `replace` on each of `e`'s children, not on `e` itself. So it will terminate.
Ah right -- forgot how this worked
pkg/sql/opt/partialidx/testdata/implicator/virtual
line 5 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
They contain the computed columns' expression, but they do not reference the computed columns.
Got it, thanks
pkg/sql/opt/partialidx/testdata/implicator/virtual
line 152 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
No, implication replaces the virtual columns' expressions with virtual column references, e.g., `(a->>'x')::INT` is replaced with `b` during implication.
I tried to explain this in the commit message a bit, but let me know if you have suggestions to make it more clear:
The partial index implicator is now aware of computed columns and their
expressions. This allows implication to be proven in more cases. For
example, consider the table and query:
    CREATE TABLE t (
      j JSONB,
      i INT AS ((j->>'x')::INT) VIRTUAL,
      INDEX (i) WHERE i IS NOT NULL
    );
    SELECT * FROM t WHERE i = 10;
Prior to this commit, the optimizer would not plan a constrained scan
over the partial index because the implicator could not prove that the
query filters, pushed into the projection as `(j->>'x')::INT = 10`,
imply the partial index predicate, built as
`(j->>'x')::INT IS NOT NULL`. To prove implication cases like this where
the expressions are not exact matches, the implicator must build
constraints to check if the predicate contains the filter. Constraints
cannot be built from these expressions, so verifying containment was
impossible.
Yea the commit message is clear. I was just confused by this comment in the test. Maybe it would help to say where the references get replaced (i.e., does this just happen in optbuilder? is it a normalization rule?). But it's not too important, not sure it's worth the effort to open a new PR.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @rytaft)
pkg/sql/opt/partialidx/testdata/implicator/virtual
line 152 at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Yea the commit message is clear. I was just confused by this comment in the test. Maybe it would help to say where the references get replaced (i.e., does this just happen in optbuilder? is it a normalization rule?). But it's not too important, not sure it's worth the effort to open a new PR.
Good point. I'll clarify in a follow-up PR, but go ahead and merge these backports.
Backport 3/3 commits from #123163.
/cc @cockroachdb/release
opt: add Implicator benchmarks with virtual columns
Release note: None
opt: improve partial index implication with virtual columns
The partial index implicator is now aware of computed columns and their
expressions. This allows implication to be proven in more cases. For
example, consider the table and query:
    CREATE TABLE t (
      j JSONB,
      i INT AS ((j->>'x')::INT) VIRTUAL,
      INDEX (i) WHERE i IS NOT NULL
    );
    SELECT * FROM t WHERE i = 10;
Prior to this commit, the optimizer would not plan a constrained scan
over the partial index because the implicator could not prove that the
query filters, pushed into the projection as `(j->>'x')::INT = 10`,
imply the partial index predicate, built as
`(j->>'x')::INT IS NOT NULL`. To prove implication cases like this where
the expressions are not exact matches, the implicator must build
constraints to check if the predicate contains the filter. Constraints
cannot be built from these expressions, so verifying containment was
impossible.
Now that the implicator is aware of computed columns and their
expressions, it converts the filter and predicate into expressions
referencing virtual computed columns: `i = 10` and `i IS NOT NULL`.
Constraints can be built from expressions in this form, containment can
be checked, and implication can be proven.
Fixes #122352
Release note (sql change): The optimizer can now plan constrained scans
over partial indexes in more cases, particularly on partial indexes with
predicates referencing virtual computed columns.
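The containment check that this commit message relies on can be sketched in a few lines. This is a deliberately simplified model, not the optimizer's constraint package: `IntConstraint`, `Eq`, `IsNotNull`, `IsNull`, and `Contains` are hypothetical names, and a constraint here is just one inclusive range over a single INT column plus a flag for NULL.

```go
package main

import "math"

// IntConstraint is a hypothetical stand-in for a constraint on one INT
// column: an inclusive range of non-NULL values, plus whether NULL
// satisfies the constraint. A range with Lo > Hi admits no non-NULL values.
type IntConstraint struct {
	Lo, Hi     int64
	AllowsNull bool
}

// Eq models "col = v".
func Eq(v int64) IntConstraint { return IntConstraint{Lo: v, Hi: v} }

// IsNotNull models "col IS NOT NULL": every non-NULL value, but not NULL.
func IsNotNull() IntConstraint {
	return IntConstraint{Lo: math.MinInt64, Hi: math.MaxInt64}
}

// IsNull models "col IS NULL": no non-NULL values, only NULL.
func IsNull() IntConstraint { return IntConstraint{Lo: 1, Hi: 0, AllowsNull: true} }

// empty reports whether the constraint admits no non-NULL values.
func (c IntConstraint) empty() bool { return c.Lo > c.Hi }

// Contains reports whether every value satisfying filter also satisfies
// pred. When it returns true, the filter implies the predicate.
func Contains(pred, filter IntConstraint) bool {
	if filter.AllowsNull && !pred.AllowsNull {
		return false
	}
	if filter.empty() {
		return true
	}
	return pred.Lo <= filter.Lo && filter.Hi <= pred.Hi
}
```

In this model, `Contains(IsNotNull(), Eq(10))` holds, which corresponds to `i = 10` implying `i IS NOT NULL`, while the reverse direction and `Contains(IsNotNull(), IsNull())` do not.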
opt: add session setting for virtual computed column implication
The `optimizer_prove_implication_with_virtual_computed_columns` session
setting has been added. When enabled, it indicates that virtual computed
columns should be considered during partial index implication to try to
prove that a filter implies a predicate.
Release note: None
Release justification: Fix for limitation with partial indexes and
virtual computed columns that is gated behind a session setting.