New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-23.1.20-rc: opt: add GenerateTrigramSimilarityInvertedIndexScans rule #122683
release-23.1.20-rc: opt: add GenerateTrigramSimilarityInvertedIndexScans rule #122683
Conversation
This commit adds a new session setting called `optimizer_use_trigram_similarity_optimization`. It currently has no effect, but in future commits it will allow control over a new optimization for queries with trigram similarity filters. Release note: None
The `GenerateInvertedIndexScans` rule is now disabled for similarity filters on trigram indexes when the `optimizer_use_text_similarity_optimization` session setting is enabled. In a future commit this setting will enable a new rule that makes `GenerateInvertedIndexScans` obsolete. There is no reason to trigger both rules. Release note: None
The `GenerateTrigramSimilarityInvertedIndexScans` exploration rule has been added, which index-accelerates trigram similarity filters. See the comment above the newly added custom function for more details. In future commits, this rule will be optimized further. The tests for this rule break from convention: they are added to a new file rather than to the `select` test file corresponding to the location of the rule in `select.opt`. This proposed new method of organization is motivated by the huge growth of the `select` file over the years. If there is agreement on this, we can incrementally reorganize all existing tests to match. Release note (performance improvement): More efficient query plans are now generated for queries with text similarity filters, e.g., `text_col % 'foobar'`. These plans are generated if the `optimizer_use_trigram_similarity_optimization` session setting is enabled. It is disabled by default.
The `GenerateSimilarityInvertedIndexScans` exploration rule now generates plans that scan fewer trigrams. See the added code comments for more details. Release note: None
Thanks for opening a backport. Please check the backport criteria before merging:
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
Also, please add a brief release justification to the body of your PR to justify this |
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 8 files at r1, 2 of 2 files at r2, 14 of 14 files at r3, 10 of 10 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 8 files at r1, 2 of 2 files at r2, 14 of 14 files at r3, 10 of 10 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)
efa090c
to
aab4088
Compare
The logic for building constraints on prefix columns of multi-column inverted indexes has been refactored and consolidated. Release note: None
aab4088
to
5929688
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a few nits -- feel free to ignore or open another PR on master. This can be merged as-is.
Reviewed 8 of 8 files at r1, 2 of 2 files at r2, 14 of 14 files at r3, 10 of 10 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @mgartner)
pkg/sql/opt/invertedidx/inverted_index_expr.go
line 272 at r3 (raw file):
con = &constraint.Constraint{} con.Init(&keyCtx, &spans) break
What if there are multiple similarity filters? Is it worth generating a different scan expression in the memo for each one?
pkg/sql/opt/xform/select_funcs.go
line 997 at r3 (raw file):
newScanPrivate := *scanPrivate newScanPrivate.Distribution.Regions = nil
nit: maybe add a TODO to calculate the regions here? There are some cases where we should be able to determine the regions statically (e.g., REGIONAL BY TABLE).
pkg/sql/opt/xform/testdata/rules/generate_trigram_similarity_inverted_index_scans
line 1 at r3 (raw file):
exec-ddl
nice tests!
nit: would help to add some comments about what functionality each test is exercising
Previously, rytaft (Rebecca Taft) wrote…
Yes, this is naively for now. I've added a TODO here: #122775 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @rytaft)
pkg/sql/opt/xform/select_funcs.go
line 997 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: maybe add a TODO to calculate the regions here? There are some cases where we should be able to determine the regions statically (e.g., REGIONAL BY TABLE).
It looks like custom functions never set the distribution directly, and it's done here instead:
cockroach/pkg/sql/opt/xform/optimizer.go
Line 818 in 9514555
provided.Distribution = distribution.BuildProvided(o.ctx, o.evalCtx, relParent, &parentProps.Distribution) |
pkg/sql/opt/xform/testdata/rules/generate_trigram_similarity_inverted_index_scans
line 1 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nice tests!
nit: would help to add some comments about what functionality each test is exercising
Done here: #122775
This reverts commit 5929688. It is not strictly necessary for the new functionality in cockroachdb#122683. Release note: None
Backport 5/5 commits from #121973.
/cc @cockroachdb/release
sql: add optimizer_use_trigram_similarity_optimization setting
This commit adds a new session setting called
optimizer_use_trigram_similarity_optimization
. It currently has noeffect, but in future commits it will allow control over a new
optimization for queries with trigram similarity filters.
Release note: None
sql: disable inverted scans when similarity optimization is enabled
The
GenerateInvertedIndexScans
rule is now disabled for similarityfilters on trigram indexes when the
optimizer_use_text_similarity_optimization
session setting is enabled.In a future commit this setting will enable a new rule that makes
GenerateInvertedIndexScans
obsolete. There is no reason to triggerboth rules.
Release note: None
opt: add GenerateTrigramSimilarityInvertedIndexScans rule
The
GenerateTrigramSimilarityInvertedIndexScans
exploration rule hasbeen added, which index-accelerates trigram similarity filters. See the
comment above the newly added custom function for more details. In
future commits, this rule will be optimized further.
The tests for this rule break from convention: they are added to a new
file rather than to the
select
test file corresponding to the locationof the rule in
select.opt
. This proposed new method of organization ismotivated by the huge growth of the
select
file over the years. Ifthere is agreement on this, we can incrementally reorganize all existing
tests to match.
Release note (performance improvement): More efficient query plans are
now generated for queries with text similarity filters, e.g.,
text_col % 'foobar'
. These plans are generated if theoptimizer_use_trigram_similarity_optimization
session setting isenabled. It is disabled by default.
opt: reduce scanned trigrams for similarity filters
The
GenerateSimilarityInvertedIndexScans
exploration rule nowgenerates plans that scan fewer trigrams. See the added code comments
for more details.
Release note: None
opt: consolidate logic for constraining inverted index prefix columns
The logic for building constraints on prefix columns of multi-column
inverted indexes has been refactored and consolidated.
Fixes #112675
Release note: None
Release justification: Performance improvements gated behind a
session setting.