Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-23.1.20-rc: opt: add GenerateTrigramSimilarityInvertedIndexScans rule #122683

Merged

Conversation

mgartner
Copy link
Collaborator

@mgartner mgartner commented Apr 19, 2024

Backport 5/5 commits from #121973.

/cc @cockroachdb/release


sql: add optimizer_use_trigram_similarity_optimization setting

This commit adds a new session setting called
optimizer_use_trigram_similarity_optimization. It currently has no
effect, but in future commits it will allow control over a new
optimization for queries with trigram similarity filters.

Release note: None

sql: disable inverted scans when similarity optimization is enabled

The GenerateInvertedIndexScans rule is now disabled for similarity
filters on trigram indexes when the
optimizer_use_text_similarity_optimization session setting is enabled.
In a future commit this setting will enable a new rule that makes
GenerateInvertedIndexScans obsolete. There is no reason to trigger
both rules.

Release note: None

opt: add GenerateTrigramSimilarityInvertedIndexScans rule

The GenerateTrigramSimilarityInvertedIndexScans exploration rule has
been added, which index-accelerates trigram similarity filters. See the
comment above the newly added custom function for more details. In
future commits, this rule will be optimized further.

The tests for this rule break from convention: they are added to a new
file rather than to the select test file corresponding to the location
of the rule in select.opt. This proposed new method of organization is
motivated by the huge growth of the select file over the years. If
there is agreement on this, we can incrementally reorganize all existing
tests to match.

Release note (performance improvement): More efficient query plans are
now generated for queries with text similarity filters, e.g.,
text_col % 'foobar'. These plans are generated if the
optimizer_use_trigram_similarity_optimization session setting is
enabled. It is disabled by default.

opt: reduce scanned trigrams for similarity filters

The GenerateSimilarityInvertedIndexScans exploration rule now
generates plans that scan fewer trigrams. See the added code comments
for more details.

Release note: None

opt: consolidate logic for constraining inverted index prefix columns

The logic for building constraints on prefix columns of multi-column
inverted indexes has been refactored and consolidated.

Fixes #112675

Release note: None


Release justification: Performance improvements gated behind a
session setting.

This commit adds a new session setting called
`optimizer_use_trigram_similarity_optimization`. It currently has no
effect, but in future commits it will allow control over a new
optimization for queries with trigram similarity filters.

Release note: None
The `GenerateInvertedIndexScans` rule is now disabled for similarity
filters on trigram indexes when the
`optimizer_use_text_similarity_optimization` session setting is enabled.
In a future commit this setting will enable a new rule that makes
`GenerateInvertedIndexScans` obsolete. There is no reason to trigger
both rules.

Release note: None
The `GenerateTrigramSimilarityInvertedIndexScans` exploration rule has
been added, which index-accelerates trigram similarity filters. See the
comment above the newly added custom function for more details. In
future commits, this rule will be optimized further.

The tests for this rule break from convention: they are added to a new
file rather than to the `select` test file corresponding to the location
of the rule in `select.opt`. This proposed new method of organization is
motivated by the huge growth of the `select` file over the years. If
there is agreement on this, we can incrementally reorganize all existing
tests to match.

Release note (performance improvement): More efficient query plans are
now generated for queries with text similarity filters, e.g.,
`text_col % 'foobar'`. These plans are generated if the
`optimizer_use_trigram_similarity_optimization` session setting is
enabled. It is disabled by default.
The `GenerateSimilarityInvertedIndexScans` exploration rule now
generates plans that scan fewer trigrams. See the added code comments
for more details.

Release note: None
@mgartner mgartner requested a review from a team April 19, 2024 16:09
@mgartner mgartner requested review from a team as code owners April 19, 2024 16:09
@mgartner mgartner requested review from DrewKimball and removed request for a team April 19, 2024 16:09
Copy link

blathers-crl bot commented Apr 19, 2024

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Backports should only be created for serious
    issues
    or test-only changes.
  • Backports should not break backwards-compatibility.
  • Backports should change as little code as possible.
  • Backports should not change on-disk formats or node communication protocols.
  • Backports should not add new functionality (except as defined
    here).
  • Backports must not add, edit, or otherwise modify cluster versions; or add version gates.
  • All backports must be reviewed by the owning areas TL and one additional
    TL. For more information as to how that review should be conducted, please consult the backport
    policy
    .
If your backport adds new functionality, please ensure that the following additional criteria are satisfied:
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters. State changes must be further protected such that nodes running old binaries will not be negatively impacted by the new state (with a mixed version test added).
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.
  • Your backport must be accompanied by a post to the appropriate Slack
    channel (#db-backports-point-releases or #db-backports-XX-X-release) for awareness and discussion.

Also, please add a brief release justification to the body of your PR to justify this
backport.

@blathers-crl blathers-crl bot added the backport Label PR's that are backports to older release branches label Apr 19, 2024
Copy link

blathers-crl bot commented Apr 19, 2024

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 8 of 8 files at r1, 2 of 2 files at r2, 14 of 14 files at r3, 10 of 10 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball)

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 8 of 8 files at r1, 2 of 2 files at r2, 14 of 14 files at r3, 10 of 10 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)

The logic for building constraints on prefix columns of multi-column
inverted indexes has been refactored and consolidated.

Release note: None
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: just a few nits -- feel free to ignore or open another PR on master. This can be merged as-is.

Reviewed 8 of 8 files at r1, 2 of 2 files at r2, 14 of 14 files at r3, 10 of 10 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @mgartner)


pkg/sql/opt/invertedidx/inverted_index_expr.go line 272 at r3 (raw file):

		con = &constraint.Constraint{}
		con.Init(&keyCtx, &spans)
		break

What if there are multiple similarity filters? Is it worth generating a different scan expression in the memo for each one?


pkg/sql/opt/xform/select_funcs.go line 997 at r3 (raw file):

		newScanPrivate := *scanPrivate
		newScanPrivate.Distribution.Regions = nil

nit: maybe add a TODO to calculate the regions here? There are some cases where we should be able to determine the regions statically (e.g., REGIONAL BY TABLE).


pkg/sql/opt/xform/testdata/rules/generate_trigram_similarity_inverted_index_scans line 1 at r3 (raw file):

exec-ddl

nice tests!

nit: would help to add some comments about what functionality each test is exercising

@mgartner
Copy link
Collaborator Author

pkg/sql/opt/invertedidx/inverted_index_expr.go line 272 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

What if there are multiple similarity filters? Is it worth generating a different scan expression in the memo for each one?

Yes, this is naively for now. I've added a TODO here: #122775

@mgartner mgartner requested a review from rytaft April 21, 2024 16:58
Copy link
Collaborator Author

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/opt/xform/select_funcs.go line 997 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: maybe add a TODO to calculate the regions here? There are some cases where we should be able to determine the regions statically (e.g., REGIONAL BY TABLE).

It looks like custom functions never set the distribution directly, and it's done here instead:

provided.Distribution = distribution.BuildProvided(o.ctx, o.evalCtx, relParent, &parentProps.Distribution)


pkg/sql/opt/xform/testdata/rules/generate_trigram_similarity_inverted_index_scans line 1 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nice tests!

nit: would help to add some comments about what functionality each test is exercising

Done here: #122775

@mgartner mgartner merged commit 0f260de into cockroachdb:release-23.1.20-rc Apr 21, 2024
6 checks passed
@mgartner mgartner deleted the backport23.1.20-rc-121973 branch April 21, 2024 16:59
mgartner added a commit to mgartner/cockroach that referenced this pull request Apr 22, 2024
This reverts commit 5929688. It is not
strictly necessary for the new functionality in cockroachdb#122683.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport Label PR's that are backports to older release branches
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants