Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: like_escape is slower than SELECT LIKE #30192

Open
ghost opened this issue Sep 13, 2018 · 21 comments
Open

opt: like_escape is slower than SELECT LIKE #30192

ghost opened this issue Sep 13, 2018 · 21 comments
Labels
A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. docs-done docs-known-limitation E-quick-win Likely to be a quick win for someone experienced. T-sql-queries SQL Queries Team X-nostale Marks an issue/pr that should be ignored by the stale bot

Comments

@ghost
Copy link

ghost commented Sep 13, 2018

BUG REPORT

Please describe the issue you observed, and any steps we can take to reproduce it:

  • Which version of CockroachDB are you using?
    v2.1.0-beta.20180910

It seems that SELECT ... LIKE is a lot faster than SELECT ... LIKE ... ESCAPE

SELECT id FROM tbl WHERE type = 1 AND name LIKE 'abc_def%';
SELECT id FROM tbl WHERE type = 1 AND name LIKE 'abc\\_def%' ESCAPE '\';

Example timing on a few million row table:
Time: 24.575054ms
Time: 3m37.614735676s

root@:26257/defaultdb> EXPLAIN (VERBOSE) SELECT id FROM video WHERE type = 1 AND name LIKE ''abc_def%';
    tree    |  field   |                           description                           |         columns         | ordering
+-----------+----------+-----------------------------------------------------------------+-------------------------+----------+
  render    |          |                                                                 | (id)                    |
   │        | render 0 | id                                                              |                         |
   └── scan |          |                                                                 | (name, type)            |
            | table    | video@type_name_storing_id                                      |                         |
            | spans    | /0/"_"/PrefixEnd-/0/"_"/PrefixEnd                               |                         |
            | filter   | name LIKE 'abc_def%'                                            |                         |
(6 rows)

Time: 12.71157ms
root@:26257/defaultdb> EXPLAIN (VERBOSE) SELECT id FROM video WHERE type = 1 AND name LIKE 'abc\_def%' ESCAPE '\';
    tree    |  field   |                       description                        |         columns         | ordering
+-----------+----------+----------------------------------------------------------+-------------------------+----------+
  render    |          |                                                          | (id)                    |
   │        | render 0 | id                                                       |                         |
   └── scan |          |                                                          | (name, type)            |
            | table    | video@type_name_storing_id                               |                         |
            | spans    | /0-/1                                                    |                         |
            | filter   | like_escape(name, e'abc\_def%', e'\\')                   |                         |
(6 rows)

Time: 9.861181ms

Jira issue: CRDB-7884

@knz
Copy link
Contributor

knz commented Sep 13, 2018

As you found via EXPLAIN, when you use ESCAPE the query optimization that narrows the search on the index is disabled.

This is a limitation of the current query engine.

cc @andy-kimball

@knz knz changed the title like_escape is slower than SELECT LIKE opt: like_escape is slower than SELECT LIKE Sep 13, 2018
@knz knz added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-sql-optimizer SQL logical planning and optimizations. docs-known-limitation labels Sep 13, 2018
@andy-kimball
Copy link
Contributor

Currently, the optimizer only recognizes the like operator, not the like_escape function.

cc @RaduBerinde

@andy-kimball andy-kimball added this to To do in BACKLOG, NO NEW ISSUES: SQL Optimizer via automation Sep 13, 2018
@andy-kimball andy-kimball moved this from To do to Higher Priority Backlog in BACKLOG, NO NEW ISSUES: SQL Optimizer Sep 13, 2018
@knz
Copy link
Contributor

knz commented Sep 13, 2018

To propagate my offline comments here: the A LIKE B ESCAPE C is the first/only 3-way operator. It's not completely trivial to introduce 3-way operators throughout the entire SQL layer. However we could perhaps specialize the code to treat a literal/constant C as a 2-way special LIKE operator.

@knz
Copy link
Contributor

knz commented Sep 13, 2018

@andy-kimball FYI this is a rather uncommonly used feature. I'm not sure about the "high priority" label.

@andy-kimball
Copy link
Contributor

That just means it's a backlog that I review more frequently, not that issues in it are necessarily high priority (i.e. it's "higher priority" than the lower priority backlog, not "high priority"). Maybe I'll rename to reduce confusion.

@knz
Copy link
Contributor

knz commented Sep 13, 2018

it's ok, I just didn't know. Thanks for explaining.

@sploiselle
Copy link
Contributor

@andy-kimball Can I get a quick blurb describing this known limitation w/r/t the impact to user experience? Ideally, we need it by Friday 10/26 for the 2.1 Known Limitations page. Posting it on this issue and/or pinging me would be great.

@knz
Copy link
Contributor

knz commented Oct 24, 2018

Sean:

"""
title: Using LIKE...ESCAPE in WHERE and HAVING constraints

CockroachDB tries to optimize most comparisons operators in WHERE and HAVING clauses into constraints on SQL indexes so that only the selected rows are accessed. This is done for LIKE clauses when a common prefix for all selected rows can be determined in the search pattern (e.g. `... LIKE 'Joe%''). However, this optimization is not yet available if the ESCAPE keyword is also used.
"""

@andy-kimball
Copy link
Contributor

Thanks for providing, @knz.

@RaduBerinde RaduBerinde moved this from Higher Priority Backlog to Plan enhancements (higher priority) in BACKLOG, NO NEW ISSUES: SQL Optimizer Apr 18, 2020
@github-actions
Copy link

github-actions bot commented Jun 5, 2021

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@knz
Copy link
Contributor

knz commented Jun 5, 2021

This problem still exists in v21.1.

Moreover, even the LIKE operator behaves strangely:

with LIKE 'abc_def%' I would expect a span from .../abc_def to .../abc_deg (exclusive).

However the generated span is /abc.../abd which is far more inclusive.

It looks like the span generation for LIKE gets confused by _. Is that intended?

cc @rytaft for re-triage.

@knz knz added this to Triage in SQL Queries via automation Jun 5, 2021
@knz knz changed the title opt: like_escape is slower than SELECT LIKE opt: spans for LIKE are too wide + like_escape is slower than SELECT LIKE Jun 5, 2021
@RaduBerinde
Copy link
Member

It looks like the span generation for LIKE gets confused by _. Is that intended?

Without any escaping, _ matches any character.

@knz
Copy link
Contributor

knz commented Jun 6, 2021

Oh i didn't know this. TIL thanks

@rytaft rytaft changed the title opt: spans for LIKE are too wide + like_escape is slower than SELECT LIKE opt: like_escape is slower than SELECT LIKE Jun 7, 2021
@rytaft
Copy link
Collaborator

rytaft commented Jun 7, 2021

This issue made me realize we also have a bug in our index constraint generation code for LIKE. I've opened #66144 to track that issue.

@rytaft rytaft moved this from Triage to Backlog in SQL Queries Jun 8, 2021
@rytaft rytaft added the E-quick-win Likely to be a quick win for someone experienced. label Jun 8, 2021
@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
@bladefist
Copy link

I'm dealing with an issue on _ queries, where I want the literal value of an underscore. I tried adding a backslash to escape it, but I get no results (there should be results). ESCAPE keyword is also very slow but not really what I need anyway.

Is this a bug or am I doing something wrong?

@knz
Copy link
Contributor

knz commented Feb 2, 2022

@bladefist is this the right place to ask this question? I don't think this particular github issue is related to it.
I think you'll get a fast(er) answer on the community slack or the forum.

@bladefist
Copy link

@knz I thought I was responding to the original post which commented on a bug with ESCAPE being slow, and/or the escape character \ not escaping an underscore properly.

Maybe I'm doing something wrong but my intention was report bug

@knz
Copy link
Contributor

knz commented Feb 3, 2022

Maybe I'm doing something wrong but my intention was report bug

The concern you're having is unrelated to the issue at top. You want to report a new bug, not contribute to the conversation about the fact that like_escape is slower.

So you can file a new issue and/or discuss the new bug on the community slack.

@mdelvecchio-wasabi
Copy link

We just ran into this issue in production.

The query took the form SELECT ... WHERE StringColum LIKE 'prefix%' ESCAPE '\', and was incredibly slow, reading millions of rows to find the one matching row.

After we removed the unnecessary redundant specification of the escape character used, the query was SELECT ... WHERE StringColumn LIKE 'prefix%', the query was fast, and read only the matching row.

Documentation says that \ is the default escape character, and thus there is no functional difference between specifying ESCAPE '\' and omitting it. And extensive testing confirms that the two queries return the same results.

So our solution is to just remove the ESCAPE '\' from our query, as it is unnecessary, and causes huge slowdowns.

@mgartner mgartner moved this from Backlog to New Backlog in SQL Queries Feb 16, 2023
@github-actions
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@knz
Copy link
Contributor

knz commented Sep 22, 2023

still relevant

@knz knz added X-nostale Marks an issue/pr that should be ignored by the stale bot and removed no-issue-activity labels Sep 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. docs-done docs-known-limitation E-quick-win Likely to be a quick win for someone experienced. T-sql-queries SQL Queries Team X-nostale Marks an issue/pr that should be ignored by the stale bot
Projects
BACKLOG, NO NEW ISSUES: SQL Optimizer
Plan enhancements (higher priority)
Status: Backlog
SQL Queries
New Backlog
Development

No branches or pull requests

9 participants