Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: support running LIKE operator against collated columns (with suitable indexing) #20666

Open
idubrov opened this issue Dec 12, 2017 · 5 comments
Labels
A-sql-collated-strings A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-sql-queries SQL Queries Team X-nostale Marks an issue/pr that should be ignored by the stale bot

Comments

@idubrov
Copy link

idubrov commented Dec 12, 2017

Primary use case: prefixed search against collated columns (LIKE 'Prefix%').
Secondary use case: arbitrary LIKE searches against collated columns (LIKE '%Infix%').

Example:

CREATE TABLE test (name STRING COLLATE en_u_ks_level2);
INSERT INTO test(name) VALUES ('Alex' COLLATE en_u_ks_level2);
SELECT name FROM test WHERE name LIKE 'Ale%' COLLATE en_u_ks_level2; -- doesn't work!

Current output:

pq: unsupported comparison operator: <collatedstring{en_u_ks_level2}> LIKE <collatedstring{en_u_ks_level2}>

Prefix search is should-have. Arbitrary expressions are nice-to-have.

Workaround: it is possible to run range query instead of prefix LIKE search:

SELECT * FROM test WHERE (name >= 'Ale' COLLATE "en_u_ks_level2"  AND name < 'Alf' COLLATE "en_u_ks_level2");

However, the tricky part here is figuring out what should be the "next" symbol according to the collation rules ("e" -> "f" is easy, but could be different case, modifiers, etc).

On the other hand, it should be close to trivial for the CockroachDB to figure out that: once key is computed for the prefix, finding the "next" prefix is easy.

Jira issue: CRDB-5918

@idubrov
Copy link
Author

idubrov commented Dec 13, 2017

I looked into the code a little, and I think I was too optimistic about "trivial". It looks like even if you optimize for the prefix using proper scan, you still need a correct "LIKE" implementation for DCollatedString's.

However, "LIKE" uses a regexp to run the match and regexp package seem not to support collations.

On the other hand, it does not look like an impossible task to implement an algorithm specifically tuned for "LIKE" patterns, to run against binary key generated by the collator (although, simple conversion from pattern to collation key might not work -- I don't know what would happen to '%' and '_' -- so you might need to split pattern into pieces somehow).

@knz knz added this to the 2.1 milestone Feb 1, 2018
@knz knz added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. labels Feb 1, 2018
@knz knz changed the title Support running LIKE operator against collated columns sql: support running LIKE operator against collated columns (with suitable indexing) Feb 1, 2018
@knz
Copy link
Contributor

knz commented Feb 1, 2018

cc @awoods187 please consider roadmapping this for one of the upcoming releases.

@knz knz added the A-sql-optimizer SQL logical planning and optimizations. label May 9, 2018
@andy-kimball andy-kimball added this to Lower Priority Backlog in BACKLOG, NO NEW ISSUES: SQL Optimizer Aug 25, 2018
@petermattis petermattis removed this from the 2.1 milestone Oct 5, 2018
@RaduBerinde RaduBerinde moved this from Lower Priority Backlog to Functional issues in BACKLOG, NO NEW ISSUES: SQL Optimizer Apr 18, 2020
@github-actions
Copy link

github-actions bot commented Jun 8, 2021

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@github-actions
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 9, 2023
BACKLOG, NO NEW ISSUES: SQL Optimizer automation moved this from Functional issues to Done Oct 9, 2023
@knz knz added X-nostale Marks an issue/pr that should be ignored by the stale bot and removed X-stale no-issue-activity labels Oct 9, 2023
@knz
Copy link
Contributor

knz commented Oct 9, 2023

still relevant

@knz knz reopened this Oct 9, 2023
BACKLOG, NO NEW ISSUES: SQL Optimizer automation moved this from Done to Triage Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-collated-strings A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-sql-queries SQL Queries Team X-nostale Marks an issue/pr that should be ignored by the stale bot
Projects
Status: Backlog
Development

No branches or pull requests

5 participants