sql: support running `LIKE` operator against collated columns (with suitable indexing) #20666

idubrov · 2017-12-12T21:58:00Z (edited by cockroach-jira-scripts)

Primary use case: prefix search against collated columns (LIKE 'Prefix%').
Secondary use case: arbitrary LIKE searches against collated columns (LIKE '%Infix%').

Example:

CREATE TABLE test (name STRING COLLATE en_u_ks_level2);
INSERT INTO test(name) VALUES ('Alex' COLLATE en_u_ks_level2);
SELECT name FROM test WHERE name LIKE 'Ale%' COLLATE en_u_ks_level2; -- doesn't work!

Current output:

pq: unsupported comparison operator: <collatedstring{en_u_ks_level2}> LIKE <collatedstring{en_u_ks_level2}>

Prefix search is a should-have. Arbitrary expressions are a nice-to-have.

Workaround: run a range query instead of a prefix LIKE search:

SELECT * FROM test WHERE (name >= 'Ale' COLLATE "en_u_ks_level2"  AND name < 'Alf' COLLATE "en_u_ks_level2");

However, the tricky part is figuring out what the "next" symbol should be according to the collation rules. "e" -> "f" is easy, but the next symbol could differ in case, carry modifiers, etc.

On the other hand, this should be close to trivial for CockroachDB itself: once the collation key is computed for the prefix, finding the "next" prefix is easy.

Jira issue: CRDB-5918


idubrov · 2017-12-13T07:44:18Z

I looked into the code a little, and I think I was too optimistic about "trivial". It looks like even if you optimize the prefix case with a proper index scan, you still need a correct LIKE implementation for DCollatedString values.

However, LIKE uses a regexp to perform the match, and the regexp package does not seem to support collations.
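To illustrate the regexp route, here is a minimal Go sketch of how a LIKE pattern can be lowered to an anchored regexp. It is not CockroachDB's actual code; `likeToRegexp` is a hypothetical name, and backslash is assumed as the escape character (PostgreSQL's default). The `(?i)` flag applies Unicode simple case folding, which only approximates a case-insensitive collation such as `en_u_ks_level2`: it knows nothing about collation expansions such as "ß" versus "ss".

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// likeToRegexp translates a SQL LIKE pattern into an anchored Go regexp.
// '%' matches any sequence, '_' matches exactly one character (rune),
// '\' escapes the next character (assumed escape), and everything else
// is matched literally.
func likeToRegexp(pattern string, caseInsensitive bool) (*regexp.Regexp, error) {
	var b strings.Builder
	if caseInsensitive {
		b.WriteString("(?i)") // simple case folding, not collation-aware
	}
	b.WriteString("(?s)^") // let '.' match newlines too
	escaped := false
	for _, r := range pattern {
		switch {
		case escaped:
			b.WriteString(regexp.QuoteMeta(string(r)))
			escaped = false
		case r == '\\':
			escaped = true
		case r == '%':
			b.WriteString(".*")
		case r == '_':
			b.WriteString(".")
		default:
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	if escaped {
		return nil, fmt.Errorf("LIKE pattern must not end with escape character")
	}
	b.WriteString("$")
	return regexp.Compile(b.String())
}

func main() {
	re, err := likeToRegexp("Ale%", true)
	if err != nil {
		panic(err)
	}
	fmt.Println(re.MatchString("alex")) // true
	fmt.Println(re.MatchString("Alf"))  // false
	re, _ = likeToRegexp("STRASSE", true)
	fmt.Println(re.MatchString("Straße")) // false: simple folding does not expand ß
}
```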

On the other hand, it does not look impossible to implement an algorithm tuned specifically for LIKE patterns that runs against the binary key generated by the collator. A simple conversion of the whole pattern into a collation key might not work, though (I don't know what would happen to '%' and '_'), so the pattern might need to be split into pieces somehow.

knz · 2018-02-01T16:09:26Z

cc @awoods187 please consider roadmapping this for one of the upcoming releases.

github-actions · 2021-06-08T02:12:41Z

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

github-actions · 2023-09-26T11:09:42Z

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

knz · 2023-10-09T12:16:09Z

still relevant

This was referenced Dec 18, 2017:

sql: support LIKE with collated strings #20811 (Closed)
sql: string builtins should also be defined for collated strings #20838 (Open)

This was referenced Feb 1, 2018:

sql: LIKE does not support collated string #22294 (Closed)
sql,opt: support index constraints for LIKE queries on collated strings #21832 (Closed)

knz added this to the 2.1 milestone Feb 1, 2018

knz added the C-enhancement and C-performance labels Feb 1, 2018

knz changed the title from "Support running LIKE operator against collated columns" to "sql: support running LIKE operator against collated columns (with suitable indexing)" Feb 1, 2018

knz added the A-sql-optimizer label May 9, 2018

andy-kimball added this to Lower Priority Backlog in BACKLOG, NO NEW ISSUES: SQL Optimizer Aug 25, 2018

petermattis removed this from the 2.1 milestone Oct 5, 2018

RaduBerinde moved this from Lower Priority Backlog to Functional issues in BACKLOG, NO NEW ISSUES: SQL Optimizer Apr 18, 2020

github-actions bot added the no-issue-activity label Jun 8, 2021

jordanlewis added the A-sql-collated-strings label and removed the no-issue-activity label Jun 8, 2021

jlinder added the T-sql-queries label Jun 16, 2021

github-actions bot added the no-issue-activity label Sep 26, 2023

github-actions bot added the X-stale label Oct 9, 2023

github-actions bot closed this as not planned (stale) Oct 9, 2023

BACKLOG, NO NEW ISSUES: SQL Optimizer automation moved this from Functional issues to Done Oct 9, 2023

exalate-issue-sync bot closed this as completed Oct 9, 2023

knz added the X-nostale label and removed the X-stale and no-issue-activity labels Oct 9, 2023

knz reopened this Oct 9, 2023

BACKLOG, NO NEW ISSUES: SQL Optimizer automation moved this from Done to Triage Oct 9, 2023
