sql: support running LIKE operator against collated columns (with suitable indexing) #20666
Comments
I looked into the code a little, and I think I was too optimistic about "trivial". It looks like even if you optimize the prefix case with a proper scan, you still need a correct LIKE implementation for DCollatedString values. However, LIKE uses a regexp to run the match, and the regexp package does not seem to support collations. On the other hand, it does not look impossible to implement an algorithm specifically tuned for LIKE patterns that runs against the binary key generated by the collator. A simple conversion from pattern to collation key might not work, though: I don't know what would happen to '%' and '_', so you might need to split the pattern into pieces somehow.
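To make the "split the pattern into pieces" idea concrete, here is a minimal Go sketch that breaks a LIKE pattern into literal pieces and wildcards, so that only the literal pieces would need to be turned into collation keys. The names `splitLikePattern`, `token`, and `tokenKind` are hypothetical and not taken from the CockroachDB code base, and the backslash escape convention is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenKind distinguishes literal text from the two LIKE wildcards.
type tokenKind int

const (
	literal tokenKind = iota // plain text, to be compared under the collation
	anyRun                   // '%': any sequence of characters, including none
	anyOne                   // '_': exactly one character
)

// token is one piece of a LIKE pattern (hypothetical type for illustration).
type token struct {
	kind tokenKind
	text string // set only for literal tokens
}

// splitLikePattern breaks a LIKE pattern into literal pieces and wildcards.
// Assumption: a backslash escapes the character that follows it.
func splitLikePattern(pattern string) []token {
	var tokens []token
	var lit strings.Builder
	// flush emits the literal text collected so far, if any.
	flush := func() {
		if lit.Len() > 0 {
			tokens = append(tokens, token{kind: literal, text: lit.String()})
			lit.Reset()
		}
	}
	runes := []rune(pattern)
	for i := 0; i < len(runes); i++ {
		switch r := runes[i]; {
		case r == '\\' && i+1 < len(runes):
			i++ // take the escaped character literally
			lit.WriteRune(runes[i])
		case r == '%':
			flush()
			tokens = append(tokens, token{kind: anyRun})
		case r == '_':
			flush()
			tokens = append(tokens, token{kind: anyOne})
		default:
			lit.WriteRune(r)
		}
	}
	flush()
	return tokens
}

func main() {
	for _, tok := range splitLikePattern("Pre_fix%") {
		fmt.Printf("kind=%d text=%q\n", tok.kind, tok.text)
	}
}
```

This only covers tokenization. How '_' should behave under a collation (for example, whether a base letter plus a combining mark counts as one character) is left open here, just as it is in the comment above.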
cc @awoods187 please consider roadmapping this for one of the upcoming releases.
We have marked this issue as stale because it has been inactive for
still relevant
Primary use case: prefixed search against collated columns (LIKE 'Prefix%').
Secondary use case: arbitrary LIKE searches against collated columns (LIKE '%Infix%').
Example:
Current output:
Prefix search is a should-have. Arbitrary expressions are a nice-to-have.
Workaround: it is possible to run a range query instead of a prefix LIKE search:
However, the tricky part here is figuring out what the "next" symbol should be according to the collation rules ("e" -> "f" is easy, but there could be different case, modifiers, etc.).
On the other hand, it should be close to trivial for CockroachDB to figure this out: once the key is computed for the prefix, finding the "next" prefix is easy.
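A minimal Go sketch of why the "next" prefix is easy at the key level: given the binary key bytes for the prefix, the exclusive upper bound of the scan is the key with its last non-0xFF byte incremented and everything after it dropped. The function name `prefixEnd` is hypothetical here. The sketch also assumes that the collation key of the prefix is a byte prefix of the key of every matching string, which may not hold for every collation.

```go
package main

import "fmt"

// prefixEnd returns the smallest byte string that sorts after every byte
// string starting with key. It returns nil when no such bound exists
// (an empty key, or a key made only of 0xFF bytes).
func prefixEnd(key []byte) []byte {
	end := append([]byte(nil), key...) // copy so the caller's slice is untouched
	for i := len(end) - 1; i >= 0; i-- {
		if end[i] != 0xFF {
			end[i]++
			return end[:i+1] // drop trailing 0xFF bytes
		}
	}
	return nil
}

func main() {
	// Made-up key bytes standing in for the collation key of a prefix.
	start := []byte{0x2a, 0x10, 0xff}
	fmt.Printf("scan [%x, %x)\n", start, prefixEnd(start))
}
```

With such a bound, a prefix LIKE becomes a half-open range scan over the index keys, and there is no need to work out the "next" character by hand.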
Jira issue: CRDB-5918