Fast case insensitive sort order and regex #149

niklas88 · 2018-11-13T13:42:05Z

We need a way to optimize case-insensitive regex filters, e.g. FILTER regex(?var, "astr", "i"), which should match Astronaut.

This could be solved by always doing a case-insensitive sort, but I think that might be against the spec (e.g. this SO answer suggests it should be case sensitive). Then again, looking at the spec and the linked XPath function, it seems to me that we are free to use any default collation strategy, which would allow us to do a case-insensitive sort.

niklas88 · 2019-04-16T10:17:04Z

This is now supported through building a case-insensitive index with #209. It remains an open issue to be able to use both case-sensitive and case-insensitive prefix search with the same index.

joka921 · 2019-12-10T12:21:08Z

I have been giving this some thought:
What you are missing is easy to implement, because the results of the case-sensitive prefix filter are a subset of the results of the case-insensitive prefix filter (provided that values which are the same when ignoring case are then sorted according to their casing; every useful collation strategy should ensure this).
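A minimal sketch of the subset idea in Python, not QLever's actual implementation: the vocabulary is sorted by a hypothetical two-level key (case-folded form first, original casing as tie-breaker), the case-insensitive prefix range is found by binary search, and the case-sensitive matches are then picked out of that range. The names sort_key, case_insensitive_prefix_range and case_sensitive_prefix_matches are made up for illustration, and str.casefold stands in for a real collation.

```python
from bisect import bisect_left, bisect_right


def sort_key(word):
    # Assumed two-level order: compare ignoring case first,
    # then break ties by the original casing.
    return (word.casefold(), word)


def case_insensitive_prefix_range(sorted_words, prefix):
    """Return (lo, hi) so that sorted_words[lo:hi] are exactly the words
    whose case-folded form starts with the case-folded prefix.
    sorted_words must be sorted by sort_key."""
    folded_prefix = prefix.casefold()
    n = len(folded_prefix)
    # Truncating every folded word to the prefix length keeps the list
    # in non-decreasing order, so binary search is still valid.
    truncated = [w.casefold()[:n] for w in sorted_words]
    lo = bisect_left(truncated, folded_prefix)
    hi = bisect_right(truncated, folded_prefix)
    return lo, hi


def case_sensitive_prefix_matches(sorted_words, prefix):
    # Every case-sensitive match also lies in the case-insensitive range,
    # so only that range has to be inspected.
    lo, hi = case_insensitive_prefix_range(sorted_words, prefix)
    return [w for w in sorted_words[lo:hi] if w.startswith(prefix)]


words = sorted(
    ["astronaut", "Astronaut", "ASTRAL", "Berlin", "astro", "Zebra", "banana"],
    key=sort_key,
)
lo, hi = case_insensitive_prefix_range(words, "Astr")
insensitive = words[lo:hi]
sensitive = case_sensitive_prefix_matches(words, "Astr")
```

The truncated list is rebuilt on every call here only to keep the sketch short; a real index would binary-search the stored sort keys directly.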

However, I would also like to support diacritic-agnostic filtering, and in general I don't like my current solution (it is really hacky, with uppercasing and lowercasing).

What I would really like to do is introduce boost_locale as a dependency, which correctly supports all the features we would need to do all this collation stuff properly.
IMHO this dependency is justified since
a) Properly handling international text (including German umlauts etc.) should be a priority.
b) Some of the things we need (e.g. finding the range of all strings that have the same prefix) cannot be done by std::locale in a way that is correct and portable.

(Some background: Unicode supports exactly what we want: a collation that sorts first by the character 'value' (e), then by added accents (eé) and then by the case (EeÉé). std::locale does not support performing only the first one or two of those comparison steps, since it provides a generic interface that also has to support non-Unicode locales.)
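The three comparison steps can be illustrated with a rough Python sketch. This is a simplification using only the standard library, not the Unicode Collation Algorithm and not what boost_locale or ICU do internally; collation_key and its level parameter are hypothetical names. The key compares the base characters first, then the accents, then the case, and passing a lower level drops the later steps.

```python
import unicodedata


def collation_key(word, level=3):
    """Simplified multi-level sort key (an assumption for illustration).
    level=1: base characters only (ignores accents and case)
    level=2: base characters, then accents (ignores case)
    level=3: base characters, then accents, then case"""
    # NFD splits accented characters into base character + combining mark.
    decomposed = unicodedata.normalize("NFD", word)
    # Drop the combining marks to get the bare base characters.
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    primary = base.casefold()          # character 'value' only
    secondary = decomposed.casefold()  # adds the accents back, still ignoring case
    tertiary = decomposed              # adds the original casing
    return (primary, secondary, tertiary)[:level]


# \u00c9 is É and \u00e9 is é.
variants = ["\u00e9", "E", "\u00c9", "e"]
ordered = sorted(variants, key=collation_key)

# Diacritic- and case-agnostic comparison uses only the first level.
same_base = collation_key("\u00dcber", level=1) == collation_key("uber", level=1)
# At the second level the umlaut makes a difference again.
same_accents = collation_key("\u00dcber", level=2) == collation_key("uber", level=2)
```

With this key the four variants come out as E, e, É, é, matching the order described above, and a diacritic-agnostic filter would compare keys at level 1 only.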

joka921 · 2021-04-19T15:04:07Z

This has finally been solved for some time now using the ICU library.

niklas88 assigned niklas88, joka921 and floriankramer Nov 13, 2018

niklas88 added this to To do in QLever Dec 5, 2018

niklas88 mentioned this issue Apr 16, 2019

The prefix filter is being used for case insensitive queries #170

Closed

niklas88 added the enhancement label Apr 17, 2019

joka921 closed this as completed Apr 19, 2021

QLever automation moved this from To do to Done Apr 19, 2021
