ewts indexes? #12

eroux · 2017-10-29T14:55:44Z

It's not very clear how indexes are serialized on disk in terms of character encoding (see there), but it seems to me it could be UTF-8 rather than UTF-16. If so, storing the index terms in EWTS instead of Unicode Tibetan would roughly halve the size of the on-disk indexes. The encoding question should be clarified first, but if this is correct, EWTS indexes should be relatively easy to implement, although they would make indexing a bit slower. The conversion could certainly be done after the tokenizer, in a separate filter. It's quite important that an EWTS string is first converted into Unicode and then back into EWTS, so that it is normalized.
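To make the size argument concrete, here is a minimal sketch that only compares encoded byte counts for a single word. It assumes terms end up stored as UTF-8, which is exactly the point still to be confirmed; `encoded_sizes` is a hypothetical helper, and the EWTS string is written by hand rather than produced by a converter. The actual ratio will depend on the text being indexed.

```python
def encoded_sizes(unicode_text: str, ewts_text: str) -> dict:
    """Return the byte length of a term in each candidate encoding."""
    return {
        # Tibetan code points (U+0F00 to U+0FFF) take 3 bytes each in UTF-8
        "unicode_utf8": len(unicode_text.encode("utf-8")),
        # ... and 2 bytes each in UTF-16 (little-endian, no byte order mark)
        "unicode_utf16": len(unicode_text.encode("utf-16-le")),
        # EWTS is plain ASCII, so 1 byte per character in UTF-8
        "ewts_utf8": len(ewts_text.encode("utf-8")),
    }


if __name__ == "__main__":
    # The syllable "bod" as three Tibetan code points: BA, vowel sign O, DA
    bod_unicode = "\u0f56\u0f7c\u0f51"
    bod_ewts = "bod"  # hand-written transliteration, an assumption of this sketch
    print(encoded_sizes(bod_unicode, bod_ewts))
```

For this one syllable the EWTS form is smaller than either Unicode encoding, but stacked consonants and vowels can need more EWTS characters than code points, so real savings should be measured on an actual index.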
