It's not entirely clear how the indexes are serialized on disk in terms of character encoding (see there), but it seems it could be UTF-8 rather than UTF-16. If that's the case, storing index terms in EWTS would roughly halve the size of the on-disk indexes, since Tibetan code points take three bytes each in UTF-8 while EWTS is plain ASCII. The encoding situation should be clarified first, but if this is correct, indexing in EWTS should be relatively easy to implement, although it will make indexing somewhat slower. This could be done after the tokenizer, in a separate token filter. It's important that the EWTS string first be converted into Unicode and then back into EWTS, so that it is normalized.
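To make the size argument concrete, here is a minimal sketch comparing the byte lengths of a Tibetan syllable in Unicode vs. its EWTS transliteration under both encodings. The class name and helper are hypothetical; the Tibetan string is the syllable "bod" (བོད), three code points in the U+0F00 block:

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // Returns the byte length of a string in UTF-8 and UTF-16 (big-endian, no BOM).
    static int[] sizes(String s) {
        return new int[] {
            s.getBytes(StandardCharsets.UTF_8).length,
            s.getBytes(StandardCharsets.UTF_16BE).length
        };
    }

    public static void main(String[] args) {
        // Tibetan syllable "bod" as Unicode: BA (U+0F56), vowel O (U+0F7C), DA (U+0F51)
        String unicode = "\u0F56\u0F7C\u0F51";
        // The same syllable in EWTS, which is plain ASCII
        String ewts = "bod";

        int[] u = sizes(unicode);
        int[] e = sizes(ewts);
        System.out.println("Unicode: UTF-8=" + u[0] + " bytes, UTF-16=" + u[1] + " bytes");
        System.out.println("EWTS:    UTF-8=" + e[0] + " bytes, UTF-16=" + e[1] + " bytes");
    }
}
```

Each Tibetan code point costs 3 bytes in UTF-8 but only 2 in UTF-16, so the savings from EWTS only materialize if the index really is stored as UTF-8; EWTS also tends to use more characters per Tibetan letter, which is why the expected reduction is closer to 2x than 3x.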