International search #170

Open
TWiStErRob opened this issue Jun 13, 2020 · 10 comments · Fixed by #359 · May be fixed by #357
Labels
a:feature (new feature or request) · in:search (search entry, search results) · is:need-broadcast

@TWiStErRob
Owner

TWiStErRob commented Jun 13, 2020

Cyrillic characters are not searchable, probably because FTS tokenizes them incorrectly.

The "unicode61" tokenizer is available beginning with SQLite version 3.7.13 (2012-06-11).
-- https://www.sqlite.org/fts3.html#tokenizer

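Since SQLite 3.8.6 (shipped with Android 5.0, API 21) it is compiled into FTS4 by default. However, "simple" remains the default tokenizer, so unicode61 still has to be requested explicitly with tokenize=unicode61: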

The unicode61 tokenizer is now included in FTS4 by default.
-- https://www.sqlite.org/releaselog/3_8_6.html
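A minimal way to check which side of these version thresholds a given SQLite build falls on is sketched below. It uses Python's built-in `sqlite3` module purely for illustration, because the app itself runs on Android, where the same `CREATE VIRTUAL TABLE ... USING fts4(..., tokenize=unicode61)` statement would apply. The helper names `version_tuple` and `supports_unicode61` are hypothetical.

```python
import sqlite3

# Version thresholds quoted from the SQLite documentation above.
UNICODE61_AVAILABLE = (3, 7, 13)     # unicode61 tokenizer first shipped
UNICODE61_DEFAULT_BUILD = (3, 8, 6)  # compiled into FTS4 without extra flags


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn a version string such as "3.8.6" into (3, 8, 6) for comparison."""
    return tuple(int(part) for part in version.split("."))


def supports_unicode61() -> bool:
    """Return True if this SQLite build accepts the unicode61 FTS4 tokenizer."""
    con = sqlite3.connect(":memory:")
    try:
        # The tokenizer must be named explicitly; FTS4 would otherwise use "simple".
        con.execute("CREATE VIRTUAL TABLE probe USING fts4(x, tokenize=unicode61)")
        return True
    except sqlite3.OperationalError:
        # Either FTS4 itself or the unicode61 tokenizer is missing from this build.
        return False
    finally:
        con.close()


if __name__ == "__main__":
    current = version_tuple(sqlite3.sqlite_version)
    print("SQLite", sqlite3.sqlite_version)
    print("unicode61 built in by default:", current >= UNICODE61_DEFAULT_BUILD)
    print("unicode61 accepted by this build:", supports_unicode61())
```

The version comparison only tells you what to expect. The probe table is the reliable test, because builds between 3.7.13 and 3.8.6 include unicode61 only when it was enabled at compile time.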

According to https://stackoverflow.com/a/4377116/253468, Android ships these SQLite versions:

SQLite 3.8.6: API 21, Android 5.0 Lollipop
SQLite 3.7.11: API 19, Android 4.4 KitKat

Not working:

  • Russian
  • Lithuanian š, č
  • Hungarian accents
  • Greek?
  • Japanese?
  • Chinese?
@TWiStErRob TWiStErRob added the a:feature and in:search labels Jun 13, 2020
@TWiStErRob TWiStErRob added this to the v1.2 milestone Jun 13, 2020
@TWiStErRob
Owner Author

TWiStErRob commented Jun 13, 2020

Search can't seem to find words with Lithuanian letters, like š, č and others. I'm guessing it's the same with all non-Latin letters.
-- https://mail.google.com/mail/u/0/#inbox/FMfcgxvzLDrxJNnThJwBqKTBjnmPVVKk

@razumeiko

Hi @TWiStErRob. Thanks for the app, it's really great! Just curious: are you planning to fix Cyrillic search soon, or is it not in your near-future plans? Being able to search in Cyrillic is really critical. Thanks!

@TWiStErRob TWiStErRob modified the milestones: v1.3.0, v1.2.1 Sep 3, 2023
@TWiStErRob
Owner Author

TWiStErRob commented Sep 3, 2023

@razumeiko I have been preparing for this change for many months now, and it's getting closer.

I'm sorry to say it, but I'm disabling distribution in countries that use Cyrillic script until this is fixed, because I'm getting too many bad reviews about a feature that the Play Store description explicitly lists as not available. Existing installs will remain, and sideloading (e.g. from APKPure) is still possible.

Most recent 1-star review:

Add app translation as well as search in other languages.
-- https://play.google.com/console/u/0/developers/7995455198986011414/app/4974852622245161228/user-feedback/review-details?reviewId=08a37b19-e1e8-4c4e-b175-56ff0788ac58&corpus=PUBLIC_REVIEWS

@TWiStErRob
Owner Author

@razumeiko, can you please help me a bit? I just double-checked, and Cyrillic search has worked fine since the first version (this screenshot is from my original, first published version):
image

However, it doesn't treat upper-case (П) and lower-case (п) characters as equal:

image

Can you please send me some item names and search queries that you would expect to work differently?

@razumeiko

razumeiko commented Oct 17, 2023

Hey @TWiStErRob, here is what I found.
Here is the room I created with 5 items. I tried different names with spaces and the same text, where some start with an upper-case letter and some don't, in both Cyrillic (Russian/Ukrainian) and Latin (English).
image

Interestingly, the search works, but only if the word is lower-case. For example, I have three items whose names contain "тест", one upper-case and the others lower-case. The search finds the lower-case ones but not the same word in upper-case:
image

I ran the same experiment with English words, and they work as expected: there, search is case-insensitive.
image

Also, it doesn't help to type the query exactly as written, including the upper-case letter. If the item name contains an upper-case letter, it won't show up in the results no matter how you write it in the search box.

@TWiStErRob
Owner Author

Thanks @razumeiko! I see why people say it's "bad". Non-Latin search only works if the item names in the inventory are entirely lower-case. However, the "Item Name" field automatically starts with an upper-case letter, so people won't enter lower-case names unless they deliberately choose to.

The search engine I used (on old Android) only supported case-insensitive matching for ASCII (Latin) letters. The new one is Unicode-aware, so it knows how to fold case in all scripts.

Oddly, the search query itself was lower-cased correctly. I probably do that manually somewhere in the code, and I'll have to remove it to make the behavior consistent.
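To make the failure mode concrete, here is a small Python `sqlite3` sketch. It is not the app's actual code. It indexes a few hypothetical item names in FTS4 twice: once with the `simple` tokenizer, which only folds ASCII letters to lower case, and once with `unicode61`. The `search` helper and the `NAMES` list are illustrative assumptions. `search` also lower-cases the query before matching, to model the manual query lower-casing suspected above.

```python
import sqlite3

# Hypothetical item names, modelled on the experiment described in this thread.
NAMES = ["Тест один", "тест два", "Test three", "test four"]


def search(tokenizer: str, names: list[str], query: str) -> list[str]:
    """Index `names` in an FTS4 table using `tokenizer`, then run `query`."""
    con = sqlite3.connect(":memory:")
    try:
        # The tokenizer applies both when indexing rows and when parsing MATCH queries.
        con.execute(f"CREATE VIRTUAL TABLE items USING fts4(name, tokenize={tokenizer})")
        con.executemany("INSERT INTO items(name) VALUES (?)", [(n,) for n in names])
        # Assumption: the app lower-cases the query itself (Python's str.lower is Unicode-aware).
        rows = con.execute(
            "SELECT name FROM items WHERE items MATCH ? ORDER BY docid",
            (query.lower(),),
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        con.close()


if __name__ == "__main__":
    for tok in ("simple", "unicode61"):
        print(tok, search(tok, NAMES, "Тест"), search(tok, NAMES, "TEST"))
```

With `simple`, the query "Тест" only finds "тест два". The indexed token "Тест" keeps its upper-case Т, because non-ASCII characters are not case-folded, which matches the behavior reported in this thread. With `unicode61`, both Cyrillic names are found. The Latin names are found with either tokenizer. By default `unicode61` also strips diacritics from Latin letters (remove_diacritics=1), which is relevant to the Lithuanian š/č report.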

I have a fix for this (just using a better search engine); left: bad, right: fixed.

image

@razumeiko

Nice! Waiting for this update. Thanks for your hard work!

2 participants