Hi @oliversauter, this is not enough, as words in Chinese-like languages often consist of several UTF-8 characters. We should look to the field of natural language processing for existing, proven approaches that cover the main languages that are not Latin-based.
As described in this community post, non-Latin characters (e.g. Chinese, Japanese) are not parsed correctly and are therefore not searchable.
Potential solution: detect the language of the website and, if it is non-Latin, split the text into individual characters before indexing?
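To make the proposal concrete, here is a minimal sketch of the character-splitting idea. It is not the project's actual indexing code: the function names `is_cjk` and `tokenize` are hypothetical, the Unicode ranges are a rough assumption covering only common Chinese ideographs and Japanese kana, and it checks the script of each character rather than detecting the language of the whole page.

```python
def is_cjk(ch: str) -> bool:
    """Rough check (assumption): CJK Unified Ideographs, Hiragana, Katakana."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3040 <= cp <= 0x30FF


def tokenize(text: str) -> list[str]:
    """Split CJK characters into single-character tokens; keep other words whole."""
    tokens: list[str] = []
    word: list[str] = []  # buffer for the current non-CJK word

    def flush() -> None:
        if word:
            tokens.append("".join(word))
            word.clear()

    for ch in text:
        if is_cjk(ch):
            flush()
            tokens.append(ch)        # each CJK character becomes its own term
        elif ch.isalnum():
            word.append(ch.lower())  # accumulate a whitespace-delimited word
        else:
            flush()                  # punctuation or whitespace ends the word
    flush()
    return tokens


print(tokenize("Memex 搜索 test"))
```

With single-character terms, a query has to be split the same way and matched as a sequence of terms, otherwise every page containing any one of the characters would match. That is one reason splitting alone may not be sufficient for multi-character words.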