Lowercase search does not find non-ASCII uppercase pages and vice versa #8375
Comments
Recently I've seen a fix for a similar issue, #5263, that might also fix your problem (although you mentioned that trunk also has the problem). It is hard to tell without an example, though (I don't have a good example at hand with Cyrillic search terms).
Searching for "ОСН", "ПРИ", "При" and "проч" works (this was fixed in #5263, thanks!). Attaching a self-contained example; all the info is duplicated inside in
It was a bit confusing at first because the output language was still English, but the problem here lies with the names of the sections in the related pages.
This looks like a more fundamental problem: when the search index files are written (among others search/all*.js, search/pages*.js and search/searchdata.js), the conversion to lowercase does not appear to be done the way it is for the Latin alphabet / ASCII.
… pages and vice versa
Implementation of an uppercase / lowercase conversion as needed by doxygen. The standard tolower / toupper functions don't really work as they need a "locale", which in general is not necessary for Unicode / UTF-8 conversions.
- caseconvert.cpp / caseconvert.h: generated code based on the table from https://www.unicode.org/Public/13.0.0/ucd/UnicodeData.txt, with some small modifications regarding uppercase values that shouldn't have a lowercase representation (Kelvin sign) or combined characters where there is no 100% one-to-one relation between uppercase and lowercase due to some mix (e.g. DZ, Dz and dz).
- util.cpp / searchengine.cpp: using the new functions
- search.js: the old "workaround" is not necessary anymore (see issue doxygen#5263)
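To illustrate the idea of a locale-independent case conversion on UTF-8 text, here is a minimal sketch. It is not the generated code from caseconvert.cpp: the names `toLowerCodePoint` and `utf8ToLowerSketch` are hypothetical, and the mapping covers only ASCII and the basic Cyrillic block instead of the full UnicodeData.txt table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: lowercase a single code point.
// Only ASCII and basic Cyrillic are handled in this sketch.
static std::uint32_t toLowerCodePoint(std::uint32_t cp)
{
  if (cp >= 'A' && cp <= 'Z')       return cp + 0x20; // A-Z -> a-z
  if (cp >= 0x0410 && cp <= 0x042F) return cp + 0x20; // А-Я -> а-я
  if (cp >= 0x0400 && cp <= 0x040F) return cp + 0x50; // Ѐ-Џ -> ѐ-џ
  return cp;
}

// Hypothetical helper: lowercase a UTF-8 string without consulting a locale.
// Only 1- and 2-byte sequences are decoded; all other bytes are copied as-is.
std::string utf8ToLowerSketch(const std::string &in)
{
  std::string out;
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size();)
  {
    unsigned char c0 = static_cast<unsigned char>(in[i]);
    unsigned char c1 = (i + 1 < in.size())
                         ? static_cast<unsigned char>(in[i + 1]) : 0;
    if (c0 < 0x80) // single byte (ASCII)
    {
      out += static_cast<char>(toLowerCodePoint(c0));
      i += 1;
    }
    else if ((c0 & 0xE0) == 0xC0 && (c1 & 0xC0) == 0x80) // two-byte sequence
    {
      std::uint32_t cp = ((c0 & 0x1Fu) << 6) | (c1 & 0x3Fu);
      cp = toLowerCodePoint(cp);
      out += static_cast<char>(0xC0 | (cp >> 6));   // re-encode lead byte
      out += static_cast<char>(0x80 | (cp & 0x3F)); // re-encode continuation
      i += 2;
    }
    else // longer sequences or stray bytes: copy unchanged
    {
      out += in[i];
      i += 1;
    }
  }
  return out;
}
```

With a conversion like this applied both when the index is written and when the query is processed, a search for "привет" and a page titled "Привет" reduce to the same key, just as "faq" and "FAQ" do.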
I've just pushed a proposed patch, pull request #8409
@apolukhin Please verify if commit a4ecbee fixes the problem for you.
As far as I can see, it does not work. Example: example.tar.gz. Here we have the source to generate the HTML pages and the directories:
When going to the related pages (for easy cut and paste), cutting the text and pasting it into the search bar:
@albert-github Fixed two issues:
Let me know if you see more issues.
…s correctly
The problem is that "_" is seen as an Id character and is not escaped for JS search. This is a regression on:
```
Commit: a4ecbee [a4ecbee]
Date: Monday, March 22, 2021 8:02:06 PM
issue doxygen#8375: Lowercase search does not find non-ASCII uppercase pages and vice versa
```
and
```
Commit: 3a365ab [3a365ab]
Date: Wednesday, March 24, 2021 8:34:50 PM
issue doxygen#8375 Lowercase search does not find non-ASCII uppercase pages and vice versa (part 2)
```
@doxygen yep, latest master works like a charm. Many thanks!
This issue was previously marked 'fixed but not released', |
This is a regression on doxygen#8375: the `substr` function requires a length, not an end position. The problem was found while looking at doxygen#3244.
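The length-versus-end-position mix-up can be sketched as follows. The commit message does not say which `substr` is involved, so this assumes the `(position, count)` semantics of `std::string::substr`; the helper name `sliceByEnd` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: return the characters of s in the range [start, end).
std::string sliceByEnd(const std::string &s, std::size_t start, std::size_t end)
{
  // std::string::substr takes (position, count), so the end position
  // must be converted into a length before the call.
  return s.substr(start, end - start);
}
```

Passing the end position directly as the second argument, e.g. `s.substr(6, 10)` on "searchdata.js", returns up to 10 characters starting at index 6 ("data.js") instead of the intended range ending at index 10 ("data").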
Describe the bug
Searching "привет" while the page title is "Привет" does not work.
Expected behavior
For ASCII everything works fine: searching for "faq" finds the page "FAQ". The same behavior is expected for non-ASCII pages.
Screenshots
To Reproduce
Pages titled `# Привет`, `# Основные сведения`, `# прочее` and `# Введение`.
Configuration: `SEARCHENGINE=YES` and `SERVER_BASED_SEARCH=NO`.
Search terms: `при`, `ПРИВ`, `основ`, `ПРОЧЕЕ`, `вВеден`.
Version
1.9.1 and trunk.