Filters - Cyrillic search not working #8914

Closed
recived opened this issue Jun 23, 2023 · 7 comments · Fixed by #8953 or #8958
Labels
bug, component: Discovery
Comments

@recived added the bug and needs triage labels Jun 23, 2023
@toniprieto
Contributor

While investigating this issue, I found that the problem is a regular expression pattern that, by default, does not match some Unicode characters (in Java, \w covers only ASCII letters, digits and underscore unless a Unicode flag is set).

https://github.com/DSpace/DSpace/blob/main/dspace-api/src/main/java/org/dspace/discovery/indexobject/ItemIndexFactoryImpl.java#L848

Adding the Pattern.UNICODE_CHARACTER_CLASS flag could solve the issue:

Pattern pattern = Pattern.compile("\\b\\w+\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);
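
To illustrate the difference the flag makes, here is a minimal standalone sketch that compares the two patterns on a Cyrillic name. The class name UnicodeWordPatternDemo and the helper words(...) are hypothetical and exist only for this illustration; how ItemIndexFactoryImpl actually applies the pattern is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of DSpace.
public class UnicodeWordPatternDemo {

    // Same expression as above, but without the Unicode flag:
    // \w is limited to [a-zA-Z_0-9].
    static final Pattern ASCII_WORDS =
            Pattern.compile("\\b\\w+\\b", Pattern.CASE_INSENSITIVE);

    // Proposed variant: \w and \b follow Unicode character properties.
    static final Pattern UNICODE_WORDS =
            Pattern.compile("\\b\\w+\\b",
                    Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);

    // Collects every match of the given pattern in the text (illustrative helper).
    static List<String> words(Pattern pattern, String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String author = "Иванов Иван";
        System.out.println(words(ASCII_WORDS, author));   // prints []
        System.out.println(words(UNICODE_WORDS, author)); // prints [Иванов, Иван]
    }
}
```

Without the flag no word is found in the Cyrillic text at all, which is consistent with the filter not suggesting such authors.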

@recived closed this as completed Jul 12, 2023
@tdonohue
Member

@toniprieto : Could you submit a PR with that change? It sounds like that may have solved the issue that @recived was seeing? If that's the solution, it'd be good to add this into out-of-the-box DSpace.

@recived
Author

recived commented Jul 13, 2023

> @toniprieto : Could you submit a PR with that change? It sounds like that may have solved the issue that @recived was seeing? If that's the solution, it'd be good to add this into out-of-the-box DSpace.

I checked, and the problem is solved.

@tdonohue
Member

tdonohue commented Jul 13, 2023

@recived : What solved the problem for you? Was it fixed in a recent release of DSpace 7? Was it just a configuration error on your end?

@toniprieto
Contributor

@tdonohue I assume @recived means that the change solves the issue. I have just submitted the PR; if the issue has been resolved in another way, the PR can be closed.

@tdonohue
Member

tdonohue commented Jul 14, 2023

I was able to reproduce this on main. It's definitely still a bug & it exists in 7.6.

To reproduce:

  1. Add an author with Cyrillic characters to an existing Item: E.g. Иванов Иван Иванови
  2. Search for that Item. In the "Author" filter section, use the "Search author name" box and enter Иванов. This should suggest that Cyrillic author, but it does not (this is the bug).

Verified that #8953 fixes the issue.

@tdonohue reopened this Jul 14, 2023
@tdonohue added the component: Discovery label and removed the needs triage label Jul 14, 2023
@tdonohue added this to the 7.6.1 milestone Jul 14, 2023
@tdonohue
Member

Ported to 7.x for 7.6.1 in #8958
