Remove EnglishStemAnalyzer #11301

LoayGhreeb · 2024-05-17T15:15:04Z

An issue occurs when using a custom analyzer. I don't know why it behaves differently from the built-in analyzers in Lucene, even though the implementation is the same.

The issue occurs when searching with uppercase letters combined with wildcard characters: the custom analyzer does not ignore case as expected.

To reproduce the issue:

Use search in files.
Search using a query like "Kopp*".
Search using a query like "kopp*".
Both searches should yield the same results, but they do not.
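
The symptom in the steps above can be modelled with a small plain-Java sketch. This is a toy model only, not Lucene or JabRef code: the class `WildcardCaseSketch`, its methods, and the sample text are hypothetical. It assumes that indexed terms are lowercased and that the wildcard prefix is compared as typed unless it is explicitly normalized.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Toy model (hypothetical, not Lucene code) of a case-sensitive wildcard search.
class WildcardCaseSketch {

    // Stand-in for index-time analysis: split on whitespace and lowercase every token.
    static List<String> indexTerms(String text) {
        return List.of(text.toLowerCase(Locale.ROOT).split("\\s+"));
    }

    // Prefix search for a query such as "kopp*".
    // normalizeQuery decides whether the prefix is lowercased before matching.
    static List<String> prefixSearch(List<String> terms, String query, boolean normalizeQuery) {
        String prefix = query.endsWith("*") ? query.substring(0, query.length() - 1) : query;
        String effective = normalizeQuery ? prefix.toLowerCase(Locale.ROOT) : prefix;
        return terms.stream()
                    .filter(term -> term.startsWith(effective))
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> terms = indexTerms("Kopp and Koppe wrote this");

        // Without query normalization the two spellings give different results.
        System.out.println(prefixSearch(terms, "kopp*", false));
        System.out.println(prefixSearch(terms, "Kopp*", false));

        // With query normalization both spellings match the same terms.
        System.out.println(prefixSearch(terms, "Kopp*", true));
    }
}
```

In this model, "Kopp*" only finds the lowercased terms when the query prefix is lowercased as well, which is the behavior expected from both searches.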

Instead of using a custom analyzer, I used Lucene's EnglishAnalyzer, which is very similar to the custom analyzer we used. The main difference is that the custom analyzer uses DecimalDigitFilter.

I also tried using exactly the same filters that EnglishAnalyzer uses in the custom analyzer, but the issue persisted.

Lucene EnglishAnalyzer implementation:
https://github.com/apache/lucene/blob/2c81649e284b1a1f3a4b46fd589befc87306d0dc/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/EnglishAnalyzer.java#L102-L110

Mandatory checks

Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
Tests created for changes (if applicable)
Manually tested changed features in running JabRef (always required)
Screenshots added in PR description (for UI changes)
Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

* upstream/main:
  Update latex citations status in JavaFx thread (#11302)
  Remove EnglishStemAnalyzer and use EnglishAnalyzer (#11301)
  Fix comment (#11299)
  Try gradle build speedup (#11300)
  Remove obsolete step (#11295)
  Bump com.fasterxml.jackson.dataformat:jackson-dataformat-yaml (#11290)
  Remove outdated pdf indexed files from Lucene index (#11293)
  Bump src/main/resources/csl-styles from `5338902` to `434df0a` (#11292)
  Bump org.mockito:mockito-core from 5.11.0 to 5.12.0 (#11291)
  Bump com.fasterxml.jackson.datatype:jackson-datatype-jsr310 (#11289)
  Bump com.dlsc.gemsfx:gemsfx from 2.12.0 to 2.16.0 (#11287)
  Bump org.openrewrite.recipe:rewrite-recipe-bom from 2.9.0 to 2.11.0 (#11288)
  Introduce formatter to remove word-enclosing braces (#11253)
  Try parallel tests (#9797)
  Store preview divider pos in entry editor (#11285)

Remove EnglishStemAnalyzer and use EnglishAnalyzer

123b082

Siedlerchr approved these changes May 17, 2024

calixtus added this pull request to the merge queue May 17, 2024

Merged via the queue into JabRef:main with commit b12f65c May 17, 2024
21 checks passed

LoayGhreeb deleted the remove-english-analyzer branch May 17, 2024 21:26
