Remove EnglishStemAnalyzer #11301

Merged 1 commit on May 17, 2024

LoayGhreeb
Collaborator

Using a custom analyzer causes an issue: it behaves differently from Lucene's built-in analyzers, even though the implementation is the same, and I don't know why.

The issue occurs when a search query combines uppercase letters with wildcard characters. With the custom analyzer, such a search is not case-insensitive as expected.

To reproduce the issue:

  1. Use search in files.
  2. Search using a query like "Kopp*".
  3. Search using a query like "kopp*".
  4. Both searches should yield the same results, but they do not.

Instead of the custom analyzer, I used Lucene's EnglishAnalyzer, which is very similar to it. The main difference is that the custom analyzer also uses DecimalDigitFilter.

I also tried giving the custom analyzer exactly the same filters that EnglishAnalyzer uses, but the issue persisted.

Lucene EnglishAnalyzer implementation:
https://github.com/apache/lucene/blob/2c81649e284b1a1f3a4b46fd589befc87306d0dc/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/EnglishAnalyzer.java#L102-L110
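One possible explanation, not confirmed in this PR: Lucene's classic query parser does not run wildcard terms through the analyzer's full filter chain. It only applies the analyzer's `normalize` step, which leaves the term unchanged unless the analyzer overrides it, and EnglishAnalyzer overrides it to lowercase. A custom analyzer that only defines the index-time filters would then lowercase indexed text but not "Kopp*". The following is a minimal plain-Java sketch of that behavior. It does not call Lucene, and `WildcardCaseSketch`, `SketchAnalyzer`, and `prefixSearch` are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class WildcardCaseSketch {

    /** Hypothetical stand-in for an analyzer; this is not a Lucene class. */
    static final class SketchAnalyzer {
        final UnaryOperator<String> indexChain;     // applied to text at index time
        final UnaryOperator<String> normalizeChain; // applied to wildcard query terms

        SketchAnalyzer(UnaryOperator<String> indexChain, UnaryOperator<String> normalizeChain) {
            this.indexChain = indexChain;
            this.normalizeChain = normalizeChain;
        }
    }

    static final UnaryOperator<String> LOWER_CASE = s -> s.toLowerCase(Locale.ROOT);

    // Assumption: the custom analyzer lowercases at index time but does not normalize wildcard terms.
    static final SketchAnalyzer CUSTOM = new SketchAnalyzer(LOWER_CASE, UnaryOperator.identity());

    // Assumption: the built-in analyzer lowercases in both places.
    static final SketchAnalyzer BUILT_IN = new SketchAnalyzer(LOWER_CASE, LOWER_CASE);

    /** Returns the indexed terms that match a prefix query such as "Kopp*". */
    static List<String> prefixSearch(SketchAnalyzer analyzer, List<String> texts, String query) {
        // Drop the trailing wildcard, then apply only the normalize step to the query term.
        String raw = query.endsWith("*") ? query.substring(0, query.length() - 1) : query;
        String prefix = analyzer.normalizeChain.apply(raw);
        return texts.stream()
                .map(analyzer.indexChain)               // what would be stored in the index
                .filter(term -> term.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> texts = List.of("Kopp", "Koppor", "Lucene");
        System.out.println(prefixSearch(CUSTOM, texts, "Kopp*"));   // prints []
        System.out.println(prefixSearch(CUSTOM, texts, "kopp*"));   // prints [kopp, koppor]
        System.out.println(prefixSearch(BUILT_IN, texts, "Kopp*")); // prints [kopp, koppor]
    }
}
```

If this is indeed the cause, it would also explain why copying EnglishAnalyzer's index-time filters into the custom analyzer did not help: the filter list and the normalization step are defined separately.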

Mandatory checks

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

@calixtus calixtus added this pull request to the merge queue May 17, 2024
Merged via the queue into JabRef:main with commit b12f65c May 17, 2024
21 checks passed
@LoayGhreeb LoayGhreeb deleted the remove-english-analyzer branch May 17, 2024 21:26
Siedlerchr added a commit that referenced this pull request May 19, 2024
* upstream/main:
  Update latex citations status in JavaFx thread (#11302)
  Remove EnglishStemAnalyzer and use EnglishAnalyzer (#11301)
  Fix comment (#11299)
  Try gradle build speedup (#11300)
  Remove obsolete step (#11295)
  Bump com.fasterxml.jackson.dataformat:jackson-dataformat-yaml (#11290)
  Remove outdated pdf indexed files from Lucene index (#11293)
  Bump src/main/resources/csl-styles from `5338902` to `434df0a` (#11292)
  Bump org.mockito:mockito-core from 5.11.0 to 5.12.0 (#11291)
  Bump com.fasterxml.jackson.datatype:jackson-datatype-jsr310 (#11289)
  Bump com.dlsc.gemsfx:gemsfx from 2.12.0 to 2.16.0 (#11287)
  Bump org.openrewrite.recipe:rewrite-recipe-bom from 2.9.0 to 2.11.0 (#11288)
  Introduce formatter to remove word-enclosing braces (#11253)
  Try parallel tests (#9797)
  Store preview divider pos in entry editor (#11285)