-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KeywordRepeatFilter + OpenNLPLemmatizer Early Exit #11771
Comments
https://ci-builds.apache.org/job/Lucene/job/Lucene-Check-9.x/3057/ Hmm... this patch applied to 9x fails the tests. Could you take a look at that, @kotman12 ? |
I can reproduce those failures with JDK11 but not with JDK17. I didn't look into this deeper. |
Very interesting .. will take a look. First thing that comes to mind is a bug with the output array creation in the actual test https://github.com/kotman12/lucene/blob/fix-sentence-iteration/lucene/analysis/opennlp/src/test/org/apache/lucene/analysis/opennlp/TestOpenNLPLemmatizerFilterFactory.java#L335 This is especially suspicious https://github.com/kotman12/lucene/blob/fix-sentence-iteration/lucene/analysis/opennlp/src/test/org/apache/lucene/analysis/opennlp/TestOpenNLPLemmatizerFilterFactory.java#L337 .. I wonder if jdk17 has some optimization with empty string that it doesn't create new ones and thus that |
So this change seems to fix the test locally for me in branch 9x .. Created a PR for the upstream .. not sure how you want to handle the reversion in 9X branch |
If this code went in the main branch then it's also a bug there. Comparing strings by reference is a no-no - I should have caught it earlier. I'll do the update on both branches later today. |
Description
KeywordRepeatFilter + OpenNLPLemmatizer leads to arbitrarily early exit of token stream.
Steps to reproduce: run this test and notice how no text below this line from the test file gets analyzed.
The root cause appears to be an extraneous exit condition that doesn't play nicely with KeywordRepeatFilter.
This is related to the bug #11735 and is addressed by #11734
Version and environment details
latest version of lucene running jdk-17
The text was updated successfully, but these errors were encountered: