Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StringsToAutomaton#build to take List as parameter instead of Collection #12427

Merged
merged 1 commit into from
Oct 30, 2023

Conversation

shubhamvishu
Copy link
Contributor

@shubhamvishu shubhamvishu commented Jul 9, 2023

Description

 
Closes #12319

@gsmiller
Copy link
Contributor

Thanks @shubhamvishu for working on this! As I look at the PR, I'm wondering if accepting a List is really the proper thing to do here. If users already have a sorted Collection (like SortedSet), are we forcing them to make unnecessary copies of their data to adhere to our API?

I wonder if the ctor for TermInSetQuery might serve as a reasonable example of handling sortedness? TermInSetQuery accepts a Collection but tries to see if it's already sorted (and de-duped). If it is, it uses it as-is. If not, it sorts and de-dupes on behalf of the user. The automaton building logic right now only has a simple assert down in the internals, which could be disabled depending on the runtime environment, resulting in tricky debugging if the user didn't handle sorting their data properly.

Since StringsToAutomaton is an internal class, maybe one approach would be to leave it untouched, letting it expect pre-sorted data and having a simple assert statement as it does today to check that (for easier debugging), but add sorting logic to Automata#makeStringUnion? What if that method accepted a Collection as it does today but had similar logic as the TermInSetQuery ctor related to sorting?

Apologies for leading us in the wrong direction in the issue I opened. I think a List is a bit more semantically appropriate than a Collection, but I'm not sure it's best from a pragmatic standpoint? What do you think?

@shubhamvishu
Copy link
Contributor Author

@gsmiller I totally agree its not helpful if the input/data Collection is sorted since its a unnecessary overhead to convert to a List here. As per TermInSetQuery ctor its checking if the collection is of SortedSet type and uses natural ordering. But what if the user is explicitly passing a List (or any Collection for that matter) which is already sorted but is not SortedSet?
Could we expose another #build function as build(Collection<BytesRef> input, boolean asBinary, boolean isSorted) or for better maybe makeStringUnion(Collection<BytesRef> utf8Strings, boolean isSorted) (similar to the one in tests in this PR) to completely bypass the sorting/checks if incase isSorted == true else we could have the same check as in TermInSetquery ctor and sort the terms incase they are not before moving on in Automata#makeStringUnion as you mentioned? Let me know your thoughts?

@gsmiller
Copy link
Contributor

Yeah, good points/questions! I'd be curious how much overhead it would actually add to sort the input when it's already sorted?

But to take a step back for a moment, we also have automata building methods that accept a BytesRefIterator as well, which also must be sorted. This is a situation where we really cannot sort on behalf of the caller, so it might be a bit confusing/trappy to sort some flavors of this method but not others? Maybe it's best to leave these methods as they are?

If we want to make these functions a bit more user-friendly, we could look at changing the assert on line 276 of StringsToAutomaton to throw an explicit IllegalArgumentException so that we don't silently built a corrupt automaton on unordered input (with asserts disabled). There would add overhead since we have to now keep track of the previous term all the time, but maybe it's worth benchmarking and considering this change? I do think it's best to explicitly let the user know they passed invalid input in a case like this, so it would be a nice change if it didn't introduce a significant performance drag.

@gautamworah96
Copy link
Contributor

I was looking at this problem too and the change in test classes to meld arbitrary collections to Lists irked me as well.

+1 to throw an IllegalArgumentException in #add instead of having the user debug the error later on.

On a related note, since we only traverse the items in their given order, maybe we can relax StringsToAutomaton#build(Collection<BytesRef>, boolean) to StringsToAutomaton#build(Iterable<BytesRef> input, boolean). This change will also make it more consistent with the build(BytesRefIterator input, boolean asBinary) method.

@shubhamvishu
Copy link
Contributor Author

This is a situation where we really cannot sort on behalf of the caller, so it might be a bit confusing/trappy to sort some flavors of this method but not others? Maybe it's best to leave these methods as they are?

Agreed @gsmiller! Yes it does seem maybe its better leave this as is.

we could look at changing the assert on line 276 of StringsToAutomaton to throw an explicit IllegalArgumentException so that we don't silently built a corrupt automaton on unordered input (with asserts disabled). There would add overhead since we have to now keep track of the previous term all the time, but maybe it's worth benchmarking and considering this change?

I like the idea to throw IAE if wrong input is provided. I think this would only affect the cases with assert disabled? otherwise with asserts enabled we anyways always keep track of previous term.

maybe we can relax StringsToAutomaton#build(Collection, boolean) to StringsToAutomaton#build(Iterable input, boolean). This change will also make it more consistent with the build(BytesRefIterator input, boolean asBinary) method.

Good point @gautamworah96. Iterable seems like a better way instead of Collection and would indeed be consistent with other method.

@shubhamvishu
Copy link
Contributor Author

On the same note, since both the methods expects Iterable or Iterators why do we even need 2 separate methods here which are doing exactly the same thing i.e. iterating over the ByteRef's and adding to automaton.

It would have been much better if BytesRefIterator implemented the Iterable, Iterator interfaces in which case we could just have one method StringsToAutomaton#build(Iterable) which takes an Iterable. I don't see why do we even have a separate BytesRefIterator interface and not just using Iterator instead(maybe because its legacy code?) or is it that I'm missing something important here?

@gsmiller
Copy link
Contributor

I like the idea to throw IAE if wrong input is provided. I think this would only affect the cases with assert disabled? otherwise with asserts enabled we anyways always keep track of previous term.

That's right. I imagine a lot of users disable asserts when running in a production environment for performance reasons, so the only concern here is that we're adding some new work for them to do this validation. I still think the explicit validation is better in this situation, but the counter-argument would be a user that knows for sure they're always properly providing sorted input and don't want the overhead of the validation.

As for Iterable vs. Collection, I agree that Iterable is more appropriate for the API. Whether-or-not it's worth making that change though in a back-compat way is maybe a different question?

I don't see why do we even have a separate BytesRefIterator interface and not just using Iterator instead(maybe because its legacy code?) or is it that I'm missing something important here?

I think it's so that next can throw IOException? You can't do that with Iterator<BytesRef>.

@shubhamvishu
Copy link
Contributor Author

I still think the explicit validation is better in this situation, but the counter-argument would be a user that knows for sure they're always properly providing sorted input and don't want the overhead of the validation.

Agreed! As we discussed above one way could be to sort based on user input that if the collection is sorted or not but since that will be for only one flavour it might be confusing. I'll try to benchmark this change to see if its causing any regression or not.

I think it's so that next can throw IOException? You can't do that with Iterator.

Ahh I see. That makes sense. Thanks @gsmiller !

@shubhamvishu
Copy link
Contributor Author

@gsmiller I finally got a chance to run the benchmarks for this change. Below are the results(looks all good to me). Let me know what do you think? Thanks!

                  TaskQPS baseline  StdDevQPS my_modified_version      StdDev      Pct diff             p-value
                      AndHighMed       96.52      (6.5%)       95.49      (6.3%)   -1.1% ( -12% -   12%) 0.597
                     LowSpanNear      133.61      (4.7%)      132.45      (4.0%)   -0.9% (  -9% -    8%) 0.528
                       OrHighMed       67.95      (3.2%)       67.37      (4.6%)   -0.9% (  -8% -    7%) 0.492
                     MedSpanNear        5.79      (5.2%)        5.74      (4.2%)   -0.8% (  -9% -    8%) 0.573
                    OrNotHighLow      322.30      (2.3%)      319.70      (2.0%)   -0.8% (  -4% -    3%) 0.233
            BrowseDateSSDVFacets        0.90      (8.2%)        0.89      (9.4%)   -0.8% ( -17% -   18%) 0.773
                      AndHighLow      355.80      (4.2%)      352.99      (4.1%)   -0.8% (  -8% -    7%) 0.550
            MedTermDayTaxoFacets        5.85      (6.2%)        5.81      (5.7%)   -0.7% ( -11% -   11%) 0.713
                    OrNotHighMed      222.70      (3.6%)      221.51      (4.0%)   -0.5% (  -7% -    7%) 0.657
                    OrHighNotMed      194.41      (3.6%)      193.40      (3.4%)   -0.5% (  -7% -    6%) 0.641
                       MedPhrase       44.71      (3.1%)       44.50      (3.4%)   -0.5% (  -6% -    6%) 0.639
                   OrNotHighHigh      179.60      (3.7%)      178.85      (3.8%)   -0.4% (  -7% -    7%) 0.726
                    HighSpanNear        8.33      (4.6%)        8.30      (4.1%)   -0.4% (  -8% -    8%) 0.765
                    OrHighNotLow      331.53      (3.8%)      330.15      (3.7%)   -0.4% (  -7% -    7%) 0.728
                   OrHighNotHigh      248.18      (3.5%)      247.20      (3.4%)   -0.4% (  -7% -    6%) 0.718
             MedIntervalsOrdered        4.66      (4.6%)        4.64      (4.3%)   -0.4% (  -8% -    8%) 0.780
          OrHighMedDayTaxoFacets        4.39      (3.9%)        4.38      (4.2%)   -0.3% (  -8% -    8%) 0.820
                       OrHighLow      569.67      (2.6%)      568.04      (2.6%)   -0.3% (  -5% -    5%) 0.733
             LowIntervalsOrdered        2.64      (4.6%)        2.64      (4.6%)   -0.2% (  -8% -    9%) 0.899
            HighTermTitleBDVSort        5.20      (4.0%)        5.19      (4.1%)   -0.2% (  -7% -    8%) 0.895
                      OrHighHigh       35.47      (4.4%)       35.42      (5.8%)   -0.1% (  -9% -   10%) 0.929
       BrowseDayOfYearTaxoFacets        3.80     (15.9%)        3.80     (16.0%)   -0.1% ( -27% -   37%) 0.984
            BrowseDateTaxoFacets        3.78     (15.5%)        3.78     (15.5%)   -0.0% ( -26% -   36%) 0.995
                     AndHighHigh       16.35      (3.6%)       16.35      (7.5%)    0.0% ( -10% -   11%) 0.996
         AndHighMedDayTaxoFacets       49.40      (1.6%)       49.41      (1.7%)    0.0% (  -3% -    3%) 0.975
                       LowPhrase      254.05      (3.6%)      254.16      (4.0%)    0.0% (  -7% -    7%) 0.972
                      HighPhrase       40.53      (3.1%)       40.56      (3.6%)    0.1% (  -6% -    6%) 0.939
                        PKLookup      131.32      (2.3%)      131.44      (2.0%)    0.1% (  -4% -    4%) 0.891
     BrowseRandomLabelTaxoFacets        3.28     (15.0%)        3.28     (14.8%)    0.1% ( -25% -   35%) 0.983
           BrowseMonthTaxoFacets        4.13     (34.7%)        4.13     (34.7%)    0.1% ( -51% -  106%) 0.991
                          Fuzzy1       65.72      (1.1%)       65.81      (1.2%)    0.1% (  -2% -    2%) 0.714
                         LowTerm      513.23      (2.9%)      513.99      (2.9%)    0.1% (  -5% -    6%) 0.870
            HighIntervalsOrdered        1.62      (5.3%)        1.62      (5.0%)    0.2% (  -9% -   11%) 0.909
       BrowseDayOfYearSSDVFacets        3.75     (12.4%)        3.76      (8.6%)    0.2% ( -18% -   24%) 0.956
                         MedTerm      382.67      (3.7%)      383.49      (3.9%)    0.2% (  -7% -    8%) 0.857
                 LowSloppyPhrase       12.31      (2.4%)       12.34      (2.1%)    0.2% (  -4% -    4%) 0.754
                         Respell       36.92      (1.6%)       37.02      (1.6%)    0.3% (  -2% -    3%) 0.617
                 MedSloppyPhrase       22.87      (3.2%)       22.94      (3.0%)    0.3% (  -5% -    6%) 0.751
                        HighTerm      251.80      (4.3%)      252.92      (4.6%)    0.4% (  -8% -    9%) 0.752
        AndHighHighDayTaxoFacets        7.39      (2.7%)        7.43      (2.6%)    0.5% (  -4% -    5%) 0.571
                          Fuzzy2       39.80      (1.4%)       40.02      (1.8%)    0.5% (  -2% -    3%) 0.288
                HighSloppyPhrase        1.88      (4.3%)        1.89      (4.2%)    0.6% (  -7% -    9%) 0.657
                        Wildcard       39.56      (3.9%)       39.82      (2.7%)    0.7% (  -5% -    7%) 0.537
                         Prefix3       77.73      (6.4%)       78.39      (5.8%)    0.8% ( -10% -   13%) 0.663
               HighTermMonthSort     2196.21      (3.5%)     2217.20      (3.5%)    1.0% (  -5% -    8%) 0.389
           HighTermDayOfYearSort      200.38      (3.6%)      203.21      (3.2%)    1.4% (  -5% -    8%) 0.186
           BrowseMonthSSDVFacets        4.46     (13.1%)        4.53      (9.4%)    1.5% ( -18% -   27%) 0.687
     BrowseRandomLabelSSDVFacets        2.68      (7.7%)        2.73      (4.5%)    1.6% (  -9% -   14%) 0.421
                          IntNRQ       17.08      (4.8%)       17.41      (7.4%)    1.9% (  -9% -   14%) 0.335
                      TermDTSort      102.14      (2.9%)      104.31      (5.5%)    2.1% (  -6% -   10%) 0.127
               HighTermTitleSort       38.07     (11.1%)       39.53      (9.0%)    3.8% ( -14% -   26%) 0.228

@mikemccand
Copy link
Member

Below are the results(looks all good to me). Let me know what do you think? Thanks!

+1 -- looks like just noise to me.

@gsmiller
Copy link
Contributor

+1 looks good to me as well. I like that this small change, 1) makes the API a little more general, allowing users to provide any Iterable instead of Collection, and 2) adds an explicit check for expected ordering of the iterable using IAE instead of relying on asserts being enabled. Thanks for following up!

@gsmiller gsmiller merged commit 9cffd4b into apache:main Oct 30, 2023
4 checks passed
gsmiller pushed a commit that referenced this pull request Oct 30, 2023
…rted (#12427)

Also update the API to accept more general Iterable<BytesRef> instead of Collection<BytesRef>
@gsmiller gsmiller added this to the 9.9.0 milestone Oct 31, 2023
@benwtrent
Copy link
Member

Something weird is happening. This commit is causing the following failure:

./gradlew test --tests TestUnifiedHighlighterTermIntervals.testCustomFieldValueSource -Dtests.seed=9E4091CE5D400BA3

I am starting to investigate, but folks involved with this change might find the issue quicker.

  >     java.lang.NoSuchMethodError: 'org.apache.lucene.util.automaton.Automaton org.apache.lucene.util.automaton.Automata.makeStringUnion(java.util.Collection)'
   >         at org.apache.lucene.search.uhighlight.CharArrayMatcher.fromTerms(CharArrayMatcher.java:42)
   >         at org.apache.lucene.search.uhighlight.MemoryIndexOffsetStrategy.buildCombinedAutomaton(MemoryIndexOffsetStrategy.java:71)
   >         at org.apache.lucene.search.uhighlight.MemoryIndexOffsetStrategy.<init>(MemoryIndexOffsetStrategy.java:53)
   >         at org.apache.lucene.search.uhighlight.UnifiedHighlighter.getOffsetStrategy(UnifiedHighlighter.java:1238)
   >         at org.apache.lucene.search.uhighlight.UnifiedHighlighter.getFieldHighlighter(UnifiedHighlighter.java:1072)
   >         at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFieldsAsObjects(UnifiedHighlighter.java:880)
   >         at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:814)
   >         at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:792)
   >         at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlight(UnifiedHighlighter.java:725)
   >         at org.apache.lucene.search.uhighlight.TestUnifiedHighlighterTermIntervals.testCustomFieldValueSource(TestUnifiedHighlighterTermIntervals.java:607)
   >         at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
   >         at java.base/java.lang.reflect.Method.invoke(Method.java:580)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   >         at junit@4.13.1/org.junit.rules.RunRules.evaluate(RunRules.java:20)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
   >         at junit@4.13.1/org.junit.rules.RunRules.evaluate(RunRules.java:20)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
   >         at java.base/java.lang.Thread.run(Thread.java:1583)
   >
   >     java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still 7 open files: {_0_Lucene90_0.pos=1, _0_Lucene90_0.doc=1, _0_Lucene90_0.tim=1, _0.nvd=1, _0.fdx=1, _0.fdt=1, _0_Lucene90_0.tip=1}
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:876)
   >         at org.apache.lucene.search.uhighlight.TestUnifiedHighlighterTermIntervals.doAfter(TestUnifiedHighlighterTermIntervals.java:84)
   >         at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
   >         at java.base/java.lang.reflect.Method.invoke(Method.java:580)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:1004)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   >         at junit@4.13.1/org.junit.rules.RunRules.evaluate(RunRules.java:20)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
   >         at junit@4.13.1/org.junit.rules.RunRules.evaluate(RunRules.java:20)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
   >         at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
   >         at java.base/java.lang.Thread.run(Thread.java:1583)
   >
   >         Caused by:
   >         java.lang.RuntimeException: unclosed IndexInput: _0.nvd
   >             at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.store.MockDirectoryWrapper.addFileHandle(MockDirectoryWrapper.java:783)
   >             at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.store.MockDirectoryWrapper.openInput(MockDirectoryWrapper.java:835)
   >             at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.codecs.lucene90.Lucene90NormsProducer.<init>(Lucene90NormsProducer.java:84)
   >             at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.codecs.lucene90.Lucene90NormsFormat.normsProducer(Lucene90NormsFormat.java:96)
   >             at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.codecs.asserting.AssertingNormsFormat.normsProducer(AssertingNormsFormat.java:46)
   >             at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:108)
   >             at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:95)
   >             at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:180)
   >             at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:222)
   >             at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.IndexWriter.lambda$getReader$0(IndexWriter.java:539)
   >             at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:138)
   >             at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:601)
   >             at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.IndexWriter$4.getReader(IndexWriter.java:6510)
   >             at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.index.RandomIndexWriter.getReader(RandomIndexWriter.java:508)
   >             at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.index.RandomIndexWriter.getReader(RandomIndexWriter.java:426)
   >             at org.apache.lucene.search.uhighlight.TestUnifiedHighlighterTermIntervals.testCustomFieldValueSource(TestUnifiedHighlighterTermIntervals.java:581)
   >             at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
   >             at java.base/java.lang.reflect.Method.invoke(Method.java:580)
   >             at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
   >             at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
   >             at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
   >             at randomizedtesting.runner@2.8.1/com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)

@gsmiller
Copy link
Contributor

gsmiller commented Nov 1, 2023

@benwtrent is this an issue of the code only being partially rebuilt? The signature for Automata#makeStringUnion was changed to accept a more general Iterable as opposed to a Collection in this change. Let me see if I can reproduce it locally as well.

UPDATE: It didn't repro for me locally. Have you tried a ./gradlew clean and then rerunning the test?

@benwtrent
Copy link
Member

@gsmiller you are 100% correct 🤦

@gsmiller
Copy link
Contributor

gsmiller commented Nov 1, 2023

@benwtrent glad it was a simple fix. Sorry it created churn!

@dweiss
Copy link
Contributor

dweiss commented Nov 1, 2023

See #12742 - it's a problem with our javac task settings. Running a clean is a workaround. I will provide a fix for this when I get a chance, it's not a critical issue (but it is a bug in how we currently set up task dependencies).

@shubhamvishu
Copy link
Contributor Author

Nice!....I like it how we discovered a build issue bug(which was hidden for sometime now I think) and fixed it. Thanks for solving @dweiss!

rmuir pushed a commit to rmuir/lucene that referenced this pull request Nov 2, 2023
…rted (apache#12427)

Also update the API to accept more general Iterable<BytesRef> instead of Collection<BytesRef>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DaciukMihovAutomatonBuilder#build should probably take a List instead of a Collection
6 participants