Skip to content

Conversation

@shubhamsrkdev
Copy link
Contributor

@shubhamsrkdev shubhamsrkdev commented Nov 27, 2025

Description

Remove the lambda from TermStates.get(LeafReaderContext). The method now eagerly resolves the term lookup. I am not sure if this change would expressly benefit everyone, feedback here would be greatly appreciated.

Benchmarks

The results looks like noise as the p-value is high for all of them.

               TaskQPS.         baseline   StdDevQPS    candidate     StdDev   Pct-diff             p-value
             MedSloppyPhrase       201.59     (7.2%)      194.49      (8.0%)   -3.5% ( -17% -   12%) 0.141
             LowSloppyPhrase      176.82      (5.6%)      170.78      (7.9%)   -3.4% ( -15% -   10%) 0.114
                  TermDTSort      319.06      (3.6%)      308.68      (7.1%)   -3.3% ( -13% -    7%) 0.068
 BrowseRandomLabelSSDVFacets        3.31     (12.4%)        3.24     (10.6%)   -2.2% ( -22% -   23%) 0.551
       HighTermDayOfYearSort      359.86      (3.3%)      352.33      (5.8%)   -2.1% ( -10% -    7%) 0.159
        HighIntervalsOrdered       66.60      (8.1%)       65.37      (9.9%)   -1.9% ( -18% -   17%) 0.517
         LowIntervalsOrdered       88.21      (6.3%)       86.98      (7.5%)   -1.4% ( -14% -   13%) 0.525
                      IntSet      849.67      (4.2%)      839.73      (5.1%)   -1.2% (  -9% -    8%) 0.425
           HighTermMonthSort     1391.79      (3.4%)     1376.85      (1.8%)   -1.1% (  -6% -    4%) 0.212
                    PKLookup      203.28      (2.6%)      201.17      (2.9%)   -1.0% (  -6% -    4%) 0.237
            HighSloppyPhrase       20.25      (3.3%)       20.08      (3.8%)   -0.8% (  -7% -    6%) 0.459
       BrowseMonthTaxoFacets        2.79      (2.1%)        2.78      (1.3%)   -0.7% (  -3% -    2%) 0.209
         MedIntervalsOrdered      160.80      (5.1%)      159.78      (5.9%)   -0.6% ( -11% -   10%) 0.716
                     LowTerm     1625.14      (3.5%)     1614.93      (2.8%)   -0.6% (  -6% -    5%) 0.532
                 MedSpanNear       15.55      (3.2%)       15.46      (3.6%)   -0.6% (  -7% -    6%) 0.603
                     Respell       60.17      (2.3%)       59.88      (2.2%)   -0.5% (  -4% -    4%) 0.500
                    Wildcard      316.12      (3.4%)      314.60      (3.9%)   -0.5% (  -7% -    7%) 0.678
                   MedPhrase       56.20      (1.3%)       55.94      (1.3%)   -0.5% (  -2% -    2%) 0.264
           HighTermTitleSort      129.14      (3.3%)      128.56      (2.8%)   -0.4% (  -6% -    5%) 0.638
                   OrHighMed      416.92      (2.2%)      415.43      (1.4%)   -0.4% (  -3% -    3%) 0.548
                   LowPhrase       42.86      (1.5%)       42.72      (2.1%)   -0.3% (  -3% -    3%) 0.578
       BrowseMonthSSDVFacets        4.41      (7.0%)        4.40      (7.0%)   -0.2% ( -13% -   14%) 0.929
                HighSpanNear       29.41      (2.4%)       29.37      (2.3%)   -0.1% (  -4% -    4%) 0.858
                   OrHighLow     1054.41      (2.9%)     1053.07      (2.4%)   -0.1% (  -5% -    5%) 0.879
               OrHighNotHigh      530.73      (4.5%)      530.13      (7.0%)   -0.1% ( -11% -   11%) 0.952
                  AndHighMed      634.06      (2.0%)      634.04      (1.9%)   -0.0% (  -3% -    4%) 0.996
                 LowSpanNear      351.51      (1.8%)      351.51      (2.2%)    0.0% (  -3% -    4%) 0.998
                  HighPhrase      157.09      (1.3%)      157.11      (1.6%)    0.0% (  -2% -    2%) 0.982
                  OrHighHigh      298.09      (3.1%)      298.21      (1.8%)    0.0% (  -4% -    5%) 0.958
    AndHighHighDayTaxoFacets       17.20      (1.6%)       17.23      (1.4%)    0.2% (  -2% -    3%) 0.753
     AndHighMedDayTaxoFacets      146.02      (0.9%)      146.31      (0.9%)    0.2% (  -1% -    2%) 0.506
        HighTermTitleBDVSort       47.22      (1.9%)       47.32      (1.9%)    0.2% (  -3% -    4%) 0.723
                      Fuzzy1       78.08      (2.7%)       78.31      (2.3%)    0.3% (  -4% -    5%) 0.710
                      Fuzzy2       64.77      (2.7%)       65.00      (2.2%)    0.4% (  -4% -    5%) 0.648
                     Prefix3      465.74      (3.1%)      467.41      (2.4%)    0.4% (  -5% -    6%) 0.687
        MedTermDayTaxoFacets       31.46      (1.0%)       31.59      (1.1%)    0.4% (  -1% -    2%) 0.216
                OrNotHighMed      490.15      (5.5%)      492.51      (6.0%)    0.5% ( -10% -   12%) 0.791
                OrNotHighLow     1128.09      (2.1%)     1133.61      (3.1%)    0.5% (  -4% -    5%) 0.556
                     MedTerm     1044.63      (4.7%)     1052.38      (2.2%)    0.7% (  -5% -    8%) 0.523
                 AndHighHigh      300.64      (4.1%)      303.14      (1.2%)    0.8% (  -4% -    6%) 0.381
               OrNotHighHigh      429.76      (5.3%)      433.35      (6.4%)    0.8% ( -10% -   13%) 0.653
      OrHighMedDayTaxoFacets       12.51      (2.4%)       12.62      (2.0%)    0.9% (  -3% -    5%) 0.194
                  AndHighLow     1293.91      (4.1%)     1305.93      (2.8%)    0.9% (  -5% -    8%) 0.399
                       range     3640.14      (6.4%)     3674.49      (6.8%)    0.9% ( -11% -   15%) 0.651
                    HighTerm      948.31      (4.4%)      959.25      (2.5%)    1.2% (  -5% -    8%) 0.308
                OrHighNotMed      748.02      (4.4%)      758.29      (4.1%)    1.4% (  -6% -   10%) 0.310
                      IntNRQ     1051.87      (2.1%)     1066.42      (2.5%)    1.4% (  -3% -    6%) 0.062
 BrowseRandomLabelTaxoFacets        2.40      (1.7%)        2.43      (5.8%)    1.4% (  -5% -    9%) 0.303
   BrowseDayOfYearTaxoFacets        3.21      (4.9%)        3.26      (7.8%)    1.6% ( -10% -   14%) 0.439
        BrowseDateTaxoFacets        3.18      (5.0%)        3.24      (8.2%)    1.7% ( -10% -   15%) 0.423
                OrHighNotLow      931.69      (4.6%)      953.01      (4.2%)    2.3% (  -6% -   11%) 0.101
        BrowseDateSSDVFacets        0.88      (9.3%)        0.91      (7.3%)    3.1% ( -12% -   21%) 0.240
   BrowseDayOfYearSSDVFacets        4.52     (11.8%)        4.72     (17.0%)    4.3% ( -21% -   37%) 0.348

Removed lambda function
@benwtrent
Copy link
Member

This was added for prefetching: #13359

How does this perform in scenarios where prefetching is beneficial (e.g low memory)?

@shubhamsrkdev
Copy link
Contributor Author

This was added for prefetching: #13359

How does this perform in scenarios where prefetching is beneficial (e.g low memory)?

Thanks @benwtrent - is there a standard way to test this? How much would we consider low memory?

@epotyom
Copy link
Contributor

epotyom commented Dec 5, 2025

@benwtrent this change gives us significant enough performance boost, I think it is because BooleanWeight has optimizations for clauses which return null scorerSupplier https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java#L293

But looks like it is bad for low memory scenarios, so I wonder if propagating isLoaded result from #prefetch https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/store/MemorySegmentIndexInput.java#L332 to TermStates to decide if we want to use lambda or call termExistsSupplier.get() right away could be the way?

@benwtrent
Copy link
Member

this change gives us significant enough performance boost

I honestly don't fully understand all the code here. I am simply pointing out that this change was made with a particular thing in mind, and that thing seems completely ignored (which we shouldn't do).

But looks like it is bad for low memory scenarios

I am not sure it is?

I am pushing back on this change simply because it is effectively a revert of a previous commit and we need a good reason for that (e.g. do we need prefetching? Are we wanting to handling it differently?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants