Native (C++) implementation of "pure OR" BooleanQuery [LUCENE-5049] #6113

I've been playing with a C++ implementation of BooleanQuery containing
only OR'd (SHOULD) TermQuery clauses, collecting top N hits by score.

The results are impressive: ~3X speedup for BQ OR over two terms (the
OrHigh* tasks), and good speedups (~38-78%) for Fuzzy1/2, since they
rewrite to BQ OR over N terms:

                    Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
                 MedTerm       69.47     (15.8%)       68.61     (13.4%)   -1.2% ( -26% -   33%)
                HighTerm       55.25     (16.2%)       54.63     (13.9%)   -1.1% ( -26% -   34%)
                 LowTerm      333.10      (9.6%)      329.43      (8.0%)   -1.1% ( -17% -   18%)
                  IntNRQ        3.37      (2.6%)        3.36      (4.6%)   -0.2% (  -7% -    7%)
                 Prefix3       18.91      (2.0%)       19.04      (3.5%)    0.7% (  -4% -    6%)
                Wildcard       29.40      (1.7%)       29.70      (2.8%)    1.0% (  -3% -    5%)
               MedPhrase      132.69      (6.2%)      134.66      (7.0%)    1.5% ( -11% -   15%)
        HighSloppyPhrase        0.82      (3.6%)        0.83      (3.5%)    1.9% (  -5% -    9%)
             AndHighHigh       19.65      (0.6%)       20.02      (0.8%)    1.9% (   0% -    3%)
              HighPhrase       11.74      (6.6%)       11.96      (7.1%)    1.9% ( -11% -   16%)
         MedSloppyPhrase       29.09      (1.2%)       29.76      (1.9%)    2.3% (   0% -    5%)
         LowSloppyPhrase       25.71      (1.4%)       26.98      (1.7%)    4.9% (   1% -    8%)
                 Respell      173.78      (3.0%)      182.41      (3.7%)    5.0% (  -1% -   12%)
             MedSpanNear       27.67      (2.5%)       29.07      (2.4%)    5.1% (   0% -   10%)
            HighSpanNear        2.95      (2.4%)        3.10      (2.8%)    5.4% (   0% -   10%)
             LowSpanNear        8.29      (3.4%)        8.82      (3.3%)    6.4% (   0% -   13%)
              AndHighMed       79.32      (1.6%)       84.44      (1.0%)    6.5% (   3% -    9%)
               LowPhrase       23.20      (2.0%)       25.14      (1.6%)    8.4% (   4% -   12%)
              AndHighLow      594.17      (3.4%)      660.32      (1.9%)   11.1% (   5% -   16%)
                  Fuzzy2       88.32      (6.4%)      121.44      (1.7%)   37.5% (  27% -   48%)
                  Fuzzy1       86.34      (6.0%)      153.49      (1.7%)   77.8% (  66% -   90%)
              OrHighHigh       16.29      (2.5%)       48.29      (1.3%)  196.5% ( 188% -  205%)
               OrHighMed       28.98      (2.7%)       87.81      (0.9%)  203.0% ( 194% -  212%)
               OrHighLow       27.38      (2.6%)       84.94      (1.1%)  210.3% ( 201% -  219%)
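
To make the "pure OR" case concrete, here is a minimal sketch of scoring a disjunction of term clauses and keeping the top N hits by score. It is not the code from the attached patch: `Posting`, `TermClause`, `Hit` and `searchOr` are hypothetical names, the postings are plain in-memory vectors rather than Lucene's index format, and `weight * freq` is only a placeholder for the real similarity formula.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical in-memory postings entry; not Lucene's on-disk format.
struct Posting {
    int32_t doc;   // document ID
    int32_t freq;  // term frequency in that document
};

// One SHOULD clause: a term's postings plus a precomputed weight.
struct TermClause {
    std::vector<Posting> postings;
    float weight;  // placeholder for the real per-term scoring factor
};

struct Hit {
    int32_t doc;
    float score;
};

// Scores the OR of all clauses term-at-a-time and returns the top n hits,
// best first. Ties on score are broken in favor of the lower document ID.
// Assumes positive weights and freqs, so a zero accumulator means "no match".
std::vector<Hit> searchOr(const std::vector<TermClause>& clauses,
                          int32_t maxDoc, std::size_t n) {
    // Dense per-document score accumulator.
    std::vector<float> acc(static_cast<std::size_t>(maxDoc), 0.0f);
    for (const TermClause& c : clauses) {
        for (const Posting& p : c.postings) {
            acc[p.doc] += c.weight * static_cast<float>(p.freq);
        }
    }

    // Used as the heap's ordering, this keeps the weakest of the
    // current top n at the top of the priority queue.
    auto better = [](const Hit& a, const Hit& b) {
        return a.score > b.score || (a.score == b.score && a.doc < b.doc);
    };
    std::priority_queue<Hit, std::vector<Hit>, decltype(better)> heap(better);

    for (int32_t doc = 0; doc < maxDoc; ++doc) {
        if (acc[doc] <= 0.0f) continue;  // no clause matched this document
        Hit h{doc, acc[doc]};
        if (heap.size() < n) {
            heap.push(h);
        } else if (n > 0 && better(h, heap.top())) {
            heap.pop();  // evict the weakest hit
            heap.push(h);
        }
    }

    // Drain the heap from weakest to best, filling the result back to front.
    std::vector<Hit> out(heap.size());
    for (std::size_t i = out.size(); i-- > 0;) {
        out[i] = heap.top();
        heap.pop();
    }
    return out;
}
```

The dense accumulator keeps the inner loop to a single add per posting, at the cost of memory proportional to `maxDoc`. A real implementation would more likely score in fixed-size windows of document IDs to stay cache-friendly.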

This is essentially a scaled-back attempt at #2668, in that it's
"hardwired" to "just" the "OR of TermQuery" case.


Migrated from LUCENE-5049 by Michael McCandless (@mikemccand), 1 vote, updated Jun 22 2013
Attachments: LUCENE-5049.patch
