I've been playing with a C++ implementation of BooleanQuery containing
only OR'd (SHOULD) TermQuery clauses, collecting top N hits by score.
The results are impressive: a ~3X speedup for BQ OR over two terms, and
good speedups (~38-78%) for Fuzzy1/2 too, since they rewrite to BQ OR
over N terms:
Task              QPS base   StdDev  QPS comp   StdDev  Pct diff
MedTerm              69.47  (15.8%)     68.61  (13.4%)     -1.2%  ( -26% -  33%)
HighTerm             55.25  (16.2%)     54.63  (13.9%)     -1.1%  ( -26% -  34%)
LowTerm             333.10   (9.6%)    329.43   (8.0%)     -1.1%  ( -17% -  18%)
IntNRQ                3.37   (2.6%)      3.36   (4.6%)     -0.2%  (  -7% -   7%)
Prefix3              18.91   (2.0%)     19.04   (3.5%)      0.7%  (  -4% -   6%)
Wildcard             29.40   (1.7%)     29.70   (2.8%)      1.0%  (  -3% -   5%)
MedPhrase           132.69   (6.2%)    134.66   (7.0%)      1.5%  ( -11% -  15%)
HighSloppyPhrase      0.82   (3.6%)      0.83   (3.5%)      1.9%  (  -5% -   9%)
AndHighHigh          19.65   (0.6%)     20.02   (0.8%)      1.9%  (   0% -   3%)
HighPhrase           11.74   (6.6%)     11.96   (7.1%)      1.9%  ( -11% -  16%)
MedSloppyPhrase      29.09   (1.2%)     29.76   (1.9%)      2.3%  (   0% -   5%)
LowSloppyPhrase      25.71   (1.4%)     26.98   (1.7%)      4.9%  (   1% -   8%)
Respell             173.78   (3.0%)    182.41   (3.7%)      5.0%  (  -1% -  12%)
MedSpanNear          27.67   (2.5%)     29.07   (2.4%)      5.1%  (   0% -  10%)
HighSpanNear          2.95   (2.4%)      3.10   (2.8%)      5.4%  (   0% -  10%)
LowSpanNear           8.29   (3.4%)      8.82   (3.3%)      6.4%  (   0% -  13%)
AndHighMed           79.32   (1.6%)     84.44   (1.0%)      6.5%  (   3% -   9%)
LowPhrase            23.20   (2.0%)     25.14   (1.6%)      8.4%  (   4% -  12%)
AndHighLow          594.17   (3.4%)    660.32   (1.9%)     11.1%  (   5% -  16%)
Fuzzy2               88.32   (6.4%)    121.44   (1.7%)     37.5%  (  27% -  48%)
Fuzzy1               86.34   (6.0%)    153.49   (1.7%)     77.8%  (  66% -  90%)
OrHighHigh           16.29   (2.5%)     48.29   (1.3%)    196.5%  ( 188% - 205%)
OrHighMed            28.98   (2.7%)     87.81   (0.9%)    203.0%  ( 194% - 212%)
OrHighLow            27.38   (2.6%)     84.94   (1.1%)    210.3%  ( 201% - 219%)
This is essentially a scaled-back attempt at #2668, in that it's
"hardwired" to "just" the "OR of TermQuery" case.
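
To make the specialized case concrete, here is a minimal C++ sketch of "OR of term clauses, collect top N by score". It is not the code from the attached patch. The names Posting, Hit, WorseOnTop and searchOrTopN are hypothetical, and it assumes each term's postings are already decoded into memory, sorted by doc ID, with a precomputed per-document score contribution. A real implementation would decode postings and compute scores on the fly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical in-memory posting: one (doc, score contribution) pair per term hit.
struct Posting {
    int32_t doc;   // document ID; each posting list is sorted by ascending doc
    float score;   // this term's score contribution for the document
};

struct Hit {
    int32_t doc;
    float score;
};

// Comparator for std::priority_queue that keeps the *worst* hit on top.
// Returns true when a is better than b: higher score, or equal score and lower doc ID.
struct WorseOnTop {
    bool operator()(const Hit& a, const Hit& b) const {
        if (a.score != b.score) return a.score > b.score;
        return a.doc < b.doc;
    }
};

// Scores the disjunction of all term posting lists, one document at a time,
// and returns the best topN hits ordered best first.
std::vector<Hit> searchOrTopN(const std::vector<std::vector<Posting>>& terms,
                              std::size_t topN) {
    assert(topN > 0);
    std::vector<std::size_t> pos(terms.size(), 0);  // cursor into each posting list
    std::priority_queue<Hit, std::vector<Hit>, WorseOnTop> heap;

    for (;;) {
        // Find the smallest doc ID that any cursor currently points at.
        bool any = false;
        int32_t doc = 0;
        for (std::size_t t = 0; t < terms.size(); ++t) {
            if (pos[t] < terms[t].size()) {
                int32_t d = terms[t][pos[t]].doc;
                if (!any || d < doc) { doc = d; any = true; }
            }
        }
        if (!any) break;  // every posting list is exhausted

        // Sum contributions from every term that matches this doc, then advance.
        float score = 0.0f;
        for (std::size_t t = 0; t < terms.size(); ++t) {
            if (pos[t] < terms[t].size() && terms[t][pos[t]].doc == doc) {
                score += terms[t][pos[t]].score;
                ++pos[t];
            }
        }

        // Keep only the topN best hits seen so far.
        Hit hit{doc, score};
        if (heap.size() < topN) {
            heap.push(hit);
        } else if (WorseOnTop{}(hit, heap.top())) {
            heap.pop();
            heap.push(hit);
        }
    }

    // Pop worst-first into the back of the result so it ends up best-first.
    std::vector<Hit> out(heap.size());
    for (std::size_t i = out.size(); i-- > 0;) {
        out[i] = heap.top();
        heap.pop();
    }
    return out;
}
```

The linear scan for the next doc ID keeps the sketch short and is reasonable for two terms. For the N-term case (such as the Fuzzy rewrites) a heap of cursors or windowed score accumulation would likely scale better.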
Migrated from LUCENE-5049 by Michael McCandless (@mikemccand), 1 vote, updated Jun 22 2013
Attachments: LUCENE-5049.patch