DrillSideways optimizations #11803

gsmiller · 2022-09-21T22:56:37Z

Description

This change uses advance instead of nextDoc where possible and splits first- and second-phase checking so that match confirmation is skipped when it is unnecessary.
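
For readers unfamiliar with the pattern, here is a minimal sketch of the two ideas in plain Java. It is a toy model, not the code in this PR: `DocIterator`, `Dim`, and `score` are hypothetical stand-ins for Lucene's iterator and two-phase abstractions, and the skipping rule (jump the base iterator to the second-largest dimension position when more than one dimension misses) is an assumption about one reasonable way to do it.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

/** Toy model of drill-sideways scoring; none of these types are Lucene classes. */
final class DrillSidewaysSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Iterator over a sorted array of doc IDs. */
  static final class DocIterator {
    private final int[] docs;
    private int idx = -1;

    DocIterator(int... sortedDocs) {
      this.docs = sortedDocs;
    }

    int docID() {
      return idx < 0 ? -1 : idx >= docs.length ? NO_MORE_DOCS : docs[idx];
    }

    int nextDoc() {
      idx = Math.min(idx + 1, docs.length);
      return docID();
    }

    /** Jumps to the first doc >= target that lies beyond the current position. */
    int advance(int target) {
      int from = Math.min(idx + 1, docs.length);
      int pos = Arrays.binarySearch(docs, from, docs.length, target);
      idx = pos >= 0 ? pos : -pos - 1;
      return docID();
    }
  }

  /** One drill-down dimension: a cheap approximation plus a costly confirmation. */
  static final class Dim {
    final DocIterator approximation;
    final IntPredicate matches;
    int confirmations = 0;

    Dim(DocIterator approximation, IntPredicate matches) {
      this.approximation = approximation;
      this.matches = matches;
    }

    boolean confirm(int doc) {
      confirmations++;
      return matches.test(doc);
    }
  }

  /** Returns {hits, nearMisses, baseDocsVisited}. */
  static int[] score(DocIterator base, Dim... dims) {
    int hits = 0, nearMisses = 0, visited = 0;
    int doc = base.nextDoc();
    while (doc != NO_MORE_DOCS) {
      visited++;
      // Phase 1: position every approximation on or after doc; no confirmation yet.
      int misses = 0, largest = -1, secondLargest = -1;
      for (Dim dim : dims) {
        DocIterator approx = dim.approximation;
        int pos = approx.docID() < doc ? approx.advance(doc) : approx.docID();
        if (pos != doc) misses++;
        if (pos > largest) {
          secondLargest = largest;
          largest = pos;
        } else if (pos > secondLargest) {
          secondLargest = pos;
        }
      }
      if (misses > 1) {
        // At least two dims sit beyond every doc before secondLargest, so skip straight there.
        doc = base.advance(secondLargest);
        continue;
      }
      // Phase 2: confirm only the dims whose approximation landed on doc.
      for (Dim dim : dims) {
        if (dim.approximation.docID() == doc && !dim.confirm(doc) && ++misses > 1) break;
      }
      if (misses == 0) hits++;
      else if (misses == 1) nearMisses++;
      doc = base.nextDoc();
    }
    return new int[] {hits, nearMisses, visited};
  }

  public static void main(String[] args) {
    Dim a = new Dim(new DocIterator(2, 5, 8), d -> d != 5);
    Dim b = new Dim(new DocIterator(2, 5, 9), d -> true);
    DocIterator base = new DocIterator(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
    System.out.println(Arrays.toString(score(base, a, b)));
  }
}
```

With the toy data in `main`, the base iterator visits 7 of its 10 docs and each dimension runs its confirmation 3 times, printing `[1, 3, 7]`: one full hit (doc 2) and three docs that miss exactly one dimension (5, 8, 9), which are the ones that would feed sideways counts.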

Note that I only focused on the doQueryFirstScoring implementation here and didn't modify the other two scoring approaches. "Progress not perfection" and all that (plus, I think we should strongly consider removing these other two implementations, but we'd want to benchmark to be certain).

Unfortunately, luceneutil doesn't have dedicated drill sideways benchmarks, but some benchmarks on our internal software that makes use of drill sideways showed a +2% QPS improvement and no obvious regressions.

zhaih

Overall LGTM except for one assertion. Also, do we have existing unit tests covering this code? (I hope there's a code coverage report here :)

lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java

zhaih · 2022-09-27T04:59:27Z

Changes LGTM, do we need to add some unit tests?

gsmiller · 2022-09-27T14:02:24Z

> Changes LGTM, do we need to add some unit tests?

Thanks @zhaih. Let me consider some specific test for this. I know our randomized testing for DrillSideways covers these code paths but maybe some specific, non-random tests would be useful.

gsmiller · 2022-09-28T16:57:50Z

@zhaih I reexamined our test coverage and think we're actually in good shape already. We've got good coverage of drill-sideways correctness with multiple dimensions, etc. (both random and non-random). We could try to take this further by somehow asserting that advance is being used in favor of nextDoc when appropriate, but I think those tests would be reasonably complex to write and I'm not sure they add tremendous value. I'd rather we spent time building drill-sideways benchmarks that focus on ensuring our performance doesn't regress. But that's just my opinion. Please let me know if you feel differently and we can keep discussing. Thanks!
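
As a rough illustration of what "asserting that advance is being used" could look like, here is a small sketch with a call-counting wrapper. Everything in it is hypothetical (the `DocIterator` interface, `CountingIterator`, and the `skipTo` helper standing in for the code under test); it does not use Lucene's classes, and a real test against DrillSidewaysScorer would need considerably more plumbing.

```java
import java.util.Arrays;

/** Toy sketch: count iterator calls so a test can assert how a consumer skipped. */
final class CountingIteratorSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical minimal iterator contract (not Lucene's DocIdSetIterator). */
  interface DocIterator {
    int docID();

    int nextDoc();

    int advance(int target);
  }

  /** Iterator over a sorted array of doc IDs. */
  static final class ArrayIterator implements DocIterator {
    private final int[] docs;
    private int idx = -1;

    ArrayIterator(int... sortedDocs) {
      this.docs = sortedDocs;
    }

    public int docID() {
      return idx < 0 ? -1 : idx >= docs.length ? NO_MORE_DOCS : docs[idx];
    }

    public int nextDoc() {
      idx = Math.min(idx + 1, docs.length);
      return docID();
    }

    public int advance(int target) {
      int from = Math.min(idx + 1, docs.length);
      int pos = Arrays.binarySearch(docs, from, docs.length, target);
      idx = pos >= 0 ? pos : -pos - 1;
      return docID();
    }
  }

  /** Delegating wrapper that records how often each method is called. */
  static final class CountingIterator implements DocIterator {
    private final DocIterator in;
    int nextDocCalls = 0;
    int advanceCalls = 0;

    CountingIterator(DocIterator in) {
      this.in = in;
    }

    public int docID() {
      return in.docID();
    }

    public int nextDoc() {
      nextDocCalls++;
      return in.nextDoc();
    }

    public int advance(int target) {
      advanceCalls++;
      return in.advance(target);
    }
  }

  /** Stand-in for the code under test: should skip with one advance call, never a nextDoc loop. */
  static int skipTo(DocIterator it, int target) {
    return it.docID() >= target ? it.docID() : it.advance(target);
  }
}
```

A test would wrap the iterator, run the consumer, and then check that `advanceCalls` is non-zero and `nextDocCalls` stays at the expected value. The complexity mentioned above comes from doing this through the real scorer rather than a helper like `skipTo`.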

zhaih

Hi @gsmiller, I don't have a strong opinion on adding a test because I don't really know what coverage we already have. One thing that makes me a little worried about the coverage is that the assertion I caught should've been caught by a unit test, which is why I'm asking. But if you think that'll be covered by the randomized tests, then please feel free to push it!

gsmiller · 2022-09-28T20:59:32Z

@zhaih that's a good point and valid concern. I dug into the existing tests and it looks like we have lots of coverage except that the majority of the coverage is using basic, single-phase drill-down dimensions. I'm going to augment our randomized testing to randomly use two-phase drill-downs to broaden coverage. Thanks for the discussion!
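
The idea of randomly switching a dimension between single-phase and two-phase form can be sketched roughly as follows. This is a toy illustration using `BitSet`, not the actual test change: `collect` and `runRandomized` are hypothetical names, and the two-phase form is modeled as a random superset approximation followed by a confirmation check against the true matches.

```java
import java.util.BitSet;
import java.util.Random;

/** Toy sketch of a randomized test that sometimes exercises a two-phase code path. */
final class RandomTwoPhaseSketch {

  /** Collects a dimension's matches, either directly or via approximation plus confirmation. */
  static BitSet collect(BitSet matching, int maxDoc, boolean twoPhase, Random random) {
    if (!twoPhase) {
      return (BitSet) matching.clone();
    }
    // Approximation: the true matches plus random false positives.
    BitSet approximation = (BitSet) matching.clone();
    for (int doc = 0; doc < maxDoc; doc++) {
      if (random.nextInt(4) == 0) {
        approximation.set(doc);
      }
    }
    // Second phase: confirm each approximate match exactly once.
    BitSet confirmed = new BitSet(maxDoc);
    for (int doc = approximation.nextSetBit(0); doc >= 0; doc = approximation.nextSetBit(doc + 1)) {
      if (matching.get(doc)) {
        confirmed.set(doc);
      }
    }
    return confirmed;
  }

  /** Returns true if every random iteration yields the same matches in whichever form was chosen. */
  static boolean runRandomized(long seed, int iterations) {
    Random random = new Random(seed);
    for (int i = 0; i < iterations; i++) {
      int maxDoc = 1 + random.nextInt(200);
      BitSet matching = new BitSet(maxDoc);
      for (int doc = 0; doc < maxDoc; doc++) {
        if (random.nextBoolean()) {
          matching.set(doc);
        }
      }
      boolean twoPhase = random.nextBoolean(); // randomly pick the form under test
      if (!collect(matching, maxDoc, twoPhase, random).equals(matching)) {
        return false;
      }
    }
    return true;
  }
}
```

The point of the random choice is that both forms must produce identical results, so any divergence in the two-phase path shows up as a failure for some seed.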

zhaih · 2022-09-28T22:49:28Z

@gsmiller Thank you for checking, and for your continued effort!

gsmiller · 2022-09-28T23:49:51Z

@zhaih well, thank you for keeping me honest with testing. I think I've already found an insidious, potential bug with some beefier tests.

2. remove the "validateState" assertion since it's illegal to call match() more than once for the same doc (state validation would require separately tracking the match results for all two-phase iterators, which doesn't seem worth it)

zhaih

Thank you, test LGTM!

DrillSidewaysScorer now breaks up first- and second-phase matching and makes use of advance when possible over nextDoc.

zhaih reviewed Sep 26, 2022

zhaih approved these changes Sep 28, 2022

gsmiller added 7 commits September 28, 2022 17:36

DrillSideways uses advance instead of next when multiple dims miss

a19a1ba

changes

f259479

more optimizations

4ec419a

changes update and spotless

0379598

pr feedback and slightly more aggressive doc advance in two-phase misses

677ea21

static comparators

dc10f35

1. augment randomized testing to cover two-phase case

e0ce888

2. remove the "validateState" assertion since it's illegal to call match() more than once for the same doc (state validation would require separately tracking the match results for all two-phase iterators, which doesn't seem worth it)

gsmiller force-pushed the GH/drillsideways-opto branch from f8f1f27 to e0ce888 Compare September 29, 2022 01:16

zhaih approved these changes Sep 29, 2022

gsmiller merged commit d02ba31 into apache:main Sep 29, 2022

gsmiller deleted the GH/drillsideways-opto branch September 29, 2022 12:22

gsmiller added a commit that referenced this pull request Sep 29, 2022

DrillSideways optimizations (#11803)

b71d0fc

DrillSidewaysScorer now breaks up first- and second-phase matching and makes use of advance when possible over nextDoc.

rmuir added this to the 9.5.0 milestone Jan 17, 2023
