Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DrillSideways optimizations #11803

Merged
merged 7 commits into from
Sep 29, 2022
Merged

Conversation

gsmiller
Copy link
Contributor

Description

This change makes use of advance instead of next where possible and splits out 1st and 2nd phase checking to avoid match confirmation when unnecessary.

Note that I only focused on the doQueryFirstScoring implementation here and didn't modify the other two scoring approaches. "Progress not perfection" and all that (plus, I think we should strongly consider removing these other two implementations, but we'd want to benchmark to be certain).

Unfortunately, luceneutil doesn't have dedicated drill sideways benchmarks, but some benchmarks on our internal software that makes use of drill sideways showed a +2% QPS improvement and no obvious regressions.

Copy link
Contributor

@zhaih zhaih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM except for one assertion, also do we have existing unit test testing those code? (I hope there's code coverage report here :)

@zhaih
Copy link
Contributor

zhaih commented Sep 27, 2022

Changes LGTM, do we need to add some unit tests?

@gsmiller
Copy link
Contributor Author

Changes LGTM, do we need to add some unit tests?

Thanks @zhaih. Let me consider some specific test for this. I know our randomized testing for DrillSideways covers these code paths but maybe some specific, non-random tests would be useful.

@gsmiller
Copy link
Contributor Author

@zhaih I reexamined our test coverage and think we're in good shape already actually. We've got good coverage for covering drill-sideways correctness with multiple dimensions, etc. (including random and non-random). We could try to take these further by somehow asserting that advance is being used in favor of nextDoc when appropriate, but I think those tests would be reasonably complex to write and I'm not sure they add tremendous value. I'd rather we spent time building drill-sideways benchmarks that focus on ensuring our performance doesn't regress. But that's just my opinion. Please let me know if you feel differently and we can keep discussing. Thanks!

Copy link
Contributor

@zhaih zhaih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @gsmiller, I don't have strong opinion on adding a test because I don't really know what coverage we already got, one fact that makes me a little worry about the coverage is that assertion caught by me should've been caught by unit test and that's why I'm asking. But if you think that'll be covered by randomized test then please feel free to push it!

@gsmiller
Copy link
Contributor Author

@zhaih that's a good point and valid concern. I dug into the existing tests and it looks like we have lots of coverage except that the majority of the coverage is using basic, single-phase drill-down dimensions. I'm going to augment our randomized testing to randomly use two-phase drill-downs to broaden coverage. Thanks for the discussion!

@zhaih
Copy link
Contributor

zhaih commented Sep 28, 2022

@gsmiller Thank you for checking and continuous effort!

@gsmiller
Copy link
Contributor Author

@zhaih well, thank you for keeping me honest with testing. I think I've already found an insidious, potential bug with some beefier tests.

2. remove the "validateState" assertion since it's illegal to call match() more than one for the same doc (state validation would require separately tracking the match results for all two-phase iterators, which doesn't seem worth it)
Copy link
Contributor

@zhaih zhaih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, test LGTM!

@gsmiller gsmiller merged commit d02ba31 into apache:main Sep 29, 2022
@gsmiller gsmiller deleted the GH/drillsideways-opto branch September 29, 2022 12:22
gsmiller added a commit that referenced this pull request Sep 29, 2022
DrillSidewaysScorer now breaks up first- and second-phase matching and makes use of advance when possible over nextDoc.
@rmuir rmuir added this to the 9.5.0 milestone Jan 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants