Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LUCENE-10624] Binary Search for Sparse IndexedDISI advanceWithinBloc… #968

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

wuwm
Copy link
Contributor

@wuwm wuwm commented Jun 20, 2022

Description (or a Jira issue link if you have one)

https://issues.apache.org/jira/browse/LUCENE-10624

#11660

@zacharymorn
Copy link
Contributor

Thanks @wuwm for opening this PR! The improvement idea makes sense to me. Quick question though, given the similarities of the binary search implementations in the two methods, is it possible to extract them out into a common method?

@wuwm wuwm force-pushed the main branch 2 times, most recently from d4ec532 to e9e39f3 Compare June 22, 2022 15:22
@wuwm
Copy link
Contributor Author

wuwm commented Jun 22, 2022

Thanks @zacharymorn for comments!

There are some implementation diff inside binary search between two methods to handle some edge cases. To make binary search into a single common method, I thought of two ways: A) Add a lambda expression as method parameter to pass in implementation diff part. B) Add if-else to the combined single method. I felt both of them may make readability worse?

Also, I feel, for future changes, since these two methods serve different use cases, future optimization may vary as well (Not an expert in this part, feel free to correct me). Combing into a common method might make future changes complex.

@zacharymorn
Copy link
Contributor

Hmm I see. I'm actually also wondering if it will be possible to have one of them simply delegate to the other one (potentially indirectly via some helper method), and then check the returned value (e.g. have advanceExactWithinBlock to delegate somehow to advanceWithinBlock and then check if the positioned doc is actually equal to target)?

@yuzhoujianxia
Copy link

Can we get this merged?

@wuwm
Copy link
Contributor Author

wuwm commented Jun 30, 2022

@yuzhoujianxia There are some discussion on if binary or exponential search can cause performance regression in some use cases. We need to address the concerns before merging. https://issues.apache.org/jira/browse/LUCENE-10624

@mikemccand
Copy link
Member

This sounds like a nice optimization @wuwm! Is it still relevant?

Lucene's nightly benchmarks include somewhat sparse documents (NYC taxi database) -- maybe this could be used to test the impact of exponential vs full binary search?

Copy link

github-actions bot commented Jan 8, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jan 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants