Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete all live docs when query matched a whole segment. #13395

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

vsop-479
Copy link
Contributor

@vsop-479 vsop-479 commented May 21, 2024

Description

We can delete all live docs when a query matched a whole segment, in FrozenBufferedUpdates.applyQueryDeletes.

I have implemented it for PointRangeQuery, and also trying to implement it for other queries.

@vsop-479
Copy link
Contributor Author

@mikemccand
Please take a look when you get a chance.

Copy link
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am nervous about this change -- altering the very widely used Scorer API to express whether it matches all documents feels invasive.

Also, the use case this is optimizing for, feels rare? Though, perhaps if an index sorts its segments by time range, and collates documents into segments by time range as well, delete by range query over a time range might more efficiently drop whole segments if we could do something like this cleanly? Even so, it's quite rare that an index would invoke this pruning, and the likely smallish time it takes today is fine?

@@ -98,4 +100,14 @@ public int advanceShallow(int target) throws IOException {
* {@link #advanceShallow(int) shallow-advanced} to included and {@code upTo} included.
*/
public abstract float getMaxScore(int upTo) throws IOException;

/** Set whether we can match all docs in this scored segment. */
public void setMatchAll(boolean isMatchAll) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm adding this API to such a widely used class (Scorer) makes me nervous -- it means consumers of Scorer can suddenly setMatchAll without being true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it means consumers of Scorer can suddenly setMatchAll without being true?

Yes, I should found a better way to mark isMatchAll.

@vsop-479
Copy link
Contributor Author

Also, the use case this is optimizing for, feels rare?

Even so, it's quite rare that an index would invoke this pruning, and the likely smallish time it takes today is fine?

Yes, delete by range query over a time range, is the only case which came to my mind, but the improve maybe will not be significant.

Copy link

github-actions bot commented Jun 7, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jun 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants