Fix ordered intervals query and add test case #12214

Merged: 2 commits merged into apache:main on Mar 27, 2023


hongyuyan97
Contributor

Description

This PR addresses issue #12213, "Ordered intervals over interleaved terms".

A more detailed explanation of the issue and the reasoning behind the fix can be found in the linked issue.

@hongyuyan97 hongyuyan97 changed the title fix ordered intervals query and add test case Fix ordered intervals query and add test case Mar 24, 2023
@romseygeek
Contributor

Oh, good catch! I ran a quick experiment to see if this also fixes the bug in LUCENE-9418 but we do unfortunately need to keep the extra minimization boolean. But this is still all good, and the test is great.

Could you add an entry to CHANGES.txt? For now we'll target 9.6.0 as I don't think there's a 9.5.1 bugfix release planned, but we can always backport if one comes up.

@hongyuyan97
Contributor Author

Thank you for your quick reply! I added an entry to Lucene 9.6.0 bug fixes.

@romseygeek romseygeek self-assigned this Mar 27, 2023
@romseygeek romseygeek merged commit a6475ce into apache:main Mar 27, 2023
@romseygeek
Contributor

Thanks @hongyuyan97!

asfgit pushed a commit that referenced this pull request Mar 27, 2023
Given the input text 'A B A C A B C' and the query ORDERED(A, B, C), we should
retrieve hits [0,3] and [4,6]; currently [4,6] is skipped.

After finding the first interval [0,3], the subintervals are A[0,0], B[1,1],
C[3,3]. The algorithm then tries to minimize the interval, and the subintervals
become A[2,2], B[5,5], C[3,3] (minimization stops once it finds that B at 5 is
past C at 3).

When finding the next interval, it calls advance(B) before checking whether B
is already after A (the do-while loop), so the subintervals become A[2,2],
B[inf,inf], C[3,3] and it returns NO_MORE_INTERVALS.

This commit instead continues advancing subintervals from where the last
`nextInterval` call stopped, rather than always advancing all subintervals.
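
To make the expected hits in the commit message concrete, here is a minimal brute-force sketch in Python. It is not Lucene code and does not model the iterator that this PR changes. It assumes whitespace tokenization with 0-based positions, and it treats a hit as minimal when it contains no other candidate span. The function names `positions_by_term` and `minimal_ordered_intervals` are made up for this illustration.

```python
from itertools import product


def positions_by_term(text):
    """Map each whitespace-separated term to its ascending list of positions."""
    positions = {}
    for pos, term in enumerate(text.split()):
        positions.setdefault(term, []).append(pos)
    return positions


def minimal_ordered_intervals(text, terms):
    """Brute-force reference for ORDERED(terms): return all minimal (start, end) spans.

    Hypothetical helper for illustration only. It enumerates every choice of
    one position per term, keeps the choices that are strictly increasing,
    and then drops any span that contains another candidate span.
    """
    positions = positions_by_term(text)
    candidates = set()
    for combo in product(*(positions.get(term, []) for term in terms)):
        # The terms must appear in the requested order.
        if all(a < b for a, b in zip(combo, combo[1:])):
            candidates.add((combo[0], combo[-1]))

    # A span is minimal if no other candidate lies inside it.
    minimal = [
        span
        for span in candidates
        if not any(
            other != span and span[0] <= other[0] and other[1] <= span[1]
            for other in candidates
        )
    ]
    return sorted(minimal)


if __name__ == "__main__":
    print(minimal_ordered_intervals("A B A C A B C", ["A", "B", "C"]))
```

For the text 'A B A C A B C', the candidate spans are [0,3], [0,6], [2,6] and [4,6]. Dropping the two that contain a smaller span leaves [0,3] and [4,6], the hits the commit message says the query should return.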
@hongyuyan97 hongyuyan97 deleted the fix-ordered-interval branch March 28, 2023 02:58