-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Add PhraseQuery.Builder.setMaxTerms() method to limit the maximum number of terms and excessive memory use #15332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add PhraseQuery.Builder.setMaxTerms() method to limit the maximum number of terms and excessive memory use #15332
Conversation
… the problem of excessive memory usage in ultra-long text search case
…lve the problem of excessive memory usage in ultra-long text search case
…lve the problem of excessive memory usage in ultra-long text search case
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
…lve the problem of excessive memory usage in ultra-long text search case
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add changes entry as well.
} | ||
if (termThreshold > 0 && terms.size() >= termThreshold) { | ||
throw new IllegalArgumentException( | ||
"The current value of terms is " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The current value of terms is " | |
"The current number of terms is " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion, Dawid! 👍
|
||
public void testPhraseQueryTermLimit() throws Exception { | ||
PhraseQuery.Builder builder = new PhraseQuery.Builder(); | ||
int termLimit = 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a smaller termLimit (5) - it makes little sense to make the test run for 1000 terms. Also, I'd change the loops to not use positions 0..termLimit-1 and then check for failure on termLimit - seems more intuitive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points, @dweiss! I've made the following changes based on your feedback:
- Reduced
termLimit
from 1000 to 5 to make the test more lightweight - Changed the position logic to use
0..termLimit-1
and check for failure at positiontermLimit
This does make the test more intuitive. Thanks for the review!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry this comes in multiple requests. Could we also rename the issue and changes entry to make it clear to people as to what's changed? I'd suggest "Add PhraseQuery.Builder.setMaxTerms() method to limit the maximum number of terms and excessive memory use."
I would also rename the method accordingly and add javadocs to it that say what happens when you exceed the threshold (IllegalArgumentException is thrown). Thank you.
…ber of terms and excessive memory use.
Thanks for the review, @dweiss! Great suggestions. I've made the following changes:
Looking forward to getting this merged! |
…ber of terms and excessive memory use (#15332) * perf: Added configurable limit for PhraseQuery#builder terms to solve the problem of excessive memory usage in ultra-long text search case * perf: Added configurable term threshold for PhraseQuery#Builder to solve the problem of excessive memory usage in ultra-long text search case * perf: Added configurable term threshold for PhraseQuery#Builder to solve the problem of excessive memory usage in ultra-long text search case * perf: Added configurable term threshold for PhraseQuery#Builder to solve the problem of excessive memory usage in ultra-long text search case * add changes entry * Add PhraseQuery.Builder.setMaxTerms() method to limit the maximum number of terms and excessive memory use. * Optimizing test cases:testPhraseQueryMaxTerms#testPhraseQueryTermLimit --------- Co-authored-by: nickyulin <nickyulin@tencent.com>
Description
Add configurable term threshold for PhraseQuery#Builder to solve the problem of excessive memory usage in ultra-long text search case.
Key Changes