Conversation

linyunanit
Contributor

Description

Add a configurable term threshold to PhraseQuery.Builder to prevent excessive memory usage when searching ultra-long text.

Key Changes

  • Add a termThreshold field to PhraseQuery.Builder.
  • Check whether the current number of terms has reached the threshold.
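The two changes listed above can be sketched roughly as follows. This is a self-contained stand-in rather than Lucene's code: the class `SketchPhraseBuilder`, its `String` terms, the `setTermThreshold` setter shape, and the exception message are all hypothetical. Only the condition `termThreshold > 0 && terms.size() >= termThreshold` and the use of `IllegalArgumentException` are taken from the diff discussed in this PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a phrase builder; not Lucene's PhraseQuery.Builder. */
class SketchPhraseBuilder {
  private final List<String> terms = new ArrayList<>();
  // Assumption: a non-positive threshold means "no limit", mirroring the "> 0" guard below.
  private int termThreshold = 0;

  SketchPhraseBuilder setTermThreshold(int termThreshold) {
    this.termThreshold = termThreshold;
    return this;
  }

  SketchPhraseBuilder add(String term) {
    // Reject the term before storing it, so the list never grows past the threshold.
    if (termThreshold > 0 && terms.size() >= termThreshold) {
      throw new IllegalArgumentException(
          "Term count " + terms.size() + " has reached the threshold " + termThreshold);
    }
    terms.add(term);
    return this;
  }

  int size() {
    return terms.size();
  }
}

class ThresholdSketchDemo {
  public static void main(String[] args) {
    SketchPhraseBuilder builder = new SketchPhraseBuilder().setTermThreshold(2);
    builder.add("quick").add("fox");
    try {
      builder.add("jumps"); // third term exceeds the threshold of 2
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Because the check runs before the term is stored, memory use stays bounded by the threshold no matter how long the input text is.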

nickyulin added 3 commits October 13, 2025 14:51
… the problem of excessive memory usage in ultra-long text search case
…lve the problem of excessive memory usage in ultra-long text search case
…lve the problem of excessive memory usage in ultra-long text search case
Contributor

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

…lve the problem of excessive memory usage in ultra-long text search case
Contributor

@dweiss dweiss left a comment

Please add changes entry as well.

}
if (termThreshold > 0 && terms.size() >= termThreshold) {
  throw new IllegalArgumentException(
      "The current value of terms is "
Contributor

Suggested change
- "The current value of terms is "
+ "The current number of terms is "

Contributor Author

Thanks for the suggestion, Dawid! 👍


public void testPhraseQueryTermLimit() throws Exception {
  PhraseQuery.Builder builder = new PhraseQuery.Builder();
  int termLimit = 1000;
Contributor

Use a smaller termLimit (5); it makes little sense to run the test with 1000 terms. Also, I'd change the loops to use positions 0..termLimit-1 and then check for failure at termLimit, which seems more intuitive.

Contributor Author

Good points, @dweiss! I've made the following changes based on your feedback:

  • Reduced termLimit from 1000 to 5 to make the test more lightweight
  • Changed the position logic to use 0..termLimit-1 and check for failure at position termLimit

This does make the test more intuitive. Thanks for the review!
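The revised test shape described in this reply (a small limit, positions 0..termLimit-1 accepted, failure expected at position termLimit) might look roughly like the sketch below. It is not the test that was committed: `LimitedBuilder` is a hypothetical stand-in so the example compiles on its own, and `limitIsEnforced` is an illustrative helper rather than a JUnit test method.

```java
import java.util.ArrayList;
import java.util.List;

class TermLimitTestSketch {
  /** Hypothetical minimal builder with a term limit; stands in for the real class. */
  static final class LimitedBuilder {
    private final List<String> terms = new ArrayList<>();
    private final List<Integer> positions = new ArrayList<>();
    private final int maxTerms;

    LimitedBuilder(int maxTerms) {
      this.maxTerms = maxTerms;
    }

    void add(String term, int position) {
      // Same guard shape as in the reviewed diff: a positive limit that has been reached.
      if (maxTerms > 0 && terms.size() >= maxTerms) {
        throw new IllegalArgumentException("too many terms: limit is " + maxTerms);
      }
      terms.add(term);
      positions.add(position);
    }
  }

  /** Returns true if termLimit adds succeed and the add at position termLimit is rejected. */
  static boolean limitIsEnforced(int termLimit) {
    LimitedBuilder builder = new LimitedBuilder(termLimit);
    // Positions 0..termLimit-1 must all be accepted.
    for (int position = 0; position < termLimit; position++) {
      builder.add("term" + position, position);
    }
    // The add at position termLimit must fail.
    try {
      builder.add("term" + termLimit, termLimit);
      return false;
    } catch (IllegalArgumentException expected) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("limit enforced: " + limitIsEnforced(5));
  }
}
```

With a limit of 5 the loop performs only five adds before the failing one, which keeps the test cheap while still exercising the boundary.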

@dweiss dweiss added this to the 10.4.0 milestone Oct 13, 2025
@linyunanit linyunanit requested a review from dweiss October 13, 2025 12:22
Contributor

@dweiss dweiss left a comment

I'm sorry this comes in multiple requests. Could we also rename the issue and changes entry to make it clear to people as to what's changed? I'd suggest "Add PhraseQuery.Builder.setMaxTerms() method to limit the maximum number of terms and excessive memory use."

I would also rename the method accordingly and add javadocs to it that say what happens when you exceed the threshold (IllegalArgumentException is thrown). Thank you.

@linyunanit linyunanit changed the title Solve the problem of excessive memory usage in ultra-long text search case Add PhraseQuery.Builder.setMaxTerms() method to limit the maximum number of terms and excessive memory use Oct 13, 2025
@linyunanit
Contributor Author

Thanks for the review, @dweiss! Great suggestions. I've made the following changes:

  1. Renamed the method: Changed setTermThreshold(int value) to setMaxTerms(int maxTerms) as suggested.
  2. Updated Javadoc: Added clear documentation explaining that IllegalArgumentException is thrown when the threshold is exceeded.
  3. Updated the issue/changes entry: Modified to make it clear to people as to what's changed.
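As an illustration of items 1 and 2, a renamed setter with Javadoc stating the failure mode could read something like the following. The class `MaxTermsBuilderSketch`, the `String` term type, the fluent return type, and the "non-positive means unlimited" rule are assumptions of this sketch; the merged Lucene source and its Javadoc wording may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in illustrating the renamed setter; not the merged Lucene source. */
class MaxTermsBuilderSketch {
  private final List<String> terms = new ArrayList<>();
  // Assumption: non-positive means unlimited, mirroring the "> 0" guard in the reviewed diff.
  private int maxTerms = 0;

  /**
   * Sets the maximum number of terms this builder accepts.
   *
   * <p>Once {@code maxTerms} terms have been added, the next call to {@link #add(String)}
   * throws {@link IllegalArgumentException}. A non-positive value disables the limit
   * (an assumption of this sketch).
   *
   * @param maxTerms the maximum number of terms, or a non-positive value for no limit
   * @return this builder
   */
  MaxTermsBuilderSketch setMaxTerms(int maxTerms) {
    this.maxTerms = maxTerms;
    return this;
  }

  /**
   * Adds a term.
   *
   * @throws IllegalArgumentException if the configured maximum has already been reached
   */
  MaxTermsBuilderSketch add(String term) {
    if (maxTerms > 0 && terms.size() >= maxTerms) {
      throw new IllegalArgumentException(
          "The current number of terms is " + terms.size() + ", which reaches maxTerms=" + maxTerms);
    }
    terms.add(term);
    return this;
  }

  int termCount() {
    return terms.size();
  }
}
```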

Looking forward to getting this merged!

@dweiss dweiss merged commit e3184cb into apache:main Oct 13, 2025
12 checks passed
dweiss pushed a commit that referenced this pull request Oct 13, 2025
…ber of terms and excessive memory use (#15332)

* perf: Added configurable limit for PhraseQuery#builder terms to solve the problem of excessive memory usage in ultra-long text search case

* perf: Added configurable term threshold for PhraseQuery#Builder to solve the problem of excessive memory usage in ultra-long text search case

* perf: Added configurable term threshold for PhraseQuery#Builder to solve the problem of excessive memory usage in ultra-long text search case

* perf: Added configurable term threshold for PhraseQuery#Builder to solve the problem of excessive memory usage in ultra-long text search case

* add changes entry

* Add PhraseQuery.Builder.setMaxTerms() method to limit the maximum number of terms and excessive memory use.

* Optimizing test cases:testPhraseQueryMaxTerms#testPhraseQueryTermLimit

---------

Co-authored-by: nickyulin <nickyulin@tencent.com>