@fragosoluana

Description

When a query includes overlapping phrases, the expansion process may generate duplicate phrases: one with the original (possibly high) user-defined boost, and another with a boost of 1. As a result, the final boost value assigned to the QueryPhraseMap may be incorrect, because it is determined by whichever duplicate is processed last when markTerminal builds the QueryPhraseMap.

We can prevent conflicting expanded phrases from overriding the boost by taking the max boost in markTerminal. The expectation is that when duplicate phrases exist, one comes from the original query and the other comes from the expand method with a boost of 1. There should therefore be one phrase with boost > 1 from the original query and another with boost equal to 1 from the expansion. For example, with the expanded phrases [“a b c”: 100, “a b”: 20, “a b c”: 1, “b c”: 50], the final query phrase mapping for “a b c” would be 100.
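The effect on the example above can be sketched in isolation. This is a minimal illustration, not the highlighter's code: `MaxBoostSketch`, `ExpandedPhrase`, and `mergeByMaxBoost` are hypothetical names, and a plain map stands in for the QueryPhraseMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MaxBoostSketch {
    // Hypothetical stand-in for one expanded phrase; not a Lucene type.
    static final class ExpandedPhrase {
        final String text;
        final float boost;

        ExpandedPhrase(String text, float boost) {
            this.text = text;
            this.boost = boost;
        }
    }

    // Keeps the highest boost seen for each phrase text,
    // independent of the order in which duplicates arrive.
    static Map<String, Float> mergeByMaxBoost(ExpandedPhrase... phrases) {
        Map<String, Float> merged = new LinkedHashMap<>();
        for (ExpandedPhrase p : phrases) {
            merged.merge(p.text, p.boost, Float::max);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Float> merged = mergeByMaxBoost(
            new ExpandedPhrase("a b c", 100f),
            new ExpandedPhrase("a b", 20f),
            new ExpandedPhrase("a b c", 1f),   // duplicate produced by expansion
            new ExpandedPhrase("b c", 50f));
        System.out.println(merged);
    }
}
```

With a last-writer-wins assignment, the same input would leave “a b c” at 1 because the expansion duplicate is processed after the original.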

Fixes #15433

  this.terminal = true;
  this.slop = slop;
- this.boost = boost;
+ this.boost = Math.max(this.boost, boost);

Your approach seems simpler, but I think we have an opportunity to optimize this. Instead of always executing all assignments, should we have something like:

    if (!this.terminal || boost > this.boost) {
      this.terminal = true;
      this.slop = slop;
      this.boost = boost;
      this.termOrPhraseNumber = fieldQuery.nextTermOrPhraseNumber();
    }

Benefits:

Performance: Avoids unnecessary state updates when boost doesn't improve

Semantic correctness: termOrPhraseNumber only increments when meaningful changes occur

Cleaner logic: Single condition handles both initialization and duplicate prevention

Current approach with Math.max():

Always updates all fields, even when the incoming boost (1) is lower than the existing boost (100)

Increments termOrPhraseNumber unnecessarily on duplicate calls

Proposed approach:

Only updates on the first call (!this.terminal) or when the boost improves (boost > this.boost)

More efficient for queries with many overlapping phrases

What do you think?
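To make the comparison concrete, here is a small self-contained sketch of the two variants discussed above. It is not the Lucene class: `MarkTerminalSketch`, `Node`, and the per-node `numbersDrawn` counter are hypothetical stand-ins, with the counter playing the role of `fieldQuery.nextTermOrPhraseNumber()`, and it assumes the Math.max variant still draws a new number on every call, as the review comment describes.

```java
public class MarkTerminalSketch {
    // Hypothetical stand-in for the terminal state of a QueryPhraseMap node.
    static final class Node {
        boolean terminal;
        int slop;
        float boost;
        int termOrPhraseNumber;
        int numbersDrawn; // counts how many phrase numbers were requested

        // Assumed stand-in for fieldQuery.nextTermOrPhraseNumber().
        int nextTermOrPhraseNumber() {
            return numbersDrawn++;
        }

        // Variant from the PR: every call rewrites the fields; boost keeps the max.
        void markTerminalMax(int slop, float boost) {
            this.terminal = true;
            this.slop = slop;
            this.boost = Math.max(this.boost, boost);
            this.termOrPhraseNumber = nextTermOrPhraseNumber();
        }

        // Variant from the review: update only on the first call or on a higher boost.
        void markTerminalGuarded(int slop, float boost) {
            if (!this.terminal || boost > this.boost) {
                this.terminal = true;
                this.slop = slop;
                this.boost = boost;
                this.termOrPhraseNumber = nextTermOrPhraseNumber();
            }
        }
    }

    public static void main(String[] args) {
        Node max = new Node();
        max.markTerminalMax(0, 100f);
        max.markTerminalMax(2, 1f);      // duplicate from expansion

        Node guarded = new Node();
        guarded.markTerminalGuarded(0, 100f);
        guarded.markTerminalGuarded(2, 1f);

        System.out.println("max:     boost=" + max.boost + " slop=" + max.slop
            + " numbersDrawn=" + max.numbersDrawn);
        System.out.println("guarded: boost=" + guarded.boost + " slop=" + guarded.slop
            + " numbersDrawn=" + guarded.numbersDrawn);
    }
}
```

Both variants end with the boost at 100, but they differ in side effects: in this sketch the Math.max variant takes the slop of the later call and draws a second phrase number, while the guarded variant keeps the slop and number that came with the winning boost. Whether that slop behavior is desirable is a separate question for the PR.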

@github-actions
Contributor

github-actions bot commented Dec 4, 2025

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Dec 4, 2025

Development

Successfully merging this pull request may close these issues.

Highlighter QueryPhraseMap boost overridden due to conflicting query expansion
