Suspected highlighting bug for phrase queries built with SpanNearQuery [LUCENE-10609] #11645

@asfimport

Description

document: Blockchain technology 5G technology VR Technology AI Technology
analyzer: WhitespaceAnalyzer
query: spanNear([spanNear([title:Blockchain, title:technology], 0, true), spanNear([title:VR, title:Technology], 0, true)], 2, true)

 

// Query code
// Inner span 1: "Blockchain" immediately followed by "technology" (slop 0, in order)
SpanQuery termQuery_sub01 = new SpanTermQuery(new Term("title", "Blockchain"));
SpanQuery termQuery_sub02 = new SpanTermQuery(new Term("title", "technology"));
SpanNearQuery spanNearQuery_Sub01 = new SpanNearQuery(new SpanQuery[] { termQuery_sub01, termQuery_sub02 }, 0, true);
// Inner span 2: "VR" immediately followed by "Technology" (slop 0, in order)
SpanQuery termQuery_sub03 = new SpanTermQuery(new Term("title", "VR"));
SpanQuery termQuery_sub04 = new SpanTermQuery(new Term("title", "Technology"));
SpanNearQuery spanNearQuery_Sub02 = new SpanNearQuery(new SpanQuery[] { termQuery_sub03, termQuery_sub04 }, 0, true);
// Outer span: inner span 1 followed by inner span 2, with a slop of 2 (in order)
SpanNearQuery spanNearQuery = new SpanNearQuery(new SpanQuery[] { spanNearQuery_Sub01, spanNearQuery_Sub02 }, 2, true);
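To make the positions concrete, here is a minimal plain-Java sketch (no Lucene dependency) of why the outer query can match. It assumes WhitespaceAnalyzer yields one token per whitespace-separated word with consecutive positions, and that the slop of an ordered span-near is compared against the number of positions between the end of one sub-span and the start of the next. The class `SpanSlopSketch` and its methods are hypothetical names used only for illustration; this is a simplified model, not Lucene's implementation.

```java
// Simplified model of ordered span matching over token positions; not Lucene code.
class SpanSlopSketch {

    // Returns {start, end} (end exclusive) of the first exact, in-order occurrence
    // of `phrase` starting at or after position `from`, or null if there is none.
    static int[] findPhrase(String[] tokens, String[] phrase, int from) {
        for (int i = Math.max(from, 0); i + phrase.length <= tokens.length; i++) {
            boolean match = true;
            for (int j = 0; j < phrase.length; j++) {
                if (!tokens[i + j].equals(phrase[j])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return new int[] { i, i + phrase.length };
            }
        }
        return null;
    }

    // Number of positions between the end of `first` and the start of `second`.
    static int gap(int[] first, int[] second) {
        return second[0] - first[1];
    }

    public static void main(String[] args) {
        String[] tokens =
            "Blockchain technology 5G technology VR Technology AI Technology".split("\\s+");

        int[] a = findPhrase(tokens, new String[] { "Blockchain", "technology" }, 0);
        int[] b = findPhrase(tokens, new String[] { "VR", "Technology" }, a[1]);

        System.out.println("inner span 1: [" + a[0] + ", " + a[1] + ")"); // [0, 2)
        System.out.println("inner span 2: [" + b[0] + ", " + b[1] + ")"); // [4, 6)
        System.out.println("gap: " + gap(a, b));                          // 2
        System.out.println("matches with slop 2: " + (gap(a, b) <= 2));   // true
    }
}
```

Under these assumptions the two inner spans cover positions 0 to 1 and 4 to 5, the gap between them (5G, technology) is 2, and the outer match spans positions 0 to 5.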

The query matches the document, but the highlighting looks wrong.

// Highlight code
// Score fragments against the outer span query built above
QueryScorer scorer = new QueryScorer(spanNearQuery);
// Wrap each highlighted token in square brackets
SimpleHTMLFormatter simpleHtmlFormatter = new SimpleHTMLFormatter("[", "]");
Highlighter highlighter = new Highlighter(simpleHtmlFormatter, scorer);
highlighter.setTextFragmenter(new SimpleFragmenter(100));

highlight result
[Blockchain] [technology] 5G [technology] [VR] [Technology] AI Technology

 

I think "Blockchain technology" and "VR Technology" should be highlighted, but the "technology" in "5G technology" should not be.
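One possible explanation, which is an assumption and not confirmed anywhere in this report: the span-aware highlighter may record only the position range covered by the outer match (positions 0 to 5) and then highlight every query term whose position falls inside that range. The plain-Java sketch below models that rule next to the per-inner-span rule expected above. `RangeHighlightSketch` and `highlight` are hypothetical names; this is not Lucene code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringJoiner;

// Simplified model of position-range highlighting; not Lucene code.
class RangeHighlightSketch {

    // Brackets tokens[i] when it is a query term AND position i lies inside
    // at least one [start, end) range; all other tokens are emitted unchanged.
    static String highlight(String[] tokens, Set<String> terms, int[][] ranges) {
        StringJoiner out = new StringJoiner(" ");
        for (int i = 0; i < tokens.length; i++) {
            boolean inRange = false;
            for (int[] r : ranges) {
                if (i >= r[0] && i < r[1]) {
                    inRange = true;
                    break;
                }
            }
            boolean hit = inRange && terms.contains(tokens[i]);
            out.add(hit ? "[" + tokens[i] + "]" : tokens[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] tokens =
            "Blockchain technology 5G technology VR Technology AI Technology".split("\\s+");
        Set<String> terms =
            new HashSet<>(Arrays.asList("Blockchain", "technology", "VR", "Technology"));

        // Assumed rule: only the outer match range [0, 6) is tracked.
        System.out.println(highlight(tokens, terms, new int[][] { { 0, 6 } }));
        // prints: [Blockchain] [technology] 5G [technology] [VR] [Technology] AI Technology

        // Expected rule: each inner span [0, 2) and [4, 6) is tracked separately.
        System.out.println(highlight(tokens, terms, new int[][] { { 0, 2 }, { 4, 6 } }));
        // prints: [Blockchain] [technology] 5G technology [VR] [Technology] AI Technology
    }
}
```

The first line reproduces the reported result, including the extra "technology" at position 3, while the trailing "Technology" at position 7 stays unhighlighted because it lies outside the outer range. If the highlighter really works this way, the behavior would be a limitation of range-based highlighting more than an error in the query itself.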

I'm not sure whether this is a bug or the intended behavior.


Migrated from LUCENE-10609 by FengFeng Cheng
