Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove XPostingsHighlighter #10625

Closed
jpountz opened this issue Apr 16, 2015 · 1 comment
Closed

Remove XPostingsHighlighter #10625

jpountz opened this issue Apr 16, 2015 · 1 comment
Assignees
Labels
PITA :Search/Highlighting How a query matched a document v2.0.0-beta1

Comments

@jpountz
Copy link
Contributor

jpountz commented Apr 16, 2015

This class is a fork of Lucene's PostingsHighlighter. Quoting its javadocs:

FORKED from Lucene 4.5 to be able to:
1) support discrete highlighting for multiple values, so that we can return a different snippet per value when highlighting the whole text
2) call the highlightField method directly from subclasses and provide the terms by ourselves
3) Applied LUCENE-4906 to allow PassageFormatter to return arbitrary objects (LUCENE 4.6)

All our changes start with //BEGIN EDIT

I don't think we can maintain such forks in the long term as reconciliating changes that come from Lucene upgrades with changes that we did ourselves might even be impossible at some point. So I think it's important that everything that we need gets merged back to Lucene's PostingsHighlighter and that we remove XPostingsHighlighter from Elasticsearch.

@jpountz jpountz added :Search/Highlighting How a query matched a document PITA labels Apr 16, 2015
@javanna
Copy link
Member

javanna commented Apr 16, 2015

I agree this is a problem, it is impossible to maintain.

To give you some more background here, the idea behind this experiment was to support the postings highlighter in elasticsearch, but trying to make it work as similar as possible to the other highlighters in terms of supported options and the result they provide.

One problem was supporting fragment_size: 0 which means highlighting the whole content of a field, against multiple values. Other highlighters return an array, one fragment per value, while the postings highlighter from lucene used to return all values in a single string. That's why we added "discrete per value highlighting" as a short-term solution.

One other problem was the require_field_match option that wasn't and probably will never be supported in lucene's postings highlighters, but other highlighters do support it. That's the ability to ignore which fields were queried, and just look for the terms in any field that needs to be highlighted.

In order to get rid of this custom class, either we

  1. accept these differences between the standard postings highlighter from lucene and the other highlighters. We document them and we break bw comp as for the above two behaviours

  2. find a way to contribute these two changes back to lucene

To be honest, I don't see 2) happen, especially for require_field_match so I'd vote for 1). At the end of the day, it is a different highlighter and we already tell people to try out the different highlighters without expecting exactly the same results.

javanna added a commit to javanna/elasticsearch that referenced this issue May 9, 2015
Our own fork of the lucene PostingsHighlighter is not easy to maintain and doesn't give us any added value at this point. In particular, it was introduced to support the require_field_match option and discrete per value highlighting, used in case one wants to highlight the whole content of a field, but get back one snippet per value. These two features won't
 make it into lucene as they slow things down and shouldn't have been supported from day one on our end probably.

One other customization we had was support for a wider range of queries via custom rewrite etc. (yet another way to slow
 things down), which got added to lucene and works much much better than what we used to do (instead of or rewrite, term
s are pulled out of the automata for multi term queries).

Removing our fork means the following in terms of features:
- dropped support for require_field_match: the postings highlighter will only highlight fields that were queried
- the output is different compared to other highlighters in case `fragment_size` is set to 0: one single snippet is returned in case a field has multiple values, rather than one highlighted snipper per value

Closes elastic#10625
javanna added a commit to javanna/elasticsearch that referenced this issue May 9, 2015
Our own fork of the lucene PostingsHighlighter is not easy to maintain and doesn't give us any added value at this point. In particular, it was introduced to support the require_field_match option and discrete per value highlighting, used in case one wants to highlight the whole content of a field, but get back one snippet per value. These two features won't
 make it into lucene as they slow things down and shouldn't have been supported from day one on our end probably.

One other customization we had was support for a wider range of queries via custom rewrite etc. (yet another way to slow
 things down), which got added to lucene and works much much better than what we used to do (instead of or rewrite, term
s are pulled out of the automata for multi term queries).

Removing our fork means the following in terms of features:
- dropped support for require_field_match: the postings highlighter will only highlight fields that were queried
- the output is different compared to other highlighters in case `fragment_size` is set to 0: one single snippet is returned in case a field has multiple values, rather than one highlighted snipper per value

Closes elastic#10625
Closes elastic#11077
javanna added a commit to javanna/elasticsearch that referenced this issue May 9, 2015
Our own fork of the lucene PostingsHighlighter is not easy to maintain and doesn't give us any added value at this point. In particular, it was introduced to support the require_field_match option and discrete per value highlighting, used in case one wants to highlight the whole content of a field, but get back one snippet per value. These two features won't
 make it into lucene as they slow things down and shouldn't have been supported from day one on our end probably.

One other customization we had was support for a wider range of queries via custom rewrite etc. (yet another way to slow
 things down), which got added to lucene and works much much better than what we used to do (instead of or rewrite, term
s are pulled out of the automata for multi term queries).

Removing our fork means the following in terms of features:
- dropped support for require_field_match: the postings highlighter will only highlight fields that were queried
- the output is different compared to other highlighters in case `fragment_size` is set to 0: one single snippet is returned in case a field has multiple values, rather than one highlighted snipper per value

Closes elastic#10625
Closes elastic#11077
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PITA :Search/Highlighting How a query matched a document v2.0.0-beta1
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants