Postings highlighter wrong highlighting #4103

Closed
javanna opened this issue Nov 6, 2013 · 5 comments

@javanna
Member

javanna commented Nov 6, 2013

As reported in a comment on #4042, the postings highlighter behaves oddly: it seems to remember the offsets of previously highlighted documents. What actually happens is that it highlights the right text against the wrong offsets, because of what is probably the silliest mistake one can make when working with Lucene: using doc ids that are relative to the segment they belong to as if they were unique, instead of the top-level doc ids that include the segment offset too.
It is surprising that our tests didn't catch this, as we have pretty good coverage for the postings highlighter. I will improve that too.
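
To illustrate the mistake described above, here is a minimal sketch in plain Java. It does not use Lucene and is not the actual fix from the commit: the class, the method names, and the offsets data are all hypothetical. It only assumes what the comment states, namely that a per-segment doc id has to be combined with the segment's offset (in Lucene 4.x this is exposed as `AtomicReaderContext.docBase`) to get an id that is unique across the whole index.

```java
class DocIdSketch {

    // Made-up data: start offset of the term to highlight, indexed by TOP-LEVEL doc id.
    // Segment A holds top-level docs 0 and 1 (docBase 0),
    // segment B holds top-level docs 2 and 3 (docBase 2).
    static final int[] OFFSETS_BY_TOP_LEVEL_ID = {4, 17, 23, 9};

    // Hypothetical helper: segment-relative id plus segment offset gives the top-level id.
    static int topLevelId(int docBase, int segmentDocId) {
        return docBase + segmentDocId;
    }

    // The mistake: the segment-relative id is used as if it were unique index-wide.
    static int buggyLookup(int segmentDocId) {
        return OFFSETS_BY_TOP_LEVEL_ID[segmentDocId];
    }

    // The corrected lookup: convert to the top-level id first.
    static int fixedLookup(int docBase, int segmentDocId) {
        return OFFSETS_BY_TOP_LEVEL_ID[topLevelId(docBase, segmentDocId)];
    }

    public static void main(String[] args) {
        int docBaseOfSegmentB = 2;
        int firstDocInSegmentB = 0; // segment-relative id

        // Picks up the offset of top-level doc 0, which is a different document.
        System.out.println("buggy offset: " + buggyLookup(firstDocInSegmentB));
        // Picks up the offset of top-level doc 2, the document actually being highlighted.
        System.out.println("fixed offset: " + fixedLookup(docBaseOfSegmentB, firstDocInSegmentB));
    }
}
```

With a single segment the two lookups agree, since the offset is 0. That would be consistent with a test that rarely refreshes not catching the problem, though this is only a guess about why the original tests passed.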

@ghost ghost assigned javanna Nov 6, 2013
@javanna javanna closed this as completed in a3e355d Nov 6, 2013
javanna added a commit that referenced this issue Nov 6, 2013
… topLevelId rather than just docId

Improved test to catch this problem calling refresh more frequently and having the word to highlight in different positions in the text

Closes #4103
@roytmana

roytmana commented Nov 6, 2013

Hi Luca,

The prior test works with the new commit; however, my original issue of many fields not being highlighted at all remains.
I test it by switching from the plain to the postings highlighter, and the difference is dramatic with my real mapping and data: very few fields are now highlighted with postings.

I am trying to reproduce it with a recreation, but so far I could not.

To remind you of the progression of fixes to the highlighter:

  1. The first version worked fine with simple tokens but not with wildcards.
  2. With the wildcard enhancement, I had some results randomly not highlighted, and the small recreation showed scrambled highlighting.
  3. With the latest fix, a lot more results (the majority) do not get highlighted, but I can't recreate it on a small mapping/data set.

Do you have any suggestions for a recreation? Does highlighting with postings depend on the search query at all?

Here is a recreation with a few mappings similar to my real ones: https://gist.github.com/roytmana/7336502
But it works.

Any suggestions towards recreating the issue would be very helpful.

Alex

@clintongormley

@roytmana I had an example of one not highlighting at all from your example yesterday. When searching for "photo*" or "photo" or "photography", " equipment" was highlighted, but "photography" or "photography equipment" wasn't highlighted.

I also got different responses depending on whether I used query_string, match, or match_phrase_prefix.

Sorry, I can't provide more details at the moment.

@roytmana

roytmana commented Nov 6, 2013

This test works now with the latest 0.90 snapshot, but I still have the issue of fields not being highlighted at all, which I could not reproduce with a recreation so far :-(

@javanna
Member Author

javanna commented Nov 6, 2013

Hi @roytmana, thanks a lot for your feedback!

What you called "scrambled highlighting" has been solved :)

I wonder if you are seeing a regression or a problem that has always been there. It sounds odd that the difference is dramatic and yet you can't reproduce it. It might depend on either your queries or your analysis chain; it would be great if you could open another issue with some examples of what doesn't work.

@roytmana

roytmana commented Nov 6, 2013

Well, it worked (except for the wildcards) in the very first drop I tested (remember when I posted some performance stats). Then it had scrambled highlighting, with some results not highlighted at all, and now it highlights practically nothing.

I will open a new issue based on my last comment. I will attach the sample recreation with the analysis chain I use and a sample query, but it does not show the issue. I can post my real mapping and a few real queries if that is of any help. Hopefully it may give you some ideas, and you could suggest steps for troubleshooting or recreating it. Is there anything else I can provide?

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
… topLevelId rather than just docId

Improved test to catch this problem calling refresh more frequently and having the word to highlight in different positions in the text

Closes elastic#4103