Postings highlighter wrong highlighting #4103
… topLevelId rather than just docId. Improved the test to catch this problem by calling refresh more frequently and placing the word to highlight at different positions in the text. Closes #4103
Hi Luca, the prior test works with the new commit; however, my original issue of many fields not being highlighted at all remains. I am trying to reproduce it with a recreation, but so far I have not been able to. To remind you of the progression of fixes to the highlighter:
Do you have any suggestions for a recreation? Does highlighting with postings depend on the search query at all? Here is a recreation with a few mappings similar to my real ones: https://gist.github.com/roytmana/7336502
Any suggestions toward recreating the issue would be very helpful. Alex
@roytmana I had an example of one not highlighting at all from your example yesterday. When searching for …, I also got different responses depending on whether I used query_string, match or match_phrase_prefix. Sorry, I can't provide more details at the moment.
This test works now with the latest 0.90 snapshot, but I am still having the issue of fields not being highlighted at all, which I have not been able to reproduce with a recreation so far :-(
Hi @roytmana, thanks a lot for your feedback! What you called "scrambled highlighting" has been solved :) I wonder whether you are seeing a regression or a problem that has always been there. It sounds weird that you say the difference is dramatic but you can't reproduce it. It might depend on either your queries or your analysis chain; it would be great if you could open another issue with some examples of what doesn't work.
Well, it worked (except for the wildcards) in the very first drop I tested. I will open a new issue based on my last comment.
As reported in a comment on #4042, the postings highlighter behaves weirdly: it seems to remember the offsets of previously highlighted documents. What actually happens is that it highlights the right text against the wrong offsets, most likely because of the silliest mistake one can make when working with Lucene: using doc ids that are relative to the segment they belong to as if they were unique, instead of the top-level doc ids, which include the segment's offset.
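To make the failure mode concrete, here is a minimal Python sketch. It is not the Elasticsearch or Lucene code; it assumes a simplified model in which each segment numbers its documents from 0 and the highlighter caches per-document offsets keyed by doc id. `Segment`, `build_segments`, `highlight` and `doc_base` are hypothetical names used only for illustration. Keying the cache by the segment-relative id lets a document in the second segment reuse the offsets computed for the first segment's doc 0, while adding the segment's base offset yields a unique top-level id.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical model: offset of this segment's first doc in the top-level id space.
    doc_base: int
    texts: list


def build_segments(doc_groups):
    """Lay out segments back to back, assigning each one its base offset."""
    segments, base = [], 0
    for texts in doc_groups:
        segments.append(Segment(base, list(texts)))
        base += len(texts)
    return segments


def offsets_of(text, term):
    """Return (start, end) of the first occurrence of term, or None."""
    start = text.find(term)
    return (start, start + len(term)) if start >= 0 else None


def highlight(segments, term, top_level_keys):
    # Simplified stand-in for per-request state: offsets cached by doc id.
    cache = {}
    results = []
    for seg in segments:
        for local_id, text in enumerate(seg.texts):
            # Bug: the segment-relative id collides across segments.
            # Fix: add doc_base to get a unique top-level id.
            key = seg.doc_base + local_id if top_level_keys else local_id
            if key not in cache:
                cache[key] = offsets_of(text, term)
            span = cache[key]
            if span is None:
                results.append(text)
            else:
                s, e = span
                # Right text, possibly the wrong offsets.
                results.append(text[:s] + "<em>" + text[s:e] + "</em>" + text[e:])
    return results


segs = build_segments([["quick fox"], ["the quick fox"]])
print(highlight(segs, "fox", top_level_keys=False))
print(highlight(segs, "fox", top_level_keys=True))
```

With `top_level_keys=False`, the second document is highlighted against the first document's offsets (`the qu<em>ick</em> fox`), the "right text, wrong offsets" symptom described above. The collision only appears when there is more than one segment and the term sits at different positions in different documents; refreshing more often tends to produce more segments, which is presumably why the improved test does both.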
It's surprising that our tests didn't catch this, as we have pretty good coverage for the postings highlighter. Will improve that too.