Highlighter returns unexpected results if one term is found within another #378

witten · 2011-06-17T01:02:29Z

from haystack.utils import Highlighter
Highlighter(query='portland or').highlight('portland')

Actual result:

u'<span class="highlighted">portland</span><span class="highlighted">or</span>tland'

Note that the text to highlight is just "portland", but the highlighter also tacks a highlighted "or" and a plain "tland" onto the end.
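Here is a minimal sketch of how that output can arise. It assumes the highlighter (a) finds each query word with a plain substring search and (b) splices highlight tags in at each found offset, in order, with no overlap check. `find_offsets` and `naive_render` are hypothetical names. This is a simplified reconstruction for illustration, not Haystack's actual code.

```python
def find_offsets(text, words):
    """Return {word: [start offsets]} using plain substring search."""
    lower = text.lower()
    positions = {}
    for word in words:
        positions[word] = []
        start = lower.find(word)
        while start != -1:
            positions[word].append(start)
            start = lower.find(word, start + len(word))
    return positions


def naive_render(text, positions, tag='<span class="highlighted">', end_tag='</span>'):
    """Wrap each found offset in tags, with no check for overlapping matches."""
    # Flatten to (offset, word) pairs sorted by position.
    hits = sorted((pos, word) for word, offsets in positions.items() for pos in offsets)
    out = ''
    prev_end = 0  # end offset of the previous highlighted match
    for pos, word in hits:
        # Copy the plain text between the previous match and this one.
        # When matches overlap, pos < prev_end and this slice is empty.
        out += text[prev_end:pos] + tag + text[pos:pos + len(word)] + end_tag
        prev_end = pos + len(word)
    # Copy whatever follows the last match.
    return out + text[prev_end:]
```

With these assumptions, `naive_render('portland', find_offsets('portland', ['portland', 'or']))` reproduces the actual result quoted above. The "or" match at offset 1 starts inside the "portland" match at offset 0, so it is emitted a second time, and copying then resumes from offset 3 ("tland").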

Expected result:

u'<span class="highlighted">portland</span>'

Or perhaps:

u'<span class="highlighted">p<span class="highlighted">or</span>tland</span>'

I'm using haystack 1.2.4.

witten · 2011-06-17T22:12:57Z

I added one potential solution that produces the first expected result above. Basically, if two highlight areas would overlap or coincide, it highlights only the first one.
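The overlap rule described above can be sketched as a filter over the (offset, word) matches, applied before any tags are inserted. `drop_overlaps` is a hypothetical standalone helper, not a Haystack function and not the patch itself. It is only meant to illustrate the idea.

```python
def drop_overlaps(hits):
    """Keep only matches that start at or after the end of the last kept match.

    `hits` is a list of (offset, word) pairs. They are sorted by offset
    (ties broken by the word itself), so when two matches overlap or
    coincide, the one that sorts first is kept.
    """
    kept = []
    kept_end = 0  # end offset of the most recently kept match
    for pos, word in sorted(hits):
        if pos < kept_end:
            # Starts inside the previous highlight: skip it.
            continue
        kept.append((pos, word))
        kept_end = pos + len(word)
    return kept
```

For "portland" with the query "portland or", the matches (0, "portland") and (1, "or") reduce to just (0, "portland"), which renders as the first expected result.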

witten · 2012-02-07T19:36:29Z

I added a test for this fix. Without the fix, the test fails. After the fix is added, the test passes.

seddonym · 2015-08-19T09:10:16Z

I ran into this issue too. IMHO this fix doesn't address the real problem, which is that Highlighter.find_highlightable_words returns the wrong results. Here's a different version that uses regular expression word boundaries to return the correct results:

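import re

class Highlighter(object):
    ...

    def find_highlightable_words(self):
        word_positions = {}

        lower_text_block = self.text_block.lower()

        for word in self.query_words:
            # Use a regular expression to search by whole words only:
            # \b matches a word boundary, and re.escape() makes the query
            # word match literally even if it contains regex metacharacters.
            pattern = r'\b%s\b' % re.escape(word)

            # re.finditer() returns an iterator, which is always truthy, so an
            # "if matches:" test can't filter anything; collect offsets directly.
            word_positions[word] = [match.start() for match in re.finditer(pattern, lower_text_block)]

        return word_positions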
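To see the word-boundary approach in isolation, here is the same search written as a standalone function. `find_whole_words` is a hypothetical name, not Haystack's API. Query words are assumed to be lowercase already, as they are compared against the lowercased text.

```python
import re


def find_whole_words(text, words):
    """Return {word: [start offsets]} counting only whole-word matches."""
    lower = text.lower()
    positions = {}
    for word in words:
        # \b anchors the match at word boundaries; re.escape makes the
        # query word match literally.
        pattern = r'\b%s\b' % re.escape(word)
        positions[word] = [m.start() for m in re.finditer(pattern, lower)]
    return positions
```

With this, `find_whole_words('portland', ['portland', 'or'])` finds no "or" inside "portland", while `find_whole_words('Portland or Seattle', ['portland', 'or'])` still finds the standalone "or" at offset 9.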

Let me know if you'd like me to make a pull request.

Incidentally, in the meantime, anyone who wants to correct this behaviour themselves can use a custom highlighter; see this gist.
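For the custom-highlighter route, Haystack's `{% highlight %}` template tag can be pointed at a Highlighter subclass through the `HAYSTACK_CUSTOM_HIGHLIGHTER` setting, which takes the full dotted path to the class. The following sketch assumes a hypothetical `myapp/highlighting.py` that defines `WordBoundaryHighlighter`, a subclass of `haystack.utils.Highlighter` that overrides `find_highlightable_words` with the regex version above.

```python
# settings.py
# Hypothetical path: myapp.highlighting.WordBoundaryHighlighter is assumed to
# subclass haystack.utils.Highlighter and override find_highlightable_words.
HAYSTACK_CUSTOM_HIGHLIGHTER = 'myapp.highlighting.WordBoundaryHighlighter'
```

As far as I know, this setting only affects the template tag. Code that calls `Highlighter` directly, as in the original report, would need to instantiate the subclass instead.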

jw · 2016-01-15T20:39:21Z

I agree with @seddonym. Maybe just close this issue?

toastdriven pushed a commit that referenced this issue May 31, 2012

Fix for #378: Highlighter returns unexpected results if one term is found within another.

ff4b05c

floppya pushed a commit to floppya/django-haystack that referenced this issue Mar 29, 2013

Fix for django-haystack#378: Highlighter returns unexpected results if one term is found within another.

2d08c70

acdha added the needs review label May 22, 2015
