Highlighter returns unexpected results if one term is found within another #378

Open
witten opened this issue Jun 17, 2011 · 4 comments

witten commented Jun 17, 2011

from haystack.utils import Highlighter
Highlighter(query='portland or').highlight('portland')

Actual result:

u'<span class="highlighted">portland</span><span class="highlighted">or</span>tland'

Note that the text to highlight is just "portland", but the highlighter is tacking "or" + "tland" onto the end as well.

Expected result:

u'<span class="highlighted">portland</span>'

Or perhaps:

u'<span class="highlighted">p<span class="highlighted">or</span>tland</span>'

I'm using haystack 1.2.4.

witten commented Jun 17, 2011

I added one potential solution, which makes the output match the first expected result above. Basically, if two highlight areas would overlap or coincide, it highlights only the first one.
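
As a rough illustration of the "highlight only the first overlapping area" idea, here is a minimal standalone sketch. It is not the actual patch: the function name `drop_overlapping_spans` and the `(start, end)` span representation are assumptions made for this example, and it does not touch haystack's `Highlighter` at all.

```python
def drop_overlapping_spans(spans):
    """Keep the earliest span and skip later spans that overlap a kept one.

    `spans` is a list of (start, end) character offsets, end exclusive.
    Hypothetical helper for illustration only; not part of haystack.
    """
    kept = []
    last_end = 0
    # Sort by start offset; when two spans start together, try the longer first.
    for start, end in sorted(spans, key=lambda s: (s[0], -(s[1] - s[0]))):
        if start >= last_end:
            kept.append((start, end))
            last_end = end
    return kept


# In the text 'portland', the term 'portland' covers offsets 0-8
# and the term 'or' covers offsets 1-3, so the two areas overlap.
print(drop_overlapping_spans([(1, 3), (0, 8)]))  # [(0, 8)]
```

With only the `(0, 8)` span left, the text would be wrapped once, which corresponds to the first expected result in the original report.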

witten commented Feb 7, 2012

I added a test for this fix. Without the fix, the test fails. After the fix is added, the test passes.

toastdriven pushed a commit that referenced this issue May 31, 2012
floppya pushed a commit to floppya/django-haystack that referenced this issue Mar 29, 2013
@seddonym

I ran into this issue too. IMHO this fix is not addressing the correct problem, which is that Highlighter.find_highlightable_words is returning the wrong results. Here's a different version that uses regular expression word boundaries to return the correct results:

import re

class Highlighter(object):
    ...        
    def find_highlightable_words(self):
        word_positions = {}

        lower_text_block = self.text_block.lower()

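        for word in self.query_words:
            # Use a regular expression to search by whole words.
            # \b corresponds to a word boundary; re.escape keeps any
            # regex metacharacters in the query term literal.
            pattern = r'\b%s\b' % re.escape(word)
            starts = [match.start() for match in re.finditer(pattern, lower_text_block)]

            # re.finditer returns an iterator, which is always truthy,
            # so check the collected positions instead.
            if starts:
                word_positions[word] = starts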

        return word_positions
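
To see the difference outside of haystack, here is a small self-contained comparison of plain substring search and whole-word search. The helper names `substring_positions` and `whole_word_positions` are made up for this sketch; the substring version only approximates a find-based search and is not copied from haystack's source.

```python
import re


def substring_positions(word, text):
    """Start offsets of every occurrence of `word`, even inside other words."""
    positions = []
    start = text.find(word)
    while start != -1:
        positions.append(start)
        start = text.find(word, start + 1)
    return positions


def whole_word_positions(word, text):
    """Start offsets of `word` only where it stands as a whole word."""
    pattern = r'\b%s\b' % re.escape(word)
    return [match.start() for match in re.finditer(pattern, text)]


text = 'portland or bust'
print(substring_positions('or', text))   # [1, 9]
print(whole_word_positions('or', text))  # [9]
```

The substring search reports the "or" inside "portland" (offset 1), which is the spurious match behind the doubled highlight; the word-boundary search reports only the standalone "or".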

Let me know if you'd like me to make a pull request.

Incidentally, in the meantime, anyone who wants to correct this behaviour themselves can use a custom highlighter - see this gist.

jw commented Jan 15, 2016

I agree with @seddonym. Maybe just close this issue?
