[Bug]: Search result with broken HTML for matches inside links #1303
Comments
I'll take a look at this!
There are some very tricky edge cases here. What if you store the following:
I like the apples from the [fresh market](https://delimarket.example)
And then you search for
First, Django goes to the search view, where we use the most common FTS technique:
https://github.com/django-wiki/django-wiki/blob/main/src/wiki/views/article.py#L703-L706
Later on, Django renders the template tied to the search view. In the render process we have a loop that calls an include, which finally executes:
<p><small>{{ article.render|get_content_snippet:search_query }}</small></p>
The problem here is that when
<p>I like the apples from the <a href="https://delimarket.example">fresh market</a></p>\n
We might have two possible solutions:
Both solutions share the same level of complexity. Cc @benjaoming @hermannromanek
@benjaoming which option do you think is best here?
Solution B is possibly the best and easiest way? The consideration that I'd give is that there can be additional plugin contents included after calling Rendered HTML is cached, so that should be easy.
An optional refinement to Solution B could be to create a search index only from specific HTML attribute values and text nodes in the HTML tree, given that certain HTML contents would be irrelevant to the search. Let's say that the user wants to find the keyword "href": that would be pretty hard if only the rendered HTML is parsed.
In Solution A (markdown search), it could similarly be argued that there are contents irrelevant to the search index, but stripping out irrelevant custom markdown tags is a much more difficult problem to solve than using a conventional HTML/XML parser.
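To make the refinement above concrete, here is a minimal sketch of collecting only text nodes and a few attribute values from rendered HTML, using the standard library's `html.parser`. The class name, the function name and the attribute whitelist are hypothetical, not part of django-wiki; which attributes deserve indexing is a project decision.

```python
from html.parser import HTMLParser


class SearchTextExtractor(HTMLParser):
    """Collect text nodes plus the values of a few whitelisted attributes."""

    # Hypothetical whitelist (an assumption for this sketch).
    INDEXED_ATTRS = {"title", "alt"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        for name, value in attrs:
            if name in self.INDEXED_ATTRS and value:
                self.parts.append(value)

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.parts.append(text)


def extract_search_text(html):
    """Return the searchable text of an HTML fragment, without markup."""
    parser = SearchTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


rendered = '<p>I like the apples from the <a href="https://delimarket.example">fresh market</a></p>\n'
print(extract_search_text(rendered))
```

With this approach a search for "href" or for part of the link target no longer matches markup, because `href` is not in the whitelist. Joining the parts with spaces can split a word that spans inline tags, which may or may not be acceptable for an index.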
I want to add another error, which occurs when searching for the title of an h2: it conflicts with the edit plugin. This is how I solved it:
import re

import bleach
from django import template

register = template.Library()

@register.filter
def get_text(html):
    # Strip every tag, keeping only the text content
    cleaned = bleach.clean(html, tags=[], strip=True)
    # Remove all occurrences of [edit]
    cleaned = re.sub(r'\[edit\]', '', cleaned)
    return cleaned
{{ article.render|get_text|get_content_snippet:search_query }}
I simply take advantage of the bleach package, which is already installed, and strip all tags from the HTML before searching.
Discussed in #1302
Originally posted by hermannromanek September 24, 2023
Describe the bug
When the search term is found within a link's slug, the filter get_content_snippet returns invalid HTML.
The problem is the filter cleaning HTML (maybe intentionally, to also search through that?) for the keyword. My quick fix was to do a strip_tags at the start of the function:
This probably means it is no longer necessary in the nested function clean_text.
Steps to reproduce.
The text [Lorem Ipsum](/lorem-ipsum/) will display incorrectly when searching for "lorem".
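The failure mode can be illustrated with a small standalone sketch. It assumes the link renders to a plain `<a href="/lorem-ipsum/">` element and that highlighting is a case-insensitive replacement over the whole HTML string; `naive_highlight` is a hypothetical stand-in, not the actual get_content_snippet code.

```python
import re


def naive_highlight(html, query):
    # Assumption: every match is wrapped, with no awareness of tags or attributes.
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return pattern.sub(lambda m: "<strong>" + m.group(0) + "</strong>", html)


rendered = '<a href="/lorem-ipsum/">Lorem Ipsum</a>'
broken = naive_highlight(rendered, "lorem")
print(broken)
```

The match inside the `href` value gets wrapped as well, so the attribute ends up containing a `<strong>` element and the markup is no longer valid.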
Expected behavior
Expected would be valid HTML, with the search term highlighted inside the a tag.
May require more complex logic if the search should find slugs that are named differently than the link pointing there.
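One way to get the expected behavior is to highlight only inside text nodes and pass tags through untouched. The following is a rough sketch with the standard library's `html.parser`, not the project's implementation; `TextNodeHighlighter` and `highlight` are hypothetical names, and the sketch does not truncate the result to a snippet.

```python
import re
from html import escape
from html.parser import HTMLParser


class TextNodeHighlighter(HTMLParser):
    """Re-emit an HTML fragment, wrapping matches found in text nodes only."""

    def __init__(self, query):
        super().__init__(convert_charrefs=True)
        self.pattern = re.compile(re.escape(query), re.IGNORECASE)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag exactly as it appeared in the input.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text arrives unescaped, so every emitted piece is escaped again.
        pos = 0
        for m in self.pattern.finditer(data):
            self.out.append(escape(data[pos:m.start()], quote=False))
            self.out.append("<strong>" + escape(m.group(0), quote=False) + "</strong>")
            pos = m.end()
        self.out.append(escape(data[pos:], quote=False))


def highlight(html, query):
    if not query:
        return html  # an empty pattern would match at every position
    parser = TextNodeHighlighter(query)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(highlight('<a href="/lorem-ipsum/">Lorem Ipsum</a>', "lorem"))
```

Known limitations of this sketch: comments and doctypes are dropped, and the contents of `script` or `style` elements would be escaped, which is probably fine for a search snippet but not for general HTML rewriting. It also does not address the case where the slug differs from the link text.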
Python Version
3.9+
Django Version
4.0+
Extra context
No response