Fixed #35533 -- Improved urlize function to handle markdown links correctly. #18302
base: main
Hello! Thank you for your contribution 💪
As it's your first contribution, be sure to check out the patch review checklist.
If you're fixing a ticket from Trac make sure to set the "Has patch" flag and include a link to this PR in the ticket!
If you have any design or process questions then you can ask in the Django forum.
Welcome aboard ⛵️!
django/utils/html.py
Outdated
@@ -275,9 +275,11 @@ class Urlizer:
r"^www\.|^(?!http)\w[^@]+\.(com|edu|gov|int|mil|net|org)($|/.*)$", re.IGNORECASE
)
word_split_re = _lazy_re_compile(r"""([\s<>"']+)""")
markdown_link_re = _lazy_re_compile(r"\[([^\]]+)\]\(([^)]+)\)")
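For context on why a Markdown link reaches URL detection in one piece: as far as I can tell, Urlizer splits the input with word_split_re and then inspects each token. A quick check with the standard re module (using re.compile here as a stand-in for Django's _lazy_re_compile) suggests the whole [label](url) string survives that split as a single token.

```python
import re

# Same expression as Urlizer.word_split_re in the hunk above,
# compiled eagerly with the standard library for this check.
word_split_re = re.compile(r"""([\s<>"']+)""")

text = "See [Django](https://www.djangoproject.com/) for details."
tokens = word_split_re.split(text)
print(tokens)
# → ['See', ' ', '[Django](https://www.djangoproject.com/)', ' ', 'for', ' ', 'details.']
```

Because the capturing group keeps the separators, the brackets and parentheses stay glued to the URL in the third token.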
I'm pretty sure that this expression will introduce a new reDoS vector (similar to the one fixed in 3394fc6).
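A rough way to see the concern is to time the first-revision pattern on input that keeps opening brackets without ever closing them. This is a sketch under the assumption that the pattern is used with a search-style call (search, finditer or sub); time_search is a hypothetical helper, not Django code.

```python
import re
import time

# Pattern added to Urlizer in the first revision of the patch.
markdown_link_re = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def time_search(n):
    """Search a payload of n "[" characters and return (match, seconds)."""
    # No "]" anywhere: every "[" starts an attempt whose [^\]]+ run
    # consumes the rest of the string, then backtracks one char at a time.
    payload = "[" * n
    t0 = time.perf_counter()
    match = markdown_link_re.search(payload)
    return match, time.perf_counter() - t0


for n in (1_000, 2_000, 4_000):
    match, elapsed = time_search(n)
    print(f"n={n:>5}  match={match}  {elapsed:.4f}s")
```

Since each of the n start positions can scan to the end of the input, the total work grows roughly with n squared; on a typical CPython build, expect the time to roughly quadruple each time n doubles.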
Hi @felixxm,
Thank you for your feedback regarding the potential reDoS vector. I am planning to update the regex pattern to:
markdown_link_re = _lazy_re_compile(r"\[([^\]\[]+?)\]\((https?:\/\/[^\s\)]+)\)")
This change aims to ensure that the regex captures Markdown hyperlinks correctly without introducing performance issues. Could you please review this updated pattern and let me know if it addresses your concerns?
Thank you!
This still contains potentially unsafe clauses, e.g. `[^\]\[]`.
Thank you for your feedback. Based on your suggestions, I have updated the regex pattern to:
markdown_link_re = _lazy_re_compile(r"\[([^\[\]]+?)\]\((https?:\/\/[^\s\)]+?)\)")
If this is still not correct, could you please recommend a better approach?
This still contains potentially unsafe clauses, e.g. `[^\]\[]+`. Unfortunately, I don't have a good recommendation; I only know that the current version can introduce a security issue. TBH, I don't think that introducing a regular expression to find Markdown links is a good idea. I cannot imagine that we'll add more regular expressions for other formats, e.g. reStructuredText.
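One way to probe the two revised patterns from this thread is to time them on input that keeps opening links but never closes the URL with ")". This sketch assumes search-style usage; time_search and the PATTERNS labels are hypothetical names chosen for illustration.

```python
import re
import time

# The second and third revisions quoted in this thread; they differ
# mainly in whether the URL run is greedy (+) or lazy (+?).
PATTERNS = {
    "greedy URL": re.compile(r"\[([^\]\[]+?)\]\((https?:\/\/[^\s\)]+)\)"),
    "lazy URL": re.compile(r"\[([^\[\]]+?)\]\((https?:\/\/[^\s\)]+?)\)"),
}


def time_search(pattern, repeats):
    """Search "[x](http://" * repeats and return (match, seconds)."""
    # "[" and "(" are allowed inside [^\s\)], so at every "[" the URL run
    # can extend to the end of the text before failing to find ")".
    payload = "[x](http://" * repeats
    t0 = time.perf_counter()
    match = pattern.search(payload)
    return match, time.perf_counter() - t0


for name, pattern in PATTERNS.items():
    for repeats in (250, 500, 1000):
        match, elapsed = time_search(pattern, repeats)
        print(f"{name:<10} repeats={repeats:>4}  match={match}  {elapsed:.4f}s")
```

Switching to a lazy quantifier only changes the order in which URL lengths are tried; when no ")" exists, both variants still try every length from every "[", so the worst case stays roughly quadratic.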
Thank you for your feedback and for pointing out the potential security issues with the current approach. I appreciate your insights and will rework the solution without using regular expressions, as you suggested.
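A regex-free direction could scan with str.find while keeping every search pointer moving forward, so total work stays linear in the input length. This is a minimal sketch under stated assumptions, not what the branch implements: find_markdown_links and _next_whitespace are hypothetical names, and the link rules (non-empty label without brackets, http/https URL without whitespace, first ")" ends the URL) are simplified to mirror the regex proposals above.

```python
def _next_whitespace(text, pos):
    """Return the index of the first whitespace char at or after pos, or len(text)."""
    n = len(text)
    while pos < n and not text[pos].isspace():
        pos += 1
    return pos


def find_markdown_links(text):
    """Return (label, url) pairs for [label](url) spans in one forward pass."""
    links = []
    n = len(text)
    i = 0
    next_paren = -1  # cached: first ")" at or after the current URL start
    next_ws = -1     # cached: first whitespace at or after the current URL start
    while i < n:
        start = text.find("[", i)
        if start == -1:
            break
        close = text.find("]", start + 1)
        if close == -1:
            break  # no "]" left, so no later link can close
        # Use the innermost "[" so the label contains no brackets.
        start = text.rfind("[", start, close)
        url_start = close + 2
        if close == start + 1 or text[close + 1:url_start] != "(":
            i = close + 1
            continue
        # Only rescan when the cached position falls behind, so the
        # ")" and whitespace scans each cover the text at most once.
        if next_paren < url_start:
            next_paren = text.find(")", url_start)
        if next_paren == -1:
            break  # no ")" left, so nothing further can be a link
        if next_ws < url_start:
            next_ws = _next_whitespace(text, url_start)
        # startswith with start/end avoids slicing a long candidate URL.
        if next_ws > next_paren and text.startswith(
            ("http://", "https://"), url_start, next_paren
        ):
            links.append((text[start + 1:close], text[url_start:next_paren]))
            i = next_paren + 1
        else:
            i = close + 1
    return links
```

For example, find_markdown_links("See [Django](https://www.djangoproject.com/) docs.") returns [("Django", "https://www.djangoproject.com/")]. The cached next_paren and next_ws pointers are the important design choice: calling text.find(")", ...) afresh for every candidate would rescan the same tail on input like "[a](" * n + ")" and bring the quadratic behaviour back.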
Force-pushed from 52754b7 to 6c19489.
…rectly. Updated the urlize function to correctly handle markdown links. Added tests to ensure the correct behavior of the urlize function with various markdown link inputs.
Force-pushed from 6c19489 to ee5b8e5.
Trac ticket number
ticket-35533
Branch description
This branch addresses issue #35533 by improving the urlize function to correctly handle Markdown links.
Key Changes:
- Updated the urlize function to parse and convert Markdown links correctly.
- Added tests to ensure the correct behavior of the urlize function with various Markdown link inputs.
Tests Included:
- Basic Markdown link handling.
- Trimming long URLs.
- Handling of the nofollow attribute.
- Ensuring autoescape functionality.
No additional documentation changes were necessary, as this is a bug fix for existing functionality.
Checklist
This PR targets the main branch.