Fixed #35533 -- Improved urlize function to handle markdown links correctly. #18302

DongwookKim0823 · 2024-06-24T06:14:39Z

Trac ticket number

Branch description

This branch addresses issue #35533 by improving the urlize function to correctly handle Markdown links.

Key Changes:
Updated the urlize function to parse and convert Markdown links correctly.
Added tests to ensure the correct behavior of the urlize function with various Markdown link inputs.
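To see why a Markdown link trips up urlize, note that the input is split into words with word_split_re, the ([\s<>"']+) pattern shown in the review diff on django/utils/html.py. Because neither brackets nor parentheses are separators, [text](url) survives as a single word. The minimal illustration below uses the standard re module, with a plain re.compile standing in for Django's _lazy_re_compile:

```python
import re

# Same pattern as Urlizer.word_split_re, compiled eagerly for this demo.
word_split_re = re.compile(r"""([\s<>"']+)""")

words = word_split_re.split("See [docs](https://example.com) now")
print(words)
# ['See', ' ', '[docs](https://example.com)', ' ', 'now']
```

The whole [docs](https://example.com) token then presumably goes through urlize's punctuation trimming and URL matching as one unit, which is where the mishandling described in the ticket would come from.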

Tests Included:
Basic Markdown link handling.
Trimming long URLs.
Handling of nofollow attribute.
Ensuring autoescape functionality.

No additional documentation changes were necessary as this is a bug fix for existing functionality.

Checklist

This PR targets the main branch.
The commit message is written in past tense, mentions the ticket number, and ends with a period.
I have checked the "Has patch" ticket flag in the Trac system.
I have added or updated relevant tests.
I have added or updated relevant docs, including release notes if applicable.
I have attached screenshots in both light and dark modes for any UI changes.

github-actions

Hello! Thank you for your contribution 💪

As it's your first contribution, be sure to check out the patch review checklist.

If you're fixing a ticket from Trac make sure to set the "Has patch" flag and include a link to this PR in the ticket!

If you have any design or process questions then you can ask in the Django forum.

Welcome aboard ⛵️!

felixxm · 2024-06-24T07:50:10Z

django/utils/html.py

@@ -275,9 +275,11 @@ class Urlizer:
        r"^www\.|^(?!http)\w[^@]+\.(com|edu|gov|int|mil|net|org)($|/.*)$", re.IGNORECASE
    )
    word_split_re = _lazy_re_compile(r"""([\s<>"']+)""")
+    markdown_link_re = _lazy_re_compile(r"\[([^\]]+)\]\(([^)]+)\)")


I'm pretty sure that this expression will introduce a new reDoS vector (similar to the one fixed in 3394fc6).
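A rough way to see the concern is to consider an unanchored search. Every [ in the input is a candidate start, and from each one [^\]]+ can run to the end of the string before the missing ] makes the attempt fail. For n such characters, that adds up to on the order of n²/2 character steps. The probe below uses Python's standard re module (Django's _lazy_re_compile wraps re.compile). The input "[" * n is an illustrative choice, timings are machine-dependent, and only the growth with n is of interest:

```python
import re
import time

# Pattern from the first revision of the patch.
markdown_link_re = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

match = markdown_link_re.search("See [docs](https://example.com) now")
print(match.groups())  # ('docs', 'https://example.com')


def time_search(n):
    """Time one search over n '[' characters and no ']' (never matches)."""
    payload = "[" * n
    start = time.perf_counter()
    result = markdown_link_re.search(payload)
    elapsed = time.perf_counter() - start
    assert result is None
    return elapsed


# From each start position, [^\]]+ runs to the end of the string before
# failing, so the time should grow roughly with n**2.
for n in (1_000, 2_000, 4_000):
    print(f"n={n}: {time_search(n):.4f}s")
```

If the cost is quadratic, doubling n should roughly quadruple the time.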

Hi @felixxm,

Thank you for your feedback regarding the potential reDoS vector. I am planning to update the regex pattern to:
markdown_link_re = _lazy_re_compile(r"\[([^\]\[]+?)\]\((https?:\/\/[^\s\)]+)\)")

This change aims to ensure that the regex captures Markdown hyperlinks correctly without introducing performance issues. Could you please review this updated pattern and let me know if it addresses your concerns?

Thank you!
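The effect of the second proposal on *what* is matched can be checked directly. Compared with the first pattern, it accepts only http:// and https:// targets and refuses [ or ] inside the link text, so a stray [ before a link is skipped rather than swallowed into the text. This sketch checks matching behavior only. It says nothing about the pattern's worst-case cost, and the sample inputs are illustrative assumptions:

```python
import re

# First revision: any link text, any target.
original_re = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")
# Second proposal: http(s) targets only, no '[' or ']' in the link text.
proposed_re = re.compile(r"\[([^\]\[]+?)\]\((https?:\/\/[^\s\)]+)\)")


def groups(pattern, text):
    """Return the (text, url) groups of the first match, or None."""
    match = pattern.search(text)
    return match.groups() if match else None


for text in (
    "See [docs](https://example.com) now",  # ordinary link
    "[docs](ftp://example.com)",  # non-http scheme
    "[a[b](https://x.com)",  # stray '[' before the link
):
    print(repr(text), groups(original_re, text), groups(proposed_re, text))
```

For the last input, the first pattern captures a[b as the link text, while the proposal captures only b.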

This still contains potentially unsafe clauses, e.g. [^\]\[].

Thank you for your feedback. Based on your suggestions, I have updated the regex pattern to:
markdown_link_re = _lazy_re_compile(r"\[([^\[\]]+?)\]\((https?:\/\/[^\s\)]+?)\)")

If this is still not correct, could you please recommend a better approach?

This still contains potentially unsafe clauses, e.g. [^\]\[]+. Unfortunately, I don't have a good recommendation; I only know that the current version can introduce a security issue. TBH, I don't think that introducing a regular expression to find Markdown links is a good idea. I cannot imagine that we'll add more regular expressions for other formats, e.g. reStructuredText.
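The reviewer's general point can be probed empirically. Making a quantifier lazy (+?) changes which match the engine prefers, but it does not reduce the number of positions the engine must try when no match exists. In the sketch below, the payload "[x](http://a" * k is an illustrative assumption that never supplies a closing ). From each [, the URL clause therefore scans to the end of the string in both the greedy and the lazy proposal. This probe exercises the URL clause rather than the label clause quoted above, it is only one input shape, and the timings are machine-dependent:

```python
import re
import time

# Second proposal (greedy URL clause) and third proposal (lazy URL clause).
greedy_re = re.compile(r"\[([^\]\[]+?)\]\((https?:\/\/[^\s\)]+)\)")
lazy_re = re.compile(r"\[([^\[\]]+?)\]\((https?:\/\/[^\s\)]+?)\)")

link = "See [docs](https://example.com) now"
# On a well-formed link both versions capture the same groups.
print(greedy_re.search(link).groups(), lazy_re.search(link).groups())


def time_search(pattern, k):
    """Time a search over k almost-links that never get their ')'."""
    payload = "[x](http://a" * k
    start = time.perf_counter()
    result = pattern.search(payload)
    elapsed = time.perf_counter() - start
    assert result is None
    return elapsed


# If the work is quadratic, doubling k should roughly quadruple both times.
for k in (250, 500, 1_000):
    print(k, f"greedy={time_search(greedy_re, k):.4f}s", f"lazy={time_search(lazy_re, k):.4f}s")
```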

Thank you for your feedback and for pointing out the potential security issues with the current approach. I appreciate your insights and will rework the solution without using regular expressions, as you suggested.
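Following the direction agreed above, a simple Markdown link can be recognized without any regular expression by using a single left-to-right scan. The sketch below is hypothetical: the name find_markdown_links is invented here and is not part of Django or of this PR. It handles only the plain [label](url) form, with no nested brackets, parentheses inside URLs, or link titles. Because its cursor never moves backwards, its running time is linear in the input length:

```python
def find_markdown_links(text):
    """Return (label, url) pairs for simple [label](url) links in text.

    Hypothetical helper: one left-to-right pass whose cursor never moves
    backwards, so the work is linear in len(text). Nested brackets,
    parentheses inside URLs and link titles are not supported.
    """
    links = []
    n = len(text)
    i = 0
    while i < n:
        if text[i] != "[":
            i += 1
            continue
        # Scan the label up to the first '[' or ']'.
        j = i + 1
        while j < n and text[j] not in "[]":
            j += 1
        if j == n:
            break  # No ']' remains, so no later link can be complete.
        if text[j] == "[" or j == i + 1:
            i = j  # Restart at the inner '[' or step past an empty "[]".
            continue
        label = text[i + 1 : j]
        if j + 1 == n or text[j + 1] != "(":
            i = j + 1  # "[label]" not followed by "(": not a link.
            continue
        # Scan the URL up to ')', stopping early at '[' or whitespace.
        k = j + 2
        while k < n and text[k] not in ")[" and not text[k].isspace():
            k += 1
        if k == n:
            break  # Neither ')' nor '[' remains: nothing left can match.
        if text[k] == ")" and k > j + 2:
            links.append((label, text[j + 2 : k]))
            i = k + 1
        else:
            i = k  # Empty URL, whitespace or '[': resume scanning here.
    return links


print(find_markdown_links("See [docs](https://example.com) and [a][b](u)."))
# [('docs', 'https://example.com'), ('b', 'u')]
```

Because i only ever moves forward, inputs such as "[" * n, which force quadratic work in a backtracking regex, are handled in a single pass. How such a helper would be wired into Urlizer is left open here.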


github-actions bot reviewed Jun 24, 2024


felixxm reviewed Jun 24, 2024


DongwookKim0823 requested a review from felixxm June 24, 2024 11:09

DongwookKim0823 force-pushed the ticket_35533 branch 3 times, most recently from 52754b7 to 6c19489, on June 29, 2024 08:02

Fixed #35533 -- Improved urlize function to handle markdown links correctly.

ee5b8e5

Updated the urlize function to correctly handle markdown links. Added tests to ensure the correct behavior of the urlize function with various markdown link inputs.

DongwookKim0823 force-pushed the ticket_35533 branch from 6c19489 to ee5b8e5 on June 29, 2024 14:01


felixxm Jun 24, 2024

DongwookKim0823 Jun 24, 2024

felixxm Jun 24, 2024

DongwookKim0823 Jun 24, 2024

felixxm Jun 28, 2024

DongwookKim0823 Jun 28, 2024
