Fixed #35533 -- Improved urlize function to handle markdown links correctly. #18302

Open
wants to merge 1 commit into
base: main

DongwookKim0823

Trac ticket number

ticket-35533

Branch description

This branch addresses issue #35533 by improving the urlize function to correctly handle Markdown links.

Key Changes:
  • Updated the urlize function to parse and convert Markdown links correctly.
  • Added tests to ensure the correct behavior of the urlize function with various Markdown link inputs.

Tests Included:
  • Basic Markdown link handling.
  • Trimming long URLs.
  • Handling of the nofollow attribute.
  • Ensuring autoescape functionality.

No additional documentation changes were necessary as this is a bug fix for existing functionality.

Checklist

  • This PR targets the main branch.
  • The commit message is written in past tense, mentions the ticket number, and ends with a period.
  • I have checked the "Has patch" ticket flag in the Trac system.
  • I have added or updated relevant tests.
  • I have added or updated relevant docs, including release notes if applicable.
  • I have attached screenshots in both light and dark modes for any UI changes.

github-actions bot left a comment

Hello! Thank you for your contribution 💪

As it's your first contribution, be sure to check out the patch review checklist.

If you're fixing a ticket from Trac, make sure to set the "Has patch" flag and include a link to this PR in the ticket!

If you have any design or process questions then you can ask in the Django forum.

Welcome aboard ⛵️!

@@ -275,9 +275,11 @@ class Urlizer:
r"^www\.|^(?!http)\w[^@]+\.(com|edu|gov|int|mil|net|org)($|/.*)$", re.IGNORECASE
)
word_split_re = _lazy_re_compile(r"""([\s<>"']+)""")
markdown_link_re = _lazy_re_compile(r"\[([^\]]+)\]\(([^)]+)\)")
Member

I'm pretty sure that this expression will introduce a new ReDoS vector (similar to the one fixed in 3394fc6).
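
The concern above can be probed with a rough timing sketch. The review does not give a concrete input, so treating a long run of unmatched `[` characters as a bad case is an assumption here; `time_search` is a hypothetical helper, and plain `re.compile` stands in for Django's `_lazy_re_compile`. For every `[` the engine tries as a starting point, `[^\]]+` runs to the end of the input before failing, so the total work grows roughly quadratically with the input length.

```python
import re
import time

# Pattern from the diff under review (re.compile used in place of
# Django's _lazy_re_compile, which is an assumption of this sketch).
markdown_link_re = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def time_search(n):
    """Hypothetical helper: search n unmatched "[" characters and time it."""
    text = "[" * n
    start = time.perf_counter()
    result = markdown_link_re.search(text)
    return result, time.perf_counter() - start


# Each "[" is a candidate start, and each candidate scans to the end
# of the string before giving up.
for n in (2000, 4000, 8000):
    result, elapsed = time_search(n)
    print(f"n={n} match={result} elapsed={elapsed:.4f}s")
```

If the scan is quadratic, doubling `n` should roughly quadruple the elapsed time; absolute numbers depend on the machine and Python version.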

Author

Hi @felixxm,

Thank you for your feedback regarding the potential reDoS vector. I am planning to update the regex pattern to:
markdown_link_re = _lazy_re_compile(r"\[([^\]\[]+?)\]\((https?:\/\/[^\s\)]+)\)")

This change aims to ensure that the regex captures Markdown hyperlinks correctly without introducing performance issues. Could you please review this updated pattern and let me know if it addresses your concerns?

Thank you!
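
For reference, a small sketch of what the proposed pattern accepts and rejects. The sample strings are made up for illustration, and `re.compile` stands in for `_lazy_re_compile`. It shows matching behavior only and says nothing about worst-case running time.

```python
import re

# Pattern proposed in the comment above.
markdown_link_re = re.compile(r"\[([^\]\[]+?)\]\((https?:\/\/[^\s\)]+)\)")

# Illustrative inputs (assumptions, not taken from the PR's tests).
samples = [
    "[Django](https://www.djangoproject.com)",  # absolute http(s) URL
    "[docs](/topics/)",  # relative URL
    "[a [b](http://example.com)",  # stray "[" before the link
]

for sample in samples:
    match = markdown_link_re.search(sample)
    print(sample, "->", match.groups() if match else None)
```

The first sample yields the label and URL, the relative link is not matched because the pattern requires an `http://` or `https://` prefix, and in the third sample only the inner `[b](http://example.com)` is picked up.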

Member

This still contains potentially unsafe clauses e.g. [^\]\[].

Author

Thank you for your feedback. Based on your suggestions, I have updated the regex pattern to:
markdown_link_re = _lazy_re_compile(r"\[([^\[\]]+?)\]\((https?:\/\/[^\s\)]+?)\)")

If this is still not correct, could you please recommend a better approach?

Member

This still contains potentially unsafe clauses, e.g. [^\]\[]+. Unfortunately, I don't have a good recommendation; I only know that the current version can introduce a security issue. TBH, I don't think that introducing a regular expression to find markdown links is a good idea. I cannot imagine that we'll add more regular expressions for other formats, e.g. reStructuredText.

Author

Thank you for your feedback and for pointing out the potential security issues with the current approach. I appreciate your insights and will rework the solution without using regular expressions, as you suggested.
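
One possible shape for a regex-free approach is a single left-to-right pass over the text. This is only a sketch under stated assumptions: `find_markdown_links` and the state names are hypothetical, it is not the code pushed to this PR and not part of Django, and it only handles the plain `[label](url)` form (no nested brackets, no whitespace or `[` inside the URL). Because each character is examined once with no backtracking, the running time is linear in the input length.

```python
# Scanner states (hypothetical names used only in this sketch).
OUTSIDE, IN_LABEL, AFTER_LABEL, IN_URL = range(4)


def find_markdown_links(text):
    """Return (label, url) pairs for each plain [label](url) in text."""
    links = []
    state = OUTSIDE
    label_start = label_end = url_start = 0
    for i, ch in enumerate(text):
        if ch == "[":
            # A "[" always starts a fresh candidate label, so an
            # earlier unfinished candidate is simply dropped.
            state, label_start = IN_LABEL, i + 1
        elif state == IN_LABEL:
            if ch == "]":
                # Empty labels are rejected.
                state = AFTER_LABEL if i > label_start else OUTSIDE
                label_end = i
        elif state == AFTER_LABEL:
            # The "]" must be followed immediately by "(".
            if ch == "(":
                state, url_start = IN_URL, i + 1
            else:
                state = OUTSIDE
        elif state == IN_URL:
            if ch == ")":
                # Empty URLs are rejected.
                if i > url_start:
                    label = text[label_start:label_end]
                    links.append((label, text[url_start:i]))
                state = OUTSIDE
            elif ch.isspace():
                # Whitespace inside the URL abandons the candidate.
                state = OUTSIDE
    return links


print(find_markdown_links("See [Django](https://www.djangoproject.com) docs."))
```

Unlike the patterns discussed above, this sketch does not restrict URLs to `http://` or `https://`; a real implementation would still need to validate and escape the URL before emitting any HTML.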

DongwookKim0823 force-pushed the ticket_35533 branch 3 times, most recently from 52754b7 to 6c19489, on June 29, 2024 08:02
…rectly.

Updated the urlize function to correctly handle markdown links. Added tests to ensure the correct behavior of the urlize function with various markdown link inputs.
2 participants