Add a new renderer class with XSS protections #60

Merged
merged 10 commits into from Jan 15, 2017

Conversation

Projects
None yet
2 participants
@Changaco
Contributor

Changaco commented Jan 14, 2017

This Pull Request implements #59. Let me know if anything doesn't look right.

@FSX

Thanks for your pull request!

misaka/api.py
+ :arg img_src_rewrite: the URL of an image proxy, necessary to rewrite the
+ ``src`` attributes of images
+
+ Both srings should include a ``{link}`` placeholder for the URL-encoded

This comment has been minimized.

@FSX

FSX Jan 14, 2017

Owner

srings should be strings

@FSX

FSX Jan 14, 2017

Owner

srings should be strings

+
+ def __init__(self, flags=(), sanitization_mode='skip-html', nesting_level=0,
+ link_rewrite=None, img_src_rewrite=None):
+ if not isinstance(flags, tuple):

This comment has been minimized.

@FSX

FSX Jan 14, 2017

Owner

This breaks compatibility with HtmlRenderer, which accepts flags as a tuple of strings and ORed constants. It's not a big issue, but it won't be a drop-in for HtmlRenderer.

I would probably be better to have the flags be one type, but it'll be annoying for people if their code breaks.

@FSX

FSX Jan 14, 2017

Owner

This breaks compatibility with HtmlRenderer, which accepts flags as a tuple of strings and ORed constants. It's not a big issue, but it won't be a drop-in for HtmlRenderer.

I would probably be better to have the flags be one type, but it'll be annoying for people if their code breaks.

This comment has been minimized.

@Changaco

Changaco Jan 14, 2017

Contributor

I didn't add support for passing integers because I saw that it's deprecated.

@Changaco

Changaco Jan 14, 2017

Contributor

I didn't add support for passing integers because I saw that it's deprecated.

This comment has been minimized.

@FSX

FSX Jan 14, 2017

Owner

Oops. My mistake.

@FSX

FSX Jan 14, 2017

Owner

Oops. My mistake.

+
+ def test_html_escape(self):
+ supplied = 'Example <script>alert(1);</script>'
+ expected = '<p>%s</p>\n' % escape_html(supplied)

This comment has been minimized.

@FSX

FSX Jan 14, 2017

Owner

I prefer str.format. Is there a reason you used old-style formatting?

@FSX

FSX Jan 14, 2017

Owner

I prefer str.format. Is there a reason you used old-style formatting?

misaka/api.py
+ else:
+ return escape_html("[%s](%s)" % (content, raw_link))
+
+ def check_link(self, link, is_image_src=False):

This comment has been minimized.

@FSX

FSX Jan 14, 2017

Owner

is_image_src is not used.

@FSX

FSX Jan 14, 2017

Owner

is_image_src is not used.

This comment has been minimized.

@FSX

FSX Jan 14, 2017

Owner

Why not put self._allowed_url_re.match(link) directly in the render functions?

@FSX

FSX Jan 14, 2017

Owner

Why not put self._allowed_url_re.match(link) directly in the render functions?

This comment has been minimized.

@Changaco

Changaco Jan 14, 2017

Contributor

The separate check_link method is to allow customization of link filtering by subclasses.

You're right that is_image_src isn't used, that's a copy-paste error on my part.

@Changaco

Changaco Jan 14, 2017

Contributor

The separate check_link method is to allow customization of link filtering by subclasses.

You're right that is_image_src isn't used, that's a copy-paste error on my part.

misaka/api.py
+ link = self.rewrite_link(raw_link)
+ maybe_title = ' title="%s"' % escape_html(title) if title else ''
+ link = escape_html(link)
+ return ('<a href="%s"%s>' + content + '</a>') % (link, maybe_title)

This comment has been minimized.

@FSX

FSX Jan 14, 2017

Owner

I prefer str.format. Is there a reason you used old-style formatting?

@FSX

FSX Jan 14, 2017

Owner

I prefer str.format. Is there a reason you used old-style formatting?

This comment has been minimized.

@Changaco

Changaco Jan 14, 2017

Contributor

I use percent formatting by default, mostly because it's shorter and easier to type.

@Changaco

Changaco Jan 14, 2017

Contributor

I use percent formatting by default, mostly because it's shorter and easier to type.

This comment has been minimized.

@FSX

FSX Jan 14, 2017

Owner

Ok. That's cool.

@FSX

FSX Jan 14, 2017

Owner

Ok. That's cool.

tests/test_xss_protection.py
+ for url in ('http://a', 'https://b'):
+ actual = render_rewrite("['foo](%s \"bar'\")" % url)
+ expected = '<p><a href="%s" title="bar&#39;">&#39;foo</a></p>\n'
+ ok(actual).diff(expected % rewrite_link(url))

This comment has been minimized.

@FSX

FSX Jan 14, 2017

Owner

I would put % rewrite_link(url) on the above line.

@FSX

FSX Jan 14, 2017

Owner

I would put % rewrite_link(url) on the above line.

misaka/api.py
+ """
+ return bool(self._allowed_url_re.match(link))
+
+ def rewrite_link(self, link, is_image_src=False):

This comment has been minimized.

@FSX

FSX Jan 14, 2017

Owner

Maybe rewrite_url? As it's both used for links and images.

@FSX

FSX Jan 14, 2017

Owner

Maybe rewrite_url? As it's both used for links and images.

tests/test_xss_protection.py
+ ok(actual).diff(expected)
+
+ def test_autolink_rewriting(self):
+ for url in ('http://a', "https://b?x&y"):

This comment has been minimized.

@FSX

FSX Jan 14, 2017

Owner

Use single quotes. ;D

@FSX

FSX Jan 14, 2017

Owner

Use single quotes. ;D

misaka/api.py
+ """
+ Filters links.
+ """
+ if self.check_link(raw_link):

This comment has been minimized.

@FSX

FSX Jan 14, 2017

Owner

When I enable the autolink extension email addresses are not rendered into links anymore.

My testing code:

from misaka import escape_html, Markdown, SaferHtmlRenderer, HtmlRenderer

render_normal = Markdown(HtmlRenderer(), extensions=('autolink',))
render = Markdown(SaferHtmlRenderer())
render_escape = Markdown(SaferHtmlRenderer(sanitization_mode='escape'))
renderer_rewrite = SaferHtmlRenderer(
    link_rewrite='//example.com/redirect/{link}',
    img_src_rewrite='//img_proxy/{link}',
)
render_rewrite = Markdown(renderer_rewrite, extensions=('autolink',),)
rewrite_link = renderer_rewrite.rewrite_link


print(render_normal('<https://b?x&y>'))
print(render_normal('Hello https://b?x&y Hola'))
print(render_normal('Hello https://example.com Hola'))
print(render_normal('Banana frank@61924.nl'))
print(render_normal('Banana <frank@61924.nl>'))

print('Safe:\n')

print(render_rewrite('<https://b?x&y>'))
print(render_rewrite('Hello https://b?x&y Hola'))
print(render_rewrite('Hello https://example.com Hola'))
print(render_rewrite('Banana frank@61924.nl'))
print(render_rewrite('Banana <frank@61924.nl>'))

Output:

<p><a href="https://b?x&amp;y">https://b?x&amp;y</a></p>

<p>Hello https://b?x&amp;y Hola</p>

<p>Hello <a href="https://example.com">https://example.com</a> Hola</p>

<p>Banana <a href="mailto:frank@61924.nl">frank@61924.nl</a></p>

<p>Banana <a href="mailto:frank@61924.nl">frank@61924.nl</a></p>

Safe:

<p><a href="//example.com/redirect/https%3A//b%3Fx%26y">https://b?x&amp;y</a></p>

<p>Hello https://b?x&amp;y Hola</p>

<p>Hello <a href="//example.com/redirect/https%3A//example.com">https://example.com</a> Hola</p>

<p>Banana &lt;frank@61924.nl&gt;</p>

<p>Banana &lt;frank@61924.nl&gt;</p>
@FSX

FSX Jan 14, 2017

Owner

When I enable the autolink extension email addresses are not rendered into links anymore.

My testing code:

from misaka import escape_html, Markdown, SaferHtmlRenderer, HtmlRenderer

render_normal = Markdown(HtmlRenderer(), extensions=('autolink',))
render = Markdown(SaferHtmlRenderer())
render_escape = Markdown(SaferHtmlRenderer(sanitization_mode='escape'))
renderer_rewrite = SaferHtmlRenderer(
    link_rewrite='//example.com/redirect/{link}',
    img_src_rewrite='//img_proxy/{link}',
)
render_rewrite = Markdown(renderer_rewrite, extensions=('autolink',),)
rewrite_link = renderer_rewrite.rewrite_link


print(render_normal('<https://b?x&y>'))
print(render_normal('Hello https://b?x&y Hola'))
print(render_normal('Hello https://example.com Hola'))
print(render_normal('Banana frank@61924.nl'))
print(render_normal('Banana <frank@61924.nl>'))

print('Safe:\n')

print(render_rewrite('<https://b?x&y>'))
print(render_rewrite('Hello https://b?x&y Hola'))
print(render_rewrite('Hello https://example.com Hola'))
print(render_rewrite('Banana frank@61924.nl'))
print(render_rewrite('Banana <frank@61924.nl>'))

Output:

<p><a href="https://b?x&amp;y">https://b?x&amp;y</a></p>

<p>Hello https://b?x&amp;y Hola</p>

<p>Hello <a href="https://example.com">https://example.com</a> Hola</p>

<p>Banana <a href="mailto:frank@61924.nl">frank@61924.nl</a></p>

<p>Banana <a href="mailto:frank@61924.nl">frank@61924.nl</a></p>

Safe:

<p><a href="//example.com/redirect/https%3A//b%3Fx%26y">https://b?x&amp;y</a></p>

<p>Hello https://b?x&amp;y Hola</p>

<p>Hello <a href="//example.com/redirect/https%3A//example.com">https://example.com</a> Hola</p>

<p>Banana &lt;frank@61924.nl&gt;</p>

<p>Banana &lt;frank@61924.nl&gt;</p>

This comment has been minimized.

@Changaco

Changaco Jan 14, 2017

Contributor

That's not exactly a bug, only HTTP and HTTPS links are allowed by default, so mailto: is rejected. Do you think mailto: should be allowed by default, or that the documentation should be modified to clarify that the new renderer rejects email addresses?

@Changaco

Changaco Jan 14, 2017

Contributor

That's not exactly a bug, only HTTP and HTTPS links are allowed by default, so mailto: is rejected. Do you think mailto: should be allowed by default, or that the documentation should be modified to clarify that the new renderer rejects email addresses?

This comment has been minimized.

@FSX

FSX Jan 14, 2017

Owner

The latter is a good choice I think.

@FSX

FSX Jan 14, 2017

Owner

The latter is a good choice I think.

@Changaco

This comment has been minimized.

Show comment
Hide comment
@Changaco

Changaco Jan 15, 2017

Contributor

@FSX I've tried to address all your comments. Let me know if you'd like me to squash the commits to keep the Git history clean.

Contributor

Changaco commented Jan 15, 2017

@FSX I've tried to address all your comments. Let me know if you'd like me to squash the commits to keep the Git history clean.

@Changaco

This comment has been minimized.

Show comment
Hide comment
@Changaco

Changaco Jan 15, 2017

Contributor

The python 2.6 build has errored on Travis.

@FSX Have you considered running a single build for all python versions? This project already has Tox set up so it'd be very easy to change from multiple builds to a single one.

Contributor

Changaco commented Jan 15, 2017

The python 2.6 build has errored on Travis.

@FSX Have you considered running a single build for all python versions? This project already has Tox set up so it'd be very easy to change from multiple builds to a single one.

@FSX

This comment has been minimized.

Show comment
Hide comment
@FSX

FSX Jan 15, 2017

Owner

Looks good.

I do like the separate builds of Travis. When something errors out I just restart the build.

Owner

FSX commented Jan 15, 2017

Looks good.

I do like the separate builds of Travis. When something errors out I just restart the build.

@FSX

This comment has been minimized.

Show comment
Hide comment
@FSX

FSX Jan 15, 2017

Owner

Merging this now. Thanks!

I'll do a release when I get home tonight.

Owner

FSX commented Jan 15, 2017

Merging this now. Thanks!

I'll do a release when I get home tonight.

@FSX FSX merged commit bc251d4 into FSX:master Jan 15, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@Changaco Changaco deleted the Changaco:xss-protection branch Jan 15, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment