let cleaner drop 'data:' links just like it does for 'javascript:' links
Stefan Behnel committed Aug 12, 2011
1 parent ae9c604 commit 443e5b3
Showing 3 changed files with 11 additions and 1 deletion.
2 changes: 2 additions & 0 deletions CHANGES.txt
@@ -16,6 +16,8 @@ Features added
Bugs fixed
----------

* HTML cleaning didn't remove 'data:' links.

* The html5lib parser integration now uses the 'official'
implementation in html5lib itself, which makes it work with newer
releases of the library.
2 changes: 1 addition & 1 deletion src/lxml/html/clean.py
@@ -80,7 +80,7 @@
# All kinds of schemes besides just javascript: that can cause
# execution:
_javascript_scheme_re = re.compile(
-    r'\s*(?:javascript|jscript|livescript|vbscript|about|mocha):', re.I)
+    r'\s*(?:javascript|jscript|livescript|vbscript|data|about|mocha):', re.I)
_substitute_whitespace = re.compile(r'\s+').sub
# FIXME: should data: be blocked?
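
To illustrate the effect of this one-line change, here is a minimal standalone sketch that compares the old and new patterns on the `data:` link used in the tests. The two regular expressions are copied from the hunk above; the names `old_scheme_re`, `new_scheme_re`, and `strip_unsafe_link` are hypothetical and are not part of lxml. The helper only approximates what the cleaner does with a link and should not be read as the library's implementation.

```python
import re

# Patterns copied from the hunk above: before and after adding 'data'.
old_scheme_re = re.compile(
    r'\s*(?:javascript|jscript|livescript|vbscript|about|mocha):', re.I)
new_scheme_re = re.compile(
    r'\s*(?:javascript|jscript|livescript|vbscript|data|about|mocha):', re.I)


def strip_unsafe_link(link, scheme_re):
    """Hypothetical helper: return '' if the link starts with a blocked scheme."""
    # Drop all whitespace before matching, so a scheme that is split by
    # spaces or newlines is still recognised.
    compact = re.sub(r'\s+', '', link)
    if scheme_re.match(compact):
        return ''
    return link


data_link = "data:text/html;base64,PHNjcmlwdD5hbGVydCgidGVzdCIpOzwvc2NyaXB0Pg=="

# The old pattern leaves the link untouched; the new one blanks it.
print(repr(strip_unsafe_link(data_link, old_scheme_re)))
print(repr(strip_unsafe_link(data_link, new_scheme_re)))
```

With the new pattern the helper returns an empty string for the `data:` link, which corresponds to the `<a href="">data</a>` lines expected in the test output below.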

8 changes: 8 additions & 0 deletions src/lxml/html/tests/test_clean.txt
@@ -8,12 +8,14 @@
... <link rel="alternate" type="text/rss" src="evil-rss">
... <style>
... body {background-image: url(javascript:do_evil)};
... div {background-image: url(data:text/html;base64,PHNjcmlwdD5hbGVydCgidGVzdCIpOzwvc2NyaXB0Pg==)};
... div {color: expression(evil)};
... </style>
... </head>
... <body onload="evil_function()">
... <!-- I am interpreted for EVIL! -->
... <a href="javascript:evil_function()">a link</a>
... <a href="data:text/html;base64,PHNjcmlwdD5hbGVydCgidGVzdCIpOzwvc2NyaXB0Pg==">data</a>
... <a href="#" onclick="evil_function()">another link</a>
... <p onclick="evil_function()">a paragraph</p>
... <div style="display: none">secret EVIL!</div>
@@ -34,12 +36,14 @@
<link rel="alternate" type="text/rss" src="evil-rss">
<style>
body {background-image: url(javascript:do_evil)};
div {background-image: url(data:text/html;base64,PHNjcmlwdD5hbGVydCgidGVzdCIpOzwvc2NyaXB0Pg==)};
div {color: expression(evil)};
</style>
</head>
<body onload="evil_function()">
<!-- I am interpreted for EVIL! -->
<a href="javascript:evil_function()">a link</a>
<a href="data:text/html;base64,PHNjcmlwdD5hbGVydCgidGVzdCIpOzwvc2NyaXB0Pg==">data</a>
<a href="#" onclick="evil_function()">another link</a>
<p onclick="evil_function()">a paragraph</p>
<div style="display: none">secret EVIL!</div>
@@ -60,12 +64,14 @@
<link rel="alternate" type="text/rss" src="evil-rss">
<style>
body {background-image: url(javascript:do_evil)};
div {background-image: url(data:text/html;base64,PHNjcmlwdD5hbGVydCgidGVzdCIpOzwvc2NyaXB0Pg==)};
div {color: expression(evil)};
</style>
</head>
<body onload="evil_function()">
<!-- I am interpreted for EVIL! -->
<a href="javascript:evil_function()">a link</a>
<a href="data:text/html;base64,PHNjcmlwdD5hbGVydCgidGVzdCIpOzwvc2NyaXB0Pg==">data</a>
<a href="#" onclick="evil_function()">another link</a>
<p onclick="evil_function()">a paragraph</p>
<div style="display: none">secret EVIL!</div>
@@ -86,6 +92,7 @@
</head>
<body>
<a href="">a link</a>
<a href="">data</a>
<a href="#">another link</a>
<p>a paragraph</p>
<div style="display: none">secret EVIL!</div>
@@ -103,6 +110,7 @@
</head>
<body>
<a href="">a link</a>
<a href="">data</a>
<a href="#">another link</a>
<p>a paragraph</p>
<div>secret EVIL!</div>