XSS with HTML entities #592

Merged
merged 3 commits into from Jul 29, 2016
Next

added explicit matching for HTML entities to prevent XSS

commit 2cff85979be8e7a026a9aca35542c470cf5da523 @matt- matt- committed May 19, 2015
View
@@ -1094,7 +1094,8 @@ function escape(html, encode) {
}
function unescape(html) {
- return html.replace(/&([#\w]+);/g, function(_, n) {
+ // explicitly match decimal, hex, and named HTML entities
+ return html.replace(/&(#(?:\d+)|(?:#x[0-9A-Fa-f]+)|(\w+))/g, function(_, n) {
@rsp
rsp Jun 9, 2016 Contributor

@matt- Shouldn't there be an optional semicolon in the regex so that unescape wouldn't leave the semicolons in the string if they are there?
I mean something like this:
/&(#(?:\d+)|(?:#x[0-9A-Fa-f]+)|(\w+));?/g
instead of:
/&(#(?:\d+)|(?:#x[0-9A-Fa-f]+)|(\w+))/g
or possibly with the third non-capturing group to be consistent with the first two:
/&(#(?:\d+)|(?:#x[0-9A-Fa-f]+)|(?:\w+));?/g

@matt-
matt- Jul 14, 2016 Collaborator

oops. yes that looks right to me. I will update the PR with tests for that.

@rsp
rsp Jul 15, 2016 Contributor

I posted a PR matt-/marked#1 for your xss_html_entities branch with that little change but with no tests yet. Let me know how you think the tests should look like so I'll add some. Thanks.

n = n.toLowerCase();
if (n === 'colon') return ':';
if (n.charAt(0) === '#') {
@@ -0,0 +1,4 @@
+<p></p>
+<p></p>
+<p></p>
+<p></p>
@@ -0,0 +1,7 @@
+[URL](javascript:alert)
+
+[URL](vbscript:alert)
+
+[URL](javascript&colon;alert&#40;1&#41;)
+
+[URL](javascript&#58document;alert&#40;1&#41;)