XSS with HTML entities #592

merged 3 commits into from Jul 29, 2016

added explicit matching for HTML entities to prevent XSS

commit 2cff85979be8e7a026a9aca35542c470cf5da523 @matt- matt- committed May 19, 2015
@@ -1094,7 +1094,8 @@ function escape(html, encode) {
function unescape(html) {
- return html.replace(/&([#\w]+);/g, function(_, n) {
+ // explicitly match decimal, hex, and named HTML entities
+ return html.replace(/&(#(?:\d+)|(?:#x[0-9A-Fa-f]+)|(\w+))/g, function(_, n) {
rsp Jun 9, 2016 Contributor

@matt- Shouldn't there be an optional semicolon in the regex so that unescape wouldn't leave the semicolons in the string if they are there?
I mean something like this:
instead of:
or possibly with the third non-capturing group to be consistent with the first two:

matt- Jul 14, 2016 Collaborator

oops. yes that looks right to me. I will update the PR with tests for that.

rsp Jul 15, 2016 Contributor

I posted a PR matt-/marked#1 for your xss_html_entities branch with that little change but with no tests yet. Let me know how you think the tests should look like so I'll add some. Thanks.

n = n.toLowerCase();
if (n === 'colon') return ':';
if (n.charAt(0) === '#') {
@@ -0,0 +1,4 @@
@@ -0,0 +1,7 @@