Skip to content
This repository was archived by the owner on Sep 5, 2024. It is now read-only.

fix(autocomplete): properly parse unsafe strings to prevent advanced XSS attacks #8368

Merged
merged 1 commit into from
Aug 31, 2016

Conversation

devversion
Copy link
Member

@devversion devversion commented May 6, 2016

  • The highlight controller no longer sanitizes the term or content, because this would require complex libraries like ngSanitize
  • Also there is no need to sanitize those strings, because after this change we never insert them as HTML nodes anymore.
  • We decompose the content string into different tokens and then compose them together into different elements.
    This allows us to be 100% sure that we only insert trusted HTML code and no unsafe HTML code, which could include XSS attacks.
  • This approach also supports now HTML identifiers and special text characters in the highlight text and content.

Fixes #8356

@robertmesserle
Copy link
Contributor

@ThomasBurleson LGTM, closes #8388 as well.

@ThomasBurleson ThomasBurleson modified the milestone: Deprecated May 26, 2016
@devversion devversion modified the milestones: - Backlog, Deprecated May 27, 2016
@devversion devversion force-pushed the fix/parse-unsafe-regex branch from f1f73cd to e25a01c Compare June 4, 2016 23:15
@devversion devversion added the needs: rebase This PR needs to be rebased on the latest commits from master and conflicts need to be resolved label Aug 13, 2016
@devversion devversion force-pushed the fix/parse-unsafe-regex branch from e25a01c to 0b31a12 Compare August 26, 2016 13:38
@devversion devversion removed the needs: rebase This PR needs to be rebased on the latest commits from master and conflicts need to be resolved label Aug 26, 2016
@jammed343
Copy link

jammed343 commented Aug 28, 2016

This doesn't work properly with html escaping. Also Is there a reason why you are using html escaping instead of regex escaping.

I changed:

function parseUnsafeString(text) {
    return angular.element('<div>').text(text).html();
}

To:

function parseUnsafeString(text) {
    // http://stackoverflow.com/questions/3446170/escape-string-for-use-in-javascript-regex
    return text.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
 }

Which makes it work well for me.

Otherwise i can search for "amp" and all the ampersands come up in search results with the current solution.

I suggest we add regex escape and then merge this into angular-material

@ThomasBurleson ThomasBurleson added the in progress Mainly for in progress PRs, but may be used for issues that require multiple PRs label Aug 28, 2016
@devversion
Copy link
Member Author

@jammed343 I cannot see that issue. The way we currently parse the unsafe string, seems to work as expected. It just depends on the native browser behavior, which makes sure that the displayed results and the search text match.

Can you explain what issues you are experiencing? Did you run that PR already?

@jammed343
Copy link

jammed343 commented Aug 28, 2016

@devversion
Using your current html escaping gives me this:
screen shot 2016-08-28 at 11 24 11 pm
screen shot 2016-08-28 at 11 24 42 pm

this last screenshot is just to show that the data is fine normally

screen shot 2016-08-28 at 11 26 21 pm

@devversion
Copy link
Member Author

devversion commented Aug 28, 2016

@jammed343 Everything works for me as expected. And the tests are also passing.

How did you pull / download this PR to run on your codebase?

Before After
image image

Also the way this Regex parses the string is not valid. We want to retrieve the HTML encoded way, instead of the UTF-8 text.

Here is a demo: http://codepen.io/DevVersion/pen/PzrQGy

@jammed343
Copy link

@devversion I used the DevVersion:fix/parse-unsafe-regex branch which didn't work for me at first. then i modified the parseUnsafeString function you added to use regex escape and that worked for me.

@devversion
Copy link
Member Author

devversion commented Aug 28, 2016

@jammed343 Hmm weird. The tests pass, the demos are working as expected. It may be an issue on the PR pulling. Not sure how you served your application with the new changes etc.

Also as seen in my demo about the parsing, it is expected to turn & into &amp; so we can properly match them.

@jammed343
Copy link

@devversion

I added some console output to your original pull:

if (text === null || state.unsafeText !== prevState.unsafeText) {
5        text = parseUnsafeString(state.unsafeText);
5      }
-      if (regex === null || state.term !== prevState.term) {
5        regex = getRegExp(parseUnsafeString(state.term), flags);
5      }
Added these console.logs
4      console.log("STATE CHANGE");
4      console.log("state.unsafeText: ", state.unsafeText);
4      console.log("state.term: ", state.term);
4      console.log("text: ", parseUnsafeString(state.unsafeText));
4      console.log("regex: ", getRegExp(parseUnsafeString(state.term), flags));
4      console.log("$element.html", text.replace(regex, '<span class="highlight">$&</span>'));
4
4      $element.html(text.replace(regex, '<span class="highlight">$&</span>'));
STATE CHANGE
angular-material.js:23902 state.unsafeText:  Animals & Pet Supplies > Pet Supplies > Bird Supplies > Bird Cages & Stands
angular-material.js:23903 state.term:  a
angular-material.js:23904 text:  Animals &amp; Pet Supplies &gt; Pet Supplies &gt; Bird Supplies &gt; Bird Cages &amp; Stands
angular-material.js:23905 regex:  /a/gi
angular-material.js:23906 $element.html <span class="highlight">A</span>nim<span class="highlight">a</span>ls &<span class="highlight">a</span>mp; Pet Supplies &gt; Pet Supplies &gt; Bird Supplies &gt; Bird C<span class="highlight">a</span>ges &<span class="highlight">a</span>mp; St<span class="highlight">a</span>nds

@ThomasBurleson ThomasBurleson added pr: merge ready This PR is ready for a caretaker to review and removed in progress Mainly for in progress PRs, but may be used for issues that require multiple PRs needs: review This PR is waiting on review from the team labels Aug 28, 2016
@jammed343
Copy link

@devversion
Your latest commit:
776d82a
is now working.

@devversion devversion added the in progress Mainly for in progress PRs, but may be used for issues that require multiple PRs label Aug 28, 2016
@devversion devversion removed their assignment Aug 28, 2016
@devversion devversion added needs: review This PR is waiting on review from the team and removed in progress Mainly for in progress PRs, but may be used for issues that require multiple PRs labels Aug 28, 2016
@devversion devversion removed the pr: merge ready This PR is ready for a caretaker to review label Aug 28, 2016
@devversion
Copy link
Member Author

devversion commented Aug 28, 2016

@ThomasBurleson @jammed343 I made some changes regarding the way we display the highlighted text.

Those changes were necessary to support HTML entity identifiers and special characters inside of the highlight text.

The current approach was very unstable and really dangerous for XSS attacks, since we are setting the HTML of the element.

@ThomasBurleson
Copy link
Contributor

@jammed343 - Thx for your persistence on improving this PR. 👍
@devversion - lgtm.

@ThomasBurleson ThomasBurleson added pr: merge ready This PR is ready for a caretaker to review and removed needs: review This PR is waiting on review from the team labels Aug 28, 2016
@ThomasBurleson ThomasBurleson modified the milestones: 1.1.1, - Backlog Aug 28, 2016
@ThomasBurleson
Copy link
Contributor

@devversion - plz squash and (if needed) rebase.

@devversion devversion changed the title fix(autocomplete) highlight directive should transform regex text into html entities fix(autocomplete): properly parse unsafe strings to prevent advanced XSS attacks Aug 28, 2016
@devversion devversion force-pushed the fix/parse-unsafe-regex branch from 9173d08 to 17ccbda Compare August 29, 2016 00:08
…o html entities.

* The highlight controller no longer sanitizes the term or content, because this would require complex libraries like `ngSanitize`
* Also there is no need to sanitize those strings, because after this change we never insert them as HTML nodes anymore.

* We decompose the content string into different tokens and then compose them together into different elements.
   This allows us to be 100% sure that we only insert trusted HTML code and no unsafe HTML code, which could include XSS attacks.

* This approach also supports now HTML identifiers and special text characters in the highlight text and content.

Fixes angular#8356
@devversion devversion force-pushed the fix/parse-unsafe-regex branch from 17ccbda to fee017e Compare August 29, 2016 00:09
@hansl hansl merged commit 9b9059a into angular:master Aug 31, 2016
@devversion devversion deleted the fix/parse-unsafe-regex branch August 31, 2016 21:41
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr: merge ready This PR is ready for a caretaker to review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants