Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(security): no warning when sanitizing escaped html (#9392) #9413

Merged
merged 1 commit into from Jun 23, 2016

Conversation

wkwiatek
Copy link
Contributor

@wkwiatek wkwiatek commented Jun 21, 2016

Please check if the PR fulfills these requirements

What kind of change does this PR introduce? (check one with "x")

[x] Bugfix
[ ] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Build related changes
[ ] CI related changes
[ ] Other... Please describe:

What is the current behavior? (You can also link to an open issue here)
#9392

What is the new behavior?
No warning when properly escaped html is passed to sanitize

Does this PR introduce a breaking change? (check one with "x")

[ ] Yes
[x] No

If this PR contains a breaking change, please describe the impact and migration path for existing applications: ...

Other information:

@mprobst
Copy link
Contributor

mprobst commented Jun 21, 2016

Thanks for the PR!

This looks conceptually fine (we want to compare to the input, not to the intermediate parsed state), and the test looks good, but I don't understand why it works.

Shouldn't safeHtml, once we parsed and re-serialized it, also no longer contain the escaped entity, but rather the unicode code point directly? And if so, how or why is it different from the treatment of unsafeHtml (before your change), which was also just parsed and re-serialized?

Any idea what's going on there?

@wkwiatek
Copy link
Contributor Author

wkwiatek commented Jun 21, 2016

The problem is produced by these lines:
https://github.com/angular/angular/pull/9413/files#diff-475057312e0ba8afc2f24a4df80d6eaaL249 https://github.com/angular/angular/pull/9413/files#diff-475057312e0ba8afc2f24a4df80d6eaaL243

First modifies the input so that it's no longer the initial value in some cases (as you can see unsafeHtml becomes parsedHtml). DOM.setInnerHTML(containerEl, unsafeHtml); leaves the treatment to the browser so in the end the value may be a little bit different than the one we passed in to the function.

@mprobst
Copy link
Contributor

mprobst commented Jun 21, 2016

Assume this code, from a DevTools console session:

let d = document.createElement('div');
// <div>​</div>​
let unsafeHtml = 'hello &#x1f680;'; // the input to sanitize
d.innerHTML = unsafeHtml; // parse it
let safeHtml = d.innerHTML; // serialize it back
// "hello 🚀" -- safeHtml now contains the actual rocket character, unescaped.
safeHtml === unsafeHtml;
// false

Turns out our code explicitly encodes entities in safeHtml, in particular surrogate pairs. That means with this change, it'll work if entities in unsafeHtml are encoded originally, but then will fail if they were not because it encodes all entities (that's what I forgot, which was confusing me here).

E.g. if you add a test case for sanitizeHtml('hellö') (with the actual Unicode character in there), this will break as the result will be 'hell&#246;', won't it? Fundamentally, this code is not aware of what is encoded and what is not in the input at the point of the comparison, so it seems to me that there is no real winning here, is there?

@wkwiatek
Copy link
Contributor Author

The code you've just pasted is actually fine and reflects the situation except of naming. Look what safeHtml is in the source: https://github.com/angular/angular/pull/9413/files#diff-475057312e0ba8afc2f24a4df80d6eaaL253.

safeHtml at the end in condition is simply not related in any way with this whole stuff we're talking about. I refer to do {} while () block which I guess is only for mXSS protection. And was accidentally overwriting input of the function.

@mprobst
Copy link
Contributor

mprobst commented Jun 21, 2016

Could you add this test case and see what happens?

    t.it('supports sanitizing escaped entities', () => {
      t.expect(sanitizeHtml('hellö')).toEqual('hellö');
      t.expect(logMsgs).toEqual([]);
    });

@wkwiatek
Copy link
Contributor Author

No problem.

Here's the output:

Expected 'hell&#246;' to equal 'hellö'.
Expected [ 'WARNING: sanitizing HTML stripped some content.' ] to equal [  ].

Both I've expected before starting tests.

@mprobst
Copy link
Contributor

mprobst commented Jun 21, 2016

So, do you think this change is worth it, given that we fundamentally cannot fix the problem?

@vicb vicb added flag: can be closed? area: security Issues related to built-in security features, such as HTML sanitation labels Jun 22, 2016
@wkwiatek
Copy link
Contributor Author

wkwiatek commented Jun 22, 2016

I think that fundamentally it works pretty well. Look, sanitizeHtml('hellö') gives you the output: hell&#246;which is fine (because really sanitizes the output) and also gives you warning that something was stripped. Then try to add a test like this:

t.it('supports sanitizing escaped entities', () => {
  t.expect(sanitizeHtml('hell&#246;')).toEqual('hell&#246;');
  t.expect(logMsgs).toEqual([]);
});

Now input and output are exactly the same. Both tests in this PR will succeed. However in the master second will fail because of warning message that should not be logged in this case.

I think it's still worth to add because in current version the information is just misleading.

Make sense?

@@ -223,11 +223,11 @@ function stripCustomNsAttrs(el: any) {
* Sanitizes the given unsafe, untrusted HTML fragment, and returns HTML text that is safe to add to
* the DOM in a browser environment.
*/
export function sanitizeHtml(unsafeHtml: string): string {
export function sanitizeHtml(entryHtml: string): string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could you rename this to unsafeHtmlInput? It's kind of important here to keep track of what's safe and what isn't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@mprobst
Copy link
Contributor

mprobst commented Jun 22, 2016

We'll still warn people about effectively harmless changes (input changes that are not actually changing anything), but I can see your reasoning. Also, the change is harmless enough.

Could you fix the parameter name? Otherwise good to go.

@mprobst mprobst merged commit 98cef76 into angular:master Jun 23, 2016
@mprobst
Copy link
Contributor

mprobst commented Jun 23, 2016

Merged. Thanks for the contribution @wkwiatek!

@wkwiatek wkwiatek deleted the issue9392 branch June 23, 2016 20:36
@wkwiatek
Copy link
Contributor Author

Thanks. You're welcome!

@angular-automatic-lock-bot
Copy link

This issue has been automatically locked due to inactivity.
Please file a new issue if you are encountering a similar or related problem.

Read more about our automatic conversation locking policy.

This action has been performed automatically by a bot.

@angular-automatic-lock-bot angular-automatic-lock-bot bot locked and limited conversation to collaborators Sep 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area: security Issues related to built-in security features, such as HTML sanitation cla: yes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants