Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unrecognized elements are silently removed without validation error reporting #1100

Closed
westonruter opened this Issue Apr 27, 2018 · 4 comments

Comments

@westonruter
Copy link
Member

westonruter commented Apr 27, 2018

When coming across an element that is not recognized at all, the following code in process_node will be invoked:

https://github.com/Automattic/amp-wp/blob/1e2cd22f1bfe8ab9d833c901e02111120f52fe72/includes/sanitizers/class-amp-tag-and-attribute-sanitizer.php#L284-L292

In the call to replace_node_with_children it will call remove_node instead of remove_invalid_child, and only the latter will report the removal. This needs to be improved to report invalidity when it is being removed for that reason.

@westonruter westonruter added this to the v1.0 milestone May 2, 2018

@westonruter westonruter added this to Definition in v1.0 May 2, 2018

@postphotos postphotos moved this from Definition to To do in v1.0 May 2, 2018

@postphotos postphotos added the release label Jul 9, 2018

@kienstra

This comment has been minimized.

Copy link
Collaborator

kienstra commented Jul 20, 2018

Question About remove_invalid_child()

Hi @westonruter,
Thanks for your detailed description.

I'm probably not understanding this. But it looks like when replace_node_with_children() calls remove_node(), remove_node() calls remove_invalid_child(). Is that enough to report the removed node?

https://github.com/Automattic/amp-wp/blob/024e450798c35a89c3390c410ab42522a924f712/includes/sanitizers/class-amp-tag-and-attribute-sanitizer.php#L1700-L1709

@kienstra

This comment has been minimized.

Copy link
Collaborator

kienstra commented Jul 21, 2018

I think I understand this now 😄

@hellofromtonya hellofromtonya self-assigned this Jul 24, 2018

@hellofromtonya hellofromtonya moved this from To do to In progress in v1.0 Jul 24, 2018

@hellofromtonya

This comment has been minimized.

Copy link
Contributor

hellofromtonya commented Jul 25, 2018

Recreating the Issue

Using this HTML in a new post:

<h2>Finds and reports these invalid elements</h2>

<imgs src="/none.jpg" width="100" height="100" alt="None"></imgs>

<baz class="baz"><p>Invalid baz parent element.</p></baz>

<h2>Silently removes these elements</h2>

<foo class="foo">Invalid baz tag.</foo>

<foobaz class="foobaz"><zab>Invalid <span>nested elements</span></zab></foobaz>

<bazbar invalid="bazbar"><span>Is an invalid "bar" tag.</span></bazbar>

<div class="parent">
	<p>Nesting valid and invalid elements.</p>
	<invalid class="invalid">Is an invalid "invalid" tag</invalid>
	<bazfoo class="bazfoo">Is an invalid "foo" tag <p id="testing-id" style="width: 100px">This should pass.</p></bazfoo>
</div>

<ul>
	<li>hello</li>
	<lili>world</lili>
</ul>

Then publish the post.

Results

  1. It catches <imgs> and <baz> and reports these 2 elements as invalid.
  2. It does not catch the other invalid elements and silently removes them: <foo>, <foobaz>, <zab>, <bazbar>, <invalid>, <bazfoo>, and <lili>.

screen shot 2018-07-25 at 2 54 15 pm

Tracing Issue via Xdebug

The 2 invalid elements that were caught flowed into replace_node_with_children() and into the if code block to run $this->remove_node( $node );:

https://github.com/Automattic/amp-wp/blob/024e450798c35a89c3390c410ab42522a924f712/includes/sanitizers/class-amp-tag-and-attribute-sanitizer.php#L1662-L1668

The other invalid elements that were not caught flowed into the else code block:

https://github.com/Automattic/amp-wp/blob/024e450798c35a89c3390c410ab42522a924f712/includes/sanitizers/class-amp-tag-and-attribute-sanitizer.php#L1668-L1687

Possible Problem

There is no validation error being generated in the else for the node that's being replaced by the fragment after pushing child nodes back onto the stack.

hellofromtonya added a commit that referenced this issue Jul 25, 2018

Remove unrecognized elements and report.
Closes #1100.

Fixes the problem of silently removing unrecognized elements.  Uses the same code pattern and methodology as `remove_invalid_child`:

- Prevent double-reporting nodes that are rejected.
- Check through should_sanitize_validation_error().
- Replace if true.
- Else, store in the `should_not_replace_nodes` property.

@hellofromtonya hellofromtonya moved this from In progress to Ready for review in v1.0 Jul 28, 2018

@hellofromtonya hellofromtonya moved this from Ready for review to Ready for Merging in v1.0 Jul 28, 2018

@hellofromtonya hellofromtonya moved this from Ready for Merging to Ready for QA in v1.0 Jul 28, 2018

@kienstra

This comment has been minimized.

Copy link
Collaborator

kienstra commented Aug 3, 2018

Moving To "Ready For Merging"

If it's alright, I'm moving this to "Ready For Merging." If you think this could use functional testing, feel free to move it back.

@kienstra kienstra moved this from Ready for QA to Ready for Merging in v1.0 Aug 3, 2018

@kienstra kienstra moved this from Ready for Merging to In Production in v1.0 Dec 11, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.