Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize entire HTML output when theme support is present #888

Merged
merged 9 commits into from Jan 23, 2018

Conversation

Projects
None yet
3 participants
@westonruter
Copy link
Member

commented Jan 23, 2018

See #875.

This largely eliminates the need for amp_component_scripts entirely because the AMP component scripts are discovered via the whitelist sanitizer in #882. So for example, the Form sanitizer in #871 can be used without needing to define its get_scripts() method.

For another PR:

  • Fix character encoding issue. See #855 (comment)
  • Add elements from head to sanitization.

ThierryA and others added some commits Jan 18, 2018

Thierry Muller
Merge pull request #884 from Automattic/fix/script-additional-allowed…
…-tag

Remove erroneous additional allowance of script[type=text/javascript]
self::$amp_scripts = array_merge( self::$amp_scripts, $scripts );
self::$amp_styles = array_merge( self::$amp_styles, $styles );
$output = preg_replace( '#(<body.*?>)(.+)(</body>)#si', '$1' . $sanitized_inner_body . '$3', $output );

This comment has been minimized.

Copy link
@DavidCramer

DavidCramer Jan 23, 2018

Contributor

@westonruter here we're replacing the body with the sanitised body. This is because it's returning AMP_DOM_Utils::get_content_from_dom( $dom ) from the sanitize method. Wouldn't this be easier if the method returned $dom->saveHTML() if the content passed is a complete document? This would stop the need to replace the body after it was sanitized.

This comment has been minimized.

Copy link
@westonruter

westonruter Jan 23, 2018

Author Member

Yes, potentially. The sanitize method is better called sanitize_content. Better handling of sanitizing entire documents should be done.

This comment has been minimized.

Copy link
@westonruter

westonruter Jan 23, 2018

Author Member

@DavidCramer how about this: 876c22f

This comment has been minimized.

Copy link
@DavidCramer

DavidCramer Jan 23, 2018

Contributor

@westonruter Thats nice, makes way for a full doc later on. I see that you made a comment about the head needs updating to mandatory_parent_blacklist so that makes sense now.

@DavidCramer
Copy link
Contributor

left a comment

Other than that I'm happy with with this.

westonruter added some commits Jan 23, 2018

Lower-case tag param in is_self_closing_tag check; replace static met…
…hod w/ static return with a plain static variable
* as otherwise elements from the HEAD could get added to the BODY.
*/
$sanitized_inner_body = AMP_DOM_Utils::get_content_from_dom( $dom );
$output = preg_replace( '#(<body.*?>)(.+)(</body>)#si', '$1' . $sanitized_inner_body . '$3', $output );
// Inject required scripts.
$output = preg_replace(

This comment has been minimized.

Copy link
@DavidCramer

DavidCramer Jan 23, 2018

Contributor

@westonruter get_amp_component_scripts doesn't require the argument anymore.

@DavidCramer
Copy link
Contributor

left a comment

Awesome work @westonruter. This will help to move forward, thanks!

@westonruter

This comment has been minimized.

Copy link
Member Author

commented Jan 23, 2018

Hummm, something isn't working quite right.

If I add this to a template:

<script type="text/javascript">bad</script>

It is not getting stripped out as expected.

@DavidCramer

This comment has been minimized.

Copy link
Contributor

commented Jan 23, 2018

hmm. testing as well.

@westonruter

This comment has been minimized.

Copy link
Member Author

commented Jan 23, 2018

😅 whew, it's just because #884 wasn't merged yet from the 0.6 branch. Fixed with merging in ef69f2d.

@ThierryA
Copy link
Collaborator

left a comment

Awesome work @westonruter @DavidCramer. This is really exciting!

$original_html = trim( ob_get_clean() );
$sanitized_html = AMP_Theme_Support::finish_output_buffering( $original_html );
$this->assertContains( '<meta charset="utf-8">', $sanitized_html );

This comment has been minimized.

Copy link
@ThierryA

ThierryA Jan 23, 2018

Collaborator

@westonruter @DavidCramer these tests could have a bit more coverage here. I believe it is ok to do that at a later stage since the other components sanitization (form etc.) are still WIP.

@ThierryA ThierryA merged commit b8af86a into develop Jan 23, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@ThierryA ThierryA deleted the add/output-buffer-sanitization branch Jan 23, 2018

@westonruter westonruter added this to the v0.7 milestone Jan 23, 2018

kienstra added a commit that referenced this pull request Jan 23, 2018

Issue #864: Remove widget subclasses that used filter_the_content().
That function has been removed in PR #888.
It looks like that uses full-page buffering.
So remove the widget subclasses that use that.
@todo: look at regressions in the 'Archives' and 'Categories' dropdowns.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.