Add sanitization reporting #912

Merged
westonruter merged 43 commits into develop from add/sanitization-reporting on Feb 9, 2018

3 participants
@westonruter
Member

commented Jan 27, 2018

  • Add callback mechanism for being able to be informed of when an element or attribute is removed. (mainly applied in e9f39, pending review)
  • Add REST API endpoint allowing for HTML to be POST'ed and get back results for what changes were made. An element removal could be considered a sanitization error. The request would require an edit_post cap and require a nonce (see this screencast)
  • For normal frontend GET requests, consider being able to return validation errors in response header when requested via query param. The errors can be gathered during sanitization and then added before the output buffer is sent. (applied, pending review)

Relates to content validation (#843) and also determining when a theme or plugin does something invalid (#842).

Fixes #843.

@kienstra

Collaborator

commented Jan 27, 2018

In Development

Hi @westonruter,
Thanks for starting this pull request, and for the outline of how to proceed. I'm working on this now if that's alright.

kienstra added some commits Jan 28, 2018

Issue #843: Tracking for removed nodes and attributes.
Building upon Weston's work and solution design,
Add a class to track whenever a node or attribute is removed.
And a method to get whether a node was removed.
The format of the stored nodes and attributes might change.
This will probably depend on the error reporting needed
in the REST API and GET request response.
Issue #843: Correct a failed Travis build by excluding a PHPCS rule.
There was an error:
Class file names should be based on the class name with 'class-'
But the format of the other test files is different.
So use that format, and exclude this rule for test files.
Issue #843: Add a method to process markup for AMP validity.
The 'mutation_callback' will then track removed nodes and attributes.
Also, change the way in which we pass the 'mutation_callback.'
Before, it was part of the constructor of:
AMP_Tag_And_Attribute_Sanitizer.
Instead, move it to the $args of:
AMP_Content_Sanitizer::sanitize().
This will pass it to all of the sanitizer/* files when they're instantiated.
@todo: look at whether to call the callback for all node removals.
Issue #843: Track removed iframes in a helper method.
Before, there were 3 places in the file that called removeChild().
This was fine, but they now need to call the mutation callback.
So abstract these into remove_child().
Also, call the mutation callback in AMP_Video_Sanitizer.
Issue #843: Initial registration of the REST endpoint for validation.
Per Weston's description in PR #912,
It allows sending a POST with markup for validation.
The headers should have 'Content-Type' of 'application/json.'
And it should pass the markup in the param 'markup.'
The current response only has 'is_error.'
@todo: look at returning more in the response,
like the stripped tags and attributes.
Also, add nonce verification.
Issue #864: Support <amp-carousel> in 'Gallery' widget.
There's an existing handler to create 'amp-carousel' elements:
class AMP_Gallery_Embed_Handler.
So override the 'Gallery' widget class.
And use that in render_media().
Otherwise, that function is copied from the parent.
It calls gallery_shortcode() at the end.
Which doesn't have a filter for the markup.
Issue #843: Report removed attributes and nodes in a histogram.
This is only one approach.
But for now, the response has counts for:
'removed_nodes' and 'removed_attributes'.
If a <script> is removed, 'removed_nodes' will be:
{"script":1}.
The count will increment every time the same node type is removed.
There is a similar histogram for 'removed_attributes'.
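The histogram described in this commit message could be sketched roughly as follows. This is only an illustration: `increment_count()` and the two local arrays are hypothetical names, not the plugin's actual methods or properties.

```php
<?php
// Hypothetical helper (not the plugin's code): counts removals by name.
function increment_count( array $histogram, $key ) {
	$histogram[ $key ] = isset( $histogram[ $key ] ) ? $histogram[ $key ] + 1 : 1;
	return $histogram;
}

$removed_nodes      = array();
$removed_attributes = array();

// Simulate a sanitizer stripping two <script> elements and one 'async' attribute.
$removed_nodes      = increment_count( $removed_nodes, 'script' );
$removed_nodes      = increment_count( $removed_nodes, 'script' );
$removed_attributes = increment_count( $removed_attributes, 'async' );

echo json_encode( $removed_nodes ), "\n";      // {"script":2}
echo json_encode( $removed_attributes ), "\n"; // {"async":1}
```

Keying by name keeps the response small even when the same invalid element appears many times on a page.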
Issue #843: Align equals signs vertically.
In response to Travis errors.
@todo: apply next requirement in PR #912.
Issue #843: Prepare to add headers to frontend GET requests.
Abstract the logic for the response into get_response().
This enables using it for the existing REST API logic,
And the new use-case of full-page GET requests.
@kienstra

Collaborator

commented Jan 30, 2018

Screencast Of REST API Responses

Here's a screencast of the current validation via the REST API. The schema of the response will probably change as I look at outputting it in a frontend GET request.

Issue #864: Validation data in the response header.
In a frontend GET request, add a header:
'AMP-Validation-Error'.
This outputs whether the sanitizers stripped nodes or attributes.
A possible output is:
'{"has_error":true,"removed_nodes":{"script":1},"removed_attributes":{"async":1}}'
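As a rough sketch of how a header value like the one above might be assembled: `build_validation_header_value()` is a hypothetical name, and treating any removal (node or attribute) as an error is an assumption made for the example, not necessarily what the plugin does.

```php
<?php
// Hypothetical function: builds the JSON value for an 'AMP-Validation-Error' header.
function build_validation_header_value( array $removed_nodes, array $removed_attributes ) {
	$response = array(
		// Assumption: any removal counts as an error.
		'has_error'          => ! empty( $removed_nodes ) || ! empty( $removed_attributes ),
		'removed_nodes'      => $removed_nodes,
		'removed_attributes' => $removed_attributes,
	);
	return json_encode( $response );
}

$value = build_validation_header_value( array( 'script' => 1 ), array( 'async' => 1 ) );

// In a real request this would be sent before any output, for example:
// header( 'AMP-Validation-Error: ' . $value );
echo $value, "\n";
```

Note that empty PHP arrays serialize as `[]` rather than `{}`, so a consumer of the header should accept both shapes.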
*/
public static function finish_buffer_add_header( $output ) {
$markup = self::finish_output_buffering( $output );
AMP_Mutation_Utils::add_header();

@westonruter

westonruter Feb 1, 2018

Author Member

We'll want to limit this to only be added when a user specifically requests this additional information. Like there should be a nonce that must be present to authorize the reporting.

@kienstra

kienstra Feb 1, 2018

Collaborator

Thanks, @westonruter. That's a good idea to add the header only for users with a nonce.

@kienstra

kienstra Feb 1, 2018

Collaborator

Applied with a nonce, details to come shortly.

@kienstra

kienstra Feb 1, 2018

Collaborator

AMP_Mutation_Utils::add_header() now has a check for authorization via self::is_authorized(). That verifies the nonce and capability.

@@ -523,6 +535,7 @@ public static function finish_output_buffering( $output ) {
$dom = AMP_DOM_Utils::get_dom( $output );
$args = array(
'content_max_width' => ! empty( $content_width ) ? $content_width : AMP_Post_Template::CONTENT_MAX_WIDTH, // Back-compat.
'mutation_callback' => 'AMP_Mutation_Utils::track_removed',

@westonruter

westonruter Feb 1, 2018

Author Member

Per above, this should be conditional based on whether an authorized nonce is present.

@kienstra

kienstra Feb 1, 2018

Collaborator

Good point. This is now only added if AMP_Mutation_Utils::is_authorized(). That checks for the nonce and capability.

@@ -506,13 +506,25 @@ public static function get_amp_component_scripts() {
* Start output buffering.
*/
public static function start_output_buffering() {
- ob_start( array( __CLASS__, 'finish_output_buffering' ) );
+ ob_start( array( __CLASS__, 'finish_buffer_add_header' ) );

@westonruter

westonruter Feb 1, 2018

Author Member

We can leave this as finish_output_buffering I think.

@kienstra

kienstra Feb 1, 2018

Collaborator

Thanks, this commit changes the name back to finish_output_buffering().

@kienstra

Collaborator

commented Feb 1, 2018

Example Of GET Response Headers
Question About Schema

Hi @westonruter,
Here's a screencast of the frontend response headers showing AMP errors. What do you think about the schema of the response?

[Image: amp-validation-issue]

@westonruter

Member Author

commented Feb 1, 2018

@kienstra thanks for that great video. Really cool to see.

Aside: you can still use Postman with authenticated requests if you just install the basic auth plugin.

* @since 0.7
*
* @param object $child The node to remove.
* @param object $parent The parent node for which to remove the child (optional).

@westonruter

westonruter Feb 1, 2018

Author Member

Instead of object these should probably be DOMNode.

Also, is the $parent even needed since it will always be available via $child->parentNode?

@kienstra

kienstra Feb 1, 2018

Collaborator

Thanks, @westonruter. This commit changes the @param type to DOMNode, and removes the second parameter. Thanks for pointing out that the second isn't needed.

$parent->removeChild( $child );
}
if ( isset( $this->args['mutation_callback'] ) ) {

@westonruter

westonruter Feb 1, 2018

Author Member

It's possible that remove_child could be called without any node being removed, per the above if/elseif conditions. Should this check to see if something was actually removed? I'm curious as to the scenarios when the else condition would be met.

@kienstra

kienstra Feb 1, 2018

Collaborator

Thanks, that's a good point. This commit from above has the if conditional nested, to ensure there was a removal before calling the 'mutation_callback'.

Issue #864: Remove an extra conditional, nest the 'mutation_callback.'
As Weston mentioned, the child could get the parentNode.
So there's no reason for the elseif.
Also, this makes it possible to nest the 'mutation_callback.'
So it's only called if there's a removal.
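The nesting described in this commit message might look something like the sketch below. It assumes PHP's DOM extension is available; `remove_invalid_child()` and the `$track` closure are hypothetical stand-ins, not the sanitizer's real method or the plugin's callback.

```php
<?php
// Hypothetical stand-in for a sanitizer's removal helper.
function remove_invalid_child( DOMNode $child, $mutation_callback = null ) {
	$parent = $child->parentNode;
	if ( ! $parent ) {
		return false; // Detached node: nothing is removed, so nothing is reported.
	}
	$parent->removeChild( $child );

	// Nested inside the removal path, so the callback only fires after a real removal.
	if ( is_callable( $mutation_callback ) ) {
		call_user_func( $mutation_callback, $child, 'removed' );
	}
	return true;
}

$removed = array();
$track   = function ( DOMNode $node, $type ) use ( &$removed ) {
	$removed[] = $node->nodeName . ':' . $type;
};

$dom = new DOMDocument();
$dom->loadHTML( '<html><body><p>Hi</p><script>alert(1)</script></body></html>' );
$script = $dom->getElementsByTagName( 'script' )->item( 0 );

remove_invalid_child( $script, $track ); // Attached: removed and reported.
remove_invalid_child( $script, $track ); // Already detached: skipped, not reported.

print_r( $removed );
```

The second call shows why the nesting matters: without it, the callback would report a removal that never happened.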
$response = array(
'has_error' => self::was_node_removed(),
'removed_nodes' => self::$removed_nodes,
'removed_attributes' => self::$removed_attributes,

@westonruter

westonruter Feb 1, 2018

Author Member

It would be useful if the processed markup were also returned here. It could then be used for previewing, for example.

@kienstra

kienstra Feb 1, 2018

Collaborator

Hi @westonruter, that would help. Do you think we should return the processed markup only if we're validating a limited amount of markup, like a single Gutenberg block? We could detect this by whether get_response() is called with a $markup argument.

if ( isset( $markup ) ) {
        $response['processed_markup'] = $markup;
}

If get_response() is called without a $markup argument, it's validating the entire document.
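To make the proposal concrete, here is a minimal sketch of the idea, separated from the rest of get_response(). `maybe_add_processed_markup()` is a hypothetical helper used only for illustration, and the `$base` array stands in for whatever the validation step produced.

```php
<?php
// Hypothetical helper: adds 'processed_markup' only when markup was supplied.
function maybe_add_processed_markup( array $response, $markup = null ) {
	if ( isset( $markup ) ) {
		$response['processed_markup'] = $markup;
	}
	return $response;
}

$base = array(
	'has_error'          => true,
	'removed_nodes'      => array( 'script' => 1 ),
	'removed_attributes' => array(),
);

// Validating a fragment (for example, a single block): markup is echoed back.
$fragment = maybe_add_processed_markup( $base, '<p>Hello</p>' );

// Validating the whole document: no markup argument, so no 'processed_markup' key.
$full_page = maybe_add_processed_markup( $base );

var_dump( isset( $fragment['processed_markup'] ) );
var_dump( isset( $full_page['processed_markup'] ) );
```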

@westonruter

westonruter Feb 1, 2018

Author Member

Yes, if markup is supplied to validate, then it should get returned.

@kienstra

kienstra Feb 1, 2018

Collaborator

Thanks, @westonruter.

These commits output the 'processed_markup', when $markup is passed to the function.
[Image: request-postman-validate]

kienstra added some commits Feb 1, 2018

Issue #864: Rename function to finish_output_buffering().
This function has the same logic as the current get_buffer().
But the name is more descriptive.
Issue #843: Remove the extra variable in the @return tag.
The return value is simply void.
So there's no need for any more information.
Issue #843: Add processed markup to REST API response.
Respond with the markup that is submitted in the request,
In the value 'processed_markup'.
Full-page requests won't have the markup in the response.
esc_html() might not be the best way to escape the markup.
But it doesn't display properly without escaping.
Issue #843: Fix an issue in the error message.
Before, the error message always appeared.
This is because it only checked that the response
had a value for 'has_error'.
But this needs to be true in order for there to be a reported error.
@kienstra

Collaborator

commented Feb 8, 2018

Fixing My Mistakes In Resolving Merge Conflicts

Hi @ThierryA,
I also see an issue with this PR. I think it's related to how I resolved the merge conflicts. I'm working on it now.

Issue #843: Revert renaming of methods, adjust unit tests.
I had renamed some methods in this branch: add/sanitization-reporting.
Also, remove the parameter from finish_output_buffering().
That function in the 'develop' branch no longer has a parameter.
*
* @return void.
*/
public static function add_header() {

@kienstra

kienstra Feb 8, 2018

Collaborator

This isn't actually used yet. I think it was a foundation for validating plugins on activation (#842)

@@ -11,6 +11,9 @@
</properties>
</rule>

<rule ref="WordPress.Files.FileName.InvalidClassFileName">
<exclude-pattern>tests/*</exclude-pattern>
</rule>

@kienstra

kienstra Feb 8, 2018

Collaborator

Almost all of the test files begin with test-, while this rule states that they should begin with class-.

Issue #843: Remove special characters, update documentation.
There were different characters in prepare_response(),
Maybe from copying from GitHub.
Also, adjust documentation, and add a @codingStandardsIgnoreEnd.
*
* @return void.
*/
public static function amp_rest_validation() {

@kienstra

kienstra Feb 8, 2018

Collaborator

This isn't currently used, though it would be helpful for Gutenberg block validation.

@kienstra

Collaborator

commented Feb 8, 2018

Ready For Review

Hi @ThierryA,
This pull request is ready for review, with the error addressed.

@ThierryA
Collaborator

left a comment

Thanks @kienstra, nice and clean coding here. I left my CR below.

@@ -704,6 +704,9 @@ private function sanitize_disallowed_attributes_in_node( $node, $attr_spec_list
foreach ( $attrs_to_remove as $attr ) {
$node->removeAttributeNode( $attr );
if ( isset( $this->args['mutation_callback'], $attr->name ) ) {
call_user_func( $this->args['mutation_callback'], $node, 'removed_attr', $attr->name );

@ThierryA

ThierryA Feb 8, 2018

Collaborator

I believe AMP_Mutation_Utils::ATTRIBUTE_REMOVED could be used here.

@kienstra

kienstra Feb 8, 2018

Collaborator

Thanks, this commit from above substitutes the constant.

if ( method_exists( $node, 'removeAttribute' ) ) {
$node->removeAttribute( $attribute );
if ( isset( $this->args['mutation_callback'] ) ) {
call_user_func( $this->args['mutation_callback'], $node, 'removed_attr', $attribute );

@ThierryA

ThierryA Feb 8, 2018

Collaborator

I believe AMP_Mutation_Utils::ATTRIBUTE_REMOVED could be used here.

@kienstra

kienstra Feb 8, 2018

Collaborator

Thanks, this commit from above substitutes the constant.

if ( method_exists( $child->parentNode, 'removeChild' ) ) { // phpcs:ignore WordPress.NamingConventions.ValidVariableName.NotSnakeCaseMemberVar.
$child->parentNode->removeChild( $child ); // phpcs:ignore WordPress.NamingConventions.ValidVariableName.NotSnakeCaseMemberVar.
if ( isset( $this->args['mutation_callback'] ) ) {
call_user_func( $this->args['mutation_callback'], $child, 'removed' );

@ThierryA

ThierryA Feb 8, 2018

Collaborator

I believe AMP_Mutation_Utils::NODE_REMOVED could be used here.

@kienstra

kienstra Feb 8, 2018

Collaborator

Thanks, this commit uses that constant instead of the string.

/**
* The argument if an attribute was removed.
*
* @const array.

@ThierryA

ThierryA Feb 8, 2018

Collaborator

For constants the @var tag must be used for the PHPDoc to be valid. This comment applies in multiple constants in this PR.

@kienstra

kienstra Feb 8, 2018

Collaborator

Thanks, this commit changes the tags to @var.

* Tracks when a sanitizer removes an attribute or node.
*
* @param array $histogram The count of attributes or nodes removed.
* @param string $key The attribute or node name removed.

@ThierryA

ThierryA Feb 8, 2018

Collaborator

It's a wonder Travis didn't pick up the misaligned comments; it might only sniff variable alignment.

@kienstra

kienstra Feb 8, 2018

Collaborator

Should this also align the comments, like:

@param array  $histogram The count of attributes or nodes removed.
@param string $key       The attribute or node name removed.

@ThierryA

ThierryA Feb 8, 2018

Collaborator

Yes

@kienstra

kienstra Feb 8, 2018

Collaborator

This commit aligns the spacing of the comments. The pre-commit hook blocked aligning this in two places, possibly because there was only one @param and a @return.

* @return void.
*/
public static function display_error() {
$error = isset( $_GET[ self::ERROR_QUERY_KEY ] ) ? sanitize_text_field( wp_unslash( $_GET[ self::ERROR_QUERY_KEY ] ) ) : ''; // WPCS: CSRF ok.

@ThierryA

ThierryA Feb 8, 2018

Collaborator

For compliance sake, it would be good to check nonce here.

@kienstra

kienstra Feb 8, 2018

Collaborator

Good point, @ThierryA. This commit adds nonce verification, via check_admin_referer().

*
* @since 0.7
*/
class AMP_Mutation_Utils {

@ThierryA

ThierryA Feb 8, 2018

Collaborator

What does Mutation stand for here? It could be because I am a native French speaker, but AMP_Validation_Utils would make more sense to me.

@kienstra

kienstra Feb 8, 2018

Collaborator

You're right that AMP_Validation_Utils would be better. Mutation refers to the mutation_callback, which tracks when the sanitizer removes an attribute or node.

@kienstra

kienstra Feb 8, 2018

Collaborator

This commit changes the name of the classes.

*
* @since 0.7
*/
class AMP_Mutation_Utils {

@ThierryA

ThierryA Feb 8, 2018

Collaborator

From an architectural perspective, it is interesting that all methods are public static. Are they really all meant to be used that way?

@kienstra

kienstra Feb 8, 2018

Collaborator

It would be much better to not have all of these methods static. This might mean bootstrapping the class somewhere else.

It's currently calling AMP_Validation_Utils::init(); here in amp.php. It might help if we could instantiate the class somewhere, like:

$validation_utils = new AMP_Validation_Utils();
$validation_utils->init();

@kienstra

kienstra Feb 8, 2018

Collaborator

amp_post_meta_box() might be a good model for this.

@ThierryA

ThierryA Feb 8, 2018

Collaborator

Sure, this is not really a pressing need or a blocker though, it doesn't have to be addressed in this PR.

kienstra added some commits Feb 8, 2018

Issue #843: Change @const to @var for constants.
As Thierry mentioned,
this is required for a valid PHPDoc.
Issue #843: Rename class to 'AMP_Validation_Utils'
This was previously 'AMP_Mutation_Utils'
The new name describes better what this does.
Issue #843: Use constants instead of string literals.
On Thierry's suggestion,
As these were already stored in constants.
Issue #843: Add nonce verification for the editor message.
Use check_admin_referer(),
as this will display the 'are you sure' message.
Also, update the test.
Issue #843: Align comments in addition to variable names.
In PHPDoc blocks, most of the comments weren't aligned.
The types aren't aligned.
@kienstra

Collaborator

commented Feb 8, 2018

Applied Code Review Suggestions

Hi @ThierryA,
Thanks for waiting for this. All of your suggestions above are applied, except for the (good) question about whether all of the methods should be static.

westonruter added some commits Feb 8, 2018

Only report mutations when node/attribute is removed due to invalidity
* Skip reporting iframe removal when merely being moved
* Skip reporting removal of form[action] attribute when transformed to action-xhr.
* Rename sanitizer base methods to make explicit they are for removal of invalid nodes.
*
* @var array.
*/
public static $removed_nodes;

@westonruter

westonruter Feb 9, 2018

Author Member

Let's rename this to removed_elements since that is what it contains.

*/
public static function init() {
add_action( 'rest_api_init', array( __CLASS__, 'amp_rest_validation' ) );
add_action( 'save_post', array( __CLASS__, 'validate_content' ), 10, 2 );

@westonruter

westonruter Feb 9, 2018

Author Member

Instead of doing this in the save_post action, let's do this right inside the edit_form_top so that when the post is loaded they will see the notification. In this way they will be able to activate the AMP plugin and then be informed before they even start editing a post that it has invalid elements.

This would then eliminate the need for the nonce in the request.

*/
public static function validate_content( $post_id, $post ) {
unset( $post_id );
if ( ! self::is_authorized() ) {

@westonruter

westonruter Feb 9, 2018

Author Member

This also needs to check if post_supports_amp( $post ). Because if not, the user wouldn't want a warning.

@@ -752,7 +752,9 @@ public static function start_output_buffering() {
* @see AMP_Theme_Support::start_output_buffering()
*/
public static function finish_output_buffering() {
- echo self::prepare_response( ob_get_clean() ); // WPCS: xss ok.
+ $output = self::prepare_response( ob_get_clean() );
+ AMP_Validation_Utils::add_header();

@westonruter

westonruter Feb 9, 2018

Author Member

I'm going to remove this until we have it being utilized in the plugin.

@kienstra

kienstra Feb 9, 2018

Collaborator

Sure, that's fine.

Feedback addressed

@kienstra
Collaborator

left a comment

Approved

Hi @westonruter,
This pull request is approved. Thanks a lot for improving this so much. The notices of which tags and attributes were removed really help.

@westonruter westonruter merged commit 7847f0e into develop Feb 9, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed

@westonruter westonruter deleted the add/sanitization-reporting branch Feb 9, 2018

@westonruter westonruter referenced this pull request Feb 10, 2018

Merged

Update sanitization reporting #951

2 of 2 tasks complete