Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for comments #871

Closed
wants to merge 39 commits into from

Conversation

DavidCramer
Copy link
Contributor

@DavidCramer DavidCramer commented Jan 17, 2018

See #797, #862.


// Set a target if needed.
if ( ! $node->hasAttribute( 'target' ) ) {
$node->setAttribute( 'target', '_blank' );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the spec: https://www.ampproject.org/docs/reference/components/amp-form#target

Indicates where to display the form response after submitting the form. The value must be _blank or _top.

The default I think should be _top as that is closer to the default behavior for a form. The actual default is _self but that's not valid AMP. So I think this should be doing:

if ( ! in_array( $node->hasAttribute( 'target' ), array( '_top', '_blank' ) ) ) {
    $node->setAttribute( 'target', '_top' );

// Correct action.
if ( $node->hasAttribute( 'action' ) && ! $node->hasAttribute( 'action-xhr' ) ) {
$action_url = $node->getAttribute( 'action' );
$action_url = str_replace( 'http:', '', $action_url );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible that http: appears somewhere else in the URL. So then this should instead use a regex like: preg_replace( '#^http:(?=//)#', '', $action_url ) to ensure it only replaces at the beginning of the string.

$action_url = str_replace( 'http:', '', $action_url );
$node->setAttribute( 'action', $action_url );

if ( 'post' === $node->getAttribute( 'method' ) ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per above, this could compare with $method.

continue;
}

if ( ! $node->hasAttribute( 'action' ) || '' === $node->getAttribute( 'action' ) ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the spec: https://www.ampproject.org/docs/reference/components/amp-form#target

This attribute is required for method=GET. For method=POST, the action attribute is invalid, use action-xhr instead.

Is the second part of this conditional intended to check if 'get' === $node->getAttribute( 'method' )? Since the default method is get then I think there should be something above that does:

$method = 'get';
if ( $node->hasAttribute( 'method' ) ) {
    $method = strtolower( $node->getAttribute( 'method' ) );
}

And then use the $method variable going forward.

}

if ( ! $node->hasAttribute( 'action' ) || '' === $node->getAttribute( 'action' ) ) {
$node->parentNode->removeChild( $node );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is right. If method is get and there is no action, then what this really should do I think is just set the action to be esc_url_raw( 'https://' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'] ), because the default action for a form is the URL that the form appears on.

@@ -35,7 +33,7 @@ public static function get_dom_from_content( $content ) {
* Add utf-8 charset so loadHTML does not have problems parsing it.
* See: http://php.net/manual/en/domdocument.loadhtml.php#78243
*/
$result = $dom->loadHTML( '<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body>' . $content . '</body></html>' );
$result = $dom->loadHTML( $document );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes here are conflicting in develop.

@westonruter
Copy link
Member

westonruter commented Jan 17, 2018

For handling form method=POST submissions I think we can pretty cleanly handle them via https://www.ampproject.org/docs/reference/components/amp-form#redirecting-after-a-submission

Usually a POST request should be accompanied by a redirect. So we should be able to handle that via:

add_filter( 'wp_redirect', function( $location ) {
    $location = wp_sanitize_redirect( $location );
    header( "AMP-Redirect-To: $location" );
    header( "Access-Control-Expose-Headers: AMP-Redirect-To", false );
    return false; // Prevent header( 'Location:' ). I'm not sure this is necessary, or if we can just return $location.
}, 1000 );

Some more examples in general I've found at https://ampbyexample.com/components/amp-form/

// @todo Add more validation checking and potentially the whitelist sanitizer.
$output = $dom->saveHTML();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DavidCramer I chatted about this with @ThierryA, but I think we should avoid having to load the entire response into the DOM for sanitization. I think that we should limit sanitization to just the post content, 3rd-party widgets, and other “leaf nodes” which we know are liable to be invalid AMP.

Is this right? Or am I off track? My worry is that loading the HTML into a DOMDocument for every request will be too heavy. But at the same time, if we're already spinning up a DOMDocument for every instance of the_content() maybe it is actually better to just do one for the entire HTML. This would certainly simplify somethings for sanitizing widgets (cc @kienstra). It could also mean that we wouldn't have to have this ad hoc logic to search for elements requiring components: https://github.com/Automattic/amp-wp/blob/e60cb152a50e2ffbf5504d41aee859ad1d0e0baa/includes/class-amp-theme-support.php#L448-L455

Maybe we should decide to not prematurely optimize and instead keep things simple at first and just go ahead and sanitize the entire response. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See findings at #875 (comment)

It seems we should definitely not sanitize the entire output buffer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter whats the consensus now on this? It looks like your findings indicated it may be a good idea to sanitize the whole buffer output. I personally think this is the most efficient. I spoke with @ThierryA about this yesterday and he was going to do some tests, like you did.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning toward sanitizing the entire output buffer now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DavidCramer do you want to build off of #875 (comment) in a new PR to serve as a basis for what you're introducing here with form sanitization?

@westonruter
Copy link
Member

@DavidCramer FYI: There won't need to be a get_scripts() method defined in the form sanitizer once #882 is merged.

@DavidCramer DavidCramer changed the title [WIP] - Add form sanitization for comment submissions [WIP] - Add support for comments Jan 19, 2018
@ThierryA
Copy link
Collaborator

@DavidCramer #882 was just merged to develop so you can apply @westonruter suggestion here.

$url = substr( $url, 5 );
}
// @todo Identify arguments and make filterable/settable.
$template .= '<amp-list src="' . esc_attr( $url ) . '" height="400" single-item="true" layout="fixed-height">';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about calling the actual comments walker here too and provide the content as placeholder/fallback content? Maybe that is overkill.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not a bad idea. However, it would increase page weight. But setting some kind of fallback is great idea.


// Get the action URL.
if ( ! $node->hasAttribute( 'action' ) ) {
$action_url = esc_url_raw( '//' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'] ); // WPCS: ignore. input var okay, sanitization ok.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter I don't normally like using $_SERVER if I can avoid it, but this is from your suggestion here: #871 (comment)
I feel this should be a little cleaner, but can't think how. any suggestions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is right what you have here, except that there should a beautiful wp_unslash() call added to it, since $_SERVER vars are auto-slashed for back-compat reasons:

esc_url_raw( '//' . wp_unslash( $_SERVER['HTTP_HOST'] ) . wp_unslash( $_SERVER['REQUEST_URI'] ) );

The one for HTTP_HOST is somewhat overkill.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed on the overkill. but will unslash the request uri.

}
return 'element';
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter component scripts are have custom-element in the script tags, however amp-mustache used for amp-list does not, instead it has custom-template. see https://www.ampproject.org/docs/reference/components/amp-mustache
It looks like this is the only component that's different, however I separated it to its own method in case the spec changes, or additionals are added in future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think get_component_type would be needed if we sanitize the entire body, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter I think we still do, since this is done after the sanitisation and the scripts are placed in. All the scripts are custom-element but the template is custom-template.

@DavidCramer
Copy link
Contributor Author

@westonruter Regarding moving the sanitisation from the_content to the output buffer: I needed this in order to have the comments to work, which I have done in my dev, but have not added to this PR. Is that still under discussion or could I do a PR on it based on what I've already got?

@westonruter
Copy link
Member

@DavidCramer we'll go ahead and sanitize the entire body, or actually, the entire html document. I was also going to work on a PR, so if you can get one up first that is good as well.

$GLOBALS['comment'] = $comment; // WPCS: override ok.
$comment->comment_date = get_comment_date();
$comment->comment_time = get_comment_time();
$comment->comment = amp_get_comments_recursive( $id, $comment->comment_ID );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this will have performance problems, as it could result in a lot of DB queries. If you look at wp_list_comments() I believe you can see it gets a flat list of all comments for the post, and then it passes it into Walker::paged_walk() which then goes about identifying which comments are children of other comments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is, AMP_Comment_Walker:: paged_walk() here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter Agreed. Can revise it to be more efficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter how about this? b7c5963

@DavidCramer DavidCramer changed the title [WIP] - Add support for comments Add support for comments Jan 24, 2018
@DavidCramer
Copy link
Contributor Author

@westonruter You can start reviewing this, but there is an issue with the url template strings I need to fix. I'm needing to head off now, but will finish when I'm back. If not by the time you get on, I'll sort it in the morning.

@westonruter
Copy link
Member

there is an issue with the url template strings I need to fix

Shoot. This is a side effect of switching from saveXML to saveHTML in #891.

Similar to what was done in #895 with AMP_DOM_Utils::convert_amp_bind_attributes(), we need to do a temporary replacement of amp-mustache tokens during parsing and serialization, then swap them back after. I'm working on the change.

@westonruter
Copy link
Member

We talked it over and we decided to try the route of amp-live-list instead and see how it goes. 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants