New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1103/add Amp_Embed_Sanitizer #1128

Merged
merged 12 commits into from Jun 20, 2018

Conversation

Projects
None yet
2 participants
@juanchaur1
Contributor

juanchaur1 commented May 7, 2018

Fixes issue: #1103

@westonruter

This comment has been minimized.

Member

westonruter commented May 7, 2018

This looks really good. One question I have is whether we should really have an Instagram-specific sanitizer or if we should have a more generic “Embed” sanitizer as I mentioned in #1124 (comment) which is a similar issue for Facebook embeds. Instead of repeating the same class structure over and over again, we could have a single AMP_Embed_Sanitizer which is has functions for handling each non-oEmbed embed that needs to be converted.

Alternatively, perhaps the existing embed classes could be decorated with additional methods for handling conversion of non-oEmbed embeds. In that way, the AMP_Embed_Sanitizer could be passed a list of the AMP_Base_Embed_Handler instances and then for each one it could invoke some sanitize_non_oembed() method (needs better naming). This would allow for all of the Instagram code to be kept in one single class rather than separating it into multiple classes.

Also, if you re-base your commits onto the 0.7 branch then this could be considered for inclusion in 0.7.1.

@juanchaur1

This comment has been minimized.

Contributor

juanchaur1 commented May 7, 2018

It would definitely make sense to have a more generic approach or a decorator as you mentioned. I will take a look as soon as I can and make a proposal. Thanks @westonruter !

@juanchaur1

This comment has been minimized.

Contributor

juanchaur1 commented May 8, 2018

@westonruter, following your comments here it's a proposal:

Create the AMP_Embed_Sanitizer you mentioned that simply will do an apply_filters( 'sanitize_raw_embed', $this->dom );

Then in each embed we want to sanitised we register the filter:
e.g: AMP_Instagram_Embed_Handler

public function register_embed() {
   	...		
   	add_filter( 'sanitize_raw_embed', array( $this, 'sanitize_blockquote' ) );
   }
}

private function sanitize_blockquote( $dom ) {
 // rest of the logic
}

What are your thoughts on this?

@westonruter

This comment has been minimized.

Member

westonruter commented May 9, 2018

@juanchaur1 I you look at the existing sanitizers right now you can see that none of them interact with the WordPress plugin API. That being the case, I think it would be better if the list of embed handlers were passed as an argument to the AMP_Embed_Sanitizer class. And then for each embed handler, check of a sanitize_raw_embeds method exists, and if so, invoke it. The list of embed handlers could be passed to the embed sanitizer via something like this:

diff --git a/includes/class-amp-theme-support.php b/includes/class-amp-theme-support.php
index 67caee9..ac56f54 100644
--- a/includes/class-amp-theme-support.php
+++ b/includes/class-amp-theme-support.php
@@ -145,8 +145,10 @@ class AMP_Theme_Support {
 		}
 
 		self::add_hooks();
-		self::$sanitizer_classes = amp_get_content_sanitizers();
 		self::$embed_handlers    = self::register_content_embed_handlers();
+		self::$sanitizer_classes = amp_get_content_sanitizers();
+
+		self::$sanitizer_classes['AMP_Embed_Sanitizer']['embed_handlers'] = self::$embed_handlers;
 	}
 
 	/**
diff --git a/includes/templates/class-amp-post-template.php b/includes/templates/class-amp-post-template.php
index ff5980f..93d1dc1 100644
--- a/includes/templates/class-amp-post-template.php
+++ b/includes/templates/class-amp-post-template.php
@@ -315,9 +315,13 @@ class AMP_Post_Template {
 	 * Build post content.
 	 */
 	private function build_post_content() {
+		$embed_handlers = amp_get_content_embed_handlers( $this->post );
+		$sanitizers = amp_get_content_sanitizers( $this->post );
+		$sanitizers['AMP_Embed_Sanitizer']['embed_handlers'] = $embed_handlers;
+
 		$amp_content = new AMP_Content(
 			$this->post->post_content,
-			amp_get_content_embed_handlers( $this->post ),
+			$embed_handlers,
 			amp_get_content_sanitizers( $this->post ),
 			array(
 				'content_max_width' => $this->get( 'content_max_width' ),
@juanchaur1

This comment has been minimized.

Contributor

juanchaur1 commented May 10, 2018

I have added the change to facebook embed to support the issue #1124 (comment)

@westonruter, I'm passing the embed_handlers to the Amp_Embed_Sanitizer in AMP_Content as it's in there where we have the actual instance of embed_handlers.

@juanchaur1 juanchaur1 changed the title from 1103/fix embed instagram sanitizer to 1103/add Amp_Embed_Sanitizer May 10, 2018

@westonruter

This comment has been minimized.

Member

westonruter commented May 21, 2018

Sorry for the delay in reviewing this. It's still on my radar.

juanchaur1 added some commits May 30, 2018

@juanchaur1

This comment has been minimized.

Contributor

juanchaur1 commented May 30, 2018

I have added another scenario: blockquotes from twitter: https://dev.twitter.com/web/embedded-tweets/parameters

@juanchaur1

This comment has been minimized.

Contributor

juanchaur1 commented Jun 5, 2018

@westonruter what's the status of this review?

@westonruter

This comment has been minimized.

Member

westonruter commented Jun 5, 2018

@juanchaur1 It's still in my queue. Sorry.

*
* @var string embed HTML blockquote tag to identify and replace with AMP version.
*/
protected $sanitize_tag = 'div';

This comment has been minimized.

@westonruter

westonruter Jun 5, 2018

Member

I am curious to find out what the performance impacts are between $this->dom->getElementsByTagName( 'div' ) and then filtering out the ones that have the required attributes (is_facebook_raw_embed) .vs using XPath to obtain just the exact matching set up front, for example via:

//div[ @data-href and ( contains( @class, 'fb-post' ) or contains( @class, 'fb-video' ) ) ]

Given that div is a very generic element, I worry about the performance impacts of locating elements in large documents. But I'm not sure if XPath would be faster.

This comment has been minimized.

@juanchaur1

juanchaur1 Jun 6, 2018

Contributor

I created a quick benchmark checking both implementations:
seems like xpath isn't faster.

*
* @return boolean
*/
private function get_embed_type( $node ) {

This comment has been minimized.

@westonruter

westonruter Jun 5, 2018

Member

This method could potentially be merged with is_facebook_raw_embed, if we don't go the XPath route. The get_embed_type() method could also check to see if data-href attribute is present, and likewise check if the classes are present.

juanchaur1 and others added some commits Jun 6, 2018

Ensure embed scripts for Twitter and Instagram get removed
* Clean up phpdoc and method names.
* Account for issues with wpautop.
* Clean some phpcs issues.

@westonruter westonruter added this to the v1.0 milestone Jun 20, 2018

@westonruter westonruter merged commit cd21c2b into ampproject:develop Jun 20, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment