Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for amp-bind #895

Merged
merged 7 commits into from Jan 25, 2018

Conversation

Projects
None yet
4 participants
@westonruter
Copy link
Member

commented Jan 24, 2018

  • Rename [x] binding attribute syntax with data-amp-binding-RANDOMSTRING---x so PHP can parse it.
  • When iterating over attributes in the sanitizer, treat attribute names prefixed by data-amp-binding...--- as being bracketed instead.
  • Make sure that the whitelist sanitizer adds the amp-bind component script when it comes across an AMP bound attribute.
  • When serializing back-out, replace data-amp-binding-... with the original syntax.

Fixes #836.

@westonruter westonruter added this to the v0.7 milestone Jan 24, 2018

@westonruter westonruter force-pushed the fix/amp-bind branch from ba8c9cd to d266233 Jan 24, 2018

@westonruter westonruter changed the title [WIP] Fix amp-bind Add support for amp-bind Jan 24, 2018

@amedina amedina requested a review from pbakaus Jan 24, 2018

@pbakaus

This comment has been minimized.

Copy link
Collaborator

commented Jan 24, 2018

Nice! Generally seems reasonable, but want to see if @choumx has any additional thoughts.

@choumx

This comment has been minimized.

Copy link

commented Jan 25, 2018

ampproject/amphtml#11115 would help but haven't had the time. 😓

@westonruter

This comment has been minimized.

Copy link
Member Author

commented Jan 25, 2018

I'm glad that an XHTML-friendly option is being considered, but we'll need to support the existing bracketed syntax as well.

@ThierryA
Copy link
Collaborator

left a comment

This is very exciting ❤️ I left my CR below.

}
// Save all alternative names in lookup to improve performance.
if ( isset( $attr_spec[ AMP_Rule_Spec::ALTERNATIVE_NAMES ] ) ) {

This comment has been minimized.

Copy link
@ThierryA

ThierryA Jan 25, 2018

Collaborator

Any reason not to cache this in the conditional statement above to avoid looping through the $attr_spec[ AMP_Rule_Spec::ALTERNATIVE_NAMES ]?

This comment has been minimized.

Copy link
@westonruter

westonruter Jan 25, 2018

Author Member

I thought about that, but there are other alternative names than just the bind ones. For example, srcset is an alt for src, and xlink:href is an alternate for href. These need to be accounted for as well. So that is why there is a separate conditional.

// Prepare whitelists.
$this->allowed_tags = $this->args['amp_allowed_tags'];
foreach ( AMP_Rule_Spec::$additional_allowed_tags as $tag_name => $tag_rule_spec ) {
$this->allowed_tags[ $tag_name ][] = $tag_rule_spec;

This comment has been minimized.

Copy link
@ThierryA

ThierryA Jan 25, 2018

Collaborator

Is $this->allowed_tags[ $tag_name ] always an array, amp-share-tracking doesn't seem to be declared as an array?

This comment has been minimized.

Copy link
@westonruter

westonruter Jan 25, 2018

Author Member

This logic was just moved from the sanitize method to the constructor. Yes, it should always an array. The $additional_allowed_tags is an array of arrays:

https://github.com/Automattic/amp-wp/blob/216ec376ab5bc698e03a149e6e68fa771cffd2d0/includes/sanitizers/class-amp-rule-spec.php#L88-L95

This comment has been minimized.

Copy link
@ThierryA

ThierryA Jan 25, 2018

Collaborator

Yes, though it is still looking for $this->allowed_tags[ 'amp-share-tracking' ] which doesn't exists isn't it? To make sure the array is conventionally declared, then we should have an:

if ( ! isset( $this->allowed_tags[ $tag_name ] ) ) {
	$this->allowed_tags[ $tag_name ] = array();
}

Like you have done here.

This comment has been minimized.

Copy link
@westonruter

westonruter Jan 25, 2018

Author Member

Potentially but it's just moved from existing code and it didn't have any problem. And actually, my use of isset() is actually overkill it seems. It's actually not required in PHP at all. For example:

<?php
error_reporting(E_ALL);
$array = array();
$array['foo']['bar']['baz'] = 'Hello!';
print_r( $array );

Results in:

Array
(
    [foo] => Array
        (
            [bar] => Array
                (
                    [baz] => Hello!
                )

        )

)

There are no notices or warnings. It works all the way back to PHP 4.3.0 (and beyond potentially): https://3v4l.org/b8ZaG

/**
* Replace AMP binding attributes with something that libxml can parse (as HTML5 data-* attributes).
*
* This is necessary necessary because attributes in square brackets are not understood in PHP and

This comment has been minimized.

Copy link
@ThierryA

ThierryA Jan 25, 2018

Collaborator

Typo mistake necessary necessary.

This comment has been minimized.

Copy link
@westonruter

westonruter Jan 25, 2018

Author Member

It is very necessary 😉

@@ -56,6 +56,8 @@ public static function get_dom( $document ) {
$dom = new DOMDocument();
$document = self::convert_amp_bind_attributes( $document );

This comment has been minimized.

Copy link
@ThierryA

ThierryA Jan 25, 2018

Collaborator

I think at this stage it would be beneficial to look at extending the DOMDocument. For example here is a quick prototype which handles the amp binding conversion:

/**
 * Custom AMP DOM Document.
 */
class AMP_Dom_Document extends DOMDocument {

	/**
	 * Class constructor.
	 *
	 * @param string $version  The version number of the document as part of the XML declaration.
	 * @param string $encoding The encoding of the document as part of the XML declaration.
	 */
	public function __construct( $version = null, $encoding = null ) {
		parent::__construct( $version, $encoding );
	}

	/**
	 * Load HTML.
	 *
	 * @param string $source  The HTML code
	 * @param int    $options Additional Libxml parameters
	 * @return boolean True on success or False on failure
	 */
	public function loadHTML( $source, $options = 0 ) {
		$source = AMP_DOM_Utils::convert_amp_bind_attributes( $source );
		parent::loadHTML( $source, $options );
	}

	/**
	 * Save HTML.
	 *
	 * @param DOMNode $node Optional parameter to output a subset of the document.
	 * @return string The document (or node) HTML code as string
	 */
	public function saveHTML( DOMNode $node = null ) {
		$html = parent::saveHTML( $node );
		return AMP_DOM_Utils::restore_amp_bind_attributes( $html );
	}
}

Then new AMP_Dom_Document would be called instead of new DOMDocument.

Eventually, AMP_Dom_Document could call the entire sanitization logic making the processing workflow easier to understand and more bullet proof.

Also, instead of removing attributes causing Warning: DOMDocument::loadHTML(): error parsing attribute name and restoring them after saveHTML(), we could look at potentially using DOMDocument::registerNodeClass() in our AMP_DOM_Document::__construct to allow AMP attributes and values.

This comment has been minimized.

Copy link
@westonruter

westonruter Jan 25, 2018

Author Member

That is a good idea, but I want to wait to do that until we do the sanitization of the entire doc including the head.

This comment has been minimized.

Copy link
@westonruter

westonruter Jan 25, 2018

Author Member

Anyway, it is a good enhancement but it's not critical for this first merge.

This comment has been minimized.

Copy link
@ThierryA

ThierryA Jan 25, 2018

Collaborator

Yes that is perfectly fine, good to have it as a @todo.

*/
private function process_alternate_names( $attr_spec_list ) {
foreach ( $attr_spec_list as $attr_name => &$attr_spec ) {
if ( '[' === $attr_name[0] ) {

This comment has been minimized.

Copy link
@ThierryA

ThierryA Jan 25, 2018

Collaborator

Is $this->args['amp_bind_placeholder_prefix'] prefix used for spec attributes like [src]?

This comment has been minimized.

Copy link
@westonruter

westonruter Jan 25, 2018

Author Member

Yes, since such attributes aren't understood by DOMDocument we add an safe placeholder version of that same attribute (e.g. amp-binding-1234…-src) and add it to the alternative names so it can be matched instead when the nodes are processed.

This comment has been minimized.

Copy link
@ThierryA

ThierryA Jan 25, 2018

Collaborator

Perhaps we could use another term than binding in the future if some of the attributes are not specific to the binding component.

* @param array $attr_spec_list Attribute spec list.
* @return array Modified attribute spec list.
*/
private function process_alternate_names( $attr_spec_list ) {

This comment has been minimized.

Copy link
@ThierryA

ThierryA Jan 25, 2018

Collaborator

This function is called for all tags and then for all attributes of each tag which is quite resource heavy. My understanding is that it is used to add the $this->args['amp_bind_placeholder_prefix'] to the AMP_Rule_Spec::ATTR_SPEC_LIST which is later on used for the allowed attrs and flagging scripts.

To avoid having to do all these loops, we could perhaps do the conditional check at the is_amp_allowed_attribute() and process_node() level since we have access to the single node attributes?

This comment has been minimized.

Copy link
@westonruter

westonruter Jan 25, 2018

Author Member

Actually it should just be called once for each tag. It loops over all attributes itself.

By doing it once at the constructor we avoid double-nested foreach loops inside the sanitize_disallowed_attributes_in_node method below. Since the double foreach O(n^2) was being done at the node level, it could be extremely poor for performance. So now is_amp_allowed_attribute is able to use the rev_alternate_attr_name_lookup to do fast lookups.

Code review feedback applied.

@westonruter westonruter merged commit 4744f2d into develop Jan 25, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@westonruter westonruter deleted the fix/amp-bind branch Jan 25, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.