Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove scripts for components that were not detected in output buffer #3705

Merged
merged 11 commits into from Nov 15, 2019

Conversation

schlessera
Copy link
Collaborator

@schlessera schlessera commented Nov 10, 2019

Summary

If there are AMP scripts present on the page which are not present in $script_handles then they are removed, as otherwise there can be an AMP validation error (as they are unused).

Fixes #3489

Checklist

  • My pull request is addressing an open issue (please create one otherwise).
  • My code is tested and passes existing tests.
  • My code follows the Engineering Guidelines (updates are often made to the guidelines, check it out periodically).

@googlebot googlebot added the cla: yes Signed the Google CLA label Nov 10, 2019
@schlessera schlessera added Sanitizers Enhancement New feature or improvement of an existing one labels Nov 10, 2019
@schlessera
Copy link
Collaborator Author

schlessera commented Nov 10, 2019

@westonruter I had to change a few existing tests to get them to pass, as this now removes any AMP component script that is not detected in the output buffer.

This might have side effects:

  • If we do not check for a component that a script is associated with.
  • If a script is not directly tied to a component (I added special handling for amp-dynamic-css-class for now, are there others?).
  • If a component is outside of the markup we check. This might happen for reader mode, maybe?

Copy link
Contributor

@pierlon pierlon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes work as expected, nice work @schlessera 👌

@westonruter
Copy link
Member

  • If a script is not directly tied to a component (I added special handling for amp-dynamic-css-class for now, are there others?).

That's a great question. I'll inquire.

  • If a component is outside of the markup we check. This might happen for reader mode, maybe?

Yes, the featured image for example gets sanitized separately from the content. However, the scripts gleaned from it will get merged with the scripts gleaned from the_content processing, so I think it OK.

@westonruter
Copy link
Member

  • If we do check for a component that a script is associated with.

@schlessera Can you elaborate on this? I'm not sure I understand.

@westonruter
Copy link
Member

I've inquired in #validator-discuss on AMP Slack:

We're looking to improve the AMP plugin to remove AMP component scripts on the page if the AMP components aren't used on the page. For example, if the amp-facebook-like script is on the page, but there is no <amp-facebook-like> on the page, there is a validation error. I have two questions:

  1. Why does this apply only to some component scripts but not others? For example I can include amp-form script on the page or amp-video on the page without there being any <amp-form> or <amp-video> just fine. But for amp-facebook-like it is different.
  2. Is there an enumeration of the component scripts that do not have associated components (or attributes) expected to be in the document? One example we're aware of is amp-dynamic-css-classes. What would the others be? Perhaps we should rather be using an enumerated list of the component scripts that specifically cause validation errors if present in the document without the components/attributes present. How can that list be derived?

@westonruter
Copy link
Member

FYI: If you rebase against develop the build should pass.

includes/class-amp-theme-support.php Outdated Show resolved Hide resolved
Comment on lines 1762 to 1767
$custom_element = $amp_scripts[ $superfluous_script_handle ]->getAttribute( 'custom-element' );

// Skip dynamic CSS classes script, it has no associated element.
if ( 'amp-dynamic-css-classes' === $custom_element ) {
continue;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this single check here, and depending on what the AMP validator team comes back with regarding any list of components, perhaps [ 'amp-dynamic-css-classes' ] should be merged with $script_handles when it is passed into array_diff(). Then there would be no need for $custom_element, as the script handles are the component names.

@@ -1745,6 +1745,30 @@ public static function ensure_required_markup( DOMDocument $dom, $script_handles
$amp_scripts[ $missing_script_handle ] = AMP_DOM_Utils::create_node( $dom, 'script', $attrs );
}

// Remove scripts that had already been added but couldn't be detected from output buffering.
foreach ( array_diff( array_keys( $amp_scripts ), $script_handles ) as $superfluous_script_handle ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array_diff( array_keys( $amp_scripts ), $script_handles ) may need to be array_intersect()'ed with a list of components that are actually prone to cause validation errors. Scripts like amp-video and amp-form don't cause any issue when being included. Perhaps this is due to some having built-in status. We need to find out.

In that case, there would be no need for the amp-dynamic-css-classes check below either.

@schlessera
Copy link
Collaborator Author

schlessera commented Nov 11, 2019

  • If we do check for a component that a script is associated with.

@schlessera Can you elaborate on this? I'm not sure I understand.

Sorry, I updated the comment, it was missing a word:

If we do not check for a component that a script is associated with.

What I mean by that is that the scripts are only added to our list of detected scripts if we actually do check for the component that is associated for that script. If a component is part of the markup that we don't check for (new component, maybe?), then we will also not detect the need for that script, and will consequently strip it if it was already included.

@westonruter
Copy link
Member

westonruter commented Nov 11, 2019

Replies courtesy of @honeybadgerdontcare.


Why does this apply only to some component scripts but not others? For example I can include amp-form script on the page or amp-video on the page without there being any <amp-form> or <amp-video> just fine. But for amp-facebook-like it is different.

The feature to enforce this was released after AMP was released and to enforce it on all previous scripts at that time would have broken many AMP documents. Instead those previous scripts were grandfathered in. More can be read at https://github.com/ampproject/amphtml/blob/master/validator/validator.proto#L430

Is there an enumeration of the component scripts that do not have associated components (or attributes) expected to be in the document? One example we're aware of is amp-dynamic-css-classes. What would the others be? Perhaps we should rather be using an enumerated list of the component scripts that specifically cause validation errors if present in the document without the components/attributes present. How can that list be derived?

These would be declared as requires_usage: NONE.

The list could be derived from parsing the protoascii and looking at the requires_usage portion of an extension_spec. https://github.com/ampproject/amphtml/blob/master/validator/validator.proto#L454

PR ampproject/amphtml#22528 introduced a JSON version of the validator spec that might be useful here.

@westonruter
Copy link
Member

PR #22528 introduced a JSON version of the validator spec that might be useful here. ampproject/amphtml#22528

For this see #2769.

These would be declared as requires_usage: NONE.
The list could be derived from parsing the protoascii and looking at the requires_usage portion of an extension_spec. https://github.com/ampproject/amphtml/blob/master/validator/validator.proto#L454

This is great. What we can do in the immediate term is update the spec parser Python script to extract the requires_usage information alongside where we extract the extension_spec info. Then this information would be available for the post-processor to query for the extensions that require a tag to be present and vice-versa.

…ve-duplicate-amp-scripts

* 'develop' of github.com:ampproject/amp-wp: (66 commits)
  Improve display of validation errors for scripts (#3722)
  Conditionally run E2E tests (#3723)
  Tidy up validation error details (#3721)
  Add missing space after sentence (#3720)
  Default to the homepage instead of fetching the first AMP compatible post to customize (#3715)
  Include text content of style element in validation error (#3717)
  Fix summarizing error sources both parent theme and child theme (#3709)
  Exclude WordPress.PHP.DisallowShortTernary phpcs sniff
  Fix phpcs issues with date() and current_time()
  Exclude Generic.Arrays.DisallowShortArraySyntax from WordPress-Core
  Update dependency wp-coding-standards/wpcs to v2.2.0
  Improve specificity of JS doc
  Fix identifying sources for validation errors coming child themes (#3708)
  Fix failing E2E tests (#3707)
  Remove amp_validate query var from Validated URL 'View' row action
  Re-factor get_html_attribute_pattern as match_element_attributes
  Quote variables added to regex pattern
  Replace incorrect usage of esc_url() with esc_url_raw()
  Remove empty alt attributes
  Add object-fit=contain to amp-youtube placeholder image
  ...
$src,
[ 'amp-runtime' ],
null
);

$wp_scripts->add_data( $extension_spec['name'], 'amp_requires_usage', ! empty( $extension_spec['requires_usage'] ) );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super happy with what I did here. The post-processor shouldn't be looking at the wp_scripts() object to determine later if this requires usage. It would be better if the extension_specs were obtained again during post-processing and the required_usage was obtained at that point.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about even moving the entire registration to a point where we know what is needed and what is not? That would avoid registering scripts we don't need, and you could have all checks against the spec in one place...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These scripts are needing to be registered in WP_Scripts to ensure that they will be available to enqueue for AMP components used outside of AMP pages. For example, Newspack is using amp-carousel in its Carousel block and the block will wp_enqueue_script( 'amp-carousel' ) which will ensure it works in AMP pages and non-AMP pages alike. (Granted, this is not guaranteed to work prior to Bento AMP, but still, it works most of the time and is better than a blank element.)

$versions = $script_spec[ AMP_Rule_Spec::TAG_SPEC ]['extension_spec']['version'];
array_pop( $versions );
$extensions[ $script_spec[ AMP_Rule_Spec::TAG_SPEC ]['extension_spec']['name'] ] = array_pop( $versions );
if ( ! isset( $script_spec[ AMP_Rule_Spec::TAG_SPEC ]['extension_spec'] ) ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Python parser should probably be smarter about extracting all of the extension specs up front to then provide them in a single call, like AMP_Allowed_Tags_Generated::get_extension_specs().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should at one point rethink the way the spec is parsed. I've created an issue for that: #3730

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we plan out improving how the spec is parsed, I've implemented a AMP_Allowed_Tags_Generated::get_extension_specs() helper to improve things in the meantime. See 9956c90.

…ve-duplicate-amp-scripts

* 'develop' of github.com:ampproject/amp-wp:
  Manually add copies of AMP scripts rather than using wp_print_scripts()
  Add duplicate scripts to head as well
  Add test for removing duplicate scripts
  Bump stable tag to 1.4.1
  Update screenshots for 1.4.1
  Fix expected image name after upstream change (#3749)
  Use length property instead of count() method on DOMNodeList (#3727)
Copy link
Collaborator Author

@schlessera schlessera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, cannot request changes as the author of the PR... (silly limitation).

Comment on lines 221 to 229
public static function get_extension_specs() {
$extension_specs = [];
foreach ( self::get_allowed_tag( 'script' ) as $script_spec ) {
if ( isset( $script_spec[ AMP_Rule_Spec::TAG_SPEC ]['extension_spec'] ) ) {
$extension_specs[ $script_spec[ AMP_Rule_Spec::TAG_SPEC ]['extension_spec']['name'] ] = $script_spec[ AMP_Rule_Spec::TAG_SPEC ]['extension_spec'];
}
}
return $extension_specs;
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to process this every time it is requested (which is at least twice per request).

public static function get_extension_specs() {
	static $extension_specs = [];

	if ( ! empty( $extension_specs ) ) {
		return $extension_specs;
	}

	foreach ( self::get_allowed_tag( 'script' ) as $script_spec ) {
		if ( isset( $script_spec[ AMP_Rule_Spec::TAG_SPEC ]['extension_spec'] ) ) {
			$extension_specs[ $script_spec[ AMP_Rule_Spec::TAG_SPEC ]['extension_spec']['name'] ] = $script_spec[ AMP_Rule_Spec::TAG_SPEC ]['extension_spec'];
		}
	}

	return $extension_specs;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Done in 16f35b4.

@westonruter westonruter added this to the v1.5 milestone Nov 15, 2019
@westonruter westonruter merged commit 451dd22 into develop Nov 15, 2019
@westonruter westonruter deleted the 3489-remove-duplicate-amp-scripts branch November 15, 2019 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Signed the Google CLA Enhancement New feature or improvement of an existing one Sanitizers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove duplicate and unused AMP component scripts from the response
4 participants