Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test AMP compatibility of entire site #1183

Merged
merged 91 commits into from Aug 29, 2018

Conversation

Projects
None yet
4 participants
@kienstra
Copy link
Collaborator

commented May 30, 2018

Pull Request For #1007

This crawls the site, and test for AMP validity by template/content type.

  • Validate the homepage, as the existing query won't include it if it's not a static page.
  • Should this also validate the URLs of categories and tags?
  • Consider whether reusing AMP_Validation_Manager::validate_queued_posts_on_frontend() is the best way way to validate these URLs. It will probably need to be changed to batch the requests to the front-end, and I'm not sure if that will interfere with the existing functionality.
  • Consider whether this really needs its own class. At the moment, it's probably not big enough to need it.

Fixes #1007.

Begin testing of site urls, starting with helper method.
Define get_post_permalinks().
This gets permalinks for public posts,
other than attachments.
Another helper function can call
AMP_Validation_Manager::validate_url()
for each of these permalinks.

@kienstra kienstra changed the title [WIP] Ability to test AMP compatibility of entire site [WIP] Test AMP compatibility of entire site May 30, 2018

kienstra added some commits May 30, 2018

Change helper method to output IDs, for use with existing method.
validate_queued_posts_on_frontend() uses IDs,
which are stored in
$posts_pending_frontend_validation.
So simply return IDs from the helper function.
Change method name in test, assign $post_ids.
Before, $post_ids wasn't assigned as an array().
This addresses a failed Travis build.
Query for the latest posts, instead of by ID.
Querying for ID was simply to make testing easier.
So remove that, and fall back to the default
'orderby' => 'date'.
foreach ( $site_post_ids as $id ) {
AMP_Validation_Manager::$posts_pending_frontend_validation[ $id ] = true;
}
return AMP_Validation_Manager::validate_queued_posts_on_frontend();

This comment has been minimized.

Copy link
@westonruter

westonruter May 30, 2018

Member

As you suggested, this validate_queued_posts_on_frontend method is probably not relevant. The main thing will be to:

  1. Obtain the list of URLs on the site, including homepage, posts, pages, categories, archives, etc. A good sitemap XML generator plugin should have the logic you need for this.
  2. If in paired mode, append ?amp to each of the URLs discovered. This will be used instead of amp_get_permalink() because the endpoint will be eliminated when AMP theme support is present. See #1148. Some consideration in paired mode may be needed for whether AMP is available or not for a given post.
  3. For each AMP URL, call \AMP_Validation_Manager::validate_url() and pass the results into \AMP_Invalid_URL_Post_Type::store_validation_errors().
  4. Chunk the results to prevent timing out. This is particularly key in WP-Cron and Ajax. In WP-CLI this isn't relevant, and a progress meter should be shown.

This comment has been minimized.

Copy link
@kienstra

kienstra May 30, 2018

Author Collaborator

Thanks for your fast reply, @westonruter! These details really help.

kienstra added some commits May 30, 2018

Change helper method to get post permalinks.
This class is not going to use this method anymore:
validate_queued_posts_on_frontend().
So this will need the permalinks, not IDs.
Add a method to get links for public terms.
For example, http://example.org/?cat=2
This includes categories and tags,
and any more that are registered.
But it excludes post_format links.
* @param int $number_links The maximum amount of links to get (optional).
* @return string[] $links The term links in an array.
*/
public static function get_taxonomy_links( $number_links = 200 ) {

This comment has been minimized.

Copy link
@westonruter

westonruter May 30, 2018

Member

There should be pagination support here. The WP_Term_Query supports an $offset arg. I suggest that this method only return links for one specific taxonomy. The site crawler will probably need to make multiple calls to such a method to iterate over all of the taxonomy terms in batches. One way to do it would be to get the terms sorted by ID in ascending order. That ensures that when you paginate through the terms you won't miss terms that get added while you're processing through them.

This comment has been minimized.

Copy link
@kienstra

kienstra May 30, 2018

Author Collaborator

Thanks, good ideas. These commits apply your suggestions, as I understand them. This now has $taxonomy and $offset parameters.

'public' => true,
) );
// It doesn't seem necessary to get links for post format terms, like asides, galleries, or quotes.

This comment has been minimized.

Copy link
@westonruter

westonruter May 30, 2018

Member

Actually, I think it is necessary. There are archive links for post formats: https://codex.wordpress.org/Function_Reference/get_post_format_link

This comment has been minimized.

Copy link
@kienstra

kienstra May 30, 2018

Author Collaborator

Thanks, commit db54a removes the exemption of post_format terms.

@westonruter

This comment has been minimized.

Copy link
Member

commented May 30, 2018

I think the “Check Site” button would make sense on the “Invalid AMP Pages” screen. Clicking it should show a spinner and some progress indicator probably.

And as noted on #1184 (comment):

I've been considering that we should add a new AMP admin screen to manage settings regarding theme support, including:

  • Forcibly enable amp theme support for the current theme, even if it lacks add_theme_support('amp'). This will be particularly useful as the plugin will support all the core themes in AMP by default, and it would allow
  • Decide whether to have paired mode.
  • Preemptively opt-in to tree shaking, since it is going to be needed in the vast majority of cases anyway. Options here could be “Ask me” (block AMP without accepting), “As needed” (sometimes), and “Always”.

Having a “Check Site Compatibility” button would make a lot of sense on such a Theme Support admin screen. As part of this, we should allow for the amp_validate query parameter to force the plugin to behave as if amp theme support is present. In that way, a site could be checked for AMP compatibility before even enabling amp theme support via an opt-in toggle on that admin screen.

kienstra added some commits May 30, 2018

Also get links for post_format terms.
Before, I had unset() these from the array.
But as Weston pointed out,
post_format terms can have archive links.
Begin to apply review suggestion for get_taxonomy_links().
Add a $taxonomy parameter,
to only get the terms for a given taxonomy.
Add an $offset parameter to get_taxonomy_links().
As Weston mentioned,
this helps to get the links in batches.
Also, simplify the test.
Pass 'fields' => 'ids' to WP_Query().
Props 10up Engineering Best Practices.
The prevents the need to use
wp_list_pluck() to get the ids.
Allow paging through posts in get_post_permalinks().
A similar change to the one Weston proposed
for get_taxonomy_links().
Get permalinks by post types,
and allow paging through them.
This will allow iteration, in a way similar to
that in get_taxonomy_links().
Add a method to crawl the entire site.
Iterate over all of the terms and posts
from the existing helper methods.
@todo: consider self::BATCH_SIZE,
and whether the validation requests
should be throttled,
or somehow prevented from timing out WP.
Address Travis error by using full class name, not self::
In PHP 5.3, there was:
Fatal error: Cannot access self:: when no class scope is active.
https://travis-ci.org/Automattic/amp-wp/jobs/386045157.

This looks to be because self isn't
available inside the function in 5.3.
Another option might be to add $self to use().
But this seems simpler, if it works.
Force theme support of 'amp' on validation requests.
This will enable crawling the site in Native AMP.
This seems simpler than registering Paired Mode
support, and adding template_dir.
But it's still an open discussion as to
whether it should only be Native AMP.
If this already finds theme support,
it doesn't change it.
Add an 'amp' query arg if it's in Paired Mode.
In the validate_url() helper function,
find if it's in Paired Mode.
If so, set the $url to the amp endpoint.
Begin WP-CLI script to crawl the site.
This mainly calls the existing helper function.
But it also passes it $wp_cli_progress.
This advances the progress bar,
after each post type or taxonomy validation completes.
Display the number of validation issues in the sucess message:
Success: 196 URLs were crawled, and 196 have AMP validation issue(s).
Query for the validation error posts.
And output the number found.
Site crawling script: output 'more detail' link.
This drives to the page:
Invalid AMP Pages (URLs)

Also, output a message at the beginning:
Crawling the entire site to test for AMP validity.
This might take a while...
@kienstra

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 1, 2018

[WIP] WP-CLI Output
When you run wp eval-file bin/validate-site.php

wp-cli-output

And when it finishes:
p-cli-after

@westonruter

This comment has been minimized.

Copy link
Member

commented Jun 5, 2018

That UI for WP-CLI looks perfect 👌

} else {
echo "Please run this script with WP-CLI via: wp eval-file bin/validate-site.php\n";
exit( 1 );
}

This comment has been minimized.

Copy link
@westonruter

westonruter Jun 5, 2018

Member

Instead of adding this as a bin script it should get registered as an actual command with WP-CLI. For example, wp amp validate-site. The command needs to be included with the plugin when it is released, and the bin directory is just for development.

}
if ( $wp_cli_progress ) {
$wp_cli_progress->tick();

This comment has been minimized.

Copy link
@westonruter

westonruter Jun 5, 2018

Member

This could get de-coupled from WP-CLI by passing a $progress_callback function instead which is called, and then you could call the method like so:

\AMP_Site_Validation::validate_entire_site_urls( function() use ( $wp_cli_progress ) {
    $wp_cli_progress->tick()
} ) 
*/
function crawl_site() {
\WP_CLI::log( 'Crawling the entire site to test for AMP validity. This might take a while...' );
$count_post_types_and_taxonomies = count( get_post_types( array( 'public' => true ), 'names' ) ) + count( get_taxonomies( array( 'public' => true ) ) );

This comment has been minimized.

Copy link
@westonruter

westonruter Jun 5, 2018

Member

This should count the number of posts and terms, not just the number of types.

$offset = 0;
while ( ! empty( $permalinks ) ) {
self::validate_urls( $permalinks );

This comment has been minimized.

Copy link
@westonruter

westonruter Jun 5, 2018

Member

Per above, the tick should be done with each URL that is processed, not after each type is processed.

This comment has been minimized.

Copy link
@kienstra

kienstra Jul 12, 2018

Author Collaborator

Great idea. Commit 3d9a01 calls tick() for every URL validated.

@kienstra

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 7, 2018

Hi @westonruter,
Thanks for reviewing this. You make good points, like registering an actual WP-CLI command.

If it's alright, I'm going to leave this work for the next phase.

kienstra added some commits Jul 12, 2018

Update PR for latest changes to validator, merge develop.
3 assertions failed,
so fix these both in the tested class
and the PHPUnit class.
Register a WP-CLI command: wp amp validate-site
On Weston's suggestion.
Before, running the validation required:
wp eval-file bin/validate-site.php.
Call the WP-CLI tick() method for every URL validated.
This has a much better display of the activity.
On Weston's suggestion.
Create a new method to count all of the URLs
to be validated.
This should only be used for WP-CLI,
as it uses 'posts_per_page' => -1
@kienstra

This comment has been minimized.

Copy link
Collaborator Author

commented Aug 24, 2018

Hi @westonruter,
That's a good point, and thanks for the steps to validate in 'Classic' mode. I'll work on that today if that's alright.

kienstra added some commits Aug 24, 2018

In Classic Mode, call WP_CLI::error() if the --force flag isn't present
On Weston's suggestion, as this would not be
able to validate any URL without the --force flag.
Allow validating the site in 'Classic' mode
As Weston suggested,
if in Classic mode and --force is passed,
force theme support in the crawled URLs.
This is now done in read_theme_support(),
thought there might be a better way.
@kienstra

This comment has been minimized.

Copy link
Collaborator Author

commented Aug 24, 2018

Hi @westonruter, the 2 commits above apply your suggestions for forcing theme support when in Classic mode. I hope they capture what you had in mind.

Here's the error that appears when in 'Classic' mode and --force isn't present:

wp-cli-error

self::$force_crawl_urls = true;
}
if ( ! current_theme_supports( 'amp' ) ) {

This comment has been minimized.

Copy link
@kienstra

kienstra Aug 24, 2018

Author Collaborator

This conditional will capture the Template Mode being Classic, though maybe you had in mind a more narrow check, like:

! current_theme_supports( 'amp' ) && 'disabled' === AMP_Options_Manager::get_option( 'theme_support' ) 

westonruter added some commits Aug 28, 2018

Fix classic mode site validation by forcing native mode
* Only show admin URL to review results if theme support initially present.
* Skip showing table if no results obtained.
Unconditionally initialize validation manager w/ registered post type…
… and taxonomy

* Hide the admin menu links for invalid URLs and validation errors by default in classic mode.
* Invoking `wp amp validate-site` will cause the post type and taxonomy to be populated.
* When in classic mode, show links to invalid pages and validation errors with template mode as opposed to in admin menu.
* When viewing the admin screens for the post type and taxonomy, show the admin menu links for both.
@westonruter

This comment has been minimized.

Copy link
Member

commented Aug 29, 2018

The last piece here seemed to me to be providing a way for the user to access the validation results for their wp amp validate-site call even while AMP theme support is turned off. This is implemented in edcb23a, where the post type and taxonomy are unconditionally registered but they are only shown in the admin menu if theme support is present. Otherwise, links to the screens are shown if there are invalid URL posts that exist:

image

if ( current_theme_supports( 'amp' ) ) {
return true;
}
if ( 'edit.php' === $pagenow && ( isset( $_GET['post_type'] ) && self::POST_TYPE_SLUG === $_GET['post_type'] ) ) { // WPCS: CSRF OK.

This comment has been minimized.

Copy link
@hellofromtonya

hellofromtonya Aug 29, 2018

Contributor

Why not return the result of the conditional instead of explicitedly retuning true or false?

This comment has been minimized.

Copy link
@westonruter

westonruter Aug 29, 2018

Member

Sure. Done in a83a569.

if ( current_theme_supports( 'amp' ) ) {
return true;
}
if ( 'edit-tags.php' === $pagenow && ( isset( $_GET['taxonomy'] ) && self::TAXONOMY_SLUG === $_GET['taxonomy'] ) ) { // WPCS: CSRF OK.

This comment has been minimized.

Copy link
@hellofromtonya

hellofromtonya Aug 29, 2018

Contributor

Same here....return the result of the conditional expression instead of explicit trueor false. It’s less code and more clear.

westonruter added some commits Aug 29, 2018

WP_CLI::success(
sprintf(
/* translators: $1%d is the number of URls crawled, $2%d is the number of validation issues, $3%d is the number of unaccepted issues, $4%s is the list of validation by type, $5%s is the link for more details */
__( '%3$d crawled URLs have unaccepted issue(s) out of %2$d total which AMP validation issue(s); %1$d URLs were crawled.', 'amp' ),

This comment has been minimized.

Copy link
@kienstra

kienstra Aug 29, 2018

Author Collaborator

Maybe

total which AMP validation issue(s)

should be:

total with AMP validation issue(s)

@kienstra

This comment has been minimized.

Copy link
Collaborator Author

commented Aug 29, 2018

Commands Work As Expected

Hi @westonruter,
The WP-CLI commands here work as expected. Including:

wp amp validate-site

wp amp validate-site --include=is_author

wp amp validate-site --force (when in Classic mode)

wp amp validate-site --limit=2

limit-2

Also, great idea to add links for the invalid URLs and errors in Classic mode:

classic-validation

@westonruter westonruter merged commit c539a18 into develop Aug 29, 2018

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@westonruter westonruter removed this from In progress in v1.0 Aug 29, 2018

@westonruter westonruter deleted the valid-site branch Aug 29, 2018

@westonruter westonruter added this to the v1.0 milestone Aug 29, 2018

@postphotos

This comment has been minimized.

Copy link
Contributor

commented Aug 29, 2018

Hi @westonruter - really like the progress here and thanks for tagging this in #1359.

Also wanted to call out #1273 here - the idea I spelled out there could try to surface errors next to where toggles exist (so that a user is more likely to select modes when they might be afraid to do this). I'm not entirely sure this is necessary or how straightforward implementing it might be, but it's captured in the backlog regardless.

Both that feature (and #1273) are trying to take an approach to accomplish the inverse goals of #1366: Instead of surfacing settings info in the error listings, surface the error listings in the settings.

The key decisions I still think are up for discussion (past v1.0-stable) will reside around how much information a user should see and in what context that's necessary, but based on the work presented in #1006 I'm happy with the current efforts. Thanks for your help here!

cc: @jwold

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.