Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to prevent storing parsed CSS in transients #4177

Merged
merged 33 commits into from
Mar 23, 2020

Conversation

westonruter
Copy link
Member

@westonruter westonruter commented Jan 25, 2020

Note: This PR's true author is @schlessera!

Summary

Fixes #2092

  • Adds a new amp_parsed_css_transient_caching_allowed filter which allows a site to prevent storing parsed CSS in transients, by filtering the value to be false. This is important on sites which have highly variable CSS, resulting in a site filling up its wp_options table with transients.

    Note that when transient storage is disabled, the parsed CSS is still cached in the object cache even when an external object cache is not available. This ensures that the same CSS appearing multiple times on a page will not have to be re-parsed.

  • Adds detection for a huge number of SELECT COUNT(*) FROM $wpdb->options WHERE option_name LIKE '_transient_amp-parsed-stylesheet%' and when a threshold is met, persists the fact that transient caching of stylesheets is disabled. This happens during a daily cron.

  • Adds Site Health warning when transient caching is disabled (and external object cache is not present).

  • Adds Site Health debugging information.

  • Adds a warning to the Network Plugins screen on every large sites, to inform the administrator that orphaned scheduled events will be left behind.

Site Health integration

Transient caching of stylesheets enabled

Screen Shot 2020-03-21 at 16 21 34

Transient caching of stylesheets disabled

Screen Shot 2020-03-21 at 16 22 54

Transient caching of stylesheets not applicable

Screen Shot 2020-03-21 at 16 22 11

Debug information

Image 2020-03-20 at 8 45 02 AM

Network plugins screen warning

image

See #2092.

Checklist

  • My pull request is addressing an open issue (please create one otherwise).
  • My code is tested and passes existing tests.
  • My code follows the Engineering Guidelines (updates are often made to the guidelines, check it out periodically).

@schlessera schlessera force-pushed the add/disabling-parsed-css-transient-caching branch from 58c3ab7 to ece53ed Compare March 19, 2020 06:00
@schlessera
Copy link
Collaborator

schlessera commented Mar 19, 2020

⚠️ Force-pushing a rebase.

@googlebot
Copy link

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: no Has not signed the Google CLA and removed cla: yes Signed the Google CLA labels Mar 19, 2020
@schlessera
Copy link
Collaborator

@googlebot I consent.

@googlebot
Copy link

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: yes Signed the Google CLA and removed cla: no Has not signed the Google CLA labels Mar 19, 2020
@schlessera
Copy link
Collaborator

@westonruter To follow up with the discussion we just had in the sync call, we should talk about how best to handle this for now: https://github.com/ampproject/amp-wp/pull/4177/files#diff-4805c837d3390517773926fb26726bcdR306-R323

@schlessera
Copy link
Collaborator

Also, I'm open for any suggestions if you know a better way of testing the background task.

amp.php Outdated
Comment on lines 306 to 323
// @todo Quick fix for now, no centralized flow yet to register services.
foreach ( [
'AmpProject\AmpWP\BackgroundTask\MonitorCssTransientCaching',
] as $background_task ) {
try {
( new $background_task() )->register();
} catch ( Exception $exception ) {
// @todo No logger yet, dump to error log?
$exception_class = get_class( $exception );
$error_message = "Exception '{$exception_class}' thrown in background task '{$background_task}': {$exception->getMessage()}";
if ( class_exists( 'WP_CLI' ) ) {
WP_CLI::warning( $error_message );
} else {
error_log( $error_message ); // phpcs:ignore WordPress.PHP.DevelopmentFunctions.error_log_error_log
}
}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To follow up with the discussion we just had in the sync call, we should talk about how best to handle this for now: https://github.com/ampproject/amp-wp/pull/4177/files#diff-4805c837d3390517773926fb26726bcdR306-R323

@schlessera I don't really have any better suggestion for the immediate term on where to put this, other than to plop it in amp.php like you've done. We'll need to look in the future about how better to organize the amp.php bootstrap as part of modularization effort.

The one thing that I think should be done, however, is to avoid calling this right here immediately when the bootstrap is being invoked. This will cause a DB write to occur and potentially race condition on a heavily-trafficked site. This would be adding one more place in addition to the frontend writes being called out in #3284.

Since register_activation_hook() doesn't work with multisite, what about using admin_init instead? This will ensure it only fires once a user has logged in. See: https://developer.wordpress.org/reference/functions/register_activation_hook/#comment-2100

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be to put it inside of an amp_plugin_update action, but this still runs during frontend requests. As noted in #3284 (comment) it could also be moved to admin_init. In that way it would be similar to the trigger for the DB upgrade.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will cause a DB write to occur and potentially race condition on a heavily-trafficked site.

AFAICT, right now, the DB writes are:

  • once per day in the register() task to schedule a new cron job if none is scheduled;
  • once per day in the process() that is run via cron.

Am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is the additional write to schedule the interval in the first place. That is, when register is called for the very first time upon upgrading, then wp_next_scheduled() will return false which will then cause wp_schedule_event() to be called:

$timestamp = wp_next_scheduled( $event_name );
if ( $timestamp ) {
return $timestamp;
}
if ( ! wp_schedule_event( time(), $interval_name, $event_name ) ) {

This will then cause _set_cron_array() to be called, which will do a DB write. If wp_schedule_event() is limited to only happen at admin_init, then no cron update would happen on the frontend. When an event transpires, the next event would get scheduled during the cron request, not during a frontend request.

By putting this logic into an admin_init hook, this DB write to schedule the event won't happen on the frontend. Granted, this would mean splitting the register method into two: one to schedule and one to add the hooks. The schedule part could be done at admin_init whereas the hook addition would need to be done with every request (or at least when doing cron).

Maybe I'm being overly cautious about this, and there invariably a lot of other DB writes that happen during frontend requests (especially when transients are used without a persistent object cache). But I'm just trying to think of any ways to make this more robust.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realised that the schedule call should actually be done on plugin activation instead. That makes it more robust and ensures it is only run once, which is enough, as we are scheduling a recurring event.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@felixarntz But that will inevitably run into race conditions then, and we'll end up running the cron job multiple times per day, potentially disabling the transient caching much too soon.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: In this case, not running the cron job at all is highly preferable to running it multiple times because of a race condition.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it not be reliable to cycle over all blogs (once!) on plugin activation (something that is under the control of the administrator, and listening to site creation?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schlessera Potentially yes, but we can reduce the chance for this to a minimum, e.g. we could only schedule the cron job if an administrator is logged in. I would say it's highly unlikely that a site has so many administrators that more than one loads an admin page in the same given second right in the moment where this action is executed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schlessera

Why would it not be reliable to cycle over all blogs (once!) on plugin activation (something that is under the control of the administrator, and listening to site creation?

Yes we could do that but then you'll run into scaling issues with larger multisites.

@schlessera
Copy link
Collaborator

So I've added a very, very small "Services" subsystem now to get around the hacky instantiation that we had in the previous iteration of this PR.

This subsystem:

  • is very lightweight and straight-forward
  • doesn't use any magic of heavy engineering concepts
  • can be used gradually and does not need to be enforced

More importantly, though, it gives structure to the way we register, activate or deactivate code.

For the next piece of logic we want to add like this:

  1. create a class that does its thing
  2. add needed hooks into the register() method
  3. if it needs to act on plugin activation, implement the HasActivation interface
  4. if it needs to act on plugin deactivation, implement the HasDeactivation interface

No change to amp.php or other parts of the code needed anywhere, the change will be completely self-contained (which should be the goal of maintainable code).

Please let me know what you think!

@schlessera
Copy link
Collaborator

I added a screenshot with the warning to the description up above.

@westonruter
Copy link
Member Author

Hummm. I'm seeing test failures/errors:

There was 1 error:

1) Test_Monitor_CSS_Transient_Caching::test_event_gets_scheduled_and_unscheduled
Error: Call to undefined method AmpProject\AmpWP\BackgroundTask\MonitorCssTransientCaching::activate()

/tmp/wordpress/src/wp-content/plugins/amp/tests/php/test-class-monitor-css-transient-caching.php:42

--

There were 2 failures:

1) Test_Monitor_CSS_Transient_Caching::test_event_can_be_processed
Failed asserting that false is not false.

/tmp/wordpress/tests/phpunit/includes/phpunit7/testcase.php:30
/tmp/wordpress/src/wp-content/plugins/amp/tests/php/test-class-monitor-css-transient-caching.php:66

2) Test_Monitor_CSS_Transient_Caching::test_transient_caching_is_disabled
false does not match expected type "array".

* Ensure that external object cache is disabled during tests.
* Replace removed use of activate() method with call to schedule_event().
* Pass required attribute to deactivate() method.
…ling-parsed-css-transient-caching

* 'develop' of github.com:ampproject/amp-wp:
  Fix passing of numbers to _n()
  Prevent case where validation error can be raised on already-removed node
  Fix MISSING_URL checks by inferring ALLOW_EMPTY from MANDATORY
  Tighten up validation error titles
  Remove markup from translations
  Make phpcs happy
  Remove DISALLOWED_TAG_MULTIPLE_CHOICES validation error code
  Replace quotes with code tags
  Make phpcs happy
  Add error titles for internal error codes
  Add error titles for error codes found in AMP specification
  Revert "Generate PHP class with error codes from AMP specification"
  Generate PHP class with error codes from AMP specification
*
* @package AmpProject\AmpWP
*/
interface HasActivation {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this interface is not currently used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not being implemented, but it's used in \AmpProject\AmpWP\Services::class:

if ( ! $service instanceof HasActivation ) {

Copy link
Contributor

@pierlon pierlon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor things I'd like to bring attention to, otherwise it looks good!

*
* @var string[]
*/
const DEFAULT_INTERVAL_NAMES = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently unused.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch. When the cron-handling was still a bit more complex, this was used for validation. However, I ripped all of that out, and just assume someone using the class is using it correctly for now.

So this one can go.

return $actions;
}

$actions['deactivate'] = preg_replace( '#(?=</a>)#i', ' ' . self::get_warning_icon(), $actions['deactivate'] );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_warning_icon() is being called statically although the function is not.

Copy link
Member Author

@westonruter westonruter Mar 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9ad9b10.

MonitorCssTransientCaching::DEFAULT_THRESHOLD
);

return "{$threshold} transients per day";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be translated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. It's part of the Site Health info export.

MonitorCssTransientCaching::DEFAULT_SAMPLING_RANGE
);

return "{$sampling_range} days";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This as well, should it be translated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. It's part of the Site Health info export.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK.

*
* @package AmpProject\AmpWP
*/
interface HasActivation {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not being implemented, but it's used in \AmpProject\AmpWP\Services::class:

if ( ! $service instanceof HasActivation ) {

Co-Authored-By: Pierre Gordon <16200219+pierlon@users.noreply.github.com>
Copy link
Contributor

@pierlon pierlon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work done here @schlessera and @westonruter!

* Move functions to amp-helper-functions.php.
* Add amp_bootstrap_plugin() function to contain former top-level calls
amp.php Outdated Show resolved Hide resolved
schlessera and others added 2 commits March 23, 2020 12:38
Co-Authored-By: Weston Ruter <westonruter@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Signed the Google CLA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Detect high cache miss rate for obtaining parsed CSS from variable stylesheets
5 participants