Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve post processor response cache #1325

Merged
merged 26 commits into from
Aug 23, 2018

Conversation

hellofromtonya
Copy link
Contributor

@hellofromtonya hellofromtonya commented Aug 7, 2018

Randomly generated content can occur on a web page. When that happens, the response will differ for each request causing cache misses for the plugin's AMP response cache. As @westonruter notes in Issue #1239:

the object cache will get polluted with values that never get used.

This PR detects continual cache misses by caching the page's URL and then comparing the number of cached URLs to a threshold. When that threshold is exceeded, it disables the response cache and then notifies the admin.

TODO

  • Detect continual cache misses
  • Disable caching
  • Notify
  • Provide a UI to reset the option's value and re-enable response caching.

Closes #1239.

Tonya Mork added 3 commits August 7, 2018 17:11
We are capturing each response's URL and storing it into the cache.

@todo turn off response caching when the number of caches for the response's URL exceeds the cache miss threshold.
When the number of cached URLs for the given response exceeds the cache miss threshold, disable the response caching.  It also disables continuing to cache the response's URLs.

@todo add getter for response cache to test the response.
Added the caching keys as public properties.  Why?

1. Eliminates passing it into the caching closure.
2. Exposes both of them for testing.

Then we use these keys to valid the caching conditions before and after we hit the threshold.
@hellofromtonya
Copy link
Contributor Author

@westonruter Commits through to 1d1f399 are a prototype per our discussions here. It's implementing:

  1. Cache the URLs.
  2. Then determine if we should turn off the response caching.
  3. If caching is enabled, then inside of the caching closure, we'll cache the new URL.

Take a look and see what you think. BTW Notices are not included yet.

self::$post_processor_cache_key = null;
if ( true === $args['enable_response_caching'] ) {
self::$post_processor_cache_key = md5( $current_url );
$caches_for_url = wp_cache_get( self::$post_processor_cache_key, self::POST_PROCESSOR_CACHE_EFFECTIVENESS );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sone of the subsequent checks could perhaps be simplified by checking if this is an array here, and if not, just set it to be an empty array.

*
* @var string
*/
public static $response_cache_key;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? Aren't references limited to a single method? Since this is a static class it would be better of we limited the state that is persisted in it's class variables IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter I added this public property for the following purposes:

  1. to give us the ability to deeply test by knowing what the cache key is and then checking for it in the cache.
  2. to simplify access to the logic within the caching closure, i.e. meaning that we don't have to pass it via use.

It serves no other external purpose as part of an API.

Let's think about it.

From a testing point of view, we can infer that we disabled the cache by the state of the post-processor cache. While that's not a direct test on the response cache, both are within the closure. Therefore, it's a minimal risk to test by inference.

Hmm, I like that approach better actually. We get the benefit only exposing what is needed externally while avoided the extra work of resetting persistent state.

I'll make that change.

if ( empty( $caches_for_url ) ) {
$caches_for_url = array( AMP_Theme_Support::$response_cache_key );
} else {
$caches_for_url[] = AMP_Theme_Support::$response_cache_key;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per above, this could be the sole line instead of the if/else.

Tonya Mork added 3 commits August 9, 2018 13:15
When there are no URLs cached, we get `false` back.  To simplify the code, if we do not get an array back, the variable is initialized to an empty array.
@westonruter westonruter changed the title [WIP] Improve post processor response cache Improve post processor response cache Aug 9, 2018
@westonruter westonruter changed the title Improve post processor response cache [WIP] Improve post processor response cache Aug 9, 2018
@westonruter
Copy link
Member

For the notice, are you thinking there would be a new key added to the amp-options option which would flag whether or not the response cache is enabled?

@hellofromtonya
Copy link
Contributor Author

hellofromtonya commented Aug 9, 2018

For the notice, are you thinking there would be a new key added to the amp-options option which would flag whether or not the response cache is enabled?

Well that's a good question @westonruter. I think to find the solution, we need to explore what we want to report back to the developer/admin.

We are disabling the cache on a page-by-page basis. Do we want to give a list of URLs (i.e. that have cache disabled)? Or do we want a general notice? Or both?

It's a more broad discussion for the DX/UX.

@westonruter
Copy link
Member

@hellofromtonya humm, how about showing the notice on the general settings page, similar to how we are showing a notice for not having the object cache active.

I didn't think about letting the cache be disabled on a page-by-page basis. I realize that is how it is implemented now. I think that once we get to the point where the uncached count is greater than CACHE_MISS_THRESHOLD we can just set an option to stop the caching. In the future we could go to a page-by-page basis, but I think that could be overkill for now.

@westonruter
Copy link
Member

Also, currently the $post_processor_cache_key is specific to the current URL, so there would be no way to query for all URLs that are surpassing the CACHE_MISS_THRESHOLD, since the object cache cannot be queried. Maybe a compromise is that we could store the URL that causes the CACHE_MISS_THRESHOLD to be surpassed, and then use that URL in the notice.

Something else: if there are logged-in users and the REST API is being used on the frontend, it is likely that the API nonce is going to be output. This nonce will be unique for each user. This being the case, it is very likely that the CACHE_MISS_THRESHOLD will be very quickly passed, since each user landing on the homepage would then cause the increment to go up. Maybe this is expected and is proper behavior, but it's also something to be aware of as it could make the cache very prone to being turned off.

@westonruter
Copy link
Member

Maybe such caching should just be disabled entirely if the user is logged in.

Also, I just realized something else, when we do:

$caches_for_url[] = $response_cache_key;

We could easily cause for the homepage, for example, to trigger the cache to get disabled once you've published 3 posts, each causing the homepage to change. So what I think we're missing here is a timestamp for when the cache MISS happened. And then what we'd want to do is only consider such entries that have happened within a recent time period (e.g. 10 minutes) for whether the CACHE_MISS_THRESHOLD has been surpassed.

@hellofromtonya
Copy link
Contributor Author

Also, currently the $post_processor_cache_key is specific to the current URL, so there would be no way to query for all URLs that are surpassing the CACHE_MISS_THRESHOLD, since the object cache cannot be queried.

Correct.

Maybe a compromise is that we could store the URL that causes the CACHE_MISS_THRESHOLD to be surpassed, and then use that URL in the notice.

I agree. That's the next step as we think about how to notify.

We could easily cause for the homepage, for example, to trigger the cache to get disabled once you've published 3 posts, each causing the homepage to change. So what I think we're missing here is a timestamp for when the cache MISS happened. And then what we'd want to do is only consider such entries that have happened within a recent time period (e.g. 10 minutes) for whether the CACHE_MISS_THRESHOLD has been surpassed.

How about instead if we set the cache expiration to say 10 minutes?

@westonruter
Copy link
Member

How about instead if we set the cache expiration to say 10 minutes?

Yeah, that sounds good.

@hellofromtonya
Copy link
Contributor Author

I didn't think about letting the cache be disabled on a page-by-page basis. I realize that is how it is implemented now. I think that once we get to the point where the uncached count is greater than CACHE_MISS_THRESHOLD we can just set an option to stop the caching. In the future we could go to a page-by-page basis, but I think that could be overkill for now.

Maybe a compromise is that we could store the URL that causes the CACHE_MISS_THRESHOLD to be surpassed, and then use that URL in the notice.

@westonruter okay so instead of page-by-page, once we hit the threshold, we:

  1. turn off the cache for all pages/URLs
  2. store that URL in an option
  3. display a notice with a link to that URL

That's a simple solution. And as you said, page-by-page is likely overkill right now.

Tonya Mork added 3 commits August 9, 2018 17:40
Instead of storing cache miss URLs page-by-page, this commit stores it on occurrence, using a key and group constant.
1. Refactored into a separate method as there's a lot going on.
2. Check if cache is turned off.
3. If no, check if we've exceeded the threshold.
4. If yes, store the URL.

printf(
'<div class="notice notice-warning"><p>%s <a href="%s">%s</a></p></div>',
esc_html__( 'Response caching was disabled due to exceeding the cache miss threshold.', 'amp' ),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll probably need to figure out a way to explain what this means to non-developers. Maybe there should be a link to the Wiki with more information about what causes this?

Also, should the notice be dismissible?

Should there be a checkbox to re-enable? If so, then the notice could be displayed inline with the checkbox.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll probably need to figure out a way to explain what this means to non-developers. Maybe there should be a link to the Wiki with more information about what causes this?

Agreed. We need more work on the messaging. I've noted it as a @todo.

should the notice be dismissible?

I think so. I think it's okay that the developer or admin dismisses this warning.

Thinking through it, I'd like to use this top-level warning as an alert: "Hey, we automatically turned off caching. Go to the AMP > General page for more information." Use it to alert them that we did something automatically and it needs their attention.

Should there be a checkbox to re-enable? If so, then the notice could be displayed inline with the checkbox.

Agreed. This is the next step, i.e. adding a re-enable checkbox that appears when caching is disabled.

printf(
'<div class="notice notice-warning is-dismissible"><p>%s <a href="%s">%s</a></p></div>',
esc_html__( "The AMP plugin's response cache disabled due to detecting randomly generated content.", 'amp' ),
esc_url( 'https://github.com/Automattic/amp-wp/wiki/Response-cache#automatically-disabling-of-the-response-cache' ),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter I opened a new Wiki page and started the process of explaining the cache. Once we get the UI done, then screenshots can be added to the page to provide users/developers more clarity.

@@ -48,6 +49,7 @@ public static function register_settings() {

add_action( 'update_option_' . self::OPTION_NAME, array( __CLASS__, 'maybe_flush_rewrite_rules' ), 10, 2 );
add_action( 'admin_notices', array( __CLASS__, 'persistent_object_caching_notice' ) );
add_action( 'admin_notices', array( __CLASS__, 'render_cache_miss_notice' ) );
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter As we are automatically turning off the response cache, I'm thinking we may want to notify not only on this settings page, but on other pages too. Why? We can't be certain the admin/developer will come to this page. At the very least, I'm thinking this notice should be on all AMP settings' pages. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we limit it to the general settings page but add an indicator to the admin menu item so that the user knows there has something changed on this page?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@westonruter I like that approach in general. I think it spans more than just this PR and edge case. Thinking about, it seems we could have a general indicator to alert of something that needs attention. That indicator could link to the plugin's settings page where more information lives.

If we go for that approach, I think that's a separate feature, issue, and PR.

// Handle the caching option.
$options['enable_response_caching'] = ! empty( $new_options['enable_response_caching'] );
if ( $options['enable_response_caching'] ) {
do_action( 'amp_reenable_response_cache' );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it not be better to call the method here rather than indirectly trigger the action that the method is is connected to? Is there any utility for the amp_reenable_response_cache action generally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upon review, you're right. There is not need for a separation action here. Switching to call direct.

@@ -109,6 +109,17 @@ public function add_menu_items() {
)
);

add_settings_field(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This setting should only be added if wp_using_ext_object_cache() returns true, as we should only do response caching if we know that the cached value will persist across requests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Good point.

@@ -31,6 +31,7 @@ class AMP_Options_Manager {
'disable_admin_bar' => false,
'all_templates_supported' => true,
'supported_templates' => array( 'is_singular' ),
'enable_response_caching' => true,
Copy link
Member

@westonruter westonruter Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value here can be the return value of wp_using_ext_object_cache() (which wouldn't work as-is right here in this static context)

Copy link
Contributor Author

@hellofromtonya hellofromtonya Aug 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to accomplish this is to set the default state within get_options():

self::$defaults['enable_response_caching'] = wp_using_ext_object_cache();
return array_merge( self::$defaults, $options );

This change is implemented in commit 752de78.

@@ -198,6 +200,12 @@ public static function validate_options( $new_options ) {
// Store the current version with the options so we know the format.
$options['version'] = AMP__VERSION;

// Handle the caching option.
$options['enable_response_caching'] = ! empty( $new_options['enable_response_caching'] );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition here should probably be wp_using_ext_object_cache() && ! empty( $new_options['enable_response_caching'] )

Tonya Mork added 4 commits August 22, 2018 15:29
…nse_caching'.

The $defaults array is a static property.  Therefore, we set the default value within the `get_options()` method just before we merge defaults and the stored options.
@hellofromtonya hellofromtonya changed the title [WIP] Improve post processor response cache Improve post processor response cache Aug 23, 2018
@hellofromtonya hellofromtonya added this to the v1.0 milestone Aug 23, 2018
@hellofromtonya hellofromtonya added the Bug Something isn't working label Aug 23, 2018
@hellofromtonya
Copy link
Contributor Author

hellofromtonya commented Aug 23, 2018

@westonruter When we changed the conditions by using wp_using_ext_object_cache(), the notices were displaying incorrectly compared to the state. In the commit e575cde, I fixed it so that the notice appears when:

  1. wp_using_ext_object_cache() is enabled.
  2. The enable_response_caching option is disabled.
  3. The cache miss threshold was exceeded.

@@ -355,8 +355,8 @@ public static function render_cache_miss_notice() {

printf(
'<div class="notice notice-warning is-dismissible"><p>%s <a href="%s">%s</a></p></div>',
esc_html__( "The AMP plugin's response cache disabled due to detecting randomly generated content.", 'amp' ),
esc_url( 'https://github.com/Automattic/amp-wp/wiki/Response-cache#automatically-disabling-of-the-response-cache' ),
esc_html__( "The AMP plugin's post-processor cache disabled due to the detection of highly-variable content.", 'amp' ),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better @westonruter. Thank you 👍

@westonruter westonruter merged commit 234b722 into develop Aug 23, 2018
@westonruter westonruter deleted the improve/post-processor-response-cache branch August 23, 2018 23:03
@westonruter
Copy link
Member

@hellofromtonya Great work. This was tough!

@hellofromtonya
Copy link
Contributor Author

Collaborative effort between both of us. Thanks for your help and for first identifying this as an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants