Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Twenty Seventeen's use of uniqid() breaks post-processor response cache #1239

Closed
3 tasks
westonruter opened this issue Jul 2, 2018 · 13 comments
Closed
3 tasks
Assignees
Labels
Bug Something isn't working Release
Milestone

Comments

@westonruter
Copy link
Member

westonruter commented Jul 2, 2018

As a site admin running AMP on my WordPress site, I should be able to know when elements cannot be cached and I should be able to cache unique elements (via usage of uniqid() for example) when a WP theme or plugin generates this.

  • AC1: Using TwentySeventeen and its uniqid method break caching as described below. The plugin should automatically catch when a theme (such as this one) breaks caching, reporting that it cannot be fixed automatically.
  • AC2: Propose patch to Twenty Seventeen to use an alternative to use a counter or something more stable than uniqid().
  • AC3: Introduce new filter to apply to buffered content which can be used (sparingly) to fixup content that would otherwise break response caching.

Twenty Seventeen uses uniqid() to generate IDs for HTML elements:

https://github.com/WordPress/wordpress-develop/blob/8e96abbcc2214c3566b814619c4b038d59ad77d9/src/wp-content/themes/twentyseventeen/searchform.php#L13

https://github.com/WordPress/wordpress-develop/blob/8e96abbcc2214c3566b814619c4b038d59ad77d9/src/wp-content/themes/twentyseventeen/inc/icon-functions.php#L77-L94

Unfortunately this causes the response to differ for each request and causes a cache MISS for the plugin's AMP response cache. The result is that the post-processor will always be invoked on a site with Twenty Seventeen, and the object cache will get polluted with values that never get used.

I'm not sure if there is something we can do to fix this. It would have been preferable if Twenty Seventeen had instead introduced a unique-ID generator like:

function twentyseventeen_uniqid() {
    static $id = 0;
    return ++$id;
}

As this would mean the IDs would be unique within a response, but they would be repeated across multiple responses.

@westonruter westonruter added the Bug Something isn't working label Jul 2, 2018
@westonruter westonruter added this to the v1.0 milestone Jul 2, 2018
@westonruter westonruter changed the title Twenty Seventeen's use of uniqid() breaks response cache Twenty Seventeen's use of uniqid() breaks post-processor response cache Jul 2, 2018
@westonruter
Copy link
Member Author

We may have to facilitate turning off the response cache in such case. Currently it is conditional based on:

https://github.com/Automattic/amp-wp/blob/257c7ae4a5885d94b7573ae42f8dc4f74672764c/includes/class-amp-theme-support.php#L1611-L1615

We may need a filter here.

@westonruter
Copy link
Member Author

It may be beneficial to do better tracking of cache misses as part of this. If we're getting a lot of cache misses then maybe the response caching should turn itself off and a warning message should be displayed.

@postphotos
Copy link
Contributor

Hi @westonruter - I've written up a user story and some ACs here. It sounds like there are two parts to this puzzle - I've defined them as such and moved over to the "to do" lane. Let me know if there's any more work here needed on my side.

@westonruter
Copy link
Member Author

@postphotos:

AC2: Using the AMP plugin, a site running Native AMP should be cacheable in this method as described above and below.

I think that if if a theme outputs a random string then it would not be cacheable, period. Consider another possible common scenario where a site may output the current time via PHP.

So if there is a second AC then it should instead be:

AC2: Propose patch to Twenty Seventeen to use an alternative to use a counter or something more stable than uniqid().

I'll make that change.

If not, then there is one alternative AC to consider doing a special patch for Twenty Seventeen after output buffering to make the IDs consistent. This is a 3rd AC that I think we'll have to do given the popularity of Twenty Seventeen. What this could look like is we'd have to introduce some logic somewhere among these lines:

https://github.com/Automattic/amp-wp/blob/e3d036c2db60b215636a2b665414b0e599e9ad97/includes/class-amp-theme-support.php#L1550-L1587

The logic could look like doing a search for <label for="search-form-(.+?)"> to replace the variable pattern group with an incremented counter, and use this new ID in the corresponding label. The same would be needed for the theme-generated SVG.

This could potentially take the form of introducing a new filter which we could then hook into via AMP_Core_Theme_Sanitizer::add_buffering_hooks():

https://github.com/Automattic/amp-wp/blob/e3d036c2db60b215636a2b665414b0e599e9ad97/includes/sanitizers/class-amp-core-theme-sanitizer.php#L310-L324

The thing is that we'd want to discourage this from being used as it would introduce yet another way for a plugin/theme to mutate content being output. We already have (1) actions/filters running while generating the page, and we have (2) sanitizers running on the parsed DOM. Adding a third filter to mutate the response before passing to the sanitizers adds yet more complexity to the mix, yet another place to look when debugging. So I'd think that such a filter should be marked as only being used at a last resort. It is generally much better to do do everything needed on a document while the page is being generated, as that is the normal WordPress way of doing things. Anything after output buffering finishes is outside the domain of normal WordPress development.

@postphotos
Copy link
Contributor

postphotos commented Jul 28, 2018

The revision for AC2 here, looks good!

@westonruter -

  1. thanks for helping explain what you think is best here. I'd agree, if we can keep content mutation steps to only two at this time, that'd definitely be preferred.

  2. I do like the proposed "last resort" from AC3, but our general approach and advice should be to have theme authors.

  3. Do we want to do add this as one more AC to this ticket?

It may be beneficial to do better tracking of cache misses as part of this. If we're getting a lot of cache misses then maybe the response caching should turn itself off and a warning message should be displayed.

@westonruter
Copy link
Member Author

Do we want to do add this as one more AC to this ticket?

Yes.

@hellofromtonya hellofromtonya self-assigned this Aug 6, 2018
@hellofromtonya
Copy link
Contributor

hellofromtonya commented Aug 6, 2018

I don't think this problem is limited to the Twenty Seventeen theme. Rather, I suggest this will be a common issue on far too many sites.

Therefore, IMHO I think we should strive for a solution that:

  1. Catches these randomly generated ID strings.
  2. Automatically fixes them, as @westonruter notes above.

How? We'd have to decide on an approach such as:

  1. Look for ID patterns that contain a contiguous string of alphanumeric characters, as these are tell-tale signs of a randomly generated ID.
  2. Or automatically re-index all of the IDs on a page. <- though this option can cause unnecessary processing time.

We have control of the buffered content here. Therefore, we could scan all of the IDs and each reference to them within that buffer. Then we can do either of the approaches above.

Using this approach, we'd provide a stable and repeatable response for caching.

@westonruter
Copy link
Member Author

@hellofromtonya

Look for ID patterns that contain a contiguous string of alphanumeric characters, as these are tell-tale signs of a randomly generated ID.

I think such identifier-like strings are just what we've seen so far. In reality it could be any content that is randomly generated on the page. In fact, WordPress core itself allows header images to be randomized and this core feature would break the post-processor cache (or at least cause there to be multiple caches for each URL). Consider another case where someone has a widget that displays a random post from the site.

So in essence we'd need to continually diff the response with the previous response for a given URL in order to define the variable strings to substitute for placeholders, and then cache the response with the placeholders to then replace the placeholders with the current strings for any given response?

I think that we'd end up over-engineering a solution to preserve the post-processor cache. I think we'd be better off starting out with detection for continual cache misses, and in that case disable the caching.

This just brings to mind something from Batcache where a cache is only made of a page if the URL has been accessed more than once:

https://github.com/Automattic/batcache/blob/2b40b75142e4ae93d367347ed43b9581ffe61966/advanced-cache.php#L428-L430

It seems we could implement something similar, but instead of just the URL we also would take into account the hash of the content.

In the end the post-processor cache is primarily in place to improve performance for logged-in users browsing the site. For logged-out users, ideally a site would be already using full page caching (via Batcache or Varnish) and so the response cache wouldn't be hit at all. I think for now it is better to just suspend the post-processor cache if it is not being effectual.

@hellofromtonya
Copy link
Contributor

@westonruter

In reality it could be any content that is randomly generated on the page.

You're right. I agree.

I think that we'd end up over-engineering a solution to preserve the post-processor cache.

Agreed.

I think we'd be better off starting out with detection for continual cache misses, and in that case disable the caching.

Agreed. Therefore, our plan is to:

  1. Detect continual cache misses.
  2. Notify.
  3. Disable caching.

Batcache - It seems we could implement something similar, but instead of just the URL we also would take into account the hash of the content.

I was thinking about how to effectively correlate the changing content, its hashed cache key, and the URL. My initial thoughts are to hash the URL + the other parts of the response_cache_key minus the $response (i.e. content). Use that as our hash key and then compare for changing response_cache_key.

Let me prototype something here and we can continue the discussion.

@westonruter
Copy link
Member Author

I think we could go ahead and use the cache key but for hit/miss purposes it could become a value instead of a key, and the key can then be the URL (or a hash of the URL).

For example, consider we are grabbing a cached value for the URL:

$caches_for_url = wp_cache_get( $url, 'post_processor_cache_effectiveness' );

Each time that we write to the cache we could do:

$caches_for_url[] = $response_cache_key;
wp_cache_set( $url, 'post_processor_cache_effectiveness', $caches_for_url );

Then next time we want to serve a post-processor cache it could do:

$can_cache = ! ( is_array( $caches_for_url ) && count( $caches_for_url ) > self::CACHE_MISS_THRESHOLD );

Or alternatively, the $can_cache should perhaps be determined when writing, so that an option could be written to disable the cache.

@westonruter
Copy link
Member Author

AC2: Propose patch to Twenty Seventeen to use an alternative to use a counter or something more stable than uniqid().

Core patch has been proposed: https://core.trac.wordpress.org/ticket/44883

@kienstra
Copy link
Contributor

Moving To 'Ready For Merging'

If it's alright, I'm moving this to 'Ready For Merging,' as it looks like this doesn't need functional testing. Feel free to move it back.

@westonruter
Copy link
Member Author

Core patch for Twenty Seventeen has landed in trunk so it should at least be part of 5.0 but it could also still be part of 4.9.9.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working Release
Projects
None yet
Development

No branches or pull requests

4 participants