Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect high cache miss rate for obtaining parsed CSS from variable stylesheets #2092

Closed
westonruter opened this issue Apr 8, 2019 · 11 comments · Fixed by #4404 or #4177
Closed

Detect high cache miss rate for obtaining parsed CSS from variable stylesheets #2092

westonruter opened this issue Apr 8, 2019 · 11 comments · Fixed by #4404 or #4177
Assignees
Labels
Bug Something isn't working Caching CSS P1 Medium priority
Projects
Milestone

Comments

@westonruter
Copy link
Member

When PHP is generated with random content (e.g. #1239), the post-processor cache gets disabled as of #1325 to prevent polluting the object cache with garbage cache values. This only matters when a persistent object cache is present, since that is when the post-processor cache is used.

When the style sanitizer comes across a stylesheet, it parses it and stores the cached result in a transient. This is done regardless of whether there is a persistent object cache present, since parsing CSS is expensive. If, however, the CSS has variable content in any way then this can result in a site with out a persistent object cache to have its wp_options table balloon to hundreds of thousands of garbage transients, as seen on this support topic.

I believe such variable stylesheets to be an uncommon occurrence, but when it does happen, we should warn the user and stop storing the parsed CSS in transients. There could also be a parameter to the style sanitizer to disable the transient cache, as well as provide a threshold for what constitutes an unacceptable rate of cache misses.

@westonruter westonruter added this to Backlog in Ongoing Aug 26, 2019
@MackenzieHartung MackenzieHartung moved this from Backlog to To Do in Ongoing Sep 5, 2019
@schlessera
Copy link
Collaborator

I suggest solving this like you avoid filling disks with log information: by using rotation.

We should define a limited number of slots to store cached CSS results in and use the content hash to check whether we need to use a new slot or not. So we could for example cache the last 10 versions of the CSS stylesheet.

This would avoid the worst case scenario where you just fill the DB until the server is taken down. We could also track the velocity of how fast we go through these to show a warning to the user if caching does not make sense.

@westonruter
Copy link
Member Author

That sounds great.

Note that each individual stylesheet (link, style element, style attribute) is stored in a separate transient. So the rotation should be expanded to include a large enough slots for all the variable number of stylesheets on a site. There could be quite a few. Perhaps default to 1000? Maybe it should be configurable?

@westonruter westonruter added the Bug Something isn't working label Sep 12, 2019
@schlessera schlessera moved this from To Do to In Progress in Ongoing Sep 12, 2019
@schlessera
Copy link
Collaborator

schlessera commented Sep 12, 2019

I'll work on this in two separate PRs:

@schlessera
Copy link
Collaborator

With #3300, the main problem should be resolved: we won't be able to completely fill up the database anymore.

To now detect high velocity cycling so that we can show a warning to the user and/or skip caching, I can think of two separate approaches:

  • After each request, check the distance that the index has travelled and compare it to a threshold. This is a simplistic approach and would probably detect most instances where we do useless caching, provided we get the threshold right. However, as we don't know any specifics, we can only disable the entire caching system for stylesheets. This would require us to add one more DB write to store information that allows us to calculate this delta.
  • Store not only hashes about what is cached, but also references about the file/source that this cache represents. This requires a bit more logic, but it would allow us to detect the specific culprit and only disable caching for that single stylesheet, keeping the caching enabled for the rest. This would probably double the size of the pool map, which is already a serialized array with 1000 elements by default.

@westonruter Thoughts?

@schlessera
Copy link
Collaborator

Alternatively, we could just say that this problem is exotic anyway, and leave it at the state we are at with #3300, where the worst case is avoided (taking the server down).

@westonruter
Copy link
Member Author

To now detect high velocity cycling so that we can show a warning to the user

Couldn't the warning simply be shown when the cache pool index rotates back to 0? https://github.com/ampproject/amp-wp/pull/3300/files#r326742376

@schlessera
Copy link
Collaborator

Couldn't the warning simply be shown when the cache pool index rotates back to 0? /pull/3300/files#r326742376

As stated in #3300 (comment) :
"No, this is only expressing the "distance travelled". Velocity is distance over time, we don't know anything about the time here. If we hit the end of the pool several times a day, we have a very high velocity. If we only hit it every 3 months, it might just be a site with lots of edits."

@swissspidy
Copy link
Collaborator

Someone I know has also seen a high cache miss rate with AMP_Image_Dimension_Extractor because they used some image URLs with randomized query args for cache busting. Perhaps the extractor could get smarter there. At the minimum, it would be nice if one could pass a list of query args to AMP_Img_Sanitizer that should be stripped from image URLs before hashing them in \AMP_Image_Dimension_Extractor::determine_which_images_to_fetch.

@westonruter
Copy link
Member Author

@swissspidy Yes, that's important to bring up. Can you share examples of the image URL with randomized query args?

Nevertheless, I don't stripping the query args from the hashed cache key is going to work. Consider two URLs:

  • https://i0.wp.com/example.com/wp-content/uploads/2018/12/hero.jpg?resize=700%2C700&ssl=1
  • https://i0.wp.com/example.com/wp-content/uploads/2018/12/hero.jpg?resize=350%2C350&ssl=1

If the query args were stripped, then only the correct dimensions will be stored for one of the images, with the other then getting larger or smaller dimensions depending on which one was seen first.

If we stripped query args, we'd need to be careful about it, such as selectively only certain named removing args (e.g. rand) or if the query string itself looks like a random number (e.g. ?2580138906).

A complication here is that such images are probably not even supposed to have dimensions in the first place. It may be an indication of a tracker. Having a random query string would imply that the image should not even be getting turned into an amp-img but rather an amp-pixel, but it's impossible to know for sure. (That being said, it is more reliable for a tracker image to have the actual random query arg added via JS, so this may not be relevant.) Nevertheless, if it is intended that the image be cache-busted, then perhaps this is an indication that the underlying image has actually changed and with that the underlying dimensions, and so actually the AMP plugin should be obligated to obtain the dimensions.


Maybe we need to treat such dimensionless images which have random-looking query args as being validation errors and go ahead and remove them from the DOM entirely, forcing the user to fix the underlying problem. It's either this, or we'll need to supply fallback dimensions.

@kmyram kmyram moved this from In Progress to Backlog in Ongoing Jan 23, 2020
@kmyram kmyram added P1 Medium priority Caching labels Jan 23, 2020
@westonruter
Copy link
Member Author

Based on our last discussion, the shorter-term solution to the problem appears to be to introduce a new filter which is used to allow a site to disable caching parsed CSS in transients. I've drafted a PR to do this in #4177. This may be better combined with the cache pool work in #3300 somehow.

Something else that comes to mind is that we could detect when there seems to be an extraordinary number of transients for parsed CSS and then automatically disable the transient storage. Consider a daily WP Cron that queries for for SELECT COUNT(*) FROM $wpdb->options WHERE option_name LIKE '_transient_amp-parsed-stylesheet-v24%' (the version should be taken into account, or else transients with old versions should be proactively deleted during the WP Cron event). If that number is over some threshold, transient caching could be automatically turned off. This WP Cron action would store the flag for disabling of the transient caching in an option.

An even more sophisticated approach would be to not just make a determination based on a single reading, but to capture the counts daily on an ongoing basis and then calculate the average rate of increase. That may or may not be helpful vs a single total count.

Extending this to support external object cache users would require incrementing a counter every time a cache MISS happens, and then the counter could be checked during the daily cron to see if it has gone too high, and then reset it back to zero. A downside to that is it means an extra cache-set for every cached stylesheet. See also @schlessera's comment in #2092 (comment).

My only concern with all of this is we want to avoid the current problems with the post-processor response caching, which gets disabled way to easily (see #4178 for proposal to remove it).

All of this also applies to the storage of image dimensions.

A Site Health warning when transient caching is disabled (and external object cache is not present). Not having a parsed CSS cache greatly slows down page generation.

@westonruter westonruter added this to the v1.5 milestone Mar 12, 2020
@kmyram kmyram moved this from Backlog to To Do in Ongoing Mar 12, 2020
@schlessera schlessera moved this from To Do to In Progress in Ongoing Mar 19, 2020
@schlessera schlessera moved this from In Progress to Ready for Review in Ongoing Mar 20, 2020
@westonruter
Copy link
Member Author

westonruter commented Mar 23, 2020

Fixed by #4177

@westonruter westonruter moved this from Ready for Review to Ready for QA in Ongoing Mar 23, 2020
@westonruter westonruter moved this from Ready for QA to Done in Ongoing Apr 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working Caching CSS P1 Medium priority
Projects
Ongoing
  
Done
5 participants