Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cache pool abstraction to cache stylesheets #3300

Closed
wants to merge 8 commits into from

Conversation

schlessera
Copy link
Collaborator

This PR adds a cache pool abstraction that abstracts away the differences between the object cache and transients when caching stylesheets.

The transient cache implementation provides a fixed-size pool of cache entries that are rotated to enforce a maximum number of transients and avoid filling up the database.

Note: For the architecture rethink, this should be an interface with 2 separate implementations that will get injected, instead of a single class that contains all implementations.

See #2092

@kienstra
Copy link
Contributor

Approved

Hi @schlessera,
This looks really good.

Once a parsed stylesheet is stored in a transient, this retrieves it from the transient, instead of parsing it again.

I set an Xdebug breakpoint and stepped through $this->cache_pool->set( $cache_key, $parsed ), and it looked to work as excepted.

@@ -399,6 +418,7 @@ public function __construct( DOMDocument $dom, array $args = [] ) {
$this->base_url = untrailingslashit( $guessurl );
$this->content_url = WP_CONTENT_URL;
$this->xpath = new DOMXPath( $dom );
$this->cache_pool = new AMP_Cache_Pool( self::CACHE_GROUP );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice dependency injection 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not being injected because I don't think the architecture is currently set up well for injecting dependencies.
For now, I wanted to keep it simple and just directly instantiated here. Hopefully, we'll have injection for v2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK. I thought this was dependency injection.

Copy link
Collaborator Author

@schlessera schlessera Sep 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dependency injection would be if exterior code would inject "some cache" and this code here would not need to know which exact implementation is being used. This would then decouple this code from the cache, and the cache could be changed/replaced without needing to adapt this code here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks!

* @param string $group Optional. Group to use. Defaults to an empty string.
* @param int $size Optional. Size of the cache pool.
*/
public function __construct( $group = '', $size = self::DEFAULT_POOL_SIZE ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like so far, this class is only instantiated with one argument:

$this->cache_pool  = new AMP_Cache_Pool( self::CACHE_GROUP );

...though maybe later this class will be instantiated with 2 arguments.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size is needed, and it felt obvious to allow this to be set through instantiation. This way, you could also create several pools with differing groups that might have different sizes as well.

*/
class Test_AMP_Cache_Pool extends WP_UnitTestCase {

public function test_transients_storing_a_value() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, very thorough testing 😄

} else {
$parsed = get_transient( $cache_group . '-' . $cache_key );
}
$parsed = $this->cache_pool->get( $cache_key );
Copy link
Contributor

@kienstra kienstra Sep 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to work well.

Once the stylesheet is stored in a transient, the ->get() method retrieves it.

sanitizer-here

Copy link
Member

@westonruter westonruter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Left a few comments. This may make sense to also use in the AMP_Image_Dimension_Extractor since images can also end up filling up the wp_options table.

$this->group = $group;
$this->size = $size;

$this->is_object_cache_available = wp_using_ext_object_cache();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If using an external object cache, then essentially the cache pool is not used, is that right? This assumes that that the persistent object cache has implemented some algorithm for garbage collection.

Aside: Shouldn't WordPress transient system implement some cache replacement policy? If there was such a cache pool in core like you've implemented here, then there wouldn't be a need for this class, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the manual pooling/rotation is only used with transients. Proper object caches have built-in mechanisms for this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think WordPress should use something like this with a separate table, transients as they are now are a hack with several performance issues.

includes/utils/class-amp-cache-pool.php Outdated Show resolved Hide resolved
includes/sanitizers/class-amp-style-sanitizer.php Outdated Show resolved Hide resolved
// The expiration is to ensure transient doesn't stick around forever since no LRU flushing like with external object cache.
set_transient( $cache_group . '-' . $cache_key, $parsed, MONTH_IN_SECONDS );
}
$this->cache_pool->set( $cache_key, $parsed );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. 👍

self::$pool_indexes[ $this->group ]++;

if ( self::$pool_indexes[ $this->group ] >= $this->size ) {
self::$pool_indexes[ $this->group ] = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In regards to #2092 (comment), this would be where we could potentially show an error message to the user, warning the that stylesheet cache pool is overflowing, yes? This should issue a PHP warning to the error log?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is only expressing the "distance travelled". Velocity is distance over time, we don't know anything about the time here. If we hit the end of the pool several times a day, we have a very high velocity. If we only hit it every 3 months, it might just be a site with lots of edits.

Co-Authored-By: Weston Ruter <westonruter@google.com>
@schlessera
Copy link
Collaborator Author

schlessera commented Sep 20, 2019

This may make sense to also use in the AMP_Image_Dimension_Extractor since images can also end up filling up the wp_options table.

Yes, it was already built to be usable in multiple locations, with differing groups and sizes, and the tests already make sure this works as expected.

Co-Authored-By: Weston Ruter <westonruter@google.com>
@schlessera
Copy link
Collaborator Author

I wonder whether the default size of 1000 pool entries (per group) is not too high. The main issue I can see is that we serialize and unserialize the entire pool map. For 1000 entries, this might have a non-negligible performance impact.

@westonruter
Copy link
Member

It's hard to know. We somewhat need data to help inform that.

This provides the count:

wp transient list --fields=name --search='amp-parsed-stylesheet-*' --format=count

I've asked someone to run that to get a number from a production site that is not running an object cache.

@westonruter
Copy link
Member

Note that a the parsed stylesheets being stored here can come from:

  1. Stylesheets referenced by link elements.
  2. Stylesheets inline in style elements.
  3. Style properties stored in inline style attributes.

A transient will be created for each 👉 unique 👈 instance of the above to contain the parsed stylesheet information.

@schlessera
Copy link
Collaborator Author

On hold for now as we are awaiting an export of production transient data from an actual site to evaluate.

@schlessera
Copy link
Collaborator Author

schlessera commented Sep 26, 2019

Here's my initial observation from looking through the export we got from a bigger site:

  • Every value is a serialization of a serialization. Given that most of our memory usage is located in unserialize() according to initial profiling, I recommend we change this so that we only ever have one level of serialization.
  • The format is not very space-efficient, there's certainly improvements we can do here.
  • A large majority of the entries are every simple, often just a single property, like flex:0.3456 or text-align:center. I think we should discuss how to store this in a more optimized way and/or introducing utility classes. So, instead of having .amp-wp-4263548 { text-align: center; } .amp-wp-9263742 { text-align: center; } ..., we could have .amp-wp-tacenter { text-align: center; } instead which would deduplicate this. Similarly, we could have "calculated utility classes" that need to be calculated and tracked, but don't need to be stored. So, instead of .amp-wp-3856964 { flex: 2.564; } we could have .amp-wp-flex-2-564 { flex: 2.564; }. This would not deduplicate these (as they are quite individualized), but we would not need to store them in the database.
  • The fields that are almost always empty should not even be part of the serialization, like 'validation_results' or 'imported_font_urls'.

@schlessera
Copy link
Collaborator Author

Also, what is the reasoning behind extracting all the inline styles into <style amp-custom>? As far as I can tell, AMP does support inline styles...
Image 2019-09-26 at 4 45 29 PM

@kienstra
Copy link
Contributor

Also, what is the reasoning behind extracting all the inline styles into <style amp-custom>? As far as I can tell, AMP does support inline styles...

Ah, interesting. Maybe that's because AMP used to not support inline styles.

@westonruter
Copy link
Member

Ah, interesting. Maybe that's because AMP used to not support inline styles.

That's mostly why, but there are other reasons. See #1313 which suggests the elimination of moving inline style attributes to style rules, but this is problematic currently when a style rule contains an !important property, as that is how a style rule can override an inline style, and there's no way to override it otherwise, as far as I know. Please continue this discussion following #1313 (comment).

@westonruter
Copy link
Member

  • A large majority of the entries are every simple, often just a single property, like flex:0.3456 or text-align:center. I think we should discuss how to store this in a more optimized way and/or introducing utility classes. So, instead of having .amp-wp-4263548 { text-align: center; } .amp-wp-9263742 { text-align: center; } ..., we could have .amp-wp-tacenter { text-align: center; } instead which would deduplicate this. Similarly, we could have "calculated utility classes" that need to be calculated and tracked, but don't need to be stored. So, instead of .amp-wp-3856964 { flex: 2.564; } we could have .amp-wp-flex-2-564 { flex: 2.564; }. This would not deduplicate these (as they are quite individualized), but we would not need to store them in the database.

These are all coming from inline styles (in style attributes). This being the case, it may be better to parse the style rules at runtime rather than fire up the PHP CSS Parser. This was done recently in #3209 when converting elements for layout=fill. Perhaps this could be done here as well, and this would eliminate a lot of transients. We'd want to ensure the same sanitizations are done on the properties, however.

@schlessera
Copy link
Collaborator Author

These are all coming from inline styles (in style attributes). This being the case, it may be better to parse the style rules at runtime rather than fire up the PHP CSS Parser. This was done recently in #3209 when converting elements for layout=fill. Perhaps this could be done here as well, and this would eliminate a lot of transients. We'd want to ensure the same sanitizations are done on the properties, however.

I'm not quite sure I understand what you mean by the above, specifically the suggestion to "parse the style rules at runtime rather than fire up the PHP CSS Parser". Would that mean they are neither validated/sanitized, not tree-shaken anymore? And what would you then actually do with them at runtime? I seem to be missing something here.

@westonruter
Copy link
Member

westonruter commented Sep 27, 2019

The inline styles don't need to be tree-shaken, because we know they are going to exist in the page. What I mean is that in AMP_Style_Sanitizer::collect_inline_styles() it is currently constructing a style rule dynamically like so:

$class = 'amp-wp-' . substr( md5( $style_attribute->nodeValue ), 0, 7 );
$root = ':root' . str_repeat( ':not(#_)', self::INLINE_SPECIFICITY_MULTIPLIER );
$rule = sprintf( '%s .%s { %s }', $root, $class, $style_attribute->nodeValue );

And then the rule gets processed:

$processed = $this->process_stylesheet(
$rule,
[
'allowed_at_rules' => [],
'property_whitelist' => $this->style_custom_cdata_spec['css_spec']['declaration'],
]
);

This then causes the PHP CSS Parser to be fired up. The AMP_Style_Sanitizer::process_stylesheet() takes as input a stylesheet, meaning it has to handle arbitrary style rules. However, the data being processed here is simple style properties. So it seems this could do something faster by skipping the full parsing and instead extract the properties to then do the work in the AMP_Style_Sanitizer::process_css_delcaration_block(), which would be limited to just needing to do:

And I think that's all that needs to be done for properties in a style attribute.

@schlessera
Copy link
Collaborator Author

Okay, I understood now, the term "runtime" was confusing me. I'll open a new issue for that.

What is needed now to take this PR across the finish line?

  • Add size argument to $DEFAULT_ARGS so sites can override our default.
  • Add documentation about the transient pool and how to adapt its size.

Would this be enough to merge this for now?

@westonruter
Copy link
Member

Given that a large majority of the transients are for inline styles, if the PHP CSS Parser were bypassed in favor of a more lightweight parser (e.g. using regex, though this is less robust!) then we could avoid needing to store so many transients and there will be less need to increase the size of the pool.

@westonruter
Copy link
Member

I'm just concerned that if a site has 5000 transients and the pool size is 1000, but they don't know they have so many, then there will be very frequent cache misses and they wouldn't know it is happening.

@archon810
Copy link

This 👆.

@westonruter
Copy link
Member

I wonder if the PHP CSS Parser is actually sufficiently fast for the simple case if CSS properties in a style attribute.

@westonruter
Copy link
Member

Also because there is no need to parse the selector for tree shaking.

@westonruter
Copy link
Member

Closing in favor of #4177

@schlessera schlessera deleted the fix/2092-rotate-stylesheet-cache branch November 6, 2020 15:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working cla: yes Signed the Google CLA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants