Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Static references / Browser caching #75

Closed
wants to merge 17 commits into from

Conversation

iwalz
Copy link

@iwalz iwalz commented Jan 20, 2013

This is a fundamental and functional implementation to benefit from the browsercache.
It also comes with an implementation to hash includes, that are handled via headLink() and headScript().

You can configure it via

array(
  'asset_manager' => array(
    'cache_control' => array(
      'lifetime' => '30m',          // 30 minutes browsercaching
      'etag' => true,                // Enable ETag validation
      'magicetag' => true        // Will hash the contents of the head* containers 
    )
  )
);

The implementation for the browsercache is not fully featured, it simply provides an Expires, Cache-Control (public, max-age) and Last-Modified header. Etag is seperated. But it's enough for caching on the browser.

The magic etag will change your includes to something like this:

http://example.com/css/bootstrap.css;ETag1-1d576-0000000050faf73c

With a lifetime of 150 days. The Last-Modified and ETag validation (especially on filtered elements) reduces the "overkill" of spawning PHP processes for asset serving a lot.

With the magicetag implementation, you can also configure a frontendcache with the pattern *;ETag and statuscode 304 to cache forever. Once the content changes, the URL changes and that's the "unique key" for most likely every frontend cache technology (F5 WA, Varnish, Traffic Server, Zend Page Cache,...). The files will remain under a different name on the clients pc for the next 150 days - but the performance benefit is incredible high. Even with apache and mod_cache. For a pure AssetManager implementation it means, that it always responds with a 304 without checking anything on the filesystem - a bit more overhead for rendering the base page (because of the include-rewrite) but nearly none for the asset serving itself.

There are still a few things planned:

  • Rewrite headLink and headScript output
  • Add fundamental browser cache implementation
  • Enable etag verification
  • documentation about the background idea / concept
  • Rewrite content to also change asset links with magic etags inside html, css and js
  • Read about the whole standard and provide "full featured" cache handling
  • Add configuration on an asset level
  • Calculate etag once and re-use it at the rewrite

Maybe something like this might be interesting too?

It's currently a proof of concept and 3am in the morning - will go through the code again once I got feedback :-)

ping @Ocramius

use Zend\ServiceManager\ServiceLocatorAwareInterface;
use Zend\ServiceManager\ServiceLocatorInterface;

class HeadScript extends StandardHeadScript implements ServiceLocatorAwareInterface
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't hate me, but can you add the ServiceLocatorInterface as a constructor parameter and get rid of the ServiceLocatorAwareInterface implementation please?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

use Zend\ServiceManager\ServiceLocatorInterface;

class HeadLink extends StandardHeadLink
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as below for the ServiceLocatorAwareInterface question

* @param \Zend\ServiceManager\ServiceLocatorInterface $sl
*/
public function __construct(ServiceLocatorInterface $sl)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. This is factory logic

@iwalz
Copy link
Author

iwalz commented Jan 20, 2013

Will fix the design issues tomorrow - I'm sure it's not getting better today ;)

@@ -17,7 +18,8 @@
*/
class AssetManager implements
AssetFilterManagerAwareInterface,
AssetCacheManagerAwareInterface
AssetCacheManagerAwareInterface,
CacheControllerAwareInterface
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd inject via constructor in the factory that produces the AssetManager. @RWOverdijk same should happen for the cache manager.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Completly get rid of the aware interfaces? I know, you're not a fan of aware interfaces because of the lazy loading restriction ;)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's not that. Here we're not even lazy loading, and the *Aware interfaces are totally useless. It's hard dependencies (not even checked against null): should be always set, which translates to "is a constructor parameter".

It's also a bit more perming in this particular case :)

@iwalz
Copy link
Author

iwalz commented Jan 20, 2013

Hmm, seems to be the same concept behind as discussed in #40

@Ocramius
Copy link
Contributor

Overall, this is a very good concept and I really like the view helpers producing an ETag. Can you ensure that this won't produce any stat calls?

Things you should be aware of before continuing this one (considering a production setup with file cache in place):

  1. file cache is usually hit by clients. This means that the PHP code that produces the correct headers will work only at first hit. You probably can even ignore that entire part, but I'd remove some clutter from onDispatch at least.
  2. it may be interesting to configure TTL per asset/collection
  3. how do you know if an asset was changed? Calculating ETag as of current implementation seems very very heavy. Consider hooking another cache in the CacheController and caching the ETag itself by URI.

{
$objectHash = spl_object_hash($asset);

if (isset($this->etagStorage[$objectHash])) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this caching mechanism is useful at runtime, consider using a real Zend\Cache and store one ETag per URI in it?

@Ocramius
Copy link
Contributor

Also needs an awesome varnish example to show how powerful this logic is ;)

@RWOverdijk
Copy link
Owner

Am I missing the advantages of using this? I tend to use the static file cache, which makes caching a lot easier. There are also already a bunch of cachers available in assetic.

@iwalz
Copy link
Author

iwalz commented Jan 20, 2013

There is a good article of the "leaders" where it explains the benefit. This is inspired by the intelligent browser reference feature of F5 Big IP. You can't perform on that level without the usage of this background: https://www.dropbox.com/s/nth6vrm4twxam6r/eliminating-http-chattiness-wp.pdf cause it's related to a bottleneck in the HTTP protocol

@RWOverdijk
Copy link
Owner

Right but, you can configure your webserver to do the exact same thing. Unless I'm still not getting it, in that case, I blame the fact that it's sunday.

@iwalz
Copy link
Author

iwalz commented Jan 20, 2013

There is no way without rewriting the content (with the checksum in the content) and link that feature with the set of browsercache. I've talked to ocramius, that a cache integration here (especially for the etag calculation) is a must have and I agree. I attempt to maintain it further to have the biggest benefit you can have from a performance point of view. If it does exactly what you do without the need of sending the content it is indeed faster as every implementation you can have.

At work, the implementation of IBR on the F5 reduces our over-all traffic by more than 30%. This is an incredible number - and that's linked to the fact that every asset only needs to transfered the content once to every user. If it changes, a new "asset" is created by the different hash and it will re-fetch - only once per user.

@iwalz
Copy link
Author

iwalz commented Jan 20, 2013

I've added 2 new tasks to the pull request. You can decide if you want to have a look at all or get a step-by-step implementation. You can pick that and only use the lifetime, rest as false. Even with APC and stuff like that, you'll get a performancebenefit of at min 20% (I've noticed on a small vserver and locally, will provide real numbers in the next days)

$response->setStatusCode(304);
$responseHeaders = $response->getHeaders();
$responseHeaders->addHeaderLine('Cache-Control', '');
return $response;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing newline.

@iwalz
Copy link
Author

iwalz commented Jan 21, 2013

I had a look on how to use the Assetic Cache buster. It can be done with a few lines of code to calculate the hash:

$factory = new \Assetic\Factory\AssetFactory($asset->getSourceRoot());
$checksum = $factory->generateAssetName($asset, $filterManager->getFilters($asset->getSourceRoot(), $asset));

But the Cache Buster seem to be only in dev-master since 5 months, it's not even on the latest alpha tag.
How to procede? Rely on dev-master? Or continue with a self-made etag and call it cache busting inside the module?

I still continue the baseline testing to see the benefit even with native php for the browsercaching stuff. I could not even set any headers without having the files in /public - so you slow down you're application from an end user point of view a lot without having a frontendcache.

@Ocramius
Copy link
Contributor

@iwalz did you try asking directly @stof if an intermediate release is possible?

@iwalz
Copy link
Author

iwalz commented Jan 21, 2013

You pinge'd him already - sooo is there any way to push the cachebusting somehow to a stable version of Assetic @stof ? I don't need to build the logic on my own if there's something similar inside Assetic - but relying on a non stable version is IMHO not the best idea.

@RWOverdijk
Copy link
Owner

@iwalz I'm really happy that you decided to take a look. That's probably the best attitude I've ever seen with a developer, and I salute you for it. I think we should wait and see what he (@stof) says. Otherwise we can bump this to v2.0 and add cache warmup, too. I'm actually pretty excited about that idea personally.

@iwalz
Copy link
Author

iwalz commented Jan 21, 2013

I introduce a new manager for that, AssetCacheBustingManager. That will come with a helper to calculate the checksum, this can either be selfwritten or using the Assetic one. And I get rid of the magic etag naming. I think it's not the "official" naming for this technology, everyone has another name .. But we should use the Assetic convention.

Are you fine with using this checksum also to sent as ETag? You don't need 2 hashes for the same job (a content checksum) I guess. The way how the ETag is calculated is not really a standard but everyone adopted the technique apache is using. And we need another stat call for the "original" calculation - this is using the 'inode' which is not provided inside Assetic.

@RWOverdijk
Copy link
Owner

I almost agree. We could (should) use assetic, and use STRATEGY_MODIFICATION. This would be lightweight for files. I would rethink the implementation though, because now we have caching, filters and soon cache busting. Perhaps we should discuss this on irc some time. I want to do this the right way, so we won't hate ourselves later on :) But I would not use the etag implementation, because like this we won't have to. But we should really talk about this some time.

@iwalz
Copy link
Author

iwalz commented Jan 21, 2013

I agree. I'd clean that up and implement an APC solution for content checksum caching, configurable per asset and global. Once @stof gave feedback, I'll change that piece.

There are a few pro's, why it's helpful to use the cache headers inside an application - the cache technology in front can simply be configured to interact as a proxy and honor the cache headers. Than it's up to the developers to decide what's going to be cached (and you don't need to touch the system if something changes). Maybe it's a good idea, maybe not - I'd love so see at least a very small solution inside this module. Even if it's off per default and configurable. I've noticed yesterday, that there is not even a way with mod_expires to set the headers at the apache level :-(

With the content thing, you'll take note of differently compiled less files (e.g.) and they will remain in separate caches - whereas the other one will not even notice that something has changed (due to the filter interaction). On a collection, it checks all files for modification time and picks the latest. I'm not sure if that's a good idea with filters too. But you're right - it's more lightweight. Maye use the modification by default and make some exception by mime type (and/or filters)?

@iwalz
Copy link
Author

iwalz commented Jan 22, 2013

Let's close this PR, but I'll still continue on this basis. I'll write down a document the next days and provide it to you via mail to get a "clean roadmap" for this - and I'll provide references for the information I've read in the last days. I'm sure some amazing stuff can be done here :-)

@iwalz iwalz closed this Jan 22, 2013
@RWOverdijk RWOverdijk mentioned this pull request May 28, 2014
Closed
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants