Builtin cache invalidation system aka make API Platform fast as hell #952

Merged: dunglas merged 1 commit into api-platform:master from dunglas:cache_tags on May 23, 2017

Member

dunglas commented Feb 20, 2017

| Q | A |
| --- | --- |
| Bug fix? | no |
| New feature? | yes |
| BC breaks? | no |
| Deprecations? | no |
| Tests pass? | yes |
| Fixed tickets | n/a |
| License | MIT |
| Doc PR | todo |

The usual quote:

There are only two hard things in Computer Science: cache invalidation and naming things.

-- Phil Karlton

Well, API Platform is an awesome name, so this PR tries to solve the other hard thing: caching, to make your API as fast as possible.

It introduces a builtin mechanism to always serve API responses from a cache (Varnish and CloudFlare are targeted) and to invalidate stale data in real time when a resource is updated, deleted or created.
With this mechanism, on my computer and using Docker, large and complex responses are served in ~15ms, compared to ~700ms without the cache.

The API Platform serializer has been tweaked to store the list of all resources included in a given response (the root document, but also embedded documents and documents appearing in lists).

The response is marked with all the resources it contains in the Cache-Tags HTTP header. An md5 hash of all included IRIs is generated to prevent collisions and reduce the header size.
Then, all API responses are stored in the proxy cache with a high expiration time. On all subsequent (read) requests from a client, the response is served from the proxy and the PHP application is not touched.
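As a rough illustration of the tagging step (a Python sketch, not the PR's PHP serializer), the snippet below walks a response document, collects the IRI of every resource it contains and joins them into one Cache-Tags value. The JSON-LD-like shape with "@id" keys and the names `collect_iris` and `cache_tags_header` are assumptions made for this example, and it emits plain IRIs rather than hashes.

```python
from typing import Any, List, Optional


def collect_iris(document: Any, found: Optional[List[str]] = None) -> List[str]:
    """Return every "@id" found in a nested document, in first-seen order."""
    if found is None:
        found = []
    if isinstance(document, dict):
        iri = document.get("@id")
        if isinstance(iri, str) and iri not in found:
            found.append(iri)
        for value in document.values():
            collect_iris(value, found)
    elif isinstance(document, list):
        for item in document:
            collect_iris(item, found)
    return found


def cache_tags_header(document: Any) -> str:
    """Join the collected IRIs into a comma-separated Cache-Tags value."""
    return ",".join(collect_iris(document))


# Hypothetical collection response: two items sharing one embedded author.
response_body = {
    "@id": "/books",
    "hydra:member": [
        {"@id": "/books/1", "author": {"@id": "/authors/7"}},
        {"@id": "/books/2", "author": {"@id": "/authors/7"}},
    ],
}

print(cache_tags_header(response_body))
```

The root document, the listed items and the embedded author each contribute one tag, so a later change to any of them can be mapped back to this response.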

When a resource is modified (only changes made to Doctrine entities are supported for now), all responses containing or referencing it are purged from the cache.
A class to purge Varnish is provided in this PR, and a class to purge CloudFlare (enterprise plans only) will be provided later. An interface makes it possible to add support for other cache providers.
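To make the idea of a pluggable purger concrete, here is a minimal Python sketch of such an abstraction. The `Purger` protocol and the `InMemoryPurger` class are hypothetical stand-ins, not the PR's PHP interface or its Varnish implementation; the toy cache simply remembers which tags each stored response carries.

```python
from typing import Dict, Iterable, Protocol, Set


class Purger(Protocol):
    """Hypothetical contract: drop every cached response tagged with one of the IRIs."""

    def purge(self, iris: Iterable[str]) -> None:
        ...


class InMemoryPurger:
    """Toy cache keyed by URL; each entry remembers the tags it was stored with."""

    def __init__(self) -> None:
        self.entries: Dict[str, Set[str]] = {}

    def store(self, url: str, tags: Iterable[str]) -> None:
        self.entries[url] = set(tags)

    def purge(self, iris: Iterable[str]) -> None:
        stale = set(iris)
        # Keep only the entries that share no tag with the purged IRIs.
        self.entries = {
            url: tags for url, tags in self.entries.items() if not tags & stale
        }


cache = InMemoryPurger()
cache.store("/books", ["/books", "/books/1", "/authors/7"])
cache.store("/books/1", ["/books/1", "/authors/7"])
cache.store("/authors/8", ["/authors/8"])

cache.purge(["/books/1"])
print(sorted(cache.entries))
```

A real implementation would send the invalidation request to the proxy instead of filtering a dictionary, but the contract (a list of IRIs in, stale responses out) stays the same.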

Paged collections are also handled: if a resource is added or removed, all collection pages are purged. If a resource is edited, pages (and all other API responses, including those embedding it as a nested document) are purged.
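One way to read the rule above is as a mapping from the kind of change to the set of tags to purge. The Python sketch below is an interpretation of that paragraph, not the PR's listener: the function name, the operation names and the choice to also purge the item's own tag on deletion are all assumptions.

```python
from typing import Set


def tags_to_purge(operation: str, item_iri: str, collection_iri: str) -> Set[str]:
    """Return the tags whose cached responses become stale after a change."""
    if operation == "create":
        # A new member changes every page of the collection.
        return {collection_iri}
    if operation == "delete":
        # Collection pages change, and responses embedding the item are stale too.
        return {collection_iri, item_iri}
    if operation == "update":
        # Every response tagged with the item (pages, embeds, the item itself).
        return {item_iri}
    raise ValueError(f"unknown operation: {operation}")


print(tags_to_purge("create", "/books/3", "/books"))
print(tags_to_purge("update", "/books/3", "/books"))
```

Because every collection page is tagged with the collection IRI and with the IRIs of the items it lists, purging by tag reaches all affected pages without tracking page numbers.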

Enabling this new feature doesn't require changing existing code. The following config enables the mechanism and makes your API instantly blazing fast:

api_platform:
    http_cache:
        enable_tags: true
        varnish_url: 'http://my-varnish-proxy'
        shared_max_age: 3600

I've also opened a PR to add Varnish with a compatible setup to the API Platform Docker setup: api-platform/api-platform#238.
Last but not least, this PR introduces some config options to set global cache settings. Example:

api_platform:
    http_cache:
        max_age: 60
        shared_max_age: 3600
        vary: ['Content-Type', 'Cookie']
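For readers unfamiliar with these options, the Python sketch below shows how such settings could translate into response headers. It is an assumption about the mapping (max_age to max-age, shared_max_age to public plus s-maxage, vary to the Vary header), and the function name `cache_headers` is made up for the example.

```python
from typing import Dict, Optional, Sequence


def cache_headers(
    max_age: Optional[int] = None,
    shared_max_age: Optional[int] = None,
    vary: Sequence[str] = (),
) -> Dict[str, str]:
    """Build Cache-Control and Vary headers from global cache settings."""
    directives = []
    if max_age is not None:
        directives.append(f"max-age={max_age}")
    if shared_max_age is not None:
        # s-maxage only applies to shared caches, so the response is public.
        directives += ["public", f"s-maxage={shared_max_age}"]
    headers: Dict[str, str] = {}
    if directives:
        headers["Cache-Control"] = ", ".join(directives)
    if vary:
        headers["Vary"] = ", ".join(vary)
    return headers


print(cache_headers(max_age=60, shared_max_age=3600, vary=["Content-Type", "Cookie"]))
```

Here max-age governs the client's own cache while s-maxage governs the shared proxy, which is why the two values can differ so much.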

Note: for advanced needs, prefer the awesome FosHttpCache library.

As stated in the quote, cache invalidation is a hard thing and this PR probably contains bugs and edge cases. Please test it and report any problem.

TODO:

  • Add unit tests

@dunglas dunglas referenced this pull request Feb 20, 2017

Merged

Add a Varnish container and enable cache invalidation #238

1 of 1 task complete

fbourigault commented Feb 20, 2017

This looks amazing, but is providing a solution coupled to a specific cache as a first-class citizen a good idea?

Maybe an intermediate solution built using standard HTTP headers would improve interoperability.

Contributor

bendavies commented Feb 20, 2017

Nice work @dunglas.

This is a very similar strategy to what I've implemented previously.
A few comments:

  1. Why not just use FosHttpCache instead of implementing your own tagging, purgers, etc.?
  2. It does not look like this will work with authenticated APIs that don't use cookies, i.e. just header auth or a query param (FosHttpCache has provisions for user-specific caching). I'm guessing this would be for the user to implement?
  3. Why do you md5 the tags?

Member

dunglas commented Feb 20, 2017

@fbourigault there is an abstraction layer and an implementation (Varnish). Basically, anyone can add support for any cache provider supporting cache invalidation (it's a matter of implementing an interface). I plan to add support for CloudFlare in core too.

There is no standard (HTTP headers) for cache invalidation (only expiration is supported in RFCs), but I chose to use Cache-Tags because it's the one used by CloudFlare.

@bendavies regarding FosHttpCache, for 2 reasons:

  • my implementation is a bit different (the md5 hash) and tied to the concepts of "resources" and "IRIs", which are not present in FosHttpCache
  • the implementation is trivial (and it's possible to bridge it with FosHttpCache if wanted), and not depending on a 3rd party library we don't maintain will ease our maintenance process (our soft dependency on FosUser is a pain to maintain...).

It can work with any authentication mechanism (that's why I've introduced a way to configure the Vary HTTP header). If the resource varies depending on the Authorization header, just set api_platform.http_cache.vary to ['Content-Type', 'Authorization'] (you'll need to tweak the Varnish config too). A key concept of REST is being stateless: a resource identified by an IRI (and some headers specified in Vary) should not vary depending on the currently logged-in user (in that case, the URL should not be the same).
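The effect of listing a header in Vary can be sketched as follows: a cache that honours Vary stores one variant per distinct value of the listed request headers. This Python snippet is a simplified model with a made-up `cache_key` helper; it does not reproduce how Varnish computes its hash, which is why the proxy configuration still needs tweaking as noted above.

```python
from typing import Dict, Sequence, Tuple


def cache_key(url: str, request_headers: Dict[str, str], vary: Sequence[str]) -> Tuple:
    """Build a cache key from the URL plus the request headers named in Vary."""
    # Header names are case-insensitive, so normalise them before the lookup.
    normalized = {name.lower(): value for name, value in request_headers.items()}
    variant = tuple((name.lower(), normalized.get(name.lower(), "")) for name in vary)
    return (url,) + variant


alice = {"Authorization": "Bearer alice-token", "Content-Type": "application/ld+json"}
bob = {"Authorization": "Bearer bob-token", "Content-Type": "application/ld+json"}

# With Vary: Content-Type only, both users share one cached variant.
print(cache_key("/books", alice, ["Content-Type"]) == cache_key("/books", bob, ["Content-Type"]))

# Adding Authorization gives each user a separate variant.
vary = ["Content-Type", "Authorization"]
print(cache_key("/books", alice, vary) == cache_key("/books", bob, vary))
```

The trade-off is the hit rate: every extra header in Vary multiplies the number of variants the proxy has to store for the same URL.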

3/ To avoid collisions: if a resource is tagged with /foos,/foos/1 and another with only /foos/2, sending a BAN request with /foos as parameter will ban both, and that's a bug. Using md5 hashes prevents this problem. A CRC32 hash should do the trick too, but md5 is usually faster on modern servers.
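The collision described in point 3 can be reproduced in a few lines. This Python sketch assumes the ban expression is applied as an unanchored regular expression against the whole tag header, which is a simplification of a proxy ban; the `banned` and `md5_tag` helpers are hypothetical.

```python
import hashlib
import re


def md5_tag(iri: str) -> str:
    """Hash one IRI so that no tag is a prefix of another."""
    return hashlib.md5(iri.encode("utf-8")).hexdigest()


def banned(ban_expression: str, tags_header: str) -> bool:
    """Naive ban: an unanchored regex search over the whole header value."""
    return re.search(ban_expression, tags_header) is not None


collection_tags = ["/foos", "/foos/1"]   # tags of the collection response
other_item_tags = ["/foos/2"]            # tags of an unrelated item response

# Plain IRIs: banning "/foos" also hits the response tagged only "/foos/2".
plain = [banned(re.escape("/foos"), ",".join(tags))
         for tags in (collection_tags, other_item_tags)]

# Hashed tags: only the response that really carries the "/foos" tag matches.
hashed = [banned(md5_tag("/foos"), ",".join(md5_tag(t) for t in tags))
          for tags in (collection_tags, other_item_tags)]

print(plain)
print(hashed)
```

Hashing removes the prefix relationship between tags, at the cost of headers that are harder to read when debugging.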

fbourigault commented Feb 20, 2017

There is no standard (HTTP headers) for cache invalidation (only expiration is supported in RFCs) but I choose to use Cache-Tags because it's the one used by CloudFlare.

By standard HTTP headers, I mean revalidation. But maybe in such a case, caching is not efficient.
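For comparison, revalidation with standard headers can be sketched like this: the origin sends an ETag, and a later conditional request carrying If-None-Match gets a bodiless 304 when nothing changed. The Python snippet is an illustrative model (the `etag_for` and `respond` helpers and the choice of SHA-256 are assumptions), not something taken from this PR.

```python
import hashlib
from typing import Dict, Optional, Tuple


def etag_for(body: bytes) -> str:
    """Derive a strong validator from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Answer 304 without a body when the client's validator still matches."""
    tag = etag_for(body)
    if if_none_match == tag:
        return 304, {"ETag": tag}, b""
    return 200, {"ETag": tag}, body


status, headers, _ = respond(b'{"name": "hello"}')
print(status)

# The client replays the validator; the body did not change.
status_again, _, payload = respond(b'{"name": "hello"}', headers["ETag"])
print(status_again, len(payload))
```

Revalidation saves bandwidth, but the origin still has to build the response to compute the validator on every request, which is the efficiency concern raised above.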

Member

teohhanhui commented Feb 20, 2017

Member

teohhanhui commented Feb 20, 2017

Member

dunglas commented Feb 20, 2017

I'll make the header configurable, but I'll keep CloudFlare compatibility by default; it's an invaluable feature. By the way, this header should not be exposed to the end client.

Regarding Vary, it's exactly what I explained in my post 🙂

Regarding the hit rate, more advanced things like hashing cannot be automated on the API Platform side: they require custom development, which is easy to implement using the new vary option.
Using FosHttpCache for such use cases is also an option; it's up to the developer.

Member

dunglas commented Feb 20, 2017

A regex (at least a simple one) doesn't fix the issue in my example: /foos will match /foos/*, not only the collection response.

Member

teohhanhui commented Feb 20, 2017

Member

dunglas commented Feb 20, 2017

But maybe a more complex regex can do the trick; I'll give it a try (I agree that plain tags are better than hashes for debugging).

Member

dunglas commented Feb 20, 2017

The comma must be handled too.

Member

dunglas commented Feb 20, 2017

Thank you for the reviews everyone!

  • Plain IRIs are now used instead of md5 hashes (and the regex has been updated accordingly)
  • Collections are purged even when a PUT is done
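The updated regex itself is not quoted in this conversation. As a hedged sketch of how plain IRIs can be matched safely, a pattern can anchor each tag on the start or end of the header or on the comma separator. The Python helper `ban_pattern` below is an assumption for illustration, not the expression used in the PR.

```python
import re


def ban_pattern(iri: str) -> "re.Pattern[str]":
    """Match the IRI only as a whole tag in a comma-separated header value."""
    return re.compile(r"(^|,)" + re.escape(iri) + r"($|,)")


collection_response = "/foos,/foos/1"
item_response = "/foos/2"

# "/foos" matches the collection response only, not "/foos/2".
print(bool(ban_pattern("/foos").search(collection_response)))
print(bool(ban_pattern("/foos").search(item_response)))

# "/foos/1" does not match a longer sibling such as "/foos/10".
print(bool(ban_pattern("/foos/1").search("/foos,/foos/10")))
```

Escaping the IRI matters as well, since characters such as "." or "?" would otherwise be interpreted by the regex engine.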
@Simperfit

Simperfit approved these changes Feb 22, 2017 edited

👍

@soyuka

Would be nice to add unit tests to all those listeners!

Contributor

jderusse commented Mar 22, 2017

Beware: this PR sends cache headers on 500 errors too.

curl test2_nginx_1.docker/foos -I
HTTP/1.1 500 Internal Server Error
Content-Type: application/ld+json; charset=utf-8
Cache-Control: public, s-maxage=3600
Link: <http://test2_nginx_1.docker/docs.jsonld>; rel="http://www.w3.org/ns/hydra/core#apiDocumentation"
Vary: Content-Type
Cache-Tags: /foos
Date: Wed, 22 Mar 2017 07:20:44 GMT
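A guard against this could look like the Python sketch below: cache headers are only attached to successful responses to safe methods. The function name and the "2xx only" rule are assumptions chosen for the example; the actual condition used in the PR may be broader or narrower.

```python
from typing import Dict, Iterable


def with_cache_headers(
    method: str,
    status: int,
    headers: Dict[str, str],
    tags: Iterable[str],
    shared_max_age: int = 3600,
) -> Dict[str, str]:
    """Return a copy of the headers, adding cache headers only when it is safe."""
    result = dict(headers)
    # Assumption: only successful GET/HEAD responses should reach the shared cache.
    if method not in ("GET", "HEAD") or not 200 <= status < 300:
        return result
    result["Cache-Control"] = f"public, s-maxage={shared_max_age}"
    result["Cache-Tags"] = ",".join(tags)
    return result


base = {"Content-Type": "application/ld+json; charset=utf-8"}

print(with_cache_headers("GET", 200, base, ["/foos"]))
print(with_cache_headers("GET", 500, base, ["/foos"]))
```

With such a guard, the 500 response shown above would be returned without Cache-Control and Cache-Tags, so the proxy would not keep serving the error for an hour.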

Member

dunglas commented Apr 10, 2017

I'll refactor this PR to get rid of the stateful service.

Member

dunglas commented May 2, 2017

About:

"Custom HTTP headers should be prepended by X-. CloudFlare is flaunting the spec (bad)."

This convention has been deprecated by RFC 6648. See https://developer.mozilla.org/en-US/docs/Setting_HTTP_request_headers and https://specs.openstack.org/openstack/api-wg/guidelines/headers.html. I'll remove all X- prefixes from this PR.

Member

dunglas commented May 15, 2017

All comments have been handled. Can you do a final review?

!$request->isMethodCacheable()
|| !$response->isCacheable()
|| (!$attributes = RequestAttributesExtractor::extractAttributes($request))
|| !$resources = $request->attributes->get('_resources')

@soyuka

soyuka May 15, 2017

Member

No parentheses here, but there are on the line above?

@dunglas

dunglas May 16, 2017

Member

The parentheses on the previous line are mandatory because of operator precedence.

@@ -94,7 +95,7 @@ public function testNormalize()
  );
  $normalizer->setSerializer($serializerProphecy->reveal());
- $this->assertEquals(['name' => 'hello'], $normalizer->normalize($dummy));
+ $this->assertEquals(['name' => 'hello'], $normalizer->normalize($dummy, null, ['resources' => []]));

@soyuka

soyuka May 15, 2017

Member

Shouldn't we avoid changing those tests? It feels like the tests (this one and the others) won't pass without the 2 extra arguments (e.g. null, ['resources' => []]). Is that the case? Isn't this breaking things somehow?

@dunglas

dunglas May 16, 2017

Member

It passes without the arguments and without change, but I've modified it to test the new behavior.

@soyuka

soyuka May 19, 2017

Member

Okay, as long as there is a test that doesn't add those arguments.

Member

Simperfit commented May 19, 2017

I'm going to test this with a project; I'll report back if there are any bugs.

EmiiKhaos commented May 23, 2017

May you use http://httplug.io/ instead of guzzle directly?

Member

teohhanhui commented May 23, 2017

May you use http://httplug.io/ instead of guzzle directly?

It has been raised before, and I'd like to once again echo this.

Member

dunglas commented May 23, 2017

HTTPlug introduces a lot of complexity (including an extra bundle to configure... until we get Flex) for no gain here. I'm still 👎 for now.
I understand the problem with Guzzle 3, but API Platform is younger than Guzzle 3, every bundle I use is compatible with Guzzle 6, and AFAIK there is no plan to develop Guzzle 7 soon.

Member

dunglas commented May 23, 2017

And by the way, Guzzle is a soft dependency here. So anyone wanting to use another client can do so: they just have to implement the PurgerInterface themselves, which is not a big deal.

Member

theofidry commented May 23, 2017

As long as Guzzle is a soft dependency it's ok

@dunglas dunglas merged commit 5a81357 into api-platform:master May 23, 2017

4 checks passed

SensioLabsInsight: Code quality OK.
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
coverage/coveralls: First build on master at 97.032%

@dunglas dunglas deleted the dunglas:cache_tags branch May 23, 2017

hoangnd25 pushed a commit to hoangnd25/core that referenced this pull request Feb 23, 2018

Merge pull request #952 from dunglas/cache_tags
Builtin cache invalidation system aka make API Platform fast as hell