Varnish cache tags with a big number of entities #3168

Closed
bastnic opened this issue Oct 11, 2019 · 9 comments

Comments

@bastnic
Contributor

bastnic commented Oct 11, 2019

On a biiig API Platform website with lots and lots of nested entities, we need Varnish to be fully working.

I talked with @alanpoulain last night and we agreed that my setup is unusual, but the experience is worth sharing as it can help some people.
I have already fixed all these issues on my side, but it would be cool if we could fix them in the official release.

There are multiple problems:

  • too many entities means too many IRIs; IRIs are quite long, so with more than a hundred of them the response crashes because the Cache-Tags header is too long. More on that later.
  • too many updated entities means too many IRIs to purge in VarnishPurger; it crashes and does not purge anything.
  • we also had "Generate iri for child related resources" (#2905), but it's fixed now.

Too many cache tags to add to the response

On the biggest response I found, the Cache-Tags header weighs 100k chars. It's quite long.
I tried multiple approaches:

  • increase (a lot) the Varnish header limits (hi http_resp_hdr_len, http_resp_size): it doesn't scale as much as I want
  • chunk the header into several lines, as HTTP allows (hi http_max_hdr): it seems to work at first, but when the purge happens, it only uses the cache tags of the first header line. Maybe a bug on my side
  • on a collection, just strip all the tags of the items of that collection, since when we do an operation on a resource, the IRI of the collection is also given to the purge. It reduces the header a LOT!
  • BUT it only removes the IRIs of the resource itself. My entities are quite nested, so I also have just as many IRIs from the various subresources. So many that I can almost say each of them is a collection itself.

Brace yourself, awful code:

// in AddTagsListener
$possibleCollections = [];

// first, extract the prefix of every IRI to check whether collections emerge
// (the pattern expects at least two dashes after a slash, e.g. a UUID)
foreach ($resources as $resource) {
    if (preg_match('#(.*)/.*-.*-.*.*$#', $resource, $matches)) {
        if (!isset($possibleCollections[$matches[1]])) {
            $possibleCollections[$matches[1]] = 0;
        }
        ++$possibleCollections[$matches[1]];
    }
}

// keep only the prefixes that behave like collections (more than XX IRIs of the same type)
$possibleCollections = array_filter($possibleCollections, function ($count) {
    return $count > self::MAX_IRIS_TO_BE_CONSIDERED_AS_COLLECTION; // magic number
});

// then remove the IRIs that belong to one of those collections
$resources = array_filter($resources, function ($item) use ($possibleCollections) {
    foreach ($possibleCollections as $collection => $count) {
        if (strpos($item, $collection) === 0) {
            return false;
        }
    }

    return true;
});

// finally, add the collection IRI itself as a tag
foreach ($possibleCollections as $collection => $count) {
    $resources[$collection] = $collection;
}

With this patch, I end up with only 4-5 collection tags on some big lists, and I'm OK with that.
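To make the effect of the patch easier to see, here is a minimal standalone sketch of the same collapsing logic, outside of any listener. The function name collapseTags, the threshold value of 3 and the sample IRIs are made up for illustration; only the regular expression and the overall steps come from the snippet above.

```php
<?php

declare(strict_types=1);

// Hypothetical threshold: the value of the original constant is not given in the post.
const MAX_IRIS_TO_BE_CONSIDERED_AS_COLLECTION = 3;

/**
 * Replaces groups of item IRIs that share a prefix with a single collection tag.
 *
 * @param array<string, string> $resources IRIs keyed by themselves
 *
 * @return array<string, string>
 */
function collapseTags(array $resources, int $max = MAX_IRIS_TO_BE_CONSIDERED_AS_COLLECTION): array
{
    // Count how many IRIs share each prefix. Same pattern as the patch:
    // the part after a slash must contain at least two dashes (e.g. a UUID).
    $counts = [];
    foreach ($resources as $iri) {
        if (preg_match('#(.*)/.*-.*-.*.*$#', $iri, $matches)) {
            $counts[$matches[1]] = ($counts[$matches[1]] ?? 0) + 1;
        }
    }

    // Prefixes seen more than $max times are treated as collections.
    $collections = array_keys(array_filter($counts, static fn (int $count): bool => $count > $max));

    // Drop every IRI that starts with one of those prefixes.
    $resources = array_filter($resources, static function (string $iri) use ($collections): bool {
        foreach ($collections as $collection) {
            if (strpos($iri, (string) $collection) === 0) {
                return false;
            }
        }

        return true;
    });

    // Tag the response with the collection IRI instead.
    foreach ($collections as $collection) {
        $resources[(string) $collection] = (string) $collection;
    }

    return $resources;
}

// Usage with made-up IRIs: four books collapse into "/books", the author stays.
$iris = [
    '/books/1b9d6bcd-bbfd-4b2d-9b5d-ab8dfbbd4bed',
    '/books/2c1e7a10-0f3a-4c55-8a7e-1d2f3a4b5c6d',
    '/books/3d2f8b21-1a4b-4d66-9b8f-2e3a4b5c6d7e',
    '/books/4e3a9c32-2b5c-4e77-8c9a-3f4b5c6d7e8f',
    '/authors/7',
];
print_r(array_values(collapseTags(array_combine($iris, $iris))));
// prints an array containing "/authors/7" and "/books"
```

The trade-off is the one described above: any write to a single book now invalidates every response tagged with /books, so the cache is purged more often than strictly necessary but never serves stale data.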

One issue I may have is that I'm using ORMBehaviors\Translatable: when I update only a translation and nothing on the entity, the entity itself is not updated (only the translation is), and the translation is not an API Platform resource. In PurgeHttpCacheListener, the main entity is then seen as a "relationTag", so the collection IRI is not added to the list of IRIs to purge. I will fix that with another Doctrine listener that updates the "updated at" field of the main entity, so that the entity is seen as updated.

Too many cache tags to purge

When I insert a lot of entities at once, the purge cannot work as the regexp is way too long.

Multiple approaches here too:

  • chunk the IRIs, and make XX BAN calls
  • if there are more than XX (magic number) IRIs, consider it a big flush and wipe everything. That's the approach I currently use (mainly because I'm lazy)

The first approach can add some safety and fix #1856.
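As a rough illustration of the first approach, the sketch below splits a list of IRIs into several ban regexes so that none exceeds a length limit. The function names, the 7500-byte default and the exact shape of the regex are assumptions made for this example, not the actual VarnishPurger code; sending one BAN request per returned regex is left out.

```php
<?php

declare(strict_types=1);

/**
 * Wraps already-quoted IRIs in a regex matching any of them as a whole tag
 * inside a comma-separated list (assumed shape, for illustration only).
 *
 * @param string[] $quotedIris
 */
function wrapBatch(array $quotedIris): string
{
    return '(^|,)(' . implode('|', $quotedIris) . ')($|,)';
}

/**
 * Splits IRIs into ban regexes of at most $maxLength bytes each.
 * An IRI that is too long on its own still gets a regex of its own.
 *
 * @param string[] $iris
 *
 * @return string[] one regex per BAN request
 */
function buildBanRegexes(array $iris, int $maxLength = 7500): array
{
    $regexes = [];
    $batch = [];

    foreach ($iris as $iri) {
        $candidate = [...$batch, preg_quote($iri)];

        // Adding this IRI would overflow the limit: close the current batch first.
        if ($batch !== [] && strlen(wrapBatch($candidate)) > $maxLength) {
            $regexes[] = wrapBatch($batch);
            $candidate = [preg_quote($iri)];
        }

        $batch = $candidate;
    }

    if ($batch !== []) {
        $regexes[] = wrapBatch($batch);
    }

    return $regexes;
}

// Usage: with a deliberately tiny limit, three IRIs end up in two regexes.
print_r(buildBanRegexes(['/books/1', '/books/2', '/books/3'], 29));
```

Rebuilding the candidate regex on every iteration is wasteful for very large lists, but it keeps the length check obviously correct; a running byte counter would do the same job faster.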

WDYT?

cc @teohhanhui

@bastnic
Contributor Author

bastnic commented Nov 22, 2019

Follow up of a discussion at SymfonyCon:

  • if the cache is not exact, there is a higher chance that it will be disabled altogether, which means falling back to raw PHP performance, and APIP is not crazy about that.
  • if the cache is flushed a little too much but is exact, it's not optimal, but at least we can check the hit ratio on Varnish and maybe analyse later which resources should not be embedded in another one, to avoid flushing lists we don't want to flush.

I'm all in favor of a strategy to reduce cache tags sent to Varnish, to be able to work with default nginx / varnish config.

@Wirone

Wirone commented Jan 14, 2020

We're currently facing this issue with HAProxy rejecting our Nginx response (with a bumped buffer size) because of the size of the Cache-Tags header. I've tried many different HAProxy setups but none worked. It has made part of our production service unusable since the new version was deployed... :(

@AltumSonatur

AltumSonatur commented Mar 31, 2020

I ran into the same need to "disable" Varnish for a specific collection, because of the huge number of IRIs retrieved, not to mention the IRIs of the entities nested in each of them. The Cache-Tags header was so long that the request was crashing (error 500).

The built-in features of API Platform apparently do not provide an option to disable it for a particular collection. The fix proposed by @bastnic was a bit complicated for my needs, but it inspired me to create a very simple EventSubscriber.

I thought it would be interesting to share it in case someone encounters the same issue. You just have to copy-paste the code into a new file src/EventSubscriber/CacheTagsSubscriber.php, then replace your_collection_name with the path of the collection you need to disable, and it works:

<?php

namespace App\EventSubscriber;

use Symfony\Component\EventDispatcher\EventSubscriberInterface;
use Symfony\Component\HttpKernel\Event\ResponseEvent;
use Symfony\Component\HttpKernel\KernelEvents;

final class CacheTagsSubscriber implements EventSubscriberInterface
{
    public static function getSubscribedEvents(): array
    {
        return [
            KernelEvents::RESPONSE => ['removeCacheTags', -2],
        ];
    }

    public function removeCacheTags(ResponseEvent $event): void
    {
        $request = $event->getRequest();
        $response = $event->getResponse();

        // Drop the Cache-Tags header for this one collection only
        if ($request->getPathInfo() === '/your_collection_name') {
            $response->headers->remove('Cache-Tags');
        }
    }
}

@bastnic
Contributor Author

bastnic commented Apr 6, 2020

Thanks @arthurpietruch for your feedback!

I agree this is too complicated, but it maintains the "exact" nature of the cache while still being somewhat cached (it's purged too much but I prefer this ratio for now).

I'm waiting (searching) for a better fix ;).

@lwillems

Hi @bastnic

Did you find a cleaner way to handle this?
I've got the exact same issue with a huge API and lots of relationships.

For example, calling my categories endpoint leads to an 8K Cache-Tags header and a 500 error.
Screenshot from 2022-01-31 15-37-42

Regards

@bastnic
Contributor Author

bastnic commented Jan 31, 2022

Hi @lwillems, it's on my todo list, right at the bottom :p. It works for now, and looking at your cardinality, it should work for you too.

@tjveldhuizen

I'm facing issues with a large Cache-Tags header too, at the moment. In my case, the main cause is the length of my IRIs: using slugs makes them really long. I think a solution might be an option to configure custom cache tags, as an alternative to the IRI of an API resource. If that were possible, I could shorten my cache tag from /api/news/this-is-an-article-with-a-rather-long-title to /api/news/123. Of course, situations with thousands of tags would still be problematic, but it might already help in a lot of cases.

@soyuka
Member

soyuka commented Oct 17, 2023

we're working on this at #5758

@soyuka soyuka closed this as completed Oct 17, 2023
@vasilvestre

we're working on this at #5758

How does this PR reduce the size of the Cache-Tags header? I'm not sure what the collector is doing.
