Expose a way for users to clear cache keys #10494

Open
3 tasks done
EmilRex opened this issue Aug 24, 2023 · 7 comments
Labels
enhancement An improvement of an existing feature from:sales Submitted by a sales engineer needs:design Blocked by a need for an implementation outline

Comments

@EmilRex
Contributor

EmilRex commented Aug 24, 2023

First check

  • I added a descriptive title to this issue.
  • I used the GitHub search to find a similar request and didn't find it.
  • I searched the Prefect documentation for this feature.

Prefect Version

2.x

Describe the current behavior

It is often the case that when users first use Prefect's task caching feature, they do not set a cache expiration. This means that there may be unwanted cached results for all future runs of a flow. Locally this can be overcome by resetting the database (prefect server database reset), but this also destroys any other metadata and is not an option with Cloud. As far as I can tell, there is not a method for clearing cache keys via the API.

Describe the proposed behavior

It would be super useful to be able to clear cache keys via the API, CLI, UI, or ideally all three. In most practical scenarios, keys need to be cleared on the flow or deployment level, not necessarily the individual key level. With that being the case, ideally cache keys could be cleared based on a flow name or a flow and deployment name combo.

Example Use

As an illustration of the above:

prefect flow clear-cache --name "my-flow"

prefect deployment clear-cache --name "my-flow/my-deployment"

Additional context

No response

@EmilRex EmilRex added enhancement An improvement of an existing feature needs:triage Needs feedback from the Prefect product team labels Aug 24, 2023
@billpalombi billpalombi added needs:design Blocked by a need for an implementation outline and removed needs:triage Needs feedback from the Prefect product team labels Aug 24, 2023
@OptimeeringBigya

From what I know, cache keys are not bound to flows or deployments.

Additionally, there is no API to check whether a cache key is still valid (i.e. not expired).

@ymtricks

ymtricks commented Sep 11, 2023

The problem with the current cache behavior is that cache keys are not unique. They are essentially just tags on the task results: there can be multiple results with the same key and different TTLs, or no TTL at all. BTW this could be better documented and explained.
Because of this design, there's no way to evict a cache entry just by running a task, and that's why at least some complementary mechanism is required. The way we are working around the issue is by appending a "cache version" to each key, so whenever we want to evict the old cache we just bump the version.
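
A minimal sketch of the "cache version" workaround described above, not anyone's actual implementation. The names `CACHE_VERSION`, `hash_parameters`, and `versioned` are hypothetical, and `hash_parameters` is only a stand-in for an input-hashing key function such as Prefect's `task_input_hash`, so that the snippet runs without Prefect installed.

```python
import hashlib
import json
from typing import Any, Callable, Dict

# Hypothetical constant: bump it to stop matching results cached under the old version.
CACHE_VERSION = "v2"

KeyFn = Callable[[Any, Dict[str, Any]], str]


def hash_parameters(context: Any, parameters: Dict[str, Any]) -> str:
    """Stand-in for an input-hashing key function (assumption, not Prefect code)."""
    payload = json.dumps(parameters, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def versioned(base_key_fn: KeyFn, version: str) -> KeyFn:
    """Wrap a key function so every key it returns is prefixed with a version."""

    def key_fn(context: Any, parameters: Dict[str, Any]) -> str:
        return f"{version}-{base_key_fn(context, parameters)}"

    return key_fn


# Same (context, parameters) signature as the wrapped key function.
cache_key_fn = versioned(hash_parameters, CACHE_VERSION)
```

In Prefect 2.x a function with this `(context, parameters)` signature can be passed as `cache_key_fn` to `@task`. Bumping the version only stops old results from being matched; it does not delete them from the database or from result storage.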

@mgsnuno

mgsnuno commented Oct 11, 2023

@ymtricks in a similar way we ended up using this to mitigate our caching issues:

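from prefect.context import FlowRunContext
from prefect.tasks import task_input_hash


def _cache_key_fn(context, parameters):
    """Scope the cache key to the task, flow, flow version, and deployment."""
    # Assumes the task is called from inside a flow run; FlowRunContext.get()
    # returns None otherwise.
    flow_run = FlowRunContext.get().flow_run
    # Changing any component (for example the flow version) yields a new key,
    # so results cached under the old key are no longer matched.
    return (
        f"{context.task.name}-{flow_run.flow_id}-{flow_run.flow_version}-"
        f"{flow_run.deployment_id}-{task_input_hash(context, parameters)}"
    )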

@j-tr
Contributor

j-tr commented Nov 1, 2023

It would be extremely helpful to have some functionality that also clears the remote storage for cleared and expired cache keys. We are piling up significant amounts of cache data in an S3 bucket, and there's no way to delete it without risking running into the issue outlined in #8892.

@limx0

limx0 commented Nov 8, 2023

I would like to see this implemented (and preferably a way to clear individual task keys also).

@N-Demir
Contributor

N-Demir commented Jan 13, 2024

+1, cache management is very difficult in prefect and it makes using caching basically impossible. And, worst of all, you don't realize the scale of the limitations until you're heavily using it.

@taylor-curran taylor-curran added the from:sales Submitted by a sales engineer label Jan 15, 2024
@Ben-Epstein

Adding to this, something I've noticed, which is a bit confusing in the Prefect Cloud case, is that the task cache seems to be bifurcated between two places:

  1. the prefect cloud database (which we have no control over)
  2. our specified cache storage location (we control)

One might think that, in order to clear the cache, one could delete the cache data in the cache storage location (say, S3, for example). But if you do that, Prefect will:

  1. check the cache key in the prefect cloud db
  2. see a cache hit, check s3
  3. no data found in s3
  4. raise exception

Since we can't control the database, one simple solution would be to change the behavior (or enable alternative behavior) such that if there's no data in the specified cache location, it invalidates the cache. This would align more closely with an actual cache, such that it would be a "cache miss". Since the cache has this bifurcation, giving the user control over a cache miss would be helpful.
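
The proposed behavior can be sketched roughly as follows. This is an illustration only, not Prefect's implementation: `key_index`, `result_store`, `put`, and `get_or_compute` are hypothetical in-memory stand-ins for the database side and the storage side of the cache.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for the two places the cache lives in.
key_index: Dict[str, str] = {}       # cache key -> storage path (database side)
result_store: Dict[str, bytes] = {}  # storage path -> payload (storage side, e.g. S3)


def put(cache_key: str, payload: bytes) -> None:
    """Record the payload in storage and the key in the index."""
    path = f"results/{cache_key}"
    result_store[path] = payload
    key_index[cache_key] = path


def get_or_compute(cache_key: str, compute: Callable[[], bytes]) -> bytes:
    """Return the cached payload, treating missing stored data as a cache miss."""
    path = key_index.get(cache_key)
    if path is not None:
        payload = result_store.get(path)
        if payload is not None:
            return payload  # genuine hit: key and data both present
        # The key is known but its data is gone: invalidate the key and
        # fall through to recompute instead of raising an exception.
        del key_index[cache_key]
    payload = compute()
    put(cache_key, payload)
    return payload
```

With this shape, deleting data from the storage location is enough to force a recompute on the next run, which is the control over cache misses asked for above.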

Projects
None yet
Development

No branches or pull requests

10 participants