Expose a way for users to clear cache keys #10494
Comments
From what I know, cache keys are not bound to flows or deployments. Additionally, there is no API to check whether a cache key is still valid (i.e. not expired).
The problem with the current cache behavior is that cache keys are not unique. They are essentially just tags on the task results: there can be multiple results with the same key and different TTLs, or no TTL at all. BTW, this could be better documented and explained.
@ymtricks in a similar way we ended up using this to mitigate our caching issues:

    from prefect.context import FlowRunContext
    from prefect.tasks import task_input_hash

    def _cache_key_fn(context, parameters):
        # Scope the key to the task, flow, flow version, deployment, and inputs
        flow_run = FlowRunContext.get().flow_run
        cache_key = (
            f"{context.task.name}-{flow_run.flow_id}-{flow_run.flow_version}-"
            f"{flow_run.deployment_id}-{task_input_hash(context, parameters)}"
        )
        return cache_key
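A related workaround, sketched here in plain Python rather than against Prefect itself, is to mix a manually bumped salt into the key so that changing the salt makes every previously stored key stop matching. The names `salted_cache_key`, `_stable_input_hash`, and the `CACHE_EPOCH` environment variable are hypothetical and exist only for this illustration; the hash is a simple stand-in, not `task_input_hash`. Old results are not deleted by this approach, they are just never looked up again.

```python
import hashlib
import json
import os


def _stable_input_hash(parameters):
    """Stand-in input hash: SHA-256 of the JSON-encoded parameters."""
    payload = json.dumps(parameters, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def salted_cache_key(task_name, parameters, epoch=None):
    """Build a cache key that changes whenever the epoch value changes."""
    if epoch is None:
        # CACHE_EPOCH is a hypothetical variable for this sketch, not a Prefect setting.
        epoch = os.environ.get("CACHE_EPOCH", "0")
    return f"{task_name}-{epoch}-{_stable_input_hash(parameters)}"


# Same task and inputs, different epoch: the keys no longer collide.
key_v0 = salted_cache_key("load_data", {"day": "2023-08-01"}, epoch="0")
key_v1 = salted_cache_key("load_data", {"day": "2023-08-01"}, epoch="1")
```

In a real task this logic would presumably live inside a function with the `(context, parameters)` signature used in the snippet above, with the salt appended to that key.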
It would be extremely helpful to have some functionality that also clears the remote storage for cleared and expired cache keys. We are piling up significant amounts of cache data in an S3 bucket, and there's no way to delete it without risking running into the issue outlined in #8892.
I would like to see this implemented (and preferably a way to clear individual task keys also).
+1, cache management is very difficult in prefect and it makes using caching basically impossible. And, worst of all, you don't realize the scale of the limitations until you're heavily using it.
Adding to this, something I've noticed, which is a bit confusing in the prefect cloud case, is that the task cache seems to be bifurcated between two places
One might think that, in order to clear the cache, one could delete the cache data in the cache storage location (say, s3, for example). But if you do that, prefect will
Since we can't control the database, one simple solution would be to change the behavior (or enable alternative behavior) such that if there's no data in the specified cache location, it invalidates the cache. This would align more closely with an actual cache, such that it would be a "cache miss". Since the cache has this bifurcation, giving the user control over a cache miss would be helpful.
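The behavior proposed in this comment can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how Prefect works today: the `metadata` dict stands in for the database records, a local file stands in for the storage location, and `lookup_cached_result` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def lookup_cached_result(metadata, cache_key):
    """Return the cached bytes, or None (a cache miss) if the stored data is gone."""
    path = metadata.get(cache_key)
    if path is None:
        return None  # no record of this key at all
    if not Path(path).exists():
        # The record exists but the data was deleted: drop the stale record
        # and report a miss so the caller recomputes the result.
        del metadata[cache_key]
        return None
    return Path(path).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    result_file = Path(tmp) / "result.bin"
    result_file.write_bytes(b"42")
    metadata = {"my-key": str(result_file)}

    first = lookup_cached_result(metadata, "my-key")   # data present: cache hit
    result_file.unlink()                               # user deletes the stored data
    second = lookup_cached_result(metadata, "my-key")  # data missing: cache miss
```

With this design, deleting the stored data is enough to force a recompute, which gives users control over invalidation even when they cannot touch the database.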
First check
Prefect Version
2.x
Describe the current behavior
It is often the case that when users first use Prefect's task caching feature, they do not set a cache expiration. This means that there may be unwanted cached results for all future runs of a flow. Locally this can be overcome by resetting the database (prefect server database reset), but this also destroys any other metadata and is not an option with Cloud. As far as I can tell, there is not a method for clearing cache keys via the API.
Describe the proposed behavior
It would be super useful to be able to clear cache keys via the API, CLI, UI, or ideally all three. In most practical scenarios, keys need to be cleared on the flow or deployment level, not necessarily the individual key level. With that being the case, ideally cache keys could be cleared based on a flow name or a flow and deployment name combo.
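To make the requested semantics concrete, here is a minimal in-memory model of clearing keys by flow name, optionally narrowed to a deployment. Everything in it (`CacheRecord`, `clear_cache_keys`, the field names) is hypothetical and invented for this sketch; it is not an existing or planned Prefect API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CacheRecord:
    """Hypothetical record tying a cache key to the flow and deployment that wrote it."""
    flow_name: str
    deployment_name: Optional[str]
    cache_key: str


def clear_cache_keys(
    records: List[CacheRecord],
    flow_name: str,
    deployment_name: Optional[str] = None,
) -> List[CacheRecord]:
    """Return the records that survive clearing for a flow (and optional deployment)."""

    def matches(record: CacheRecord) -> bool:
        if record.flow_name != flow_name:
            return False
        # With no deployment given, every key for the flow is cleared.
        return deployment_name is None or record.deployment_name == deployment_name

    return [r for r in records if not matches(r)]


records = [
    CacheRecord("etl", "prod", "key-a"),
    CacheRecord("etl", "staging", "key-b"),
    CacheRecord("report", "prod", "key-c"),
]
after_deployment_clear = clear_cache_keys(records, "etl", "prod")
after_flow_clear = clear_cache_keys(records, "etl")
```

Note that this model assumes cache keys are recorded together with their flow and deployment, which the comments in this thread suggest is not currently the case.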
Example Use
As an illustration of the above:
Additional context
No response