Support disabling checkpointing at the flow level #4201

Closed
marvin-robot opened this issue Mar 3, 2021 · 2 comments
Labels
status:stale This may not be relevant anymore

Comments

@marvin-robot (Member)

Opened from the Prefect Public Slack Community

david.harrington: Hi, we are running flows via a k8s agent and we are trying to disable checkpointing. We are using a k8s YAML template that sets the environment variable PREFECT__FLOWS__CHECKPOINTING: false for the flow. Based on the logs/errors we are seeing, the flow is still trying to serialize and write results. Two questions:

  1. If we disable checkpointing at the flow level, should Prefect Core not attempt to serialize the result?
  2. What is the appropriate way to disable checkpointing at the flow level?
    The relevant traceback, from a run where the env var was set to false, is included as a comment.
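
A minimal sketch of the setup described above, assuming a Prefect 0.14-style KubernetesRun run config; the flow and task names are illustrative, and the environment variable is the one quoted in the report (the same variable could equally be set in the agent's k8s job template, as the reporter did):

# Illustrative only: inject the checkpointing override through the run config's
# env mapping instead of a custom k8s job template.
from prefect import Flow, task
from prefect.run_configs import KubernetesRun

@task
def say_hello():
    return "hello"

with Flow("checkpointing-demo") as flow:
    say_hello()

# Same environment variable the reporter set in their k8s YAML template.
flow.run_config = KubernetesRun(env={"PREFECT__FLOWS__CHECKPOINTING": "false"})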

david.harrington:

[2021-03-03 17:53:38+0000] ERROR - prefect.CloudTaskRunner | Unexpected error: TypeError("can't pickle _thread.lock objects")
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/prefect/engine/runner.py", line 48, in inner
    new_state = method(self, state, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/prefect/engine/task_runner.py", line 891, in get_task_run_state
    result = self.result.write(value, **formatting_kwargs)
  File "/usr/local/lib/python3.7/dist-packages/prefect/engine/results/s3_result.py", line 100, in write
    binary_data = new.serializer.serialize(new.value)

jim: Hmmm, we currently hardcode that to on, but I'm not sure why.

jim: I assume you didn't set the S3Result yourself, and it's instead set automatically from S3 storage?

david.harrington: Yeah exactly

david.harrington: We got around this by setting each task to checkpoint=False
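
A minimal sketch of that per-task workaround, using Prefect 0.x's checkpoint flag on the task decorator (the task names here are illustrative):

from prefect import Flow, task

# Disable result writing task by task via the checkpoint flag.
@task(checkpoint=False)
def extract():
    return [1, 2, 3]

@task(checkpoint=False)
def load(data):
    print(f"loaded {len(data)} rows")

with Flow("per-task-no-checkpointing") as flow:
    load(extract())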

jim: We should be able to support this though. I'll raise an issue.

david.harrington: Awesome, thanks!

jim: <@ULVA73B9P> open "Support disabling checkpointing at the flow level"

Original thread can be found here.

@jcrist commented Mar 3, 2021

This is currently hardcoded in the task runner (context.update(checkpointing=True)), which makes it tricky to change. I think the original intent here was that checkpointing was off by default for local flow runs, and on otherwise.

We could move this setting to the agent (like we do for cloud logs/cloud secrets), which makes it easier to override, since user-provided environment variables/context take precedence. The downside is that this would break things for users running a pre-change agent with a flow running a post-change version of Prefect, since checkpointing=True would no longer be set.
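
A rough illustration of that precedence point, assuming Prefect 0.x config loading, where environment variables of the form PREFECT__<SECTION>__<KEY> override values from config.toml:

import os

# Config is read when prefect is first imported, so set the override first.
os.environ["PREFECT__FLOWS__CHECKPOINTING"] = "false"

import prefect

# The config value reflects the user-supplied override; the problem discussed
# above is that the task runner's hardcoded context.update(checkpointing=True)
# later ignores it at run time.
print(prefect.config.flows.checkpointing)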

@jcrist commented Mar 3, 2021

OK, after some internal discussion, we're going to solve this by deprecating the old checkpointing config field and renaming it to:

[results]
# Whether result storage is enabled
enabled = false

In the new model this will be set via the agent (so it's simpler to override), which lets us deprecate the old field without breaking existing users.

@cicdw cicdw added the status:stale This may not be relevant anymore label Aug 31, 2022
@cicdw cicdw closed this as completed Aug 31, 2022