Implement a "Skip" or "Do not run" option for a list of Tasks Instances #262

Closed
r39132 opened this issue Aug 14, 2015 · 4 comments

Comments

@r39132
Contributor

r39132 commented Aug 14, 2015

Currently, we can mark a DAG instance (a.k.a. run) as successful -- see the image below. This keeps the scheduler from running it and simultaneously conveys that the run was successful. Sometimes, however, a run is not successful but rerunning it does not make sense. Sure, we could let the run eventually error out, but what if 30 days of runs have this problem and retries are set to 3? We don't want to wait for all of them to error out, generating error notification emails and needlessly performing runs that will eventually fail.

I would like a "Do Not Run" or "Skip" button with the same options as "clear" -- do not run past/future/upstream/downstream.

This will allow me to skip broken downstream parts of DAGs for a date range.
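To make the request concrete, here is a minimal sketch of how the past/future/upstream/downstream options could select the task instances to mark as skipped. It is plain Python, not Airflow code: the toy DAG and the names `select_to_skip`, `reachable`, and `runnable` are hypothetical and exist only for illustration.

```python
from datetime import date

# Hypothetical toy DAG: each task maps to its direct downstream tasks.
DOWNSTREAM = {"extract": ["transform"], "transform": ["load"], "load": []}
# Reverse mapping: each task maps to its direct upstream tasks.
UPSTREAM = {t: [u for u, ds in DOWNSTREAM.items() if t in ds] for t in DOWNSTREAM}


def reachable(task, edges):
    """Return every task reachable from `task` by following `edges`."""
    seen, stack = set(), [task]
    while stack:
        for nxt in edges[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def select_to_skip(task, day, run_dates, past=False, future=False,
                   upstream=False, downstream=False):
    """Return the (task, date) pairs that a "Skip" action would cover."""
    tasks = {task}
    if upstream:
        tasks |= reachable(task, UPSTREAM)
    if downstream:
        tasks |= reachable(task, DOWNSTREAM)
    # The selected day is always included; past/future widen the range.
    dates = [d for d in run_dates
             if d == day or (past and d < day) or (future and d > day)]
    return {(t, d) for t in tasks for d in dates}


def runnable(candidates, skipped):
    """What a scheduler or backfill would still attempt: anything not skipped."""
    return [c for c in candidates if c not in skipped]
```

For example, `select_to_skip("transform", date(2015, 8, 13), run_dates, future=True, downstream=True)` would cover `transform` and `load` from that date onward and leave `extract` untouched, which is the "skip broken downstream parts for a date range" case.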

This also gives ops an important option for managing the execution of DAGs, instead of relying on the developer who wrote the DAG.

[Image: screenshot 2015-08-14 14 07 46]

@thibault-ketterer
Contributor

Won't it be solved by "only_run_latest" (#59)?

@r39132
Contributor Author

r39132 commented Mar 16, 2016

@thibault-ketterer It won't. The goals are different.

Only_run_latest will be used for a job like a daily db snapshot job. Typically, this type of job needs to run once a day. If it doesn't run for 3 days because the DAG was paused, when the DAG is unpaused, the job should not run 3 times. It should only run for the latest time interval.
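As a rough illustration of that behavior, the sketch below computes which daily intervals would be scheduled after a pause. It is plain Python rather than Airflow code; `missed_runs`, `runs_to_schedule`, and the `only_run_latest` flag are hypothetical names borrowed from the proposal, not an existing API.

```python
from datetime import date, timedelta


def missed_runs(last_run, today):
    """Daily intervals after `last_run`, up to and including `today`."""
    days = (today - last_run).days
    return [last_run + timedelta(days=i) for i in range(1, days + 1)]


def runs_to_schedule(last_run, today, only_run_latest=False):
    """Return the intervals to run once the DAG is unpaused."""
    pending = missed_runs(last_run, today)
    if only_run_latest and pending:
        # Drop the backlog and keep only the most recent interval.
        return pending[-1:]
    return pending
```

With a three-day gap, the default returns all three missed intervals, while `only_run_latest=True` returns only the last one. Nothing here is marked by hand, which is the difference from the manual skip proposed in this issue.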

What I propose above is to manually mark entries as "skipped" or "do not run" so that neither the scheduler nor a backfill job attempts them. This could be because we know there is a data problem in that time range.

@r39132
Contributor Author

r39132 commented Mar 16, 2016

This depends on #1155

@pkexcellent

Hi, @r39132
Does the only_run_latest feature exist in the current release of Airflow? I need to use it. How should I do that?
