max_active_runs = 1 can still create multiple active execution runs #9975
Thanks for opening your first issue here! Be sure to follow the issue template!
The problem is that we don't have a state that describes a DAG Run that is saved but not yet running. All DAG Runs start in the running state. If we want to fix this bug we have to add a new DagRun state.
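The gap described above can be sketched in plain Python. This is a simplified model, not Airflow's actual scheduler code: because every DagRun is created already in the "running" state, a cap that counts running runs can never hold a new run back.

```python
# Simplified model of the pre-fix behavior (not real Airflow code):
# a new DagRun is created directly in the "running" state, so by the
# time max_active_runs could count it, it already counts as active.
MAX_ACTIVE_RUNS = 1

def create_dagrun(runs):
    # No "queued" state existed for DagRuns before the fix.
    runs.append("running")

def active(runs):
    return sum(1 for state in runs if state == "running")

runs = []
create_dagrun(runs)       # scheduled run
create_dagrun(runs)       # manual trigger while the first is active
print(active(runs))       # both runs count as active; the cap never applied
```

The real scheduler did check the active-run count before *creating* scheduled runs, but runs created through other paths (manual triggers, cleared runs) bypassed that check, which is what this thread reports.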
Read the reproduction steps, and this bit sounds bang on:
Yes, looking at the code that sounds right, and it also hasn't changed in 2.0.0alpha1; the same logic is used.
Same issue here. Causing a lot of issues for my backfills...
@natejenkins21 Can you provide reproduction steps please?
For what it's worth, this doesn't seem to be an issue on …. I created this DAG, switched it on, allowed it to catch up, and then cleared the status of all tasks.
As you can see, all the tasks are now running at the same time.
@nathadfield Does it occur without clearing task instances?
Yes. I removed …
@nathadfield What metadata DB do you use? Just wondering if it is related to #13278
Cool, I am looking at this today and tomorrow.
This also happens when triggering DAGs manually with …. I've also tried creating the run with state …. Here's the code for two DAGs to reproduce this: https://gist.github.com/kamac/7112af78f1a9004142903d4fe6e387d4
Right now the only states for DagRuns are None, "running", or "failed" -- that's why the scheduler is never picking up that DagRun.
@ashb I've tried creating DagRuns with state …. I'm on Airflow 1.10.13, SequentialExecutor.
🤔 1.10.13?? Just noticed the original issue creator was on 1.10.11, so this is not a regression in 2.0. I thought I read somewhere that it worked correctly for you @nathadfield on 1.10.x -- can you confirm please?
@kaxil It is definitely an issue on 1.10.14 as that is what we're running on our production system at the moment. It does also seem to be an issue on 2.0 that I've got running on a locally running dev instance of our setup, but I could not replicate the problem on …
I might ping you on Slack for more ad-hoc discussion. I was able to replicate it on master too: #9975 (comment)
@kaxil Is this likely to only be fixed in 2.1, or might we also see it in a new 1.10 version for those people who are affected by this issue but cannot (or don't want to) move to 2.0 yet?
This issue is affecting 1.10.14. Upgrading to 2.0 is not possible for us at this time since we have a number of internal blockers on plugins that we wrote that require refactoring. We cleared a DAG run for 2019-01-01 and selected the "all future tasks" option; we did not experience this issue in 1.10.9, but it is now trying to run a single task for each of the active runs in parallel instead of waiting for the first run to complete. What I expect to happen is for it to clear the tasks and then complete each run sequentially. Our system requires data to be loaded in that order, and having multiple parallel runs is causing huge issues for us.
It looks like max_active_runs was broken further in 1.10.13. This will be fixed in 1.10.15 by #13803. However, like I mentioned in #9975 (comment), not all use cases will be fixed. A complete fix will require adding a new state to DagRun -- 'queued' -- similar to TaskInstance.
This change adds a queued state to DagRun. Newly created DagRuns start in the queued state and are then moved to the running state after satisfying the DAG's max_active_runs. If the DAG doesn't have max_active_runs, the DagRuns are moved to the running state immediately. Clearing a DagRun sets its state to queued. Closes: #9975, #16366 (cherry picked from commit 6611ffd)
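The promotion logic described in that commit message can be sketched in plain Python. This is a simplified illustration of the queued-to-running transition, not the actual Airflow implementation:

```python
# Simplified sketch of the post-fix behavior (not real Airflow code):
# new DagRuns start as "queued", and the scheduler promotes them to
# "running" only while the count of running runs stays below the
# DAG's max_active_runs.
def promote_queued(runs, max_active_runs=1):
    """Promote queued runs in order, respecting the active-run cap."""
    active = sum(1 for state in runs if state == "running")
    for i, state in enumerate(runs):
        if state == "queued" and active < max_active_runs:
            runs[i] = "running"
            active += 1
    return runs

runs = ["queued", "queued", "queued"]
promote_queued(runs, max_active_runs=1)   # only the first run starts
print(runs)                               # ['running', 'queued', 'queued']

runs[0] = "success"                       # first run finishes
promote_queued(runs, max_active_runs=1)   # the next queued run starts
print(runs)                               # ['success', 'running', 'queued']
```

Because queued runs no longer count as active, the cap holds even for manually triggered or cleared runs, which was the hole in the original behavior.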
…runs * EWT-797: Fixing max_active_runs to control the number of runs running simultaneously * Bumped the version. Co-authored-by: Shruti Mantri <smantri@twitter.com>. I referred to the GitHub issue apache#9975, which in its first line states that the max_active_runs issue in 1.10.14 is a separate one, discussed in apache#13802. That issue was resolved, but the fix went into 1.10.15. I took the same fix (https://github.com/apache/airflow/pull/13803/files) and applied it in this commit (5e87232). The fix worked, and I tested it on a GKE devel cluster: https://airflow-smantri--devel--etl-workflow-users.service.qus1.twitter.biz/admin/airflow/tree?dag_id=bq_dal_integration_dag. Prior to the fix, if max_active_runs = 1 and you started more than one run simultaneously, all of them would start their tasks, and max_active_runs would not be honoured. With this fix, if max_active_runs = 1 and a second run is triggered while one is in progress, the second run is created with its start_date set to the trigger time, but it won't start any of its tasks until the prior run is complete.
Observing the same behavior on 2.1.2 with catchup=False; this has been blowing through my quotas.
Is there a way to prevent the scheduler from …?
@argemiront, if you are on 2.1.4 you can change this: …
Thank you so much!
What you see there is a queued run. The currently active run count is 1, but there's also a queued run, which doesn't count as an active run.
It is worth noting for anyone (such as us) that heavily relies on …
This is now fixed.
How can this be solved on a fully managed setup such as MWAA? MWAA only supports 1.10.12 and 2.0.2.
@DanielMorales9 Not easily, I'm afraid: by asking AWS to provide a more recent version, or by using a different service than MWAA that provides quicker update cycles.
Same problem in 2.1.3 with manually triggered DAGs. All of them run simultaneously. I will try 2.2.0.
@stroykova Please let us know. (I'd try 2.2.2 rather than 2.2.0.)
2.2.2 is fine with this 🥳
If you are on an older version of Airflow that has this problem, you can add a concurrency setting to your DAG (i.e. concurrency = some_num), or set depends_on_past = True.
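Why the depends_on_past workaround helps can be shown with a small simulation. This is a hypothetical model, not Airflow code: even if two runs are wrongly active at once, a task instance with depends_on_past may start only after the same task succeeded in the previous run, so the runs still execute largely in order.

```python
# Hypothetical simulation of the depends_on_past=True workaround
# (not Airflow's implementation): a task in run N is runnable only
# once the same task in run N-1 has succeeded.
def runnable(task_id, run_idx, outcomes, depends_on_past=True):
    """outcomes maps (task_id, run_idx) -> terminal state string."""
    if not depends_on_past or run_idx == 0:
        return True  # the very first run has no past to depend on
    return outcomes.get((task_id, run_idx - 1)) == "success"

outcomes = {}
# Two runs exist at once (the bug), but run 1's task can't start yet:
print(runnable("task1", 0, outcomes))   # True
print(runnable("task1", 1, outcomes))   # False, run 0 hasn't finished task1
outcomes[("task1", 0)] = "success"
print(runnable("task1", 1, outcomes))   # True, past dependency satisfied
```

The trade-off is that a single failed task blocks the same task in every later run until it is fixed or cleared, which is why this is a workaround rather than a substitute for a working max_active_runs.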
@alexstrimbeanu Coming in here with an attitude like that is unacceptable and is not going to help your cause, but I'll give you the decency of replying. You may notice that this particular issue is closed and, afaik, there isn't currently an open issue that documents this as a problem. Maybe you would like to open one and provide it with all the necessary information so that someone can replicate the scenario?
Edit: There is a separate issue affecting max_active_runs in 1.10.14. That regression is fixed in 1.10.15.
Edit2: Version 2.1.3 contains some fixes but also contains bad regressions involving max_active_runs. Use v2.1.4 for the complete fixes to this issue.
Edit3: Version 2.2.0 contains a fix for max_active_runs when using the dags trigger command or TriggerDagRunOperator. #18583
--
Apache Airflow version: 1.10.11, LocalExecutor
What happened:
I have max_active_runs = 1 in my DAG file (which consists of multiple tasks) and I manually triggered a DAG run. While it was still running, a second execution began at its scheduled time.
I should note that the second execution is initially queued. It's only when the DAG's first execution moves to the next task that the second execution actually starts.
My DAG definition (the DAG just contains tasks using PythonOperator):
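The attached DAG definition did not survive extraction. A minimal sketch of a DAG matching the description is below; this is an illustrative reconstruction under stated assumptions (the dag_id, dates, schedule, and task names are invented, not the reporter's actual code), using the Airflow 1.10.x import paths:

```python
# Hypothetical reconstruction (not the reporter's actual DAG):
# max_active_runs=1 with a couple of PythonOperator tasks.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator  # 1.10.x path

def do_work(**context):
    # Placeholder for the reporter's real task logic.
    print("working on", context["execution_date"])

with DAG(
    dag_id="example_max_active_runs",       # assumed name
    start_date=datetime(2020, 7, 1),        # assumed date
    schedule_interval=timedelta(hours=1),   # assumed interval
    max_active_runs=1,                      # the setting under discussion
    catchup=True,
) as dag:
    task1 = PythonOperator(task_id="task1", python_callable=do_work,
                           provide_context=True)
    task2 = PythonOperator(task_id="task2", python_callable=do_work,
                           provide_context=True)
    task1 >> task2
```

With a DAG of this shape, manually triggering a run while a scheduled run is in flight reproduces the reported behavior on the affected versions.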
What you expected to happen:
Only one execution should run. A second execution should be queued but not begin executing.
How to reproduce it:
In my scenario:
Anything else we need to know:
I think the second execution begins between task1 and task2 of execution 1. I think there's a few-second delay there, and maybe that's when Airflow thinks there's no DAG execution running? That's just a guess.
Btw, this can have potentially disastrous effects (errors, incomplete data without errors, etc.).