-
Notifications
You must be signed in to change notification settings - Fork 13.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't ignore legacy concurrency
dag parameter
#18730
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be included in Airflow 2.2.0 which should be released next week.
The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease. |
@zbstof While the fix is correct the description was not, I just updated it If you check the diff in this PR, it is |
Check the following line for instance from 2.1.4 -- so I am still not sure why this affected your production system if you were running 2.1.4 -- Are you sure you are not running from 2.2.0b2? : https://github.com/apache/airflow/blob/2.1.4/airflow/models/dag.py#L281 Line 281 in 614858f
|
Thank you, my mistake. I copied the wrong line when compiling the description
You're completely right, this has nothing to do with Thank you for the corrections! |
No worries, thanks for the PR :) |
Retriggering CI. |
(rebased on main to fix issues with dependency -- fixed by #18695) |
Not sure if I should (could?) do something to fix |
No, don't worry about that, we will take care of it. |
Currently, even if legacy `concurrency` dag parameter is specified, it is always ignored because `max_active_runs` is always initialized from `core.max_active_runs_per_dag`. `concurrency` parameter should always be used, if provided, as this preserves backward compatibility for users performing the migration. For this reason, this patch should be backported to 2.1 branch as well. Tested locally: ``` DAG( dag_id='xxx', ... concurrency=1, ) ``` Before fix: ``` {scheduler_job.py:413} INFO - DAG xxx has 1/16 running and queued tasks ``` After fix: ``` {scheduler_job.py:413} INFO - DAG xxx has 1/1 running and queued tasks ```
Currently, even if legacy
concurrency
dag parameter is specified, it is always ignored becausemax_active_tasks
is always initialized fromcore.max_active_tasks_per_dag
.In our case this caused unexpected throttling of the task concurrency on production and performance issues.
concurrency
parameter should always be used, if provided, as this preserves backward compatibility for users performing the migration.Tested locally:
Before fix:
After fix:
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.