Don't ignore legacy concurrency dag parameter #18730

Merged 1 commit on Oct 5, 2021

zbstof (Contributor) commented Oct 5, 2021

Currently, even if the legacy concurrency DAG parameter is specified, it is always ignored because max_active_tasks is always initialized from core.max_active_tasks_per_dag.
In our case, this caused unexpected throttling of task concurrency in production, along with performance issues.

The concurrency parameter should always be used if provided, as this preserves backward compatibility for users performing the migration.

Tested locally:

DAG(
    dag_id='xxx',
...
    concurrency=1,
)

Before fix:

{scheduler_job.py:413} INFO - DAG xxx has 1/16 running and queued tasks

After fix:

{scheduler_job.py:413} INFO - DAG xxx has 1/1 running and queued tasks
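
As a rough illustration of the intended precedence, here is a standalone sketch (not the actual diff in this PR). The function name `resolve_max_active_tasks` and the constant `DEFAULT_MAX_ACTIVE_TASKS_PER_DAG` are hypothetical; the value 16 is assumed from the "1/16" log line above and stands in for core.max_active_tasks_per_dag.

```python
import warnings
from typing import Optional

# Assumed stand-in for the core.max_active_tasks_per_dag config value
# (16 is inferred from the "1/16" log line, not read from real config).
DEFAULT_MAX_ACTIVE_TASKS_PER_DAG = 16


def resolve_max_active_tasks(
    max_active_tasks: int = DEFAULT_MAX_ACTIVE_TASKS_PER_DAG,
    concurrency: Optional[int] = None,
) -> int:
    """Hypothetical helper: pick the effective per-DAG task limit.

    Because max_active_tasks always has a default, checking it first would
    mean the legacy value is never used. The legacy `concurrency` value is
    therefore checked first and wins whenever it is provided.
    """
    if concurrency is not None:
        warnings.warn(
            "The 'concurrency' parameter is deprecated; use 'max_active_tasks'.",
            DeprecationWarning,
            stacklevel=2,
        )
        return concurrency
    return max_active_tasks
```

With this ordering, a DAG that sets only concurrency=1 ends up with a limit of 1 instead of the config default, which matches the "1/1" log line after the fix.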


kaxil (Member) left a comment

This will be included in Airflow 2.2.0 which should be released next week.

@kaxil kaxil added this to the Airflow 2.2.0 milestone Oct 5, 2021
@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Oct 5, 2021
github-actions bot commented Oct 5, 2021

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

kaxil (Member) commented Oct 5, 2021

@zbstof While the fix is correct, the description was not; I just updated it.

If you check the diff in this PR, the parameter involved is max_active_tasks, which has not yet been released in any Airflow version. The description mentioned max_active_runs_per_dag, which is unrelated: it limits DagRuns, not the number of running tasks.

kaxil (Member) commented Oct 5, 2021

For instance, check the following line from 2.1.4. I am still not sure why this affected your production system if you were running 2.1.4. Are you sure you are not running 2.2.0b2?

https://github.com/apache/airflow/blob/2.1.4/airflow/models/dag.py#L281

self._concurrency = concurrency

zbstof (Contributor, Author) commented Oct 5, 2021

> While the fix is correct the description was not, I just updated it

Thank you, my mistake. I copied the wrong line when compiling the description

> so I am still not sure why this affected your production system if you were running 2.1.4

You're completely right, this has nothing to do with the 2.1 branch; I edited the description. We're using v2.2.0.dev0. I didn't check where the issue was introduced and assumed it was in the 2.1 branch.

Thank you for the corrections!

@ashb ashb added the kind:bug This is a clearly a bug label Oct 5, 2021
kaxil (Member) commented Oct 5, 2021

> Thank you, my mistake. I copied the wrong line when compiling the description

No worries, thanks for the PR :)

ashb (Member) commented Oct 5, 2021

Retriggering CI.

kaxil (Member) commented Oct 5, 2021

(rebased on main to fix issues with dependency -- fixed by #18695)

zbstof (Contributor, Author) commented Oct 5, 2021

Not sure if I should (could?) do something to fix Build Images / Build CI images 3.7 🤔

kaxil (Member) commented Oct 5, 2021

> Not sure if I should (could?) do something to fix Build Images / Build CI images 3.7 🤔

No, don't worry about that, we will take care of it.

@kaxil kaxil closed this Oct 5, 2021
@kaxil kaxil reopened this Oct 5, 2021
Currently, even if legacy `concurrency` dag parameter is specified, it is always ignored because `max_active_runs` is always initialized from `core.max_active_runs_per_dag`.

`concurrency` parameter should always be used, if provided, as this preserves backward compatibility for users performing the migration.
For this reason, this patch should be backported to 2.1 branch as well.

Tested locally:
```
DAG(
    dag_id='xxx',
...
    concurrency=1,
)
```
Before fix:
```
{scheduler_job.py:413} INFO - DAG xxx has 1/16 running and queued tasks
```
After fix:
```
{scheduler_job.py:413} INFO - DAG xxx has 1/1 running and queued tasks
```
@kaxil kaxil merged commit 1697617 into apache:main Oct 5, 2021