[AIRFLOW-6856] BugFix: Paused Dags still Scheduled #7578

Merged
merged 1 commit into from
Feb 28, 2020

Conversation

kaxil
Member

@kaxil kaxil commented Feb 28, 2020

#7476 introduced a bug due to which paused DAGs were still scheduled.

The bug is that the following query returns a list of one-element tuples (one per row), not a list of DAG ID strings:

Query:

paused_dag_ids = (
            session.query(DagModel.dag_id)
            .filter(DagModel.is_paused.is_(True))
            .filter(DagModel.dag_id.in_(dagbag.dag_ids))
            .all()
        )

Result:

[('example_bash_operator',)]
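
To see the same result shape without an Airflow install, here is a minimal sketch using the standard-library sqlite3 module as a stand-in for the SQLAlchemy session. The table name and columns are assumptions made up for illustration and are not Airflow's real schema. Selecting a single column still yields one tuple per row:

```python
import sqlite3

# Hypothetical stand-in table; not Airflow's actual DagModel schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_paused INTEGER)")
conn.executemany(
    "INSERT INTO dag VALUES (?, ?)",
    [("example_bash_operator", 1), ("example_python_operator", 0)],
)

# Selecting one column still returns one tuple per row.
rows = conn.execute("SELECT dag_id FROM dag WHERE is_paused = 1").fetchall()
print(rows)  # [('example_bash_operator',)]
```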

Hence in _find_dags_to_process() (below):

        if len(self.dag_ids) > 0:
            dags = [dag for dag in dags
                    if dag.dag_id in self.dag_ids and
                    dag.dag_id not in paused_dag_ids]
        else:
            dags = [dag for dag in dags
                    if dag.dag_id not in paused_dag_ids]
        return dags

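the following happens:

dags = [dag for dag in dags if "example_bash_operator" not in [('example_bash_operator',)] ]

The "not in" test is always true here, because the string 'example_bash_operator' never equals the tuple ('example_bash_operator',), so the paused DAG is not filtered out. Instead paused_dag_ids should be {'example_bash_operator'} (a set) or just a list ['example_bash_operator'].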

Simplified Problem and solution:

In [1]: a = [('example_bash_operator',)]

In [2]: b = 'example_bash_operator'

In [3]: set(aa for aa in a)
Out[3]: {('example_bash_operator',)}

In [4]: b in a
Out[4]: False

In [5]: set(aa for aa, in a)
Out[5]: {'example_bash_operator'}

In [6]: b in set(aa for aa, in a)
Out[6]: True
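
Applied to the filter in _find_dags_to_process(), the same unpacking could look roughly like the sketch below. This is an illustration under stated assumptions and not necessarily the exact merged diff: Dag is a hypothetical stand-in for the real DAG objects, and find_dags_to_process is a free function written for this example that takes the query rows as an argument instead of running the query.

```python
from collections import namedtuple

# Hypothetical stand-in for a DAG object; only dag_id matters here.
Dag = namedtuple("Dag", ["dag_id"])


def find_dags_to_process(dags, rows, dag_ids=()):
    """Filter out paused DAGs, given query rows shaped like [('dag_id',), ...]."""
    # Unpack each one-element row tuple so the set holds plain strings.
    paused_dag_ids = {dag_id for dag_id, in rows}

    if len(dag_ids) > 0:
        return [dag for dag in dags
                if dag.dag_id in dag_ids and
                dag.dag_id not in paused_dag_ids]
    return [dag for dag in dags
            if dag.dag_id not in paused_dag_ids]


dags = [Dag("example_bash_operator"), Dag("example_python_operator")]
rows = [("example_bash_operator",)]

remaining = find_dags_to_process(dags, rows)
print([dag.dag_id for dag in remaining])  # ['example_python_operator']
```

A set is used here because membership tests against it do not scan every element, but a plain list of strings would fix the bug equally well.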

Issue link: AIRFLOW-6856

Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
  • Unit tests coverage for changes (not needed for documentation changes)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

* For document-only changes commit message can start with [AIRFLOW-XXXX].


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@kaxil kaxil requested review from ashb and mik-laj February 28, 2020 03:43
@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Feb 28, 2020
Member

@zhongjiajie zhongjiajie left a comment

Found out this bug just now

@kaxil kaxil merged commit d7f7a28 into apache:master Feb 28, 2020
@kaxil kaxil deleted the AIRFLOW-6856]--bugfix branch February 28, 2020 10:31
@mik-laj
Member

mik-laj commented Feb 28, 2020

Should we add tests to avoid regression?

@kaxil
Member Author

kaxil commented Feb 28, 2020

> Should we add tests to avoid regression?

Created a new PR: #7587 for it

galuszkak pushed a commit to FlyrInc/apache-airflow that referenced this pull request Mar 5, 2020

3 participants