[AIRFLOW-2923][AIRFLOW-1784] Implement LatestOnlyOperator as BaseBranchOperator #5970

Merged
merged 1 commit into from
Jan 27, 2020

Conversation

m1racoli
Contributor

LatestOnlyOperator is a special case of a BranchOperator, so it should inherit from it.
This fixes an issue where the skipping behaviour of LatestOnlyOperator was inconsistent with other operators:
it forcefully skipped all downstream tasks recursively, ignoring trigger rules.

Make sure you have checked all steps below.

Jira

Description

  • Here are some details about my PR, including screenshots of any UI changes:

LatestOnlyOperator is a special case of a BranchOperator, thus it should inherit from it.
This fixes an issue where the skipping behaviour of LatestOnlyOperator is inconsistent with other operators by forcefully skipping all downstream tasks recursively, ignoring trigger rules.
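
To illustrate what "a special case of a branch operator" means here, the following is a minimal, Airflow-free sketch of the branching decision. The function name `choose_branch`, its parameters, and the fixed `timedelta` schedule are assumptions made for this illustration; the window check reflects my reading of the operator and is not quoted from this PR. The idea is that the operator follows all of its direct downstream tasks when the run is the latest one (or was triggered externally) and follows none of them otherwise, leaving everything further downstream to the scheduler and the tasks' trigger rules.

```python
from datetime import datetime, timedelta
from typing import List


def choose_branch(
    execution_date: datetime,
    schedule_interval: timedelta,  # assumption: a fixed interval, not a cron schedule
    now: datetime,
    external_trigger: bool,
    direct_downstream: List[str],
) -> List[str]:
    """Return the direct downstream task ids to follow; the rest get skipped."""
    # Externally triggered runs never skip.
    if external_trigger:
        return list(direct_downstream)

    # The run counts as "latest" while `now` falls inside the schedule
    # period that follows the one this run covers.
    left_window = execution_date + schedule_interval
    right_window = left_window + schedule_interval
    if left_window < now <= right_window:
        return list(direct_downstream)

    # Not the latest run: follow no branch, so every direct child is skipped.
    return []


now = datetime(2020, 1, 10, 12)
day = timedelta(days=1)
print(choose_branch(datetime(2020, 1, 9), day, now, False, ["task1"]))  # ['task1']
print(choose_branch(datetime(2020, 1, 1), day, now, False, ["task1"]))  # []
print(choose_branch(datetime(2020, 1, 1), day, now, True, ["task1"]))   # ['task1']
```

Returning a list of task ids, rather than marking tasks as skipped directly, is what lets the generic branch-skipping logic handle the rest.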

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Extended tests.operators.test_latest_only_operator.py to cover downstream children with trigger rules. Furthermore, fixed the test so that tasks are not skipped in externally triggered DagRuns.

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
    • If you implement backwards-incompatible changes, please leave a note in the Updating.md so we can assign it to an appropriate release

Code Quality

  • Passes flake8

Member

@feluelle feluelle left a comment

The implementation looks good to me but can you also update the docs accordingly? :)

On the one hand your changes fix the issues, but on the other hand they change the expected behavior described in the docs:

task3 is downstream of task1 and task2 and because of the default trigger_rule being all_success will receive a cascaded skip from task1. task4 is downstream of task1 and task2. It will be first skipped directly by LatestOnlyOperator, even its trigger_rule is set to all_done.

(https://airflow.apache.org/concepts.html#latest-run-only)
According to the docs, in both cases it should skip downstream tasks regardless of their trigger_rule.

Can you also fix its pylint issues, if there are any? It is currently not being checked by pylint due to pylint_todo.txt. Do you mind removing the related lines there (latest_only_operator and test_latest_only_operator) and fixing any issues? :) Thanks.

@feluelle
Member

I think it would also be nice to have some documentation about the expected behavior of DagRuns executed with external_trigger=True.

@OmerJog
Contributor

OmerJog commented Nov 17, 2019

@m1racoli are you working on this PR?

@m1racoli
Contributor Author

Hey @OmerJog,
I just came back from vacation. Yes, I'll work on this PR. I should find some time in the coming weeks.

@stale

stale bot commented Jan 1, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 1, 2020
@VigoTen

VigoTen commented Jan 4, 2020

The LatestOnlyOperator is indeed inconsistent with how other operators handle downstream tasks. I'm really looking forward to it being fixed.

@stale stale bot removed the stale label Jan 7, 2020
@m1racoli
Contributor Author

m1racoli commented Jan 7, 2020

Documentation updated and pylint issues fixed.

@codecov-io

codecov-io commented Jan 7, 2020

Codecov Report

Merging #5970 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #5970      +/-   ##
==========================================
+ Coverage   85.34%   85.34%   +<.01%     
==========================================
  Files         791      791              
  Lines       40128    40127       -1     
==========================================
+ Hits        34247    34248       +1     
+ Misses       5881     5879       -2

| Impacted Files | Coverage Δ | |
|---|---|---|
| airflow/operators/latest_only_operator.py | 100% <100%> (+10%) | ⬆️ |

@feluelle
Member

@m1racoli do you think we should add a note to the Updating.md so that people are aware that the trigger rules will now be checked correctly/differently?

- ``task2``. It will be first skipped directly by ``LatestOnlyOperator``,
- even its ``trigger_rule`` is set to ``all_done``.
+ ``task2``, but it will not be skipped, since its ``trigger_rule`` is set to
+ ``all_done``.
Member

I think this should also be added to the Updating.md (see my comment).

Member

That is something users should be aware of when updating Airflow. Don't you think?

Member

Agree with @feluelle, let's add this to Updating.md

Good work @m1racoli

Contributor Author

Good point and thanks! Will update Updating.md.
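
The documentation change discussed in this thread can be illustrated with a toy model of the docs example. The sketch below does not use Airflow: the graph layout (`latest_only` directly upstream of `task1` only, with `task3` and `task4` downstream of both `task1` and `task2`), the simplified rule evaluation, and the names `UPSTREAM`, `TRIGGER_RULE`, and `simulate` are all assumptions for illustration. It compares the old recursive skip with the new behaviour, where only direct children are skipped and trigger rules decide the rest.

```python
from typing import Dict, Set

# Hypothetical toy graph mirroring the docs example (task -> upstream tasks).
UPSTREAM = {
    "latest_only": [],
    "task1": ["latest_only"],
    "task2": [],
    "task3": ["task1", "task2"],
    "task4": ["task1", "task2"],
}
TRIGGER_RULE = {"task4": "all_done"}  # every other task uses all_success
ORDER = ["latest_only", "task1", "task2", "task3", "task4"]  # topological order


def downstream(task: str, recursive: bool) -> Set[str]:
    """Collect direct (or all transitive) downstream tasks of `task`."""
    direct = [t for t in ORDER if task in UPSTREAM[t]]
    if not recursive:
        return set(direct)
    found: Set[str] = set()
    stack = list(direct)
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(t for t in ORDER if current in UPSTREAM[t])
    return found


def simulate(recursive_skip: bool) -> Dict[str, str]:
    """Final task states for a run that is NOT the latest one."""
    states = {"latest_only": "success"}
    for task in downstream("latest_only", recursive=recursive_skip):
        states[task] = "skipped"  # set by the operator itself
    for task in ORDER:  # simplified stand-in for the scheduler
        if task in states:
            continue
        upstream_states = [states[u] for u in UPSTREAM[task]]
        if TRIGGER_RULE.get(task, "all_success") == "all_done":
            states[task] = "success"  # runs once all upstream tasks are done
        elif all(s == "success" for s in upstream_states):
            states[task] = "success"
        else:
            states[task] = "skipped"  # cascaded skip under all_success
    return states


print(simulate(recursive_skip=True))   # old behaviour: task4 is skipped
print(simulate(recursive_skip=False))  # new behaviour: task4 runs
```

In this model `task3` ends up skipped either way, while `task4` changes from skipped to executed, which is the difference users would need to know about when upgrading.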

@m1racoli m1racoli force-pushed the feature/AIRFLOW-2923 branch 2 times, most recently from 3dfafa2 to eba7aaf Compare January 22, 2020 21:56
@m1racoli
Contributor Author

Comments for updating have been added.

UPDATING.md Outdated

In previous versions the `LatestOnlyOperator` forcefully skippded all (direct and undirect) downstream tasks on it's own. From this version on the operator will **only skip direct downstream** tasks and the scheduler will handle skipping any further downstream dependencies.

No change is needed, if only the default trigger rule `all_success` is beeing used.
Member

Suggested change
- No change is needed, if only the default trigger rule `all_success` is beeing used.
+ No change is needed if only the default trigger rule `all_success` is being used.

Contributor Author

fixed
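
The sentence reviewed in this thread says no change is needed when only `all_success` is used. As a rough heuristic for checking an existing DAG against that statement, the sketch below lists tasks that sit further downstream of a `LatestOnlyOperator` (not its direct children, which the operator still skips itself) and use a non-default trigger rule. The function name `tasks_to_review` and the plain-dict graph representation are hypothetical and not part of Airflow.

```python
from typing import Dict, List


def tasks_to_review(
    children: Dict[str, List[str]],   # task id -> direct downstream task ids
    trigger_rules: Dict[str, str],    # task id -> rule; missing means all_success
    latest_only_id: str,
) -> List[str]:
    """Indirect downstream tasks whose trigger rule is not all_success."""
    direct = set(children.get(latest_only_id, []))
    seen = set()
    stack = list(direct)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(children.get(current, []))
    # Direct children are excluded: the operator skips them itself.
    indirect = seen - direct
    return sorted(
        task for task in indirect
        if trigger_rules.get(task, "all_success") != "all_success"
    )


graph = {
    "latest_only": ["task1"],
    "task1": ["task3", "task4"],
    "task2": ["task3", "task4"],
}
print(tasks_to_review(graph, {"task4": "all_done"}, "latest_only"))  # ['task4']
print(tasks_to_review(graph, {}, "latest_only"))                     # []
```

An empty result corresponds to the "no change is needed" case; anything listed may now run where it was previously skipped.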

UPDATING.md Outdated
@@ -943,6 +943,16 @@ The following metrics are deprecated and won't be emitted in Airflow 2.0:
- `dag.loading-duration.<basename>` -- use `dag_processing.last_duration.<basename>` instead
- `dag_processing.last_runtime.<basename>` -- use `dag_processing.last_duration.<basename>` instead

### Changes to skipping behaviour of LatestOnlyOperator

In previous versions the `LatestOnlyOperator` forcefully skippded all (direct and undirect) downstream tasks on it's own. From this version on the operator will **only skip direct downstream** tasks and the scheduler will handle skipping any further downstream dependencies.
Member

Suggested change
- In previous versions the `LatestOnlyOperator` forcefully skippded all (direct and undirect) downstream tasks on it's own. From this version on the operator will **only skip direct downstream** tasks and the scheduler will handle skipping any further downstream dependencies.
+ In previous versions, the `LatestOnlyOperator` forcefully skipped all (direct and undirect) downstream tasks on its own. From this version on the operator will **only skip direct downstream** tasks and the scheduler will handle skipping any further downstream dependencies.

Contributor Author

fixed

…chOperator

@kaxil kaxil merged commit b568f74 into apache:master Jan 27, 2020
@kaxil
Member

kaxil commented Jan 27, 2020

Good work @m1racoli

@m1racoli m1racoli deleted the feature/AIRFLOW-2923 branch January 30, 2020 12:35
galuszkak pushed a commit to FlyrInc/apache-airflow that referenced this pull request Mar 5, 2020
…chOperator (apache#5970)
