Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIRFLOW-5500] Fix the trigger_dag api in the case of nested subdags #8081

Merged
merged 2 commits into from
Jun 5, 2020
Merged

[AIRFLOW-5500] Fix the trigger_dag api in the case of nested subdags #8081

merged 2 commits into from
Jun 5, 2020

Conversation

cBournhonesque
Copy link
Contributor

@cBournhonesque cBournhonesque commented Apr 2, 2020

Like the jira issue states (https://issues.apache.org/jira/browse/AIRFLOW-5500), the experimental api is broken when one tries to trigger a dag that contains nested subdags.

The reason for this is that calling dag.subdags returns a list of ALL the subdags for the dag (even the nested ones), instead of the direct child subdags.
(here: https://github.com/apache/airflow/blob/master/airflow/models/dag.py#L756)

My fix directly changes the trigger_dag.py, but a better solution is probably to make dag.subdags return only the child subdags. I didn't know if the current behaviour was chosen for a specific reason so I decided to just change the experimental api.


Make sure to mark the boxes below before creating PR: [x]


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@boring-cyborg
Copy link

boring-cyborg bot commented Apr 2, 2020

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, pylint and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://apache-airflow-slack.herokuapp.com/

@Devarajegowda
Copy link

@cBournhonesque Wanted to understand why all nested subdags are stored in dag.subdags level,
Ex: dag1-> subdag_dag1 -> subdag_subdag_dag1
dag.subdags contain both subdag_dag1 , subdag_subdag_dag1, I was wondering if it make sense to store 1st level of subdags inside dag.subdags.

@cBournhonesque
Copy link
Contributor Author

@Devarajegowda I think it would make sense to only have dags.subdags have the 1st level of subdags; but I thought that maybe this choice was made for a specific reason?
Personally, on the airflow instance I am using, I modified the dags.subdags code to only return the 1st level subdags, and everything seems to be working, so it might make more sense to just modify the code there?

@Devarajegowda
Copy link

@Devarajegowda I think it would make sense to only have dags.subdags have the 1st level of subdags; but I thought that maybe this choice was made for a specific reason?
Personally, on the airflow instance I am using, I modified the dags.subdags code to only return the 1st level subdags, and everything seems to be working, so it might make more sense to just modify the code there?

Yes even I believe so, if we make those changes change in this commit may not be needed at all.
Appreciate if others can give us some more idea on why all nested subdags are stored under dag.subdags.

@cBournhonesque
Copy link
Contributor Author

It seems like one side-effect is that clicking on 'clear tasks recursive' on the UI only clears the tasks of the direct subdags, not all nested subdags.

@samdjstephens
Copy link

I've run into this issue too.. Looking at places DAG.subdags is called:

  • set_is_paused
  • DagBag

Clearly DAG.subdags was intended to return all subdags and nested subdags, so I think fixing trigger_dag to respect that behaviour is the right way to go

@cBournhonesque
Copy link
Contributor Author

Actually, I think so too. Changing dag.subdags directly also change the behaviour of clearing a dag recursively in the UI.

@kylegao91
Copy link

Really hope this can be merged into 1.10.11, very important for our tasks.

@potiuk potiuk added this to the Airflow 1.10.11 milestone Jun 1, 2020
@potiuk
Copy link
Member

potiuk commented Jun 1, 2020

I think that one neds some better understanding about consequences of subdag triggering - it looks legit for me (and I marked it for 1.10.11) but maybe others @kaxil? @milton0825 can have a look and see if there is no problem with it. It's rather small change.

@kaxil
Copy link
Member

kaxil commented Jun 5, 2020

Can you rebase on latest master please @cBournhonesque

@cBournhonesque
Copy link
Contributor Author

@kaxil Thanks, I just did.

@kaxil kaxil merged commit 16e06f8 into apache:master Jun 5, 2020
@boring-cyborg
Copy link

boring-cyborg bot commented Jun 5, 2020

Awesome work, congrats on your first merged pull request!

@cBournhonesque cBournhonesque deleted the fix-trigger-dag branch June 5, 2020 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants