Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Execute SubDAG tasks as part of parent DAG #8078

Closed
dimberman opened this issue Apr 2, 2020 · 15 comments
Closed

Execute SubDAG tasks as part of parent DAG #8078

dimberman opened this issue Apr 2, 2020 · 15 comments
Assignees
Labels
kind:feature Feature Requests Roadmap The issues /PR that are big changes on the roadmap

Comments

@dimberman
Copy link
Contributor

Description

Currently, the SubDagOperator launches a completely different DAG and then monitors it as a separate entity. This has lead to all sorts of edge case (e.g. when workers have different executors than the scheduler).

This Issue suggests that we instead merge all tasks from the subdag into the original DAG with an extra "original_dag" label. this means at the UI level we can still condense all subdag tasks into a single task, but a single scheduler will still perform all task launching.

Use case / motivation

We want SubDag tasks to act exactly the same as non-subdag tasks

Related Issues

@dimberman dimberman self-assigned this Apr 2, 2020
@dimberman
Copy link
Contributor Author

@xinbinhuang

@xinbinhuang
Copy link
Contributor

+1

@dimberman dimberman assigned xinbinhuang and dimberman and unassigned dimberman Apr 2, 2020
@dimberman
Copy link
Contributor Author

I think it's totally reasonable to say that SubDag separation should be more of a UI feature than an execution feature, so I think if we can identify which tasks are "subdag" tasks, we can condense them in the UI.

@dimberman
Copy link
Contributor Author

@kaxil does this mess with DAG serialization? Would this subdag task adding happen at DAG parsing time?

@xinbinhuang
Copy link
Contributor

xinbinhuang commented Apr 2, 2020

Yeah, totally agree. I don't think the added complexity with nested dags gives more values than errors.

I actually have a more simplistic thought. Instead of returning a DAG from the dag factory, we can just ask the dag factory to return a list of tasks. While the SubDagOperator (this become more like a SubTasksOperator in this case) can be just used to add an annotation to that group of tasks that will be later used to render the DAG graph in the UI.

@kaxil
Copy link
Member

kaxil commented Apr 2, 2020

Yeah, totally agree. I don't think the added complexity with nested dags gives more values than errors.

I actually have a more simplistic thought. Instead of returning a DAG from the dag factory, we can just ask the dag factory to return a list of tasks. While the SubDagOperator (this become more like a SubTasksOperator in this case) can be just used to add an annotation to that group of tasks that will be later used to render the DAG graph in the UI.

I like the idea.

@kaxil does this mess with DAG serialization? Would this subdag task adding happen at DAG parsing time?

No that should work fine

@xinbinhuang
Copy link
Contributor

@kaxil @dimberman Do we need an AIP for this? This seems to change the whole behavior of the SubDagOperator or even renaming/removing it.

@kaxil
Copy link
Member

kaxil commented Apr 2, 2020

@kaxil @dimberman Do we need an AIP for this? This seems to change the whole behavior of the SubDagOperator or even renaming/removing it.

Yes, please. This definitely needs voting and some design discussion. Can you please create an AIP and start a discussion thread once you have that AIP doc ready.

@xinbinhuang
Copy link
Contributor

@kaxil @dimberman Do we need an AIP for this? This seems to change the whole behavior of the SubDagOperator or even renaming/removing it.

Yes, please. This definitely needs voting and some design discussion. Can you please create an AIP and start a discussion thread once you have that AIP doc ready.

Sounds good. I will draft an AIP within this week and open a thread for discussion.

@dimberman
Copy link
Contributor Author

@xinbinhuang SGTM! I'm gonna play around with a Draft PR just to see what I can jerry-rig, but it will be more based around "what CAN we do" than "what WILL we do"

@kaxil
Copy link
Member

kaxil commented Apr 2, 2020

@kaxil I think I need permission to create an AIP at Confluence. Can you give me permission?

Done

@turbaszek turbaszek added kind:feature Feature Requests Roadmap The issues /PR that are big changes on the roadmap labels Apr 3, 2020
@xinbinhuang
Copy link
Contributor

xinbinhuang commented Apr 7, 2020

@kaxil @turbaszek @dimberman I found this PR #5498 when I was working on the AIP draft. It fixed the SubDagOperator by using the scheduler instead of backfill, so SubDags will use the same executor as the parent dag. I guess this solves the biggest problem of SubDagOperator.

It seems to me that the problem has been fixed? I wonder if it is still useful for the suggested change and proceed to finish the AIP draft?

@xinbinhuang
Copy link
Contributor

xinbinhuang commented Apr 7, 2020

It seems to me that the problem has been fixed? I wonder if it is still useful for the suggested change and proceed to finish the AIP draft?

I think I will proceed with finishing the AIP and then let the community decide. I think it makes more sense to treat subdag as in subgraph to graph rather than a vertex as right now.

@kaxil
Copy link
Member

kaxil commented Apr 7, 2020

It seems to me that the problem has been fixed? I wonder if it is still useful for the suggested change and proceed to finish the AIP draft?

I think I will proceed with finishing the AIP and then let the community decide. I think it makes more sense to treat subdag as in subgraph to graph rather than a vertex as right now.

Yes, please do 👍

@kaxil kaxil added this to the Airflow 2.0.0 milestone Aug 24, 2020
kaxil pushed a commit that referenced this issue Sep 19, 2020
…ubDagOperator (#10153)

This commit introduces TaskGroup, which is a simple UI task grouping concept.

- TaskGroups can be collapsed/expanded in Graph View when clicked
- TaskGroups can be nested
- TaskGroups can be put upstream/downstream of tasks or other TaskGroups with >> and << operators
- Search box, hovering, focusing in Graph View treats TaskGroup properly. E.g. searching for tasks also highlights TaskGroup that contains matching task_id. When TaskGroup is expanded/collapsed, the affected TaskGroup is put in focus and moved to the centre of the graph.


What this commit does not do:

- This commit does not change or remove SubDagOperator. Although TaskGroup is intended as an alternative for SubDagOperator, deprecating SubDagOperator will need to be discussed/implemented in the future.
- This PR only implemented TaskGroup handling in the Graph View. In places such as Tree View, it will look like as-if 
- TaskGroup does not exist and all tasks are in the same flat DAG.

GitHub Issue: #8078
AIP: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator
@ashb
Copy link
Member

ashb commented Sep 23, 2020

Closing in favour of #10153

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:feature Feature Requests Roadmap The issues /PR that are big changes on the roadmap
Projects
None yet
Development

No branches or pull requests

5 participants