Compare taskgroup and subdag #20700

Merged
merged 5 commits into from
Jan 9, 2022
33 changes: 33 additions & 0 deletions docs/apache-airflow/concepts/dags.rst
@@ -582,13 +582,15 @@ Note that SubDAG operators should contain a factory method that returns a DAG ob
    :start-after: [START subdag]
    :end-before: [END subdag]


This SubDAG can then be referenced in your main DAG file:

.. exampleinclude:: /../../airflow/example_dags/example_subdag_operator.py
    :language: python
    :start-after: [START example_subdag_operator]
    :end-before: [END example_subdag_operator]


You can zoom into a :class:`~airflow.operators.subdag.SubDagOperator` from the graph view of the main DAG to show the tasks contained within the SubDAG:

.. image:: /img/subdag_zoom.png
@@ -609,6 +611,37 @@ Note that :doc:`pools` are *not honored* by :class:`~airflow.operators.subdag.Su
resources could be consumed by SubdagOperators beyond any limits you may have set.


TaskGroups vs SubDAGs
----------------------

SubDAGs, while serving a similar purpose to TaskGroups, introduce both performance and functional issues due to their implementation.

* The SubDagOperator starts a BackfillJob, which ignores existing parallelism configurations, potentially oversubscribing the worker environment.
* SubDAGs have their own DAG attributes. When a SubDAG's attributes are inconsistent with those of its parent DAG, unexpected behavior can occur.
* You cannot see the "full" DAG in one view, because a SubDAG exists as a full-fledged DAG of its own.
* SubDAGs introduce all sorts of edge cases and caveats, which can disrupt user experience and expectations.

TaskGroups, on the other hand, are a better option given that they are purely a UI grouping concept. All tasks within a TaskGroup still behave like any other tasks outside of the TaskGroup.

The table below summarizes the core differences between the two constructs.

+--------------------------------------------------------+--------------------------------------------------------+
| TaskGroup                                              | SubDAG                                                 |
+========================================================+========================================================+
| Repeating patterns as part of the same DAG             | Repeating patterns as a separate DAG                   |
+--------------------------------------------------------+--------------------------------------------------------+
| One set of views and statistics for the DAG            | Separate set of views and statistics between parent    |
|                                                        | and child DAGs                                         |
+--------------------------------------------------------+--------------------------------------------------------+
| One set of DAG configuration                           | Several sets of DAG configurations                     |
+--------------------------------------------------------+--------------------------------------------------------+
| Honors parallelism configurations through existing     | Does not honor parallelism configurations due to       |
| SchedulerJob                                           | newly spawned BackfillJob                              |
+--------------------------------------------------------+--------------------------------------------------------+
| Simple construct declaration with context manager      | Complex DAG factory with naming restrictions           |
+--------------------------------------------------------+--------------------------------------------------------+
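
As a rough illustration of the "simple construct declaration with context manager" row, the sketch below declares a TaskGroup inside an ordinary DAG. It assumes an Airflow 2.x release that provides ``airflow.operators.empty.EmptyOperator`` (2.3 or later); the DAG, group, and task names are hypothetical and chosen only for this example.

.. code-block:: python

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.empty import EmptyOperator  # assumption: Airflow 2.3+
    from airflow.utils.task_group import TaskGroup

    with DAG(
        dag_id="taskgroup_sketch",  # hypothetical DAG name
        start_date=datetime(2022, 1, 1),
        catchup=False,
    ) as dag:
        start = EmptyOperator(task_id="start")
        end = EmptyOperator(task_id="end")

        # The context manager is the whole declaration:
        # no factory function and no separate child DAG.
        with TaskGroup(group_id="section_1") as section_1:
            extract = EmptyOperator(task_id="extract")
            load = EmptyOperator(task_id="load")
            extract >> load

        # The group can be wired into dependencies like a single task.
        start >> section_1 >> end

With the default settings, the grouped tasks are registered in the same DAG under IDs prefixed with the group ID (for example ``section_1.extract``), so they share the parent DAG's configuration rather than carrying their own.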


Packaging DAGs
--------------
