Compare taskgroup and subdag #20700

Merged
merged 5 commits into from
Jan 9, 2022
43 changes: 41 additions & 2 deletions docs/apache-airflow/concepts/dags.rst
@@ -605,8 +605,47 @@ Some other tips when using SubDAGs:

See ``airflow/example_dags`` for a demonstration.

Note that :doc:`pools` are *not honored* by :class:`~airflow.operators.subdag.SubDagOperator`, and so
resources could be consumed by SubdagOperators beyond any limits you may have set.

.. note::

    Parallelism is *not honored* by :class:`~airflow.operators.subdag.SubDagOperator`, and so resources could be consumed by SubDagOperators beyond any limits you may have set.



TaskGroups vs SubDAGs
----------------------

SubDAGs, while serving a similar purpose to TaskGroups, introduce both performance and functional issues because of how they are implemented.

* The SubDagOperator starts a BackfillJob, which ignores existing parallelism configurations and can oversubscribe the worker environment.
* SubDAGs have their own DAG attributes. When a SubDAG's attributes are inconsistent with those of its parent DAG, unexpected behavior can occur.
* You cannot see the "full" DAG in one view, because a SubDAG exists as a full-fledged DAG of its own.
* SubDAGs introduce many edge cases and caveats, which can disrupt user experience and expectations.

TaskGroups, on the other hand, are a better option because a TaskGroup is purely a UI grouping concept. All tasks within a TaskGroup still behave like any other tasks outside of it.

The table below summarizes the core differences between these two constructs.

+--------------------------------------------------------+--------------------------------------------------------+
| TaskGroup                                              | SubDAG                                                 |
+========================================================+========================================================+
| Repeating patterns as part of the same DAG             | Repeating patterns as a separate DAG                   |
+--------------------------------------------------------+--------------------------------------------------------+
| One set of views and statistics for the DAG            | Separate set of views and statistics between parent    |
|                                                        | and child DAGs                                         |
+--------------------------------------------------------+--------------------------------------------------------+
| One set of DAG configuration                           | Several sets of DAG configurations                     |
+--------------------------------------------------------+--------------------------------------------------------+
| Honors parallelism configurations through existing     | Does not honor parallelism configurations due to       |
| SchedulerJob                                           | newly spawned BackfillJob                              |
+--------------------------------------------------------+--------------------------------------------------------+
| Simple construct declaration with context manager      | Complex DAG factory with naming restrictions           |
+--------------------------------------------------------+--------------------------------------------------------+
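
The naming difference in the last row can be illustrated without importing Airflow. The sketch below assumes two Airflow 2.x conventions: a SubDAG's ``dag_id`` must have the form ``<parent_dag_id>.<task_id>``, and a task inside a TaskGroup gets its ``task_id`` prefixed with the ``group_id`` by default. The helper names are hypothetical and are not part of the Airflow API.

```python
def expected_subdag_id(parent_dag_id: str, task_id: str) -> str:
    """Return the dag_id a SubDAG is assumed to need for a given parent and task.

    Assumption: the SubDAG's dag_id must equal '<parent_dag_id>.<task_id>',
    so the DAG factory has to be told both names.
    """
    return f"{parent_dag_id}.{task_id}"


def taskgroup_task_id(group_id: str, task_id: str) -> str:
    """Return the task_id assumed for a task declared inside a TaskGroup.

    Assumption: default prefixing is on, so the group_id is prepended
    automatically and the author never builds this string by hand.
    """
    return f"{group_id}.{task_id}"


# Hypothetical names, for illustration only.
print(expected_subdag_id("etl", "load_stage"))
print(taskgroup_task_id("load_stage", "copy_files"))
```

The point of the contrast is that the SubDAG author must construct and pass the correctly formed ``dag_id``, while the TaskGroup prefix is derived from the enclosing context.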

.. note::

    SubDAG is deprecated, hence TaskGroup is always the preferred choice.



Packaging DAGs