Skip to content

Optimize TaskGroup.topological_sort for reverse-declared Dags#67688

Open
shahar1 wants to merge 2 commits into
apache:mainfrom
shahar1:optimize-dag-topological-sort-rebase
Open

Optimize TaskGroup.topological_sort for reverse-declared Dags#67688
shahar1 wants to merge 2 commits into
apache:mainfrom
shahar1:optimize-dag-topological-sort-rebase

Conversation

@shahar1
Copy link
Copy Markdown
Contributor

@shahar1 shahar1 commented May 29, 2026

Further optimization of TaskGroup.topological_sort to handle reverse-declared Dags (where many children are declared before their dependencies) efficiently. This is the follow-up to PR #67288.

Summary

Addresses the O(N²) worst-case behavior of the greedy-sweep approach on adversarial Dag shapes such as reverse-insertion chains.

Uses a hybrid strategy:

  • Forward-declared Dags (common case): greedy multi-pass sweep, O(V + E)
  • Reverse-declared Dags (many back edges): pass-number traversal via Kahn's algorithm, O((V + E) log V)
  • Mixed Dags where independent children dilute the ratio: the dispatcher also flips once the group has at least 32 back-edge nodes, which catches the padded reverse-chain case raised in review

Both approaches emit the same order: level-by-legacy-pass, ties broken by insertion order.

Benchmark Results

Run the benchmark with: uv run --project task-sdk python dev/bench_topological_sort_comparison.py

See the gist for the benchmark script.

Reverse-Chain Speedup (Worst Case)

N Sweep-only (ms) Hybrid (ms) Speedup
100 0.29 0.07 4.35x
500 6.01 0.43 13.88x
1000 24.15 0.75 32.06x
2000 96.79 1.77 54.58x

Padded Reverse-Chain (Review Case)

Shape Sweep-only (ms) Hybrid (ms) Speedup
1000 reverse + 1000 independent 24.35 1.18 20.72x

Performance Progression

The dispatcher now switches when a group is clearly back-heavy either by ratio or by an absolute back-edge count, so padded reverse-declared Dags no longer fall back to the quadratic sweep.

Test Plan

  • Reran the benchmark from the gist after rebasing onto main
  • Added a reverse-declared order-equivalence regression that asserts _sort_via_pass_numbering matches _sweep_projection
  • Added padded reverse-chain dispatch regressions for both TaskGroup and SerializedTaskGroup
  • uv run --project task-sdk pytest task-sdk/tests/task_sdk/definitions/test_taskgroup.py -k 'reverse_declared_order_matches_sweep or padded_reverse_chain_uses_pass_numbering or topological_sort_shape_correctness' -xvs
  • uv run --project airflow-core pytest airflow-core/tests/unit/utils/test_task_group.py -k 'serialized_padded_reverse_chain_uses_pass_numbering or topological_sort_serialized_layered' -xvs

Was generative AI tooling used to co-author this PR?
  • Yes — Claude Code (Haiku 4.5)

Generated-by: Claude Code (Haiku 4.5) following the guidelines

@shahar1 shahar1 force-pushed the optimize-dag-topological-sort-rebase branch from 645a59d to 8506a3f Compare May 29, 2026 07:50
@shahar1 shahar1 added backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch and removed backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch labels May 29, 2026
@shahar1 shahar1 changed the title Optimize TaskGroup.topological_sort for reverse-declared DAGs (follow-up #67288) Optimize TaskGroup.topological_sort for reverse-declared Dags May 29, 2026
@shahar1 shahar1 marked this pull request as draft May 29, 2026 07:52
@shahar1 shahar1 marked this pull request as ready for review May 29, 2026 07:56
@shahar1 shahar1 added the backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch label May 29, 2026
pending = next_pending
return order

def _sort_via_pass_numbering(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new branch has no test pinning its emission order against the sweep. The reverse-chain cases in test_topological_sort_shape_correctness route through here, but _assert_valid_topological_order only checks the result is a valid topological sort, not that it matches what _sweep_projection emits. The order-sensitive tests (test_topological_sort1/2, test_topological_nested_groups) use forward-declared DAGs and route through the sweep, so nothing pins the "both branches produce identical order" invariant this PR rests on.

Worth a test that builds a reverse-declared DAG and asserts _sort_via_pass_numbering and _sweep_projection return identical orders. I checked equivalence empirically across ~150k random DAGs and it holds, so this is a coverage gap rather than a bug, but it's the property most likely to silently break in a future refactor.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a reverse-declared regression that directly asserts _sort_via_pass_numbering() emits the same order as _sweep_projection(), so that invariant is now pinned independently of the validity-only shape tests.

if any(d > i for d in deps):
nodes_with_back_edge += 1

if nodes_with_back_edge * 2 > n:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nodes_with_back_edge * 2 > n is a ratio over all children, so a long reverse chain padded with independent children silently stays on the O(N²) sweep, the exact case this PR targets.

Concretely: a reverse chain r0 >> r1 >> ... >> r_{L-1} on its own (n=L) has L-1 back-edge nodes, so (L-1)*2 > L is true and it correctly takes the fast path. Add ~L isolated or forward-declared tasks and now n≈2L while the back-edge count is still L-1, so the ratio drops below 0.5 and nodes_with_back_edge * 2 > n is false, falling back to the quadratic sweep. A 2000-node DAG with a 1000-long reverse chain plus 1000 independent tasks stays quadratic.

Worth dispatching on something the dilution can't defeat (an absolute back-edge count, or reverse-chain depth) in addition to / instead of the ratio, with a one-line comment on what shape the cutoff discriminates. Mirror the change in airflow-core/src/airflow/serialization/definitions/taskgroup.py:244 and add a padded-reverse-chain dispatch regression.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an absolute back-edge cutoff (>= 32) and padded reverse-chain regressions; the diluted review case now takes the fast path.

shahar1 added 2 commits May 29, 2026 18:05
A long reverse-declared run could fall back to the sweep when independent children diluted the back-edge ratio. Add an absolute back-edge cutoff and pin the order/dispatch invariants in both TaskGroup implementations.
@shahar1 shahar1 force-pushed the optimize-dag-topological-sort-rebase branch from 8506a3f to a948df8 Compare May 29, 2026 15:25
@shahar1 shahar1 requested a review from kaxil May 29, 2026 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:DAG-processing area:task-sdk backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants