
Conversation

@Arunodoy18
Contributor

This PR implements the depends_on_previous_task_ids parameter for tasks, allowing a task to depend on the successful completion of multiple specific tasks from the previous DAG run (within the same DAG).

Problem Statement

Previously, depends_on_past=True only allowed a task to depend on the same task in the previous dag_run. There was no native way for a task to depend on multiple specific tasks from the previous dag_run.

Solution

Added a new parameter depends_on_previous_task_ids to the BaseOperator that accepts a list of task IDs. When set (along with depends_on_past=True), the task will only run if all specified tasks in the previous DAG run have succeeded.

Changes Made

1. task-sdk/src/airflow/sdk/bases/operator.py

  • Added depends_on_previous_task_ids parameter to BaseOperator.__init__()
  • Added validation to ensure depends_on_past=True when using depends_on_previous_task_ids
  • Added parameter to template_fields for templating support
  • Added comprehensive docstring

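The constructor-time validation described above can be sketched in isolation (plain Python with no Airflow dependency; the function and exception names are illustrative stand-ins — the actual PR raises `AirflowException` inside `BaseOperator.__init__()`):

```python
class ConfigError(Exception):
    """Stand-in for Airflow's AirflowException in this sketch."""


def validate_dependency_config(depends_on_past, depends_on_previous_task_ids):
    """Reject depends_on_previous_task_ids unless depends_on_past is enabled.

    Returns the normalized list of required task IDs (empty if unset).
    """
    if depends_on_previous_task_ids and not depends_on_past:
        raise ConfigError(
            "depends_on_previous_task_ids requires depends_on_past=True"
        )
    return list(depends_on_previous_task_ids or [])
```

This mirrors the opt-in contract: leaving the new parameter unset changes nothing, and setting it without `depends_on_past=True` fails fast at DAG parse time rather than at scheduling time.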
2. airflow-core/src/airflow/ti_deps/deps/prev_dagrun_dep.py

  • Extended PrevDagrunDep._get_dep_statuses() method to check dependencies on specified tasks from previous DAG run
  • Added logic to verify that all tasks in depends_on_previous_task_ids exist and have succeeded in the previous run
  • Provides detailed error messages when dependencies are not met
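The scheduler-side check can be sketched as a standalone function (no Airflow imports; the mapping-based signature is an assumption for illustration — the real code works against `TaskInstance` rows inside `PrevDagrunDep._get_dep_statuses()`):

```python
def check_previous_run_deps(prev_run_states, required_task_ids):
    """Return human-readable failure reasons; an empty list means the dep passes.

    prev_run_states: mapping of task_id -> terminal state for the previous
    DAG run, or None when there is no previous run (first run always passes).
    """
    if prev_run_states is None:  # first DAG run: nothing to check
        return []
    reasons = []
    for task_id in required_task_ids:
        state = prev_run_states.get(task_id)
        if state is None:
            reasons.append(f"task '{task_id}' not found in previous DAG run")
        elif state != "success":
            reasons.append(f"task '{task_id}' was '{state}', not 'success'")
    return reasons
```

Collecting every unmet dependency, rather than stopping at the first, is what makes the detailed error messages mentioned above possible.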

3. airflow-core/tests/ti_deps/deps/test_prev_dagrun_dep_specific_tasks.py

  • Added comprehensive unit tests for the new functionality
  • Tests cover success scenarios, failure scenarios, first-run behavior, and validation

Usage Example

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG('example_dag', start_date=datetime(2022, 1, 1), schedule='@daily') as dag:
    task_a = PythonOperator(task_id='task_a', python_callable=lambda: print('A'))
    task_b = PythonOperator(task_id='task_b', python_callable=lambda: print('B'))

    # task_c will only run if both task_a and task_b succeeded in the previous DAG run
    task_c = PythonOperator(
        task_id='task_c',
        python_callable=lambda: print('C'),
        depends_on_past=True,
        depends_on_previous_task_ids=['task_a', 'task_b'],
    )
```

Behavior

  • First DAG run: Not blocked (no previous run to check)
  • Subsequent runs: Task waits until all specified tasks from the previous run have succeeded
  • Validation: Raises AirflowException if depends_on_previous_task_ids is set without depends_on_past=True

Testing

  • Unit tests added in test_prev_dagrun_dep_specific_tasks.py
  • Tests validate correct behavior for:
    • First run (should not be blocked)
    • Dependencies met (all previous tasks succeeded)
    • Dependencies not met (one or more previous tasks failed)
    • Invalid configuration (missing depends_on_past=True)

Closes

Closes #60328

This metric was accidentally disabled in Airflow 3.0 with a TODO comment
referencing AIP-66. The dag_processing.last_run.seconds_ago metric continued
to work, but dag_processing.last_duration was completely missing.

This fix:
- Uncomments the Stats.timing() calls for the last_duration metric
- Updates the code to use the correct relative_fileloc parameter
- Re-enables the previously skipped test

Closes apache#60325
…pend on specific tasks from previous DAG run

This commit adds the depends_on_previous_task_ids parameter to BaseOperator,
allowing a task to depend on the successful completion of multiple specific
tasks from the previous DAG run (within the same DAG).

Previously, depends_on_past=True only allowed a task to depend on the same
task in the previous dag_run. This enhancement enables more flexible
cross-task dependencies on previous runs.

Changes:
- Added depends_on_previous_task_ids parameter to BaseOperator
- Validation ensures depends_on_past=True when using depends_on_previous_task_ids
- Extended PrevDagrunDep to check dependencies on specified tasks
- Added comprehensive unit tests

Closes apache#60328
@Arunodoy18
Contributor Author

Hey, a review would be highly appreciated.
Cheers

@ashb
Member

ashb commented Jan 13, 2026

Honestly I'm not sure we want to support this -- it seems like quite an uncommon use case, and I worry about the maintainability of yet another feature, in terms of code, API surface area and proliferation of options for users.

I don't see from your usage example why this is actually needed, and why depends_on_past won't fit?

@potiuk
Member

potiuk commented Jan 13, 2026

I don't see from your usage example why this is actually needed, and why the depends_on_past won't fit?

Indeed @bujji8411 - as the author of the issue - can you please add more explanation about the concrete use case you had? That would make it easier to understand why this is needed and why the current depends_on_past is insufficient. On Slack you mentioned that you have very concrete examples of such functionality being needed.

@github-actions

This pull request has been automatically marked as stale because the author has not responded to a request for more information. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Jan 23, 2026
@bujji8411

Concrete use case and why depends_on_past is insufficient

Thanks for the feedback, let me clarify the concrete use case and why depends_on_past does not fit here.

Pipeline scenario:

This is a stateful file-based ingestion workflow:

extract → validate_count → stp → archive → cleanup

Key characteristics:

  1. The same source file name is used on every run (upstream system creates/overwrites it)
  2. The file must not be re-processed until the previous run has fully completed
  3. Partial success leaves the system in an inconsistent external state

Failure scenario

dag_run_1:

  • extract → success (file created)
  • validate_count → success
  • stp → failed
  • archive, cleanup → not executed

External state after failure

Key points:

  1. Extracted file still exists with the same name
  2. Data may be partially processed
  3. File is neither archived nor cleaned up

Why depends_on_past does not work

If we set:

stp.depends_on_past = True

Then in dag_run_2:

Only stp is blocked
extract and validate_count will still run
This re-extracts or overwrites the same file and breaks idempotency

What we actually need is:

Start extract in dag_run_2 only if
extract, validate_count, stp, archive, and cleanup
in dag_run_1 all succeeded.

This is a group-level previous-run dependency, not a single-task dependency.
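Applied to the failure scenario above, the group-level gate can be sketched as follows (a standalone illustration with stdlib Python only; the helper name and state strings are assumptions, not Airflow APIs):

```python
REQUIRED = ["extract", "validate_count", "stp", "archive", "cleanup"]


def may_start_extract(prev_run_states):
    """dag_run_2's extract may start only if every task in the group
    succeeded in dag_run_1, or there was no previous run at all."""
    if prev_run_states is None:  # first DAG run: no previous state to protect
        return True
    return all(prev_run_states.get(t) == "success" for t in REQUIRED)


# dag_run_1 as described: stp failed, archive and cleanup never ran
dag_run_1 = {"extract": "success", "validate_count": "success", "stp": "failed"}
```

Here `may_start_extract(dag_run_1)` is False, so dag_run_2 would not re-extract and overwrite the file, which is exactly the idempotency guarantee depends_on_past alone cannot provide.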

Summary

  1. depends_on_past handles task-local idempotency
  2. It does not handle external state correctness
  3. The requested feature is a declarative alternative to custom sensors
  4. The behavior is opt-in and backward-compatible

@github-actions github-actions bot removed stale Stale PRs per the .github/workflows/stale.yml policy file pending-response labels Jan 24, 2026
@potiuk
Member

potiuk commented Jan 26, 2026

I am closing all your PRs @Arunodoy18 - despite earlier warnings, you are not looking at your PRs and submit multiple unrelated, AI-generated changes without even looking at them. If this continues to happen, we will ask ASF infra to block your user for any ASF contributions.

@potiuk potiuk closed this Jan 26, 2026

Successfully merging this pull request may close these issues.

Allow task to depend on specific tasks from previous dag_run
