Datasets - trigger DAG run when any dataset is updated #34534

Open
2 tasks done
westonplatter opened this issue Sep 21, 2023 · 7 comments
Assignees
Labels
area:datasets Issues related to the datasets feature area:Scheduler Scheduler or dag parsing Issues kind:feature Feature Requests

Comments

@westonplatter

westonplatter commented Sep 21, 2023

Description

On this doc page, Airflow explicitly says that all datasets need to be updated before a DAG run is triggered.

When using datasets, in this first release (v2.4) waiting for all datasets in the list to be updated is the only option when multiple datasets are consumed by a DAG. A later release may introduce more fine-grained options allowing for greater flexibility.

I would like the schedule param logic to be configurable, so that a DAG run can be triggered when either all or any of the datasets are updated.

Use case/motivation

Make the trigger condition configurable, so that a DAG run is triggered when either all or any of the consumed datasets are updated.

To keep the API consistent, the code changes would keep "all" as the default behavior.
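
To make the requested semantics concrete, here is a minimal plain-Python sketch of the "all" versus "any" decision. It is not Airflow's scheduler code and does not use any Airflow API; the function name `should_trigger` and the `mode` parameter are hypothetical, chosen only to illustrate the behavior described above, with "all" kept as the default.

```python
from typing import Iterable, Literal


def should_trigger(
    consumed: Iterable[str],
    updated: Iterable[str],
    mode: Literal["all", "any"] = "all",
) -> bool:
    """Illustrative only: decide whether a DAG run would be triggered.

    consumed: dataset URIs the DAG is scheduled on.
    updated:  dataset URIs updated since the last triggered run.
    mode:     hypothetical switch; "all" mirrors the current behavior.
    """
    consumed_set = set(consumed)
    if not consumed_set:
        return False
    # Ignore updates to datasets this DAG does not consume.
    relevant = set(updated) & consumed_set
    if mode == "all":
        return relevant == consumed_set
    if mode == "any":
        return bool(relevant)
    raise ValueError(f"unknown mode: {mode!r}")
```

With two consumed datasets and only one of them updated, the default mode returns False, while `mode="any"` returns True.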

Related issues

Tangential, but not directly related

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@westonplatter westonplatter added kind:feature Feature Requests needs-triage label for new issues that we didn't triage yet labels Sep 21, 2023
@boring-cyborg

boring-cyborg bot commented Sep 21, 2023

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.

@Amar1404

I think we need partially aware scheduling here. Let's say we depend on 3 datasets, and each dataset is updated every hour. We want one of them to be allowed to be partially up to date: for example, if it was last updated up to 6 hours ago, that is fine, so go ahead and trigger the DAG. Otherwise, run some callback.

It would be a great feature.
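
One possible reading of this idea, sketched in plain Python: strict datasets must have been updated since the last run, while a dataset with a tolerance only needs an update within a maximum age (6 hours in the example above). This is an assumption about what is being asked for, not an existing Airflow feature; `ready_to_trigger` and the `max_age` mapping are hypothetical names, and the "otherwise some callback" part is left out.

```python
from datetime import datetime, timedelta
from typing import Mapping, Optional


def ready_to_trigger(
    last_updated: Mapping[str, Optional[datetime]],
    last_run: datetime,
    max_age: Mapping[str, timedelta],
    now: datetime,
) -> bool:
    """Illustrative only: staleness-tolerant readiness check.

    last_updated: most recent update time per dataset (None if never updated).
    last_run:     time of the previous triggered DAG run.
    max_age:      hypothetical per-dataset tolerance; datasets listed here
                  only need an update no older than this, relative to now.
    """
    for name, updated_at in last_updated.items():
        if updated_at is None:
            return False
        tolerance = max_age.get(name)
        if tolerance is not None:
            # Tolerant dataset: accept any update that is recent enough.
            if now - updated_at > tolerance:
                return False
        elif updated_at <= last_run:
            # Strict dataset: must have been updated since the last run.
            return False
    return True
```

A caller that gets False back could then decide to invoke whatever callback it wants.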

@jscheffl jscheffl added area:Scheduler Scheduler or dag parsing Issues area:datasets Issues related to the datasets feature and removed needs-triage label for new issues that we didn't triage yet labels Sep 25, 2023
@yermalov-here
Contributor

(in case it helps) Directly related:
closed PR - #28333
discussion - #28253

@dstandish
Contributor

@sunank200 and I are actually working on this right now.

@harveymarshall

@dstandish has there been any movement on this issue? I will be following closely.

@gabrielrmn

Any updates here?
Looking forward to this improvement!

@dstandish
Contributor

Yes, a number of PRs have been merged for this, but this is the one that shows you the syntax: #37101

8 participants