
AIP-5 Remote DagFetcher #9555

Closed
ismailsimsek opened this issue Jun 28, 2020 · 4 comments
Labels
kind:feature Feature Requests

Comments

@ismailsimsek

ismailsimsek commented Jun 28, 2020

Description
Allowing Airflow to fetch DAG files from a remote source outside the service's local file system would grant much greater flexibility, ease implementation, and standardize the ways remote sources of DAGs are synced with Airflow.

Use case / motivation
Deploying DAGs from a remote location (e.g. S3 or Git).

Related Issues
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+DagFetcher
#3138
#8657

@ismailsimsek ismailsimsek added the kind:feature Feature Requests label Jun 28, 2020
@boring-cyborg

boring-cyborg bot commented Jun 28, 2020

Thanks for opening your first issue here! Be sure to follow the issue template!

@adamkozuch

Hi, what is the status of this issue? I am interested in it. Can I, for instance, start working on it, or is it still to be decided whether we want this feature in Airflow at all?

@potiuk
Member

potiuk commented Dec 26, 2020

I believe whether and how to implement this should be discussed on the devlist. This is a big change to Airflow, and the current consensus is that fetching DAGs is "external" to Airflow: there are multiple solutions for fetching DAGs (Git Sync, GCS/S3 sync, shared volumes, sync sidecars in Kubernetes, etc.). I think we are rather far from reaching a common understanding and consensus on:

  • whether we should do anything at all in Airflow
  • whether it should handle pull, push, or both
  • whether it should be part of the scheduler or an external entity doing the fetching
  • whether we should have an API for it (certainly needed if the push model is to be supported)
  • how to deal with "code packages" (i.e. how to ensure atomicity of several DAGs plus their dependencies)

And last but not least: how it plays together with DAG Versioning.

DAG Versioning is another AIP. It is much closer to being fully fleshed out and much closer to reaching consensus; it was dropped from the 2.0 release only because we wanted to make sure we deliver 2.0 this year. Some of its questions (especially the atomicity of changes across several dependent files) are shared between DAGFetcher and DAG Versioning, and I believe they need to be answered together.

@eladkal
Contributor

eladkal commented May 28, 2022

Closing, as this is a discussion for the mailing list, not for a GitHub issue.

@eladkal eladkal closed this as completed May 28, 2022
Development

No branches or pull requests

4 participants