Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.
This is at most a discussion, not a feature request. Given its scale, it should be discussed on the devlist (https://lists.apache.org/list.html?dev@airflow.apache.org) and result in an Airflow Improvement Proposal (https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvement+Proposals) with all the details and consequences hashed out. Before attempting it, you should look at the previous attempts to propose something similar: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-20+DAG+manifest and https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher (you can find the linked devlist discussions for both). So far, history shows that those efforts were discontinued (both proposals are still in the DRAFT stage): the original authors did not develop them far enough to get them through a vote and convince the community. But if you would like to write a complete proposal, you are absolutely welcome to start and lead the discussion, ideally getting an Airflow Improvement Proposal voted on and approved, and then leading it to completion (this is how things work in an OSS project like Airflow). Given its scale, however, it needs to go through the devlist and an AIP, not a feature request or even a GitHub discussion.
Description
Just as logs can be stored remotely, it would be interesting and very useful to allow DAGS_FOLDER and PLUGINS_FOLDER to point to a path in any cloud provider. Ideally, a connection to the cloud provider would be created only when the DAGs are being parsed. One possible implementation I thought about: create temp dirs holding the downloaded contents of the DAGS_FOLDER and PLUGINS_FOLDER buckets, and point DAGS_FOLDER and PLUGINS_FOLDER at these temp dirs. Every time the DAGs are parsed, the contents from the cloud are downloaded into another dynamically created temp dir and compared with the ones in use, replacing any DAGs or plugins that have changed.
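The temp-dir approach described above can be sketched in Python. This is a minimal, hypothetical sketch, not an Airflow API: `download_bucket` is a made-up stand-in for whatever provider-specific download would be used (here it just copies a local directory so the example runs without cloud credentials), and `sync_folder` and `_digest` are illustrative names. Files are compared by SHA-256 digest, and only added or changed files are copied into the active folder. Deleting files that are no longer in the bucket is an assumption; the proposal does not specify it.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def _digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def download_bucket(source: Path, target: Path) -> None:
    """Hypothetical stand-in for a cloud download.

    Assumption: a real version would fetch objects from a bucket; this one
    copies a local directory into `target` so the sketch is runnable.
    """
    shutil.copytree(source, target, dirs_exist_ok=True)


def sync_folder(bucket: Path, active: Path) -> dict:
    """Refresh `active` (the folder DAGS_FOLDER/PLUGINS_FOLDER points at).

    Downloads the bucket into a fresh temp dir, compares it with `active`,
    and copies over only files that were added or changed.
    """
    changes = {"added": [], "updated": [], "removed": []}
    with tempfile.TemporaryDirectory() as staging_dir:
        staging = Path(staging_dir)
        download_bucket(bucket, staging)

        # Relative paths of all files in the new download and in the active dir.
        new_files = {p.relative_to(staging) for p in staging.rglob("*") if p.is_file()}
        old_files = {p.relative_to(active) for p in active.rglob("*") if p.is_file()}

        for rel in sorted(new_files):
            src, dst = staging / rel, active / rel
            if rel not in old_files:
                changes["added"].append(rel.as_posix())
            elif _digest(src) != _digest(dst):
                changes["updated"].append(rel.as_posix())
            else:
                continue  # unchanged: leave the active copy alone
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)

        # Assumption: files deleted from the bucket are deleted locally too.
        for rel in sorted(old_files - new_files):
            (active / rel).unlink()
            changes["removed"].append(rel.as_posix())
    return changes
```

In this design, the "active" directory is what DAGS_FOLDER (or PLUGINS_FOLDER) would point at, and `sync_folder` would run before each parsing loop, so a run in which nothing changed in the bucket leaves the active folder untouched.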
That is
Use case/motivation
The first motivation is that it would make it easier to run Airflow on Kubernetes, where both the scheduler and the workers need access to up-to-date DAGs and plugins folders. As of now this is not straightforward to set up, especially since the current best approach involves gitSync, which sometimes does not work because of restrictions on the company's cluster (which is my case). With those folders reachable from a cloud provider, the Airflow setup as a whole becomes easier.
The second motivation is that it is a very elegant way of decoupling Airflow into infrastructure and components. Many organisations have a single Git repo containing Airflow DAGs and plugins plus infrastructure-specific files (Dockerfile, Docker Compose YAML file, .txt files listing libraries, etc.), and it would be nice to at least have the option of separating them.
Related issues
No response
Are you willing to submit a PR?
Code of Conduct