Add a configuration option to enable Airflow to look for DAGs in a specified S3 bucket. #8657

Closed
DmitryRusakovKodiak opened this issue Apr 30, 2020 · 3 comments
Labels
invalid kind:feature Feature Requests

Comments

@DmitryRusakovKodiak

Description
Add a configuration option to use S3 as a DAGs storage/provider.

Use case / motivation
Currently Airflow assumes that all DAGs are stored in a /some/path/dags folder that is present on the file system of the web UI, scheduler, and worker components. As a consequence, DAGs are tightly coupled with Airflow itself, which makes independent deployment of Airflow and DAGs quite complicated. It would be great to have a configuration option that enables Airflow to look for DAGs in a specified S3 bucket.

@DmitryRusakovKodiak DmitryRusakovKodiak added the kind:feature Feature Requests label Apr 30, 2020
@ismailsimsek

ismailsimsek commented May 3, 2020

related to [AIRFLOW-2221] Create DagFetcher abstraction #3138

@mik-laj
Member

mik-laj commented Aug 9, 2020

Such a feature is not planned. It is recommended to set up a separate process (e.g. sidecar) that will be responsible for file synchronization. However, I would be happy if there was a guide in the documentation for this.
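For illustration only, here is a minimal sketch of what such a synchronization process could look like in Python. It is not an Airflow feature and not an official recipe. It assumes boto3 is installed in the sidecar image and that the DAGs folder is a volume shared with the scheduler, web UI, and workers. The function names and the environment variables `DAGS_BUCKET`, `DAGS_PREFIX`, and `DAGS_DIR` are hypothetical; the only boto3 calls used are `get_paginator("list_objects_v2")` and `download_file`.

```python
import os
import time
from pathlib import Path


def plan_sync(remote, local):
    """Compare remote {relative key: etag} with local {relative key: etag}.

    Returns (keys to download, keys to delete locally), both sorted.
    """
    to_download = sorted(k for k, etag in remote.items() if local.get(k) != etag)
    to_delete = sorted(k for k in local if k not in remote)
    return to_download, to_delete


def normalize_prefix(prefix):
    """Return the prefix with exactly one trailing slash, or '' if empty."""
    stripped = prefix.strip("/")
    return stripped + "/" if stripped else ""


def list_remote(client, bucket, prefix):
    """Return {relative key: ETag} for every .py object under the prefix."""
    remote = {}
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            rel = obj["Key"][len(prefix):]
            if rel.endswith(".py"):
                remote[rel] = obj["ETag"]
    return remote


def sync_once(client, bucket, prefix, dags_dir, state):
    """Run one sync pass; `state` remembers the ETags already written to disk."""
    prefix = normalize_prefix(prefix)
    dags_dir = Path(dags_dir)
    remote = list_remote(client, bucket, prefix)
    to_download, to_delete = plan_sync(remote, state)
    for rel in to_download:
        target = dags_dir / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        # Download to a temporary name, then rename, so a half-written
        # .py file is never visible under its final name.
        tmp = target.with_name(target.name + ".tmp")
        client.download_file(bucket, prefix + rel, str(tmp))
        os.replace(tmp, target)
        state[rel] = remote[rel]
    for rel in to_delete:
        (dags_dir / rel).unlink(missing_ok=True)
        del state[rel]
    return to_download, to_delete


def main():
    # boto3 is a third-party dependency; it is imported here so the helpers
    # above can be exercised without it. The env var names are hypothetical.
    import boto3

    client = boto3.client("s3")
    state = {}
    while True:
        sync_once(
            client,
            os.environ["DAGS_BUCKET"],
            os.environ.get("DAGS_PREFIX", ""),
            os.environ.get("DAGS_DIR", "/opt/airflow/dags"),
            state,
        )
        time.sleep(60)
```

The sidecar container would call `main()` as its entrypoint. Note that `state` lives only in memory, so after a restart this sketch downloads everything again and does not delete files left over from an earlier run. A real setup would likely also need error handling and credentials configuration.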

@potiuk
Member

potiuk commented Aug 9, 2020

Yep. There was an extensive discussion about it on the devlist https://lists.apache.org/thread.html/224d1e7d1b11e0b8314075f21b1b81708749f2899f4cce5af295e8a8%40%3Cdev.airflow.apache.org%3E and there is a long discussion on the wiki page: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher. I believe the current result of the discussion is that Airflow should not implement a DAG fetcher for now and (as @mik-laj mentioned) it can be done with side-cars rather easily (and in a way that suits each particular case). Another option would be to add some options to the Helm Chart, where we already have side-cars and git-sync is implemented as one. Adding an S3 side-car there might be a good idea.

@DmitryRusakovKodiak @ismailsimsek - since you are interested in this, please feel free to open an issue for a Helm Chart extension (or even donate one) or re-open the discussion on the devlist, but for now I am closing this one.

@potiuk potiuk closed this as completed Aug 9, 2020
@potiuk potiuk added the invalid label Aug 9, 2020

4 participants