Update git-sync description in Helm Chart documentation #32181
Hey @jedcunningham (@dstandish @ephraimbuddy) - we've discussed this in the past. I tried to capture everything I know about the side effects of using git-sync and persistence together for DAGs, in a way that might help deployment managers choose the right approach (or discourage the use of git-sync, if they see that it is not such a "straightforward" decision). This came out of yet another discussion where I had to explain side effects that people might not be aware of. I propose this one instead of #28822, which I am closing now (I kept it in draft while thinking about the best approach). I think it is much better to give users more information and let them choose, while giving them a chance to learn the consequences and warning them that they have to monitor their persistence solution if they go that way (and might have to bear a higher cost in the future to keep it running), than to forbid it outright. There is also more background in my old blog post, as a refresher: https://medium.com/apache-airflow/shared-volumes-in-airflow-the-good-the-bad-and-the-ugly-22e9f681afca
There are quite a few recurring themes when it comes to using git-sync for DAG synchronisation, and this documentation is an attempt to capture the results of a number of discussions and conversations. It adds notes that should help our users and Deployment Managers make more informed decisions about how to synchronize their DAGs.
The changes include:
* notes on potential side effects to be aware of when using git-sync and persistence together (git-sync performs some non-obvious operations that might affect the performance of persistence solutions)
* notes on how to use multiple git repositories with git-sync via the submodule approach, including a link to a real-life use case from Airflow Summit where it has been used in production for hundreds of repositories.
Force-pushed from 382a019 to 4912254.
Also see #32146 (comment) - this is what triggered this PR, after I discussed it with an (apparently rather knowledgeable) user, who came to the conclusion "I stop thinking about persistence....". I will copy the user's findings re: Azure File System:
:D ?
(cherry picked from commit b6ca28e)