
Update git-sync description in Helm Chart documentation #32181

Merged (1 commit) on Jun 30, 2023

@potiuk (Member) commented on Jun 27, 2023

There are quite a few recurring themes when it comes to using git-sync for DAG synchronisation, and this documentation is an attempt to capture the results of a number of discussions and conversations. It adds notes that should help our users and Deployment Managers make more informed decisions about how to synchronize their DAGs.

The changes include:

  • notes on potential side-effects one has to be aware of when using git-sync and persistence together (git-sync performs some non-obvious operations that might affect the performance of persistence solutions)

  • notes on how to use multiple git repositories with git-sync via the submodule approach, including a link to a real-life use case from Airflow Summit where it has been used in production for hundreds of repositories.


@potiuk (Member Author) commented on Jun 27, 2023

Hey @jedcunningham (@dstandish @ephraimbuddy), we've discussed this in the past. I tried to capture everything I know about the side-effects of using git-sync and persistence together for DAGs, in a way that might help Deployment Managers choose the right approach (or discourage them from using git-sync, if they see that it is not as "straightforward" a decision as it seems).

This came up again in another discussion where I once more had to explain side effects that people might not be aware of.

I propose this one instead of #28822, which I am closing now (I kept it in draft while I thought about the best approach). I think giving users more information and letting them choose is a much better solution than forbidding the combination outright, as long as we make them aware of the consequences and warn them that, if they choose persistence, they have to monitor their persistent storage solution (and might have to bear a higher cost in the future to keep it running).

For more background, see my older blog post as a refresher: https://medium.com/apache-airflow/shared-volumes-in-airflow-the-good-the-bad-and-the-ugly-22e9f681afca

@potiuk force-pushed the update-git-sync-description-in-chart branch from 382a019 to 4912254 on June 27, 2023 12:38

@potiuk (Member Author) commented on Jun 27, 2023

See also #32146 (comment): that discussion with an (apparently rather knowledgeable) user is what triggered this PR.

The user came to the conclusion: "I stop thinking about persistence...."

I will copy here the user's findings about Azure Files:

Azure File shares got a 62% failure rate in pjdfstest.
The failures can be classified as below:

[Image: classification of the pjdfstest failures]

The shared folders and files have mode 777 and owner root, and this cannot be changed.
They support neither hard links nor symbolic links.
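The missing symlink support matters because of how git-sync publishes a checkout. As we understand it (an assumption, not something stated in this thread), git-sync writes each new revision into a fresh worktree directory, atomically repoints a symlink at it, and then deletes the previous worktree. The Python sketch below imitates that pattern on a local POSIX filesystem; `publish_worktree`, `sync` and the `rev-*` / `dags` names are hypothetical. On a share without symlink support, the `os.symlink` call is the step that fails, and on any persistent volume every sync writes a full copy of the DAG files and deletes the old one, even for files that did not change.

```python
import os
import shutil
import tempfile
from typing import Dict, Optional


def publish_worktree(root: str, link_name: str, new_worktree: str) -> Optional[str]:
    """Atomically repoint root/link_name at new_worktree (a path relative to root).

    Returns the previously published worktree name, or None on the first sync.
    """
    link_path = os.path.join(root, link_name)
    tmp_link = os.path.join(root, "." + link_name + ".tmp")
    old = os.readlink(link_path) if os.path.islink(link_path) else None

    # Remove a stale temporary link left behind by an interrupted run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    # Create the new link under a temporary name, then rename it over the old
    # one. rename(2) is atomic on POSIX, so readers see either the old or the
    # new tree, never a missing path. This step requires symlink support.
    os.symlink(new_worktree, tmp_link)
    os.replace(tmp_link, link_path)
    return old


def sync(root: str, link_name: str, rev: str, files: Dict[str, str]) -> None:
    """Hypothetical stand-in for one sync cycle: write a full tree, swap, delete old."""
    worktree = os.path.join(root, rev)
    os.makedirs(worktree)
    # Every file is written again for each new revision, changed or not.
    for name, content in files.items():
        with open(os.path.join(worktree, name), "w") as f:
            f.write(content)
    old = publish_worktree(root, link_name, rev)
    if old is not None and old != rev:
        # The whole previous revision is deleted, including unchanged files.
        shutil.rmtree(os.path.join(root, old))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        sync(root, "dags", "rev-1", {"dag_a.py": "# v1\n"})
        sync(root, "dags", "rev-2", {"dag_a.py": "# v2\n"})
        print(os.readlink(os.path.join(root, "dags")))  # rev-2
        print(sorted(os.listdir(root)))  # ['dags', 'rev-2']
```

Readers always go through the stable `dags` link, which is what makes the swap safe for them, but the write-then-delete churn is what lands on the shared storage at every sync.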

@potiuk (Member Author) commented on Jun 29, 2023

:D ?

@eladkal added this to the Airflow Helm Chart 1.11.0 milestone on Jun 30, 2023
@potiuk merged commit b6ca28e into apache:main on Jun 30, 2023
42 checks passed
@potiuk deleted the update-git-sync-description-in-chart branch on June 30, 2023 09:14
@potiuk added a commit that referenced this pull request on Jul 2, 2023
(cherry picked from commit b6ca28e)

3 participants