Fix corrupted bare Git repository recovery in DAG bundles #56206
potiuk merged 3 commits into apache:main
prdai left a comment
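Not required, but just a thought: we could use tenacity here instead of a custom retry loop. It might make the retry/backoff logic easier to read and maintain. For example:
from git import Repo
from tenacity import retry, stop_after_attempt, wait_exponential

# Retry the bare clone up to 3 times with exponential backoff, re-raising the last error.
@retry(stop=stop_after_attempt(3), wait=wait_exponential(), reraise=True)
def clone_bare_repo(url, path, env=None):
    return Repo.clone_from(url, path, bare=True, env=env)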
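For comparison, here is a rough standard-library sketch of the kind of hand-written loop the decorator above would replace. It is not the code from this PR. The helper name `retry_with_backoff` and its parameters are hypothetical, and the delay schedule (doubling from `base_delay`) only approximates exponential backoff rather than reproducing tenacity's exact timing.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func` up to `attempts` times, doubling the pause after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                # Out of attempts: re-raise the last error unchanged.
                raise
            # Pause 1x, 2x, 4x ... base_delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    # Only reachable when attempts < 1, so the loop body never ran.
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is injectable only so the loop can be exercised without real delays. The bookkeeping here (attempt counting, re-raise on the last try, delay calculation) is what the decorator form keeps out of the calling code.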
cc: @jedcunningham @ephraimbuddy @kaxil @jscheffl When using Git bundles with edge workers (EdgeExecutor), the Git connection can be unstable, causing the git clone or bare git clone to fail. If the bare clone is broken, all subsequent tasks on the worker fail unless you manually SSH onto the machine and tidy up the bare repo. This is an attempt to self-heal. I'm hoping to get this into the next wave of provider releases. cc: @eladkal
good idea!
When using git DAG bundles, corrupted bare repositories can cause all tasks landing on a host to fail with InvalidGitRepositoryError. This adds retry logic that detects corrupted bare repositories, cleans them up, and attempts to re-clone them once before failing.

Changes:
- Add InvalidGitRepositoryError handling in _clone_bare_repo_if_required()
- Implement cleanup and retry logic with shutil.rmtree()
- Add comprehensive tests for both successful retry and retry failure scenarios
- Ensure all existing tests continue to pass
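The detect, clean up, and re-clone-once flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation of `_clone_bare_repo_if_required()`. The names `open_or_reclone`, `open_repo`, `clone_repo`, and `CorruptRepoError` are hypothetical. The two callables stand in for opening an existing repository and making a bare clone, and the exception stands in for GitPython's `InvalidGitRepositoryError`, so that the sketch runs with the standard library alone.

```python
import shutil
from pathlib import Path
from typing import Callable, TypeVar

R = TypeVar("R")


class CorruptRepoError(Exception):
    """Hypothetical stand-in for GitPython's InvalidGitRepositoryError."""


def open_or_reclone(
    path: Path,
    open_repo: Callable[[Path], R],
    clone_repo: Callable[[Path], R],
) -> R:
    """Return a usable bare repo at `path`, re-cloning once if it is corrupted."""
    if not path.exists():
        # Nothing on disk yet: a plain first-time clone.
        return clone_repo(path)
    try:
        return open_repo(path)
    except CorruptRepoError:
        # The directory exists but is not a valid repository: remove it and
        # try one fresh clone. A failure here propagates to the caller.
        shutil.rmtree(path, ignore_errors=True)
        return clone_repo(path)
```

Because the second `clone_repo` call is not wrapped in another `try`, a re-clone that also fails surfaces to the caller instead of looping, which matches the "once before failing" wording.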
8465619 to 1b238ca
@potiuk this looks like an unrelated failure?
1b238ca to 03de448
jscheffl left a comment
This looks good to me. I would have preferred to catch the other AirflowExceptions directly in this PR rather than reverting those changes. In my view the breaking-change concern does not apply, and no Dag or user code touches these exceptions. Now some other PR needs to clean this up, but anyway, LGTM.
* Fix corrupted bare Git repository recovery in DAG bundles

  When using git DAG bundles, corrupted bare repositories can cause all tasks landing on a host to fail with InvalidGitRepositoryError. This adds retry logic that detects corrupted bare repositories, cleans them up, and attempts to re-clone them once before failing.

  Changes:
  - Add InvalidGitRepositoryError handling in _clone_bare_repo_if_required()
  - Implement cleanup and retry logic with shutil.rmtree()
  - Add comprehensive tests for both successful retry and retry failure scenarios
  - Ensure all existing tests continue to pass
* Refactor git clone retry logic to use tenacity
* Ephraims suggestions

When using git DAG bundles, corrupted bare repositories can cause all tasks
landing on a host to fail with InvalidGitRepositoryError. This adds retry
logic that detects corrupted bare repositories, cleans them up, and attempts
to re-clone them once before failing.
Changes: