
Fix recursive stale DAG cleanup in S3 local sync #63040

Open
siewcapital wants to merge 1 commit into apache:main from siewcapital:fix-s3dagbundle-stale-recursive-62622

Conversation

@siewcapital

What this PR does

S3Hook.sync_to_local_dir(..., delete_stale=True) currently checks only the top-level entries of local_dir when deleting stale files. As a result, stale DAG files inside nested folders are never removed.

This PR makes stale cleanup recursive by traversing all descendants of local_dir (deepest path first), so stale nested files can be deleted and then their empty parent directories removed.
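The deepest-first traversal described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `remove_stale_entries` and its `synced_files` parameter are hypothetical names, and the sketch assumes the caller already knows the set of local paths that correspond to objects in S3.

```python
from pathlib import Path


def remove_stale_entries(local_dir: Path, synced_files: set[Path]) -> list[Path]:
    """Hypothetical helper: delete files not in synced_files, then prune empty dirs."""
    removed: list[Path] = []
    # Materialize the listing first, then order it deepest path first so that
    # children are always handled before their parent directories.
    descendants = sorted(
        local_dir.rglob("*"), key=lambda p: len(p.parts), reverse=True
    )
    for path in descendants:
        if path.is_symlink() or path.is_file():
            if path not in synced_files:
                path.unlink()
                removed.append(path)
        elif path.is_dir() and not any(path.iterdir()):
            # The directory is empty (possibly because its stale children
            # were just deleted), so it can be removed as well.
            path.rmdir()
            removed.append(path)
    return removed
```

Sorting by path depth is one simple way to guarantee that a directory is only examined after everything beneath it has been processed. Note that this sketch removes every empty directory it finds, which is an assumption about the desired behaviour rather than something the PR description states.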

Why this change

This addresses #62622 where S3DagBundle keeps stale DAGs in subfolders because Path.iterdir() is not recursive.
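To illustrate the root cause, here is a small standalone comparison of `Path.iterdir()` and `Path.rglob("*")` on a directory layout like the one in the issue. The helper name `list_entries` and the file names are made up for the example.

```python
import tempfile
from pathlib import Path


def list_entries(root: Path) -> tuple[list[str], list[str]]:
    """Return (top-level entries, all descendants) as sorted relative paths."""
    top_level = sorted(p.relative_to(root).as_posix() for p in root.iterdir())
    recursive = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
    return top_level, recursive


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    nested = root / "stale" / "nested"
    nested.mkdir(parents=True)
    (nested / "dag_stale.py").write_text("# stale DAG\n")
    (root / "dag_top.py").write_text("# top-level DAG\n")

    top_level, recursive = list_entries(root)
    print(top_level)   # ['dag_top.py', 'stale']
    print(recursive)   # ['dag_top.py', 'stale', 'stale/nested', 'stale/nested/dag_stale.py']
```

Because `iterdir()` yields only `dag_top.py` and `stale`, a cleanup loop built on it never sees `stale/nested/dag_stale.py`.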

Tests

Updated test_sync_to_local_dir_behaviour to cover nested stale cleanup:

  • creates stale/nested/dag_stale.py locally (not in S3)
  • verifies stale nested file is deleted
  • verifies that the emptied stale/nested and stale directories are both deleted

I also validated syntax compilation for the touched files locally.

Closes #62622

siewcapital requested a review from o-nikolas as a code owner on March 7, 2026 15:36
@boring-cyborg

boring-cyborg bot commented Mar 7, 2026

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst).
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

boring-cyborg bot added the area:providers and provider:amazon (AWS/Amazon - related issues) labels on Mar 7, 2026


Development

Successfully merging this pull request may close these issues.

S3DagBundle does not delete stale dag recursively
