Add Git Sparse Checkout to Git Dag Bundle#67047

Merged
potiuk merged 3 commits into
apache:mainfrom
jscheffl:feature/add-git-sparse-to-git-bundle
May 17, 2026

Conversation

@jscheffl
Contributor

Git Dag Bundle currently always performs a full clone of the Git repo. That is a problem when you run Airflow on a big monorepo.

This PR adds support for Git Sparse Checkout.

I experimented a bit: the initial clone is still large. Making the first bare clone with --filter=blob:none would be optimal, but then the local clones of the bare repo are not able to resolve the object SHAs and miss references. Because of the structure of cloning from a bare clone, I have no idea how to slim the initial clone down. This might be a future improvement for a Git expert, or the strategy of cloning from a bare clone needs to be revised.


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

Contributor

Copilot AI left a comment


Pull request overview

Adds optional Git sparse-checkout support to GitDagBundle so monorepo users can materialize only selected directories in the working tree (cone mode). When sparse_dirs is provided, the local clone from the bare repo is performed with --sparse --no-checkout, followed by git sparse-checkout init --cone / set <dirs> and a checkout of the tracking ref. The initial bare clone is still full; the PR description acknowledges this as a future improvement.

Changes:

  • New sparse_dirs: list[str] | None kwarg on GitDagBundle, threaded into log context and the clone logic.
  • _clone_repo_if_required now conditionally adds --sparse --no-checkout clone options and configures cone-mode sparse checkout.
  • Docs example updated to mention sparse_dirs; new unit test verifying that only files under the configured sparse dir are present.
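The command sequence described in the overview can be sketched as plain git invocations. This is a minimal illustration, not the PR's actual implementation (which lives in `_clone_repo_if_required` and may drive git differently): the function names `sparse_clone_commands` and `run_sparse_clone`, and the use of `subprocess`, are assumptions made for this example.

```python
from __future__ import annotations

import subprocess


def sparse_clone_commands(
    bare_repo: str, dest: str, sparse_dirs: list[str], tracking_ref: str
) -> list[list[str]]:
    """Build the git commands for a cone-mode sparse clone from a local bare repo.

    Hypothetical helper: mirrors the steps named in the review summary,
    not the code merged in the PR.
    """
    return [
        # Clone from the bare repo without populating the working tree yet.
        ["git", "clone", "--sparse", "--no-checkout", bare_repo, dest],
        # Enable sparse checkout in cone mode (directory-based patterns).
        ["git", "-C", dest, "sparse-checkout", "init", "--cone"],
        # Restrict the working tree to the requested directories.
        ["git", "-C", dest, "sparse-checkout", "set", *sparse_dirs],
        # Materialize only the selected directories at the tracking ref.
        ["git", "-C", dest, "checkout", tracking_ref],
    ]


def run_sparse_clone(
    bare_repo: str, dest: str, sparse_dirs: list[str], tracking_ref: str
) -> None:
    """Execute the commands in order; raises CalledProcessError on failure."""
    for argv in sparse_clone_commands(bare_repo, dest, sparse_dirs, tracking_ref):
        subprocess.run(argv, check=True)
```

Separating command construction from execution keeps the sequence easy to inspect. Note that, as the PR description says, only the working tree shrinks this way; the bare clone still holds the full repository.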

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| providers/git/src/airflow/providers/git/bundles/git.py | Adds sparse_dirs parameter and sparse-checkout setup after cloning from the bare repo. |
| providers/git/docs/bundles/index.rst | Documents the new sparse_dirs kwarg in the JSON config example. |
| providers/git/tests/unit/git/bundles/test_git.py | Adds test_sparse_checkout and type annotations to the git_repo fixture. |
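For orientation, a bundle configuration using the new kwarg might look roughly like the sketch below. The updated docs example itself is not shown on this page, so the bundle name, the tracking ref, and the directory values are hypothetical; only the `sparse_dirs` kwarg and the `GitDagBundle` class come from the PR.

```python
import json

# Hypothetical bundle entry; only "sparse_dirs" is taken from this PR.
bundle_config = [
    {
        "name": "my-monorepo",  # assumed bundle name
        "classpath": "airflow.providers.git.bundles.git.GitDagBundle",
        "kwargs": {
            "tracking_ref": "main",  # assumed branch
            # Directories to materialize in the working tree (cone mode).
            "sparse_dirs": ["dags/team_a"],  # assumed path
        },
    }
]

# Serialize to the JSON form used by the Dag bundle config list setting.
config_json = json.dumps(bundle_config, indent=2)
print(config_json)
```

Leaving `sparse_dirs` out (or passing `None`, per the `list[str] | None` signature) should keep the previous full-checkout behavior.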

Comment thread providers/git/docs/bundles/index.rst Outdated
Comment thread providers/git/tests/unit/git/bundles/test_git.py
jscheffl and others added 3 commits May 17, 2026 21:38
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@jscheffl jscheffl force-pushed the feature/add-git-sparse-to-git-bundle branch from f991e18 to 834db3d Compare May 17, 2026 19:38
@potiuk potiuk merged commit 4cf176e into apache:main May 17, 2026
139 checks passed
3 participants