Skip to content

fix: per-agent snapshot clones for git mirrors#3789

Merged
DrJosh9000 merged 1 commit intomainfrom
ps-1577-snapshot-clones
Apr 14, 2026
Merged

fix: per-agent snapshot clones for git mirrors#3789
DrJosh9000 merged 1 commit intomainfrom
ps-1577-snapshot-clones

Conversation

@DrJosh9000
Copy link
Copy Markdown
Contributor

@DrJosh9000 DrJosh9000 commented Mar 31, 2026

Description

Fix the Big Git Mirrors Problem a different way.

Context

https://linear.app/buildkite/issue/PS-1577
cc #2208

Changes

When updating a git mirror, the update now creates a snapshot of the mirror, which is just a bare clone in a neighbouring directory, before releasing the update file lock. Done this way, the snapshot should share the underlying Git objects via hardlinks, so it can be made quite quickly and with negligible extra storage (at least until the mirror starts drifting). The checkout (a reference clone) will thus not be affected if the mirror's object start drifting.

This only applies if clean checkout is enabled, because that makes it easy to decide when to clean up each snapshot. It's recommended to use clean checkouts with git mirrors. We can figure out the other cases later.

Also refactored umask logic from internal/artifact into internal/osutil to share with the new logic here.

Testing

  • Tests have run locally (with go test ./...). Buildkite employees may check this if the pipeline has run automatically.
  • Code is formatted (with go tool gofumpt -extra -w .)

Disclosures / Credits

I did not use AI tools to create this PR - aside from Codex reviewing it (whether or not I want it to, but all changes in response to Codex's comments were still written by me).

@DrJosh9000 DrJosh9000 force-pushed the ps-1577-snapshot-clones branch 6 times, most recently from db1d936 to 1ef8708 Compare April 7, 2026 05:55
@DrJosh9000 DrJosh9000 force-pushed the ps-1577-snapshot-clones branch from 1ef8708 to 3491f9e Compare April 13, 2026 03:45
@DrJosh9000 DrJosh9000 marked this pull request as ready for review April 13, 2026 04:00
@DrJosh9000 DrJosh9000 requested review from a team as code owners April 13, 2026 04:00
@DrJosh9000 DrJosh9000 force-pushed the ps-1577-snapshot-clones branch from 3491f9e to afc407b Compare April 13, 2026 04:01
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: afc407b351

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/job/checkout.go
@DrJosh9000 DrJosh9000 force-pushed the ps-1577-snapshot-clones branch from afc407b to 8aea398 Compare April 13, 2026 04:26
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8aea3980ab

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/job/checkout.go
@DrJosh9000 DrJosh9000 marked this pull request as draft April 13, 2026 04:59
@DrJosh9000 DrJosh9000 force-pushed the ps-1577-snapshot-clones branch from 8aea398 to c5aebaf Compare April 13, 2026 05:10
@DrJosh9000 DrJosh9000 marked this pull request as ready for review April 13, 2026 05:10
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

return mirrorDir, nil

P2 Badge Snapshot freshly cloned mirrors before returning

In the cache-miss branch, updateGitMirror clones mirrorDir and returns it directly instead of routing through snapshotMirror. That means the first build that populates a mirror (with CleanCheckout=true) still performs checkout/submodule clones against the mutable shared mirror, so a concurrent job can fetch/gc that mirror and invalidate alternates during this build—the same churn failure this change is meant to avoid. Returning a per-agent snapshot here would close that remaining race window.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@DrJosh9000 DrJosh9000 force-pushed the ps-1577-snapshot-clones branch from c5aebaf to 204fab0 Compare April 13, 2026 05:18
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 204fab0334

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/job/checkout.go
@DrJosh9000 DrJosh9000 force-pushed the ps-1577-snapshot-clones branch 2 times, most recently from 2830ee2 to fd1d36b Compare April 13, 2026 05:32
Copy link
Copy Markdown
Contributor

@zhming0 zhming0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can understand the code change, I mainly have questions. I think it'd be better if we describe the failure domain and explain how this fix will change that.

Comment thread internal/job/checkout.go
Comment thread internal/job/checkout.go Outdated
Comment thread internal/job/checkout.go Outdated
Comment thread internal/job/checkout.go
Comment on lines +535 to +537
if !e.CleanCheckout {
return mirrorDir, nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this fix only applicable to clean checkout?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If clean checkout is disabled, the agent will try to reuse an existing checkout in later jobs for the same repo. That checkout could exist forever. If we used snapshots for that case, the checkout will refer to that snapshot created when the checkout was cloned, potentially forever. So we can't delete the snapshot, or the checkout would be corrupted. Since the snapshots and mirror will drift as the mirror is updated over time, the snapshots (that we can't delete) will start taking space. Plus if the mirror is updated, the snapshot won't be updated, so any update to the checkout will redundantly fetch from upstream - so we'd want more logic to avoid updating the mirror unless the checkout is a fresh clone.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the rationale now though I wonder if it means a missed opportunities for all ephemeral agents?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps... though I'm not entirely sure what you mean.

We can enable snapshotting for more cases later on?

@DrJosh9000 DrJosh9000 force-pushed the ps-1577-snapshot-clones branch 2 times, most recently from de4d40a to d796ca5 Compare April 14, 2026 04:26
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: de4d40a600

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/job/checkout.go Outdated
Copy link
Copy Markdown
Contributor

@zhming0 zhming0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Only have a question around if it can be used more broadly.

@DrJosh9000 DrJosh9000 force-pushed the ps-1577-snapshot-clones branch from d796ca5 to 2128514 Compare April 14, 2026 06:17
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2128514bba

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/osutil/umask.go
@DrJosh9000 DrJosh9000 force-pushed the ps-1577-snapshot-clones branch from 2128514 to 05ba7c0 Compare April 14, 2026 06:23
@DrJosh9000 DrJosh9000 force-pushed the ps-1577-snapshot-clones branch from 05ba7c0 to 169b5ee Compare April 14, 2026 07:04
@DrJosh9000 DrJosh9000 enabled auto-merge April 14, 2026 07:08
@DrJosh9000 DrJosh9000 merged commit 6932f51 into main Apr 14, 2026
3 checks passed
@DrJosh9000 DrJosh9000 deleted the ps-1577-snapshot-clones branch April 14, 2026 07:09
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 169b5eed1b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread internal/job/checkout.go
Comment on lines +594 to +595
// Automatically remove it during teardown
e.cleanupDirs = append(e.cleanupDirs, snapshotDir)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve snapshot lifetime across split job phases

snapshotMirror always queues the snapshot for teardown via cleanupDirs, but tearDown runs at the end of each executor invocation, including checkout-only runs. In phase-split environments (checkout and command in separate containers), this deletes the snapshot before the command phase starts; with git-mirror-checkout-mode=reference (the default), the checkout still relies on that snapshot through alternates, so later git operations in the command phase can fail with missing-object errors. This affects jobs with git mirrors + clean checkout enabled.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually a good point - guess we should disable snapshots in split-phase scenarios.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants