Skip to content

fix(gitclone): refresh GitHub App token mid-subprocess via file-based credential helper#321

Closed
jrobotham-square wants to merge 7 commits into
mainfrom
fix-git-credential-helper-token-refresh
Closed

fix(gitclone): refresh GitHub App token mid-subprocess via file-based credential helper#321
jrobotham-square wants to merge 7 commits into
mainfrom
fix-git-credential-helper-token-refresh

Conversation

@jrobotham-square
Copy link
Copy Markdown
Contributor

Problem

git subprocesses spawned by cachew (clone, fetch, ls-remote, and
especially git lfs fetch) are configured with a credential.helper
that embeds the GitHub App installation token as a literal inside a
shell-function closure at exec time:

credHelper := "!f() { test \"$1\" = get && echo username=x-access-token && printf 'password=%s\\n' '" + escapedToken + "'; }; f"

GitHub App installation tokens are valid for one hour. So any
subprocess that runs longer than that — which routinely happens for
git lfs fetch on large LFS-heavy repositories — keeps presenting
its now-expired token to GitHub on every retry and loops on
Bad credentials until git-lfs's own retry budget gives up, typically
many hours later. The TokenManager refreshes its in-process cache
fine, but the running subprocess has no way to see the new value.

This was first surfaced as flapping shadow tests in staging but
verified to affect production too. Failure logs look like:

git lfs fetch: Fetching reference refs/heads/main
batch response: Bad credentials
batch response: Bad credentials
...
error: failed to fetch some objects from 'https://github.com/<org>/<repo>.git/info/lfs'
: exit status 2, elapsed=25h58m40s

Self-recovery happens only because the next periodic invocation gets a
fresh token and (if the LFS delta has shrunk enough) finishes inside
the 1 h window. No human action is involved.

Fix

Switch to a file-based credential helper. The token is written to a
0600 temporary file and exposed to git as:

credential.helper=!cat '<path>'

Git re-invokes the helper on every credential query, so it reads the
file fresh each time. A background goroutine bound to the returned
cleanup re-fetches the token on a 30 s ticker and rewrites the file
atomically (write-temp + rename) whenever the value changes. Most
ticks are a cheap cached lookup in TokenManager; only the pre-expiry
refresh actually hits the GitHub API. The atomic rename means a
concurrent cat either sees the old complete content or the new
complete content — never a partial token.

GitCommand now returns a cleanup func() alongside the *exec.Cmd
that callers must defer to stop the goroutine and remove the
credentials file. All in-tree callers (executeClone,
fetchInternal, GetUpstreamRefs, and the LFS snapshot fetch) are
updated. cleanup is always non-nil and safe to call multiple times,
including when GitCommand itself returned an error.

Tests

  • Existing TestGitCommandWithCredentialProvider is tightened to
    assert that the token is not embedded in the helper string and
    that the on-disk file contains the full credential response with
    0600 perms.
  • TestGitCommand_CleanupRemovesCredentialFile verifies the file is
    removed by cleanup and that cleanup is idempotent.
  • TestGitCommand_RefreshGoroutineUpdatesFile drives one tick of the
    refresh loop and asserts the file picks up a rotated token, then
    verifies a same-token tick is a no-op (no file churn).
  • TestWriteCredentialFile_Atomic runs concurrent reader/writer
    goroutines for 200 rewrites and asserts the reader never observes a
    partial write.
  • TestShellSingleQuote covers the path-quoting helper.

Full just test and just lint pass locally.

… credential helper

GitHub App installation tokens are valid for one hour. The previous
credential.helper embedded the token as a literal in a shell-function
closure at exec time, so any git subprocess that ran longer than the
TTL — most notably 'git lfs fetch' for large LFS repositories — would
keep presenting the original token to GitHub long after it had expired
and would loop on 'Bad credentials' until git-lfs's internal retry
budget gave up (observed in production: jobs running 20+ hours before
failing).

Switch to a file-based credential helper: the token is written to a
0600 temp file and exposed as 'credential.helper=!cat <file>', which
git re-reads on every credential query. A background goroutine
re-fetches the token on a 30s ticker for the lifetime of the returned
cleanup and rewrites the file atomically when the value changes, so
long-running subprocesses transparently pick up rotated tokens on the
next git-lfs retry. The TokenManager's own caching means the ticker is
a cheap map lookup for most ticks; only the pre-expiry refresh hits
the GitHub API.

GitCommand now returns a cleanup function alongside the cmd that
callers must defer to stop the goroutine and remove the credentials
file. All in-tree callers (clone, fetch, ls-remote, lfs snapshot) are
updated accordingly.

Amp-Thread-ID: https://ampcode.com/threads/T-019e805f-48b7-7594-9d46-415a44d2e1c5
Co-authored-by: Amp <amp@ampcode.com>
@jrobotham-square jrobotham-square requested a review from a team as a code owner June 1, 2026 00:09
@jrobotham-square jrobotham-square requested review from alecthomas and removed request for a team June 1, 2026 00:09
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7b56403ad6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/gitclone/command.go Outdated
Drop doc comments that restate the code, retain only the why-not-what
explanations (helper file rotation contract, token-TTL rationale,
atomic-write invariant, etc.). Remove internal repository name from
the snapshot.go comment.

Amp-Thread-ID: https://ampcode.com/threads/T-019e805f-48b7-7594-9d46-415a44d2e1c5
Co-authored-by: Amp <amp@ampcode.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: df1479d4fe

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/gitclone/command.go Outdated
Comment thread internal/gitclone/command.go
Git appends the operation (`get`/`store`/`erase`) as a positional
argument to `!`-prefixed credential helpers. The previous
`!cat <credfile>` form therefore became `cat <credfile> get`, which
would also cat a file named `get` from the worktree (or `store`/
`erase`). Lines in that file are parsed by git as credential entries
and override the real token.

Wrap the helper in a shell function that only outputs on `get` and
absorbs the action argument. Add a regression test that drops hostile
`get`/`store`/`erase` files into a worktree and asserts
`git credential fill` returns the real token.

Reported by Codex on PR #321.

Amp-Thread-ID: https://ampcode.com/threads/T-019e805f-48b7-7594-9d46-415a44d2e1c5
Co-authored-by: Amp <amp@ampcode.com>
@jrobotham-square
Copy link
Copy Markdown
Contributor Author

Good catch — confirmed locally that with the original !cat '<credfile>' form, a worktree file named get does override the real token (and the same trick works for store/erase).

Fixed in cc097e7 by wrapping the helper in a shell function that both gates on get and absorbs the operation argument:

credential.helper=!f() { test "$1" = get && cat '<credfile>'; }; f

Added TestGitCommand_HelperIgnoresHostileGetFile that drops hostile get/store/erase files into a temp worktree and asserts git credential fill returns the real token. The test would fail against the previous helper string.

…ne on cleanup

Two issues reported by Codex on PR #321.

(P1) writeCredentialFile previously wrote the rotated token to
`path + ".new"` and renamed it over the destination. Because path
lives in /tmp on a shared host, a hostile local user could pre-create
`<path>.new` as a symlink, and os.WriteFile would follow it, leaking
the next-rotated GitHub App token to attacker-readable storage. Switch
to os.CreateTemp (O_EXCL + random suffix) for the intermediate file so
the write target cannot be pre-positioned by anyone else.

(P2) cleanup previously cancelled the refresh context and immediately
removed the credential file. A refresh tick that began before the
cancel could finish its rename AFTER the cleanup-side os.Remove,
leaving a fresh token sitting in /tmp once the caller believed it had
been wiped. Track the refresh goroutine in a sync.WaitGroup so cleanup
blocks until it has fully exited before removing the file.

Adds two regression tests:

- TestWriteCredentialFile_IgnoresHostileSiblingSymlink plants the
  `<path>.new` symlink the old code would have followed and asserts
  the rotated write lands at path, not at the symlink target.
- TestCleanup_WaitsForInFlightRefresh shrinks the refresh interval to
  1ms, drives the production goroutine into a blocking provider, and
  asserts cleanup does not return until the goroutine has unwound.

credentialFileRefreshInterval becomes a var (with a nolint comment)
purely so the second test can shrink it to ms granularity.

Amp-Thread-ID: https://ampcode.com/threads/T-019e805f-48b7-7594-9d46-415a44d2e1c5
Co-authored-by: Amp <amp@ampcode.com>
@jrobotham-square
Copy link
Copy Markdown
Contributor Author

Both valid. Fixed in dda656c.

P1 (symlink attack on path + ".new") — reproduced locally: os.WriteFile follows a pre-planted symlink and writes the rotated token to the symlink target. Switched the intermediate file to os.CreateTemp (O_EXCL + random suffix) in the same directory, then rename into place. Added TestWriteCredentialFile_IgnoresHostileSiblingSymlink that plants the old-style <path>.new symlink to a sentinel and asserts the rotated write lands at path, not at the sentinel.

P2 (cleanup races refresh tick)cleanup now tracks the refresh goroutine via sync.WaitGroup (using Go 1.25's wg.Go) and waits for it to exit before os.Remove(path). Added TestCleanup_WaitsForInFlightRefresh that shrinks credentialFileRefreshInterval to 1ms, drives the production goroutine into a blocking provider, and asserts cleanup blocks until the goroutine unwinds.

credentialFileRefreshInterval is now a var (with a nolint:gochecknoglobals // test seam annotation) solely to let the cleanup test shrink it.

Comment thread internal/gitclone/command.go Outdated
return nil, cleanup, errors.Wrap(err, "start token credential file")
}
cleanup = fileCleanup
// `!cmd` runs cmd via the shell on every credential query, so a
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you get rid of these incredibly verbose comments please?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

working on it 😆
df1479d

jrobotham-square and others added 3 commits June 1, 2026 10:37
The deferred cleanup kept the credential helper file and its background
refresh goroutine alive through snapshot.CreatePaths archive upload,
which only touches the local cache. Scope the credential lifetime to
the subprocess that actually needs it.

Amp-Thread-ID: https://ampcode.com/threads/T-019e805f-48b7-7594-9d46-415a44d2e1c5
Co-authored-by: Amp <amp@ampcode.com>
@jrobotham-square
Copy link
Copy Markdown
Contributor Author

closed in favour of #322

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants