
Port data-repo reconciliation into the API process (closes #66) #68

Merged: themightychris merged 2 commits into main from worktree-agent-aacd6f51bc5cf5c00 on May 19, 2026

@themightychris
Member

Summary

Move the data-repo reconciliation state machine from deploy/docker/entrypoint.sh into the Node API process so the same code can serve both boot-time reconciliation and the future hot-reload webhook (#65). The two latent shell bugs we just hit in production are also obsoleted by structured Node error handling.

  • apps/api/src/store/reconcile.ts — the state machine, with the same five outcomes the shell had (in-sync / fast-forwarded / pushed-ahead / rebased / conflict-escaped) plus an explicit fetch-failed for network blips. Same conflict-escape semantics — abort rebase, create + push a conflicts/<UTC> branch from pre-rebase HEAD, hard-reset local to origin.
  • apps/api/src/plugins/reconcile.ts: Fastify plugin registered between storePlugin and servicesPlugin (so the in-memory state is built from the post-reconciliation tree). Decorates fastify.dataRepoLock (a single-slot async lock) and fastify.reconcileDataRepo({branch}) so #65's webhook can call the same code path with one line.
  • apps/api/tests/data-repo-reconcile.test.ts — seven cases covering all five outcomes against ephemeral bare-repo "remotes", plus a fetch-failed case (bogus remote URL) and a regression test for the single-branch-clone refspec bug.
  • deploy/docker/entrypoint.sh — trimmed from ~190 lines to ~75. What's left: trust the PVC, set a pseudonymous git identity, full-history clone on first boot. The reconciliation runs inside the API process.
  • docs/operations/deploy.md — updated boot-sequence section + a new reconciliation-state-machine reference.
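
As a rough illustration of the outcomes listed above, the decision logic might be sketched as follows. This is not the code in apps/api/src/store/reconcile.ts, which this description does not show; `Divergence`, `planAction`, `outcomeFor`, and `conflictBranchName` are hypothetical names, and the sketch assumes the reconciler decides from ahead/behind commit counts after a successful fetch. The timestamp format follows the conflicts/YYYY-MM-DDTHH-MM-SSZ shape noted under "Surprises / decisions".

```typescript
// Hypothetical sketch: names and structure are assumptions, not the PR's code.
type ReconcileOutcome =
  | "in-sync"
  | "fast-forwarded"
  | "pushed-ahead"
  | "rebased"
  | "conflict-escaped"
  | "fetch-failed";

interface Divergence {
  ahead: number; // commits on local HEAD that the remote branch lacks
  behind: number; // commits on the remote branch that local HEAD lacks
}

type PlannedAction = "none" | "ff-merge" | "push" | "rebase-then-push";

// Decide what to do once the fetch has succeeded.
function planAction({ ahead, behind }: Divergence): PlannedAction {
  if (ahead === 0 && behind === 0) return "none";
  if (ahead === 0) return "ff-merge";
  if (behind === 0) return "push";
  return "rebase-then-push";
}

// Map the action (and whether a rebase applied cleanly) to a reported outcome.
function outcomeFor(action: PlannedAction, rebaseSucceeded = true): ReconcileOutcome {
  switch (action) {
    case "none":
      return "in-sync";
    case "ff-merge":
      return "fast-forwarded";
    case "push":
      return "pushed-ahead";
    case "rebase-then-push":
      return rebaseSucceeded ? "rebased" : "conflict-escaped";
  }
}

// Build a conflicts/YYYY-MM-DDTHH-MM-SSZ branch name from a UTC timestamp.
function conflictBranchName(now: Date): string {
  const stamp = now
    .toISOString() // e.g. 2026-05-19T13:20:05.000Z
    .replace(/\.\d{3}Z$/, "Z") // drop milliseconds
    .replace(/:/g, "-"); // colons are not allowed in ref names
  return `conflicts/${stamp}`;
}
```

In this sketch, fetch-failed would be reported before `planAction` is ever reached, since there is no trustworthy remote ref to compare against.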

Bugs obsoleted in passing

  1. Single-branch clone refspec narrowness. Reconciler always fetches with an explicit +refs/heads/<branch>:refs/remotes/<remote>/<branch>. No more silently empty refs/remotes/origin/Y after git clone --branch X.
  2. git rebase | sed swallowed the rebase exit code. Node's execFile rejects the Promise on non-zero exit — the pipe class of bug doesn't exist in this code.
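
Both fixes can be illustrated with a minimal sketch, assuming the reconciler shells out to git through a promisified `execFile`. The helper names (`explicitRefspec`, `fetchBranch`, `exitCodeOf`) are hypothetical and not taken from the PR.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Build the explicit refspec so the fetch does not depend on remote.origin.fetch,
// which a single-branch clone narrows to the cloned branch only.
function explicitRefspec(remote: string, branch: string): string {
  return `+refs/heads/${branch}:refs/remotes/${remote}/${branch}`;
}

// Hypothetical fetch wrapper: resolves true on success, false on any failure
// (which the caller could report as the fetch-failed outcome).
async function fetchBranch(repoPath: string, remote: string, branch: string): Promise<boolean> {
  try {
    await execFileAsync("git", ["fetch", remote, explicitRefspec(remote, branch)], {
      cwd: repoPath,
    });
    return true;
  } catch {
    return false;
  }
}

// Demonstrates that a non-zero exit surfaces as a rejected Promise whose
// `code` property carries the exit status. There is no pipe to mask it.
async function exitCodeOf(file: string, args: string[]): Promise<number> {
  try {
    await execFileAsync(file, args);
    return 0;
  } catch (err) {
    const code = (err as { code?: unknown }).code;
    return typeof code === "number" ? code : -1; // -1: spawn error such as ENOENT
  }
}
```

Because each git invocation is awaited separately, a failed rebase cannot be hidden by a later command in a pipeline the way `git rebase | sed` hid it in the shell.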

State machine outcomes tested

| Outcome | Test |
| --- | --- |
| in-sync | local HEAD == remote HEAD |
| fast-forwarded | local behind by 1 commit; ff-only merge succeeds |
| pushed-ahead | local ahead by 1 commit; push lands on the bare |
| rebased | diverged on independent files; rebase clean, push lands |
| conflict-escaped | both sides touch same file; conflict branch pushed; HEAD reset to origin |
| fetch-failed | bogus origin URL; local state preserved, warn logged |
| (regression) | single-branch clone + reconcile against same branch ff's |

What's kept in the entrypoint and why

  • git config --global safe.directory $CFP_DATA_REPO_PATH — must happen before any git operation in the container because the PVC may carry files owned by a different uid.
  • Pseudonymous git identity env vars — belt-and-suspenders for any tool that touches the tree outside the API. Reconciler also re-applies these to the repo-local git config user.name/email.
  • Full-history initial clone — has to happen before Node starts, since openPublicStore() errors out if .git doesn't exist. Subsequent boots: the PVC has a clone and we just exec node. The reconciler picks it up from there.
  • git remote set-url origin "$CFP_DATA_REMOTE" for already-cloned PVCs — keeps an operator-rotated remote URL live without forcing a re-clone.

Surprises / decisions

  • gitsheets' internal Mutex is not exposed on Repository. We don't reach through internals; instead the plugin owns a single-slot lock at the Fastify layer (apps/api/src/lib/data-repo-lock.ts). Boot is uncontended; #65's webhook will hold this lock across fetch + in-memory rebuild.
  • The conflict-branch name uses the same conflicts/YYYY-MM-DDTHH-MM-SSZ shape the shell version produced, so any operator alerting on conflicts/* ref creation keeps working unchanged.
  • receive.denyCurrentBranch=ignore on the bare in tests — needed because the bare carries main as its HEAD ref and the default receive.denyCurrentBranch=refuse would reject pushes. This is only on the test-rig bare; production pushes go to GitHub which doesn't have working trees.
  • outcome: 'pushed-ahead' still returns success even when the push itself fails, the same semantics the shell had. The push daemon retries on its schedule, and we don't want a transient push failure to crash the pod.
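
For readers unfamiliar with the pattern, a single-slot async lock like the one described in the first bullet can be sketched in a few lines. This is an illustrative assumption, not the contents of apps/api/src/lib/data-repo-lock.ts; the class name `SingleSlotLock` and its `run` method are hypothetical.

```typescript
// Hypothetical single-slot lock: at most one task runs at a time, and
// waiters are served in the order they called run().
class SingleSlotLock {
  // Resolves when the most recently queued task has finished.
  private tail: Promise<void> = Promise.resolve();

  async run<T>(task: () => Promise<T>): Promise<T> {
    const previous = this.tail;
    let release!: () => void;
    // Queue ourselves synchronously so later callers wait on us.
    this.tail = new Promise<void>((resolve) => {
      release = resolve;
    });
    await previous;
    try {
      return await task();
    } finally {
      // Always release, so a failed task does not wedge the lock.
      release();
    }
  }
}

// Usage: a webhook handler could wrap fetch + in-memory rebuild, e.g.
//   await lock.run(async () => { /* fetch, then rebuild */ });
```

Owning the lock at the Fastify layer means boot-time reconciliation and any later webhook-driven git operation go through the same queue without depending on gitsheets internals.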

Test plan

  • npm run -w packages/shared build
  • npm run -w apps/api type-check
  • npm run -w apps/api test (233/233 pass, including 7 new in data-repo-reconcile.test.ts)
  • npm run type-check across all workspaces
  • npm run lint
  • Production smoke: pod restart, observe data-repo reconciled info line at boot
  • Production smoke: provoke a rebase conflict, observe conflict escape hatch error line + a conflicts/<UTC> branch on origin

Closes #66.

🤖 Generated with Claude Code

themightychris and others added 2 commits May 19, 2026 08:53
Adds a structured reconciliation state machine in
`apps/api/src/store/reconcile.ts` plus a Fastify plugin that wires it
between `storePlugin` and `servicesPlugin`, so the in-memory state is
built from the post-reconciliation tree. This replaces the shell logic in
`deploy/docker/entrypoint.sh` (see follow-up commit) and gives the future
hot-reload webhook (#65) a single call to make.

The state machine preserves all five outcomes from the shell version
(in-sync / fast-forwarded / pushed-ahead / rebased / conflict-escaped)
plus an explicit 'fetch-failed' for network blips. The conflict-escape
branch name (`conflicts/<UTC>`) matches the shell format so existing
operator tooling keeps working.

In passing, two latent bugs from the shell version are obsoleted:

  1. Single-branch clone refspec narrowness — we now always fetch with
     an explicit `+refs/heads/<branch>:refs/remotes/<remote>/<branch>`,
     so `git clone --branch X` followed by reconciling against the same
     X keeps working regardless of remote.origin.fetch.
  2. `git rebase | sed` swallowing the rebase exit code — Node's
     execFile rejects the Promise on a non-zero exit, no pipe in sight.

Also adds a small `dataRepoLock` decoration on Fastify (a single-slot
async lock) for serializing reconciliation with future webhook-driven
git operations. At boot it's uncontended; #65's handler will hold it
across the fetch + in-memory rebuild.

Refs #66.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The entrypoint's reconcile state machine moved into the API process
(prior commit). What's left is the bits that *must* run before the Node
process exists: trust the PVC, set a pseudonymous git identity for
rebase committer lines, and ensure a `.git` exists by doing a
full-history clone on first boot.

About 190 lines of shell → ~75 (most of it now comments). The
reconciliation, conflict-escape-hatch, and fetch-failure handling are
all the API's job now.

Also updates docs/operations/deploy.md "Boot sequence" to reflect the
split — entrypoint clones, API reconciles.

Refs #66.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@themightychris themightychris merged commit bb82486 into main May 19, 2026
1 check passed
@themightychris themightychris deleted the worktree-agent-aacd6f51bc5cf5c00 branch May 19, 2026 13:20