Background
PR #86 switched the data-repo volume from PVC to emptyDir and made the clone bare. Two doc gaps remain:
1. Durability tradeoff is undocumented
With emptyDir, the bare clone does not outlive the pod: every new pod starts from an empty volume. If the pod crashes (OOM, node failure, hard kill) before the push daemon has shipped locally committed gitsheets writes to origin, those commits are lost, because the volume is gone before any container code (reconcile, escape-hatch) can run.
Today's reconcile escape-hatches local-ahead commits to conflicts/<UTC> only when the container restarts cleanly enough to invoke reconcileDataRepo. On a hard crash, that path never runs.
Mitigating factor: the push daemon runs continuously with retry/backoff, so the window of unpushed local state is small. But this is now a meaningful change in the durability story vs. the previous PVC-backed setup, and it should be documented as an explicit operating constraint.
Where to add:
specs/behaviors/storage.md — under "The data clone is bare", a short subsection: "Durability: writes are durable once origin/<branch> has them. The push daemon is the only line of defense for in-flight commits; on hard pod crash before the next push, local commits are lost."
docs/operations/runbook.md — in the "API won't boot" or a new "Durability" section, the same note plus a pointer to the push daemon's lag/log for verification.
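For the runbook's verification pointer, one possible check is sketched below: compare the local branch tip in the bare clone with what origin reports for the same branch. This is only an illustration under assumptions. The function names, the remote name "origin", and the gitdir path passed in are hypothetical and not taken from the codebase, and the push daemon may already expose a better lag signal.

```python
import subprocess
from typing import Optional


def parse_ls_remote(output: str, branch: str) -> Optional[str]:
    """Return the SHA that `git ls-remote` lists for refs/heads/<branch>, if any."""
    wanted = f"refs/heads/{branch}"
    for line in output.splitlines():
        # ls-remote prints one "<sha>\t<ref>" pair per line.
        sha, _, ref = line.partition("\t")
        if ref.strip() == wanted:
            return sha.strip()
    return None


def is_pushed(local_sha: str, remote_sha: Optional[str]) -> bool:
    """True only when origin's branch tip equals the local tip."""
    return remote_sha is not None and local_sha == remote_sha


def check_clone(git_dir: str, branch: str, remote: str = "origin") -> bool:
    """Hypothetical helper: run git against the bare clone and compare tips."""
    def git(*args: str) -> str:
        result = subprocess.run(
            ["git", "--git-dir", git_dir, *args],
            check=True, capture_output=True, text=True,
        )
        return result.stdout

    local_sha = git("rev-parse", f"refs/heads/{branch}").strip()
    listing = git("ls-remote", remote, f"refs/heads/{branch}")
    return is_pushed(local_sha, parse_ls_remote(listing, branch))
```

A False result means only that the tips differ. It does not distinguish "local is ahead" from "origin is ahead", so it flags a possible unpushed window rather than proving one.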
2. Stale wording in runbook's "Fetch from the pod's data clone" section
docs/operations/runbook.md — the section that documents the git-pod-uploadpack.sh operator helper still says:
"The pod's working tree lives on a PVC at /app/data inside the container..."
After PR #86 that's incorrect on both counts: the clone is now a bare gitdir with no working tree, and it lives on an emptyDir rather than a PVC. Update the section to reflect this, and reference specs/behaviors/storage.md → "The data clone is bare".
Scope
Single PR. Maybe ~30 lines of diff total.
Filed as follow-up from PR #86.