Slice 3A: processgit-updater sidecar (state machine + HTTP API + cosign verify)#128
Conversation
…gn verify)
Foundation of the in-product self-update story. Adds the `updater/`
sidecar: a tiny separate Go module that orchestrates ProcessGit updates
inside Docker deployments.
A separate process is necessary because:
1. A container cannot safely update itself in place. Replacing the
running binary while it serves requests races with active sessions
and connections. The sidecar runs continuously and survives
main-container restarts.
2. Privilege boundary. The updater needs access to /var/run/docker.sock;
the main ProcessGit container does not.
3. Tiny dependency surface. Stdlib-only Go (enforced by a tripwire
test), plus docker CLI + cosign in the runtime image. The whole
sidecar is independently reviewable.
What ships in this PR (Slice 3A):
HTTP API
GET /healthz — liveness (no auth)
GET /status — current state, active job
GET /releases/latest — proxies GitHub Releases
POST /update — kicks off an update; 409 if one runs
GET /update/{id} — job status + step history
GET /history — last 50 jobs (newest first)
Auth: bearer token from $PROCESSGIT_UPDATER_TOKEN, constant-time
compare via crypto/subtle.
State machine
idle → planning → snapshotting → pulling → verifying → migrating
→ swapping → healthchecking → committed
On post-swap failure: rolling_back → rolled_back. If rollback
itself fails: failed (requires manual intervention).
Persistence
Atomic write-temp-then-rename on $STATE_DIR/state.json. Bounded
history (50 jobs). One job active at a time, enforced by Store.
Real wiring
- GitHub Releases API client (api.github.com/repos/…/releases).
- cosign verify (image) + cosign verify-blob (release.json) via
os/exec. Manifest signature is verified BEFORE we trust ANY
field of release.json — an attacker who substitutes a malicious
manifest cannot redirect the updater to a different image.
Docker operations
Stubbed in Slice 3A — each method logs + sleeps to simulate work.
The orchestrator's state machine and HTTP API are exercisable
end-to-end without touching real containers. PROCESSGIT_UPDATER_STUB
defaults to true; Slice 3B will replace the stubs with real
docker CLI invocations (pull / run --rm / stop / run / inspect /
exec — none of which is conceptually hard, but each deserves
careful testing and review).
Runtime image
Multi-stage Dockerfile: golang:1.25-alpine3.22 build → alpine:3.22
runtime with docker-cli, cosign (from gcr.io/projectsigstore/cosign),
ca-certificates, tini. Final image ~150 MB, dominated by docker CLI.
Non-root user, EXPOSE 9000, ENTRYPOINT via tini for clean SIGTERM
handling.
Tests (7 total, all passing):
- Store round-trip: load/save/active enforced, ordering preserved
after reload
- Concurrent AddJob refused
- State.IsTerminal classification
- Job.transitionTo sets CompletedAt on terminal states only
- Orchestrator happy path: full state-machine traversal in stub mode
using a fake GitHub server + /bin/true as cosign stub
- Concurrent-update rejection at the API layer
- Bearer-auth: 401 without/wrong token, 200 with correct token
- Tripwire: TestNoExternalImports fails if a go.sum ever appears
Code stats: ~1900 lines total across 11 files. ~1400 lines of Go
(including 346 lines of tests).
Out of scope for this PR (deliberate splits):
- deploy/docker-compose.yml integration: adds the updater service,
wires the bearer token via .env, sets the network so only the
main app can reach the updater. Separate PR — touches deployment
config and will need .env.example documentation.
- .github/workflows/release.yml addition: builds the updater image
paired with the main image on every semver tag. Separate PR —
small workflow edit, doesn't affect updater code.
- Slice 3B: real docker calls (pull / swap / healthcheck / rollback).
- Slice 3C: volume snapshot for full disaster recovery.
- Slice 4: admin UI at /-/admin/updates consuming this API.
Co-authored-by: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ff62b6507b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| "--certificate-identity-regexp", m.Signing.IdentityRegex, | ||
| "--certificate-oidc-issuer", m.Signing.Issuer, |
There was a problem hiding this comment.
Pin cosign identity policy outside untrusted manifest
VerifyBlob uses m.Signing.IdentityRegex and m.Signing.Issuer from release.json before that manifest is trusted, which defeats the stated security boundary. If an attacker can substitute release assets, they can choose a permissive regex/issuer and provide a matching cert/signature so blob verification passes, then control image and migration fields. The expected signer policy must be fixed in updater config/code (or another trusted channel), not read from the unverified payload being authenticated.
Useful? React with 👍 / 👎.
| req.Header.Set("Accept", "application/octet-stream") | ||
| req.Header.Set("User-Agent", "processgit-updater") | ||
| resp, err := c.HTTP.Do(req) |
There was a problem hiding this comment.
Authenticate release asset downloads with GitHub token
The updater documents PROCESSGIT_UPDATER_GITHUB_TOKEN as required for private repos, but downloadAsset does not send any Authorization header when fetching browser_download_url. In private repositories this causes manifest asset fetches to fail (typically 404), so updates cannot start even when a token is configured. Reuse the token-authenticated API asset flow (or equivalent authenticated download path) for these requests.
Useful? React with 👍 / 👎.
| func (s *Store) Active() *Job { | ||
| s.mu.Lock() | ||
| defer s.mu.Unlock() | ||
| return s.active |
There was a problem hiding this comment.
Return a snapshot from Active to avoid concurrent races
Store.Active returns the internal *Job pointer directly, unlike Get/List which copy. While an update is running, /status can serialize this object concurrently with orchestrator mutations (transitionTo appends/modifies Steps), creating unsynchronized read/write access to the same struct. This can yield race-detector failures and undefined runtime behavior; return a copied snapshot under the lock instead.
Useful? React with 👍 / 👎.
Slice 3A —
processgit-updatersidecarFoundation of the in-product self-update story. Adds the
updater/sidecar: a tiny separate Go module (stdlib-only) that orchestrates ProcessGit updates inside Docker deployments.Why a separate process
/var/run/docker.sock. The main ProcessGit container does not.TestNoExternalImports), plusdocker-cli+cosignin the runtime image. ~150 MB final image, dominated by the docker CLI.What ships
GET /healthz,GET /status,GET /releases/latest,POST /update,GET /update/{id},GET /history. All non-/healthzpaths requireAuthorization: Bearer $PROCESSGIT_UPDATER_TOKEN, compared in constant time.idle → planning → snapshotting → pulling → verifying → migrating → swapping → healthchecking → committed. Failure paths:rolling_back → rolled_back(recovered) orfailed(manual intervention). One job at a time, enforced byStore.AddJob.$STATE_DIR/state.json, atomic write-temp-then-rename, bounded history (50 jobs).api.github.com/repos/…/releases+ asset downloads.cosign verify(image) +cosign verify-blob(release.json), viaos/exec.golang:1.25-alpine3.22build →alpine:3.22runtime withdocker-cli,cosign(fromgcr.io/projectsigstore/cosign:v2.4.1),ca-certificates,tini. Non-root, EXPOSE 9000.Critical safety property
The manifest signature is verified before we trust any of its fields (image ref, digest, migration command). An attacker who can substitute a malicious
release.jsoncannot redirect the updater to a different image, becausecosign verify-blobagainst the workflow's OIDC identity must pass first.What's stubbed (Slice 3B)
Docker.Pull,Docker.SwapContainer,Docker.RunMigration,Docker.Healthcheck,Docker.Rollback,Docker.InspectAppImageDigest,Docker.Snapshot— all currently log and sleep. Stub mode is the default (PROCESSGIT_UPDATER_STUB=true). The orchestrator state machine, GitHub client, cosign verification, persistence, and HTTP API all run real code; only the container surgery is deferred.This split keeps the PR reviewable (~1400 lines of Go, no real container operations) and lets the state machine and signature-verification logic harden before they touch live containers.
Tests
7 tests, all passing locally against Go 1.22 (sandbox limit) and against Go 1.25 (target):
The 14-second
HappyPathtest traverses every state in the machine using stubbed docker ops, a fake GitHub server (httptest.NewServer), and/bin/trueas a cosign substitute.Local quickstart
Full state-machine run completes in ~20 seconds in stub mode.
Out of scope (queued)
deploy/docker-compose.ymlintegration — adds the updater service, wires bearer token via.env, sets internal network.github/workflows/release.ymlextension — builds & signsprocessgit-updaterimage paired with the main image on every semver tagdocker pull/swap/healthcheck/rollback/-/admin/updatesconsuming this APIFile inventory