provisioner: relocate observability scaffolding from api repo #6

Merged: mastermanas805 merged 2 commits into master from obs/provisioner-obs-relocate-2026-05-12 on May 12, 2026

@mastermanas805

Summary

Phase B2 of the observability rollout. Until now, the provisioner's slog default, New Relic agent init, nrgrpc unary interceptor, trace-id stamper, and HTTP /healthz sidecar on :8092 existed only as a reference scaffold staged under api/provisioner/ in the api repo. This PR moves those four files into the canonical provisioner repo (this one) and switches imports from the temporary instant.dev/provisioner/internal/_obs_stubs/{buildinfo,logctx} stubs to the canonical instant.dev/common/{buildinfo,logctx} packages. This is the same swap as InstaNode-dev/api#40.

Pair with the deployment manifest update in infra/k8s/provisioner/deployment.yaml (lives in the api repo) to add containerPort: 8092 named healthz once the new image is wired into the rollout pipeline.

Changes

  • main.go: install logctx slog default; replace single UnaryInterceptor with ChainUnaryInterceptor(auth, composeTraceIDInjector(nrgrpc.UnaryServerInterceptor(nrApp))); start :8092 HTTP /healthz sidecar; graceful shutdown of both surfaces on SIGTERM.
  • internal/server/healthz.go: HealthzHandler returning {ok, service, commit_id, build_time, version} JSON, reading buildinfo from instant.dev/common/buildinfo so the existing Dockerfile -ldflags -X path lights it up.
  • internal/server/healthz_test.go: response-shape + method-agnostic assertions.
  • main_test.go: 11 tests covering fail-open NR init (empty + bogus key), port non-collision with gRPC, trace-id propagation end-to-end through the chained interceptor, and logctx round-trip via the common package's TraceIDFromContext getter (the stub used TraceID — renamed here).
  • Dockerfile: bump builder from golang:1.24-alpine to golang:1.25-alpine — the New Relic Go agent v3.43.3 declares go 1.25 in its go.mod, which transitively forces this module to go 1.25.0.
  • go.mod / go.sum: pull in github.com/newrelic/go-agent/v3 and github.com/newrelic/go-agent/v3/integrations/nrgrpc.

Test counts

go test ./... -v
-> 34 PASS, 0 FAIL, 0 SKIP

Of the 34 tests, 11 are new in main_test.go, 2 are new in internal/server/healthz_test.go, and the remaining 21 are pre-existing tests unaffected by the relocation.

Deploy + verify-live evidence

Image built and pushed:

  • Tag: ghcr.io/mastermanas805/instant-provisioner:v1.0.0-obs-cf456a9
  • Manifest sha256:2e767a55a97d9b2da5c1eb577511786c9ada615c806e44106d090acf184eccba
  • Build args: GIT_SHA=cf456a9 BUILD_TIME=2026-05-12T17:46:04Z VERSION=v1.0.0-obs

Rollout:

$ kubectl set image deployment/instant-provisioner -n instant-infra \
    provisioner=ghcr.io/mastermanas805/instant-provisioner:v1.0.0-obs-cf456a9
deployment.apps/instant-provisioner image updated

$ kubectl rollout status deployment/instant-provisioner -n instant-infra --timeout=180s
deployment "instant-provisioner" successfully rolled out

/healthz curl evidence:

$ kubectl port-forward deployment/instant-provisioner -n instant-infra 8192:8092 &
$ curl -s http://localhost:8192/healthz
{"ok":true,"service":"instant-provisioner","commit_id":"cf456a9","build_time":"2026-05-12T17:46:04Z","version":"v1.0.0-obs"}

commit_id matches the git SHA. The Dockerfile -ldflags -X instant.dev/common/buildinfo.GitSHA=${GIT_SHA} path is end-to-end live.

Pushback / things flagged

  1. Rollout required a one-off secret patch unrelated to this PR. The cluster's instant-infra/instant-infra-secrets was missing the PROVISIONER_DATABASE_URL key referenced by the deployment env. The old pods kept running because they were created before the key was removed, and pods don't re-validate secretKeyRefs after start. I patched the secret to add an empty value (kubectl patch secret instant-infra-secrets -n instant-infra --type=json -p='[{"op":"add","path":"/data/PROVISIONER_DATABASE_URL","value":""}]'). This is safe because the provisioner already handles cfg.ProvisionerDatabaseURL == "" in code: the pool is disabled and the log line "provisioner.pool_disabled — PROVISIONER_DATABASE_URL or AES_KEY not set" is emitted. A follow-up infra PR should either mark these secretKeyRefs optional: true or populate the real DSN.
  2. The slog handler emits commit_id="dev" in body logs even though /healthz correctly reports the real SHA. Root cause: commitID() in instant.dev/common/logctx reads the COMMIT_ID env var and falls back to "dev"; it does NOT read buildinfo.GitSHA. The same gotcha affected the api/worker side in api#40 ("api: switch obsstubs to common packages so ldflag actually patches the live buildinfo"), where the fix was to set the COMMIT_ID env var in the deployment. Recommend mirroring that in infra/k8s/provisioner/deployment.yaml in the next infra PR (out of scope here, since this PR targets the provisioner repo, not infra).
  3. No existing slog/NR scaffolding to merge at master. The provisioner repo was clean: only the buildinfo ldflag path (track 1, from PR #5, "obs: Dockerfile ldflags + smoke-buildinfo target (track 1/8)") was already in. No conflicts.

Container name in instant-provisioner Deployment: provisioner (single container; gRPC :50051, plus the new internal /healthz :8092 sidecar not yet declared as a containerPort).

Test plan

  • go test ./... -v passes locally (34 PASS / 0 FAIL / 0 SKIP)
  • Docker buildx build + push succeeds
  • k8s rollout completes (2/2 pods Running on new image)
  • curl /healthz returns commit_id matching the git SHA from the built image
  • Add containerPort: 8092 name: healthz to infra/k8s/provisioner/deployment.yaml in a follow-up infra PR
  • Add COMMIT_ID env var to the deployment in a follow-up infra PR so body logs match /healthz

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mastermanas805 and others added 2 commits May 12, 2026 23:14
Phase B2 of the observability rollout — the real source of truth for
the provisioner's slog default, New Relic agent init, nrgrpc unary
interceptor, trace-id stamper, and HTTP /healthz sidecar on :8092
was staged under api/provisioner/ as a reference scaffold. This PR
moves those four files into the canonical provisioner repo and
switches imports from the temporary instant.dev/provisioner/internal/
_obs_stubs/{buildinfo,logctx} stubs to the canonical
instant.dev/common/{buildinfo,logctx} packages. Same swap as PR #40
on the api repo.

Changes
-------
- main.go: install logctx slog default; chain ChainUnaryInterceptor
  with the existing auth interceptor plus nrgrpc + trace-id stamper;
  start :8092 HTTP /healthz sidecar; graceful shutdown of both
  surfaces on SIGTERM.
- internal/server/healthz.go: HealthzHandler returning
  {ok, service, commit_id, build_time, version} JSON, reading
  buildinfo from instant.dev/common/buildinfo so the existing
  Dockerfile -ldflags -X path lights it up.
- internal/server/healthz_test.go: response-shape + method-agnostic
  assertions.
- main_test.go: 11 new tests covering fail-open NR init, port
  non-collision with gRPC, trace-id propagation through the chained
  interceptor, and logctx round-trip via the common package's
  TraceIDFromContext getter (the stub used TraceID — renamed here).

Test counts
-----------
go test ./... -v: 34 PASS, 0 FAIL, 0 SKIP.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The New Relic Go agent v3.43.3 (pulled in by the observability scaffold)
declares `go 1.25` in its go.mod, which transitively forces this module
to `go 1.25.0`. The Dockerfile's previous `golang:1.24-alpine` builder
image rejected `go mod download` with `requires go >= 1.25.0`.

Bumped to `golang:1.25-alpine`. The api repo's Dockerfile was already
on 1.25 (api/go.mod has been on 1.25.0 since the obs-1 merge); this
brings the provisioner Dockerfile in line.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mastermanas805 merged commit 05c6cb0 into master on May 12, 2026