provisioner: relocate observability scaffolding from api repo#6
Merged
Merged
Conversation
Phase B2 of the observability rollout — the real source of truth for
the provisioner's slog default, New Relic agent init, nrgrpc unary
interceptor, trace-id stamper, and HTTP /healthz sidecar on :8092
was staged under api/provisioner/ as a reference scaffold. This PR
moves those four files into the canonical provisioner repo and
switches imports from the temporary instant.dev/provisioner/internal/
_obs_stubs/{buildinfo,logctx} stubs to the canonical
instant.dev/common/{buildinfo,logctx} packages. Same swap as PR #40
on the api repo.
Changes
-------
- main.go: install logctx slog default; chain ChainUnaryInterceptor
with the existing auth interceptor plus nrgrpc + trace-id stamper;
start :8092 HTTP /healthz sidecar; graceful shutdown of both
surfaces on SIGTERM.
- internal/server/healthz.go: HealthzHandler returning
{ok, service, commit_id, build_time, version} JSON, reading
buildinfo from instant.dev/common/buildinfo so the existing
Dockerfile -ldflags -X path lights it up.
- internal/server/healthz_test.go: response-shape + method-agnostic
assertions.
- main_test.go: 11 new tests covering fail-open NR init, port
non-collision with gRPC, trace-id propagation through the chained
interceptor, and logctx round-trip via the common package's
TraceIDFromContext getter (the stub used TraceID — renamed here).
Test counts
-----------
go test ./... -v: 34 PASS, 0 FAIL, 0 SKIP.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The New Relic Go agent v3.43.3 (pulled in by the observability scaffold) declares `go 1.25` in its go.mod, which transitively forces this module to `go 1.25.0`. The Dockerfile's previous `golang:1.24-alpine` builder image rejected `go mod download` with `requires go >= 1.25.0`. Bumped to `golang:1.25-alpine`. The api repo's Dockerfile was already on 1.25 (api/go.mod has been on 1.25.0 since the obs-1 merge); this brings the provisioner Dockerfile in line. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Phase B2 of the observability rollout — the real source of truth for the provisioner's slog default, New Relic agent init, nrgrpc unary interceptor, trace-id stamper, and HTTP /healthz sidecar on :8092 was staged under
api/provisioner/in the api repo as a reference scaffold. This PR moves those four files into the canonical provisioner repo (here) and switches imports from the temporaryinstant.dev/provisioner/internal/_obs_stubs/{buildinfo,logctx}stubs to the canonicalinstant.dev/common/{buildinfo,logctx}packages — the same swap as InstaNode-dev/api#40.Pair with the deployment manifest update in
infra/k8s/provisioner/deployment.yaml(lives in the api repo) to addcontainerPort: 8092namedhealthzonce the new image is wired into the rollout pipeline.Changes
logctxslog default; replace singleUnaryInterceptorwithChainUnaryInterceptor(auth, composeTraceIDInjector(nrgrpc.UnaryServerInterceptor(nrApp))); start:8092HTTP/healthzsidecar; graceful shutdown of both surfaces on SIGTERM.HealthzHandlerreturning{ok, service, commit_id, build_time, version}JSON, reading buildinfo frominstant.dev/common/buildinfoso the existing Dockerfile-ldflags -Xpath lights it up.TraceIDFromContextgetter (the stub usedTraceID— renamed here).golang:1.24-alpinetogolang:1.25-alpine— the New Relic Go agent v3.43.3 declaresgo 1.25in its go.mod, which transitively forces this module togo 1.25.0.github.com/newrelic/go-agent/v3andgithub.com/newrelic/go-agent/v3/integrations/nrgrpc.Test counts
11 of the 34 are the new tests in
main_test.go; 2 of the 34 are the new internalhealthz_test.go; the remaining 21 are pre-existing tests unaffected by the relocation.Deploy + verify-live evidence
Image built and pushed:
ghcr.io/mastermanas805/instant-provisioner:v1.0.0-obs-cf456a9GIT_SHA=cf456a9 BUILD_TIME=2026-05-12T17:46:04Z VERSION=v1.0.0-obsRollout:
/healthzcurl evidence:commit_idmatches the git SHA. The Dockerfile-ldflags -X instant.dev/common/buildinfo.GitSHA=${GIT_SHA}path is end-to-end live.Pushback / things flagged
instant-infra/instant-infra-secretswas missing thePROVISIONER_DATABASE_URLkey referenced by the deployment env (the old pods were running because they were created before the key was removed; pods don't re-validate secretKeyRefs after start). Patched the secret to add an empty value (kubectl patch secret instant-infra-secrets -n instant-infra --type=json -p='[{"op":"add","path":"/data/PROVISIONER_DATABASE_URL","value":""}]') — the provisioner'scfg.ProvisionerDatabaseURL == ""path already exists in code (pool disabled, log lineprovisioner.pool_disabled — PROVISIONER_DATABASE_URL or AES_KEY not set). Belongs in a follow-up infra PR to mark these secretKeyRefsoptional: trueor to populate the real DSN.commit_id="dev"in body logs even though/healthzcorrectly reports the real SHA. Root cause:instant.dev/common/logctx'scommitID()reads from theCOMMIT_IDenv var, falling back to"dev"— it does NOT read frombuildinfo.GitSHA. Same gotcha that affected the api/worker side in api: switch obsstubs to common packages so ldflag actually patches the live buildinfo api#40; the api repo's fix was to setCOMMIT_IDenv in the deployment. Recommend mirroring that ininfra/k8s/provisioner/deployment.yamlin the next infra PR (out of scope here since this PR is to the provisioner repo, not infra).Container name in
instant-provisionerDeployment:provisioner(single container; gRPC :50051, plus the new internal /healthz :8092 sidecar not yet declared as a containerPort).Test plan
go test ./... -vpasses locally (34 PASS / 0 FAIL / 0 SKIP)curl /healthzreturns commit_id matching the git SHA from the built imagecontainerPort: 8092 name: healthztoinfra/k8s/provisioner/deployment.yamlin a follow-up infra PRCOMMIT_IDenv var to the deployment in a follow-up infra PR so body logs match/healthzCo-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com