obs: provisioner service observability — slog default, NR agent, gRPC interceptor, healthz sidecar (track 5/8)#38

Merged
mastermanas805 merged 1 commit into master from obs/obs-3-provisioner-fresh on May 12, 2026

@mastermanas805
Member

Summary

Track 5 of 8 in the 2026-05-12 observability rollout. Stages observability wiring for the instant.dev/provisioner gRPC service as a self-contained provisioner/ subdir.

  • provisioner/main.go — slog default with logctx enrichment, NR Go agent fail-open init, gRPC server with nrgrpc.UnaryServerInterceptor composed with a trace-id injector, HTTP sidecar on :8092 serving /healthz.
  • provisioner/internal/server/healthz.go: handler returning {ok, service, commit_id, build_time, version}.
  • provisioner/internal/_obs_stubs/{buildinfo,logctx}/ — minimal vendored stubs of tracks 1 and 2 (deleted after those land and we switch to instant.dev/common/...).
  • provisioner/go.mod — separate module so NR deps don't pollute the api binary.
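A minimal sketch of what the /healthz handler described above could look like, using only the standard library. The variables `GitSHA`, `BuildTime`, and `Version`, the `healthzResponse` type, and the `probe` helper are assumptions made for illustration; the actual healthz.go and buildinfo stub may be shaped differently.

```go
package main

import (
	"encoding/json"
	"net/http"
	"net/http/httptest"
)

// Hypothetical stand-ins for the buildinfo stub. In a real build these would
// be overridden at link time, e.g. -ldflags "-X main.GitSHA=<sha>".
var (
	GitSHA    = "dev"
	BuildTime = "unknown"
	Version   = "dev"
)

// healthzResponse pins the JSON shape listed in the summary.
type healthzResponse struct {
	OK        bool   `json:"ok"`
	Service   string `json:"service"`
	CommitID  string `json:"commit_id"`
	BuildTime string `json:"build_time"`
	Version   string `json:"version"`
}

// HealthzHandler answers 200 with build metadata for any HTTP method.
func HealthzHandler(service string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusOK)
		// Encode appends a trailing newline after the JSON object.
		_ = json.NewEncoder(w).Encode(healthzResponse{
			OK:        true,
			Service:   service,
			CommitID:  GitSHA,
			BuildTime: BuildTime,
			Version:   Version,
		})
	})
}

// probe is a demo-only helper: it drives the handler in memory and returns
// the status code and response body.
func probe(h http.Handler, method string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, "/healthz", nil))
	return rec.Code, rec.Body.String()
}
```

In a sidecar, a handler like this would typically be mounted on its own `http.ServeMux` and served from a goroutine with `http.ListenAndServe(":8092", mux)`, so the health endpoint stays independent of the gRPC listener.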

Scope caveat (read before merging)

The 8 obs worktrees were spawned from the api repo, but the real provisioner is its own repo (InstaNode-dev/provisioner). The prompted file paths (provisioner/main.go, etc.) assumed a monorepo layout that doesn't exist here. Rather than reach into the sibling provisioner repo from this worktree (which would break filesystem isolation between the 8 parallel agents), this PR ships the changes as a staging subdir.

Follow-up: copy the four files verbatim into the real provisioner repo. The module path already lines up — that repo declares module instant.dev/provisioner. Stub imports flip from _obs_stubs/... to instant.dev/common/... once tracks 1+2 merge.

Foundation confirmed

OTel W3C TraceContext propagation across gRPC is already wired:

  • Client side (api): api/internal/provisioner/client.go uses grpc.WithStatsHandler(otelgrpc.NewClientHandler()).
  • Server side (real provisioner main.go): grpc.StatsHandler(otelgrpc.NewServerHandler()) already registered.

NR's distributed tracer reads the same W3C TraceContext OTel propagates, so NR spans share parent IDs with OTel spans. The trace-id stamping logic in composeTraceIDInjector builds on that.

Test plan

$ cd provisioner && go test ./... -count=1 -v

14 cases, all green:

  • TestStampTraceIDFromNR — load-bearing: NR trace ID propagates to logctx.WithTraceID on ctx
  • TestStampTraceIDFromNR_NoTxn — safe no-op when ctx has no NR txn
  • TestComposeTraceIDInjectorRunsInner — chains inner interceptor + handler exactly once
  • TestComposeTraceIDInjectorPropagatesNRTraceID — end-to-end: synthetic nrgrpc → handler sees trace_id
  • TestInitNewRelicFailOpenOnEmptyKey — empty NEW_RELIC_LICENSE_KEY returns nil app, no panic
  • TestInitNewRelicFailOpenOnInvalidKey — short license key returns nil app, no panic
  • TestHealthzReturnsCommitID: /healthz JSON shape pinned
  • TestHealthzHandler_ResponseShape — required keys present
  • TestHealthzHandler_AcceptsAnyMethod — GET/HEAD/POST all 200
  • TestHealthzPortNoCollisionWithGRPC: healthz port :8092 does not collide with gRPC port :50051
  • TestLogctxWithTraceIDRoundTrip — ctx setter/getter contract incl. empty-string no-op
  • TestNewGRPCServerWithNilNRApp — fail-open path doesn't panic
  • TestEnvAppNameOverride: NEW_RELIC_APP_NAME env wired
  • TestProcessSmoke — basic plumbing

Docker build + k8s deploy skipped per brief (track 6 owns k8s wiring; track 4 owns Dockerfiles).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Stages observability wiring for the instant.dev/provisioner gRPC service
as a self-contained subdir (provisioner/) so the same files can be copied
into the real provisioner repo (github.com/InstaNode-dev/provisioner) in
a follow-up PR.

Why scaffolded here. The 8-track observability rollout dispatched one
parallel agent per track from per-track worktrees of the api repo. The
provisioner is its own repo (not a subdir of api), so the prompted file
paths (provisioner/main.go etc.) don't map to a real path in this
worktree. Rather than touch the sibling provisioner repo from an
api-configured worktree (which would violate filesystem isolation between
parallel agents), this PR ships the changes as a clearly-marked staging
subdir.

What's in:
- provisioner/main.go — sets slog default (the current main.go has no
  default — the only service with this inconsistency); initialises the
  New Relic Go agent fail-open on empty/invalid license key; constructs
  the gRPC server with nrgrpc.UnaryServerInterceptor composed with a
  trace-id injector that stamps the propagated W3C trace ID onto ctx via
  logctx so downstream slog calls log with trace_id; starts an HTTP
  sidecar on :8092 serving GET /healthz (port chosen to not collide with
  gRPC 50051 or any other in-cluster port).
- provisioner/internal/server/healthz.go — tiny http.Handler returning
  {ok, service, commit_id, build_time, version}.
- provisioner/internal/_obs_stubs/buildinfo/ — minimal stub of what will
  become instant.dev/common/buildinfo once track 1 merges (-ldflags
  injects GitSHA/BuildTime/Version at link time).
- provisioner/internal/_obs_stubs/logctx/ — minimal stub of what will
  become instant.dev/common/logctx (slog handler that stamps service +
  commit_id + trace_id on every record).
- provisioner/go.mod — separate Go module so the api binary doesn't pull
  in NR deps, and so the files copy verbatim into the real provisioner
  repo which already declares `module instant.dev/provisioner`.

Tests (14 cases, all green):
- gRPC interceptor propagates NR trace_id to handler ctx via logctx
  (load-bearing — TestComposeTraceIDInjectorPropagatesNRTraceID)
- NR agent init fail-open on empty/invalid license key
- /healthz returns commit_id, build_time, version, service
- /healthz handler accepts GET/HEAD/POST without 405
- healthz port (8092) does not collide with gRPC port (50051)
- logctx round-trip for trace_id with empty-string no-op semantics
- newGRPCServer constructor handles nil NR app (fail-open)

OTel trace-context propagation across gRPC is already wired on both
ends: api side uses otelgrpc.NewClientHandler() in
internal/provisioner/client.go, and the real provisioner main.go already
registers otelgrpc.NewServerHandler(). NR's distributed tracer reads the
W3C TraceContext that OTel propagates, so the NR span IDs and OTel span
IDs share a parent — this is the foundation the trace-id-stamping logic
builds on.
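To make the shared-trace-ID point concrete: the W3C `traceparent` header has the form `version-traceid-parentid-flags`, and any tracer reading the same header recovers the same 32-hex-digit trace ID. The parser below is a minimal sketch for version `00` only; the function names are hypothetical and neither OTel's nor NR's actual implementation.

```go
package main

import (
	"errors"
	"strings"
)

var errBadTraceparent = errors.New("malformed traceparent header")

// isLowerHex reports whether s consists only of lowercase hex digits.
func isLowerHex(s string) bool {
	for _, c := range s {
		if !(c >= '0' && c <= '9') && !(c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// traceIDFromTraceparent extracts the trace ID from a version-00
// traceparent value such as
// "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01".
func traceIDFromTraceparent(header string) (string, error) {
	parts := strings.Split(strings.TrimSpace(header), "-")
	if len(parts) != 4 {
		return "", errBadTraceparent
	}
	version, traceID, parentID, flags := parts[0], parts[1], parts[2], parts[3]
	if version != "00" || len(traceID) != 32 || len(parentID) != 16 || len(flags) != 2 {
		return "", errBadTraceparent
	}
	if !isLowerHex(traceID) || !isLowerHex(parentID) || !isLowerHex(flags) {
		return "", errBadTraceparent
	}
	// All-zero trace and parent IDs are invalid per the specification.
	if strings.Trim(traceID, "0") == "" || strings.Trim(parentID, "0") == "" {
		return "", errBadTraceparent
	}
	return traceID, nil
}
```

The trace ID returned here is the value that the trace-id injector ultimately stamps onto ctx, which is why log lines can be correlated with spans from either tracer.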

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>