See what your coding agents actually received.
A local-first macOS observatory for agent instructions, tools, transcript evidence, and verified outbound LLM request facts.
The intended first-run experience is the native app:
- Download Agent-Observatory-0.3.0-macOS.dmg from Releases.
- Open the DMG and drag Agent Observatory.app to Applications.
- Open Agent Observatory.app from Applications.
- Start with the built-in demo feed. No account, proxy, or trust setup is required to see the product surface.
- When ready, use the app's onboarding panel to turn on live capture. The app walks through the local engine install, macOS NetworkExtension approval, and login-keychain trust step.
The native app currently targets the macOS 26 preview because it uses the new
Liquid Glass SwiftUI surface. Public release artifacts are Developer ID signed,
hardened-runtime, notarized, stapled, and checked by make release-qa and
make v03-safe-capture-qa before publication.
Live capture has two explicit steps: install the local engine, then enable the macOS system extension (NetworkExtension transparent proxy). On first enable, macOS asks you to approve Agent Observatory in System Settings → General → Login Items & Extensions → Network Extensions, then the app trusts the local CA in your login keychain. Approve both, restart any already-running agent shells, and newly launched Claude/Codex sessions are captured normally. (System extensions only activate from /Applications, so install the app there first.)
No Xcode required for the backend-only smoke test. Requires Go 1.26:
cd backend
go run ./cmd/agents monitor --demo
The command starts the localhost API, live SSE stream, local proxy, and sanitized demo data. Open the native app later for the full visual surface.
The Go engine and CLI are separate and portable across Go-supported platforms.
Coding-agent context is mostly invisible. You usually discover a bad instruction stack, missing tool, stale skill, or provider mismatch only after the agent makes a strange decision.
Agent Observatory turns that hidden state into a product surface:
| Before | With Observatory |
|---|---|
| "The agent ignored my repo rules." | See whether the expected instructions were present. |
| "Why didn't it use the right tool?" | Compare expected tools against transcript and wire evidence. |
| "The transcript says one thing, the request may say another." | Detect conflicts between observed transcript facts and verified outbound requests. |
| "I need to debug this locally, not upload prompts to another service." | Run a local app and localhost engine with derived-fact persistence only. |
| Surface | What you get |
|---|---|
| Sessions | Recent local Claude, Codex, and Antigravity sessions from on-disk transcripts. |
| Expected context | Instructions, skills, and tools resolved for each workspace. |
| Observed evidence | Facts found in local CLI transcripts. |
| Verified evidence | Facts captured from outbound LLM requests through the local proxy. |
| Drift | Expected context that was missing from a complete source. |
| Conflicts | Transcript and wire evidence disagreeing for the same session/request. |
| Live feed | A realtime stream of agent requests as they leave the machine. |
Observatory separates what should be present from what was actually observed or verified.
flowchart LR
A["AGENTS.md<br/>skills<br/>tool registry"] --> B["Expected context"]
C["Local transcripts"] --> D["Observed facts"]
E["Local HTTPS proxy"] --> F["Verified facts"]
B --> G["Fact merge"]
D --> G
F --> G
G --> H["expected / observed / verified<br/>drift / conflict / gap"]
H --> I["SwiftUI app<br/>CLI<br/>localhost API"]
The backend is the source of truth. The app, CLI, and API all render the same fact-level model.
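As a rough illustration of the fact merge in the diagram above, the sketch below classifies a single fact from its expected, observed, and verified views. The `Evidence` type, the `Classify` function, and the exact precedence rules are assumptions made for this example; the backend's real merge logic is not shown in this README and may differ.

```go
package main

import "fmt"

// Evidence describes what one source says about a single fact
// (for example, "tool X was offered to the agent"). Hypothetical type.
type Evidence struct {
	Present  bool // the fact was found in this source
	Complete bool // the source is known to list everything it received
}

// Classify merges the expected, observed (transcript), and verified (wire)
// views of one fact into a label. The rules are illustrative assumptions.
func Classify(expected bool, observed, verified Evidence) string {
	// Two complete sources that disagree about the same fact are a conflict.
	if observed.Complete && verified.Complete && observed.Present != verified.Present {
		return "conflict"
	}
	switch {
	case verified.Present:
		return "verified"
	case observed.Present:
		return "observed"
	case !expected:
		return "none" // nothing expected and nothing seen
	case observed.Complete || verified.Complete:
		return "drift" // expected, but missing from a complete source
	default:
		return "gap" // expected, but no source is complete enough to decide
	}
}

func main() {
	full := Evidence{Present: true, Complete: true}
	missing := Evidence{Present: false, Complete: true}
	unknown := Evidence{} // incomplete source, fact not seen

	fmt.Println(Classify(true, full, full))       // verified
	fmt.Println(Classify(true, missing, unknown)) // drift
	fmt.Println(Classify(true, unknown, unknown)) // gap
	fmt.Println(Classify(true, full, missing))    // conflict
}
```

The point of the sketch is the distinction between drift and gap: a missing fact only counts as drift when some source is complete enough to prove the absence.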
The public artifact is a DMG:
open Agent-Observatory-0.3.0-macOS.dmg
Drag Agent Observatory.app to Applications, then open it. The app starts with demo data. Drag-installing the app does not enable live capture; live capture is an explicit second step inside onboarding.
The release also includes a zip fallback for environments where DMGs are inconvenient.
The app bundle contains the agents helper either way.
There are two evidence levels:
| Level | Meaning | Setup |
|---|---|---|
| Observed | Read from local CLI transcripts. | Passive; no proxy required. |
| Verified | Captured from outbound LLM requests. | One explicit local install. |
Use the onboarding panel to enable live capture after you have read the trust explanation. It installs a local launchd daemon and a stable local CA, trusts that CA in your login keychain, and enables a NetworkExtension transparent proxy that routes only allowlisted LLM-provider flows to the local proxy.
Then use Claude, Codex, and other agents normally. The system extension diverts
only provider traffic to Observatory; everything else is untouched. Full body
capture is source-aware: supported, trust-ready coding agents are locally
TLS-inspected, while unknown or stale-trust tools are tunneled opaquely so they
keep working and appear as pass-through coverage events. If source metadata is
missing during an upgrade or extension mismatch, the installed daemon also
treats that as pass-through. There is no global
HTTPS_PROXY hijack. The only env vars the install sets are the additive
NODE_EXTRA_CA_CERTS (Node/Claude Code/Gemini CLI) and
CODEX_CA_CERTIFICATE (Codex) — because those runtimes don't all read the macOS
keychain. Both only add Observatory's CA without replacing the system roots;
Bedrock via the AWS Go SDK reads the keychain directly and needs nothing.
No wrapper command in the primary flow. No managed launch. No browser extension. The onboarding panel also exposes the reset command; the equivalent CLI command is:
agents uninstall
Uninstall reverses the setup and is covered by a looped fake-home QA harness.
Power users can install and inspect live capture from the standalone CLI:
agents install
agents status
agents uninstall
For the app release path, onboarding provides fully-qualified commands that point at the helper bundled inside Agent Observatory.app. A separate CLI install is optional, not required for the GUI path.
Native app requirements:
- macOS 26+
- Xcode 26+
- XcodeGen
- Go 1.26+
Build the local release artifacts:
make release
open dist/Agent\ Observatory.app
The app opens with a first-run onboarding surface and starts in Demo mode so the live feed is immediately useful before any proxy or trust setup. The onboarding path lets users explore sample evidence first, then copy the live capture install command when they are ready. Use the menu bar extra to switch Demo/Live mode, reopen onboarding, refresh sessions, show the main window, or quit.
For inner-loop development only:
make app-build
open /tmp/observatory-dd/Build/Products/Debug/Agent\ Observatory.app
Run the QA targets:
make qa
make v03-safe-capture-qa
Together these run backend build, vet, unit tests, race tests, install lifecycle
QA, the macOS app build, and the 0.3 safe-capture policy smoke. The
safe-capture QA starts a temporary daemon and proves one supported-source
provider request is captured while one unknown provider-bound client is tunneled
and reported in /api/coverage.
This is a local app. The engine binds to 127.0.0.1; there is no hosted service
and no cloud database.
Yes, verified capture uses a local MITM hop for inspected provider requests. That explicit local interception is what makes HTTPS body inspection possible.
Capture ingress is a macOS NetworkExtension transparent proxy (a signed, user-approved system extension). To inspect by hostname, the extension's kernel rule takes all outbound TCP :443 flows into the (local, user-space) system extension, peeks the TLS ClientHello SNI, and then:
- allowlisted provider SNI from a supported, trust-ready source → relays the flow to the local proxy, which terminates TLS and parses the request;
- allowlisted provider SNI from an unknown or stale-trust source → relays an opaque tunnel through the local proxy with no TLS termination; or
- anything else → immediately direct-relays to the real destination with no TLS termination and nothing persisted — the flow is untouched in every way that matters (its bytes are copied through; we never see plaintext).
So there is no global HTTPS_PROXY/HTTP_PROXY hijack, and provider flows are
decrypted only after both the host allowlist and source/trust policy agree.
(Loopback/RFC1918 are excluded at the kernel rule; UDP/QUIC is never taken.) For
agents to accept the local proxy's certificates on the full-capture path, Observatory
installs its CA into your login keychain (per-user, never the System
keychain) at the moment you approve the system extension; agents uninstall (and
disabling capture) removes that trust.
Honest caveats on trust — runtimes resolve roots differently, so a single keychain root isn't enough:
- Claude Code (Node/Bun) doesn't read the macOS keychain by default → the install sets the additive NODE_EXTRA_CA_CERTS.
- Codex CLI talks over WebSockets (wss://…/responses), which can't be usefully inspected. Observatory replies 426 Upgrade Required to that upgrade, which Codex maps to an instant HTTP fallback on the same endpoint, and the HTTP request is fully captured. For that HTTP path's trust, the install sets the additive CODEX_CA_CERTIFICATE (Codex's own custom-CA var), since Codex uses rustls/native-tls rather than the macOS keychain.
- Bedrock via the AWS Go SDK reads the login keychain directly → no env var.
- Gemini CLI is a Node CLI candidate → source attribution plus current NODE_EXTRA_CA_CERTS is required before full capture.
Both env vars only add Observatory's CA; they never replace the system roots,
so unrelated HTTPS is unaffected. The install sets no HTTPS_PROXY/
HTTP_PROXY and no root-replacing SSL_CERT_FILE/AWS_CA_BUNDLE: routing is
the extension's job. Caveat: env vars only reach processes that actually inherit
them. Agent Observatory 0.3 treats missing or stale runtime trust as a bypass
condition: the provider request is tunneled opaquely instead of being locally
TLS-terminated. If a client still rejects an Observatory leaf, the capture pause
circuit breaker passes future provider traffic through.
sequenceDiagram
participant Agent as Agent process
participant Proxy as Observatory local proxy
participant Provider as OpenAI / Anthropic / Bedrock
Agent->>Proxy: provider flow routed by<br/>NetworkExtension transparent proxy
Note over Proxy: Extract derived facts:<br/>prompt length, instruction match,<br/>endpoint, tool names
Proxy->>Provider: Normal upstream TLS<br/>using system roots
Provider-->>Proxy: Provider response
Proxy-->>Agent: Response forwarded
Important boundaries:
- Observatory's CA is local to the client-to-proxy leg for inspected hosts.
- The CA is installed into your per-user login keychain (behind the system extension approval), never the macOS System keychain, and is removed on uninstall.
- Only allowlisted provider flows are ever TLS-terminated; non-provider :443 flows are direct-relayed after the SNI peek, so the CA is never exercised against — and the proxy never sees plaintext for — non-provider hosts.
- A stable CA certificate and private key are stored under Observatory's local state directory (0600 key, per-user) so the ambient daemon can restart without breaking trust. Any process running as you could read that key while capture is installed: a same-user local risk, removed by agents uninstall.
- Upstream provider TLS still uses normal system trust.
- In memory, the proxy parses the request body (up to an 8 MiB cap — larger bodies are forwarded unparsed) and keeps a bounded ring (most-recent 500 captures) of the assembled prompt + tool text to drive the live feed and instruction matching. On disk, only derived facts are persisted — prompt length, endpoint, tool names — never raw prompt bodies. Instruction matching is computed against your resolved local instruction files.
| Surface | Status | Notes |
|---|---|---|
| Claude transcript discovery | Observed | Reads local JSONL transcript context and complete tool catalogs when available. |
| Codex transcript discovery | Observed | Reads local session JSONL; tool evidence is positive-only when only invoked tools are present. |
| Antigravity transcript discovery | Partial | Discovers sessions from history; opaque .pb conversation bodies are not parsed. |
| OpenAI chat/responses body shapes | Verified parser coverage | Covered by backend proxy parser tests. |
| Anthropic Messages body shape | Verified parser coverage | Covered by native Anthropic proxy-path test. |
| Bedrock Anthropic body shape | Verified parser coverage | Covered by backend proxy parser tests. |
| Install-once ambient capture | Local QA | Install/status/uninstall are covered by repeated fake-home lifecycle tests. |
| Source-aware provider capture | Local live-provider QA | make v03-safe-capture-qa proves supported-source capture plus unknown-source pass-through. |
| Command | Purpose |
|---|---|
| agents install | Install ambient capture: proxy daemon, stable local CA, and additive per-runtime CA env (NODE_EXTRA_CA_CERTS, CODEX_CA_CERTIFICATE). |
| agents trust install | Trust the local CA in your login keychain (run behind the approved extension). |
| agents status | Show installed, partial, or absent setup state. |
| agents uninstall | Fully reverse the install. |
| agents serve | Run the localhost JSON API only (default subcommand; no proxy). |
| agents monitor --demo | Run the API, SSE stream, proxy, and sample feed. |
| agents sessions --limit 20 | Print recent sessions and evidence marks. |
| agents context explain /path/to/project | Show resolved context for a workspace. |
| agents doctor wire | Report verified-capture capability per runtime. |
| /api/coverage | Local daemon endpoint with capture/bypass counts and recent pass-through reasons. |
make release
Artifacts are written to dist/:
- Agent-Observatory-0.3.0-macOS.dmg
- Agent-Observatory-0.3.0-macOS.zip
- Agent Observatory.app
- agents
- SHA256SUMS
make release is intentionally headless: it builds the signed app, zip,
DMG, and checksums without running Finder AppleScript, so it does not steal GUI
focus during local QA. The DMG is functional but plain by default. If you
explicitly want Finder window background/icon positioning, run:
DMG_STYLE=polished make release-polished
That polished path uses Finder layout metadata and can bring a Finder window to the foreground.
For public distribution, notarize and staple the app and DMG after make release:
NOTARY_PROFILE=<notarytool-keychain-profile> make notarize
# or:
APP_STORE_CONNECT_KEY_ID=<key-id> \
APP_STORE_CONNECT_API_KEY_P8=/path/to/AuthKey_<key-id>.p8 \
APP_STORE_CONNECT_ISSUER_ID=<issuer-uuid> \
make notarize
# or:
MACOS_NOTARY_APPLE_ID=<apple-id> \
MACOS_NOTARY_APP_PASSWORD=<app-specific-password> \
MACOS_NOTARY_TEAM_ID=<team-id> \
make notarize
make v03-safe-capture-qa
make release-qa
There is also a manual GitHub Actions release workflow, macOS Release, for
building, notarizing, and optionally staging a draft release on a macOS runner
once the repository release secrets in docs/release-publication-runbook.md are
configured.
The older v0.2 finalize helpers remain in scripts/ for the published v0.2
release. They are not 0.3 launch gates.
The DMG is the primary user-facing artifact. It contains
Agent Observatory.app and an Applications symlink for the normal macOS
drag-install flow. The zip is a fallback. A build is publication-ready only
after make v03-safe-capture-qa passes and make release-qa passes against the
stapled app and DMG.
Security notes for the local CA, prompt-data handling, and vulnerability reports are in SECURITY.md.
Capture mechanism:
- HTTP/3 (QUIC) is not captured. The extension takes TCP :443 only; provider HTTP/3 falls back to TCP in practice, but a QUIC-only client would be missed.
- ECH / no-SNI flows fail open. If a provider enables Encrypted ClientHello, the SNI is unreadable and the flow is direct-relayed (not captured), never broken.
- Inspected hosts are proxied over HTTP/1.1. Mainstream provider SDKs accept this; a hypothetical HTTP/2-only client is unsupported on the inspected path.
- The SNI peek reassembles across TCP segments but assumes a single-record ClientHello; an unusually large multi-record hello fails open (not captured).
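The single-record assumption in the last caveat above can be checked from the first nine bytes of a flow: the 5-byte TLS record header followed by the 4-byte handshake header. The sketch below shows that check only. It is not the extension's parser, and `helloFitsOneRecord` is a hypothetical name.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	recordTypeHandshake  = 22 // TLS record content type: handshake
	handshakeClientHello = 1  // handshake message type: ClientHello
	recordHeaderLen      = 5  // type (1) + legacy version (2) + length (2)
	handshakeHeaderLen   = 4  // type (1) + length (3)
)

// helloFitsOneRecord reports whether buf starts with a handshake record that
// holds a complete ClientHello. ok is false when there are not enough bytes
// yet to decide, so the caller should keep reading.
func helloFitsOneRecord(buf []byte) (fits, ok bool) {
	if len(buf) < recordHeaderLen+handshakeHeaderLen {
		return false, false
	}
	if buf[0] != recordTypeHandshake || buf[5] != handshakeClientHello {
		return false, true // not a ClientHello: treat as fail-open
	}
	recLen := int(binary.BigEndian.Uint16(buf[3:5]))
	// The handshake length is a 24-bit big-endian integer.
	helloLen := int(buf[6])<<16 | int(buf[7])<<8 | int(buf[8])
	return handshakeHeaderLen+helloLen <= recLen, true
}

func main() {
	// Record of 104 bytes carrying a 100-byte ClientHello body.
	single := []byte{22, 3, 1, 0x00, 0x68, 1, 0x00, 0x00, 0x64}
	// Record of 16384 bytes, but the ClientHello claims 20480 bytes.
	multi := []byte{22, 3, 1, 0x40, 0x00, 1, 0x00, 0x50, 0x00}

	fits, ok := helloFitsOneRecord(single)
	fmt.Println(fits, ok) // true true
	fits, ok = helloFitsOneRecord(multi)
	fmt.Println(fits, ok) // false true
}
```

When the check returns false, the behavior described above is to relay the flow directly rather than attempt to reassemble multiple records.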
Per-runtime CA trust (NODE_EXTRA_CA_CERTS, CODEX_CA_CERTIFICATE):
- It's additive trust, not routing — and it only helps newly launched processes that inherit the env. In 0.3, provider flows from supported runtimes with missing or stale trust are tunneled opaquely instead of full-captured. That is a missed capture, not a broken agent.
- Node/Bun (Claude Code): a client that passes an explicit ca: option, sanitizes its env, or embeds its own runtime won't pick up NODE_EXTRA_CA_CERTS. Bun honors only its own CA store for some operations.
- Hatch/OpenCode: Hatch's MCP path keeps a persistent opencode serve process. If that server started before Observatory trust existed, restart Hatch MCP/OpenCode before enabling capture. agents doctor wire flags running Hatch OpenCode servers with missing or stale trust.
- Codex CLI: its primary wss://…/responses transport can't be inspected, so the proxy replies 426 and Codex falls back to its HTTP endpoint (which is captured). HTTP-path trust uses the additive CODEX_CA_CERTIFICATE. Other provider WebSockets with no HTTP fallback (e.g. OpenAI Realtime /v1/realtime) are relayed untouched, not captured, so they keep working.
- Gemini CLI: generativelanguage.googleapis.com generateContent requests are parsed when the source is attributable to Gemini's Node process with current NODE_EXTRA_CA_CERTS; otherwise they are tunneled.
- Bedrock (AWS Go SDK): reads the login keychain directly, needs no env.
- These vars add a trusted root for inheriting processes; removed by agents uninstall.
Scope:
- macOS 26 and Xcode 26 are required for the native Liquid Glass app.
- Verified capture requires the system-extension approval + login-keychain trust.
- Antigravity transcript contents are discovery-only when stored in opaque .pb files.
- This release observes context. It does not yet manage canonical context upstream for every agent runtime.
| Next | Why it matters |
|---|---|
| Short demo clip | Helps people understand the live feed before cloning. |
| Broader runtime notes | Clarifies install-once capture behavior across agent stacks. |
| HTTP/3 + multi-record SNI capture | Closes the remaining capture-coverage gaps. |
| Canonical context management | Turns the observatory into the control plane after observability proves demand. |
make backend-qa
make v03-safe-capture-qa
make app-build
make release
make release-layout-qa
NOTARY_PROFILE=<notarytool-keychain-profile> make notarize
# or:
APP_STORE_CONNECT_KEY_ID=<key-id> \
APP_STORE_CONNECT_API_KEY_P8=/path/to/AuthKey_<key-id>.p8 \
APP_STORE_CONNECT_ISSUER_ID=<issuer-uuid> \
make notarize
# or:
MACOS_NOTARY_APPLE_ID=<apple-id> \
MACOS_NOTARY_APP_PASSWORD=<app-specific-password> \
MACOS_NOTARY_TEAM_ID=<team-id> \
make notarize
make v03-safe-capture-qa
make release-qa
Detailed release and planning notes live under docs/, including
docs/v0.3-safe-capture-spec.md,
docs/v0.3-launch-readiness.md,
docs/ne-reset-runbook.md,
docs/release-publication-runbook.md,
and docs/release-v0.3-draft.md. Historical
v0.2 records remain in docs/v0.2-readiness.md and
docs/release-v0.2-draft.md.
