Agent Observatory

See what your coding agents actually received.

A local-first macOS observatory for agent instructions, tools, transcript evidence, and verified outbound LLM request facts.

Try It

The intended first-run experience is the native app:

Download Agent-Observatory-0.3.0-macOS.dmg from Releases.
Open the DMG and drag Agent Observatory.app to Applications.
Open Agent Observatory.app from Applications.
Start with the built-in demo feed. No account, proxy, or trust setup is required to see the product surface.
When ready, use the app's onboarding panel to turn on live capture. The app walks through the local engine install, macOS NetworkExtension approval, and login-keychain trust step.

The native app currently targets the macOS 26 preview because it uses the new Liquid Glass SwiftUI surface. Public release artifacts are signed with a Developer ID, built with the hardened runtime, notarized, stapled, and checked by make release-qa and make v03-safe-capture-qa before publication.

Enabling live capture

Live capture has two explicit steps: install the local engine, then enable the macOS system extension (a NetworkExtension transparent proxy). On first enable, macOS asks you to approve Agent Observatory in System Settings → General → Login Items & Extensions → Network Extensions, and then the app trusts the local CA in your login keychain. Approve both prompts, restart any agent shells that were already running, and newly launched Claude/Codex sessions are captured normally. (System extensions only activate from /Applications, so install the app there first.)

No Xcode is required for the backend-only smoke test, which needs Go 1.26+:

cd backend
go run ./cmd/agents monitor --demo

The command starts the localhost API, live SSE stream, local proxy, and sanitized demo data. Open the native app later for the full visual surface.

The Go engine and CLI are separate and portable across Go-supported platforms.

Why This Exists

Coding-agent context is mostly invisible. You usually discover a bad instruction stack, missing tool, stale skill, or provider mismatch only after the agent makes a strange decision.

Agent Observatory turns that hidden state into a product surface:

Before	With Observatory
"The agent ignored my repo rules."	See whether the expected instructions were present.
"Why didn't it use the right tool?"	Compare expected tools against transcript and wire evidence.
"The transcript says one thing, the request may say another."	Detect conflicts between observed transcript facts and verified outbound requests.
"I need to debug this locally, not upload prompts to another service."	Run a local app and localhost engine with derived-fact persistence only.

What It Shows

Surface	What you get
Sessions	Recent local Claude, Codex, and Antigravity sessions from on-disk transcripts.
Expected context	Instructions, skills, and tools resolved for each workspace.
Observed evidence	Facts found in local CLI transcripts.
Verified evidence	Facts captured from outbound LLM requests through the local proxy.
Drift	Expected context that was missing from a complete source.
Conflicts	Transcript and wire evidence disagreeing for the same session/request.
Live feed	A realtime stream of agent requests as they leave the machine.

The Core Idea

Observatory separates what should be present from what was actually observed or verified.

flowchart LR
    A["AGENTS.md<br/>skills<br/>tool registry"] --> B["Expected context"]
    C["Local transcripts"] --> D["Observed facts"]
    E["Local HTTPS proxy"] --> F["Verified facts"]
    B --> G["Fact merge"]
    D --> G
    F --> G
    G --> H["expected / observed / verified<br/>drift / conflict / gap"]
    H --> I["SwiftUI app<br/>CLI<br/>localhost API"]

The backend is the source of truth. The app, CLI, and API all render the same fact-level model.
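To make the vocabulary concrete, here is a minimal sketch of classifying one fact (for example, one tool name) from the three sources. The types and the rule ordering are assumptions for illustration, not Observatory's merge logic; in particular, reading "gap" as "expected, but no conclusive evidence" is a guess. The `ObservedComplete` flag models the difference between a full transcript tool catalog and positive-only evidence.

```go
package main

// Evidence is what one source says about a single fact (hypothetical type).
type Evidence int

const (
	Unknown Evidence = iota // source silent or unavailable
	Present
	Absent
)

// Fact combines expectation with transcript and wire evidence (hypothetical shape).
type Fact struct {
	Expected         bool     // resolved from AGENTS.md, skills, tool registry
	Observed         Evidence // from local transcripts
	Verified         Evidence // from captured outbound requests
	ObservedComplete bool     // transcript listed the full catalog, so Absent is meaningful
}

// Classify returns one status label using illustrative rules.
func Classify(f Fact) string {
	obs := f.Observed
	if obs == Absent && !f.ObservedComplete {
		obs = Unknown // positive-only transcript: absence proves nothing
	}
	// Transcript and wire disagree: surface that first.
	if obs != Unknown && f.Verified != Unknown && obs != f.Verified {
		return "conflict"
	}
	switch {
	case f.Verified == Present:
		return "verified"
	case obs == Present:
		return "observed"
	case f.Expected && (f.Verified == Absent || obs == Absent):
		return "drift" // expected, but missing from a complete source
	case f.Expected:
		return "gap"
	default:
		return "none"
	}
}
```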

Install Paths

Native App

The public artifact is a DMG:

open Agent-Observatory-0.3.0-macOS.dmg

Drag Agent Observatory.app to Applications, then open it. The app starts with demo data. Drag-installing the app does not enable live capture; live capture is an explicit second step inside onboarding.

Releases also include a zip fallback for environments where DMGs are inconvenient. Either way, the app bundle contains the agents helper.

Live Capture

There are two evidence levels:

Level	Meaning	Setup
Observed	Read from local CLI transcripts.	Passive; no proxy required.
Verified	Captured from outbound LLM requests.	One explicit local install.

Use the onboarding panel to enable live capture after you have read the trust explanation. It installs a local launchd daemon and a stable local CA, trusts that CA in your login keychain, and enables a NetworkExtension transparent proxy that routes only allowlisted LLM-provider flows to the local proxy.

Then use Claude, Codex, and other agents normally. The system extension diverts only provider traffic to Observatory; everything else is untouched. Full body capture is source-aware: supported, trust-ready coding agents are locally TLS-inspected, while unknown or stale-trust tools are tunneled opaquely so they keep working, and they appear as pass-through coverage events. If source metadata is missing, for example during an upgrade or an extension mismatch, the installed daemon also treats the flow as pass-through.

There is no global HTTPS_PROXY hijack. The only environment variables the install sets are the additive NODE_EXTRA_CA_CERTS (Node: Claude Code, Gemini CLI) and CODEX_CA_CERTIFICATE (Codex), because those runtimes don't all read the macOS keychain. Both only add Observatory's CA without replacing the system roots. Bedrock via the AWS Go SDK reads the keychain directly and needs nothing.

No wrapper command in the primary flow. No managed launch. No browser extension. The onboarding panel also exposes the reset command; the equivalent CLI command is:

agents uninstall

Uninstall reverses the setup and is covered by a looped fake-home QA harness.

Optional CLI

Power users can install and inspect live capture from the standalone CLI:

agents install
agents status
agents uninstall

For the app release path, onboarding provides fully-qualified commands that point at the helper bundled inside Agent Observatory.app. A separate CLI install is optional, not required for the GUI path.

Build From Source

Native app requirements:

macOS 26+
Xcode 26+
XcodeGen
Go 1.26+

Build the local release artifacts:

make release
open dist/Agent\ Observatory.app

The app opens with a first-run onboarding surface and starts in Demo mode so the live feed is immediately useful before any proxy or trust setup. The onboarding path lets users explore sample evidence first, then copy the live capture install command when they are ready. Use the menu bar extra to switch Demo/Live mode, reopen onboarding, refresh sessions, show the main window, or quit.

For inner-loop development only:

make app-build
open /tmp/observatory-dd/Build/Products/Debug/Agent\ Observatory.app

Full Local QA

make qa
make v03-safe-capture-qa

Together these run backend build, vet, unit tests, race tests, install lifecycle QA, the macOS app build, and the 0.3 safe-capture policy smoke. The safe-capture QA starts a temporary daemon and proves one supported-source provider request is captured while one unknown provider-bound client is tunneled and reported in /api/coverage.

Trust Model

This is a local app. The engine binds to 127.0.0.1; there is no hosted service and no cloud database.

Yes, verified capture uses a local MITM hop for inspected provider requests. That explicit local interception is what makes HTTPS body inspection possible.

Capture ingress is a macOS NetworkExtension transparent proxy (a signed, user-approved system extension). To inspect by hostname, the extension's kernel rule takes all outbound TCP :443 flows into the (local, user-space) system extension, peeks the TLS ClientHello SNI, and then:

allowlisted provider SNI from a supported, trust-ready source → relays the flow to the local proxy, which terminates TLS and parses the request;
allowlisted provider SNI from an unknown or stale-trust source → relays an opaque tunnel through the local proxy with no TLS termination; or
anything else → immediately direct-relays to the real destination with no TLS termination and nothing persisted — the flow is untouched in every way that matters (its bytes are copied through; we never see plaintext).

So there is no global HTTPS_PROXY/HTTP_PROXY hijack, and provider flows are decrypted only after both the host allowlist and the source/trust policy agree. (Loopback and RFC 1918 destinations are excluded at the kernel rule; UDP/QUIC is never taken.) For agents to accept the local proxy's certificates on the full-capture path, Observatory installs its CA into your login keychain (per-user, never the System keychain) at the moment you approve the system extension. Running agents uninstall, or disabling capture, removes that trust.

Honest caveats on trust — runtimes resolve roots differently, so a single keychain root isn't enough:

Claude Code (Node/Bun) doesn't read the macOS keychain by default → the install sets the additive NODE_EXTRA_CA_CERTS.
Codex CLI talks over WebSockets (wss://…/responses), which can't be usefully inspected. Observatory replies 426 Upgrade Required to that upgrade, which Codex maps to an instant HTTP fallback on the same endpoint — and the HTTP request is fully captured. For that HTTP path's trust, the install sets the additive CODEX_CA_CERTIFICATE (Codex's own custom-CA var), since Codex uses rustls/native-tls rather than the macOS keychain.
Bedrock via the AWS Go SDK reads the login keychain directly → no env var.
Gemini CLI (Node) is supported conditionally → full capture requires both source attribution and a current NODE_EXTRA_CA_CERTS.
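Because each runtime resolves roots differently, "trust-ready" is a per-runtime check. A minimal sketch under stated assumptions: the runtime labels are hypothetical, and "stale" is modeled as the variable pointing somewhere other than the current CA bundle path (the real check may compare certificate contents instead).

```go
package main

// trustVar maps a hypothetical runtime label to the additive CA variable it needs.
// An empty value means the runtime reads the login keychain directly.
var trustVar = map[string]string{
	"node":   "NODE_EXTRA_CA_CERTS",  // Claude Code, Gemini CLI
	"codex":  "CODEX_CA_CERTIFICATE", // Codex CLI (rustls/native-tls)
	"aws-go": "",                     // Bedrock via the AWS Go SDK
}

// trustReady reports whether a process with env would accept Observatory's CA,
// plus a reason when it would not.
func trustReady(runtime string, env map[string]string, caPath string) (bool, string) {
	name, known := trustVar[runtime]
	switch {
	case !known:
		return false, "unknown runtime"
	case name == "":
		return true, ""
	}
	got, set := env[name]
	switch {
	case !set || got == "":
		return false, name + " missing"
	case got != caPath:
		return false, name + " stale"
	}
	return true, ""
}
```

In this sketch a `false` result would route the flow to the opaque tunnel rather than fail it, matching the 0.3 bypass behavior.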

Both env vars only add Observatory's CA; they never replace the system roots, so unrelated HTTPS is unaffected. The install sets no HTTPS_PROXY/HTTP_PROXY and no root-replacing SSL_CERT_FILE/AWS_CA_BUNDLE: routing is the extension's job. Caveat: env vars only reach processes that actually inherit them. Agent Observatory 0.3 treats missing or stale runtime trust as a bypass condition: the provider request is tunneled opaquely instead of being locally TLS-terminated. If a client still rejects an Observatory leaf certificate, the capture-pause circuit breaker passes subsequent provider traffic through.

sequenceDiagram
    participant Agent as Agent process
    participant Proxy as Observatory local proxy
    participant Provider as OpenAI / Anthropic / Bedrock

    Agent->>Proxy: provider flow routed by<br/>NetworkExtension transparent proxy
    Note over Proxy: Extract derived facts:<br/>prompt length, instruction match,<br/>endpoint, tool names
    Proxy->>Provider: Normal upstream TLS<br/>using system roots
    Provider-->>Proxy: Provider response
    Proxy-->>Agent: Response forwarded

Important boundaries:

Observatory's CA is local to the client-to-proxy leg for inspected hosts.
The CA is installed into your per-user login keychain (behind the system extension approval), never the macOS System keychain, and is removed on uninstall.
Only allowlisted provider flows are ever TLS-terminated; non-provider :443 flows are direct-relayed after the SNI peek, so the CA is never exercised against — and the proxy never sees plaintext for — non-provider hosts.
A stable CA certificate and private key are stored under Observatory's local state directory (0600 key, per-user) so the ambient daemon can restart without breaking trust. Any process running as you could read that key while capture is installed — a same-user local risk, removed by agents uninstall.
Upstream provider TLS still uses normal system trust.
In memory, the proxy parses the request body (up to an 8 MiB cap — larger bodies are forwarded unparsed) and keeps a bounded ring (most-recent 500 captures) of the assembled prompt + tool text to drive the live feed and instruction matching. On disk, only derived facts are persisted — prompt length, endpoint, tool names — never raw prompt bodies. Instruction matching is computed against your resolved local instruction files.
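The in-memory versus on-disk split can be sketched as two types: a fixed-size ring of recent captures holding raw prompt text, and a small record that is the only thing persisted. The 8 MiB cap and the 500-entry ring come from the text; the type and field names are hypothetical.

```go
package main

const (
	maxParseBytes = 8 << 20 // 8 MiB: larger bodies are forwarded unparsed
	ringSize      = 500     // most-recent captures kept in memory
)

// Capture is held only in memory; it contains raw prompt and tool text.
type Capture struct {
	Endpoint string
	Prompt   string
	Tools    []string
}

// DerivedFact is the only shape written to disk: no raw prompt body.
type DerivedFact struct {
	Endpoint     string
	PromptLength int // in bytes
	ToolNames    []string
}

// Ring keeps the most recent ringSize captures, overwriting the oldest.
type Ring struct {
	buf  [ringSize]Capture
	next int // slot the next Add writes to
	n    int // number of filled slots
}

func (r *Ring) Add(c Capture) {
	r.buf[r.next] = c
	r.next = (r.next + 1) % ringSize
	if r.n < ringSize {
		r.n++
	}
}

// Oldest returns the oldest capture still held (zero value if empty).
func (r *Ring) Oldest() Capture {
	if r.n < ringSize {
		return r.buf[0]
	}
	return r.buf[r.next]
}

// shouldParse applies the body-size cap before any parsing happens.
func shouldParse(bodyLen int) bool { return bodyLen <= maxParseBytes }

// derive strips a capture down to what may be persisted.
func derive(c Capture) DerivedFact {
	return DerivedFact{
		Endpoint:     c.Endpoint,
		PromptLength: len(c.Prompt),
		ToolNames:    append([]string(nil), c.Tools...),
	}
}
```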

Compatibility

Surface	Status	Notes
Claude transcript discovery	Observed	Reads local JSONL transcript context and complete tool catalogs when available.
Codex transcript discovery	Observed	Reads local session JSONL; tool evidence is positive-only when only invoked tools are present.
Antigravity transcript discovery	Partial	Discovers sessions from history; opaque `.pb` conversation bodies are not parsed.
OpenAI chat/responses body shapes	Verified parser coverage	Covered by backend proxy parser tests.
Anthropic Messages body shape	Verified parser coverage	Covered by native Anthropic proxy-path test.
Bedrock Anthropic body shape	Verified parser coverage	Covered by backend proxy parser tests.
Install-once ambient capture	Local QA	Install/status/uninstall are covered by repeated fake-home lifecycle tests.
Source-aware provider capture	Local live-provider QA	`make v03-safe-capture-qa` proves supported-source capture plus unknown-source pass-through.

Commands

Command	Purpose
`agents install`	Install ambient capture: proxy daemon, stable local CA, and additive per-runtime CA env (`NODE_EXTRA_CA_CERTS`, `CODEX_CA_CERTIFICATE`).
`agents trust install`	Trust the local CA in your login keychain (run behind the approved extension).
`agents status`	Show installed, partial, or absent setup state.
`agents uninstall`	Fully reverse the install.
`agents serve`	Run the localhost JSON API only (default subcommand; no proxy).
`agents monitor --demo`	Run the API, SSE stream, proxy, and sample feed.
`agents sessions --limit 20`	Print recent sessions and evidence marks.
`agents context explain /path/to/project`	Show resolved context for a workspace.
`agents doctor wire`	Report verified-capture capability per runtime.
`/api/coverage`	Local daemon endpoint with capture/bypass counts and recent pass-through reasons.
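`/api/coverage` reports capture/bypass counts and recent pass-through reasons, but its JSON schema is not documented here. The sketch below invents one (`captured`, `passThrough`, `recentReasons`) purely to illustrate a bounded, concurrency-safe recorder that could back such an endpoint.

```go
package main

import (
	"encoding/json"
	"sync"
)

const maxReasons = 20 // hypothetical bound on the recent-reasons list

// Coverage is an illustrative response shape, not the daemon's real schema.
type Coverage struct {
	Captured      int      `json:"captured"`
	PassThrough   int      `json:"passThrough"`
	RecentReasons []string `json:"recentReasons"`
}

// Recorder accumulates coverage counts from concurrent proxy goroutines.
type Recorder struct {
	mu  sync.Mutex
	cov Coverage
}

func (r *Recorder) Captured() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.cov.Captured++
}

func (r *Recorder) PassedThrough(reason string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.cov.PassThrough++
	r.cov.RecentReasons = append(r.cov.RecentReasons, reason)
	if over := len(r.cov.RecentReasons) - maxReasons; over > 0 {
		r.cov.RecentReasons = r.cov.RecentReasons[over:] // keep only the newest
	}
}

// JSON returns a snapshot an HTTP handler could write as the response body.
func (r *Recorder) JSON() ([]byte, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	return json.Marshal(r.cov)
}
```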

Release Artifacts

make release

Artifacts are written to dist/:

Agent-Observatory-0.3.0-macOS.dmg
Agent-Observatory-0.3.0-macOS.zip
Agent Observatory.app
agents
SHA256SUMS

make release is intentionally headless: it builds the signed app, zip, DMG, and checksums without running Finder AppleScript, so it does not steal GUI focus during local QA. The DMG is functional but plain by default. If you explicitly want Finder window background/icon positioning, run:

DMG_STYLE=polished make release-polished

That polished path uses Finder layout metadata and can bring a Finder window to the foreground.

For public distribution, notarize and staple the app and DMG after make release:

NOTARY_PROFILE=<notarytool-keychain-profile> make notarize
# or:
APP_STORE_CONNECT_KEY_ID=<key-id> \
APP_STORE_CONNECT_API_KEY_P8=/path/to/AuthKey_<key-id>.p8 \
APP_STORE_CONNECT_ISSUER_ID=<issuer-uuid> \
make notarize
# or:
MACOS_NOTARY_APPLE_ID=<apple-id> \
MACOS_NOTARY_APP_PASSWORD=<app-specific-password> \
MACOS_NOTARY_TEAM_ID=<team-id> \
make notarize
make v03-safe-capture-qa
make release-qa

There is also a manual GitHub Actions release workflow, macOS Release, for building, notarizing, and optionally staging a draft release on a macOS runner once the repository release secrets in docs/release-publication-runbook.md are configured.

The older v0.2 finalize helpers remain in scripts/ for the published v0.2 release. They are not 0.3 launch gates.

The DMG is the primary user-facing artifact. It contains Agent Observatory.app and an Applications symlink for the normal macOS drag-install flow. The zip is a fallback. A build is publication-ready only after make v03-safe-capture-qa passes and make release-qa passes against the stapled app and DMG.

Security notes for the local CA, prompt-data handling, and vulnerability reports are in SECURITY.md.

Current Limitations

Capture mechanism:

HTTP/3 (QUIC) is not captured. The extension takes TCP :443 only; provider HTTP/3 falls back to TCP in practice, but a QUIC-only client would be missed.
ECH / no-SNI flows fail open. If a provider enables Encrypted ClientHello, the SNI is unreadable and the flow is direct-relayed (not captured), never broken.
Inspected hosts are proxied over HTTP/1.1. Mainstream provider SDKs accept this; a hypothetical HTTP/2-only client is unsupported on the inspected path.
The SNI peek reassembles across TCP segments but assumes a single-record ClientHello; an unusually large multi-record hello fails open (not captured).

Per-runtime CA trust (NODE_EXTRA_CA_CERTS, CODEX_CA_CERTIFICATE):

It's additive trust, not routing — and it only helps newly launched processes that inherit the env. In 0.3, provider flows from supported runtimes with missing or stale trust are tunneled opaquely instead of full-captured. That is a missed capture, not a broken agent.
Node/Bun (Claude Code): a client that passes an explicit ca: option, sanitizes its env, or embeds its own runtime won't pick up NODE_EXTRA_CA_CERTS. Bun honors only its own CA store for some operations.
Hatch/OpenCode: Hatch's MCP path keeps a persistent opencode serve process. If that server started before Observatory trust existed, restart Hatch MCP/OpenCode before enabling capture. agents doctor wire flags running Hatch OpenCode servers with missing or stale trust.
Codex CLI: its primary wss://…/responses transport can't be inspected, so the proxy replies 426 and Codex falls back to its HTTP endpoint (which is captured). HTTP-path trust uses the additive CODEX_CA_CERTIFICATE. Other provider WebSockets with no HTTP fallback (e.g. OpenAI Realtime /v1/realtime) are relayed untouched, not captured — so they keep working.
Gemini CLI: generativelanguage.googleapis.com generateContent requests are parsed when the source is attributable to Gemini's Node process with current NODE_EXTRA_CA_CERTS; otherwise they are tunneled.
Bedrock (AWS Go SDK): reads the login keychain directly, needs no env.
These vars add a trusted root only for processes that inherit them; agents uninstall removes them.

Scope:

macOS 26 and Xcode 26 are required for the native Liquid Glass app.
Verified capture requires the system-extension approval + login-keychain trust.
Antigravity transcript contents are discovery-only when stored in opaque .pb files.
This release observes context. It does not yet manage canonical context upstream for every agent runtime.

Roadmap

Next	Why it matters
Short demo clip	Helps people understand the live feed before cloning.
Broader runtime notes	Clarifies install-once capture behavior across agent stacks.
HTTP/3 + multi-record SNI capture	Closes the remaining capture-coverage gaps.
Canonical context management	Turns the observatory into the control plane after observability proves demand.

Development

make backend-qa
make v03-safe-capture-qa
make app-build
make release
make release-layout-qa
NOTARY_PROFILE=<notarytool-keychain-profile> make notarize
# or:
APP_STORE_CONNECT_KEY_ID=<key-id> \
APP_STORE_CONNECT_API_KEY_P8=/path/to/AuthKey_<key-id>.p8 \
APP_STORE_CONNECT_ISSUER_ID=<issuer-uuid> \
make notarize
# or:
MACOS_NOTARY_APPLE_ID=<apple-id> \
MACOS_NOTARY_APP_PASSWORD=<app-specific-password> \
MACOS_NOTARY_TEAM_ID=<team-id> \
make notarize
make v03-safe-capture-qa
make release-qa

Detailed release and planning notes live under docs/, including docs/v0.3-safe-capture-spec.md, docs/v0.3-launch-readiness.md, docs/ne-reset-runbook.md, docs/release-publication-runbook.md, and docs/release-v0.3-draft.md. Historical v0.2 records remain in docs/v0.2-readiness.md and docs/release-v0.2-draft.md.
