Add built-in observability with structured logging and tracing by chinmaymk · Pull Request #12 · chinmaymk/ra

chinmaymk · 2026-03-08T17:10:03Z

Adds a production-grade observability system that captures all agent
actions as structured JSON logs with trace/span correlation IDs.
Enabled by default at info level on stderr.

- Structured logger with levels (debug/info/warn/error), output to stderr/stdout/file, and JSON-formatted log lines
- Span-based tracer with nested spans for loop, iteration, model call, and tool execution phases
- Full instrumentation of the agent loop: loop lifecycle, model responses (with token usage and response preview), tool execution (with input/output previews and timing), and errors
- Config support via ra.config.json and RA_OBSERVABILITY_* env vars
- Logger/tracer threaded through CLI, REPL, HTTP, and MCP interfaces
- 17 new tests covering logger, tracer, and factory
- Documentation with visualization guides for jq, Grafana+Loki, Jaeger, ELK, and OpenTelemetry Collector
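To make the first bullet concrete, here is a minimal sketch of a leveled logger that emits one JSON object per line with trace/span correlation IDs. It is not the code in src/observability/logger.ts: `createSketchLogger`, `CorrelationIds`, and the field names (`timestamp`, `traceId`, `spanId`) are assumptions chosen for illustration.

```typescript
// Hypothetical sketch: names and field layout are assumptions, not ra's actual API.
type Level = "debug" | "info" | "warn" | "error";

const LEVEL_ORDER: Record<Level, number> = { debug: 0, info: 1, warn: 2, error: 3 };

interface CorrelationIds {
  traceId?: string;
  spanId?: string;
}

// Returns a log function that drops entries below minLevel
// and writes one JSON object per line through `write`.
function createSketchLogger(
  minLevel: Level,
  write: (line: string) => void,
  ids: CorrelationIds = {},
) {
  return (level: Level, message: string, fields: Record<string, unknown> = {}): void => {
    if (LEVEL_ORDER[level] < LEVEL_ORDER[minLevel]) return;
    const entry = { timestamp: new Date().toISOString(), level, message, ...ids, ...fields };
    write(JSON.stringify(entry));
  };
}

// Usage: collect lines in memory instead of writing to stderr.
const lines: string[] = [];
const log = createSketchLogger("info", (line) => lines.push(line), { traceId: "t-1", spanId: "s-1" });
log("debug", "filtered out at info level");
log("info", "model response", { inputTokens: 12, outputTokens: 34 });
```

Passing the writer in as a function keeps the sketch independent of the output target; a stderr, stdout, or file writer could be swapped in without touching the level filter.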

https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx

Implement production-grade observability as middleware, with no changes to the agent loop, tools, or interfaces. Observability hooks into ra's existing 9 lifecycle points via createObservabilityMiddleware().

New files:
- src/observability/logger.ts: Structured JSON logger with levels
- src/observability/tracer.ts: Span-based tracer with timing
- src/observability/middleware.ts: All 9 hooks in one place
- src/observability/index.ts: Factory with split log/trace config
- docs/observability.md: Config, log reference, visualization guides

Existing files touched:
- src/config/: Add ObservabilityConfig type and defaults
- src/index.ts: Create and wire observability middleware

Enabled by default. Logs to stderr. Configure via config file or RA_LOG_LEVEL, RA_LOG_OUTPUT, RA_TRACE_OUTPUT env vars.

https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx
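A rough sketch of how the split log/trace config could be resolved from defaults, a config file, and the three env vars named above. Only the variable names, the `info` default level, and the stderr log default come from this PR; `resolveObsConfig`, the `ObsConfigSketch` shape, the stderr default for traces, and the precedence (env over file over defaults) are assumptions.

```typescript
// Hypothetical sketch: the config shape and precedence are assumptions.
type Level = "debug" | "info" | "warn" | "error";

interface ObsConfigSketch {
  logLevel: Level;
  logOutput: string; // "stderr", "stdout", or a file path
  traceOutput: string; // assumed to default to stderr as well
}

function isLevel(value: string): value is Level {
  return value === "debug" || value === "info" || value === "warn" || value === "error";
}

// Start from defaults, overlay the config file, then let env vars win.
function resolveObsConfig(
  env: Record<string, string | undefined>,
  fileConfig: Partial<ObsConfigSketch> = {},
): ObsConfigSketch {
  const config: ObsConfigSketch = {
    logLevel: "info",
    logOutput: "stderr",
    traceOutput: "stderr",
    ...fileConfig,
  };
  const level = env["RA_LOG_LEVEL"];
  if (level !== undefined && isLevel(level)) config.logLevel = level; // invalid values are ignored
  config.logOutput = env["RA_LOG_OUTPUT"] ?? config.logOutput;
  config.traceOutput = env["RA_TRACE_OUTPUT"] ?? config.traceOutput;
  return config;
}

const fromDefaults = resolveObsConfig({});
const fromEnv = resolveObsConfig(
  { RA_LOG_LEVEL: "debug", RA_LOG_OUTPUT: "stdout" },
  { logLevel: "warn" },
);
```

In a real process the first argument would be `process.env`; taking it as a parameter keeps the function easy to test.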

The merge helper was a one-off function in index.ts with type casts. Now it lives alongside runMiddlewareChain, is properly typed, and accepts any number of middleware configs via rest params. https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx

Same pattern as resolver and memory middleware: just prepend into the existing middleware object. No separate merge step needed. https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx
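The prepend approach can be sketched as follows, assuming the middleware object maps each hook name to an array of handlers. That shape, along with `prependHooks`, `HookSketch`, and the context type, is an assumption for illustration; only the hook names `beforeLoopBegin` and `onError` appear in this PR.

```typescript
// Hypothetical sketch: the middleware shape (hook name -> handler array) is an assumption.
type HookSketch = (ctx: { calls: string[] }) => void;
type MiddlewareSketch = Record<string, HookSketch[]>;

// Put the extra handlers in front of whatever is already registered for each hook.
function prependHooks(target: MiddlewareSketch, extra: MiddlewareSketch): void {
  for (const [name, hooks] of Object.entries(extra)) {
    target[name] = [...hooks, ...(target[name] ?? [])];
  }
}

const middleware: MiddlewareSketch = {
  beforeLoopBegin: [(ctx) => { ctx.calls.push("user"); }],
};
const obsHooks: MiddlewareSketch = {
  beforeLoopBegin: [(ctx) => { ctx.calls.push("observability"); }],
  onError: [(ctx) => { ctx.calls.push("observability:error"); }],
};
prependHooks(middleware, obsHooks);

// Run one hook chain in order to show that observability goes first.
const ctx = { calls: [] as string[] };
for (const hook of middleware["beforeLoopBegin"] ?? []) hook(ctx);
```

Prepending means the observability handlers run before user-supplied ones for the same hook, so a span or log entry is opened before any custom middleware acts.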

- Drop no-op onStreamChunk hook from obs middleware
- Add error.stack to onError log for debuggability
- Add onCompact callback to CompactionConfig so compaction events are logged
- Clean up middleware: middleware → shorthand in index.ts
- Add Observability section to README documenting logs, traces, and config

https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx
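The onCompact callback pattern might look roughly like the sketch below: the callback fires only after a compaction actually succeeds, so a logger can be attached without the compaction code knowing about observability. The `CompactInfoSketch` fields, `threshold`, `KEEP_RECENT`, and `compactSketch` are all hypothetical; the PR only states that CompactionConfig gains an onCompact callback.

```typescript
// Hypothetical sketch: field names and the compaction strategy are assumptions.
interface CompactInfoSketch {
  messagesBefore: number;
  messagesAfter: number;
}

interface CompactionConfigSketch {
  threshold: number; // compact only when the history is longer than this
  onCompact?: (info: CompactInfoSketch) => void;
}

const KEEP_RECENT = 2;

function compactSketch(
  messages: string[],
  config: CompactionConfigSketch,
  summarize: (older: string[]) => string,
): string[] {
  if (messages.length <= config.threshold) return messages; // skipped: no callback
  const older = messages.slice(0, -KEEP_RECENT);
  const recent = messages.slice(-KEEP_RECENT);
  let summary: string;
  try {
    summary = summarize(older);
  } catch {
    return messages; // summarization failed: keep history, no callback
  }
  const compacted = [summary, ...recent];
  config.onCompact?.({ messagesBefore: messages.length, messagesAfter: compacted.length });
  return compacted;
}

const events: CompactInfoSketch[] = [];
const config: CompactionConfigSketch = { threshold: 3, onCompact: (info) => events.push(info) };

const compacted = compactSketch(["a", "b", "c", "d", "e"], config, (older) => `summary of ${older.length}`);
const skipped = compactSketch(["a", "b"], config, () => "unused");
const failed = compactSketch(["a", "b", "c", "d"], config, () => {
  throw new Error("summarization failed");
});
```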

- Replace stale mergeMiddleware reference with prepend-to-chain approach
- Add context compacted log event to reference table
- Add stack field to agent loop failed event
- Note onCompact callback pattern for compaction logging

https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx

Bugs fixed:
- onError only closed loopSpan, leaving iterationSpan, modelSpan, and toolSpans orphaned in the tracer's activeSpans Map. Added drainOpenSpans() to end all child spans before closing the root.
- Middleware reuse across loop runs could leak stale spans from a crashed run. beforeLoopBegin now drains any leftover state on entry.
- error.stack was logged but not traced; it is now added to the loopSpan error attributes for consistency.
- All span variables are now nullable (Span | undefined) with guards, preventing endSpan calls on uninitialized spans.

Tests added:
- Error path: verifies all 3 span types are emitted with error status and stack trace is present in both log and trace output
- Reuse safety: same middleware instance across multiple successful runs
- Crash recovery: successful run after a failed run with same middleware
- onCompact callback: called with correct info, not called when skipped or when summarization fails

Also added Observability link to README nav header.

https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx
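The span-draining fix can be illustrated with a small tracer sketch. This is not the code in src/observability/tracer.ts: apart from the names `activeSpans`, `endSpan`, and `drainOpenSpans`, which the commit message mentions, the span shape, signatures, and span names here are assumptions.

```typescript
// Hypothetical sketch: span shape and function signatures are assumptions.
type Status = "ok" | "error";

interface SpanSketch {
  id: string;
  name: string;
  startMs: number;
  endMs?: number;
  status?: Status;
}

function createTracerSketch(emit: (span: SpanSketch) => void) {
  const activeSpans = new Map<string, SpanSketch>();
  let nextId = 0;

  function startSpan(name: string): SpanSketch {
    nextId += 1;
    const span: SpanSketch = { id: `span-${nextId}`, name, startMs: Date.now() };
    activeSpans.set(span.id, span);
    return span;
  }

  // Guarded: ignores undefined spans and spans that were already ended.
  function endSpan(span: SpanSketch | undefined, status: Status): void {
    if (span === undefined || !activeSpans.has(span.id)) return;
    span.endMs = Date.now();
    span.status = status;
    activeSpans.delete(span.id);
    emit(span);
  }

  // Ends every open span except the root (most recent first), then the root itself.
  function drainOpenSpans(root: SpanSketch | undefined, status: Status): void {
    const open = Array.from(activeSpans.values()).reverse();
    for (const span of open) {
      if (span !== root) endSpan(span, status);
    }
    endSpan(root, status);
  }

  return { activeSpans, startSpan, endSpan, drainOpenSpans };
}

// Simulate a crash while loop, iteration, and model-call spans are all open.
const emitted: SpanSketch[] = [];
const tracer = createTracerSketch((span) => emitted.push(span));
const loopSpan = tracer.startSpan("loop");
tracer.startSpan("iteration");
tracer.startSpan("model_call");
tracer.drainOpenSpans(loopSpan, "error");
```

Calling the same drain at the start of a new run (with no root) would clear anything a crashed run left behind, which is the reuse-safety behavior the second bullet describes.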

Startup/shutdown logs:
- custom middleware loaded (info, hookCount)
- session storage initialized (debug, path)
- resuming session (info, sessionId, messageCount)
- shutting down (info)

Test additions:
- Tool execution failure: verifies error log + error span status
- Remove unused firstRunOutput variable from reuse test

Docs:
- README: add toolCallId to tool execution complete/failed (was inconsistent with executing tool row), add startup event summary with link to full reference
- docs/observability.md: add all 4 new log events to reference table

https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx

tsc with emitDeclarationOnly cannot resolve .ts extension imports on value exports (type-only exports are erased and fine). Remove the extensions so build:types passes. https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx

@ts-expect-error

- Add parameter signatures to NoopLogger overrides to match base class
- Add parameter signatures to NoopTracer overrides to match base class
- Remove stale @ts-expect-error (Tracer constructor already accepts null)

https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx

- Add onCompact callback to compaction type in src/config/types.ts
- Use type assertion in middleware merge loop to avoid union discrimination issue

https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx

- Merge origin/main (subagent tool feature)
- Add subagent-specific observability: log task count, per-task status, and aggregate token usage in beforeToolExecution/afterToolExecution
- Fix HTTP integration test flake: keep draining stderr pipe after port detection so observability log writes don't block/crash the server

https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx

chinmaymk force-pushed the claude/add-observability-logging-rSZQn branch from a4b6b79 to b4e71db Compare March 8, 2026 22:08

chinmaymk force-pushed the claude/add-observability-logging-rSZQn branch from b4e71db to a244e4f Compare March 8, 2026 22:15

claude added 10 commits March 8, 2026 22:20

Remove mergeMiddleware — prepend obs hooks directly into the chain

a98c361


Fix tsc errors: add onCompact to config type, fix middleware merge types

8d7227c


chinmaymk merged commit de67852 into main Mar 10, 2026
1 check passed

chinmaymk merged 11 commits into main from claude/add-observability-logging-rSZQn

chinmaymk commented Mar 8, 2026


2 participants
