Add built-in observability with structured logging and tracing #12
Merged
Implement production-grade observability as middleware, with no changes to the agent loop, tools, or interfaces. Observability hooks into ra's existing 9 lifecycle points via createObservabilityMiddleware().

New files:
- src/observability/logger.ts: Structured JSON logger with levels
- src/observability/tracer.ts: Span-based tracer with timing
- src/observability/middleware.ts: All 9 hooks in one place
- src/observability/index.ts: Factory with split log/trace config
- docs/observability.md: Config, log reference, visualization guides

Existing files touched:
- src/config/: Add ObservabilityConfig type and defaults
- src/index.ts: Create and wire observability middleware

Enabled by default. Logs to stderr. Configure via config file or the RA_LOG_LEVEL, RA_LOG_OUTPUT, and RA_TRACE_OUTPUT env vars.

https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx
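As an illustration of the env-var configuration described in this commit, here is a minimal sketch of how the three variables could be folded into a config object. Only the variable names, the `ObservabilityConfig` type name, and the stderr/info defaults come from the PR text; the field layout, the `warn`/`error` levels, the trace output default, and the `resolveObservabilityConfig` helper are assumptions made for the example.

```typescript
// Hypothetical shape; the PR's real ObservabilityConfig may be laid out differently.
type LogLevel = "debug" | "info" | "warn" | "error";

interface ObservabilityConfig {
  enabled: boolean;
  logs: { level: LogLevel; output: string };
  traces: { output: string };
}

// Enabled, info level, and stderr follow the PR text.
// The trace output default is an assumption.
const DEFAULTS: ObservabilityConfig = {
  enabled: true,
  logs: { level: "info", output: "stderr" },
  traces: { output: "stderr" },
};

const LEVELS: readonly string[] = ["debug", "info", "warn", "error"];

function isLogLevel(value: string): value is LogLevel {
  return LEVELS.indexOf(value) !== -1;
}

// Env vars override the base config (this precedence is an assumption).
// An unrecognized RA_LOG_LEVEL falls back to the base level.
function resolveObservabilityConfig(
  env: Record<string, string | undefined>,
  base: ObservabilityConfig = DEFAULTS,
): ObservabilityConfig {
  const rawLevel = env["RA_LOG_LEVEL"];
  const level = rawLevel === undefined ? undefined : rawLevel.toLowerCase();
  return {
    enabled: base.enabled,
    logs: {
      level: level !== undefined && isLogLevel(level) ? level : base.logs.level,
      output: env["RA_LOG_OUTPUT"] ?? base.logs.output,
    },
    traces: { output: env["RA_TRACE_OUTPUT"] ?? base.traces.output },
  };
}
```

Passing the environment in as a plain record, rather than reading a global, keeps the helper easy to test.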
The merge helper was a one-off function in index.ts with type casts. Now it lives alongside runMiddlewareChain, is properly typed, and accepts any number of middleware configs via rest params. https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx
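A rest-parameter merge helper of the kind this commit describes might look roughly like the sketch below. The `mergeMiddleware` name appears later in this PR; its signature, the `Hook` type, and the "hook name to handler array" shape of a middleware config are assumptions for illustration.

```typescript
// Assumed shape: each lifecycle hook name maps to an ordered list of handlers.
type Hook = (ctx: unknown) => void;
type MiddlewareConfig = Partial<Record<string, Hook[]>>;

// Merge any number of configs; handlers from earlier configs run first.
function mergeMiddleware(...configs: MiddlewareConfig[]): MiddlewareConfig {
  const merged: Record<string, Hook[]> = {};
  for (const config of configs) {
    for (const name of Object.keys(config)) {
      const hooks = config[name];
      if (hooks === undefined) continue;
      const existing = merged[name];
      // Copy arrays so the inputs are never mutated.
      merged[name] = existing === undefined ? [...hooks] : [...existing, ...hooks];
    }
  }
  return merged;
}
```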
Same pattern as resolver and memory middleware: just prepend into the existing middleware object. No separate merge step needed. https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx
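The prepend approach can be sketched as follows, assuming a middleware object that maps hook names to handler arrays. The `prependMiddleware` helper and the types are hypothetical; the PR only states that observability hooks are prepended into the existing middleware object.

```typescript
// Assumed shape: each lifecycle hook name maps to an ordered list of handlers.
type Hook = (ctx: unknown) => void;
type MiddlewareConfig = Partial<Record<string, Hook[]>>;

// Prepend extra handlers in place so they run before any existing ones.
function prependMiddleware(target: MiddlewareConfig, extra: MiddlewareConfig): void {
  for (const name of Object.keys(extra)) {
    const hooks = extra[name];
    if (hooks === undefined) continue;
    target[name] = [...hooks, ...(target[name] ?? [])];
  }
}
```

Mutating the existing object avoids a separate merge step, at the cost of the caller's config being changed in place.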
- Drop no-op onStreamChunk hook from obs middleware
- Add error.stack to onError log for debuggability
- Add onCompact callback to CompactionConfig so compaction events are logged
- Clean up middleware: middleware → shorthand in index.ts
- Add Observability section to README documenting logs, traces, and config

https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx
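To illustrate the onCompact callback pattern, here is a minimal synchronous sketch. `CompactionConfig` and `onCompact` are named in the PR; the `maxMessages` field, the `CompactionInfo` fields, and the `compactIfNeeded` function are assumptions, and the real compaction is presumably asynchronous and works on richer message objects. The callback fires only when compaction actually happens, matching the tested behavior described later in this PR (not called when skipped or when summarization fails).

```typescript
// Hypothetical info payload passed to the callback.
interface CompactionInfo {
  messagesBefore: number;
  messagesAfter: number;
}

interface CompactionConfig {
  maxMessages: number; // assumed threshold, not from the PR
  onCompact?: (info: CompactionInfo) => void;
}

function compactIfNeeded(
  messages: readonly string[],
  config: CompactionConfig,
  summarize: (older: readonly string[]) => string,
): string[] {
  // Skipped: under the threshold, so no callback.
  if (messages.length <= config.maxMessages) return [...messages];

  const keepRecent = Math.floor(config.maxMessages / 2);
  const splitAt = messages.length - keepRecent;

  let summary: string;
  try {
    summary = summarize(messages.slice(0, splitAt));
  } catch {
    // Summarization failed: keep the history unchanged, no callback.
    return [...messages];
  }

  const compacted = [summary, ...messages.slice(splitAt)];
  config.onCompact?.({ messagesBefore: messages.length, messagesAfter: compacted.length });
  return compacted;
}
```

An observability layer can then pass a logging function as `onCompact` without the compaction code knowing anything about logging.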
- Replace stale mergeMiddleware reference with prepend-to-chain approach
- Add context compacted log event to reference table
- Add stack field to agent loop failed event
- Note onCompact callback pattern for compaction logging

https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx
Bugs fixed:
- onError only closed loopSpan, leaving iterationSpan, modelSpan, and toolSpans orphaned in the tracer's activeSpans Map. Added drainOpenSpans() to end all child spans before closing the root.
- Middleware reuse across loop runs could leak stale spans from a crashed run. beforeLoopBegin now drains any leftover state on entry.
- error.stack was logged but not traced. It is now added to the loopSpan error attributes for consistency.
- All span variables are now nullable (Span | undefined) with guards, preventing endSpan calls on uninitialized spans.

Tests added:
- Error path: verifies all 3 span types are emitted with error status and stack trace is present in both log and trace output
- Reuse safety: same middleware instance across multiple successful runs
- Crash recovery: successful run after a failed run with same middleware
- onCompact callback: called with correct info, not called when skipped or when summarization fails

Also added Observability link to README nav header.

https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx
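The span-draining fix can be sketched roughly as below. The names `activeSpans`, `endSpan`, and `drainOpenSpans` come from this commit message; the `Span` fields, the `SketchTracer` class, and placing `drainOpenSpans` on the tracer (rather than in the middleware) are assumptions made to keep the example self-contained.

```typescript
type SpanStatus = "ok" | "error";

// Hypothetical span record; the PR's real Span type may differ.
interface Span {
  id: number;
  name: string;
  startMs: number;
  endMs: number | undefined;
  status: SpanStatus | undefined;
  attributes: Record<string, unknown>;
}

class SketchTracer {
  readonly activeSpans = new Map<number, Span>();
  readonly finished: Span[] = [];
  private nextId = 1;

  startSpan(name: string): Span {
    const span: Span = {
      id: this.nextId++,
      name,
      startMs: Date.now(),
      endMs: undefined,
      status: undefined,
      attributes: {},
    };
    this.activeSpans.set(span.id, span);
    return span;
  }

  // Guarded: ending an undefined or already-ended span is a no-op.
  endSpan(span: Span | undefined, status: SpanStatus, attributes: Record<string, unknown> = {}): void {
    if (span === undefined || !this.activeSpans.delete(span.id)) return;
    span.endMs = Date.now();
    span.status = status;
    Object.assign(span.attributes, attributes);
    this.finished.push(span);
  }

  // End every open child span with error status, then close the root last.
  drainOpenSpans(root: Span | undefined, error: Error): void {
    for (const span of Array.from(this.activeSpans.values())) {
      if (span !== root) this.endSpan(span, "error", { error: error.message });
    }
    this.endSpan(root, "error", { error: error.message, stack: error.stack });
  }
}
```

Because `endSpan` tolerates `undefined` and repeated calls, the same drain can run on entry to a new loop to clear leftovers from a crashed run.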
Startup/shutdown logs:
- custom middleware loaded (info, hookCount)
- session storage initialized (debug, path)
- resuming session (info, sessionId, messageCount)
- shutting down (info)

Test additions:
- Tool execution failure: verifies error log + error span status
- Remove unused firstRunOutput variable from reuse test

Docs:
- README: add toolCallId to tool execution complete/failed (was inconsistent with executing tool row), add startup event summary with link to full reference
- docs/observability.md: add all 4 new log events to reference table

https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx
tsc with emitDeclarationOnly cannot resolve .ts extension imports on value exports (type-only exports are erased and fine). Remove the extensions so build:types passes. https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx
- Add parameter signatures to NoopLogger overrides to match base class
- Add parameter signatures to NoopTracer overrides to match base class
- Remove stale @ts-expect-error (Tracer constructor already accepts null)

https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx
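A sketch of what matching override signatures can look like, assuming a base logger that writes one JSON object per line to a sink function. `NoopLogger` is named in this commit; the `Logger` constructor, method names, and log-entry fields here are illustrative assumptions, not the PR's actual classes.

```typescript
type Fields = Record<string, unknown>;

// Hypothetical base class: writes one JSON line per call to the given sink.
class Logger {
  private readonly write: (line: string) => void;

  constructor(write: (line: string) => void) {
    this.write = write;
  }

  log(level: string, message: string, fields: Fields = {}): void {
    // Core keys come last so caller-supplied fields cannot overwrite them.
    const entry = { ...fields, timestamp: new Date().toISOString(), level, message };
    this.write(JSON.stringify(entry));
  }

  info(message: string, fields: Fields = {}): void {
    this.log("info", message, fields);
  }
}

// Overrides repeat the base parameter lists so the signatures stay aligned;
// the underscore prefix marks parameters that are intentionally unused.
class NoopLogger extends Logger {
  constructor() {
    super(() => {});
  }

  override log(_level: string, _message: string, _fields: Fields = {}): void {}

  override info(_message: string, _fields: Fields = {}): void {}
}
```

Keeping the parameter lists identical means a `NoopLogger` can stand in anywhere a `Logger` is expected without call sites changing.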
- Add onCompact callback to compaction type in src/config/types.ts
- Use type assertion in middleware merge loop to avoid union discrimination issue

https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx
- Merge origin/main (subagent tool feature)
- Add subagent-specific observability: log task count, per-task status, and aggregate token usage in beforeToolExecution/afterToolExecution
- Fix HTTP integration test flake: keep draining stderr pipe after port detection so observability log writes don't block/crash the server

https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx
Adds a production-grade observability system that captures all agent
actions as structured JSON logs with trace/span correlation IDs.
Enabled by default at info level on stderr.
stderr/stdout/file, and JSON-formatted log lines
and tool execution phases
responses (with token usage and response preview), tool execution
(with input/output previews and timing), and errors
Jaeger, ELK, and OpenTelemetry Collector
https://claude.ai/code/session_01RpTTaxAzxVxHEqjj3WwnUx