A lightweight observability and replay layer for debugging agentic AI workflows. AgentTrace records tool calls, model invocations, intermediate decisions, input/output, latency, cost, and final results, enabling step-by-step replay and analysis of agent behavior.
Agentic AI workflows are complex, multi-step processes that involve:
- Multiple LLM calls with different prompts
- Tool invocations (search, code execution, API calls)
- Decision points and branching logic
- State management across steps
Debugging these workflows is challenging because:
- You can't see what happened during execution
- It's hard to understand why an agent made a particular decision
- Cost and token usage are opaque
- Reproducing issues is difficult
AgentTrace solves this by providing a complete execution trace with rich metadata, cost tracking, and replay capabilities.
AgentTrace consists of three components:
- SDK (`sdk/`): Python library for instrumenting agent code
- Server (`server/`): FastAPI application for ingesting, storing, and serving trace data
- Dashboard (`dashboard/`): Next.js UI for visualizing runs, spans, costs, and tokens
Agent Code → SDK → Exporter (JSONL/HTTP/OTLP) → Server (FastAPI, SQLAlchemy, SQLite/PostgreSQL) → Dashboard (Next.js)
- Python 3.11+
- Node.js 20+
- Docker (optional, for PostgreSQL)
Install the SDK:

```bash
cd sdk
pip install -e .
```

Install and start the server:

```bash
cd server
pip install -r requirements.txt
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

The server will:
- Initialize an SQLite database at `./data/agenttrace.db`
- Expose the API at `http://localhost:8000`
- Serve API docs at `http://localhost:8000/docs`
Install and start the dashboard:

```bash
cd dashboard
npm install
npm run dev
```

The dashboard will be available at `http://localhost:3000`.
Run the example agent:

```bash
cd examples
python research_agent.py
```

This will:
- Create a run named "research_agent"
- Record tool calls (`web_search`, `parse_content`)
- Record LLM calls (`synthesize_findings`, `generate_followup_questions`)
- Export traces to `./data/research_traces.jsonl`
Use `APIExporter` instead of `JSONLExporter` when you want the example to send traces directly to the running server and have them appear in the dashboard.
```python
from agenttrace import Tracer, SpanType
from agenttrace.exporters import APIExporter, JSONLExporter
from agenttrace.wrappers import trace_tool, trace_llm

# Initialize tracer with exporter
tracer = Tracer()
tracer.set_exporter(JSONLExporter(path="./traces.jsonl"))

# Decorate tools
@trace_tool(tracer)
def web_search(query: str) -> dict:
    # Simulate search
    return {"results": [...]}

# Decorate LLM calls
@trace_llm(tracer, model="gpt-4", cost_per_prompt_token=0.00003, cost_per_completion_token=0.00006)
def synthesize_findings(context: str) -> str:
    # Call LLM
    return "Summary of findings..."

# Run with tracing (with optional correlation ID for multi-agent workflows)
with tracer.run("research_agent", correlation_id="workflow-123"):
    results = web_search("AI agent debugging")
    summary = synthesize_findings(str(results))

# Flush to ensure data is written
tracer.flush()
```

To send the same trace data to the server, configure the exporter like this:

```python
tracer.set_exporter(APIExporter(endpoint="http://localhost:8000/api"))
```

AgentTrace uses a span-based tracing model inspired by OpenTelemetry. Each operation (LLM call, tool call, decision) is a span with:
- Unique ID and parent span ID for nesting
- Type (LLM_CALL, TOOL_CALL, DECISION, RETRIEVAL, CUSTOM)
- Input/output data
- Start/end times and duration
- Cost and token usage
- Status (STARTED, COMPLETED, ERROR)
This enables composable, nestable tracing that mirrors the structure of agent workflows.
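For reference, a span record carrying these fields might look like the following sketch (an illustrative dataclass, not the SDK's actual class definition):

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional
import time
import uuid

class SpanType(Enum):
    LLM_CALL = "llm_call"
    TOOL_CALL = "tool_call"
    DECISION = "decision"
    RETRIEVAL = "retrieval"
    CUSTOM = "custom"

class SpanStatus(Enum):
    STARTED = "started"
    COMPLETED = "completed"
    ERROR = "error"

@dataclass
class Span:
    span_type: SpanType
    name: str
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    parent_span_id: Optional[str] = None  # set when nested under another span
    input: Any = None
    output: Any = None
    start_time: float = field(default_factory=time.time)
    end_time: Optional[float] = None
    cost_usd: float = 0.0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    status: SpanStatus = SpanStatus.STARTED
```

The `parent_span_id` link is what allows spans to nest, mirroring the call structure of the agent.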
The SDK uses Python's `contextvars` for thread-safe and asyncio-safe state management. The current run ID and active span are stored in context variables, allowing instrumentation to access the current trace context without explicit passing.
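The pattern looks roughly like this (a simplified sketch of the technique, not the SDK's internals; the names are illustrative):

```python
import contextvars
from typing import Optional

# Context variables isolate state per thread and per asyncio task.
_current_run_id: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "current_run_id", default=None
)
_current_span_id: contextvars.ContextVar[Optional[str]] = contextvars.ContextVar(
    "current_span_id", default=None
)

def start_span(span_id: str) -> contextvars.Token:
    # Returns a token so the previous active span can be restored on exit.
    return _current_span_id.set(span_id)

def end_span(token: contextvars.Token) -> None:
    _current_span_id.reset(token)

def active_span_id() -> Optional[str]:
    # Instrumentation reads the current parent without explicit passing.
    return _current_span_id.get()
```

Because every asyncio task gets its own copy of the context, concurrently running agents cannot clobber each other's active span.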
AgentTrace supports correlating traces across multiple agent instances using a `correlation_id`. This is useful for:
- Distributed agent workflows
- Multi-agent systems
- Tracking related executions across different services
Use the `correlation_id` parameter when starting a run:
with tracer.run("agent_task", correlation_id="workflow-123"):
# Agent executionAgentTrace provides a diff API to compare two runs side-by-side, showing:
- Cost differences
- Token usage differences
- Span count differences
- Duration differences
- Span-level differences (added, removed, changed)
Use the diff API endpoint: `/api/diff/runs?run_id_1=<id>&run_id_2=<id>`
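For example, with the server from the quick start running locally (the run IDs are placeholders you substitute from your own data):

```python
import requests

# Compare two runs; substitute real run IDs for the placeholders.
resp = requests.get(
    "http://localhost:8000/api/diff/runs",
    params={"run_id_1": "<id>", "run_id_2": "<id>"},
)
resp.raise_for_status()
print(resp.json())  # cost, token, span-count, and duration deltas
```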
AgentTrace captures all inputs and outputs for each span, enabling step-by-step replay of agent execution. The replay endpoint returns all steps in chronological order with full input/output data.
Use the replay API endpoint: `/api/replay/runs/<run_id>`
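A minimal replay loop might look like this (the exact response shape is whatever the server returns; this sketch simply walks the steps in order):

```python
import requests

run_id = "<run_id>"  # substitute a real run ID
resp = requests.get(f"http://localhost:8000/api/replay/runs/{run_id}")
resp.raise_for_status()

# Steps arrive in chronological order with full input/output data.
for step in resp.json():
    print(step)
```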
JSONL (JSON Lines) is used for local development because:
- Human-readable and grep-able
- Easy to parse and analyze (see the sketch after this list)
- Supports streaming writes
- No external dependencies
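Because each line is a standalone JSON object, ad-hoc analysis takes only a few lines of Python. The field names here follow the span fields described earlier and are illustrative; check your trace file for the actual schema:

```python
import json
from collections import Counter

# Sum recorded cost per span type across a JSONL trace file.
costs = Counter()
with open("./traces.jsonl") as f:
    for line in f:
        span = json.loads(line)
        costs[span.get("span_type", "unknown")] += span.get("cost_usd", 0.0)

print(costs)
```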
For production, the SDK supports HTTP export to the AgentTrace server and OTLP export to OpenTelemetry-compatible systems.
The server is a separate FastAPI application because:
- Decouples tracing from agent execution
- Enables real-time dashboard updates
- Supports multiple agents sharing a trace store
- Provides a REST API for custom integrations
Both the SDK and server use Pydantic for type-safe validation at boundaries:
- SDK: Validates data before export
- Server: Validates incoming API requests
- Shared schemas: Keep data consistent across components (see the sketch below)
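A boundary model for span ingestion might look like this hypothetical sketch (Pydantic v2 syntax; the real schemas live in the SDK and server code):

```python
from typing import Any, Optional

from pydantic import BaseModel, Field

class SpanIn(BaseModel):
    # Hypothetical ingestion schema mirroring the span fields described above.
    span_id: str
    parent_span_id: Optional[str] = None
    span_type: str
    input: Any = None
    output: Any = None
    duration_ms: float = Field(ge=0)
    cost_usd: float = Field(default=0.0, ge=0)
    status: str = "completed"

# model_validate raises ValidationError on malformed payloads, so bad data
# is rejected at the boundary instead of reaching the trace store.
span = SpanIn.model_validate(
    {"span_id": "abc", "span_type": "tool_call", "duration_ms": 12.5}
)
```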
The SDK is designed to be non-blocking (a sketch of the failure policy follows this list):
- Export failures are logged but don't raise exceptions
- Context variables are cleared on run completion
- The tracer can be used without an exporter (no-op mode)
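The policy can be summarized in a few lines (illustrative, not the SDK's exact code):

```python
import logging

logger = logging.getLogger("agenttrace")

def safe_export(exporter, spans) -> None:
    # Tracing must never crash the agent it is observing.
    if exporter is None:
        return  # no-op mode: the tracer works without an exporter
    try:
        exporter.export(spans)
    except Exception:
        logger.exception("Trace export failed; continuing without raising")
```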
The server includes:
- Global exception handler for unhandled errors (sketched below)
- Database transaction rollback on failure
- Graceful degradation when optional features are missing
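A minimal version of the global handler pattern in FastAPI (a sketch, not the server's actual code):

```python
import logging

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

logger = logging.getLogger("agenttrace.server")
app = FastAPI()

@app.exception_handler(Exception)
async def unhandled_exception_handler(request: Request, exc: Exception) -> JSONResponse:
    # Log the full traceback, return a generic 500, keep the worker alive.
    logger.exception("Unhandled error on %s %s", request.method, request.url.path)
    return JSONResponse(status_code=500, content={"detail": "Internal server error"})
```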
- SDK: Unit tests for Tracer, Span, exporters, and wrappers
- Server: Integration tests for API endpoints and end-to-end workflows
- Dashboard: E2E tests with Playwright
Run tests with:
```bash
# SDK
cd sdk && pytest tests/ -v

# Server
cd server && pytest tests/ -v

# Dashboard
cd dashboard && npm run test:e2e
```

For production, use PostgreSQL instead of SQLite:
```bash
# Set environment variable
export DATABASE_URL="postgresql+asyncpg://user:password@host:5432/agenttrace"
```

Or use Docker Compose:

```bash
docker-compose up -d
```

Configure the server with environment variables:

- `DATABASE_URL`: SQLAlchemy connection string
- `DATABASE_TYPE`: Database backend (`sqlite` or `postgres`)
- `SERVER_HOST`: Host address (default: `0.0.0.0`)
- `SERVER_PORT`: Port number (default: `8000`)
- `MAX_TRACE_RETENTION_DAYS`: Maximum days to retain traces
- `BUFFER_SIZE`: Buffer size for trace ingestion
- `CORS_ORIGINS`: Comma-separated list of allowed CORS origins
- `SECRET_KEY`: Secret key for JWT authentication
- `ENVIRONMENT`: Environment type (`development` or `production`)
The server includes JWT-based authentication. To enable it:

- Set `SECRET_KEY` to a strong random value
- Use the `/api/auth/register` endpoint to create users
- Use the `/api/auth/token` endpoint to get access tokens
- Include the token in the `Authorization` header: `Bearer <token>` (see the sketch below)
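Putting the steps together with Python's `requests` (the credential payloads and the form-encoded token request are assumptions; check `/docs` for the exact schemas):

```python
import requests

BASE = "http://localhost:8000"
creds = {"username": "alice", "password": "s3cret"}  # illustrative credentials

# Create a user (payload shape is an assumption; see the API docs).
requests.post(f"{BASE}/api/auth/register", json=creds).raise_for_status()

# OAuth2-style token endpoints typically expect form data.
token = requests.post(f"{BASE}/api/auth/token", data=creds).json()["access_token"]

# Call a protected endpoint with the Bearer token.
resp = requests.get(
    f"{BASE}/api/replay/runs/<run_id>",  # substitute a real run ID
    headers={"Authorization": f"Bearer {token}"},
)
print(resp.status_code)
```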
In production, restrict CORS origins to specific domains:
```bash
export CORS_ORIGINS="https://yourdomain.com,https://app.yourdomain.com"
```

AgentTrace includes a GitHub Actions CI/CD pipeline that:
- Runs SDK tests with pytest
- Runs server tests with pytest
- Runs dashboard E2E tests with Playwright
- Builds and pushes Docker images to Docker Hub
- Runs security scans with Trivy
The pipeline is configured in `.github/workflows/ci.yml`.
This project demonstrates:
- Three-tier architecture: SDK → Server → Dashboard
- Async/await patterns: Python async for server, React hooks for dashboard
- Type safety: Python type hints, TypeScript, Pydantic validation
- Modern web stack: FastAPI, Next.js 14, Tailwind CSS, Recharts
- Database design: SQLAlchemy ORM, migrations, indexing
- Testing: pytest, pytest-asyncio, Playwright E2E
- Docker: Multi-container deployment with PostgreSQL
- Observability: Span-based tracing, cost tracking, token usage
- Multi-agent correlation: Correlation IDs for distributed workflows
- Trace diffing: Compare runs for regression testing
- Prompt replay: Step-by-step replay for debugging
- Authentication: JWT-based API authentication
- Developer experience: Decorators, context managers, API docs
- CI/CD: GitHub Actions with automated testing and deployment