
AOP (Agentic Observability Protocol)

Universal AI Agent Observability - Works with MCP, LangChain, CrewAI, or custom agents. Also supports A2A and AP2 protocols.

A "black box recorder" for agentic systems - Track, debug, and optimize agent behavior across any framework.

License: MIT · Python 3.9+


Why AOP?

The Problem: When your AI agent fails, you have NO idea why.

  • Which tool failed?
  • What parameters caused the issue?
  • How long did each step take?
  • What was the full execution trace?

Before AOP:

❌ Agent failed (no context, no logs, no trace)

After AOP:

$ aop trace --correlation-id abc123

Execution Trace (850ms total):
  ├─ tool.called: search_web (120ms)
  │   └─ tool.completed ✓
  ├─ tool.called: parse_results (45ms)
  │   └─ tool.completed ✓
  └─ tool.called: generate_summary (685ms) ← SLOW!
      └─ tool.error: RateLimitError ❌

✅ Now you know: Rate limit on the summary API!

AOP is like a flight recorder for AI agents - complete visibility into what happened, when, and why.


Use Cases

🐛 Debugging Agent Failures

Problem: Agent crashes in production, you don't know why. Solution: Click the error event → "View Full Trace" → see entire execution chain

💸 Cost Optimization

Problem: Agent costs are exploding, unclear which tools/prompts are expensive. Solution: Analytics tab shows: most-called tools, slowest operations, cost per workflow

🔍 Production Monitoring

Problem: Need to monitor agent health, response times, error rates. Solution: Export to Prometheus, set up Grafana dashboards, get alerts

🏢 Compliance & Auditing

Problem: Enterprise needs audit trail of all agent actions. Solution: PostgreSQL backend, queryable event log, exportable to compliance systems

👥 Multi-Agent Orchestration

Problem: Multiple agents coordinate on tasks, hard to track who did what. Solution: Correlation IDs group related events, trace explorer shows agent interactions


What is AOP?

AOP is a universal observability protocol for AI agents that works across MCP, A2A, and AP2 protocols. It provides complete visibility into agent behavior with minimal code and negligible performance overhead (<1ms at P99).

Key Features:

  • 🔒 Privacy-First - Local storage by default, you own your data
  • ⚡ Fast - <1ms P99 overhead, production-ready performance
  • 🌍 Protocol-Agnostic - Works with MCP, A2A, AP2 out of the box
  • 📊 Powerful Analytics - Trace reconstruction, aggregations, time-series analysis
  • 🎯 Simple API - 1-line decorator reduces code by 86%
  • 🔓 Open Source - MIT licensed, community-driven

5-Minute Quick Start

Step 1: Install AOP

pip install aop-pack

Step 2: Add to Your MCP Server

# your_mcp_server.py
from mcp.server import Server
from mcp.server.stdio import stdio_server
from aop import AOPClient
import asyncio

# Initialize AOP (creates local database automatically)
aop = AOPClient()

# Create MCP server
server = Server("my-server")

# Decorate your MCP tools with AOP
@server.call_tool()
@aop.mcp.observe_tool(agent_id='my-server')
async def search_web(query: str) -> dict:
    """Search the web for information."""
    # Your actual search implementation
    import httpx
    async with httpx.AsyncClient() as client:
        response = await client.get('https://api.duckduckgo.com/',
                                    params={'q': query, 'format': 'json'})
        return response.json()

# Run the server
if __name__ == "__main__":
    asyncio.run(stdio_server(server))

⚠️ IMPORTANT: Decorator Order Matters!

# ✅ CORRECT: MCP decorator FIRST, then AOP
@server.call_tool()
@aop.mcp.observe_tool(agent_id='my-server')
async def my_tool():
    ...

# ❌ WRONG: AOP first breaks MCP registration
@aop.mcp.observe_tool(agent_id='my-server')
@server.call_tool()
async def my_tool():
    ...
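The reason order matters is that Python applies stacked decorators bottom-up: the decorator closest to the function wraps it first, and the outer decorator receives the result. A minimal sketch with stand-in decorators (these are illustrative only, not the real MCP or AOP APIs):

```python
registry = {}

def register(fn):                      # stand-in for @server.call_tool()
    registry[fn.__name__] = fn         # registers whatever object it receives
    return fn

def observe(fn):                       # stand-in for @aop.mcp.observe_tool(...)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)     # a real wrapper would log the call here
    wrapper.__name__ = fn.__name__
    wrapper._observed = True
    return wrapper

@register                              # correct: the observed wrapper gets registered
@observe
def good_tool():
    return "ok"

@observe                               # wrong: the raw function was registered first,
@register                              # so the observing wrapper is never called
def bad_tool():
    return "ok"

print(getattr(registry["good_tool"], "_observed", False))  # True
print(getattr(registry["bad_tool"], "_observed", False))   # False
```

With the wrong order, the registry holds the unwrapped function, so calls routed through it bypass observation entirely - which is why no events appear.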

Step 3: Use Your Tool Through an LLM

Option A: Claude Desktop

Add to your Claude Desktop MCP config (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "my-server": {
      "command": "python",
      "args": ["/path/to/your_mcp_server.py"]
    }
  }
}

Restart Claude Desktop, then in chat:

"Search the web for AI agent observability tools"

When Claude calls your search_web tool, AOP automatically logs it!

Option B: Any MCP Client

from mcp import ClientSession

async with ClientSession() as session:
    # When this runs, AOP captures it
    result = await session.call_tool("search_web", {"query": "AI agents"})

Step 4: Verify AOP Captured It

# Query events
$ aop query --agent-id my-server --last 1h

┌────────────────────┬──────────────┬──────────┬──────────┬───────────┐
│ Timestamp          │ Agent        │ Tool     │ Duration │ Status    │
├────────────────────┼──────────────┼──────────┼──────────┼───────────┤
│ 2025-01-15 10:30   │ my-server    │ search   │ 145ms    │ ✓ Success │
└────────────────────┴──────────────┴──────────┴──────────┴───────────┘

Step 5: View Live Dashboard

$ aop dashboard
# Open http://localhost:8000

# Watch live as Claude uses your tools!

That's it! All your MCP tools now have complete observability.


Installation

# Complete package with all features included
pip install aop-pack

The aop-pack package includes everything you need:

  • ✅ Core observability library
  • ✅ Command-line tools (CLI)
  • ✅ Web dashboard
  • ✅ OpenTelemetry export
  • ✅ Prometheus metrics
  • ✅ PostgreSQL support
  • ✅ All dependencies bundled

Framework Support

✅ Ready Now

  • MCP (Model Context Protocol) - First-class support with @observe_tool decorator
  • Standalone Python tools - Decorator-based observability for any function
  • FastMCP - Works out of the box with FastMCP servers
  • Official MCP SDK - Full support for mcp.server

🚧 Coming Soon

  • LangChain - Callback handler integration (basic support via decorators available now)
  • CrewAI - Tool wrapper for CrewAI agents
  • AutoGPT - Plugin support for AutoGPT
  • Semantic Kernel - Middleware integration

💡 Custom Frameworks

Works with any agent framework - just log events manually:

client.log_event({
    'agent_id': 'my-agent',
    'event_type': 'custom.tool.call',
    'protocol': 'custom',
    'data': {'tool': 'my-tool', 'params': {...}}
})

Why AOP vs Alternatives?

| Feature | AOP | LangSmith | Helicone | Custom Logging |
|---|---|---|---|---|
| Setup time | 1 minute | Account signup | Proxy setup | Days of dev |
| Pricing | Free | $99+/mo | Pay per request | Dev time cost |
| Data location | Your machine | Cloud | Cloud | Your choice |
| Framework support | Any | LangChain-first | LLM proxy only | Manual |
| Dashboard | Included | Cloud UI | Cloud UI | Build yourself |
| Trace search | Event ID/Correlation/Parent | Correlation only | Session only | Manual queries |
| Export formats | JSON/CSV/TOON/OTEL/Prometheus | Limited | Limited | Custom |
| Privacy | 100% local | Cloud-based | Cloud-based | Your choice |

Choose AOP if:

  • Privacy/compliance requires local data
  • Multi-framework setup (MCP + LangChain + custom)
  • Don't want monthly subscription
  • Need self-hosted solution
  • Want full control over your observability data

Choose alternatives if:

  • Want managed cloud service
  • Only use LangChain
  • Need enterprise support contracts
  • Prefer SaaS over self-hosted

Integrating with MCP Servers

Using FastMCP

from fastmcp import FastMCP
from aop import AOPClient

# Initialize AOP
aop = AOPClient()

# Create FastMCP server
mcp = FastMCP("my-server")

# Decorate your tools
@mcp.tool()
@aop.mcp.observe_tool(agent_id='my-server')
def calculator(operation: str, a: float, b: float) -> float:
    """Calculator tool with complete observability."""
    ops = {'add': lambda: a + b, 'sub': lambda: a - b,
           'mul': lambda: a * b, 'div': lambda: a / b}
    return ops[operation]()  # lazy lookup avoids evaluating a/b for other ops

@mcp.tool()
@aop.mcp.observe_tool(agent_id='my-server')
def search(query: str, max_results: int = 5) -> list:
    """Search tool with AOP logging."""
    import httpx
    response = httpx.get('https://api.example.com/search', params={'q': query})
    return response.json()['results'][:max_results]

Using Official MCP SDK

from mcp.server import Server
from mcp.server.stdio import stdio_server
from aop import AOPClient
import asyncio

aop = AOPClient()
server = Server("my-server")

@server.call_tool()
@aop.mcp.observe_tool(agent_id='my-server')
async def get_weather(city: str) -> dict:
    """Get weather for a city."""
    # Your implementation
    import httpx
    async with httpx.AsyncClient() as client:
        response = await client.get(f'https://api.weather.com/{city}')
        return response.json()

if __name__ == "__main__":
    asyncio.run(stdio_server(server))

Multi-Tool Server Example

from mcp.server import Server
from aop import AOPClient

aop = AOPClient()
server = Server("research-assistant")

@server.call_tool()
@aop.mcp.observe_tool(agent_id='research-assistant')
async def search_papers(topic: str, year: int) -> list:
    """Search academic papers."""
    # Implementation
    pass

@server.call_tool()
@aop.mcp.observe_tool(agent_id='research-assistant')
async def summarize_paper(paper_id: str) -> str:
    """Summarize a paper."""
    # Implementation
    pass

@server.call_tool()
@aop.mcp.observe_tool(agent_id='research-assistant')
async def extract_citations(paper_id: str) -> list:
    """Extract citations from a paper."""
    # Implementation
    pass

# All three tools now have complete observability!

Testing It Works

After decorating your tools, verify AOP is logging:

1. Check Events Were Logged

# View recent events
$ aop query --agent-id my-server

# Filter by tool name
$ aop query --agent-id my-server --event-type mcp.tool.called

# Last hour only
$ aop query --agent-id my-server --last 1h

2. Verify Output

You should see:

┌────────────────────┬──────────────┬──────────┬──────────┬───────────┐
│ Timestamp          │ Agent        │ Tool     │ Duration │ Status    │
├────────────────────┼──────────────┼──────────┼──────────┼───────────┤
│ 2025-01-15 10:30   │ my-server    │ search   │ 145ms    │ ✓ Success │
│ 2025-01-15 10:31   │ my-server    │ calc     │ 2ms      │ ✓ Success │
└────────────────────┴──────────────┴──────────┴──────────┴───────────┘

If you see nothing:

  • Check decorator order (MCP decorator must be first!)
  • Verify agent_id matches
  • Check database file exists: ls -la *.db
  • See Common Pitfalls below

3. Check Database Location

# AOP creates database in current directory by default
$ ls -la aop_events.db

# Or specify custom location:
client = AOPClient(storage='sqlite:///path/to/events.db')

4. Test Without LLM (Manual Testing)

# For testing, you can call async tools directly with asyncio.run
if __name__ == "__main__":
    result = asyncio.run(search_web("AI agents"))
    print(result)

    # Check AOP logged it
    events = aop.query(agent_id='my-server')
    print(f"Logged {len(events)} events")

Basic Usage

Decorator Pattern (Recommended)

from aop import AOPClient

client = AOPClient()

@client.mcp.observe_tool(agent_id='my-agent')
def search(query: str, max_results: int = 10):
    """Search for information."""
    import httpx
    response = httpx.get('https://api.duckduckgo.com/',
                         params={'q': query, 'format': 'json'})
    results = response.json().get('RelatedTopics', [])
    return results[:max_results]

# Use normally - everything is logged automatically!
result = search(query='AI agents', max_results=5)

What gets logged automatically:

  • ✅ Tool name (search)
  • ✅ Function parameters (query='AI agents', max_results=5)
  • ✅ Return value
  • ✅ Execution duration
  • ✅ Errors and exceptions (if any)
  • ✅ Parent-child relationships (for multi-step workflows)
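Putting the list above together, a single captured event might look roughly like this. The field names are inferred from the manual-logging examples elsewhere in this README; the exact stored schema may differ:

```python
# Illustrative shape of one captured event (field names inferred from
# this README's manual-logging examples; the real schema may differ).
event = {
    'agent_id': 'my-agent',
    'event_type': 'mcp.tool.called',
    'protocol': 'mcp',
    'data': {
        'tool_name': 'search',
        'params': {'query': 'AI agents', 'max_results': 5},
        'result': '[...]',       # return value (possibly truncated)
        'duration_ms': 145,      # execution duration
    },
    'parent_id': None,           # set for child steps in multi-step workflows
}
print(sorted(event))
```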

Context Manager

with client.mcp.tool_execution('my-agent', 'search', {'q': 'test'}) as handle:
    results = perform_search('test')
    handle.set_result(results)

Manual Logging

client.log_event({
    'agent_id': 'my-agent',
    'event_type': 'mcp.tool.called',
    'protocol': 'mcp',
    'data': {'tool_name': 'search', 'params': {'q': 'test'}}
})

Core Features

1. Querying Events

# Get recent events
events = client.query(agent_id='my-agent', limit=50)

# Filter by event type
tool_calls = client.query(
    agent_id='my-agent',
    event_type='mcp.tool.called'
)

# Time range queries
from datetime import datetime, timedelta
recent = client.query(
    agent_id='my-agent',
    start_time=datetime.now() - timedelta(hours=1)
)

# Get complete trace
trace_events = client.get_trace(correlation_id='trace-123')

2. Analytics & Insights

from aop import Analytics

analytics = Analytics(client)

# Reconstruct distributed traces
trace = analytics.reconstruct_trace(correlation_id='trace-123')
print(f"Duration: {trace['total_duration_ms']}ms")
print(f"Events: {trace['event_count']}")

# Tool usage analytics
tool_counts = analytics.count_by_tool('my-agent')
avg_durations = analytics.avg_duration_by_tool('my-agent')

# Latency percentiles
p95 = analytics.percentile_duration('my-agent', percentile=95)
p99 = analytics.percentile_duration('my-agent', percentile=99)

# Time-series analysis
timeline = analytics.events_over_time('my-agent', bucket_size='1h')
rate = analytics.event_rate('my-agent', window_minutes=60)

3. Command-Line Interface

# Query events
aop query --agent-id my-agent --last 1h

# Visualize traces
aop trace --correlation-id trace-123

# View analytics
aop stats --agent-id my-agent --window 24h

# Export data
aop export --output events.json --last 7d
aop export --output events.toon --format toon  # TOON format (30-60% fewer tokens)

# Start Prometheus exporter
aop prometheus --port 9090

# Launch web dashboard
aop dashboard

4. Web Dashboard

Launch a professional web interface for real-time monitoring. The dashboard is already bundled with aop-pack; only the bare aop package needs the extra:

pip install aop[dashboard]  # not needed if you installed aop-pack
aop dashboard

Features:

  • Tabular Event View - Clean table with sortable columns (timestamp, agent, type, duration)
  • Live Updates - New events smoothly push down existing ones via WebSocket
  • Click-to-View - Click any event row to see full details in side panel
  • Smart Sorting - Sort by date/time, agent (A-Z), event type, or duration
  • Color-Coded Status - Visual indicators (🟢 success, 🔴 error, 🔵 in-progress)
  • Export from UI - Export to JSON, CSV, TOON, OpenTelemetry, Prometheus
  • Real-time Stats - Live performance metrics as events stream in

Access at http://localhost:8000

5. Exporters

JSON Export

from aop.exporters import JSONExporter

exporter = JSONExporter(client)
events = client.query(agent_id='my-agent', limit=100)
json_output = exporter.export(events)

# Save to file
exporter.export_to_file(events, 'events.json')

CSV Export

from aop.exporters import CSVExporter

exporter = CSVExporter(client)
events = client.query(agent_id='my-agent', limit=100)

# Export to file
exporter.export_to_file(events, 'events.csv')

TOON (Token-Oriented Object Notation)

LLM-optimized export format with 30-60% token reduction - Perfect for AI-assisted debugging and trace analysis.

from aop.exporters import ToonExporter

# Basic export
exporter = ToonExporter(flatten=True, delimiter='comma')
events = client.query(correlation_id='trace-123', limit=100)
toon_output = exporter.export(events)

# Export to file
exporter.export_to_file(events, 'trace.toon')

# Check token savings
stats = exporter.get_token_estimate(events)
print(f"Token savings: {stats['savings_percent']}%")
# Output: Token savings: 45.2%

CLI Export:

# Export to TOON format
aop export --output events.toon --format toon

# Export with options
aop export -o trace.toon -f toon --toon-delimiter pipe --correlation-id abc123

# Export recent events
aop export -o recent.toon -f toon --last 1h --limit 100

Why TOON?

  • 📉 30-60% fewer tokens than JSON for uniform event arrays
  • 💰 Lower LLM costs when analyzing traces in prompts
  • 🎯 Optimized for AI consumption and debugging
  • 📊 Tabular format for uniform data (similar to CSV)

Use Cases:

  • AI-assisted debugging ("analyze this trace and find bottlenecks")
  • Cost-effective trace analysis with GPT-4/Claude
  • Passing large event datasets in LLM prompts
  • Automated performance analysis
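The token savings come from writing the keys once as a header and each event as a row of values. A rough sketch of the idea over uniform event dicts (illustrative only, not the real ToonExporter output format):

```python
import json

events = [
    {'tool': 'search', 'duration_ms': 120, 'status': 'ok'},
    {'tool': 'parse', 'duration_ms': 45, 'status': 'ok'},
    {'tool': 'summarize', 'duration_ms': 685, 'status': 'error'},
]

# Tabular encoding in the spirit of TOON (not ToonExporter's exact output):
# keys appear once in a header, rows carry only the values.
header = ','.join(events[0])
rows = '\n'.join(','.join(str(e[k]) for k in events[0]) for e in events)
tabular = f"events[{len(events)}]{{{header}}}:\n{rows}"

print(tabular)
print(len(tabular), '<', len(json.dumps(events)))  # tabular form is shorter
```

JSON repeats every key in every object, so for uniform arrays the tabular form shrinks roughly in proportion to the key/value length ratio.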

OpenTelemetry

from aop.exporters import OpenTelemetryExporter

exporter = OpenTelemetryExporter(client)
events = client.query(correlation_id='trace-123')
spans = exporter.export_events(events)

exporter.export_to_collector(
    spans=spans,
    endpoint='http://localhost:4317'
)

Prometheus

# Start metrics server
aop prometheus --port 9090

# Metrics available at http://localhost:9090/metrics

Metrics exposed:

  • aop_events_total - Total events (by type, agent, protocol)
  • aop_tool_duration_seconds - Tool duration histogram
  • aop_tool_errors_total - Tool error counter
  • aop_event_rate - Events per minute gauge

Protocol Support

MCP (Model Context Protocol)

# Tool execution (decorator)
@client.mcp.observe_tool(agent_id='my-agent')
def my_tool(param: str):
    return process(param)

# LLM sampling
req_id = client.mcp.log_sampling_request(
    agent_id='my-agent',
    model='gpt-4',
    prompt='Explain AI'
)

client.mcp.log_sampling_response(
    agent_id='my-agent',
    model='gpt-4',
    response='AI is...',
    parent_id=req_id
)

A2A (Agent-to-Agent Protocol)

# Task assignment
client.a2a.log_task_assigned(
    agent_id='orchestrator',
    task_id='task-123',
    assigned_to='worker-agent',
    task_data={'action': 'process'}
)

# Task completion
client.a2a.log_task_completed(
    agent_id='worker-agent',
    task_id='task-123',
    result={'status': 'done'}
)

# Messaging
client.a2a.log_message_sent(
    agent_id='agent-1',
    recipient='agent-2',
    message={'type': 'request', 'data': {...}}
)

AP2 (Agent Payments Protocol)

# Payment tracking
client.ap2.log_payment_initiated(
    agent_id='my-agent',
    payment_id='pay-123',
    amount=10.50,
    currency='USD',
    recipient='service-provider'
)

client.ap2.log_payment_completed(
    agent_id='my-agent',
    payment_id='pay-123',
    transaction_id='txn-456'
)

# Cost tracking
client.ap2.log_cost_incurred(
    agent_id='my-agent',
    cost_amount=0.15,
    currency='USD',
    resource_type='llm_api'
)

Storage Backends

SQLite (Default)

# File-based
client = AOPClient(storage='sqlite:///aop_events.db')

# In-memory (testing)
client = AOPClient(storage='memory')

PostgreSQL

client = AOPClient(
    storage='postgresql://user:password@localhost:5432/aop_db'
)

Custom Storage

Implement the BaseStorage interface for custom backends.
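A minimal in-memory backend might look like this. The README does not show the BaseStorage interface itself, so the method names here (insert_event, query_events) are assumptions for illustration only:

```python
from datetime import datetime, timezone

class InMemoryStorage:
    """Sketch of a custom backend; method names are assumed, not
    taken from the real BaseStorage interface."""

    def __init__(self):
        self._events = []

    def insert_event(self, event: dict) -> None:
        # Stamp events on arrival so time-range queries are possible.
        event.setdefault('timestamp', datetime.now(timezone.utc).isoformat())
        self._events.append(event)

    def query_events(self, agent_id=None, event_type=None, limit=50):
        hits = [e for e in self._events
                if (agent_id is None or e.get('agent_id') == agent_id)
                and (event_type is None or e.get('event_type') == event_type)]
        return hits[-limit:]  # most recent events

store = InMemoryStorage()
store.insert_event({'agent_id': 'a1', 'event_type': 'mcp.tool.called'})
store.insert_event({'agent_id': 'a2', 'event_type': 'mcp.tool.called'})
print(len(store.query_events(agent_id='a1')))  # 1
```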


Distributed Tracing

Link related events across agents using correlation_id:

import uuid

trace_id = str(uuid.uuid4())

# All events use the same correlation_id
@client.mcp.observe_tool(agent_id='orchestrator', correlation_id=trace_id)
def step1():
    return process_step1()

@client.mcp.observe_tool(agent_id='worker', correlation_id=trace_id)
def step2(data):
    return process_step2(data)

# Execute workflow
result1 = step1()
result2 = step2(result1)

# Reconstruct complete trace
from aop import Analytics
analytics = Analytics(client)
trace = analytics.reconstruct_trace(correlation_id=trace_id)

🔍 Trace Explorer - Multiple Search Methods

AOP supports three ways to view execution traces:

1. By Correlation ID (Original)

Group all events in a workflow using a shared correlation ID.

# When logging events
with client.trace('user-request-123'):
    client.mcp.log_tool_call(...)  # Auto-tagged with correlation_id

# View trace
from aop import Analytics
analytics = Analytics(client)
trace = analytics.reconstruct_trace('user-request-123')

Dashboard: Enter correlation ID in Trace Explorer tab

CLI:

aop trace --correlation-id user-request-123

API:

GET /api/traces/{correlation_id}

2. By Event ID ⭐ NEW

Click any event to see its complete trace - no correlation ID needed!

Dashboard:

  1. Click any event in Live Feed
  2. Click "🔍 View Full Trace" button
  3. Automatically shows root + all related events

How it works:

  • Walks up parent_id chain to find root event
  • Walks down to find all children
  • Reconstructs complete execution tree
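The walk described above can be sketched over plain event dicts like this (a simplified version, not AOP's internal implementation):

```python
def reconstruct_from_event(events, event_id):
    """Walk parent_id upward to find the root, then collect the whole
    tree downward. Simplified sketch, not AOP's internals."""
    by_id = {e['id']: e for e in events}

    # Walk up the parent_id chain to the root event.
    node = by_id[event_id]
    while node.get('parent_id') in by_id:
        node = by_id[node['parent_id']]
    root = node

    # Walk down: breadth-first collection of all descendants.
    tree, frontier = [root], [root['id']]
    while frontier:
        children = [e for e in events if e.get('parent_id') in frontier]
        tree.extend(children)
        frontier = [c['id'] for c in children]
    return tree

events = [
    {'id': 'e1', 'parent_id': None, 'type': 'tool.called'},
    {'id': 'e2', 'parent_id': 'e1', 'type': 'tool.completed'},
    {'id': 'e3', 'parent_id': 'e2', 'type': 'tool.error'},
]
print([e['id'] for e in reconstruct_from_event(events, 'e3')])
# ['e1', 'e2', 'e3'] - starting from the error still recovers the full chain
```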

API:

GET /api/traces/by-event/{event_id}

Example:

# Log events without correlation_id
call_event = client.mcp.log_tool_call(
    agent_id='my-agent',
    tool_name='search',
    params={'query': 'test'}
)

result_event = client.mcp.log_tool_result(
    agent_id='my-agent',
    tool_name='search',
    result={'found': 10},
    duration_ms=120,
    parent_id=call_event  # Links to parent
)

# Reconstruct trace using ANY event ID
analytics = Analytics(client)
trace = analytics.reconstruct_trace_from_event(result_event)
# Returns both events in tree structure!

3. By Parent ID ⭐ NEW

Search using any parent event ID to see all its children.

API:

GET /api/traces/by-parent/{parent_id}

Why Multiple Search Methods?

| Method | Use Case |
|---|---|
| Correlation ID | Planned workflows, multi-agent orchestration |
| Event ID | Debugging - "show me what led to this error" |
| Parent ID | Analyzing a specific operation's sub-operations |

No correlation_id? No problem! Event ID search works even when you forgot to set correlation IDs.


Verify It's Working (60 Second Test)

1. Install and start dashboard

pip install aop-pack
aop dashboard

2. In another terminal, log a test event

from aop import AOPClient

client = AOPClient()
client.log_event({
    'agent_id': 'test-agent',
    'event_type': 'mcp.tool.called',
    'protocol': 'mcp',
    'data': {'tool_name': 'test_tool', 'params': {'query': 'hello'}}
})

3. Check dashboard

Open http://localhost:8000 - you should see your test event in Live Feed.

✅ Working? You're ready to integrate with your agent. ❌ Not showing? See Common Pitfalls below.


How It Works

┌─────────────────┐
│  Your Agent     │
│ (MCP/LangChain) │
└────────┬────────┘
         │ @observe_tool decorator
         ▼
┌─────────────────┐
│   AOPClient     │ ← Logs events
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Storage      │ ← SQLite/PostgreSQL
│ (local/cloud)   │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Dashboard     │ ← Visualize & export
│   Analytics     │
└─────────────────┘

Key Design Principles:

  • Zero external dependencies for core library
  • Local-first - works offline, no cloud required
  • Async logging - doesn't slow down your agent
  • Pluggable storage - SQLite for dev, PostgreSQL for prod
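The async-logging principle above can be illustrated with a queue plus a background writer: the agent's hot path only enqueues, and the slow write happens off-thread. This is an illustrative sketch, not AOP's actual implementation:

```python
import queue
import threading

class BackgroundLogger:
    """Sketch of non-blocking logging; not AOP's actual implementation."""

    def __init__(self):
        self._q = queue.Queue()
        self._written = []
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def log_event(self, event: dict) -> None:
        self._q.put(event)  # O(1) enqueue; never blocks the agent

    def _drain(self):
        while True:
            event = self._q.get()
            self._written.append(event)  # stand-in for a database insert
            self._q.task_done()

    def flush(self):
        self._q.join()  # wait until the worker has drained the queue

logger = BackgroundLogger()
for i in range(100):
    logger.log_event({'event_type': 'mcp.tool.called', 'n': i})
logger.flush()
print(len(logger._written))  # 100
```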

Common Pitfalls

"I decorated my function but see no events"

Solution 1: Check Decorator Order

# ✅ CORRECT
@server.call_tool()  # MCP decorator FIRST
@aop.mcp.observe_tool(agent_id='my-server')
async def my_tool():
    ...

# ❌ WRONG
@aop.mcp.observe_tool(agent_id='my-server')  # AOP first breaks MCP
@server.call_tool()
async def my_tool():
    ...

Solution 2: Verify Database Location

# Check if database was created
$ ls -la *.db
-rw-r--r--  1 user  staff  12288 Jan 15 10:30 aop_events.db

# If not found, specify explicit path
client = AOPClient(storage='sqlite:///./aop_events.db')

Solution 3: Tool Must Be Called By LLM

Remember: MCP tools only generate events when CALLED!
- Start your server
- Use tool through Claude Desktop or MCP client
- Then check: aop query --agent-id my-server

"Events not showing in dashboard"

Check agent_id matches:

# In code
@aop.mcp.observe_tool(agent_id='my-server')  # ← Note the ID

# In query
$ aop query --agent-id my-server  # ← Must match exactly!

"Async function breaks with decorator"

Decorator order is critical:

# ✅ Works with async
@server.call_tool()
@aop.mcp.observe_tool(agent_id='my-server')
async def my_async_tool():
    await asyncio.sleep(1)
    return "done"

"Database locked" error

Multiple processes accessing same database:

# Solution: Use PostgreSQL for multi-process
client = AOPClient(storage='postgresql://localhost/aop')

# Or: Use separate databases per process
import os
client = AOPClient(storage=f'sqlite:///aop_{os.getpid()}.db')

"No module named 'mcp'"

Install MCP SDK:

pip install mcp
# or
pip install fastmcp

Examples

Complete working examples:

Basic MCP Server with AOP

See examples/mcp_server_with_aop.py for a complete, runnable example.

Decorator Usage

See docs/examples/decorator_demo.py - Shows async/sync tools with decorators.

Analytics and Tracing

See docs/examples/analytics_demo.py - Analytics and trace reconstruction.

TOON Export

See examples/toon_export_demo.py - Export events in TOON format.

Run examples:

# Complete MCP server example
python examples/mcp_server_with_aop.py

# Decorator patterns
python docs/examples/decorator_demo.py

# Analytics demo
python docs/examples/analytics_demo.py

Complete Real-World Example: Weather Agent

Here's a complete, copy-pasteable example of an MCP weather server with AOP:

# weather_server.py
from mcp.server import Server
from mcp.server.stdio import stdio_server
from aop import AOPClient
import asyncio
import httpx

# Initialize AOP
aop = AOPClient(storage='sqlite:///weather_events.db')
server = Server("weather-agent")

@server.call_tool()
@aop.mcp.observe_tool(agent_id="weather-agent")
async def get_weather(city: str) -> dict:
    """Get current weather for a city."""
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://api.openweathermap.org/data/2.5/weather",
            params={"q": city, "appid": "YOUR_API_KEY", "units": "metric"}
        )
        data = response.json()
        return {
            "city": city,
            "temperature": data["main"]["temp"],
            "description": data["weather"][0]["description"],
            "humidity": data["main"]["humidity"]
        }

@server.call_tool()
@aop.mcp.observe_tool(agent_id="weather-agent")
async def get_forecast(city: str, days: int = 5) -> dict:
    """Get weather forecast."""
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://api.openweathermap.org/data/2.5/forecast",
            params={"q": city, "appid": "YOUR_API_KEY", "units": "metric", "cnt": days}
        )
        return response.json()

if __name__ == "__main__":
    asyncio.run(stdio_server(server))

Run it:

# Install dependencies
pip install aop-pack mcp httpx

# Start server (in one terminal)
python weather_server.py &

# Start dashboard (in another terminal)
aop dashboard

Use it in Claude Desktop:

Add to your MCP settings (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "weather": {
      "command": "python",
      "args": ["/path/to/weather_server.py"]
    }
  }
}

Restart Claude Desktop and ask:

"What's the weather in San Francisco?"

See observability:

  1. Live Feed: Real-time tool calls as you chat
  2. Trace Explorer: See which weather calls happened together
  3. Analytics: Which cities you query most, average response time
  4. Export: Download events as JSON, CSV, or TOON

Query from the CLI:

# View recent tool calls
aop query --agent-id weather-agent --last 1h

# Export to TOON for LLM analysis
aop export -o weather_trace.toon -f toon --agent-id weather-agent

Integrations

LangChain

from langchain.tools import Tool
from aop import AOPClient

client = AOPClient()

@client.mcp.observe_tool(agent_id='langchain-agent')
def search_tool(query: str) -> str:
    import httpx
    response = httpx.get('https://api.duckduckgo.com/',
                         params={'q': query, 'format': 'json'})
    return str(response.json())

lc_tool = Tool(
    name="Search",
    func=search_tool,
    description="Search for information"
)

# All tool calls are now logged to AOP

OpenTelemetry

aop export-otel --correlation-id trace-123 --endpoint http://localhost:4317

Prometheus + Grafana

# Start Prometheus exporter
aop prometheus --port 9090

# Add to prometheus.yml
scrape_configs:
  - job_name: 'aop'
    static_configs:
      - targets: ['localhost:9090']

View metrics in Grafana with pre-built dashboards.


Performance

AOP is designed for production use with minimal overhead:

  • <1ms P99 latency - Won't slow down your agents
  • Zero runtime dependencies - Core library has no deps
  • Async support - Non-blocking logging
  • Connection pooling - Efficient database usage
  • Optional validation - Skip in production for speed

Benchmarks (local SQLite):

  • Insert event: 0.3ms median, 0.8ms P99
  • Query 100 events: 2.5ms median

Design Principles

  1. Privacy-First - Local storage by default, no telemetry, you own your data
  2. Zero Dependencies - Core library uses only Python stdlib
  3. Protocol-Agnostic - Not tied to any specific agent protocol
  4. Storage-Flexible - Pluggable backends (SQLite, PostgreSQL, custom)
  5. Minimal Overhead - <1ms P99, production-ready performance
  6. Developer-Friendly - Simple API, decorator pattern, type hints

Documentation

Getting Started

Reference

Advanced


Roadmap

See RoadMap.md for detailed development plan.

v0.1.0-alpha (Current)

  • ✅ Core event logging and querying
  • ✅ Protocol adapters (MCP, A2A, AP2)
  • ✅ Analytics engine
  • ✅ CLI tools
  • ✅ Web dashboard
  • ✅ OpenTelemetry and Prometheus exporters
  • ✅ TOON export format

v0.2.0 (Planned)

  • Batch insert optimization
  • Stream processing API
  • Additional storage backends
  • Enhanced dashboard features
  • Performance improvements

Contributing

AOP is open source and community-driven. Contributions are welcome!

Ways to contribute:

  • Report bugs and request features via GitHub Issues
  • Submit pull requests
  • Improve documentation
  • Share examples and use cases

See CONTRIBUTING.md for guidelines.


Community


License

MIT License - see LICENSE file for details.


Citation

If you use AOP in your research or project, please cite:

@software{aop2025,
  title = {AOP: Agentic Observability Protocol},
  author = {AOP Contributors},
  year = {2025},
  url = {https://github.com/aop-protocol/aop},
  version = {0.1.0-alpha4}
}

Built with ❤️ by the AOP community
