# Analytics API - Tutorial

This tutorial demonstrates how to use the Agenta Analytics API to analyze LLM performance metrics. You'll learn how to:

- Retrieve aggregated metrics over time
- Analyze costs, latency, and token usage
- Filter analytics by status and other attributes
- Track error trends and failure rates
- Compare performance across different time periods

## What You'll Build

We'll create analytics queries that:
1. Track daily LLM costs and spending trends
2. Monitor error rates and identify peak error times
3. Analyze token usage patterns
4. Compare performance metrics over time
5. Generate cost reports and visualizations

## Setup

Before using the API, you need your Agenta API key. You can create API keys from the Settings page in your Agenta workspace.

In [None]:
import os
import requests
import json
from datetime import datetime, timedelta, timezone
from getpass import getpass

# Configuration
AGENTA_HOST = os.getenv("AGENTA_HOST", "https://cloud.agenta.ai")
api_key = os.getenv("AGENTA_API_KEY")
if not api_key:
    api_key = getpass("Enter your Agenta API key: ")
    os.environ["AGENTA_API_KEY"] = api_key

# Setup base configuration
BASE_URL = f"{AGENTA_HOST}/api/preview/tracing/spans/analytics"
HEADERS = {
    "Authorization": f"ApiKey {api_key}",
    "Content-Type": "application/json"
}

print("✅ Setup complete!")
print(f"API endpoint: {BASE_URL}")
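
Every query in this tutorial sends the same kind of request, so the windowing and error handling can be factored out. The sketch below shows one way to do that. `build_payload` and `fetch_analytics` are hypothetical helper names, not part of the Agenta API, and the 30-second timeout is an arbitrary choice. The payload fields mirror those used in the cells that follow.

```python
from datetime import datetime, timedelta, timezone


def build_payload(days, interval_minutes, now=None):
    """Build a trace-level analytics payload covering the last `days` days."""
    newest = now or datetime.now(timezone.utc)
    oldest = newest - timedelta(days=days)
    return {
        "focus": "trace",
        "interval": interval_minutes,  # bucket size in minutes
        "windowing": {
            "oldest": oldest.isoformat(),
            "newest": newest.isoformat(),
        },
    }


def fetch_analytics(session, url, headers, payload, timeout=30):
    """POST the payload and return the parsed JSON body.

    `session` is anything with a requests-style `post` method, such as
    `requests.Session()` or the `requests` module itself.
    """
    response = session.post(url, headers=headers, json=payload, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return response.json()
```

With the setup cell above already run, a call might look like `data = fetch_analytics(requests, BASE_URL, HEADERS, build_payload(7, 1440))`.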

## Part 1: Get Recent Metrics

Let's start by retrieving metrics for the last 7 days with daily buckets. Each bucket contains aggregated metrics for all traces within that day.

In [None]:
# Get analytics for last 7 days with daily buckets
newest = datetime.now(timezone.utc)
oldest = newest - timedelta(days=7)

payload = {
    "focus": "trace",
    "interval": 1440,  # 1440 minutes = daily buckets
    "windowing": {
        "oldest": oldest.isoformat(),
        "newest": newest.isoformat()
    }
}

response = requests.post(BASE_URL, headers=HEADERS, json=payload)
data = response.json()

print(f"📊 Found {data['count']} daily buckets\n")

# Show all days with activity
for bucket in data['buckets']:
    if bucket['total']['count'] > 0:
        date = bucket['timestamp'][:10]
        print(f"Date: {date}")
        print(f"  Traces: {bucket['total']['count']}")
        print(f"  Cost: ${bucket['total']['costs']:.4f}")
        print(f"  Tokens: {bucket['total']['tokens']:,.0f}")
        print(f"  Errors: {bucket['errors']['count']}\n")
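
The response is never printed in full above, so it can help to see the shape the loop expects. The sample below is reconstructed only from the fields this tutorial reads (`count`, `buckets`, `timestamp`, `total`, `errors`). The values are invented, and the real response may contain additional fields. `format_bucket` is a hypothetical helper.

```python
# Illustrative response, inferred from the keys accessed in this tutorial.
# All values are made up; the real API may return more fields.
sample_response = {
    "count": 2,
    "buckets": [
        {
            "timestamp": "2024-01-01T00:00:00Z",
            "total": {"count": 120, "costs": 0.5321, "tokens": 48000, "duration": 96000},
            "errors": {"count": 3, "duration": 4500},
        },
        {
            "timestamp": "2024-01-02T00:00:00Z",
            "total": {"count": 0, "costs": 0.0, "tokens": 0, "duration": 0},
            "errors": {"count": 0, "duration": 0},
        },
    ],
}


def format_bucket(bucket):
    """Render one bucket as a single summary line."""
    total = bucket["total"]
    return (
        f"{bucket['timestamp'][:10]}: {total['count']} traces, "
        f"${total['costs']:.4f}, {total['tokens']:,.0f} tokens, "
        f"{bucket['errors']['count']} errors"
    )


for bucket in sample_response["buckets"]:
    if bucket["total"]["count"] > 0:  # skip empty days, as in the cell above
        print(format_bucket(bucket))
```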

## Part 2: Track Daily Costs

Sum the daily buckets for the last 30 days to get total cost, request volume, token usage, and error rate.

In [None]:
# Get daily metrics for last 30 days
newest = datetime.now(timezone.utc)
oldest = newest - timedelta(days=30)

payload = {
    "focus": "trace",
    "interval": 1440,  # Daily buckets
    "windowing": {
        "oldest": oldest.isoformat(),
        "newest": newest.isoformat()
    }
}

response = requests.post(BASE_URL, headers=HEADERS, json=payload)
data = response.json()

# Calculate totals
total_traces = sum(b['total']['count'] for b in data['buckets'])
total_cost = sum(b['total']['costs'] for b in data['buckets'])
total_tokens = sum(b['total']['tokens'] for b in data['buckets'])
total_errors = sum(b['errors']['count'] for b in data['buckets'])

print("💰 Cost Summary (Last 30 Days)")
print("=" * 50)
print(f"Total Cost: ${total_cost:.2f}")
print(f"Total Requests: {total_traces:,}")
if total_traces > 0:
    print(f"Average Cost per Request: ${total_cost/total_traces:.6f}")
    print(f"Total Tokens: {total_tokens:,.0f}")
    print(f"Average Tokens per Request: {total_tokens/total_traces:.1f}")
    print(f"Error Rate: {(total_errors/total_traces)*100:.2f}%")
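
The totals above can also be computed by a small pure function, which makes them easy to reuse and test without calling the API. This is a sketch: `summarize_buckets` is a hypothetical name, and it assumes each bucket has the `total` and `errors` fields used throughout this tutorial.

```python
def summarize_buckets(buckets):
    """Aggregate per-bucket totals into one summary dict."""
    traces = sum(b["total"]["count"] for b in buckets)
    cost = sum(b["total"]["costs"] for b in buckets)
    tokens = sum(b["total"]["tokens"] for b in buckets)
    errors = sum(b["errors"]["count"] for b in buckets)
    return {
        "traces": traces,
        "cost": cost,
        "tokens": tokens,
        "errors": errors,
        # Ratios are None when there is no traffic, to avoid dividing by zero.
        "cost_per_trace": cost / traces if traces else None,
        "tokens_per_trace": tokens / traces if traces else None,
        "error_rate_pct": errors / traces * 100 if traces else None,
    }
```

Returning `None` for the ratios leaves it to the caller to decide how to display a period with no traffic.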

## Part 3: Analyze Error Trends

Monitor error rates over time to identify patterns and peak error times.

In [None]:
# Get hourly metrics for last 7 days
newest = datetime.now(timezone.utc)
oldest = newest - timedelta(days=7)

payload = {
    "focus": "trace",
    "interval": 60,  # Hourly buckets
    "windowing": {
        "oldest": oldest.isoformat(),
        "newest": newest.isoformat()
    }
}

response = requests.post(BASE_URL, headers=HEADERS, json=payload)
data = response.json()

print("🚨 Error Analysis")
print("=" * 50)

# Find hours with high error rates
high_error_periods = []
for bucket in data['buckets']:
    if bucket['total']['count'] > 0:
        error_rate = (bucket['errors']['count'] / bucket['total']['count']) * 100
        if error_rate > 5:  # Flag periods with > 5% errors
            high_error_periods.append({
                'time': bucket['timestamp'],
                'error_rate': error_rate,
                'total': bucket['total']['count'],
                'errors': bucket['errors']['count']
            })

if high_error_periods:
    print(f"\nFound {len(high_error_periods)} periods with high error rates (>5%):\n")
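    # Show the 10 worst periods, highest error rate first
    worst_periods = sorted(high_error_periods, key=lambda p: p['error_rate'], reverse=True)
    for period in worst_periods[:10]: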
        print(f"  {period['time']}")
        print(f"    Error Rate: {period['error_rate']:.1f}%")
        print(f"    Total: {period['total']}, Errors: {period['errors']}\n")
else:
    print("✅ No high error rates detected in the last 7 days")
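
With hourly buckets, a quiet hour containing a single failed trace shows a 100% error rate. One way to reduce that noise is to require a minimum number of traces before flagging a bucket. The sketch below assumes the same bucket fields as above. `flag_high_error_buckets` is a hypothetical helper, and the defaults (5% threshold, 20 traces) are arbitrary starting points.

```python
def flag_high_error_buckets(buckets, threshold_pct=5.0, min_count=20):
    """Return buckets whose error rate exceeds the threshold, worst first.

    Buckets with fewer than `min_count` traces are skipped because a single
    failure in a near-empty hour would otherwise dominate the results.
    """
    flagged = []
    for b in buckets:
        total = b["total"]["count"]
        if total == 0 or total < min_count:
            continue
        errors = b["errors"]["count"]
        rate = errors / total * 100
        if rate > threshold_pct:
            flagged.append({
                "time": b["timestamp"],
                "error_rate": rate,
                "total": total,
                "errors": errors,
            })
    return sorted(flagged, key=lambda p: p["error_rate"], reverse=True)
```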

## Part 4: Filter by Status Code

Analyze only successful traces by filtering on status code.

In [None]:
# Get successful traces only
newest = datetime.now(timezone.utc)
oldest = newest - timedelta(days=7)

payload = {
    "focus": "trace",
    "interval": 1440,  # Daily buckets
    "windowing": {
        "oldest": oldest.isoformat(),
        "newest": newest.isoformat()
    },
    "filter": {
        "conditions": [
            {
                "field": "status.code",
                "operator": "eq",
                "value": "STATUS_CODE_OK"
            }
        ]
    }
}

response = requests.post(BASE_URL, headers=HEADERS, json=payload)
data = response.json()

# Calculate success metrics
total_count = sum(b['total']['count'] for b in data['buckets'])
total_cost = sum(b['total']['costs'] for b in data['buckets'])
total_duration = sum(b['total']['duration'] for b in data['buckets'])

print("✅ Successful Traces (Last 7 Days)")
print("=" * 50)
print(f"Count: {total_count:,}")
print(f"Total Cost: ${total_cost:.4f}")
if total_count > 0:
    print(f"Avg Duration: {total_duration/total_count:.0f}ms")
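
The same filter structure can be parameterized to look at other statuses. In the sketch below, `STATUS_CODE_ERROR` is an assumption based on OpenTelemetry's status naming (the counterpart of `STATUS_CODE_OK` used above), so check it against your own trace data before relying on it. `status_filter_payload` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def status_filter_payload(status_code, days=7, interval_minutes=1440, now=None):
    """Build an analytics payload restricted to traces with one status code."""
    newest = now or datetime.now(timezone.utc)
    oldest = newest - timedelta(days=days)
    return {
        "focus": "trace",
        "interval": interval_minutes,
        "windowing": {
            "oldest": oldest.isoformat(),
            "newest": newest.isoformat(),
        },
        "filter": {
            "conditions": [
                {"field": "status.code", "operator": "eq", "value": status_code}
            ]
        },
    }


ok_payload = status_filter_payload("STATUS_CODE_OK")
# Assumed value, taken from OpenTelemetry naming rather than from this tutorial.
error_payload = status_filter_payload("STATUS_CODE_ERROR")
```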

## Part 5: Track Token Usage

Monitor token consumption patterns over time.

In [None]:
# Get daily token usage for last 7 days
newest = datetime.now(timezone.utc)
oldest = newest - timedelta(days=7)

payload = {
    "focus": "trace",
    "interval": 1440,  # Daily buckets
    "windowing": {
        "oldest": oldest.isoformat(),
        "newest": newest.isoformat()
    }
}

response = requests.post(BASE_URL, headers=HEADERS, json=payload)
data = response.json()

print("🎯 Token Usage Analysis")
print("=" * 50)
print("\nDaily Token Usage:\n")

for bucket in data['buckets']:
    if bucket['total']['count'] > 0:
        date = bucket['timestamp'][:10]
        avg_tokens = bucket['total']['tokens'] / bucket['total']['count']
        print(f"  {date}: {bucket['total']['tokens']:>8,.0f} total ({avg_tokens:>6.0f} avg)")

## Part 6: Analyze Performance

Track latency trends over time to identify performance changes.

In [None]:
# Get hourly performance for last 24 hours
newest = datetime.now(timezone.utc)
oldest = newest - timedelta(days=1)

payload = {
    "focus": "trace",
    "interval": 60,  # Hourly buckets
    "windowing": {
        "oldest": oldest.isoformat(),
        "newest": newest.isoformat()
    }
}

response = requests.post(BASE_URL, headers=HEADERS, json=payload)
data = response.json()

print("⚡ Performance Analysis (Last 24 Hours)")
print("=" * 50)
print("\nHourly Average Latency:\n")

latencies = []
for bucket in data['buckets']:
    if bucket['total']['count'] > 0:
        avg_duration = bucket['total']['duration'] / bucket['total']['count']
        latencies.append(avg_duration)
        hour = bucket['timestamp'][11:16]  # Extract HH:MM
        print(f"  {hour}: {avg_duration:7.0f}ms")

if latencies:
    print(f"\n📈 Statistics:")
    print(f"  Min: {min(latencies):.0f}ms")
    print(f"  Max: {max(latencies):.0f}ms")
    # Unweighted mean of the hourly averages; busy and quiet hours count equally
    print(f"  Avg of hourly averages: {sum(latencies)/len(latencies):.0f}ms")
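
Averaging the hourly averages gives every hour equal weight, regardless of how many traces it contains. When traffic is uneven, a volume-weighted mean (total duration divided by total traces) is usually closer to what a typical request experiences. The sketch below computes both, and `latency_averages` is a hypothetical helper name.

```python
def latency_averages(buckets):
    """Return (mean_of_bucket_means, volume_weighted_mean) in the API's duration units."""
    active = [b["total"] for b in buckets if b["total"]["count"] > 0]
    if not active:
        return None, None
    bucket_means = [t["duration"] / t["count"] for t in active]
    mean_of_means = sum(bucket_means) / len(bucket_means)
    weighted = sum(t["duration"] for t in active) / sum(t["count"] for t in active)
    return mean_of_means, weighted


# One slow trace in a quiet hour, 99 fast traces in a busy hour.
example = [
    {"total": {"count": 1, "duration": 1000}},
    {"total": {"count": 99, "duration": 9900}},
]
print(latency_averages(example))
```

In this example the mean of hourly means is 550, while the volume-weighted mean is 109.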

## Part 7: Generate Monthly Cost Report

Create a comprehensive monthly report with cost breakdown and usage statistics.

In [None]:
# Get monthly metrics
newest = datetime.now(timezone.utc)
oldest = newest - timedelta(days=30)

payload = {
    "focus": "trace",
    "interval": 1440,  # Daily buckets
    "windowing": {
        "oldest": oldest.isoformat(),
        "newest": newest.isoformat()
    }
}

response = requests.post(BASE_URL, headers=HEADERS, json=payload)
data = response.json()

# Calculate totals
total_traces = sum(b['total']['count'] for b in data['buckets'])
total_cost = sum(b['total']['costs'] for b in data['buckets'])
total_tokens = sum(b['total']['tokens'] for b in data['buckets'])
total_duration = sum(b['total']['duration'] for b in data['buckets'])
total_errors = sum(b['errors']['count'] for b in data['buckets'])

print("📊 MONTHLY COST REPORT")
print("=" * 60)
print(f"Period: {oldest.strftime('%Y-%m-%d')} to {newest.strftime('%Y-%m-%d')}")
print("=" * 60)

print("\n💰 Cost Summary:")
print(f"  Total Cost: ${total_cost:.2f}")
if total_traces > 0:
    print(f"  Average Cost per Request: ${total_cost/total_traces:.6f}")
daily_cost = total_cost / 30
print(f"  Average Daily Cost: ${daily_cost:.2f}")
print(f"  Projected Monthly Cost: ${daily_cost * 30:.2f}")

print("\n📊 Usage Statistics:")
print(f"  Total Requests: {total_traces:,}")
successful = total_traces - total_errors
print(f"  Successful: {successful:,}")
print(f"  Failed: {total_errors:,}")
if total_traces > 0:
    print(f"  Failure Rate: {(total_errors/total_traces)*100:.2f}%")
    print(f"  Average Daily Requests: {total_traces/30:.0f}")

print("\n🎯 Performance Metrics:")
if total_traces > 0:
    print(f"  Average Latency: {total_duration/total_traces:.0f}ms")
print(f"  Total Tokens: {total_tokens:,.0f}")
if total_traces > 0:
    print(f"  Average Tokens per Request: {total_tokens/total_traces:.1f}")
    print(f"  Average Daily Tokens: {total_tokens/30:,.0f}")

# Cost per 1K tokens
if total_tokens > 0:
    cost_per_1k = (total_cost / total_tokens) * 1000
    print(f"  Cost per 1K Tokens: ${cost_per_1k:.4f}")

# Find most expensive days
print("\n📅 Top 5 Most Expensive Days:")
days_with_data = [(b['timestamp'][:10], b['total']['costs'], b['total']['count']) 
                  for b in data['buckets'] if b['total']['count'] > 0]
sorted_days = sorted(days_with_data, key=lambda x: x[1], reverse=True)
for i, (date, cost, count) in enumerate(sorted_days[:5], 1):
    print(f"  {i}. {date}: ${cost:.4f} ({count} requests)")
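
In the report above, the projected monthly cost is the daily average times 30, so it always equals the 30-day total. For a forward-looking estimate, one simple option is to extrapolate month-to-date spend linearly to the end of the calendar month. The sketch below assumes you have already summed the costs of the buckets since the first of the month. `project_month_cost` is a hypothetical helper, and linear extrapolation ignores weekday or growth effects.

```python
import calendar
from datetime import date


def project_month_cost(month_to_date_cost, today):
    """Linearly extrapolate month-to-date spend to the end of the calendar month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    days_elapsed = today.day  # counts today as a full day, which is a simplification
    return month_to_date_cost / days_elapsed * days_in_month


# $10 spent over the first 10 days of a 29-day month.
print(f"${project_month_cost(10.0, date(2024, 2, 10)):.2f}")
```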

## Part 8: Compare Week-over-Week Performance

Analyze how metrics change from one week to the next.

In [None]:
# Helper function to get weekly metrics
def get_weekly_metrics(weeks_ago=0):
    newest = datetime.now(timezone.utc) - timedelta(weeks=weeks_ago)
    oldest = newest - timedelta(days=7)
    
    payload = {
        "focus": "trace",
        "interval": 10080,  # Weekly bucket
        "windowing": {
            "oldest": oldest.isoformat(),
            "newest": newest.isoformat()
        }
    }
    
    response = requests.post(BASE_URL, headers=HEADERS, json=payload)
    data = response.json()
    
    if data['buckets']:
        bucket = data['buckets'][0]
        return {
            'count': bucket['total']['count'],
            'costs': bucket['total']['costs'],
            'duration': bucket['total']['duration'],
            'tokens': bucket['total']['tokens'],
            'errors': bucket['errors']['count']
        }
    return None

this_week = get_weekly_metrics(0)
last_week = get_weekly_metrics(1)

def calc_change(current, previous):
    if previous == 0:
        return "N/A"
    change = ((current - previous) / previous) * 100
    symbol = "📈" if change > 0 else "📉" if change < 0 else "➡️"
    return f"{symbol} {change:+.1f}%"

print("📊 Week-over-Week Comparison")
print("=" * 60)

if this_week and last_week:
    print("\n💰 Cost:")
    print(f"  Last Week: ${last_week['costs']:.4f}")
    print(f"  This Week: ${this_week['costs']:.4f}")
    print(f"  Change: {calc_change(this_week['costs'], last_week['costs'])}")

    print("\n📊 Volume:")
    print(f"  Last Week: {last_week['count']:,} requests")
    print(f"  This Week: {this_week['count']:,} requests")
    print(f"  Change: {calc_change(this_week['count'], last_week['count'])}")

    print("\n⚡ Performance:")
    last_avg = last_week['duration'] / last_week['count'] if last_week['count'] > 0 else 0
    this_avg = this_week['duration'] / this_week['count'] if this_week['count'] > 0 else 0
    print(f"  Last Week: {last_avg:.0f}ms")
    print(f"  This Week: {this_avg:.0f}ms")
    print(f"  Change: {calc_change(this_avg, last_avg)}")

    print("\n🚨 Error Rate:")
    last_err_rate = (last_week['errors'] / last_week['count'] * 100) if last_week['count'] > 0 else 0
    this_err_rate = (this_week['errors'] / this_week['count'] * 100) if this_week['count'] > 0 else 0
    print(f"  Last Week: {last_err_rate:.2f}%")
    print(f"  This Week: {this_err_rate:.2f}%")
    print(f"  Change: {calc_change(this_err_rate, last_err_rate)}")
else:
    print("\n⚠️ Not enough data for comparison")
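
`get_weekly_metrics` reads only the first bucket of the response. This tutorial does not establish whether a 7-day window can ever come back split across more than one bucket, so a defensive variant is to sum whatever buckets are returned. The sketch below does that and returns the same dictionary shape. `combine_buckets` is a hypothetical helper.

```python
def combine_buckets(buckets):
    """Sum every bucket in a window into the dict shape used by get_weekly_metrics."""
    if not buckets:
        return None
    return {
        "count": sum(b["total"]["count"] for b in buckets),
        "costs": sum(b["total"]["costs"] for b in buckets),
        "duration": sum(b["total"]["duration"] for b in buckets),
        "tokens": sum(b["total"]["tokens"] for b in buckets),
        "errors": sum(b["errors"]["count"] for b in buckets),
    }
```

Inside `get_weekly_metrics`, `return combine_buckets(data['buckets'])` could replace the `if data['buckets']:` block.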

## Part 9: Create Visualizations

Visualize cost and usage trends using matplotlib.

In [None]:
try:
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates
    
    # Get daily metrics for last 30 days
    newest = datetime.now(timezone.utc)
    oldest = newest - timedelta(days=30)

    payload = {
        "focus": "trace",
        "interval": 1440,  # Daily buckets
        "windowing": {
            "oldest": oldest.isoformat(),
            "newest": newest.isoformat()
        }
    }

    response = requests.post(BASE_URL, headers=HEADERS, json=payload)
    data = response.json()

    # Extract dates and metrics
    dates = [datetime.fromisoformat(b['timestamp'].replace('Z', '+00:00')) 
             for b in data['buckets'] if b['total']['count'] > 0]
    costs = [b['total']['costs'] for b in data['buckets'] if b['total']['count'] > 0]
    counts = [b['total']['count'] for b in data['buckets'] if b['total']['count'] > 0]

    # Create figure with two subplots
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))

    # Plot 1: Daily Cost
    ax1.plot(dates, costs, marker='o', linewidth=2, markersize=4, color='#2563eb')
    ax1.set_title('Daily LLM Costs (Last 30 Days)', fontsize=14, fontweight='bold')
    ax1.set_ylabel('Cost ($)', fontsize=12)
    ax1.grid(True, alpha=0.3)
    ax1.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d'))
    plt.setp(ax1.xaxis.get_majorticklabels(), rotation=45)

    # Plot 2: Daily Request Volume
    ax2.bar(dates, counts, alpha=0.7, color='steelblue')
    ax2.set_title('Daily Request Volume (Last 30 Days)', fontsize=14, fontweight='bold')
    ax2.set_xlabel('Date', fontsize=12)
    ax2.set_ylabel('Requests', fontsize=12)
    ax2.grid(True, alpha=0.3, axis='y')
    ax2.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d'))
    plt.setp(ax2.xaxis.get_majorticklabels(), rotation=45)

    plt.tight_layout()
    plt.show()
    
    print("✅ Visualizations created successfully!")
    
except ImportError:
    print("⚠️ matplotlib not installed. Run: pip install matplotlib")

## Part 10: Export Data to DataFrame

Convert analytics data to a pandas DataFrame for further analysis.

In [None]:
try:
    import pandas as pd
    
    # Get daily metrics for last 30 days
    newest = datetime.now(timezone.utc)
    oldest = newest - timedelta(days=30)

    payload = {
        "focus": "trace",
        "interval": 1440,  # Daily buckets
        "windowing": {
            "oldest": oldest.isoformat(),
            "newest": newest.isoformat()
        }
    }

    response = requests.post(BASE_URL, headers=HEADERS, json=payload)
    data = response.json()

    # Convert to DataFrame
    rows = []
    for bucket in data['buckets']:
        if bucket['total']['count'] > 0:  # Only include days with data
            rows.append({
                'timestamp': bucket['timestamp'],
                'total_count': bucket['total']['count'],
                'total_cost': bucket['total']['costs'],
                'total_duration': bucket['total']['duration'],
                'total_tokens': bucket['total']['tokens'],
                'error_count': bucket['errors']['count'],
                'error_duration': bucket['errors']['duration'],
                'avg_duration': bucket['total']['duration'] / bucket['total']['count'],
                'avg_cost': bucket['total']['costs'] / bucket['total']['count'],
                'error_rate': (bucket['errors']['count'] / bucket['total']['count'] * 100)
            })

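    if not rows:
        # An empty DataFrame has no 'timestamp' column, so stop here
        print("⚠️ No activity found in the last 30 days")
    else:
        df = pd.DataFrame(rows)
        df['timestamp'] = pd.to_datetime(df['timestamp'])

        print("📊 Analytics Data Summary\n")
        print(df.describe())

        print("\n📅 Recent Days:")
        print(df.tail(10).to_string())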
    
    # Optional: Save to CSV
    # df.to_csv('analytics_export.csv', index=False)
    # print("\n✅ Data exported to analytics_export.csv")
    
except ImportError:
    print("⚠️ pandas not installed. Run: pip install pandas")
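
If pandas is not available, the same per-day rows can be exported with the standard library. The sketch below writes a subset of the columns to CSV text. `buckets_to_csv` is a hypothetical helper, and the column names follow the DataFrame above.

```python
import csv
import io


def buckets_to_csv(buckets):
    """Serialize buckets that have activity to CSV text using only the standard library."""
    fields = ["timestamp", "total_count", "total_cost", "total_tokens", "error_count"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for b in buckets:
        if b["total"]["count"] > 0:  # only include days with data
            writer.writerow({
                "timestamp": b["timestamp"],
                "total_count": b["total"]["count"],
                "total_cost": b["total"]["costs"],
                "total_tokens": b["total"]["tokens"],
                "error_count": b["errors"]["count"],
            })
    return out.getvalue()
```

The returned string can be written to disk with `open('analytics_export.csv', 'w').write(...)` or passed to another tool.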

## Summary

In this tutorial, you learned how to:

1. ✅ **Retrieve aggregated metrics** using the Analytics API
2. ✅ **Track daily costs** and generate spending reports
3. ✅ **Analyze error trends** to identify reliability issues
4. ✅ **Filter by status code** to analyze only successful traces
5. ✅ **Track token usage** patterns over time
6. ✅ **Monitor performance** and latency trends
7. ✅ **Generate monthly reports** with comprehensive cost breakdowns
8. ✅ **Compare week-over-week** metrics to identify trends
9. ✅ **Visualize data** using matplotlib
10. ✅ **Export to DataFrame** for further analysis

## Next Steps

- Learn about [Query API](/observability/query-data/query-api) for detailed trace analysis
- Explore [Using the UI](/observability/using-the-ui/filtering-traces) for visual analytics
- Check out [Semantic Conventions](/observability/concepts/semantic-conventions) for available metrics
- Read about [Cost Tracking](/observability/trace-with-python-sdk/track-costs) for automatic cost calculation