# Example 2: Cost Explosion Diagnosis

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Javihaus/agents_observability_bootcamp/blob/main/chapter_01_diagnosing_agent_failures/examples/example_02_cost_explosion_diagnosis.ipynb)

**Instructor demonstration** - Students follow along without running code

---

## Objective

Demonstrate how small inefficiencies multiply costs in multi-agent systems through:
- Redundant LLM calls
- Lack of caching
- Retry loops without validation
- Over-specified prompts

**Key lesson**: Small inefficiencies compound to massive costs in production.

---

## Scenario

**Customer Support Multi-Agent System**
- Agent 1: Intent Classifier (determines customer request type)
- Agent 2: Information Retriever (fetches relevant docs)
- Agent 3: Response Generator (creates customer response)

**Version A**: Naive implementation (no caching, full context every call)

**Version B**: Optimized implementation (caching, context compression)

**Hypothesis**: Version A will cost 3-5x more than Version B for identical functionality.
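
To relate the hypothesis to the percentage figures quoted later, a cost multiplier can be converted into a percentage reduction. The following is a minimal sketch of that arithmetic; `reduction_from_multiplier` is a hypothetical helper, not part of the notebook.

```python
def reduction_from_multiplier(multiplier: float) -> float:
    """Percentage cost reduction implied by a cost multiplier (cost A / cost B)."""
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    return (1 - 1 / multiplier) * 100


# The hypothesized 3-5x range, expressed as a percentage reduction
for m in (3, 5):
    print(f"{m}x more expensive -> {reduction_from_multiplier(m):.1f}% reduction")
```

A 3x multiplier corresponds to roughly a 67% reduction and a 5x multiplier to an 80% reduction.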

## Setup

In [None]:
# Install dependencies
!pip install -q langchain==0.1.0 langchain-anthropic==0.1.1 anthropic==0.18.1
!pip install -q python-dotenv pandas matplotlib

print("Installation complete!")

In [None]:
from google.colab import userdata
from langchain_anthropic import ChatAnthropic
from langchain.schema import HumanMessage, SystemMessage
import pandas as pd
import matplotlib.pyplot as plt
from typing import Dict, List
import time

# Get API key (instructor's key)
ANTHROPIC_API_KEY = userdata.get('ANTHROPIC_API_KEY')

print("Imports successful!")

## Cost Tracking Infrastructure

We'll track every LLM call to measure cumulative costs.

In [None]:
class CostTracker:
    """Track LLM API costs across multiple agent calls."""
    
    # Claude Sonnet 4 pricing (per million tokens)
    INPUT_COST_PER_MILLION = 3.0
    OUTPUT_COST_PER_MILLION = 15.0
    
    def __init__(self):
        self.calls = []
        
    def track_call(self, agent_name: str, input_tokens: int, output_tokens: int):
        """Record a single LLM call."""
        input_cost = (input_tokens / 1_000_000) * self.INPUT_COST_PER_MILLION
        output_cost = (output_tokens / 1_000_000) * self.OUTPUT_COST_PER_MILLION
        total_cost = input_cost + output_cost
        
        self.calls.append({
            'agent': agent_name,
            'input_tokens': input_tokens,
            'output_tokens': output_tokens,
            'input_cost': input_cost,
            'output_cost': output_cost,
            'total_cost': total_cost
        })
        
    def get_total_cost(self) -> float:
        """Calculate cumulative cost across all calls."""
        return sum(call['total_cost'] for call in self.calls)
    
    def get_cost_by_agent(self) -> Dict[str, float]:
        """Calculate cost breakdown by agent."""
        agent_costs = {}
        for call in self.calls:
            agent = call['agent']
            if agent not in agent_costs:
                agent_costs[agent] = 0
            agent_costs[agent] += call['total_cost']
        return agent_costs
    
    def get_summary_df(self) -> pd.DataFrame:
        """Return DataFrame with all calls."""
        return pd.DataFrame(self.calls)
    
    def print_summary(self):
        """Print cost summary."""
        total = self.get_total_cost()
        by_agent = self.get_cost_by_agent()
        
        print("=" * 60)
        print("COST SUMMARY")
        print("=" * 60)
        print(f"\nTotal calls: {len(self.calls)}")
        print(f"Total cost: ${total:.6f}\n")
        print("Cost by agent:")
        for agent, cost in sorted(by_agent.items(), key=lambda x: x[1], reverse=True):
            print(f"  {agent}: ${cost:.6f}")

print("CostTracker class defined")
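
As a quick check of the pricing arithmetic used by `CostTracker`, the sketch below computes the cost of a single call at the same rates. The function name `call_cost` and the token counts are illustrative assumptions, not measurements from this notebook.

```python
INPUT_COST_PER_MILLION = 3.0    # USD, same rate as in CostTracker
OUTPUT_COST_PER_MILLION = 15.0  # USD, same rate as in CostTracker


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one LLM call at the rates above."""
    return (input_tokens * INPUT_COST_PER_MILLION
            + output_tokens * OUTPUT_COST_PER_MILLION) / 1_000_000


# Illustrative token counts (assumed, not measured)
print(f"${call_cost(1_000, 200):.6f}")
```

Because output tokens are priced at five times the input rate, the 200 output tokens here cost as much as the 1,000 input tokens.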

## Sample Customer Queries

We'll process 5 customer support tickets to compare costs.

In [None]:
# Sample customer support queries
customer_queries = [
    {
        'id': 1,
        'query': 'I forgot my password and the reset email is not arriving. Please help!',
        'expected_intent': 'account_access'
    },
    {
        'id': 2,
        'query': 'Your product charged my credit card twice for the same order. I need a refund.',
        'expected_intent': 'billing_issue'
    },
    {
        'id': 3,
        'query': 'How do I export my data to CSV format? I cannot find the option.',
        'expected_intent': 'feature_question'
    },
    {
        'id': 4,
        'query': 'The mobile app keeps crashing when I try to upload photos.',
        'expected_intent': 'technical_issue'
    },
    {
        'id': 5,
        'query': 'I want to cancel my subscription and delete my account permanently.',
        'expected_intent': 'account_cancellation'
    }
]

print(f"Loaded {len(customer_queries)} sample queries")

## Version A: Naive Implementation (High Cost)

This version has common anti-patterns:
- No caching (repeats identical calls)
- Full context in every prompt
- No token usage optimization
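
To get a feel for how much the "full context in every prompt" pattern wastes, the sketch below estimates the input tokens spent re-sending the knowledge base to agents that do not need it. The helper name and all numbers are assumptions chosen for illustration.

```python
def redundant_kb_tokens(kb_tokens: int, agents_per_query: int, queries: int,
                        agents_needing_kb: int = 1) -> int:
    """Input tokens spent sending the knowledge base to agents that don't need it."""
    wasted_calls = (agents_per_query - agents_needing_kb) * queries
    return kb_tokens * wasted_calls


# Assumed numbers: a 300-token knowledge base, 3 agents, 5 queries,
# with only one agent (the retriever) actually needing the knowledge base
wasted = redundant_kb_tokens(kb_tokens=300, agents_per_query=3, queries=5)
print(wasted)
print(f"${wasted * 3.0 / 1_000_000:.6f}")  # at $3 per million input tokens
```

The waste grows linearly with both the knowledge base size and the query volume, which is why it becomes noticeable only at production scale.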

In [None]:
# Initialize LLM and tracker for Version A
llm = ChatAnthropic(
    model="claude-sonnet-4-20250514",
    anthropic_api_key=ANTHROPIC_API_KEY,
    max_tokens=300,
    temperature=0
)

tracker_a = CostTracker()

# Knowledge base (will be included in EVERY prompt in naive version)
knowledge_base = """COMPANY KNOWLEDGE BASE:

Account Access Issues:
- Password reset emails may take up to 10 minutes to arrive
- Check spam folder for reset emails
- Alternative: Reset via SMS to registered phone number
- Contact support if issue persists after 24 hours

Billing Issues:
- Duplicate charges are automatically refunded within 3-5 business days
- Check bank statement for pending reversals
- Contact billing@company.com for immediate assistance
- Include transaction IDs in refund requests

Feature Questions:
- CSV export: Dashboard → Reports → Export dropdown → Select CSV
- Available in Pro and Enterprise plans only
- Free users can upgrade or export to PDF
- Tutorial: help.company.com/export-data

Technical Issues:
- Mobile app crashes: Clear app cache, reinstall if needed
- Photo upload: Ensure file size < 10MB, JPG/PNG only
- Try web version as temporary workaround
- Report bugs: support@company.com with device info

Account Cancellation:
- Cancel anytime: Settings → Subscription → Cancel
- Data deletion: Account → Privacy → Delete account
- Warning: Deletion is permanent and irreversible
- Export data before deletion if needed
"""

print("Version A initialized")
print(f"Knowledge base size: {len(knowledge_base)} characters")

In [None]:
print("=" * 60)
print("VERSION A: NAIVE IMPLEMENTATION")
print("=" * 60)

version_a_responses = []

for query_data in customer_queries:
    query_id = query_data['id']
    query_text = query_data['query']
    
    print(f"\n--- Processing Query {query_id} ---")
    print(f"Customer: {query_text}")
    
    # Agent 1: Intent Classifier (INCLUDES FULL KNOWLEDGE BASE - wasteful!)
    intent_prompt = f"""{knowledge_base}

Customer query: {query_text}

Classify the intent as one of: account_access, billing_issue, feature_question, technical_issue, account_cancellation

Respond with just the intent category."""
    
    intent_messages = [
        SystemMessage(content="You are an intent classification agent."),
        HumanMessage(content=intent_prompt)
    ]
    
    intent_response = llm.invoke(intent_messages)
    
    # Estimate tokens from word count (rough; the system message is not counted).
    # In a real implementation, read the exact counts from the API response's usage data.
    intent_input_tokens = len(intent_prompt.split()) * 1.3  # Rough approximation
    intent_output_tokens = len(intent_response.content.split()) * 1.3
    tracker_a.track_call("Intent Classifier", int(intent_input_tokens), int(intent_output_tokens))
    
    intent = intent_response.content.strip()
    print(f"Intent: {intent}")
    
    # Agent 2: Information Retriever (ALSO INCLUDES FULL KNOWLEDGE BASE - redundant!)
    retrieval_prompt = f"""{knowledge_base}

Customer query: {query_text}
Intent: {intent}

Extract the relevant information from the knowledge base for this query."""
    
    retrieval_messages = [
        SystemMessage(content="You are an information retrieval agent."),
        HumanMessage(content=retrieval_prompt)
    ]
    
    retrieval_response = llm.invoke(retrieval_messages)
    
    retrieval_input_tokens = len(retrieval_prompt.split()) * 1.3
    retrieval_output_tokens = len(retrieval_response.content.split()) * 1.3
    tracker_a.track_call("Information Retriever", int(retrieval_input_tokens), int(retrieval_output_tokens))
    
    # Agent 3: Response Generator (AGAIN WITH FULL KNOWLEDGE BASE!)
    response_prompt = f"""{knowledge_base}

Customer query: {query_text}
Intent: {intent}
Relevant information: {retrieval_response.content}

Generate a helpful, professional customer support response."""
    
    response_messages = [
        SystemMessage(content="You are a customer support response agent."),
        HumanMessage(content=response_prompt)
    ]
    
    final_response = llm.invoke(response_messages)
    
    response_input_tokens = len(response_prompt.split()) * 1.3
    response_output_tokens = len(final_response.content.split()) * 1.3
    tracker_a.track_call("Response Generator", int(response_input_tokens), int(response_output_tokens))
    
    version_a_responses.append(final_response.content)
    print(f"\nResponse: {final_response.content[:100]}...")

print("\n" + "=" * 60)
tracker_a.print_summary()
print("=" * 60)
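
The word-count estimate above is only an approximation. The Anthropic Messages API reports exact counts in a `usage` object with `input_tokens` and `output_tokens` fields. The sketch below shows how those could feed a tracker; it uses a stand-in object rather than a live API call, and how the LangChain wrapper surfaces the same numbers depends on the installed version, so that part is left out.

```python
from types import SimpleNamespace


def usage_from_message(message) -> tuple:
    """Return (input_tokens, output_tokens) from an Anthropic SDK-style message."""
    usage = message.usage
    return int(usage.input_tokens), int(usage.output_tokens)


# Stand-in shaped like an Anthropic SDK response; the values are made up
fake_message = SimpleNamespace(
    usage=SimpleNamespace(input_tokens=412, output_tokens=9)
)

input_tokens, output_tokens = usage_from_message(fake_message)
print(input_tokens, output_tokens)
```

The resulting pair could be passed straight to `track_call` in place of the word-count estimates.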

## Version B: Optimized Implementation (Low Cost)

This version implements cost optimizations:
- Caching of intent classification
- Only include relevant knowledge base sections
- Compressed context
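
An exact-match cache only saves money when queries repeat. The five sample queries in this notebook are all distinct, so a cache keyed on the query text would not register a hit during the demo. The sketch below, using a hypothetical `CountingCache` helper and a stand-in classifier, shows how hit rate could be measured and how light normalization of the key increases hits.

```python
class CountingCache:
    """Dict-backed cache that records hits and misses (hypothetical helper)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(query: str) -> str:
        # Normalize case and whitespace so trivial variations share an entry
        return " ".join(query.lower().split())

    def get_or_compute(self, query: str, compute):
        key = self._key(query)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = compute(query)
        return self._store[key]

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def fake_classifier(query: str) -> str:
    # Stand-in for the LLM call; a real classifier would cost tokens here
    return "account_access"


cache = CountingCache()
queries = ["Reset my password", "reset  my PASSWORD",
           "Reset my password", "Where is my invoice?"]
for q in queries:
    cache.get_or_compute(q, fake_classifier)

print(cache.hits, cache.misses, f"{cache.hit_rate:.0%}")
```

Only the misses trigger the classifier, so the LLM cost of this step scales with `1 - hit_rate`.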

In [None]:
# Initialize tracker for Version B
tracker_b = CostTracker()

# Knowledge base split by category (for targeted retrieval)
kb_sections = {
    'account_access': """Account Access Issues:
- Password reset emails may take up to 10 minutes
- Check spam folder
- Alternative: SMS reset
- Contact support if issue persists after 24 hours""",
    
    'billing_issue': """Billing Issues:
- Duplicate charges refunded within 3-5 business days
- Check for pending reversals
- Contact billing@company.com with transaction IDs""",
    
    'feature_question': """Feature Questions:
- CSV export: Dashboard → Reports → Export → CSV
- Pro/Enterprise only
- Tutorial: help.company.com/export-data""",
    
    'technical_issue': """Technical Issues:
- Clear cache, reinstall app
- Photos: <10MB, JPG/PNG only
- Try web version
- Report bugs: support@company.com""",
    
    'account_cancellation': """Account Cancellation:
- Cancel: Settings → Subscription → Cancel
- Delete: Account → Privacy → Delete account
- Warning: Permanent, irreversible
- Export data first if needed"""
}

# Simple exact-match cache for identical queries
# (the 5 sample queries are all distinct, so no cache hits occur in this run)
intent_cache = {}

print("Version B initialized")
print("Optimizations: Caching, targeted retrieval, context compression")

In [None]:
print("=" * 60)
print("VERSION B: OPTIMIZED IMPLEMENTATION")
print("=" * 60)

version_b_responses = []

for query_data in customer_queries:
    query_id = query_data['id']
    query_text = query_data['query']
    
    print(f"\n--- Processing Query {query_id} ---")
    print(f"Customer: {query_text}")
    
    # Agent 1: Intent Classifier (NO knowledge base, just classification)
    # Check cache first
    if query_text in intent_cache:
        intent = intent_cache[query_text]
        print(f"Intent (CACHED): {intent}")
        # No LLM call needed - zero cost!
    else:
        intent_prompt = f"""Classify this customer query into one of these categories:
account_access, billing_issue, feature_question, technical_issue, account_cancellation

Query: {query_text}

Respond with just the category name."""
        
        intent_messages = [
            SystemMessage(content="You are an intent classification agent."),
            HumanMessage(content=intent_prompt)
        ]
        
        intent_response = llm.invoke(intent_messages)
        # Normalize case so the result matches the kb_sections keys
        intent = intent_response.content.strip().lower()
        
        # Cache result
        intent_cache[query_text] = intent
        
        # Track tokens
        intent_input_tokens = len(intent_prompt.split()) * 1.3
        intent_output_tokens = len(intent_response.content.split()) * 1.3
        tracker_b.track_call("Intent Classifier", int(intent_input_tokens), int(intent_output_tokens))
        
        print(f"Intent: {intent}")
    
    # Agent 2: Information Retriever (ONLY relevant KB section)
    relevant_kb = kb_sections.get(intent, "")
    
    retrieval_prompt = f"""Knowledge base:
{relevant_kb}

Query: {query_text}

Extract relevant info."""
    
    retrieval_messages = [
        SystemMessage(content="Extract relevant info."),
        HumanMessage(content=retrieval_prompt)
    ]
    
    retrieval_response = llm.invoke(retrieval_messages)
    
    retrieval_input_tokens = len(retrieval_prompt.split()) * 1.3
    retrieval_output_tokens = len(retrieval_response.content.split()) * 1.3
    tracker_b.track_call("Information Retriever", int(retrieval_input_tokens), int(retrieval_output_tokens))
    
    # Agent 3: Response Generator (compressed context)
    response_prompt = f"""Query: {query_text}
Info: {retrieval_response.content}

Generate professional support response."""
    
    response_messages = [
        SystemMessage(content="You are a customer support agent."),
        HumanMessage(content=response_prompt)
    ]
    
    final_response = llm.invoke(response_messages)
    
    response_input_tokens = len(response_prompt.split()) * 1.3
    response_output_tokens = len(final_response.content.split()) * 1.3
    tracker_b.track_call("Response Generator", int(response_input_tokens), int(response_output_tokens))
    
    version_b_responses.append(final_response.content)
    print(f"\nResponse: {final_response.content[:100]}...")

print("\n" + "=" * 60)
tracker_b.print_summary()
print("=" * 60)
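
One fragile point in the targeted-retrieval step is the lookup `kb_sections.get(intent, "")`: if the classifier's output does not match a key exactly, the retriever silently receives an empty knowledge base. The sketch below shows one way to map free-form output onto the known categories. The function `normalize_intent` and the `"unknown"` fallback are hypothetical additions, not part of the notebook.

```python
VALID_INTENTS = {
    "account_access", "billing_issue", "feature_question",
    "technical_issue", "account_cancellation",
}


def normalize_intent(raw: str, fallback: str = "unknown") -> str:
    """Map free-form classifier output onto one of the known categories."""
    cleaned = raw.strip().lower().strip(".\"'` ")
    if cleaned in VALID_INTENTS:
        return cleaned
    # Accept answers that mention exactly one known category,
    # e.g. "Intent: billing_issue"
    matches = [intent for intent in VALID_INTENTS if intent in cleaned]
    return matches[0] if len(matches) == 1 else fallback


print(normalize_intent("Billing_Issue."))
print(normalize_intent("Intent: technical_issue"))
print(normalize_intent("I am not sure"))
```

An explicit fallback value makes the failure visible, so the pipeline can route unrecognized intents to a default path instead of generating a response with no supporting information.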

## Cost Comparison Analysis

In [None]:
print("=" * 60)
print("COST COMPARISON: VERSION A vs VERSION B")
print("=" * 60)

cost_a = tracker_a.get_total_cost()
cost_b = tracker_b.get_total_cost()
savings = cost_a - cost_b
multiplier = cost_a / cost_b if cost_b > 0 else 0

print(f"\nVersion A (Naive): ${cost_a:.6f}")
print(f"Version B (Optimized): ${cost_b:.6f}")
print(f"\nAbsolute savings: ${savings:.6f}")
print(f"Cost multiplier: {multiplier:.2f}x")
print(f"Percentage reduction: {(savings/cost_a)*100:.1f}%")

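# Scale to production
# (tracker totals cover all sample queries, so convert to a per-query cost first)
queries_per_day = 10000
n_queries = len(customer_queries)
monthly_cost_a = (cost_a / n_queries) * queries_per_day * 30
monthly_cost_b = (cost_b / n_queries) * queries_per_day * 30
monthly_savings = monthly_cost_a - monthly_cost_b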

print("\n" + "=" * 60)
print("PRODUCTION COST PROJECTION (10,000 queries/day)")
print("=" * 60)
print(f"\nVersion A: ${monthly_cost_a:,.2f}/month")
print(f"Version B: ${monthly_cost_b:,.2f}/month")
print(f"\nMonthly savings: ${monthly_savings:,.2f}")
print(f"Annual savings: ${monthly_savings * 12:,.2f}")

# Visualize
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Cost comparison for the sample run (totals across all 5 sample queries)
ax1.bar(['Version A\n(Naive)', 'Version B\n(Optimized)'], [cost_a, cost_b], color=['red', 'green'])
ax1.set_ylabel('Cost per 5 queries ($)')
ax1.set_title('Cost Comparison (5 Sample Queries)')
ax1.axhline(y=cost_a, color='r', linestyle='--', alpha=0.3)

# Monthly cost projection
ax2.bar(['Version A\n(Naive)', 'Version B\n(Optimized)'], [monthly_cost_a, monthly_cost_b], color=['red', 'green'])
ax2.set_ylabel('Monthly Cost ($)')
ax2.set_title('Production Cost Projection (10k queries/day)')
ax2.ticklabel_format(style='plain', axis='y')

plt.tight_layout()
plt.show()

print("\n" + "=" * 60)
print("KEY INSIGHT")
print("=" * 60)
print("""
Simple optimizations (caching, context compression, targeted retrieval)
can cut costs substantially, often by 60-80%, usually with little or no
loss of functionality. Compare this with the measured reduction above.

Most costly anti-patterns:
1. Including full knowledge base in every prompt
2. No caching of repeated operations  
3. Redundant context across agents

These compound in production to massive waste.
""")

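Each tracker holds the total cost for the whole sample run, so a production projection needs the per-query average first. The sketch below shows that arithmetic with a hypothetical helper, `project_monthly_cost`, and an assumed sample cost rather than a measured one.

```python
def project_monthly_cost(sample_cost: float, sample_size: int,
                         queries_per_day: int, days: int = 30) -> float:
    """Scale a measured sample cost to a monthly figure via the per-query average."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    per_query = sample_cost / sample_size
    return per_query * queries_per_day * days


# Assumed input: $0.02 measured over 5 queries, projected to 10,000 queries/day
print(f"${project_monthly_cost(0.02, 5, 10_000):,.2f}/month")
```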
---

## Instructor Notes

### Teaching Strategy

**Before running**: Ask students to estimate cost difference
- Most underestimate by 50%+
- Reality check is powerful motivator

**During execution**: Point out anti-patterns
- Highlight when full KB is included unnecessarily
- Show token counts growing
- Emphasize redundancy

**After completion**: Scale to production numbers
- Monthly costs make impact concrete
- Annual savings justify optimization effort

### Common Student Questions

**Q: Isn't caching just moving the problem?**
A: Partly. Caching trades LLM spend for storage and invalidation logic, but a cache hit makes no LLM call. An 80% hit rate cuts the LLM cost of that operation by about 80%.

**Q: What about cache invalidation?**
A: Valid concern. Chapter 3 covers cache strategies. For now, focus on identifying cost drivers.

**Q: Does compression hurt quality?**
A: Rarely, as long as only redundant wording is removed. Most prompts are more verbose than they need to be. Measure the quality-versus-cost tradeoff before shipping.

**Q: How do we know what to optimize first?**
A: Chapter 2 builds monitoring to identify highest-cost agents. Optimize those first.

### Time Management

- Setup: 2 minutes
- Version A demo: 5 minutes
- Version B demo: 5 minutes
- Comparison analysis: 5 minutes
- Discussion: 5 minutes
- **Total: 22 minutes**

### Variations

If time permits:
- Show retry loop anti-pattern (3x multiplier)
- Demonstrate unbounded recursion
- Compare different caching strategies
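
For the retry-loop variation, the sketch below shows a bounded retry with validation, using stand-in functions instead of real LLM calls. It assumes the 3x multiplier refers to three attempts per request; `call_with_retries` is a hypothetical helper.

```python
def call_with_retries(call, validate, max_attempts: int = 3):
    """Call `call()` until `validate(result)` passes or attempts run out.

    Returns (result, attempts_used); result is None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        result = call()
        if validate(result):
            return result, attempt
    return None, max_attempts


def always_bad() -> str:
    # Stand-in for an LLM call that never returns a valid category
    return "not_a_category"


def is_valid(result: str) -> bool:
    return result in {"billing_issue", "technical_issue"}


result, attempts = call_with_retries(always_bad, is_valid)
print(result, attempts)
```

When every attempt fails validation, the request costs three calls and still produces nothing usable, which is the worst case to highlight in the demonstration.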

### Transition to Example 3

"We've seen how costs explode. Now let's see why prompts fail unpredictably when you change format..."