# Debug Drill: The Tool Error

**Scenario:**
Your AI agent uses tools to look up order information. But something's wrong.

"It said my order shipped yesterday, but when I check the website it says 'pending'!" a customer complains.

The agent is hallucinating tool results instead of reporting errors properly.

**Your Task:**
1. Identify why the agent gives false information
2. Implement proper error handling
3. Ensure the agent admits when tools fail
4. Write a 3-bullet postmortem

---

In [None]:
import json
from typing import Dict, Any, Optional

In [None]:
# Mock order database (some orders exist, some don't)
ORDER_DATABASE = {
    "ORD-001": {"status": "delivered", "date": "2024-01-15"},
    "ORD-002": {"status": "shipped", "date": "2024-01-17"},
    # ORD-003 doesn't exist!
}

# Tool function that can fail
def get_order_status(order_id: str) -> Dict[str, Any]:
    """Get order status from database."""
    if order_id not in ORDER_DATABASE:
        return {"error": f"Order {order_id} not found"}
    return {"order_id": order_id, **ORDER_DATABASE[order_id]}

print("=== Order Database ===")
for order_id, info in ORDER_DATABASE.items():
    print(f"  {order_id}: {info['status']}")
print("  ORD-003: (does not exist)")

In [None]:
# ===== COLLEAGUE'S CODE (BUG: Ignores tool errors) =====

def broken_agent_respond(query: str, order_id: str) -> str:
    """Agent that hallucinates when tool fails."""
    
    # Call tool
    tool_result = get_order_status(order_id)
    
    # BUG: The tool's error is never surfaced to the user.
    # The branch below simulates an LLM that invents an answer when the tool fails.
    
    if "error" in tool_result:
        # BUG: Fabricates a shipping status instead of reporting the error!
        return f"Your order {order_id} shipped yesterday and should arrive within 3-5 business days."
    else:
        return f"Your order {order_id} is currently {tool_result['status']} as of {tool_result['date']}."

# Test with existing order (works)
print("=== Test: Existing Order ===")
response = broken_agent_respond("What's the status of ORD-001?", "ORD-001")
print("Query: What's the status of ORD-001?")
print(f"Response: {response}")
print("✓ Correct!")

# Test with non-existent order (FAILS!)
print("\n=== Test: Non-Existent Order ===")
response = broken_agent_respond("What's the status of ORD-003?", "ORD-003")
print("Query: What's the status of ORD-003?")
print(f"Response: {response}")
print("❌ HALLUCINATED! Order doesn't exist but agent said it shipped!")

---

## Your Investigation

### Step 1: Understand the problem

In [None]:
print("=== Problem Analysis ===")
print()
print("What happened:")
print("  1. User asked about order ORD-003")
print("  2. Tool returned: {'error': 'Order ORD-003 not found'}")
print("  3. Agent IGNORED the error")
print("  4. Agent made up a response: 'shipped yesterday'")
print()
print("Impact:")
print("  - User thinks order shipped")
print("  - User waits for package that doesn't exist")
print("  - Trust in system destroyed")
print()
print("Root cause:")
print("  - No error handling for tool failures")
print("  - LLM fills in gaps with hallucination")

### Step 2: TODO - Implement proper error handling

In [None]:
# TODO: Create a fixed agent that handles errors properly

# Uncomment and complete:

# def fixed_agent_respond(query: str, order_id: str) -> str:
#     """Agent that properly handles tool errors."""
#     
#     # Call tool
#     tool_result = get_order_status(order_id)
#     
#     # Check for errors FIRST
#     if "error" in tool_result:
#         # Admit the failure honestly
#         return f"I couldn't find order {order_id} in our system. Please double-check the order number, or contact support if you believe this is an error."
#     
#     # Only respond with data when tool succeeds
#     return f"Your order {order_id} is currently {tool_result['status']} as of {tool_result['date']}."
# 
# # Test the fixed agent
# print("=== Fixed Agent Test ===")
# 
# print("\nExisting order:")
# response = fixed_agent_respond("Status?", "ORD-001")
# print(f"  {response}")
# 
# print("\nNon-existent order:")
# response = fixed_agent_respond("Status?", "ORD-003")
# print(f"  {response}")

In [None]:
# TODO: Implement a robust tool executor with validation

# Uncomment:

# class SafeToolExecutor:
#     """Tool executor that validates results before using them."""
#     
#     def __init__(self):
#         self.call_log = []
#     
#     def execute(self, tool_name: str, **kwargs) -> Dict[str, Any]:
#         """Execute tool and return validated result."""
#         
#         # Registry of available tools
#         tools = {
#             "get_order_status": get_order_status
#         }
#
#         if tool_name not in tools:
#             result = {"error": f"Unknown tool: {tool_name}"}
#         else:
#             # Execute, converting exceptions into error results
#             try:
#                 # Copy so the tool's own dict is never mutated
#                 result = dict(tools[tool_name](**kwargs))
#             except Exception as e:
#                 result = {"error": str(e)}
#
#         # Add success flag
#         result["success"] = "error" not in result
#
#         # Log every call, including unknown tools and failures
#         self.call_log.append({
#             "tool": tool_name,
#             "args": kwargs,
#             "success": result["success"]
#         })
#
#         return result
#     
#     def format_for_llm(self, result: Dict) -> str:
#         """Format result for LLM consumption, clearly marking errors."""
#         if not result["success"]:
#             return f"TOOL_ERROR: {result['error']}\nIMPORTANT: Do not make up information. Tell the user the tool failed."
#         else:
#             return f"TOOL_SUCCESS: {json.dumps({k: v for k, v in result.items() if k != 'success'})}"
# 
# # Test safe executor
# executor = SafeToolExecutor()
# 
# print("=== Safe Executor Test ===")
# 
# result = executor.execute("get_order_status", order_id="ORD-001")
# print(f"\nORD-001: {executor.format_for_llm(result)}")
# 
# result = executor.execute("get_order_status", order_id="ORD-003")
# print(f"\nORD-003: {executor.format_for_llm(result)}")

In [None]:
# ============================================
# SELF-CHECK
# ============================================

# Uncomment:

# # Test that fixed agent doesn't hallucinate
# response_003 = fixed_agent_respond("Status?", "ORD-003")
# 
# assert "shipped" not in response_003.lower(), "Should not claim order shipped"
# assert "couldn't find" in response_003.lower() or "not found" in response_003.lower(), \
#     "Should admit it couldn't find the order"
# 
# print("✓ Error handling implemented!")
# print("✓ Agent admits when tools fail")
# print("✓ No more hallucinated order statuses")

### Step 3: Write your postmortem

In [None]:
postmortem = """
## Postmortem: The Tool Error

### What happened:
- (Your answer: What false information did the agent give?)

### Root cause:
- (Your answer: Why did the agent hallucinate instead of reporting the error?)

### How to prevent:
- (Your answer: What error handling should tool-using agents have?)

"""

print(postmortem)

---

## ✅ Drill Complete!

**Key lessons:**

1. **Tool errors must be detected and handled.** Never ignore error responses.

2. **LLMs may hallucinate to fill gaps.** If a tool error is not explicitly surfaced, the model can produce a plausible-sounding answer instead of admitting the failure.

3. **Format errors clearly for the LLM.** Use markers like "TOOL_ERROR" to make failures obvious.

4. **Instruct the LLM to admit failures.** Add an explicit instruction such as "Do not make up information."
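
Lessons 1 and 2 can also be enforced after generation, as a last line of defense. The sketch below is a minimal illustration and not part of the drill's solution: `STATUS_WORDS` and `is_grounded` are hypothetical names, and the check is a crude substring match that assumes the tool result uses the same `"error"` / `"status"` keys as `get_order_status`.

```python
from typing import Any, Dict

# Hypothetical status vocabulary; adjust to the statuses your system uses.
STATUS_WORDS = {"pending", "shipped", "delivered", "cancelled"}


def is_grounded(response: str, tool_result: Dict[str, Any]) -> bool:
    """Return True if every status word in the response is backed by the tool result."""
    text = response.lower()
    mentioned = {word for word in STATUS_WORDS if word in text}
    if "error" in tool_result:
        # The tool failed, so the response must not assert any status at all.
        return not mentioned
    # The tool succeeded: only the status it actually returned may appear.
    return mentioned <= {tool_result.get("status")}


failed = {"error": "Order ORD-003 not found"}
ok = {"order_id": "ORD-001", "status": "delivered", "date": "2024-01-15"}

print(is_grounded("Your order ORD-003 shipped yesterday.", failed))         # False
print(is_grounded("I couldn't find order ORD-003 in our system.", failed))  # True
print(is_grounded("Your order ORD-001 is currently delivered.", ok))        # True
```

A response that fails this check could be replaced with a fixed apology message rather than sent to the customer. Substring matching will misfire on words like "unshipped", so treat this as a starting point only.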

---

## Tool Error Handling Checklist

| Check | Purpose |
|-------|----------|
| Check for 'error' in result | Detect failures |
| Add success/failure flag | Make the outcome unambiguous |
| Format errors for LLM | Make failures obvious |
| Prompt to admit uncertainty | Prevent hallucination |
| Log all tool calls | Debugging audit trail |
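
One possible way to cover the first, second, and last rows of the checklist together is to wrap every tool call in an explicit result type, so callers check a flag rather than sniffing for an `"error"` key. This is a sketch under assumptions: `ToolResult`, `call_tool`, and the stand-in `lookup` function are hypothetical names introduced here, and `lookup` only mimics the drill's `get_order_status`.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger("tool_calls")


@dataclass(frozen=True)
class ToolResult:
    """Explicit outcome of a tool call (hypothetical helper type)."""
    ok: bool
    data: Optional[Dict[str, Any]] = None
    error: Optional[str] = None


def call_tool(tool: Callable[..., Dict[str, Any]], **kwargs: Any) -> ToolResult:
    """Run a tool, turn error dicts and exceptions into ToolResult, and log the call."""
    try:
        raw = tool(**kwargs)
    except Exception as exc:
        # Crashes inside the tool become a failed result instead of propagating.
        result = ToolResult(ok=False, error=f"{type(exc).__name__}: {exc}")
    else:
        if "error" in raw:
            result = ToolResult(ok=False, error=str(raw["error"]))
        else:
            result = ToolResult(ok=True, data=raw)
    logger.info("tool=%s args=%s ok=%s", tool.__name__, kwargs, result.ok)
    return result


def lookup(order_id: str) -> Dict[str, Any]:
    """Stand-in for the drill's get_order_status."""
    orders = {"ORD-001": {"status": "delivered", "date": "2024-01-15"}}
    if order_id not in orders:
        return {"error": f"Order {order_id} not found"}
    return {"order_id": order_id, **orders[order_id]}


found = call_tool(lookup, order_id="ORD-001")
missing = call_tool(lookup, order_id="ORD-003")
crashed = call_tool(lookup)  # missing argument raises TypeError inside the call

print(found.ok, missing.ok, crashed.ok)  # True False False
```

Because `ToolResult` is frozen, downstream code cannot quietly flip `ok` or overwrite `error`, which keeps the audit log consistent with what the agent actually saw.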