# ANSWER KEY: Debug Drill 10 - Runaway Agent

**Bug:** Agent has no guardrails, leading to:
- Infinite loops (keeps calling tools forever)
- Excessive refunds (no spending limits)
- No stop conditions

**Key Lesson:** AI agents MUST have guardrails, spending limits, and stop conditions.

In [None]:
from typing import Dict, List, Tuple
from dataclasses import dataclass
from enum import Enum

## The Bug (Colleague's Code)

In [None]:
# ===== BUGGY CODE =====
# Agent with NO guardrails

def buggy_agent(user_request: str) -> Dict:
    """Agent that can spiral out of control."""
    total_refunded = 0
    actions_taken = []
    
    # Simulate agent deciding to issue refunds
    # Without limits, it could refund EVERYTHING
    if "refund" in user_request.lower():
        # No limit on how many refunds!
        for i in range(100):  # Could be infinite
            refund_amount = 50.00
            total_refunded += refund_amount
            actions_taken.append(f"Issued refund #{i+1}: ${refund_amount}")
            
            # No stop condition!
            # Agent keeps going...
    
    return {
        "total_refunded": total_refunded,
        "actions": actions_taken[:5],  # Just show first 5
        "action_count": len(actions_taken)
    }

# Test the buggy agent
result = buggy_agent("Please refund my orders")
print("BUGGY AGENT RESULT:")
print(f"Total refunded: ${result['total_refunded']:,.2f}")
print(f"Actions taken: {result['action_count']}")
print(f"\nThis agent just gave away ${result['total_refunded']:,.2f}!")

## Why This Is Wrong

**Missing guardrails:**
1. No iteration limit → infinite loops possible
2. No spending cap → unlimited financial exposure
3. No single-action limit → one refund could be $10,000
4. No approval workflow → high-value actions go unchecked
5. No stop conditions → agent doesn't know when to stop

**Real-world impact:**
- Financial loss (refunds, credits, discounts)
- System overload (infinite API calls)
- Data corruption (uncontrolled writes)
- Security breaches (unauthorized actions)
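
As a minimal sketch of the missing stop condition, the loop below caps how many steps an agent may take. `run_with_step_budget` and `step_fn` are hypothetical names, not part of the drill's code; `step_fn` stands in for one agent step and is assumed to return `True` once the task is done.

```python
from typing import Callable, Tuple


def run_with_step_budget(step_fn: Callable[[int], bool], max_steps: int) -> Tuple[int, str]:
    """Call step_fn until it reports completion or the step budget runs out.

    step_fn is a hypothetical stand-in for one agent step. It receives the
    zero-based step index and returns True when the task is finished.
    """
    for step in range(max_steps):
        if step_fn(step):
            return step + 1, "task_complete"
    # The budget is exhausted: stop even though the task never finished.
    return max_steps, "max_iterations_reached"


# A step function that never reports completion is still cut off at the budget.
steps_used, stop_reason = run_with_step_budget(lambda step: False, max_steps=5)
print(steps_used, stop_reason)
```

Using a bounded `for` loop rather than `while True` means the cap holds even if the step function never signals completion.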

## The Fix: Guardrails + Stop Conditions

In [None]:
# ===== FIXED CODE =====

# Step 1: Define guardrails configuration
GUARDRAILS = {
    "max_iterations": 5,           # Stop after 5 tool calls
    "max_refund_single": 50.00,    # Max $50 per refund
    "max_refund_total": 100.00,    # Max $100 total per session
    "require_approval_above": 75.00,  # Human approval once the session total exceeds $75
    "allowed_tools": ["lookup_order", "issue_refund", "send_email"],
    "forbidden_actions": ["delete_account", "change_password", "access_admin"]
}

print("Guardrails configured:")
for key, value in GUARDRAILS.items():
    print(f"  {key}: {value}")

In [None]:
# Step 2: Define agent state
class StopReason(Enum):
    TASK_COMPLETE = "task_complete"
    MAX_ITERATIONS = "max_iterations_reached"
    SPENDING_LIMIT = "spending_limit_reached"
    NEEDS_APPROVAL = "needs_human_approval"
    FORBIDDEN_ACTION = "forbidden_action_blocked"
    ERROR = "error_occurred"

@dataclass
class AgentState:
    iterations: int = 0
    total_spent: float = 0.0
    actions_taken: List[str] = None
    stop_reason: StopReason = None
    
    def __post_init__(self):
        if self.actions_taken is None:
            self.actions_taken = []
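
An alternative to the `None` default plus `__post_init__` pattern above is `dataclasses.field(default_factory=list)`, which gives each instance its own list. The sketch below assumes a hypothetical class name, `AgentStateSketch`, and types `stop_reason` as `Optional[str]` only to stay self-contained.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AgentStateSketch:  # hypothetical name, mirrors AgentState
    iterations: int = 0
    total_spent: float = 0.0
    # default_factory builds a fresh list for every instance
    actions_taken: List[str] = field(default_factory=list)
    stop_reason: Optional[str] = None


first = AgentStateSketch()
second = AgentStateSketch()
first.actions_taken.append("Issued refund: $50.0")

# The second instance does not share the first instance's list.
print(first.actions_taken)
print(second.actions_taken)
```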

In [None]:
# Step 3: Implement guardrail checks

def check_guardrails(state: AgentState, action: str, params: Dict) -> Tuple[bool, str]:
    """
    Check if an action is allowed given current state and guardrails.
    Returns: (is_allowed, reason)
    """
    
    # Check 1: Iteration limit
    if state.iterations >= GUARDRAILS["max_iterations"]:
        return False, f"Max iterations ({GUARDRAILS['max_iterations']}) reached"
    
    # Check 2: Forbidden actions
    if action in GUARDRAILS["forbidden_actions"]:
        return False, f"Action '{action}' is forbidden"
    
    # Check 3: Allowed tools
    if action not in GUARDRAILS["allowed_tools"]:
        return False, f"Action '{action}' not in allowed tools"
    
    # Check 4: Refund-specific limits
    if action == "issue_refund":
        amount = params.get("amount", 0)
        
        # Single refund limit
        if amount > GUARDRAILS["max_refund_single"]:
            return False, f"Refund ${amount} exceeds single limit (${GUARDRAILS['max_refund_single']})"
        
        # Total session limit
        if state.total_spent + amount > GUARDRAILS["max_refund_total"]:
            return False, f"Would exceed total limit (${GUARDRAILS['max_refund_total']})"
        
        # Approval threshold
        if state.total_spent + amount > GUARDRAILS["require_approval_above"]:
            return False, f"Total ${state.total_spent + amount} requires human approval"
    
    return True, "OK"

In [None]:
# Step 4: Safe agent implementation

def safe_agent(user_request: str) -> Dict:
    """Agent with proper guardrails and stop conditions."""
    state = AgentState()
    
    # Simulate agent planning actions
    planned_actions = []
    if "refund" in user_request.lower():
        # Agent "decides" to issue multiple refunds
        for i in range(10):  # Agent wants to do 10 refunds
            planned_actions.append({"action": "issue_refund", "params": {"amount": 50.00}})
    
    # Execute with guardrails
    for action_plan in planned_actions:
        action = action_plan["action"]
        params = action_plan["params"]
        
        # Check guardrails BEFORE executing
        is_allowed, reason = check_guardrails(state, action, params)
        
        if not is_allowed:
            state.actions_taken.append(f"BLOCKED: {action} - {reason}")
            
            # Determine stop reason
            if "max iterations" in reason.lower():
                state.stop_reason = StopReason.MAX_ITERATIONS
            elif "total limit" in reason.lower() or "single limit" in reason.lower():
                state.stop_reason = StopReason.SPENDING_LIMIT
            elif "approval" in reason.lower():
                state.stop_reason = StopReason.NEEDS_APPROVAL
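            elif "forbidden" in reason.lower() or "not in allowed tools" in reason.lower():
                state.stop_reason = StopReason.FORBIDDEN_ACTION
            else:
                # Unrecognized block reason: never report it as a completed task
                state.stop_reason = StopReason.ERROR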
            
            break  # Stop the loop!
        
        # Execute action
        if action == "issue_refund":
            amount = params["amount"]
            state.total_spent += amount
            state.actions_taken.append(f"Issued refund: ${amount}")
        
        state.iterations += 1
    
    # Set final stop reason if not already set
    if state.stop_reason is None:
        state.stop_reason = StopReason.TASK_COMPLETE
    
    return {
        "total_refunded": state.total_spent,
        "iterations": state.iterations,
        "actions": state.actions_taken,
        "stop_reason": state.stop_reason.value
    }
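
`safe_agent` recovers the stop reason by matching substrings of the guardrail message, which breaks silently if a message is reworded. One alternative, sketched here under assumptions, is to have the check return an enum member directly. `Stop`, `LIMITS`, and `check` are hypothetical names that mirror a subset of `StopReason`, `GUARDRAILS`, and `check_guardrails`.

```python
from enum import Enum
from typing import Dict, Optional


class Stop(Enum):  # hypothetical, mirrors a subset of StopReason
    SPENDING_LIMIT = "spending_limit_reached"
    NEEDS_APPROVAL = "needs_human_approval"
    FORBIDDEN_ACTION = "forbidden_action_blocked"


# Hypothetical subset of the GUARDRAILS configuration
LIMITS = {
    "max_refund_total": 100.00,
    "require_approval_above": 75.00,
    "allowed_tools": ["lookup_order", "issue_refund", "send_email"],
}


def check(total_spent: float, action: str, params: Dict) -> Optional[Stop]:
    """Return None if the action is allowed, otherwise the Stop that blocked it."""
    if action not in LIMITS["allowed_tools"]:
        return Stop.FORBIDDEN_ACTION
    if action == "issue_refund":
        new_total = total_spent + params.get("amount", 0)
        if new_total > LIMITS["max_refund_total"]:
            return Stop.SPENDING_LIMIT
        if new_total > LIMITS["require_approval_above"]:
            return Stop.NEEDS_APPROVAL
    return None


print(check(0.0, "issue_refund", {"amount": 50.00}))
print(check(50.0, "issue_refund", {"amount": 50.00}))
print(check(0.0, "delete_account", {}))
```

With this shape, the caller can assign the returned member to the state directly and no string parsing is needed.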

In [None]:
# Test the safe agent
print("="*60)
print("SAFE AGENT WITH GUARDRAILS")
print("="*60)

result = safe_agent("Please refund my orders")

print(f"\nTotal refunded: ${result['total_refunded']:.2f}")
print(f"Iterations: {result['iterations']}")
print(f"Stop reason: {result['stop_reason']}")
print("\nActions taken:")
for action in result['actions']:
    print(f"  - {action}")

In [None]:
# Compare buggy vs safe
print("\n" + "="*60)
print("COMPARISON")
print("="*60)

buggy_result = buggy_agent("Please refund my orders")
safe_result = safe_agent("Please refund my orders")

print(f"\nBuggy agent: ${buggy_result['total_refunded']:,.2f} refunded ({buggy_result['action_count']} actions)")
print(f"Safe agent:  ${safe_result['total_refunded']:,.2f} refunded ({safe_result['iterations']} actions)")
print(f"\nDamage prevented: ${buggy_result['total_refunded'] - safe_result['total_refunded']:,.2f}")

In [None]:
# Self-check
assert safe_result['total_refunded'] <= GUARDRAILS['max_refund_total'], "Should respect total limit"
assert safe_result['iterations'] <= GUARDRAILS['max_iterations'], "Should respect iteration limit"
print("\nPASS: Guardrails working correctly!")

## Essential Agent Guardrails Checklist

| Guardrail | Purpose | Example |
|-----------|---------|--------|
| Max iterations | Prevent infinite loops | max_iterations=5 |
| Spending limits | Cap financial exposure | max_refund_total=$100 |
| Single action limits | Prevent large mistakes | max_refund_single=$50 |
| Approval thresholds | Human oversight for high-risk | require_approval_above=$75 |
| Allowed tools list | Restrict capabilities | ["lookup_order", "issue_refund", "send_email"] |
| Forbidden actions | Block dangerous operations | ["delete_account"] |
| Stop conditions | Know when to stop | task_complete, max_iterations_reached, spending_limit_reached |
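
The numeric limits in the checklist only help if they are mutually consistent. The sketch below shows one possible way to hold them in an immutable, validated object. `GuardrailConfig` is a hypothetical class, not part of the drill's code, and the validation rules are assumptions about what "consistent" should mean here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: limits cannot be changed after creation
class GuardrailConfig:  # hypothetical, mirrors the numeric GUARDRAILS entries
    max_iterations: int = 5
    max_refund_single: float = 50.00
    max_refund_total: float = 100.00
    require_approval_above: float = 75.00

    def __post_init__(self):
        # Assumed consistency rules; adjust to the real business policy.
        if self.max_iterations < 1:
            raise ValueError("max_iterations must be at least 1")
        if not 0 < self.max_refund_single <= self.max_refund_total:
            raise ValueError("single-refund limit must be positive and within the total limit")
        if not 0 <= self.require_approval_above <= self.max_refund_total:
            raise ValueError("approval threshold must lie between 0 and the total limit")


config = GuardrailConfig()
print(config)

# A single-refund limit above the session total is rejected at construction.
try:
    GuardrailConfig(max_refund_single=500.00)
except ValueError as exc:
    print(f"Rejected: {exc}")
```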

## Completed Postmortem

### What happened:
- Agent without guardrails issued $5,000 in refunds (100 refunds of $50 each) from a single request
- No stop condition meant the agent kept issuing refunds until its loop ran out; a real agent loop could keep going until manually killed

### Root cause:
- No iteration limits (agent could loop forever)
- No spending caps (unlimited financial exposure)
- No stop conditions (agent didn't know when task was "done")

### How to prevent:
- EVERY production agent needs: iteration limits, spending caps, stop conditions
- High-value actions require human approval
- Maintain allowlist of permitted tools/actions
- Test with adversarial inputs before deployment
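
As one example of adversarial testing: `check_guardrails` compares the refund amount only against upper limits, so a negative or NaN amount passes every check. The sketch below assumes a hypothetical pre-check, `is_valid_refund_amount`, that could run before the guardrail checks; it is not part of the drill's code.

```python
import math


def is_valid_refund_amount(amount) -> bool:
    """Hypothetical pre-check: accept only finite, positive numbers."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        return False
    return math.isfinite(amount) and amount > 0


# Inputs a confused or manipulated agent might produce.
adversarial_amounts = [-50.00, 0, float("nan"), float("inf"), "50", None, True]
for amount in adversarial_amounts:
    assert not is_valid_refund_amount(amount), f"{amount!r} should be rejected"

assert is_valid_refund_amount(50.00)
print("All adversarial amounts rejected")
```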