# Budget-Guarded RAG Agent with AgentGuard + LangChain

RAG agents are expensive. A retrieval-augmented generation pipeline that searches, retrieves, and synthesizes can easily make 10-50 LLM calls per query. Without budget controls, a single stuck query can burn through your API budget.

This notebook shows how to add a hard dollar cap to a LangChain RAG agent using [AgentGuard](https://github.com/bmdhodl/agent47). When the budget is exceeded, the agent stops immediately.

**No API keys required.** This notebook simulates the pipeline with fixed per-step costs instead of calling a real LLM.

```python
!pip install -q agentguard47
```

## 1. Set up the budget guard

BudgetGuard enforces a hard dollar limit. When cumulative cost exceeds the limit, it raises `BudgetExceeded` and the agent stops.

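```python
from agentguard import BudgetGuard, BudgetExceeded, LoopGuard

MAX_COST_USD = 1.00  # hard cap for the whole run
WARN_AT_PCT = 0.8    # fire the warning callback at 80% of the cap

budget_guard = BudgetGuard(
    max_cost_usd=MAX_COST_USD,
    warn_at_pct=WARN_AT_PCT,
    on_warning=lambda msg: print(f"WARNING: {msg}"),
)

# Loop detection: at most 3 repeats within a 6-call window
loop_guard = LoopGuard(max_repeats=3, window=6)

# Print the configured values rather than reading the guard's private attributes
print(f"Budget: ${MAX_COST_USD:.2f}")
print(f"Warning at: {WARN_AT_PCT:.0%}")
print("Loop detection: max 3 repeats in 6-call window")
```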
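To make the behavior concrete, here is a minimal sketch of what a budget guard does conceptually: keep a running total, warn once at a threshold, and raise once the cap is crossed. `SketchBudget` and `SketchBudgetExceeded` are hypothetical names invented for this illustration. This is not AgentGuard's implementation, and details such as whether the warning fires at exactly 80% are assumptions.

```python
class SketchBudgetExceeded(RuntimeError):
    """Stand-in exception for this sketch (not AgentGuard's BudgetExceeded)."""


class SketchBudget:
    """Hypothetical running-total guard: warn once at a fraction, raise past the cap."""

    def __init__(self, max_cost_usd, warn_at_pct=0.8, on_warning=None):
        self.max_cost_usd = max_cost_usd
        self.warn_at_pct = warn_at_pct
        self.on_warning = on_warning
        self.cost_used = 0.0
        self.calls_used = 0
        self._warned = False

    def consume(self, calls=1, cost_usd=0.0):
        """Record spend, then enforce the cap and the one-time warning."""
        self.calls_used += calls
        self.cost_used += cost_usd
        if self.cost_used > self.max_cost_usd:
            raise SketchBudgetExceeded(
                f"spent ${self.cost_used:.2f} of ${self.max_cost_usd:.2f}"
            )
        threshold = self.warn_at_pct * self.max_cost_usd
        if self.on_warning and not self._warned and self.cost_used >= threshold:
            self._warned = True
            self.on_warning(f"{self.cost_used / self.max_cost_usd:.0%} of budget used")


seen_warnings = []
sketch = SketchBudget(max_cost_usd=1.00, on_warning=seen_warnings.append)
sketch.consume(cost_usd=0.50)      # below the warning threshold
sketch.consume(cost_usd=0.35)      # total 0.85, so the warning fires once
try:
    sketch.consume(cost_usd=0.25)  # total 1.10, so this raises
except SketchBudgetExceeded as exc:
    print(f"stopped: {exc}")
print(seen_warnings)
```

The useful property is that the check lives inside `consume`, so the caller cannot forget it: every recorded call either passes or raises.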

## 2. Simulate a RAG agent

The next cell simulates a RAG pipeline as a fixed list of retrieval and synthesis steps. Each step carries an illustrative cost estimate loosely based on GPT-4o pricing; together the steps sum to about $1.04, just over the $1.00 budget.

```python
# Simulated RAG pipeline steps with illustrative costs (USD)
rag_steps = [
    {"step": "embed_query", "description": "Embed user query", "cost": 0.001},
    {"step": "retrieve_docs", "description": "Retrieve top-k documents", "cost": 0.002},
    {"step": "rerank", "description": "Rerank retrieved docs", "cost": 0.05},
    {"step": "synthesize_1", "description": "Synthesize chunk 1", "cost": 0.08},
    {"step": "synthesize_2", "description": "Synthesize chunk 2", "cost": 0.08},
    {"step": "synthesize_3", "description": "Synthesize chunk 3", "cost": 0.08},
    {"step": "merge_results", "description": "Merge synthesis results", "cost": 0.12},
    {"step": "fact_check", "description": "Fact-check against sources", "cost": 0.15},
    {"step": "refine", "description": "Refine final answer", "cost": 0.12},
    {"step": "format_output", "description": "Format and cite sources", "cost": 0.09},
    {"step": "follow_up_1", "description": "Generate follow-up query 1", "cost": 0.06},
    {"step": "follow_up_2", "description": "Generate follow-up query 2", "cost": 0.06},
    {"step": "deep_dive", "description": "Deep dive on subtopic", "cost": 0.15},
]

print(f"RAG pipeline: {len(rag_steps)} steps")
print(f"Estimated total cost: ${sum(s['cost'] for s in rag_steps):.2f}")
print("Budget: $1.00, so the agent will be stopped before completing all steps")
```
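It helps to know in advance where the guard should trip. The sketch below recomputes the running total from the step costs listed in `rag_steps`, using integer thousandths of a dollar to avoid float rounding. It assumes the warning fires when the total reaches 80% of the cap and the hard stop fires when the total goes above the cap; with these numbers the result is the same whether the comparisons are strict or not.

```python
from itertools import accumulate

# Step costs copied from rag_steps, in thousandths of a dollar.
costs_mills = [1, 2, 50, 80, 80, 80, 120, 150, 120, 90, 60, 60, 150]
budget_mills = 1000  # $1.00 cap
warn_mills = 800     # 80% of the cap

running = list(accumulate(costs_mills))
first_warn = next(i for i, total in enumerate(running, 1) if total >= warn_mills)
first_over = next(i for i, total in enumerate(running, 1) if total > budget_mills)

print(f"Total: ${running[-1] / 1000:.3f}")             # Total: $1.043
print(f"80% threshold reached at step {first_warn}")  # step 11
print(f"Cap exceeded at step {first_over}")           # step 13
```

So the simulated pipeline runs twelve steps within budget and is stopped on the thirteenth, the final "deep dive" call.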

```python
print("Running RAG pipeline with budget guard...\n")

budget_usd = 1.00  # keep in sync with max_cost_usd passed to BudgetGuard above

for i, step in enumerate(rag_steps, 1):
    try:
        # Check for loops
        loop_guard.check(step["step"])

        # Record the call; raises BudgetExceeded when the budget is exceeded
        budget_guard.consume(calls=1, cost_usd=step["cost"])
    except BudgetExceeded as e:
        print(f"\n  BUDGET EXCEEDED at step {i}: {step['description']}")
        print(f"  {e}")
        print("\n  Final stats:")
        print(f"    Total cost:  ${budget_guard.state.cost_used:.3f}")
        print(f"    API calls:   {budget_guard.state.calls_used}")
        print(f"    Steps completed: {i - 1} of {len(rag_steps)}")
        break

    # Only reached when the step stayed within budget
    cost_so_far = budget_guard.state.cost_used
    pct = cost_so_far / budget_usd * 100
    print(
        f"  [{i:2d}] {step['description']:<35s} +${step['cost']:.3f}"
        f"  (total: ${cost_so_far:.3f}, {pct:.0f}%)"
    )
```
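Every step name in the run above is unique, so the loop guard should stay silent there. To show what sliding-window loop detection looks like when an agent does get stuck, here is a minimal sketch. `RepeatDetector` is a hypothetical class written for this illustration, not AgentGuard's `LoopGuard`, and treating the third occurrence inside the window as the trigger is an assumption about the semantics.

```python
from collections import deque


class RepeatDetector:
    """Hypothetical sliding-window check: flag a call seen too often recently."""

    def __init__(self, max_repeats=3, window=6):
        self.max_repeats = max_repeats
        self.recent = deque(maxlen=window)  # oldest entries fall off automatically

    def check(self, call_name):
        """Record call_name; return True once it occurs max_repeats times in the window."""
        self.recent.append(call_name)
        return self.recent.count(call_name) >= self.max_repeats


detector = RepeatDetector(max_repeats=3, window=6)
stuck_calls = ["search", "rerank", "search", "summarize", "search"]
flags = [detector.check(name) for name in stuck_calls]
print(flags)  # [False, False, False, False, True]
```

A real guard would more likely raise than return a flag, so that a stuck agent cannot ignore the signal.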

## 3. Using with a real LangChain agent

In production, replace the simulation with AgentGuard's LangChain callback handler:

```python
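from agentguard import Tracer, BudgetGuard, LoopGuard
from agentguard.integrations.langchain import AgentGuardCallbackHandler

# LangChain import paths vary by version; these match the 0.2/0.3 releases
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

tracer = Tracer(service="rag-agent")
handler = AgentGuardCallbackHandler(
    tracer=tracer,
    budget_guard=BudgetGuard(max_cost_usd=5.00),
    loop_guard=LoopGuard(max_repeats=3),
)

# Pass to any LangChain component
llm = ChatOpenAI(model="gpt-4o", callbacks=[handler])
# `vectorstore` is a placeholder for a vector store you have already built
retriever = vectorstore.as_retriever(callbacks=[handler])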

# The handler auto-tracks cost and detects loops
chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
result = chain.invoke({"query": "What is quantum computing?"})
```

Install with LangChain extras:
```bash
pip install "agentguard47[langchain]"
```
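This notebook does not show how the handler turns LLM usage into dollars. A common approach is to multiply token counts by per-token prices, and the sketch below illustrates that arithmetic. The function, the price table, and the `example-model` rates are all hypothetical placeholders, not AgentGuard's internals or a current price sheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """USD per one million tokens. Placeholder values for illustration only."""
    input_per_mtok: float
    output_per_mtok: float


# Hypothetical price table; check your provider's pricing page for real numbers.
PRICES = {"example-model": ModelPrice(input_per_mtok=2.50, output_per_mtok=10.00)}


def estimate_cost_usd(model, prompt_tokens, completion_tokens):
    """Convert token counts into dollars using per-million-token rates."""
    price = PRICES[model]
    total = (prompt_tokens * price.input_per_mtok
             + completion_tokens * price.output_per_mtok)
    return total / 1_000_000


cost = estimate_cost_usd("example-model", prompt_tokens=4_000, completion_tokens=500)
print(f"${cost:.4f}")  # $0.0150
```

A per-call estimate like this is the kind of value that would be passed as `cost_usd` to `budget_guard.consume(...)` when wiring the guard up by hand.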

## Key takeaways

- **BudgetGuard** enforces hard dollar limits — the agent stops immediately when exceeded
- **LoopGuard** catches repeated tool calls before they compound costs
- **Warning callbacks** give you a chance to wrap up gracefully at a configurable threshold (80% in this notebook)
- Works with any LangChain component via the callback handler
- Zero dependencies in the core package (pure Python stdlib); the LangChain integration is an optional extra

**Links:**
- [AgentGuard repo](https://github.com/bmdhodl/agent47)
- [LangChain integration docs](https://github.com/bmdhodl/agent47#langchain)
- [Full guard reference](https://github.com/bmdhodl/agent47#guards)