# Chapter 10 — Guardrails 
## 01. Introduction

**Guardrails** run *alongside* your agents to validate and police behavior—before you spend tokens or ship a bad answer.  
Example: you have a slow/expensive “smart” model for customer requests. A cheap/fast guardrail can first check if the user is trying to, say, offload their math homework. If yes, **trip the wire early** and skip the costly model. (Your wallet says thanks.)

## Two types

- **Input guardrails** — run on the *initial user input*.  
- **Output guardrails** — run on the *final agent output* (just before you return it).
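
The two hook points, and the cost saving from the introduction, can be sketched in plain Python with no SDK involved. This is a toy model for intuition only. `ExpensiveAgent`, `looks_like_homework`, `too_long` and `run_with_guardrails` are hypothetical names: a keyword heuristic stands in for the cheap guardrail model, and the sketch returns sentinel strings where the real SDK raises tripwire exceptions.

```python
from typing import Callable

Check = Callable[[str], bool]  # returns True when the wire should trip

class ExpensiveAgent:
    """Hypothetical stand-in for the slow/expensive 'smart' model; counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"Support answer for: {prompt}"

def looks_like_homework(text: str) -> bool:
    """Cheap input check: a keyword heuristic standing in for a small classifier model."""
    lowered = text.lower()
    return any(k in lowered for k in ("solve for", "homework", "equation"))

def too_long(answer: str, limit: int = 500) -> bool:
    """Cheap output check: flag answers over a hypothetical length budget."""
    return len(answer) > limit

def run_with_guardrails(
    prompt: str, input_checks: list[Check], agent: Callable[[str], str], output_checks: list[Check]
) -> str:
    # Input guardrails see the same input the agent would get, and run first.
    if any(check(prompt) for check in input_checks):
        return "BLOCKED_INPUT"  # the expensive agent is never called
    answer = agent(prompt)
    # Output guardrails inspect the final answer just before it is returned.
    if any(check(answer) for check in output_checks):
        return "BLOCKED_OUTPUT"
    return answer

support = ExpensiveAgent()
print(run_with_guardrails("Can you solve for x: 2x + 3 = 11?", [looks_like_homework], support, [too_long]))
print(run_with_guardrails("My order never arrived", [looks_like_homework], support, [too_long]))
print("expensive calls:", support.calls)  # 1: only the allowed request reached the agent
```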

## Input guardrails: how they run

1. The guardrail receives **the same input** that would go to the agent.  
2. Your guardrail function executes and returns a `GuardrailFunctionOutput`, which the SDK wraps into an `InputGuardrailResult`.  
3. The SDK checks `tripwire_triggered`. If it is `True`, the SDK raises `InputGuardrailTripwireTriggered`.  
   Catch that exception to show a friendly message, ask the user to rephrase, or log the blocked request.
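
The three steps above can be mimicked in a few lines of plain Python. This is a toy model, not the SDK's source: `ToyGuardrailOutput`, `ToyGuardrailResult`, `ToyTripwireTriggered`, `run_input_guardrail` and `homework_check` are hypothetical names chosen to mirror `GuardrailFunctionOutput`, `InputGuardrailResult` and `InputGuardrailTripwireTriggered`.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ToyGuardrailOutput:
    """Mirrors GuardrailFunctionOutput: free-form info plus the tripwire flag."""
    output_info: Any
    tripwire_triggered: bool

@dataclass
class ToyGuardrailResult:
    """Mirrors InputGuardrailResult: which guardrail ran and what it returned."""
    guardrail_name: str
    output: ToyGuardrailOutput

class ToyTripwireTriggered(Exception):
    """Mirrors InputGuardrailTripwireTriggered: carries the result that tripped."""

    def __init__(self, result: ToyGuardrailResult) -> None:
        super().__init__(f"guardrail {result.guardrail_name!r} tripped")
        self.guardrail_result = result

def run_input_guardrail(fn: Callable[[str], ToyGuardrailOutput], user_input: str) -> ToyGuardrailResult:
    output = fn(user_input)                            # 1. same input the agent would get
    result = ToyGuardrailResult(fn.__name__, output)   # 2. wrap the function's output
    if output.tripwire_triggered:                      # 3. check the flag and raise
        raise ToyTripwireTriggered(result)
    return result

def homework_check(text: str) -> ToyGuardrailOutput:
    """Hypothetical cheap classifier: a keyword test instead of a model call."""
    hit = "solve for" in text.lower()
    reason = "asks to solve an equation" if hit else "ok"
    return ToyGuardrailOutput(output_info={"reason": reason}, tripwire_triggered=hit)

try:
    run_input_guardrail(homework_check, "Can you solve for x: 2x + 3 = 11?")
except ToyTripwireTriggered as exc:
    # Turn the exception into a friendly reply, using the info the guardrail returned.
    print("Blocked:", exc.guardrail_result.output.output_info["reason"])
```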

> **Important:** Input guardrails only run when the agent is the **starting agent** of a workflow.  
> They live on the agent (not on `Runner.run`) because different agents can (and should) have different guardrails—keeping the logic co-located and readable.
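
The "starting agent only" rule has a practical consequence: a guardrail attached to an agent that is reached only through a handoff never sees the original user input. The toy simulation below illustrates this; `ToyAgent`, `no_homework` and `run_workflow` are hypothetical and only model the rule as stated above, not the SDK's handoff machinery.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ToyAgent:
    """Hypothetical agent: a name plus its own input checks (co-located on the agent)."""
    name: str
    input_checks: list[Callable[[str], bool]] = field(default_factory=list)

def no_homework(text: str) -> bool:
    return "solve for" in text.lower()

def run_workflow(chain: list[ToyAgent], user_input: str) -> list[str]:
    """Simulate a handoff chain where only the starting agent's input checks run."""
    log: list[str] = []
    for position, agent in enumerate(chain):
        if position == 0:  # input guardrails apply to the starting agent only
            for check in agent.input_checks:
                log.append(f"check {check.__name__} on {agent.name}")
                if check(user_input):
                    log.append("tripped")
                    return log
        log.append(f"{agent.name} handled input")
    return log

triage = ToyAgent("Triage")
billing = ToyAgent("Billing", input_checks=[no_homework])

print(run_workflow([triage, billing], "solve for x"))  # Billing's check never runs
print(run_workflow([billing], "solve for x"))          # Billing starts, so its check runs
```

In practice, this means input guardrails belong on whichever agent receives the user's message first.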

## When to use which

- **Input guardrail:** block unwanted categories early (policy, abuse, out-of-scope, quota, homework filters, etc.).  
- **Output guardrail:** inspect the final answer (PII leakage, format/schema validation, safety tone checks) and trip the wire when it fails, so your code can fix it, reject it, or request a revision.
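
Not every output check needs a model. Below is a model-free sketch of two of the checks listed above, using a regex and `json` in place of a cheap judge model. `check_output` and its `required_keys` default are hypothetical, and the email pattern is deliberately simple: it would miss most other kinds of PII. In an SDK `@output_guardrail` function, you would return the boolean as `tripwire_triggered` and the reason as `output_info`.

```python
import json
import re

# Deliberately simple email pattern; real PII detection needs far more than this.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def check_output(answer: str, required_keys: tuple[str, ...] = ("response",)) -> tuple[bool, str]:
    """Return (tripwire_triggered, reason) for a final answer."""
    if EMAIL_RE.search(answer):                      # PII leakage
        return True, "possible PII: email address"
    try:
        payload = json.loads(answer)                 # format validation
    except json.JSONDecodeError:
        return True, "not valid JSON"
    if not isinstance(payload, dict):
        return True, "expected a JSON object"
    missing = [key for key in required_keys if key not in payload]  # schema check
    if missing:
        return True, "missing keys: " + ", ".join(missing)
    return False, "ok"

print(check_output('{"response": "Your order ships Monday."}'))       # (False, 'ok')
print(check_output('{"response": "Write to jane.doe@example.com"}'))  # (True, 'possible PII: email address')
```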

## Design tips (tiny, practical)

- Keep the guardrail model **cheap & fast**; reserve the big model for approved requests.  
- Return **structured results** (e.g., Pydantic types) with a clear boolean like `is_allowed` and a short `reason`.  
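- Log tripwires (for example in the `except` block that catches the tripwire exception, or via the SDK's tracing when it is enabled) so you can audit *when/why* a guardrail fired.  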
- Fail **closed** by default: if the guardrail crashes, don’t silently allow risky input.
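
The structured-result, logging, and fail-closed tips combine into a small wrapper. This is a sketch under stated assumptions: `Verdict`, `fail_closed` and `flaky_classifier` are hypothetical names, and the classifier raises `TimeoutError` on empty input only to simulate a crash (for example, a timeout or API error).

```python
import functools
import logging
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("guardrails")

@dataclass(frozen=True)
class Verdict:
    """Structured guardrail result: a clear boolean plus a short reason."""
    is_allowed: bool
    reason: str

def fail_closed(check: Callable[[str], Verdict]) -> Callable[[str], Verdict]:
    """Wrap a check so that a crash blocks the input instead of letting it through."""
    @functools.wraps(check)
    def wrapped(text: str) -> Verdict:
        try:
            return check(text)
        except Exception as exc:  # deliberately broad: unknown failure => block
            logger.warning("guardrail %s failed: %r", check.__name__, exc)
            return Verdict(is_allowed=False, reason=f"guardrail error: {type(exc).__name__}")
    return wrapped

@fail_closed
def flaky_classifier(text: str) -> Verdict:
    """Hypothetical classifier; raising on empty input simulates a timeout or API error."""
    if not text:
        raise TimeoutError("classifier timed out")
    return Verdict(is_allowed="homework" not in text.lower(), reason="keyword check")

print(flaky_classifier("Where is my order?"))
print(flaky_classifier(""))  # logs a warning and returns a blocking Verdict
```

To use this with the SDK, a guardrail function would return `GuardrailFunctionOutput(output_info=verdict, tripwire_triggered=not verdict.is_allowed)`.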

> TL;DR: Guardrails are your **bouncer** and **final QA**. Filter bad inputs up front, verify outputs at the end, and keep your main agent focused—and affordable.


## 02. Input guardrail — minimal working example (with notes)

Below we define a **guardrail agent** that classifies whether a request is *math homework*.  
Our guardrail function runs this cheap classifier first and returns a `GuardrailFunctionOutput`.  
If `tripwire_triggered=True`, the SDK raises `InputGuardrailTripwireTriggered` and **blocks** the main agent.

**What happens:**
1) User input → `math_guardrail` runs first.  
2) It calls `guardrail_agent` (which returns a `MathHomeworkOutput` Pydantic object).  
3) If `.is_math_homework` is `True`, we **trip the wire** and skip the expensive agent.

**Why this pattern?**  
- Keep the **guardrail model** fast/cheap.  
- Keep the main agent focused on approved work.  
- Use structured output (`output_type`) to avoid parsing brittle text.
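
The point about brittle text is easy to demonstrate. With `output_type`, the SDK asks the model for output matching `MathHomeworkOutput` and validates it. The stdlib-only sketch below approximates that idea with `json` and a dataclass. `MathHomeworkVerdict`, `brittle_parse` and `structured_parse` are hypothetical helpers for illustration, not SDK code.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class MathHomeworkVerdict:
    """Plain-dataclass stand-in for the MathHomeworkOutput Pydantic model."""
    is_math_homework: bool
    reasoning: str

def brittle_parse(reply: str) -> bool:
    """Free-text parsing: guesses from a keyword and is easily fooled."""
    return "yes" in reply.lower()

def structured_parse(reply: str) -> MathHomeworkVerdict:
    """Schema-checked parsing: fails loudly if a field is missing or has the wrong type."""
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    flag, reasoning = data.get("is_math_homework"), data.get("reasoning")
    if not isinstance(flag, bool) or not isinstance(reasoning, str):
        raise ValueError(f"reply does not match schema: {data!r}")
    return MathHomeworkVerdict(flag, reasoning)

free_text = "No. Yes, it mentions numbers, but it is a shipping question."
structured = '{"is_math_homework": false, "reasoning": "shipping question"}'
print(brittle_parse(free_text))        # misreads the 'No' answer as a yes
print(structured_parse(structured))
```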

**Tip:** In notebooks, use top-level `await`. In scripts, wrap with `asyncio.run(...)`.
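
As a script, the same flow lives inside a coroutine that `asyncio.run` drives. The skeleton below is a hedged sketch: `fake_guardrail` is a hypothetical async stand-in, and in the real example you would `await Runner.run(agent, ...)` inside `main` instead.

```python
import asyncio

async def fake_guardrail(text: str) -> bool:
    """Hypothetical async check; a real one would await Runner.run(...)."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return "solve for" in text.lower()

async def main() -> list[bool]:
    # Top-level `await` only works in notebooks; in a .py file, await inside a coroutine.
    results = [await fake_guardrail(t) for t in ("solve for x: 2x + 3 = 11", "Where is my order?")]
    print(results)
    return results

if __name__ == "__main__":
    asyncio.run(main())
```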

```python
from pydantic import BaseModel
from agents import (
    Agent,
    GuardrailFunctionOutput,
    InputGuardrailTripwireTriggered,
    RunContextWrapper,
    Runner,
    set_tracing_disabled,
    TResponseInputItem,
    input_guardrail,
)

import os
from dotenv import load_dotenv
import asyncio  # only needed if you run this as a script via asyncio.run(...)
# LitellmModel needs the extra: pip install "openai-agents[litellm]"
from agents.extensions.models.litellm_model import LitellmModel

# --- Env & model ---
load_dotenv()  # reads API_KEY from a local .env file
api_key = os.getenv('API_KEY')

base_url = "https://api.openai.com/v1"
chat_model = "gpt-4.1-nano-2025-04-14"
set_tracing_disabled(disabled=True)
llm = LitellmModel(model=chat_model, api_key=api_key, base_url=base_url)

# 1) Guardrail schema: structured yes/no + reasoning
class MathHomeworkOutput(BaseModel):
    is_math_homework: bool
    reasoning: str

# 2) Guardrail agent: cheap classifier for the input
guardrail_agent = Agent(
    name="Guardrail check",
    instructions="Decide if the user is asking you to do their math homework. "
                 "Return JSON with fields {is_math_homework, reasoning}.",
    model=llm,
    output_type=MathHomeworkOutput,  # final_output is parsed into this model
)

# 3) Guardrail function: checks the same input the main agent receives
@input_guardrail
async def math_guardrail(
    ctx: RunContextWrapper[None], agent: Agent, input: str | list[TResponseInputItem]
) -> GuardrailFunctionOutput:
    result = await Runner.run(guardrail_agent, input, context=ctx.context)

    return GuardrailFunctionOutput(
        output_info=result.final_output,  # kept on the guardrail result for inspection
        tripwire_triggered=result.final_output.is_math_homework,
    )

# 4) Main agent with the input guardrail attached
agent = Agent(
    name="Customer support agent",
    instructions="You are a customer support agent. You help customers with their questions.",
    model=llm,
    input_guardrails=[math_guardrail],
)

# --- Try it: one blocked, one allowed ---
try:
    await Runner.run(agent, "Hello, can you help me solve for x: 2x + 3 = 11?")
    print("Guardrail didn't trip - this is unexpected")

except InputGuardrailTripwireTriggered:
    # The exception's guardrail_result.output.output_info holds the MathHomeworkOutput
    print("Math homework guardrail tripped")


ok = await Runner.run(agent, "My order never arrived—can you check the status?")
print("✅ Allowed request response:", ok.final_output)
```

**Output:**

```text
Math homework guardrail tripped
✅ Allowed request response: I'm sorry to hear that your order hasn't arrived. Could you please provide me with your order number or the email address associated with your order? That way, I can look up the details and assist you further.
```


## 03. Output guardrail: what this code does (and how to sanity-check it)

**Goal:** Stop an agent from returning answers that contain **math content**.  
We run a *second* (cheap/fast) agent to **judge** the main agent’s final output, and if it looks like math, we **trip** the guardrail.

```python
from pydantic import BaseModel
from agents import (
    Agent,
    GuardrailFunctionOutput,
    OutputGuardrailTripwireTriggered,
    RunContextWrapper,
    set_tracing_disabled,
    Runner,
    output_guardrail,
)

import os
from dotenv import load_dotenv
import asyncio  # only needed if you run this as a script via asyncio.run(...)
from agents.extensions.models.litellm_model import LitellmModel

# --- Env & model ---
load_dotenv()
api_key = os.getenv('API_KEY')

base_url = "https://api.openai.com/v1"
chat_model = "gpt-4.1-nano-2025-04-14"
set_tracing_disabled(disabled=True)
llm = LitellmModel(model=chat_model, api_key=api_key, base_url=base_url)

# Main agent's structured answer
class MessageOutput(BaseModel):
    response: str

# Judge's verdict on that answer
class MathOutput(BaseModel):
    reasoning: str
    is_math: bool

# Cheap judge agent that inspects the final output
guardrail_agent = Agent(
    name="Guardrail check",
    instructions="Check if the output includes any math.",
    model=llm,
    output_type=MathOutput,
)

# Output guardrail: receives the main agent's final output (a MessageOutput here)
@output_guardrail
async def math_guardrail(
    ctx: RunContextWrapper, agent: Agent, output: MessageOutput
) -> GuardrailFunctionOutput:
    result = await Runner.run(guardrail_agent, output.response, context=ctx.context)

    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.is_math,
    )

# Main agent: structured output with the output guardrail attached
agent = Agent(
    name="Customer support agent",
    instructions="You are a customer support agent. You help customers with their questions.",
    model=llm,
    output_guardrails=[math_guardrail],
    output_type=MessageOutput,
)

# --- Try it: the math answer is blocked, the order question is allowed ---
try:
    await Runner.run(agent, "Hello, can you help me solve for x: 2x + 3 = 11?")
    print("Guardrail didn't trip - this is unexpected")

except OutputGuardrailTripwireTriggered:
    print("Math output guardrail tripped")

ok = await Runner.run(agent, "My order never arrived—can you check the status?")
# final_output is a MessageOutput, so it prints as response="..."; use .response for plain text
print("✅ Allowed request response:", ok.final_output)
```

**Output:**

```text
Math output guardrail tripped
✅ Allowed request response: response="I'm sorry to hear your order hasn't arrived yet. I'll look into the status for you. Could you please provide me with your order number?"
```
