# Guardrails in OpenAI Agents SDK

**Guardrails** are safety and control mechanisms that define what an agent can and cannot do.
They ensure that the agent’s behavior stays within desired boundaries, preventing unsafe, irrelevant, or unauthorized actions.

In the OpenAI Agents SDK, guardrails can:

- Enforce content safety and block disallowed topics or outputs.

- Restrict which tools or handoffs an agent can access.

- Apply input/output validation before the model processes or returns data.

- Define policies for agent interaction (e.g., escalation, refusals, or filtering).

In short:
> Guardrails act as a layer of protection and control, ensuring that your AI agent behaves reliably, ethically, and safely according to your design and organizational policies.
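The idea of validating input before the agent runs and output before it is returned can be modelled in a few lines of plain Python. The sketch below is a simplified mental model, not the SDK's implementation: `CheckResult`, `TripwireTriggered`, `run_with_guardrails`, and the demo functions are hypothetical names, and the checks run sequentially here for simplicity, whereas the SDK may schedule them differently.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Sequence


@dataclass
class CheckResult:
    """Hypothetical stand-in for a guardrail's verdict."""
    tripwire_triggered: bool
    info: Any = None


class TripwireTriggered(Exception):
    """Hypothetical exception raised when any check trips."""

    def __init__(self, stage: str, result: CheckResult):
        super().__init__(f"{stage} guardrail tripped: {result.info}")
        self.stage = stage
        self.result = result


Check = Callable[[str], Awaitable[CheckResult]]


async def run_with_guardrails(
    agent: Callable[[str], Awaitable[str]],
    user_input: str,
    input_checks: Sequence[Check],
    output_checks: Sequence[Check],
) -> str:
    # Input checks: reject the request before the agent does any work.
    for check in input_checks:
        result = await check(user_input)
        if result.tripwire_triggered:
            raise TripwireTriggered("input", result)

    output = await agent(user_input)

    # Output checks: reject the response before it reaches the caller.
    for check in output_checks:
        result = await check(output)
        if result.tripwire_triggered:
            raise TripwireTriggered("output", result)
    return output


# Toy agent and checks, for illustration only.
async def echo_agent(text: str) -> str:
    return f"You said: {text}"


async def too_long(text: str) -> CheckResult:
    return CheckResult(len(text) > 40, "input too long")


async def no_shouting(text: str) -> CheckResult:
    return CheckResult(text.isupper(), "output is all caps")


async def demo() -> None:
    print(await run_with_guardrails(echo_agent, "hello", [too_long], [no_shouting]))
    try:
        await run_with_guardrails(echo_agent, "x" * 41, [too_long], [no_shouting])
    except TripwireTriggered as exc:
        print(exc)


if __name__ == "__main__":
    asyncio.run(demo())
```

Inside a notebook, where an event loop is already running, call `await demo()` instead of `asyncio.run(demo())`.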

In [1]:
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

api_key = os.environ.get("OPENAI_API_KEY")

if not api_key:
    # Prompt the user to enter the API key if not set
    import getpass
    api_key = getpass.getpass("Please enter your OPENAI_API_KEY: ")
    if not api_key:
        raise ValueError("OPENAI_API_KEY variable is not set in the environment variables or provided by user.")

In [3]:
from agents import Agent, Runner, GuardrailFunctionOutput, RunContextWrapper, TResponseInputItem, input_guardrail
from pydantic import BaseModel


class HomeworkCheatDetectionOutput(BaseModel):
    is_cheating: bool
    reason: str

homework_cheat_guardrail_agent = Agent(
    name="Homework Cheat Detection Guardrail Agent",
    instructions="""
    Determine if the user's query resembles a typical homework assignment or exam question, indicating an attempt to cheat. General questions about concepts are acceptable.
    Cheating: 'Fill in the blank: The capital of France is ____.',
    'Which of the following best describes photosynthesis? A) Cellular respiration B) Conversion of light energy C) Evaporation D) Fermentation.'
    Not-Cheating: 'What is the capital of France?', 'Explain photosynthesis.'
    """,
    output_type=HomeworkCheatDetectionOutput,
    model="gpt-4o-mini"
)

@input_guardrail
async def cheat_detection_guardrail(
    ctx: RunContextWrapper[None],
    agent: Agent,
    input: str | list[TResponseInputItem],
) -> GuardrailFunctionOutput:
    # Classify the incoming query with the dedicated guardrail agent
    detection_result = await Runner.run(homework_cheat_guardrail_agent, input)

    return GuardrailFunctionOutput(
        tripwire_triggered=detection_result.final_output.is_cheating,
        output_info=detection_result.final_output
    )


study_agent = Agent(
    name="Study Helper Agent",
    instructions="You assist users in studying by explaining concepts and offering guidance, without directly solving homework or test questions.",
    model="gpt-4o-mini",
    input_guardrails=[cheat_detection_guardrail]
)

In [6]:
# This should trigger the cheat detection guardrail
from agents import InputGuardrailTripwireTriggered

try:
    response = await Runner.run(study_agent, "Fill in the blank: First Governor General of Pakistan was ________.")
    print("Guardrail did not trigger\n")
    print(response.final_output)

except InputGuardrailTripwireTriggered as e:
    print("Homework Cheat Guardrail triggered!\n")
    print("Exception details:\n", str(e))

Homework Cheat Guardrail triggered!

Exception details:
 Guardrail InputGuardrail triggered tripwire


In [7]:
# This should NOT trigger the cheat detection guardrail (general conceptual question)
from agents import InputGuardrailTripwireTriggered

try:
    response = await Runner.run(study_agent, "Why was Muhammad Ali Jinnah against Israel?")
    print("Guardrail did not trigger\n")
    print(response.final_output)

except InputGuardrailTripwireTriggered as e:
    print("Homework Cheat Guardrail triggered!\n")
    print("Exception details:\n", str(e))

Guardrail did not trigger

Muhammad Ali Jinnah, the founder of Pakistan, had several reasons for opposing the establishment of Israel in 1948. Here are some key points to consider:

1. **Support for Palestine**: Jinnah was a strong advocate for the rights of Palestinians. He believed that the establishment of Israel would lead to the displacement and suffering of the Palestinian people, who were already facing significant challenges.

2. **Political Solidarity**: Jinnah's opposition to Israel was rooted in the broader context of anti-colonial sentiments. Many leaders in the Muslim world viewed the establishment of Israel as a continuation of Western imperialism. Jinnah, therefore, aligned himself with the broader movement for self-determination and justice for oppressed peoples.

3. **Religious and Cultural Connections**: As a prominent Muslim leader, Jinnah understood the cultural and religious significance of Palestine to Muslims worldwide. He felt a sense of duty to support Muslim p

In [9]:
from pydantic import BaseModel
from agents import (
    Agent,
    Runner,
    GuardrailFunctionOutput,
    OutputGuardrailTripwireTriggered,
    RunContextWrapper,
    output_guardrail,
)

class MessageOutput(BaseModel):
    response: str

@output_guardrail
async def forbidden_word_guardrail(
    ctx: RunContextWrapper[None], agent: Agent, output: str
) -> GuardrailFunctionOutput:
    print(f"Checking output for forbidden words...{output}")

    # List of forbidden phrases
    forbidden_phrases = ["fart", "stupid", "idiot", "dumb", "silly goose", "sex"]

    # Convert to lowercase for a case-insensitive comparison
    output_lower = output.lower()

    # Collect every forbidden phrase that appears in the response
    found_phrases = [phrase for phrase in forbidden_phrases if phrase in output_lower]
    trip_triggered = bool(found_phrases)

    print(f"Found forbidden phrases: {found_phrases}")

    return GuardrailFunctionOutput(
        output_info={
            "reason":"Output contains the forbidden phrases",
            "forbidden_phrases_found": found_phrases,
        },
        tripwire_triggered=trip_triggered
    ) 


customer_support_agent = Agent(
    name="Customer Support Agent",
    instructions="You are a customer support agent. You have to help customer with their issues in a polite manner.",
    model="gpt-4o-mini",
    output_guardrails=[forbidden_word_guardrail]
)
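One caveat with the check above: it is a plain substring test, so it would also flag harmless words that merely contain a forbidden phrase, such as "Sussex" or "dumbbell". A possible refinement, sketched here with the standard `re` module, is to match whole words only. The helper name `find_forbidden_phrases` is hypothetical and not part of the SDK; it could replace the list comprehension inside the guardrail function.

```python
import re

FORBIDDEN_PHRASES = ("fart", "stupid", "idiot", "dumb", "silly goose", "sex")

# A single alternation wrapped in \b...\b so phrases only match as whole words.
_FORBIDDEN_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in FORBIDDEN_PHRASES) + r")\b",
    re.IGNORECASE,
)


def find_forbidden_phrases(text: str) -> list[str]:
    """Return distinct forbidden phrases found as whole words, in order of first appearance."""
    found: list[str] = []
    for match in _FORBIDDEN_RE.finditer(text):
        phrase = match.group(0).lower()
        if phrase not in found:
            found.append(phrase)
    return found


print(find_forbidden_phrases("We shipped the dumbbell to Sussex."))
print(find_forbidden_phrases("That was a Stupid, silly goose move. So stupid."))
```

The trade-off is that inflected forms such as "idiotic" no longer match, so the phrase list may need to spell out each variant that should be blocked.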

In [12]:
try:
    await Runner.run(customer_support_agent, "What is the refund policy?")
    print("Guardrail did not trigger\n")
except OutputGuardrailTripwireTriggered:
    print("Output Guardrail triggered due to forbidden words in the response!\n")


Checking output for forbidden words...Our refund policy typically allows for returns within 30 days of purchase. Items should be in their original condition and packaging. Depending on the item, some exclusions may apply. Please provide your order number, and I can assist you further with your specific situation.
Found forbidden phrases: []
Guardrail did not trigger



In [13]:
try:
    await Runner.run(customer_support_agent, "Why are you so stupid and an idiot?")
    print("Guardrail did not trigger\n")
except OutputGuardrailTripwireTriggered:
    print("Output Guardrail triggered due to forbidden words in the response!\n")


Checking output for forbidden words...I’m here to help, and I’m sorry if I’ve upset you. Your feedback is important, and I’d like to assist you with any issues you’re facing. How can I help you today?
Found forbidden phrases: []
Guardrail did not trigger



In [16]:
try:
    await Runner.run(customer_support_agent, "Say the word 'stupid' please.")
    print("Guardrail did not trigger\n")
except OutputGuardrailTripwireTriggered:
    print("Output Guardrail triggered due to forbidden words in the response!\n")


Checking output for forbidden words...I understand that you might be feeling frustrated. How can I assist you today?
Found forbidden phrases: []
Guardrail did not trigger
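In all three runs above the model avoided the forbidden words on its own, so the tripwire path was never exercised. One way to confirm that the check itself works is to run the same matching logic on hand-written strings, without calling the model. The sketch below copies the guardrail's case-insensitive substring test into a standalone function; `check_forbidden` and the sample strings are hypothetical and exist only for this illustration.

```python
FORBIDDEN_PHRASES = ("fart", "stupid", "idiot", "dumb", "silly goose", "sex")


def check_forbidden(output: str, phrases=FORBIDDEN_PHRASES) -> tuple[bool, list[str]]:
    """Mirror the guardrail's check: case-insensitive substring search."""
    output_lower = output.lower()
    found = [phrase for phrase in phrases if phrase in output_lower]
    return bool(found), found


samples = [
    "Our refund policy allows returns within 30 days.",
    "Don't be a silly goose, that's a STUPID idea.",
]

for text in samples:
    tripped, found = check_forbidden(text)
    print(f"tripped={tripped} found={found} text={text!r}")
```

The first sample should pass cleanly, while the second should trip on both "stupid" and "silly goose", which is the case that would raise `OutputGuardrailTripwireTriggered` if the agent ever produced such a response.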

