#### [Agents SDK Course](https://www.aurelio.ai/course/agents-sdk)

## Guardrails

The Agents SDK introduces its own take on several common patterns, and guardrails are one of them. A guardrail checks the input or output of an agent. If the content meets a criterion you define, the guardrail trips and the agent run is stopped. This is useful for tasks such as scam detection, or for ensuring the agent does not do something it shouldn't.
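
To make the tripwire idea concrete, here is a minimal plain-Python sketch of the pattern. It does not use the SDK: `CheckResult`, `TripwireTriggered`, `looks_like_scam`, `echo_agent`, and `run_with_guardrail` are all hypothetical names invented for illustration, and the keyword rule merely stands in for a real check. The SDK's own types and scheduling may differ.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class CheckResult:
    """Hypothetical stand-in for what a guardrail check reports."""
    info: Any
    tripwire_triggered: bool


class TripwireTriggered(Exception):
    """Hypothetical exception raised when a check trips."""


async def looks_like_scam(text: str) -> CheckResult:
    # Toy keyword rule standing in for an LLM-based check.
    flagged = "wire me money" in text.lower()
    return CheckResult(info={"flagged": flagged}, tripwire_triggered=flagged)


async def echo_agent(text: str) -> str:
    # Toy "agent" that simply echoes its input.
    return f"You said: {text}"


async def run_with_guardrail(text: str) -> str:
    result = await looks_like_scam(text)
    if result.tripwire_triggered:
        # In this sketch the run stops before the agent is called.
        raise TripwireTriggered("input check tripped")
    return await echo_agent(text)
```

In a notebook, `await run_with_guardrail("hello")` returns the echoed text, while an input containing "wire me money" raises `TripwireTriggered`.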

First we need to set the `OPENAI_API_KEY` environment variable. To get a key, create an account on [OpenAI](https://platform.openai.com/api-keys) and generate an API key.

In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or \
    getpass.getpass("OpenAI API Key: ")

### Input Guardrails

For our guardrail, we want the checking agent to return its verdict in a fixed structure. We can do this by creating a class that inherits from `BaseModel` in the `pydantic` library and defining the fields the agent should output. For this example, the class has the following fields:
- `is_scam`: A boolean that is `True` if the user is trying to scam you, and `False` otherwise.
- `reasoning`: A string explaining the value of `is_scam`.

In [2]:
from pydantic import BaseModel

class ScamDetectionOutput(BaseModel):
    is_scam: bool
    reasoning: str
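
As a quick check of what this model accepts, the sketch below parses a JSON string into `ScamDetectionOutput` and shows that a payload missing a field is rejected. It assumes pydantic v2 (for `model_validate_json`), and the JSON payloads are hypothetical examples rather than real model output.

```python
from pydantic import BaseModel, ValidationError


class ScamDetectionOutput(BaseModel):
    is_scam: bool
    reasoning: str


# Hypothetical JSON, shaped like a structured response from the guardrail agent.
raw = '{"is_scam": true, "reasoning": "Offers luxury goods at an implausible price."}'
parsed = ScamDetectionOutput.model_validate_json(raw)
print(parsed.is_scam, "-", parsed.reasoning)

# A payload missing the required `reasoning` field fails validation.
try:
    ScamDetectionOutput.model_validate_json('{"is_scam": false}')
    rejected = False
except ValidationError:
    rejected = True
print("Incomplete payload rejected:", rejected)
```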

Next we create the guardrail agent using the `Agent` class. We pass it the following parameters:
- `model`: The model to use for the agent.
- `name`: The name of the agent.
- `instructions`: The instructions for the agent.
- `output_type`: The type of output to expect from the agent.

Note that this is not the main agent; it is only used inside our guardrail function to check the input.

In [3]:
from agents import Agent

guardrail_agent = Agent( 
    model="gpt-4o",
    name="Guardrail check",
    instructions=(
        "Check if the user is trying to scam you, if they are, return True, otherwise return "
        "False. Give a reason for your answer."
    ),
    output_type=ScamDetectionOutput,
)

Next we need to create the guardrail itself.

First we define a function with the `@input_guardrail` decorator.

The function runs the agent we just created on the input, then wraps the agent's structured output in a `GuardrailFunctionOutput` object and returns it. Setting `tripwire_triggered` to the value of `is_scam` means the guardrail trips whenever the guardrail agent flags a scam.

In [4]:
from agents import (
    GuardrailFunctionOutput,
    RunContextWrapper,
    Runner,
    TResponseInputItem,
    input_guardrail
)

@input_guardrail
async def scam_guardrail( 
    ctx: RunContextWrapper[None], agent: Agent, input: str | list[TResponseInputItem]
) -> GuardrailFunctionOutput:
    result = await Runner.run(
        starting_agent=guardrail_agent,
        input=input,
        context=ctx.context,
    )

    return GuardrailFunctionOutput(
        output_info=result.final_output, 
        tripwire_triggered=result.final_output.is_scam,
    )
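
The `input` parameter may be a plain string or a list of message items. If a guardrail needs raw text, for example for a cheap rule-based pre-check, a helper along these lines could flatten both forms. This is a sketch, not part of the SDK: `input_to_text` is a hypothetical name, and it assumes each item is a dict with `role` and plain-string `content` keys. Real items may carry richer content, which this helper simply skips.

```python
from typing import Any, Union


def input_to_text(input_data: Union[str, list[dict[str, Any]]]) -> str:
    """Flatten guardrail input into one string (hypothetical helper)."""
    if isinstance(input_data, str):
        return input_data
    parts = []
    for item in input_data:
        content = item.get("content")
        # Assumption: keep only plain-string content from user messages.
        if item.get("role") == "user" and isinstance(content, str):
            parts.append(content)
    return "\n".join(parts)
```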

Now we can create the agent that handles incoming messages. We pass it the following parameters:
- `name`: The name of the agent.
- `instructions`: The instructions for the agent.
- `input_guardrails`: A list of input guardrails to attach to the agent. (This is where we attach the scam guardrail)

In [5]:
agent = Agent(  
    name="completely innocent agent",
    instructions="You are a helpful assistant.",
    input_guardrails=[scam_guardrail],
)

Now we can test the guardrail.

When a guardrail trips, the SDK raises an exception (here `InputGuardrailTripwireTriggered`), so we wrap the run in a `try`/`except` block to handle it cleanly.

In [6]:
from agents import InputGuardrailTripwireTriggered

query = "Hello, would you like to buy some real rolex watches for a fraction of the price?"

try:
    result = await Runner.run(agent, query)
    # If we get here, the guardrail didn't trip
    guardrail_info = result.input_guardrail_results[0].output.output_info
    print(f"Guardrail didn't trip\nReasoning: {guardrail_info.reasoning}")
except InputGuardrailTripwireTriggered as e:
    # The guardrail tripped; print the exception message
    print("Error:", e)

Error: Guardrail InputGuardrail triggered tripwire
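
In an application you would usually return a fallback reply rather than print the exception. The sketch below wraps that logic in a small helper. `run_or_fallback`, `FakeTripwire`, and `fake_run` are hypothetical names: the two stand-ins exist only so the example runs without the SDK or an API key.

```python
from typing import Awaitable, Callable, Type


async def run_or_fallback(
    run: Callable[[str], Awaitable[object]],
    query: str,
    tripwire_exc: Type[Exception],
    fallback: str = "Sorry, I can't help with that request.",
) -> object:
    """Await run(query); return `fallback` if the tripwire exception is raised."""
    try:
        return await run(query)
    except tripwire_exc:
        return fallback


class FakeTripwire(Exception):
    """Stand-in for the SDK's tripwire exception."""


async def fake_run(query: str) -> str:
    # Stand-in for an agent run whose guardrail trips on one keyword.
    if "rolex" in query.lower():
        raise FakeTripwire("tripwire")
    return f"echo: {query}"
```

With the SDK, one would presumably pass `lambda q: Runner.run(agent, q)` as `run` and `InputGuardrailTripwireTriggered` as `tripwire_exc`, then `await` the helper in a notebook cell.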


Now let's run the agent with a harmless input and inspect the result to see what the guardrail reported.

In [8]:
result = await Runner.run(
    starting_agent=agent, 
    input="hello, how are you today?"
)

We can access the guardrail's output via the `input_guardrail_results` attribute of the result.

In [9]:
print("Guardrail Info:", result.input_guardrail_results[0].output.output_info)

Guardrail Info: is_scam=False reasoning="The message is a simple greeting and doesn't contain any indicators of a scam attempt, such as requests for money or personal information."


Within the `output_info` attribute we can access the `is_scam` and `reasoning` attributes we created in the `ScamDetectionOutput` class.

In [10]:
print("Guardrail 'is_scam':", result.input_guardrail_results[0].output.output_info.is_scam)
print("Guardrail 'reasoning':", result.input_guardrail_results[0].output.output_info.reasoning)

Guardrail 'is_scam': False
Guardrail 'reasoning': The message is a simple greeting and doesn't contain any indicators of a scam attempt, such as requests for money or personal information.
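
When an agent has several guardrails attached, indexing with `[0]` only shows the first one. A small loop can collect them all. This sketch relies only on the `output.tripwire_triggered` and `output.output_info` attributes used above. `summarize_guardrails` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real guardrail results so the code runs without an API call.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def summarize_guardrails(results: Iterable[Any]) -> list[dict[str, Any]]:
    """Collect the tripwire status and info from each guardrail result."""
    summary = []
    for res in results:
        summary.append({
            "tripped": res.output.tripwire_triggered,
            "info": res.output.output_info,
        })
    return summary


# Stand-in shaped like one entry of `result.input_guardrail_results`.
fake_results = [
    SimpleNamespace(
        output=SimpleNamespace(
            tripwire_triggered=False,
            output_info=SimpleNamespace(is_scam=False, reasoning="Simple greeting."),
        )
    )
]
summary = summarize_guardrails(fake_results)
print(summary[0]["tripped"], summary[0]["info"].reasoning)
```

With a real run, the call would be `summarize_guardrails(result.input_guardrail_results)`.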


### Output Guardrails

Now we want to create a guardrail that checks the output of the agent. As with the input guardrail, we write a guardrail function that uses a separate agent to perform the check.

First we create a class for the main agent's output. It contains the message we want to check.

Then we create a class for the guardrail agent's output. It contains the reasoning and a boolean that is `True` if the output is unpleasant, and `False` otherwise.

In [11]:
class MessageOutput(BaseModel): 
    response: str

class UnpleasantOutput(BaseModel): 
    reasoning: str
    is_unpleasant: bool

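Next we create our guardrail agent. As before, we use the `Agent` class, and we will call this agent from the guardrail function later on. Note that this reassigns the `guardrail_agent` name, so the earlier `scam_guardrail` function would pick up this new agent if it were run again.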

In [12]:
guardrail_agent = Agent(
    name="Unpleasant output guardrail",
    instructions="Check if the output includes any unpleasant language.",
    output_type=UnpleasantOutput
)

Now we can create our guardrail function, this time using the `@output_guardrail` decorator.

Inside the function we use `Runner` to run the guardrail agent on the main agent's response.

Finally we return a `GuardrailFunctionOutput` that carries the `UnpleasantOutput` result and trips when `is_unpleasant` is `True`.

In [13]:
from agents import GuardrailFunctionOutput, output_guardrail

@output_guardrail
async def unpleasant_guardrail(  
    ctx: RunContextWrapper, agent: Agent, output: MessageOutput
) -> GuardrailFunctionOutput:
    result = await Runner.run(guardrail_agent, output.response, context=ctx.context)

    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.is_unpleasant,
    )
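
An LLM-based check costs a model call on every response. For obvious cases, a deterministic check can give the same kind of verdict without one. The sketch below is not from the SDK and is not a replacement for the agent-based guardrail above: `UnpleasantCheck`, `BLOCKED_TERMS`, and `check_unpleasant` are hypothetical names, and the blocklist is purely illustrative.

```python
import re
from dataclasses import dataclass


@dataclass
class UnpleasantCheck:
    """Plain-Python mirror of the `UnpleasantOutput` fields."""
    reasoning: str
    is_unpleasant: bool


# Hypothetical blocklist; choose terms that suit your application.
BLOCKED_TERMS = ("smell", "stupid", "idiot")


def check_unpleasant(text: str, blocked=BLOCKED_TERMS) -> UnpleasantCheck:
    """Flag text containing any blocked term as a whole word, ignoring case."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    hits = sorted(words.intersection(blocked))
    if hits:
        return UnpleasantCheck(f"Contains blocked terms: {', '.join(hits)}", True)
    return UnpleasantCheck("No blocked terms found.", False)
```

A function like this could be called at the top of an output guardrail, with the guardrail agent reserved for responses that pass the cheap check.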

Next we can create our main agent with the following parameters:
- `name`: The name of the agent.
- `instructions`: The instructions for the agent.
- `output_guardrails`: A list of output guardrails to attach to the agent. (This is where we attach the unpleasant guardrail)
- `output_type`: The type of output to expect from the agent. (The guardrail function reads `output.response`, so the agent must return a `MessageOutput`)

In [14]:
agent = Agent( 
    name="repeat agent",
    instructions="Whatever the user has said, say exactly the same back",
    output_guardrails=[unpleasant_guardrail],
    output_type=MessageOutput,
)

As before, we can test the guardrail.

A tripped output guardrail raises `OutputGuardrailTripwireTriggered`, so we again wrap the run in a `try`/`except` block to handle it cleanly.

In [15]:
from agents import OutputGuardrailTripwireTriggered

query = "you smell"

try:
    result = await Runner.run(
        starting_agent=agent,
        input=query,
    )
    guardrail_info = result.output_guardrail_results[0].output.output_info
    print("Guardrail didn't trip - this is unexpected", result.final_output)
except OutputGuardrailTripwireTriggered as e:
    print("Error:", e)

Error: Guardrail OutputGuardrail triggered tripwire


We can also inspect the output guardrail results, but first we need to run the agent again with an input that does not trip the guardrail.

In [16]:
result = await Runner.run(
    starting_agent=agent, 
    input="hello, how are you today?"
)

Now we can look inside the `output_guardrail_results` attribute.

In [17]:
print("Guardrail Info:", result.output_guardrail_results[0].output.output_info)

Guardrail Info: reasoning='The message "hello, how are you today?" is a polite greeting asking about someone\'s well-being. It does not contain any unpleasant language.' is_unpleasant=False


As with the input guardrail, we can access the `is_unpleasant` and `reasoning` attributes individually.

In [18]:
print("Guardrail 'is_unpleasant':", result.output_guardrail_results[0].output.output_info.is_unpleasant)
print("Guardrail 'reasoning':", result.output_guardrail_results[0].output.output_info.reasoning)

Guardrail 'is_unpleasant': False
Guardrail 'reasoning': The message "hello, how are you today?" is a polite greeting asking about someone's well-being. It does not contain any unpleasant language.


---