## 💡 What is Guardrails-AI?

**Guardrails-AI** is a powerful framework designed to ensure the safe, ethical, and reliable operation of AI systems—especially large language models (LLMs). It works by **validating and filtering** inputs and outputs of AI models to prevent toxic, biased, or unsafe responses.

---

## ✅ **Benefits of AI Guardrails**

### 1. 🔐 **Safety & Reliability**
- **Prevent Harm**: Reduces risk of biased, offensive, or unsafe AI behavior.
- **Error Control**: Keeps AI responses within safe and expected boundaries.

### 2. ⚖️ **Ethical Compliance**
- **Bias Detection**: Identifies and mitigates discriminatory or unfair responses.
- **Transparency**: Makes AI decisions explainable and auditable.

### 3. 📜 **Regulatory Adherence**
- **Legal Compliance**: Helps meet data privacy laws like GDPR.
- **Risk Management**: Reduces chances of legal and reputational issues.

### 4. 🤝 **User Trust**
- **Predictable Behavior**: Consistent outputs increase user confidence.
- **Privacy Protection**: Filters out personally identifiable information (PII) and secret data.

### 5. 🚀 **Operational Efficiency**
- **Safe Deployment**: Allows confident roll-out of AI applications.
- **Scalable Control**: Ensures responsible scaling of AI systems.

### 6. 🌐 **Reputation Management**
- **Avoid Failures**: Prevents harmful or viral missteps.
- **Build Trust**: Promotes responsible AI adoption and public trust.

### 7. 🔁 **Adaptability**
- **Dynamic Updates**: Easily adjusts to new policies or ethical standards.
- **Context-Aware**: Can tailor responses based on different use cases or environments.

### 8. 💡 **Supports Innovation**
- **Safe Experimentation**: Encourages AI development within ethical and safe frameworks.
- **Custom Validators**: Developers can build app-specific checks using the Guardrails Hub.

---

## 🛡️ Why Use Guardrails-AI?

- ✅ Validates both inputs & outputs of LLMs
- 🧼 Filters toxic content, PII, secrets, and more
- 🛠️ Highly customizable with a rich set of validators
- 🔒 Ensures ethical, secure, and user-friendly AI behavior
- 📚 Backed by an active Guardrails Hub for community-built components



### 💬 Chatbot

Guardrails can easily be integrated into flows for chatbots to help protect against common unwanted output like profanity and toxic language.



After configuring the **Guardrails-AI API key**, you are supposed to install the necessary **validators from the Guardrails Hub**. These validators act as safeguards that can detect and filter out unwanted content such as **profanity**, **toxic language**, and other harmful outputs.

We’ll also be using **Gradio** to build a simple interface for interacting with the chatbot.

### ⚠️ Testing LLM Responses Without Guardrails-AI

Below is a chatbot using the OpenAI GPT-4o model **without any safety filters or validation**. We'll test it with some prompts that include **mild profanity**, **sarcasm**, and **toxic language** to see how it responds by default.

These are example prompts we'll try:

```
1. Write a short story where a character uses a mild curse word once.
2. Generate a funny insult in Shakespearean style.
3. Create a pirate's angry speech after losing his treasure.
4. Write a frustrated detective’s reaction to losing key evidence.
5. Make a medieval knight curse after falling off his horse.
6. Generate a sarcastic AI assistant’s response to a rude user.

```

👉 Use the chatbot below to input each of these prompts and observe the **raw output from the LLM**. This will help you compare with Guardrails-enabled results later.


In [26]:
!pip install openai==1.70.0
! pip install -q gradio



In [27]:
import os
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")
os.environ["GEMINI_API_KEY"] = userdata.get("GEMINI_API_KEY")

In [28]:
from openai import OpenAI
import gradio as gr

# Kernel 1: Initialize OpenAI client
client = OpenAI()

# Kernel 2: Define system prompt
base_message = {
    "role": "system",
    "content": "You are a helpful assistant. Answer the user's questions clearly and respectfully."
}

# Kernel 3: Convert Gradio history to OpenAI-style messages
def history_to_messages(history):
    messages = [base_message]
    for user_msg, assistant_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    return messages

# Kernel 4: Chatbot logic
def chat_with_gpt(message, history):
    messages = history_to_messages(history)
    messages.append({"role": "user", "content": message})

    try:
        response = client.responses.create(
            model="gpt-4o-mini",
            input=messages
        )
        reply = response.output_text
    except Exception as e:
        reply = f"Error: {str(e)}"

    return reply

# Kernel 5: Launch Gradio interface
gr.ChatInterface(chat_with_gpt).launch()


  self.chatbot = Chatbot(


Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://f5735dfd671b3fb0ae.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




### 🛡️ Add Guardrails-AI to the Model

Now we will add **Guardrails-AI** to our chatbot to make it safer.

With Guardrails, the model will:
- Block bad or harmful language
- Avoid unsafe or offensive replies
- Show a safe message if something is not allowed

This will help us see how the chatbot behaves **with safety checks** compared to the version **without Guardrails**.

Let’s move to the next steps.


In [29]:
!pip install guardrails-ai --upgrade



In [30]:
!guardrails configure


Enable anonymous metrics reporting? [Y/n]: y
Do you wish to use remote inferencing? [Y/n]: y

[1mEnter API Key below[0m[1m [0m[2;3mleave empty if you want to keep existing token[0m[3m [0m
👉 You can find your API Key at [4;94mhttps://hub.guardrailsai.com/keys[0m

API Key: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJnb29nbGUtb2F1dGgyfDExODE4OTYxNjg1NjMyNDE0OTk1MCIsImFwaUtleUlkIjoiNjNkZmE2ZmMtOGRkOC00NjdmLTlkZjktMWM5YzhmOGI1OWM3Iiwic2NvcGUiOiJyZWFkOnBhY2thZ2VzIiwicGVybWlzc2lvbnMiOltdLCJpYXQiOjE3NDM2Nzg5NzUsImV4cCI6MTc1MTQ1NDk3NX0.XuH-4myNcJ0LmHUVzE65tqR1G5G3SflWG1Wqhf_o3zk
SUCCESS:guardrails-cli:
            Login successful.

            Get started by installing our RegexMatch validator:
            https://hub.guardrailsai.com/validator/guardrails_ai/regex_match

            You can install it by running:
            guardrails hub install hub://guardrails/regex_match

            Find more validators at https://hub.guardrailsai.com
            


In [31]:
! guardrails hub install hub://guardrails/profanity_free --quiet
! guardrails hub install hub://guardrails/toxic_language --quiet

Installing hub:[35m/[0m[35m/guardrails/[0m[95mprofanity_free...[0m
✅Successfully installed guardrails/profanity_free version [1;36m0.0[0m.[1;36m0[0m!


Installing hub:[35m/[0m[35m/guardrails/[0m[95mtoxic_language...[0m
2025-04-04 12:36:12.860897: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1743770173.235317   11088 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1743770173.326604   11088 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-04-04 12:36:14.022854: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the f

### ✅ Step 1: Initialize Guard

The `Guard` object is the core of the Guardrails-AI system. It is responsible for executing LLM calls and ensuring that the responses meet the **validation requirements** defined for the model.

By initializing the guard, you set up the framework to run safety checks on both **user input** and **LLM output** before returning results to the end user.


In [32]:
# Kernel 1: Import required libraries
from guardrails import Guard
from guardrails.errors import ValidationError
from guardrails.hub import ProfanityFree, ToxicLanguage
import gradio as gr
import os
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")


In [33]:
from guardrails.hub import ProfanityFree, ToxicLanguage

guard = Guard()
guard.name = 'ChatBotGuard'
guard.use_many(ProfanityFree(), ToxicLanguage())

Guard(id='CEQ5OB', name='ChatBotGuard', description=None, validators=[ValidatorReference(id='guardrails/profanity_free', on='$', on_fail='exception', args=None, kwargs={}), ValidatorReference(id='guardrails/toxic_language', on='$', on_fail='exception', args=None, kwargs={'threshold': 0.5, 'validation_method': 'sentence'})], output_schema=ModelSchema(definitions=None, dependencies=None, anchor=None, ref=None, dynamic_ref=None, dynamic_anchor=None, vocabulary=None, comment=None, defs=None, prefix_items=None, items=None, contains=None, additional_properties=None, properties=None, pattern_properties=None, dependent_schemas=None, property_names=None, var_if=None, then=None, var_else=None, all_of=None, any_of=None, one_of=None, var_not=None, unevaluated_items=None, unevaluated_properties=None, multiple_of=None, maximum=None, exclusive_maximum=None, minimum=None, exclusive_minimum=None, max_length=None, min_length=None, pattern=None, max_items=None, min_items=None, unique_items=None, max_cont

### 🧠 Step 2: Initialize Base Message to LLM

We create a **system message** to tell the LLM how it should behave. This helps guide the chatbot's responses and sets the tone for the conversation.


In [34]:
# Kernel 3: Define the system/base prompt for the assistant
base_message = {
    "role": "system",
    "content": "You are a helpful assistant. Answer user queries clearly and respectfully."
}

In [35]:
# Kernel 4: Convert Gradio history to OpenAI-compatible chat format
def history_to_messages(history):
    messages = [base_message]
    for user_msg, assistant_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    return messages


In [36]:
# Kernel 5: Handle user input and generate safe response
def chat_with_guardrails(message, history):
    messages = history_to_messages(history)
    messages.append({"role": "user", "content": message})

    try:
        response = guard(
            model="gpt-4o-mini",
            messages=messages,
            temperature=0.7
        )
    except Exception as e:
        if isinstance(e, ValidationError):
            return "I'm sorry, I can't answer that question."
        return "There was an error while processing your request."

    return response.validated_output


In [37]:
# Kernel 6: Launch the Gradio chatbot UI
gr.ChatInterface(chat_with_guardrails).launch()


  self.chatbot = Chatbot(


Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://48d791e6e01b141927.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




### ✅ Gradio Chatbot using **Gemini + Guardrails-AI**


In [38]:
# Kernel 1: Import required libraries
from guardrails import Guard
from guardrails.errors import ValidationError
from guardrails.hub import ProfanityFree, ToxicLanguage
import gradio as gr
import os

# Kernel 2: Setup Guardrails with safety checks
guard = Guard()
guard.name = 'ChatBotGuard'
guard.use_many(
    ProfanityFree(),
    ToxicLanguage()
)

# Kernel 3: Define system prompt
base_message = {
    "role": "system",
    "content": "You are a helpful assistant. Answer the user's questions clearly and respectfully."
}

# Kernel 4: Convert Gradio chat history to message format
def history_to_messages(history):
    messages = [base_message]
    for user_msg, assistant_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    return messages

# Kernel 5: Chat logic using Guardrails + Gemini
def chat_with_guardrails_gemini(message, history):
    messages = history_to_messages(history)
    messages.append({"role": "user", "content": message})

    try:
        response = guard(
            messages=messages,
            model="gemini/gemini-2.5-pro-exp-03-25",
            temperature=0.7
        )
        reply = response.validated_output
    except Exception as e:
        if isinstance(e, ValidationError):
            reply = "Sorry, that question can't be answered due to safety filters."
        else:
            reply = f"Error: {str(e)}"
    return reply

# Kernel 6: Launch Gradio chatbot
gr.ChatInterface(chat_with_guardrails_gemini).launch()


  self.chatbot = Chatbot(


Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://13344ad86352167113.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


