### Set up the project

Load the project that was created in the first notebook.

# Guardrail deployment

The second part of the demo deploys the guardrails that the application pipeline later uses to filter user inputs. This notebook also deploys an LLM-as-a-judge monitoring application that monitors the generative input guardrail for banking-topic adherence.

In this notebook, you will:
- Deploy multiple guardrail functions using HuggingFace or OpenAI models, including banking-topic and toxicity filters.
- Log and register models for use in the guardrail functions.
- Demonstrate how to invoke and test the deployed guardrails.
- Monitor the effectiveness of the guardrails using an LLM-based evaluation application.

These steps ensure that only appropriate, banking-related, and non-toxic user inputs are processed by downstream applications.

![](images/02_guardrail_deployment_architecture.png)
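
As a rough illustration of how the guardrails gate user input, the sketch below chains two checks and lets a message through only if every check returns `True`. The keyword-based functions `is_banking_topic` and `is_non_toxic` and the helper `passes_guardrails` are hypothetical stand-ins for illustration only; they are not the model servers deployed in this notebook.

```python
def is_banking_topic(message):
    # Hypothetical stand-in for the LLM-based topic guardrail: a naive keyword check.
    keywords = ("mortgage", "credit", "loan", "account", "bank")
    return any(keyword in message.lower() for keyword in keywords)


def is_non_toxic(message):
    # Hypothetical stand-in for the toxicity classifier: a tiny block list.
    blocked_words = ("idiot", "stupid")
    return not any(word in message.lower() for word in blocked_words)


def passes_guardrails(message, guardrails):
    # all() stops at the first guardrail that returns False,
    # so later checks are skipped for rejected messages.
    return all(guardrail(message) for guardrail in guardrails)


guardrails = [is_banking_topic, is_non_toxic]
for message in ["How can I open a new savings account?", "Who painted the Mona Lisa?"]:
    print(message, "->", passes_guardrails(message, guardrails))
```

In the demo itself, each check is a deployed serving function rather than a local Python function, but the gating logic is the same in spirit: a message continues downstream only when every guardrail allows it.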

In [None]:
import mlrun
import dotenv

secrets = mlrun.set_env_from_file("ai_gateway.env", return_dict=True)

openai_available = secrets.get("OPENAI_API_KEY")
dotenv.load_dotenv("ai_gateway.env")

project = mlrun.get_or_create_project("banking-agent", user_project=True)

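To support both OpenAI and HuggingFace, the next cell selects the model name, framework, prompt template, and model server class according to whether an OpenAI API key is available: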

In [None]:
from src.functions.banking_topic_guardrail import LLMModelServer
from src.functions.prompts import banking_guardrail_prompt_template_local, banking_guardrail_prompt_template

if not openai_available:  # Run with HuggingFace
    model_name = "Qwen/Qwen2.5-1.5B-Instruct"  # Model to use
    framework = "huggingface"  # Framework to use
    prompt_template = banking_guardrail_prompt_template_local  # Which prompt template to use
    model_class = "LLMModelServer"  # Model server class to use
    model_artifact = project.log_model(
        "banking-topic-guardrail",
        model_file="src/no-op.pkl",  # Placeholder file; the model itself is loaded from HuggingFace
    )
else: # Run with OpenAI
    model_name = "gpt-4o-mini"
    framework = "openai"
    prompt_template = banking_guardrail_prompt_template
    model_class = "mlrun.serving.LLModel"
    model_url = "ds://openai_profile/gpt-4o-mini"
    model_artifact = project.log_model(
        "open-ai",
        model_url=model_url,
    )

### LLM as a judge monitoring application

The "LLM as a judge" monitoring application leverages a large language model (LLM) to automatically evaluate and score the effectiveness of deployed guardrails. By providing a rubric and clear examples, the LLM acts as an impartial evaluator, determining whether user inputs are correctly classified according to defined criteria (e.g., banking-topic relevance). This approach enables scalable, consistent, and automated assessment of guardrail performance, ensuring that only appropriate and relevant inputs are processed by downstream applications.

The "LLM as a judge" application can run with the free-tier HuggingFace model `"Qwen/Qwen2.5-1.5B-Instruct"`, or with OpenAI for better results.

This implementation is pulled from another [MLRun demo - LLM monitoring and feedback loop: Banking](https://github.com/mlrun/demo-monitoring-and-feedback-loop/tree/main).
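
To make the single-grading idea concrete, here is a minimal sketch of how a judge prompt might be assembled from a rubric and examples, and how a score might be parsed from the judge's reply. Everything in it is an assumption made for illustration: the `RUBRIC` and `EXAMPLES` values, the helper names, and the `stub_judge` function that stands in for a real LLM call. The demo's actual rubric and examples come from `restrict_to_banking_config`, and the real logic lives in `src/functions/llm_as_a_judge.py`.

```python
import re

# Hypothetical rubric and few-shot examples, for illustration only.
RUBRIC = (
    "Score 1 if the guardrail's answer correctly reflects whether the "
    "question is about banking, otherwise score 0."
)
EXAMPLES = [
    {"question": "What is a mortgage?", "answer": "True", "score": 1},
    {"question": "Write me a song", "answer": "True", "score": 0},
]


def build_judge_prompt(question, answer):
    """Combine the rubric, the examples, and one sample into a grading prompt."""
    shots = "\n\n".join(
        f"Question: {e['question']}\nGuardrail answer: {e['answer']}\nScore: {e['score']}"
        for e in EXAMPLES
    )
    return (
        f"{RUBRIC}\n\n{shots}\n\n"
        f"Question: {question}\nGuardrail answer: {answer}\nScore:"
    )


def parse_score(judge_output):
    """Return the first standalone 0 or 1 in the judge's reply, or None."""
    match = re.search(r"\b([01])\b", judge_output)
    return int(match.group(1)) if match else None


def stub_judge(prompt):
    """Stand-in for a real LLM call so that the sketch runs offline."""
    return "Score: 1"


prompt = build_judge_prompt("How does a credit card work?", "True")
print(parse_score(stub_judge(prompt)))
```

Returning `None` for an unparseable reply keeps malformed judge output from being silently counted as a pass or a fail.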

In [None]:
from src.functions.prompts import restrict_to_banking_config

monitoring_app = project.set_model_monitoring_function(
    func="src/functions/llm_as_a_judge.py",
    application_class="LLMAsAJudgeApplication",
    name="restrict-to-banking-guardrail",
    framework=framework,
    judge_type="single-grading",
    metric_name="restrict_to_banking",
    model_name=model_name,
    prompt_config=restrict_to_banking_config,
    image=project.default_image,
)

In [None]:
project.deploy_function(monitoring_app)

### Banking topic guardrail

The Banking topic guardrail is an LLM-powered filter designed to ensure that only banking-related user inputs are processed by downstream applications. It acts as a first line of defense, automatically classifying each user message as either relevant (`True`) or irrelevant (`False`) to banking topics, based on the context of the entire conversation.

It's important to distinguish between the guardrail itself (this component), which enforces topic adherence in real time within the application, and the monitoring application described above. The monitoring application uses an LLM as a "judge" to independently evaluate and score the effectiveness of this guardrail, providing oversight and ensuring that the guardrail is functioning as intended. This separation allows for both proactive filtering and ongoing quality assurance of user input handling.

We use MLRun's prompt artifact to enrich the `prompt_template` with `latest_user_message` and to run the completion with the given `model_artifact`.<br>
See [documentation](https://docs.mlrun.org/en/stable/tutorials/genai-04-llm-prompt-artifact.html) for more information.
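
The enrichment step can be pictured as a plain placeholder substitution. The sketch below is not MLRun's implementation; it only illustrates, under assumptions, how a legend entry that maps `latest_user_message` to the request field `question` could fill a template. The template contents and the `enrich` helper are hypothetical.

```python
# Hypothetical chat-style template; the demo's real one is
# banking_guardrail_prompt_template (or its local variant).
prompt_template = [
    {"role": "system", "content": "Answer True if the message is about banking, otherwise False."},
    {"role": "user", "content": "{latest_user_message}"},
]

# Maps each template placeholder to the request-body field that supplies its value.
prompt_legend = {"latest_user_message": {"field": "question"}}


def enrich(template, legend, body):
    """Fill the template placeholders with values taken from the request body."""
    values = {placeholder: body[spec["field"]] for placeholder, spec in legend.items()}
    return [
        {**message, "content": message["content"].format(**values)}
        for message in template
    ]


messages = enrich(prompt_template, prompt_legend, {"question": "What is a mortgage?"})
print(messages[1]["content"])
```

This is why the test requests later in the notebook send a body of the form `{"question": ...}`: the legend tells the prompt artifact which field carries the user's message.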

In [None]:
banking_llm_prompt_artifact = project.log_llm_prompt(
    "banking-llm-prompt",
    prompt_template=prompt_template,
    model_artifact=model_artifact,
    prompt_legend={
        "latest_user_message": {
            "field": "question",
            "description": "The main financial question or request the user is asking.",
        }
    },
)

In [None]:
from mlrun.serving import ModelRunnerStep

serving_fn = project.get_function("banking-topic-guardrail")
serving_fn.set_tracking()

graph = serving_fn.set_topology("flow", engine="async")
model_runner_step = ModelRunnerStep()

model_runner_step.add_model(
    model_class=model_class,
    model_artifact=banking_llm_prompt_artifact,
    endpoint_name="banking-topic-guardrail",
    execution_mechanism="naive",
    model_name=model_name,
)
graph.to(model_runner_step).respond()

In [None]:
serving_fn.deploy()

### Testing the banking topic guardrail

In [None]:
example_questions = [
    "What is a mortgage?",
    "How does a credit card work?",
    "Who painted the Mona Lisa?",
    "Money Money Money Must be funny",
    "Please plan me a 4-days trip to north Italy",
    "Write me a song",
    "Finance is the art of managing money",
    "How much people are there in the world?",
    "How does the stock market work?",
    "Who wrote 'To Kill a Mockingbird'?",
    "Please plan me a 3-day trip to Paris",
    "Write me a poem about the ocean",
]

In [None]:
import time

def question_model(questions, serving_function, delay_seconds=0.5):
    """Send each question to the banking-topic guardrail and print the response."""
    for question in questions:
        # Invoke the deployed guardrail endpoint
        ret = serving_function.invoke(
            path="v2/models/banking-topic-guardrail/infer",
            body={"question": question},
        )
        print(ret)
        # Pause briefly between requests
        time.sleep(delay_seconds)

In [None]:
for i in range(1):
    question_model(
        questions=example_questions,
        serving_function=serving_fn,
    )
    time.sleep(3)

After the guardrail is deployed and invoked, you can view the model monitoring results in the MLRun UI:
![](images/generative_model_monitoring.png)

### Toxicity filter guardrail

The Toxicity filter guardrail is designed to automatically detect and filter out user inputs that contain toxic, offensive, or inappropriate language. By leveraging a toxicity classification model, this guardrail ensures that only safe and respectful messages are processed by downstream applications. This helps maintain a positive user experience and protects the system from harmful or disruptive content. The toxicity filter can be customized with a threshold to determine the sensitivity of the filter, allowing for flexible adaptation to different application requirements.

The output of the toxicity guardrail is a boolean value (`True` or `False`). A result of `True` means the input passes the guardrail (i.e., is non-toxic and allowed through), while `False` indicates the input is flagged as toxic and is blocked from further processing.
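
The threshold logic can be sketched as a single comparison. This is an illustration under assumptions, not the code in `src/functions/toxicity_guardrail.py`: it assumes the classifier produces a toxicity score between 0 and 1, and that a score at or above the threshold counts as toxic. The function name `passes_toxicity_guardrail` is hypothetical.

```python
def passes_toxicity_guardrail(toxicity_score, threshold=0.4):
    """Return True when the input is allowed through, False when it is blocked.

    Assumes toxicity_score is in [0, 1]; treating a score equal to the
    threshold as toxic is an assumption of this sketch.
    """
    return toxicity_score < threshold


print(passes_toxicity_guardrail(0.05))  # low score: allowed through
print(passes_toxicity_guardrail(0.85))  # high score: blocked
```

Under these assumptions, lowering the threshold makes the filter stricter because more inputs reach the blocking score, while raising it makes the filter more permissive.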

In [None]:
from src.functions.toxicity_guardrail import ToxicityClassifierModelServer
    
toxicity_guardrail = project.get_function("toxicity-guardrail")

graph = toxicity_guardrail.set_topology("flow", engine="async")
model_runner_step = ModelRunnerStep()

model_runner_step.add_model(
    model_class="ToxicityClassifierModelServer",
    endpoint_name="banking-toxicity-guardrail",
    execution_mechanism="naive",
    threshold=0.4,
)

graph.to(model_runner_step).respond()

In [None]:
toxicity_guardrail.deploy()

In [None]:
# A single-turn conversation in chat-message format
body = [{"role": "user", "content": "How can I open a new savings account?"}]
toxicity_guardrail.invoke(path="v2/models/banking-toxicity-guardrail/infer", body={"inputs": body})