# 🚀 Automated Evaluation of AH-GPT Responses using PyRIT

## 📌 Overview
This notebook automates the evaluation of **AH-GPT** using the **PyRIT** framework. For each predefined objective, a simulated customer holds a multi-turn conversation with AH-GPT, an evaluator scores the responses, and the results are written to an HTML report. The goal is to measure the assistant’s performance across **customer automation scenarios**.

## 🛠️ Steps in this Notebook
- **🔧 Configuration** - Set up API endpoints and authentication.
- **📝 Define Test Objectives** - List customer-oriented queries to be tested.
- **⚙️ Initialize PyRIT** - Prepare the evaluation framework.
- **🎯 Set Up Targets and Evaluators** - Configure request handling and scoring logic.
- **📡 Run Automated Testing** - Execute multi-turn conversations and capture responses.
- **📊 Generate Reports** - Store evaluation results in an HTML report.

## 📝 How to Use This Notebook
1. **▶️ Run each cell in order** from top to bottom.
2. **✏️ Modify the `objectives` list** to test different queries.
3. **📂 Inspect the HTML report** at the end for detailed evaluation results.



In [1]:
import logging
import asyncio
from pathlib import Path
import time
import datetime
from dotenv import load_dotenv
import os

# PyRIT library imports for orchestrating AH Assistant tests
from pyrit.common import IN_MEMORY, initialize_pyrit
from pyrit.prompt_target import OpenAIChatTarget, SteijnHTTPTarget
from pyrit.score.evaluator import Evaluator
from pyrit.orchestrator import SteijnOrchestrator
from pyrit.orchestrator.multi_turn.steijn.steijn_orchestrator import RTOSystemPromptPaths
from pyrit.common.text_helper import save_html_report, generate_multi_turn_html_report
from pyrit.prompt_target import SteijnResponseParser


In [2]:
# Set logging level to WARNING to reduce log output
logging.basicConfig(level=logging.WARNING)


In [3]:
# Load environment variables
load_dotenv()

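# Define Member ID and API credentials
member_id = "16611078"
endpoint = os.getenv("STEIJN_NONPRD_ENDPOINT")
token = os.getenv("STEIJN_NONPRD_TOKEN")

# Fail early with a clear message instead of a TypeError on None + str
if not endpoint or not token:
    raise RuntimeError("Set STEIJN_NONPRD_ENDPOINT and STEIJN_NONPRD_TOKEN in the environment or .env file")
url = endpoint + member_id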

# Define a raw HTTP POST request template with headers and a placeholder for the prompt
start_chat_request_raw = f"""
    POST {url}
    Content-Type: application/json
    X-Authorization: {token}
    Accept: text/event-stream
    x-rate-limiter-enabled: false
    x-message-length-validation-enabled: false

    {{
        "data": "{{PROMPT}}"
    }}
"""

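The request template above leaves a `{PROMPT}` placeholder that is filled in for each turn. How `SteijnHTTPTarget` performs that substitution is not shown in this notebook, so the following is only a minimal sketch of the idea in plain Python. The `fill_prompt` helper, the example URL, and the JSON-escaping step are assumptions made for illustration, not the library's behavior.

```python
import json

# Hypothetical stand-in for the raw request above; the URL is a placeholder.
template = """
    POST https://example.invalid/chat/16611078
    Content-Type: application/json

    {
        "data": "{PROMPT}"
    }
"""


def fill_prompt(raw_request: str, prompt: str, placeholder: str = "{PROMPT}") -> str:
    """Substitute the prompt into the template, escaping it for a JSON string."""
    # json.dumps wraps the text in quotes; strip them because the template
    # already supplies the surrounding quotes.
    escaped = json.dumps(prompt)[1:-1]
    return raw_request.replace(placeholder, escaped)


filled = fill_prompt(template, 'Ik zoek een "snelle" pasta')

# The JSON body starts at the first opening brace of the filled request.
body = filled[filled.index("{"):]
print(json.loads(body)["data"])
```

Escaping matters because a prompt containing quotes or newlines would otherwise produce an invalid JSON body. Whether the real target escapes the prompt is worth checking before relying on it.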
In [4]:
# System prompt strategy for AH Assistant testing
strategy_path = RTOSystemPromptPaths.AH_ASSISTANT_CUSTOMER.value

# YAML configuration file for the evaluator (scorer)
scorer_path = "assets/AH_Evaluators/chat_evaluator.yaml"

# Directory path for saving the HTML report
report_path = "tests/E2E/reports/CustomerAutomation"


In [5]:
objectives = [
    # General Recipe Requests
    "As a home cook, I want a quick and easy pasta recipe for a busy weeknight. I live gluten free and I have a nut allergy. I like meat"
]


In [6]:
# Initialize the PyRIT environment with an in-memory database
initialize_pyrit(memory_db_type=IN_MEMORY)


In [7]:
async def main():
    # Store conversation reports for each objective
    results = []

    # Start timer before execution
    start_time = time.time()
    for objective in objectives:
        # Prepare evaluator variables using the current objective
        scorer_variables = {"objective": objective}
        
        # Create chat target for adversarial conversation
        chat_target = OpenAIChatTarget()
        
        # Create HTTP target to send prompts to the assistant service
        http_prompt_target = SteijnHTTPTarget(
            http_request=start_chat_request_raw,
            prompt_regex_string="{PROMPT}",
            timeout=60.0,
            callback_function=SteijnResponseParser.parse_response
        )
        
        # Create evaluator to assess response quality
        chat_scorer = Evaluator(
            chat_target=OpenAIChatTarget(),
            evaluator_yaml_path=Path(scorer_path),
            additional_evaluator_variables=scorer_variables,
            scorer_type="float_scale"
        )
        
        # Set up orchestrator for AH Assistant testing
        orchestrator = SteijnOrchestrator(
            adversarial_chat=chat_target,
            adversarial_chat_system_prompt_path=strategy_path,
            objective_target=http_prompt_target,
            objective_scorer=chat_scorer,
            verbose=True,
            evaluate_chat=True,
            max_turns=10,
            use_score_as_feedback=True 
        )
        
        # Run the orchestrator attack asynchronously for the current objective
        result = await orchestrator.run_attack_async(objective=objective)

        # Retrieve and store conversation report
        results.append(await result.get_conversation_report_async())

    # Calculate execution time
    execution_time = time.time() - start_time

    # Generate and save the report
    await generate_report(results, execution_time)
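In the loop above, an exception in any single objective aborts the whole run before a report is written. One possible variation, sketched below with a stub coroutine in place of the orchestrator, records failures per objective and keeps going. `run_objective_stub` and `run_all` are hypothetical names introduced for illustration only; they are not part of PyRIT.

```python
import asyncio
import time


async def run_objective_stub(objective: str) -> dict:
    """Hypothetical stand-in for running the orchestrator on one objective."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    if not objective.strip():
        raise ValueError("empty objective")
    return {"objective": objective, "turns": 1}


async def run_all(objectives: list[str]) -> tuple[list[dict], list[tuple[str, str]], float]:
    """Run objectives one by one, collecting results and failures separately."""
    results, failures = [], []
    start = time.perf_counter()
    for objective in objectives:
        try:
            results.append(await run_objective_stub(objective))
        except Exception as exc:
            # Record the failure and continue with the next objective
            failures.append((objective, repr(exc)))
    return results, failures, time.perf_counter() - start


results, failures, elapsed = asyncio.run(run_all(["quick pasta recipe", "  "]))
print(len(results), len(failures))
```

Inside a notebook, where an event loop is already running, the last call would be `await run_all(...)` rather than `asyncio.run(...)`.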


In [8]:
async def generate_report(results, execution_time):
    # Ensure report directory exists
    report_dir = Path(report_path).resolve()
    report_dir.mkdir(parents=True, exist_ok=True)
    
    # Save the conversation results as an HTML report
    save_html_report(
        results=results,
        is_chat_evaluation=True,
        report_generator=generate_multi_turn_html_report,
        directory=str(report_dir),
        file_name="steijn_customer_automation",
        description="This report provides an overview of multi-turn customer simulations, highlighting how the assistant engages in realistic, task-focused conversations. Each dialogue is evaluated for accuracy, helpfulness, and the ability to fulfill customer objectives across multiple turns. The report includes conversation transcripts, assistant responses, and corresponding scores reflecting the quality and relevance of the interaction.",
        execution_time=execution_time
    )
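The report is saved under a fixed `file_name`. If repeated runs should not share one file name (whether `save_html_report` overwrites an existing file is not stated here and is an assumption), a timestamp suffix is one option. The sketch below uses a hypothetical `timestamped_name` helper built only on the standard library.

```python
import datetime


def timestamped_name(base: str, now: datetime.datetime) -> str:
    """Append a sortable timestamp (YYYYMMDD_HHMMSS) to a base file name."""
    return f"{base}_{now.strftime('%Y%m%d_%H%M%S')}"


# A fixed datetime keeps the example reproducible; a real run would pass datetime.datetime.now()
example = timestamped_name("steijn_customer_automation", datetime.datetime(2025, 1, 31, 9, 5, 0))
print(example)  # prints steijn_customer_automation_20250131_090500
```

The result could then be passed as the `file_name` argument in place of the fixed string.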


In [9]:
await main()


[1m[92m
Starting new chat...

[1m[94mUser:[0m Ik ben op zoek naar een pasta recept voor een drukke weekavond. Heb je suggesties?

[1m[92mAssistant:[0m {'text_message': 'Het lijkt erop dat je een drukke weekavond hebt! Ik ga even kijken naar een paar makkelijke pasta recepten voor je. Een momentje!Ik heb een selectie van heerlijke pastarecepten voor je gevonden. Van romige sauzen tot pittige smaken, er is voor ieder wat wils. Geniet ervan!', 'data_message': '{"type":"RECIPE_LANE","data":{"title":null,"items":[{"id":409684,"title":"Pasta\xa0aglio\xa0e olio","time":{"cook":15,"oven":0,"wait":0}', 'suggestion_pills': {'chips': ['Snelle pasta met tomaat', 'Makkelijke pasta recepten', 'Hoe maak je verse pasta?', 'Pasta bewaren tips']}}

Extracted Thread ID:  thread_ceKaTLO6FxJZ6KPu7QPkpk4V
[1m[91mScore: [91m0.4 : [22mLast Turn Score: The assistant provides a potentially useful recipe suggestion ('Pasta aglio e olio') suitable for a quick weeknight dinner, fulfilling the immediate