# 🚀 Automated Evaluation of AH-GPT Responses using PyRIT

## 📌 Overview
This notebook automates the evaluation of **AH-GPT** using the **PyRIT** framework. It sends predefined prompts to AH-GPT, assesses its responses, and generates a report. The goal is to evaluate the assistant’s performance across **customer automation scenarios**.

## 🛠️ Steps in this Notebook
- **🔧 Configuration** - Set up API endpoints and authentication.
- **📝 Define Test Objectives** - List customer-oriented queries to be tested.
- **⚙️ Initialize PyRIT** - Prepare the evaluation framework.
- **🎯 Set Up Targets and Evaluators** - Configure request handling and scoring logic.
- **📡 Run Automated Testing** - Execute multi-turn conversations and capture responses.
- **📊 Generate Reports** - Store evaluation results in an HTML report.

## 📝 How to Use This Notebook
1. **▶️ Run each cell in order** from top to bottom.
2. **✏️ Modify the `objectives` list** to test different queries.
3. **📂 Inspect the HTML report** at the end for detailed evaluation results.



In [1]:
import logging
import asyncio
from pathlib import Path
import time
from datetime import datetime
from dotenv import load_dotenv
import os

# PyRIT library imports for orchestrating AH Assistant tests
from pyrit.common import IN_MEMORY, initialize_pyrit
from pyrit.prompt_target import OpenAIChatTarget, SteijnHTTPTarget, SteijnResponseParser
from pyrit.score.evaluator import Evaluator
from pyrit.orchestrator import SteijnOrchestrator
from pyrit.orchestrator.multi_turn.steijn.steijn_orchestrator import RTOSystemPromptPaths
from pyrit.common.text_helper import generate_simulation_report


In [2]:
# Set logging level to WARNING to reduce log output
logging.basicConfig(level=logging.WARNING)


In [3]:
# Load environment variables
load_dotenv()

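# Define Member ID and API credentials
member_id = "16611078"
endpoint = os.getenv("STEIJN_NONPRD_ENDPOINT")
token = os.getenv("STEIJN_NONPRD_TOKEN")

# Fail early with a clear message instead of a TypeError when a variable is unset
if not endpoint or not token:
    raise RuntimeError("Set STEIJN_NONPRD_ENDPOINT and STEIJN_NONPRD_TOKEN in the environment or .env file.")
url = endpoint + member_id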

# Define a raw HTTP POST request template with headers and a placeholder for the prompt
start_chat_request_raw = f"""
    POST {url}
    Content-Type: application/json
    X-Authorization: {token}
    Accept: text/event-stream
    x-rate-limiter-enabled: false
    x-message-length-validation-enabled: false

    {{
        "data": "{{PROMPT}}"
    }}
"""

In [4]:
# System prompt strategy for AH Assistant testing
strategy_path = RTOSystemPromptPaths.AH_ASSISTANT_CUSTOMER.value

# YAML configuration file for the evaluator (scorer)
scorer_path = "assets/AH_Evaluators/chat_evaluator.yaml"

# Directory path for saving the HTML report
report_path = "tests/E2E/reports/CustomerAutomation"


In [5]:
objectives = [
    # Ingredient-Based Filtering & Allergy Accommodation
    "As a parent of a child with nut allergies, I would like to find dessert recipes that are nut-free, avoid cross-contamination, and use sunflower seed butter instead of peanut butter.",
    "As someone with celiac disease, I would like to discover gluten-free pasta recipes that use chickpea flour instead of wheat and provide tips for preventing mushy texture.",
    
    # Cuisine and Flavor Customization
    "As an enthusiast of Levantine cuisine, I would like mezze platter recipes including muhammara, baba ganoush, and labneh, focusing on fresh pomegranate and sumac flavors.",
    "As a fan of Japanese izakaya, I would like small plate recipes such as yakitori and agedashi tofu, with vegetarian grilling options included.",

    # Dish Type and Occasion-Based Queries
    "As someone planning a romantic dinner, I would like a three-course aphrodisiac meal featuring oysters (or grilled eggplant for vegans), dark chocolate, and pomegranate cocktails.",
    "As a wine tasting party organizer, I would like vegan cheese board ideas with almond ricotta, fig jam, and rosemary crackers that pair with red and white wines.",

    # Diet and Health Goals
    "As a keto dieter, I would like fat bomb dessert recipes using coconut oil, cacao, and erythritol that fit into my macros.",
    "As a bodybuilder, I would like chicken breast recipes marinated in lemon and herbs with 40g of protein per serving, baked in under 30 minutes.",
   
    # Budget and Accessibility
    "As a student on a $50 weekly budget, I would like meal plans that feature beans, rice, and seasonal vegetables with a shopping list.",
    "As a remote worker on a budget, I would like make-ahead grain bowl recipes with quinoa, black beans, avocado, and tahini dressing.",
]
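
Every objective above follows the same "As ..., I would like ..." pattern. When adding new queries, a small helper can keep that shape consistent. This is a minimal sketch: `make_objective` and the two example objectives are hypothetical and not part of the notebook or of PyRIT.

```python
def make_objective(persona: str, request: str) -> str:
    """Build an objective in the 'As <persona>, I would like <request>.' pattern."""
    persona = persona.strip()
    request = request.strip().rstrip(".")
    if not persona or not request:
        raise ValueError("persona and request must be non-empty")
    return f"As {persona}, I would like {request}."


# Hypothetical additions; append them to `objectives` before running the simulation.
extra_objectives = [
    make_objective("a vegetarian", "high-protein lunch recipes without tofu"),
    make_objective("someone new to cooking", "three-ingredient dinner ideas."),
]
print(extra_objectives)
```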

In [6]:
# Initialize the PyRIT environment with an in-memory database
initialize_pyrit(memory_db_type=IN_MEMORY)


In [7]:
async def run_simulation():
    # Store conversation reports for each objective
    results = []

    for objective in objectives:
        # Prepare evaluator variables using the current objective
        scorer_variables = {"objective": objective}
        
        # Create chat target for adversarial conversation
        chat_target = OpenAIChatTarget()
        
        # Create HTTP target to send prompts to the assistant service
        http_prompt_target = SteijnHTTPTarget(
            http_request=start_chat_request_raw,
            prompt_regex_string="{PROMPT}",
            timeout=60.0,
            callback_function=SteijnResponseParser.parse_response
        )
        
        # Create evaluator to assess response quality
        chat_scorer = Evaluator(
            chat_target=OpenAIChatTarget(),
            evaluator_yaml_path=Path(scorer_path),
            additional_evaluator_variables=scorer_variables,
            scorer_type="float_scale"
        )
        
        # Set up orchestrator for AH Assistant testing
        orchestrator = SteijnOrchestrator(
            adversarial_chat=chat_target,
            adversarial_chat_system_prompt_path=strategy_path,
            objective_target=http_prompt_target,
            objective_scorer=chat_scorer,
            verbose=True,
            evaluate_chat=True,
            max_turns=2,
            use_score_as_feedback=True 
        )
        
        # Run the orchestrator attack asynchronously for the current objective
        result = await orchestrator.run_attack_async(objective=objective)

        # Retrieve and store conversation report
        results.append(await result.get_conversation_report_async())

    return results


In [8]:
async def generate_report(results, execution_time):
    # Ensure report directory exists
    report_dir = Path(report_path).resolve()
    report_dir.mkdir(parents=True, exist_ok=True)

    # Create a timestamp string
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

    # Construct the filename with the timestamp before the .html extension
    filename = f"customer_automation_report_{timestamp}.html"
    
    # Save the conversation results as an HTML report
    generate_simulation_report(
        results=results,
        save_path=report_dir / filename,
        description="This report provides an overview of multi-turn customer simulations, highlighting how the assistant engages in realistic, task-focused conversations. Each dialogue is evaluated for accuracy, helpfulness, and the ability to fulfill customer objectives across multiple turns. The report includes conversation transcripts, assistant responses, and corresponding scores reflecting the quality and relevance of the interaction.",
        execution_time=execution_time,
    )
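
The report filename logic can be checked in isolation. The sketch below factors it into a hypothetical `build_report_path` helper that takes the current time as an argument, so the result is deterministic; the `reports` directory and the fixed date are illustrative assumptions.

```python
from datetime import datetime
from pathlib import Path


def build_report_path(report_dir: Path, now: datetime) -> Path:
    """Return the report path with a sortable timestamp before the .html extension."""
    timestamp = now.strftime("%Y%m%d_%H%M%S")
    return report_dir / f"customer_automation_report_{timestamp}.html"


# Fixed datetime used only to make the example reproducible.
example = build_report_path(Path("reports"), datetime(2025, 1, 2, 3, 4, 5))
print(example.name)
```

Because the timestamp runs from year down to second, sorting the filenames alphabetically also sorts the reports chronologically.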


In [9]:
async def main():
    # Run the simulation and measure its execution time
    start_time = time.time()
    results = await run_simulation()
    execution_time = time.time() - start_time
    await generate_report(results, execution_time)

# Run the async function
await main()


Starting new chat...

User: Ik ben op zoek naar dessertrecepten die volledig notenvrij zijn. Heb je suggesties?

A: {'text_message': 'Ik heb heerlijke dessertrecepten voor je gevonden, zonder noten. Geniet van deze zoete traktaties!Kijk zelf of de recepten en ingrediënten voldoen aan je voorkeuren.', 'data_message': '{"type":"RECIPE_LANE","data":{"title":null,"items":[{"id":1194518,"title":"Snel grand dessert","time":{"cook":15,"oven":0,"wait":0}', 'suggestion_pills': {'chips': ['Notenvrije chocoladetaart', 'Fruitige desserts zonder noten', 'Hoe maak je notenvrije koekjes?', 'Notenvrije ingrediënten tips']}}

Extracted Thread ID:  thread_hnxRvvsiXcCcowknSrvSgj1U
Score: 0.7 : Last Turn Score:
The assistant provided suggestions and resources for nut-free desserts, directly addressing the user's request. It included potential recipes and tips, which are relevant to the user's message.

Full Chat Score:
The assistant effectivel