# 🚀 Automated QA Testing for Steijn Assistant using PyRIT

## 📌 Overview
This notebook automates **QA testing** for the **Steijn Assistant** using the **PyRIT** framework. It sends predefined prompts to the assistant, scores each response against an expected outcome, and writes an HTML report.

## 🛠️ Steps in this Notebook
- **🔧 Setup Configuration** - Define API endpoints, authentication, and request templates.
- **📋 Load QA Dataset** - Define test questions and their expected outcomes.
- **⚙️ Initialize PyRIT** - Configure the testing environment.
- **📡 Send Prompts & Evaluate Responses** - Run the main test loop.
- **📊 Generate Report** - Save the results for analysis.

## 📝 How to Use This Notebook
1. **▶️ Run each cell in order** from top to bottom.
2. **✏️ Modify the `qa_pairs` list** to test different questions and expected outcomes.
3. **📂 Inspect the HTML report** at the end for detailed evaluation results.
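
When editing `qa_pairs`, it can help to check that every entry still has the two keys the later cells read (`question` and `expected_outcomes`). The following is a minimal sketch of such a check; `validate_qa_pairs` is a hypothetical helper introduced here for illustration and is not part of PyRIT.

```python
def validate_qa_pairs(pairs):
    """Return a list of problems found in a qa_pairs-style list (empty if none).

    Hypothetical helper for illustration; not part of PyRIT.
    """
    required_keys = ("question", "expected_outcomes")
    problems = []
    for index, pair in enumerate(pairs):
        for key in required_keys:
            # Treat non-dict entries, missing keys, and blank strings as problems.
            value = pair.get(key) if isinstance(pair, dict) else None
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {index}: missing or empty '{key}'")
    return problems
```

Calling `validate_qa_pairs(qa_pairs)` before the test loop and stopping if the result is non-empty would surface a malformed entry before any request is sent.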


In [1]:
import os
import time
from pathlib import Path

from dotenv import load_dotenv

# PyRIT imports
from pyrit.common import IN_MEMORY, initialize_pyrit
from pyrit.common.text_helper import generate_single_turn_html_report, save_html_report
from pyrit.orchestrator import SteijnPromptSendingOrchestrator
from pyrit.prompt_target import OpenAIChatTarget, SteijnHTTPTarget, SteijnResponseParser
from pyrit.score.evaluator import Evaluator


In [2]:
# Use an in-memory database for a clean testing environment
initialize_pyrit(memory_db_type=IN_MEMORY)


In [3]:
# Load environment variables
load_dotenv()

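# Define the member ID and read the API endpoint and token from the environment
member_id = "16611078"
endpoint = os.getenv("STEIJN_NONPRD_ENDPOINT")
token = os.getenv("STEIJN_NONPRD_TOKEN")
if not endpoint or not token:
    raise RuntimeError("Set STEIJN_NONPRD_ENDPOINT and STEIJN_NONPRD_TOKEN (for example in .env)")
url = endpoint + member_id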

# Define a raw HTTP POST request template with headers and a placeholder for the prompt
start_chat_request_raw = f"""
    POST {url}
    Content-Type: application/json
    X-Authorization: {token}
    Accept: text/event-stream
    x-rate-limiter-enabled: false
    x-message-length-validation-enabled: false

    {{
        "data": "{{PROMPT}}"
    }}
"""
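
Because the template is an f-string, doubled braces are escapes: `{{PROMPT}}` ends up as the literal text `{PROMPT}` in the request, which is what the target later looks for. The sketch below illustrates this with placeholder values. How the HTTP target actually fills the placeholder is not shown in this notebook, so `fill_prompt` is a hypothetical stand-in that uses plain string replacement plus JSON escaping.

```python
import json

# Placeholder values for illustration only; the notebook reads the real ones from the environment.
url = "https://example.invalid/chat/123"
token = "dummy-token"

# In an f-string, {{ and }} produce literal braces, so {{PROMPT}} survives as {PROMPT}.
template = f"""POST {url}
X-Authorization: {token}

{{
    "data": "{{PROMPT}}"
}}
"""


def fill_prompt(raw_request, prompt):
    """Substitute the placeholder with a JSON-escaped prompt (hypothetical helper)."""
    # json.dumps adds surrounding quotes; strip them because the template already has quotes.
    escaped = json.dumps(prompt)[1:-1]
    return raw_request.replace("{PROMPT}", escaped)
```

Escaping matters when a prompt contains quotes or newlines: without it, the request body would no longer be valid JSON.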


In [4]:
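# Each QA pair contains a question and the outcome the assistant is expected to produce.
# The prompts are in Dutch; English glosses are given in the comments.
qa_pairs = [
    # "I feel like fruit" -> ask for more details
    {"question": "Ik heb zin in fruit", "expected_outcomes": "TEXT: Vraag om meer details"},
    # "Apple"
    {"question": "Appel", "expected_outcomes": "PRODUCT_LANE: [Apples]"},
    # "Do you sell minced meat?"
    {"question": "Verkopen jullie gehakt?", "expected_outcomes": "PRODUCT_LANE: [Gehakt]"},
    # "What are vegetarian breakfast products" -> suggestions for vegetarian breakfast products
    {"question": "Wat zijn vegetarische ontbijt producten", "expected_outcomes": "TEXT: Suggesties voor vegetarische ontbijtproducten"},
    # "What does a kilo of chicken fillet cost?"
    {"question": "Wat kost een kilo kipfilet?", "expected_outcomes": "PRODUCT_LANE: [Kipfilet]"},
]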
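
The expected outcomes above appear to follow a `KIND: detail` convention, where the kind is either `TEXT` or `PRODUCT_LANE`. If you want to group or filter test cases by kind, a small parser along these lines may be useful. This is a sketch under that assumed convention; `parse_expected_outcome` is a hypothetical helper, and the evaluator presumably still receives the raw string.

```python
def parse_expected_outcome(expected):
    """Split 'KIND: detail' into (kind, detail). Hypothetical helper, not part of PyRIT."""
    # partition splits on the first colon only, so colons inside the detail are preserved.
    kind, separator, detail = expected.partition(":")
    if not separator:
        raise ValueError(f"expected 'KIND: detail', got {expected!r}")
    return kind.strip(), detail.strip()
```

For example, `parse_expected_outcome("PRODUCT_LANE: [Gehakt]")` yields the kind `PRODUCT_LANE` and the detail `[Gehakt]`.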


In [5]:
# Create an HTTP target that sends prompts using the defined request template.
http_prompt_target = SteijnHTTPTarget(
    http_request=start_chat_request_raw,
    prompt_regex_string="{PROMPT}",
    timeout=60.0,
    callback_function=SteijnResponseParser.parse_response
)

# Create an evaluator that scores each response on a float scale, using the criteria in the YAML configuration.
scorer = Evaluator(
    chat_target=OpenAIChatTarget(),
    evaluator_yaml_path=Path("assets/AH_Evaluators/relevance_evaluator.yaml"),
    scorer_type="float_scale"
)

# Create the orchestrator for sending prompts and evaluating responses.
orchestrator = SteijnPromptSendingOrchestrator(
    objective_target=http_prompt_target,
    scorers=[scorer]
)


In [6]:
async def main():
    # Extract the list of questions and expected outcomes from qa_pairs.
    questions = [pair["question"] for pair in qa_pairs]
    expected_outcomes = [pair["expected_outcomes"] for pair in qa_pairs]
    
    # Start the timer before sending prompts (perf_counter is monotonic, so it suits elapsed-time measurement).
    start_time = time.perf_counter()

    # Send the list of prompts asynchronously.
    await orchestrator.send_prompts_async(prompt_list=questions, expected_output_list=expected_outcomes)

    # Retrieve the chat results from the orchestrator.
    results = orchestrator.get_chat_results()

    # Calculate the total execution time in seconds.
    execution_time = time.perf_counter() - start_time

    # Generate and save the report
    await generate_report(results, execution_time)
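
The pattern in `main()` is: start a timer, await a batch of requests, then measure the elapsed time. The sketch below reproduces that pattern without PyRIT so it can be run anywhere. `fake_send` is a hypothetical stand-in for a network call, and running the calls concurrently with `asyncio.gather` is an assumption made for this sketch; the notebook does not show how the orchestrator schedules its requests.

```python
import asyncio
import time


async def fake_send(prompt):
    """Stand-in for a network call (hypothetical; not the orchestrator)."""
    await asyncio.sleep(0.01)
    return f"response to {prompt}"


async def timed_batch(prompts):
    """Run all prompts concurrently and return (responses, elapsed_seconds)."""
    start = time.perf_counter()
    # gather returns results in the same order as the awaitables it was given.
    responses = await asyncio.gather(*(fake_send(p) for p in prompts))
    return list(responses), time.perf_counter() - start
```

In a notebook cell this can be invoked with `await timed_batch([...])`; in a plain script, wrap the call in `asyncio.run(...)`.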


In [7]:
async def generate_report(results, execution_time):
    # Define the report directory path and create it if it doesn't exist.
    report_dir = Path("tests/E2E/reports/DataSet").resolve()
    report_dir.mkdir(parents=True, exist_ok=True)
    
    # Generate and save the HTML report.
    save_html_report(
        results=results,
        directory=str(report_dir),
        report_generator=generate_single_turn_html_report,
        is_chat_evaluation=False,
        threshold=0.7,
        file_name="steijn_dataset",
        description="This report presents a comprehensive evaluation of the dataset by comparing each input with its corresponding actual and expected outputs, along with a score that quantifies the degree of alignment between the actual and expected responses.",
        execution_time=execution_time
    )
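
The `threshold=0.7` argument suggests that the report separates passing from failing float-scale scores. As a rough illustration of that idea, the sketch below counts passes and failures for a list of scores. Treating a score equal to the threshold as a pass is an assumption; the report generator's actual rule is not shown in this notebook, and `summarize_scores` is a hypothetical helper.

```python
def summarize_scores(scores, threshold=0.7):
    """Count passes and failures, treating score >= threshold as a pass (assumed rule)."""
    passed = sum(1 for score in scores if score >= threshold)
    return {"passed": passed, "failed": len(scores) - passed}
```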


In [8]:
await main()


✅ Report saved at: /Users/denizdalkilic/Documents/Forks/PyRIT/tests/E2E/reports/DataSet/steijn_dataset_20250328_151113.html
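
The saved file name combines the `file_name` argument with a timestamp in what looks like `YYYYMMDD_HHMMSS` form, so repeated runs do not overwrite each other. A file name of that shape could be built as in the sketch below. `timestamped_report_path` is a hypothetical helper; how `save_html_report` constructs the name internally is not shown here.

```python
from datetime import datetime
from pathlib import Path


def timestamped_report_path(directory, file_name, now=None):
    """Build '<directory>/<file_name>_<YYYYMMDD_HHMMSS>.html' (hypothetical helper)."""
    # Accept an explicit timestamp so the result is reproducible in tests.
    now = now or datetime.now()
    return Path(directory) / f"{file_name}_{now.strftime('%Y%m%d_%H%M%S')}.html"
```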
