# RAG System Evaluation Examples

This notebook provides five test cases for evaluating the Cybersecurity RAG Assistant. They cover:
1. **OWASP Top 10** (Specific Technical Definition)
2. **MITRE ATT&CK** (Conceptual Understanding)
3. **Thai Web Security Standard** (Cross-language/OCR Capability)
4. **OWASP Data Factors** (List Retrieval)
5. **Off-Topic** (Grounding Verification)

**Pre-requisites:**
- System must be running (`docker-compose up`)
- API must be accessible at `http://127.0.0.1:8000`
- Install dependencies: `pip install requests` (or run the cell below)
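
Before running the test cases, it can help to confirm that something is actually listening on the API port. The snippet below is a minimal sketch using only the standard library; `api_port_open` is a hypothetical helper name, and it checks TCP reachability only, not that the `/chat` endpoint behaves correctly.

```python
import socket
from urllib.parse import urlparse


def api_port_open(base_url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(base_url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

Calling `api_port_open("http://127.0.0.1:8000")` should return `True` once the containers are up; a `False` result usually means the stack has not finished starting or the port mapping differs.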

In [None]:
# Install necessary library for HTTP requests
%pip install requests

In [None]:
import requests
from IPython.display import display, Markdown

API_URL = "http://127.0.0.1:8000/chat"

def ask_question(question):
    """Send a question to the RAG API and render the answer and its sources."""
    payload = {"question": question}
    try:
        # json= serializes the payload and sets the Content-Type header.
        # The timeout keeps the cell from hanging if the API is down;
        # raise it if generation is slow on your machine.
        response = requests.post(API_URL, json=payload, timeout=120)
        response.raise_for_status()
        data = response.json()

        # Format output
        display(Markdown(f"### Question: **{question}**"))
        display(Markdown(f"**Answer:** {data['answer']}"))
        if data.get('sources'):
            display(Markdown(f"**Sources:** `{data['sources']}`"))
        else:
            display(Markdown("**Sources:** _None provided_"))
        display(Markdown("---"))

    except Exception as e:
        # Covers connection errors, HTTP error statuses, and malformed responses
        print(f"Error: {e}")

## 1. OWASP Top 10: Specific Technical Definition
Testing exact retrieval of a definition from the English OWASP document.

In [None]:
ask_question("What is Broken Access Control?")

## 2. MITRE ATT&CK: Conceptual Understanding
Testing the ability to synthesize an answer regarding security philosophy.

In [None]:
ask_question("Why is it important to focus on post-compromise detection?")

## 3. Thai Web Security Standard (2025): OCR & Language Support
Testing retrieval from a Thai-language document processed via Typhoon OCR.  
**Question (English translation):** "According to the website standard notification, how often must agencies conduct a self-assessment?"
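
If the assistant is expected to reply in the language of the question (an assumption; the notebook does not state this), a rough script check can flag answers that come back in the wrong language. The sketch below measures the share of letters that fall in the Thai Unicode block (U+0E00 to U+0E7F); `thai_ratio` is a hypothetical helper and not part of the system under test.

```python
def thai_ratio(text):
    """Fraction of alphabetic characters that belong to the Thai Unicode block."""
    # str.isalpha() skips digits, punctuation, whitespace, and combining marks.
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    thai = sum(1 for ch in letters if "\u0e00" <= ch <= "\u0e7f")
    return thai / len(letters)
```

A ratio near 1.0 suggests a Thai answer and a ratio near 0.0 an English one. Mixed values are expected when the answer quotes English terms such as "Self-Assessment".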

In [None]:
ask_question("ตามประกาศมาตรฐานเว็บไซต์ หน่วยงานต้องดำเนินการประเมินตนเอง (Self-Assessment) บ่อยแค่ไหน?")

## 4. OWASP Data Factors: List Retrieval
Testing the ability to list specific factors mentioned in the documentation.
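
List-retrieval answers can be scored by checking how many expected items appear in the response. The following is a minimal sketch, assuming case-insensitive substring matching is good enough; `item_coverage` is a hypothetical helper, and the expected items must be taken from the source document rather than from this notebook.

```python
def item_coverage(answer, expected_items):
    """Return (fraction of expected items found, list of missing items)."""
    text = answer.casefold()
    # Case-insensitive substring match; paraphrased items will count as missing.
    missing = [item for item in expected_items if item.casefold() not in text]
    found = len(expected_items) - len(missing)
    fraction = found / len(expected_items) if expected_items else 1.0
    return fraction, missing
```

For example, `item_coverage(answer_text, ["first item", "second item"])` returns the matched fraction together with any items the answer left out, which makes partial retrieval easy to spot.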

In [None]:
ask_question("What are the data factors listed for each of the Top 10 categories?")

## 5. Off-Topic / Grounding Check
Testing the system's ability to decline questions that fall outside its provided context.
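
One way to turn this check into something repeatable is to inspect the response for refusal wording and for cited sources. This is a sketch under stated assumptions: the response is a dict with the `answer` and `sources` keys used by `ask_question`, and the phrases in `REFUSAL_MARKERS` are hypothetical placeholders that should be replaced with the wording the assistant actually uses.

```python
# Hypothetical refusal phrases; adjust to match the assistant's real wording.
REFUSAL_MARKERS = ("i don't know", "cannot answer", "not in the provided context")


def grounding_signals(data, markers=REFUSAL_MARKERS):
    """Report whether a response looks like a refusal and whether it cites sources."""
    answer = str(data.get("answer", "")).casefold()
    return {
        "refused": any(marker in answer for marker in markers),
        "cited_sources": bool(data.get("sources")),
    }
```

For an off-topic question, a well-grounded response would typically show `refused` as `True`. Whether `cited_sources` should also be `False` depends on whether the API returns retrieved documents even when it declines to answer.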

In [None]:
ask_question("How do I bake a cake?")