# RAG System Evaluation Examples

This notebook provides 5 distinct test cases to evaluate the Cybersecurity RAG Assistant. It covers:
1. **OWASP Top 10** (Specific Technical Definition)
2. **MITRE ATT&CK** (Conceptual Understanding)
3. **Thai Web Security Standard** (Cross-language/OCR Capability)
4. **OWASP Data Factors** (List Retrieval)
5. **Off-Topic** (Grounding Verification)

**Prerequisites:**
- System must be running (`docker-compose up`)
- API must be accessible at `http://127.0.0.1:8000`
- Install dependencies: `pip install requests` (or run the cell below)
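Before running the cells, it can help to confirm that something is listening at the API address. The sketch below is a minimal, stdlib-only check that assumes the host and port come from `API_URL`. It only opens a TCP connection, so it confirms that the port is reachable, not that the `/chat` endpoint answers correctly. The helper name `api_port_open` is hypothetical.

```python
import socket
from urllib.parse import urlsplit


def api_port_open(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    host = parts.hostname or "127.0.0.1"
    # Fall back to the scheme's default port when the URL does not specify one
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(api_port_open("http://127.0.0.1:8000/chat"))
```

A `False` here usually means the containers are not up yet, so `docker-compose up` is the first thing to check.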

In [1]:
# Install necessary library for HTTP requests
%pip install requests

Note: you may need to restart the kernel to use updated packages.


In [2]:
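import requests
from IPython.display import display, Markdown

API_URL = "http://127.0.0.1:8000/chat"

def ask_question(question, timeout=60):
    """Send a question to the RAG API and render the answer and sources as Markdown."""
    payload = {"question": question}
    try:
        # `json=` serializes the payload and sets the Content-Type header automatically.
        # A timeout prevents the cell from hanging forever if the API stalls;
        # 60 s is a generous default because LLM generation can be slow.
        response = requests.post(API_URL, json=payload, timeout=timeout)
        response.raise_for_status()
        data = response.json()
    except (requests.RequestException, ValueError) as e:
        # Connection errors, timeouts, HTTP error codes, or a non-JSON body
        print(f"Error: {e}")
        return

    # Format output
    display(Markdown(f"### Question: **{question}**"))
    display(Markdown(f"**Answer:** {data.get('answer', '_No answer returned_')}"))
    if data.get('sources'):
        display(Markdown(f"**Sources:** `{data['sources']}`"))
    else:
        display(Markdown("**Sources:** _None provided_"))
    display(Markdown("---"))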
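`ask_question` assumes the API returns a JSON object with an `answer` string and an optional `sources` list. That shape is inferred from the code above, not from an API specification. A small validator, sketched below under that assumption, makes the contract explicit and reports exactly which field is off. `validate_response` is a hypothetical helper name.

```python
from typing import Any


def validate_response(data: Any) -> list[str]:
    """Return a list of problems with a /chat response; an empty list means it looks valid."""
    if not isinstance(data, dict):
        return ["response is not a JSON object"]
    problems = []
    answer = data.get("answer")
    if not isinstance(answer, str) or not answer.strip():
        problems.append("'answer' is missing or empty")
    sources = data.get("sources")
    # 'sources' may be absent or null; if present, it should be a list of strings
    if sources is not None and not (
        isinstance(sources, list) and all(isinstance(s, str) for s in sources)
    ):
        problems.append("'sources' is not a list of strings")
    return problems


print(validate_response({"answer": "I don't know", "sources": []}))  # → []
print(validate_response({"sources": "x"}))
```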

## 1. OWASP Top 10: Specific Technical Definition
Testing exact retrieval of a definition from the English OWASP document.
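Besides reading the answer, this case can be checked mechanically: the expected document should appear among the returned sources. The sketch below assumes `sources` is a list of strings (file names or URLs), as in the output for this case. `cites_source` is a hypothetical helper, and matching on a case-insensitive substring of the file name is a simplifying choice.

```python
def cites_source(sources, expected_fragment: str) -> bool:
    """True if any returned source contains the expected fragment (case-insensitive)."""
    if not sources:
        return False
    needle = expected_fragment.lower()
    return any(needle in str(src).lower() for src in sources)


sources = ['owasp-top-10_ocr.md', 'https://owasp.org/Top10/A01_2021-Broken_Access_Control/']
print(cites_source(sources, "owasp-top-10"))  # → True
print(cites_source(sources, "mitre-attack"))  # → False
```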

In [3]:
ask_question("What is Broken Access Control?")

### Question: **What is Broken Access Control?**

**Answer:** Broken Access Control refers to a vulnerability where an application allows an attacker to access or modify data that they shouldn't be able to, often due to inadequate access control checks.

**Sources:** `['owasp-top-10_ocr.md', 'https://owasp.org/Top10/A01_2021-Broken_Access_Control/']`

---

## 2. MITRE ATT&CK: Conceptual Understanding
Testing the ability to synthesize an answer about the security philosophy behind ATT&CK, rather than retrieve a single definition.
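Open-ended answers like this one have no single correct string, so exact matching does not work. One crude, purely lexical proxy is the fraction of a reference answer's words that also appear in the system's answer, similar in spirit to ROUGE-1 recall. The sketch below illustrates that assumption and is not a semantic metric: it cannot tell a paraphrase from a contradiction, and the reference answer would have to be written by hand from the MITRE document. `token_recall` is a hypothetical name.

```python
import re


def tokens(text: str) -> set[str]:
    """Distinct lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def token_recall(answer: str, reference: str) -> float:
    """Fraction of distinct reference tokens that also appear in the answer."""
    ref = tokens(reference)
    if not ref:
        return 0.0
    return len(ref & tokens(answer)) / len(ref)


print(token_recall("defenders identify adversary behavior", "identify adversary behavior"))  # → 1.0
```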

In [4]:
ask_question("Why is it important to focus on post-compromise detection?")

### Question: **Why is it important to focus on post-compromise detection?**

**Answer:** Focusing on post-compromise detection is important because it allows defenders to understand actions and potential countermeasures in context, making it easier to identify adversary behavior.

**Sources:** `['mitre-attack-philosophy-2020_ocr.md']`

---

## 3. Thai Web Security Standard (2025): OCR & Language Support
Testing retrieval from a Thai-language document processed with Typhoon OCR.  
**Question (English translation):** "According to the website standard notification, how often must agencies conduct a self-assessment?"
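Because the question is asked in Thai, one extra check is whether the answer also comes back in Thai rather than English. Thai characters occupy the Unicode block U+0E00 to U+0E7F, so a simple sketch measures the share of letters in that range. The helper name `thai_ratio` and any pass threshold (for example `> 0.5`) are illustrative assumptions.

```python
def thai_ratio(text: str) -> float:
    """Share of alphabetic characters that fall in the Thai Unicode block (U+0E00 to U+0E7F)."""
    # Combining vowel and tone marks are not alphabetic, so they are skipped on both sides
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    thai = sum(1 for ch in letters if "\u0e00" <= ch <= "\u0e7f")
    return thai / len(letters)


print(thai_ratio("หน่วยงานต้องดำเนินการประเมินตนเองอย่างน้อยปีละ 1 ครั้ง"))  # → 1.0
print(thai_ratio("at least once a year"))  # → 0.0
```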

In [5]:
ask_question("ตามประกาศมาตรฐานเว็บไซต์ หน่วยงานต้องดำเนินการประเมินตนเอง (Self-Assessment) บ่อยแค่ไหน?")

### Question: **ตามประกาศมาตรฐานเว็บไซต์ หน่วยงานต้องดำเนินการประเมินตนเอง (Self-Assessment) บ่อยแค่ไหน?**

**Answer:** หน่วยงานต้องดำเนินการประเมินตนเองอย่างน้อยปีละ 1 ครั้ง

_(English translation: Agencies must conduct a self-assessment at least once a year.)_

**Sources:** `['thailand-web-security-standard-2025_ocr.md']`

---

## 4. OWASP Data Factors: List Retrieval
Testing the ability to list specific factors mentioned in the documentation.
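List-style answers can be scored by how many expected items they mention. The sketch below computes that recall with case-insensitive substring matching. The reference list in the usage line is illustrative; a real check would copy the factor names from the OWASP document itself. `list_recall` is a hypothetical helper.

```python
def list_recall(answer: str, expected_items: list[str]) -> tuple[float, list[str]]:
    """Return (recall, missing_items) using case-insensitive substring matching."""
    if not expected_items:
        return 0.0, []
    text = answer.lower()
    missing = [item for item in expected_items if item.lower() not in text]
    recall = (len(expected_items) - len(missing)) / len(expected_items)
    return recall, missing


answer = "The data factors include CWES Mapped, Incident Rate, Weighted Exploit, Total Occurrences, and Total CVEs."
print(list_recall(answer, ["CWEs Mapped", "Total CVEs", "Avg Coverage"]))
```

Substring matching is strict about qualifiers: an expected item such as "Avg Weighted Exploit" is reported missing if the answer only says "Weighted Exploit", which is usually the behavior you want when checking that exact factor names were retrieved.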

In [6]:
ask_question("What are the data factors listed for each of the Top 10 categories?")

### Question: **What are the data factors listed for each of the Top 10 categories?**

**Answer:** The data factors listed for each of the Top 10 categories include CWES Mapped, Incident Rate, Weighted Exploit, Weighted Impact, Total Occurrences, and Total CVEs.

**Sources:** `['owasp-top-10_ocr.md', 'owasp-top-10_ocr.md']`

---
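The sources for this case list `owasp-top-10_ocr.md` twice, most likely because more than one retrieved chunk came from the same file (an assumption; the API's internals are not shown here). If only distinct documents matter, the list can be deduplicated on the client side while keeping retrieval order. `unique_sources` is a hypothetical helper.

```python
def unique_sources(sources):
    """Drop repeated sources while preserving first-seen order."""
    # dict keys keep insertion order (Python 3.7+), so this is an ordered de-duplication
    return list(dict.fromkeys(sources or []))


print(unique_sources(['owasp-top-10_ocr.md', 'owasp-top-10_ocr.md']))  # → ['owasp-top-10_ocr.md']
```

Applying this to `data['sources']` inside `ask_question` would hide the duplicate in the notebook; if the API is under your control, deduplicating server-side is the cleaner fix.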

## 5. Off-Topic / Grounding Check
Testing the system's ability to refuse to answer questions that its source documents do not cover.
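For the grounding check, the expected behavior is a refusal with no sources, as in the output below. A minimal automated version, sketched here, treats an answer as a refusal if it matches one of a few phrases. The phrase list is an assumption based only on the "I don't know" reply this notebook shows; a real system may word its refusals differently. `is_grounded_refusal` is a hypothetical name.

```python
import re

# Illustrative refusal phrases; extend to match the system prompt's actual wording
REFUSAL_PATTERNS = [
    r"\bi don'?t know\b",
    r"\bi do not know\b",
    r"\bnot (?:found|available) in the (?:provided )?context\b",
]


def is_grounded_refusal(answer: str, sources) -> bool:
    """True if the answer reads as a refusal and no sources were cited."""
    # Normalize curly apostrophes so "don\u2019t" matches the patterns
    text = answer.strip().lower().replace("\u2019", "'")
    refused = any(re.search(p, text) for p in REFUSAL_PATTERNS)
    return refused and not sources


print(is_grounded_refusal("I don't know", []))  # → True
```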

In [7]:
ask_question("How do I bake a cake?")

### Question: **How do I bake a cake?**

**Answer:** I don't know

**Sources:** _None provided_

---
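The five cases above are judged by reading the rendered output. A data-driven runner, sketched below, could record a pass or fail for each question in one pass. It assumes a callable that returns the parsed JSON response (a dict with `answer` and `sources`); `EvalCase`, `run_suite`, and the per-case checks are hypothetical, and the checks shown are deliberately simple placeholders rather than validated criteria. Injecting the callable keeps the runner testable without a live server.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    question: str
    # Receives the parsed response dict and returns True if the case passes
    check: Callable[[dict], bool]


def run_suite(cases: list[EvalCase], ask: Callable[[str], dict]) -> dict[str, bool]:
    """Run each case through `ask` and record pass/fail; any exception counts as a failure."""
    results = {}
    for case in cases:
        try:
            results[case.name] = bool(case.check(ask(case.question)))
        except Exception as exc:  # network errors, missing keys, bad JSON, etc.
            print(f"{case.name}: error: {exc}")
            results[case.name] = False
    return results


cases = [
    EvalCase("owasp-definition", "What is Broken Access Control?",
             lambda r: "access" in r.get("answer", "").lower()),
    EvalCase("off-topic", "How do I bake a cake?",
             lambda r: not r.get("sources")),
]


def fake_ask(question: str) -> dict:
    """Stub standing in for the live API; against the real server, `ask` could wrap
    requests.post(API_URL, json={"question": question}, timeout=60).json()."""
    if "cake" in question:
        return {"answer": "I don't know", "sources": []}
    return {"answer": "Access control failures let users act outside their permissions.",
            "sources": ["owasp-top-10_ocr.md"]}


print(run_suite(cases, fake_ask))  # → {'owasp-definition': True, 'off-topic': True}
```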