# 03 – Controller Decision Logic

This notebook implements Step 3 from the blog:

- Use Stage A/B/C results
- Decide which corrective action is appropriate
- Demonstrate how an evaluation controller can stabilize a RAG system

We use:

- `rag_eval.stages.evaluate_all_stages`
- `rag_eval.controller.EvaluationController`

In [None]:
import os
import sys

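# Assumes the notebook is launched from a directory one level below the
# project root, so that "<project root>/src" contains the rag_eval package.
PROJECT_ROOT = os.path.abspath(os.path.join(os.getcwd(), ".."))
SRC_PATH = os.path.join(PROJECT_ROOT, "src")

# Make rag_eval importable without installing the package.
if SRC_PATH not in sys.path:
    sys.path.append(SRC_PATH)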

from rag_eval.metrics import compute_all_metrics
from rag_eval.stages import evaluate_all_stages
from rag_eval.controller import EvaluationController

PROJECT_ROOT, SRC_PATH

## Helper: Evaluate and Control in One Step

We define a small helper function that:

1. Computes metrics
2. Evaluates Stage A/B/C
3. Asks the controller which action to take
4. Executes that action (simulated)
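
Steps 3 and 4 can be pictured with a minimal stand-in for the controller. This sketch is not the `rag_eval.controller.EvaluationController` implementation: the names `choose_action_sketch` and `execute_sketch`, the pass/fail dictionary shape, and the `adjust_query` action for Stage A are all assumptions made for illustration. Only `noop`, `adjust_retrieval`, and `adjust_prompt` are named in this notebook.

```python
from typing import Dict

# Hypothetical stage-to-action table. Only "adjust_retrieval" and
# "adjust_prompt" are named in this notebook; "adjust_query" is assumed.
ACTION_BY_FAILED_STAGE: Dict[str, str] = {
    "A": "adjust_query",
    "B": "adjust_retrieval",
    "C": "adjust_prompt",
}


def choose_action_sketch(stage_passed: Dict[str, bool]) -> str:
    """Return the action for the earliest failing stage, or "noop"."""
    # Check stages in pipeline order so an upstream failure is handled first.
    for stage in ("A", "B", "C"):
        if not stage_passed.get(stage, True):
            return ACTION_BY_FAILED_STAGE[stage]
    return "noop"


def execute_sketch(action_name: str) -> Dict[str, str]:
    """Simulate execution by describing the correction instead of applying it."""
    return {"action": action_name, "status": "simulated"}


# Example: Stage B fails while Stages A and C pass.
action = choose_action_sketch({"A": True, "B": False, "C": True})
correction = execute_sketch(action)
```

Checking stages in pipeline order is one possible design choice: when retrieval has failed, the generation score is hard to interpret, so the earlier stage is corrected first. The real controller may prioritize differently.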


In [None]:
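# One controller instance is shared by every scenario in this notebook.
controller = EvaluationController()

def run_eval_and_control(query, metadata, retrieved_docs, reference, output, label):
    """Compute metrics, evaluate Stages A/B/C, and apply the controller's action.

    Returns a (stage_results, correction) tuple.
    """
    print(f"\n=== Scenario: {label} ===")

    # 1. Compute metrics. Timing and token values fall back to defaults
    #    when they are missing from the metadata.
    metrics = compute_all_metrics(
        reference=reference,
        output=output,
        latency_ms=metadata.get("latency_ms", 120),
        token_count=metadata.get("token_count", 80),
        retrieval_ms=metadata.get("retrieval_ms", 30),
    )

    # 2. Evaluate Stages A/B/C. Only the intrinsic score is passed on as
    #    the generation metric.
    stage_results = evaluate_all_stages(
        query=query,
        metadata=metadata,
        retrieved_docs=retrieved_docs,
        gen_metrics={"intrinsic": metrics.intrinsic},
    )

    # 3. Ask the controller which action fits the stage results.
    action_name = controller.choose_action(stage_results)
    # 4. Execute the action (simulated).
    correction = controller.execute(action_name)

    print("Stage results:", stage_results)
    print("Chosen action:", action_name)
    print("Correction detail:", correction)

    return stage_results, correction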

## Scenario 1: Everything Is Healthy (No Action)

Every stage passes in this scenario, so the controller should recommend `noop`.

In [None]:
query = "What changed in the AI auditing policy in 2024?"
metadata = {"domain": "policy", "max_query_tokens": 50}
retrieved_docs = [
    {"id": "doc1", "content": "Policy updated in 2024 to include AI auditing guidelines."},
    {"id": "doc2", "content": "Details of AI auditing process introduced in 2024."},
    {"id": "doc3", "content": "Background on AI auditing requirements."},
]
reference = "The policy was updated in 2024 to include new AI auditing guidelines."
output = "The policy was updated in 2024 with new guidelines for AI auditing and oversight."

run_eval_and_control(query, metadata, retrieved_docs, reference, output, label="healthy")

## Scenario 2: Retrieval Drift (Stage B Fails → adjust_retrieval)

Here we simulate poor retrieval while the prompt and query remain reasonable.
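
To make "poor retrieval" concrete, here is a rough sketch of one way a retrieval check could score documents against the query. It is an assumption for illustration, not the Stage B logic in `rag_eval.stages`: the `mean_query_coverage` function, the `content_terms` helper, and the small stopword list are hypothetical.

```python
import re
from typing import Dict, List, Set

# Assumed stopword list, kept tiny for the example.
STOPWORDS: Set[str] = {"what", "in", "the", "a", "an", "of", "to", "on"}


def content_terms(text: str) -> Set[str]:
    """Lowercase the text and return its word tokens minus stopwords."""
    return {t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS}


def mean_query_coverage(query: str, docs: List[Dict[str, str]]) -> float:
    """Average, over documents, of the fraction of query terms each one contains."""
    query_terms = content_terms(query)
    if not query_terms or not docs:
        return 0.0
    per_doc = [
        len(query_terms & content_terms(doc["content"])) / len(query_terms)
        for doc in docs
    ]
    return sum(per_doc) / len(per_doc)


sample_query = "What changed in the AI auditing policy in 2024?"
healthy_docs = [
    {"id": "doc1", "content": "Policy updated in 2024 to include AI auditing guidelines."},
    {"id": "doc2", "content": "Details of AI auditing process introduced in 2024."},
    {"id": "doc3", "content": "Background on AI auditing requirements."},
]
drifted_docs = [{"id": "doc1", "content": "Unrelated content about billing."}]

healthy_coverage = mean_query_coverage(sample_query, healthy_docs)
drifted_coverage = mean_query_coverage(sample_query, drifted_docs)
```

Under this sketch the drifted document shares no content terms with the query, so its coverage drops to zero while the healthy set stays well above it. A real check would more likely use embedding similarity or a minimum document count.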

In [None]:
retrieved_docs_drifted = [
    {"id": "doc1", "content": "Unrelated content about billing."},
]

output_drifted = "The policy mentions some updates, but details are missing."

run_eval_and_control(
    query,
    metadata,
    retrieved_docs_drifted,
    reference,
    output_drifted,
    label="retrieval_drift",
)

## Scenario 3: Reasoning / Grounding Drift (Stage C Fails → adjust_prompt)

Here retrieval is healthy, but the model produces an incorrect or weakly
grounded answer. This yields a low intrinsic score and triggers a prompt-level fix.
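
As a rough illustration of how an intrinsic score can drop for a weakly grounded answer, the sketch below uses token-level F1 between the reference and the output. This is a swapped-in stand-in, not the metric behind `metrics.intrinsic`, whose definition is not shown in this notebook; the `token_f1` function is hypothetical.

```python
import re
from collections import Counter


def token_f1(reference: str, output: str) -> float:
    """Token-overlap F1 between a reference answer and a model output."""
    ref_tokens = re.findall(r"\w+", reference.lower())
    out_tokens = re.findall(r"\w+", output.lower())
    if not ref_tokens or not out_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((Counter(ref_tokens) & Counter(out_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(out_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


sample_reference = "The policy was updated in 2024 to include new AI auditing guidelines."
grounded_output = "The policy was updated in 2024 with new guidelines for AI auditing and oversight."
ungrounded_output = "The policy introduced new marketing guidelines unrelated to AI auditing."

grounded_score = token_f1(sample_reference, grounded_output)
ungrounded_score = token_f1(sample_reference, ungrounded_output)
```

The grounded output scores higher than the ungrounded one, but only modestly, because the incorrect answer still shares many words with the reference. This is one reason an intrinsic metric might rely on semantic similarity or an explicit grounding check rather than lexical overlap alone.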

In [None]:
output_bad = "The policy introduced new marketing guidelines unrelated to AI auditing."

run_eval_and_control(
    query,
    metadata,
    retrieved_docs,
    reference,
    output_bad,
    label="reasoning_drift",
)