# Lab 10 · Evaluation, Latency, and Cache

*This lab notebook provides guided steps. All commands are intended for local execution.*

## Objectives
- A tiny evaluation set is stored as a JSON fixture for backend prompts.
- Latency and token usage metrics are recorded.
- A naive cache is keyed on a hash of the prompt plus tool context.

## What will be learned
- Evaluation harness design is reviewed for LLM flows.
- Latency tracking is reinforced with timestamp instrumentation.
- Cache strategies are described for simple reuse.

## Prerequisites & install
The backend virtual environment is activated and the Gemini client library is installed.

```bash
cd ai-web/backend
. .venv/bin/activate
pip install google-generativeai
```

## Step-by-step tasks
### Step 1: Evaluation dataset stub
A small evaluation dataset is stored for reproducibility.

In [None]:
from pathlib import Path
fixture_path = Path("ai-web/backend/app/eval_set.json")
fixture_path.write_text('''[
  {"prompt": "Summarize the project goals.", "expected": "A concise description is returned."},
  {"prompt": "List backend components.", "expected": "FastAPI, FAISS, and Gemini proxy are noted."}
]
''')
print("Evaluation dataset was created.")
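
The fixture can be read back and sanity-checked before it is used. The following is a minimal sketch, not part of the lab code: `load_eval_set` is a hypothetical helper name, and a temporary file stands in for `eval_set.json`.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def load_eval_set(path: Path) -> List[Dict[str, str]]:
    """Load evaluation cases and check that each one has the expected keys."""
    cases = json.loads(path.read_text(encoding="utf-8"))
    for index, case in enumerate(cases):
        missing = {"prompt", "expected"} - case.keys()
        if missing:
            raise ValueError(f"case {index} is missing keys: {sorted(missing)}")
    return cases


# Demonstration against a temporary copy of the fixture (the path is illustrative).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "eval_set.json"
    demo_path.write_text(
        json.dumps([
            {"prompt": "Summarize the project goals.",
             "expected": "A concise description is returned."},
        ]),
        encoding="utf-8",
    )
    demo_cases = load_eval_set(demo_path)

print(len(demo_cases), "case(s) loaded")
```

Validating the keys up front means a malformed fixture fails at load time rather than halfway through an evaluation run.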

### Step 2: Metrics helper
A helper module is written that times requests, accumulates token counts per label, and derives cache keys.

In [None]:
from pathlib import Path
metrics_path = Path("ai-web/backend/app/metrics.py")
metrics_path.write_text('''import hashlib
import json
import time
from contextlib import contextmanager
from typing import Dict

TOKEN_LOG: Dict[str, int] = {}


@contextmanager
def track_latency(label: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        # Report the duration even when the wrapped block raises.
        duration = time.perf_counter() - start
        print(f"{label} latency: {duration:.3f}s")


def cache_key(prompt: str, tools: str) -> str:
    payload = json.dumps({"prompt": prompt, "tools": tools}, sort_keys=True)
    return hashlib.sha256(payload.encode('utf-8')).hexdigest()


def record_tokens(label: str, count: int):
    TOKEN_LOG[label] = TOKEN_LOG.get(label, 0) + count
''')
print("Metrics helper was written.")
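
The helpers can be exercised in isolation. The sketch below repeats the `cache_key` hashing scheme and a `track_latency` that reports even when the body raises, so that it runs on its own. The `time.sleep` call is only a stand-in for a model request, and the tool names are made up for illustration.

```python
import hashlib
import json
import time
from contextlib import contextmanager


def cache_key(prompt: str, tools: str) -> str:
    # Same hashing scheme as metrics.py, repeated so the sketch is standalone.
    payload = json.dumps({"prompt": prompt, "tools": tools}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@contextmanager
def track_latency(label: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        duration = time.perf_counter() - start
        print(f"{label} latency: {duration:.3f}s")


# Identical inputs give identical keys; a different tool context gives a new key.
key_a = cache_key("Summarize the project goals.", "search")
key_b = cache_key("Summarize the project goals.", "search")
key_c = cache_key("Summarize the project goals.", "calculator")

with track_latency("demo"):
    time.sleep(0.01)  # stand-in for a model call

print(key_a == key_b)
print(key_a == key_c)
```

Because the key covers both the prompt and the tool context, the same prompt issued with different tools is treated as a separate cache entry.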

### Step 3: Cache-enabled endpoint
An evaluation endpoint is instrumented with caching and latency tracking.

In [None]:
from pathlib import Path
main_path = Path("ai-web/backend/app/main.py")
text = main_path.read_text()
if "evaluation_endpoint" not in text:
    addition = '''
from .metrics import cache_key, record_tokens, track_latency
from .llm import chat as llm_chat

CACHE = {}


@app.post("/api/evaluate")
def evaluation_endpoint(payload: dict):
    prompt = payload.get("prompt", "")
    # A separator keeps ["ab", "c"] and ["a", "bc"] from sharing a cache key.
    tools = ",".join(payload.get("tools", []))
    key = cache_key(prompt, tools)
    if key in CACHE:
        return {"cached": True, "response": CACHE[key]}
    with track_latency("evaluation"):
        response = llm_chat([
            {"role": "user", "content": prompt}
        ])
    # Whitespace word count is only a rough proxy for the model's token count.
    record_tokens("evaluation", len(prompt.split()))
    CACHE[key] = response
    return {"cached": False, "response": response}
'''
    main_path.write_text(text.rstrip() + "\n" + addition)
    print("Evaluation endpoint was appended.")
else:
    print("Evaluation endpoint already present.")
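
The hit and miss paths of the endpoint can be traced without FastAPI or a model. The sketch below follows the same control flow under stated assumptions: `evaluate` is a hypothetical plain function, `fake_llm` is a stand-in for `llm_chat`, and tools are collapsed to a single comma-separated string.

```python
import hashlib
import json
from typing import Callable, Dict, List

CACHE: Dict[str, str] = {}


def cache_key(prompt: str, tools: str) -> str:
    payload = json.dumps({"prompt": prompt, "tools": tools}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def evaluate(payload: dict, llm: Callable[[str], str]) -> dict:
    """Follow the endpoint's cache-then-call flow as a plain function."""
    prompt = payload.get("prompt", "")
    tools = ",".join(payload.get("tools", []))
    key = cache_key(prompt, tools)
    if key in CACHE:
        return {"cached": True, "response": CACHE[key]}
    response = llm(prompt)
    CACHE[key] = response
    return {"cached": False, "response": response}


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for the real model call; records each invocation.
    calls.append(prompt)
    return f"stub answer for: {prompt}"


first = evaluate({"prompt": "List backend components."}, fake_llm)
second = evaluate({"prompt": "List backend components."}, fake_llm)
print(first["cached"], second["cached"], len(calls))
```

The second call is served from `CACHE`, so the stub model is invoked only once. Since the cache is an in-process dictionary, it is emptied whenever the server restarts and is not shared across worker processes.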

## Validation / acceptance checks
```bash
# locally
curl -X POST http://localhost:8000/api/evaluate -H 'Content-Type: application/json' -d '{"prompt":"Summarize the project goals."}'
```
- Responses indicate whether cache hits occurred and include evaluation output.
- React development mode shows the described UI state without console errors.
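
The same request can be prepared from Python with the standard library. This is a sketch only: `build_evaluate_request` is a hypothetical helper name, and the base URL is taken from the curl command above.

```python
import json
import urllib.request
from typing import List, Optional


def build_evaluate_request(
    prompt: str,
    tools: Optional[List[str]] = None,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/evaluate."""
    body: dict = {"prompt": prompt}
    if tools:
        body["tools"] = tools
    return urllib.request.Request(
        f"{base_url}/api/evaluate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_evaluate_request("Summarize the project goals.")
print(request.get_method(), request.full_url)
```

With the backend running, sending the request twice with `urllib.request.urlopen(request)` against a freshly started server should yield `"cached": false` followed by `"cached": true`.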

## Homework / extensions
- Token accounting is integrated with external monitoring dashboards.
- Additional evaluation prompts are composed for regression coverage.
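
As a starting point for the token accounting extension, the counts could be serialized as one JSON line per snapshot for a log shipper or dashboard agent to collect. This is a sketch under assumptions: `token_snapshot` and the `ts`/`tokens` field names are hypothetical, and the sample counts are illustrative rather than measured.

```python
import json
import time
from typing import Dict

# Sample values for illustration; in the lab this would be metrics.TOKEN_LOG.
TOKEN_LOG: Dict[str, int] = {"evaluation": 12}


def token_snapshot(log: Dict[str, int], now: float) -> str:
    """Serialize the current token counts as a single JSON line."""
    return json.dumps({"ts": now, "tokens": dict(log)}, sort_keys=True)


line = token_snapshot(TOKEN_LOG, now=time.time())
print(line)
```

Passing the timestamp in as an argument keeps the function deterministic, which makes it straightforward to test.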