# Evaluation Comparison Notebook
Use this notebook to run the QA bot and compare evaluation scores across frameworks.

**LangChain + Ollama prerequisites**
- Install [Ollama](https://ollama.ai/download) and run `ollama pull llama3` (or any model you prefer).
- In your `.env`, set `LANGCHAIN_USE_OLLAMA=true`, `OLLAMA_MODEL=llama3`, and optionally `OLLAMA_BASE_URL`.
- Restart the notebook kernel after updating environment variables so LangChain picks up the new settings.
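The provider switch configured by these variables can be pictured as a small decision function. This is a hypothetical sketch, not the LangChain runner's actual code. `select_provider` is an invented name. The fallback to OpenAI when `LANGCHAIN_USE_OLLAMA` is unset is inferred from the error message the evaluation cell reports when no key is set. The default model `llama3` is an assumption, and `http://localhost:11434` is the address a local Ollama server listens on by default.

```python
import os
from collections.abc import Mapping

# Default address of a local Ollama server; assumed fallback when OLLAMA_BASE_URL is unset.
DEFAULT_OLLAMA_URL = "http://localhost:11434"


def select_provider(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Hypothetical sketch: choose a chat backend from environment settings."""
    # Accept common truthy spellings such as "true", "TRUE", or "1".
    use_ollama = env.get("LANGCHAIN_USE_OLLAMA", "").strip().lower() in {"1", "true", "yes"}
    if use_ollama:
        return {
            "provider": "ollama",
            "model": env.get("OLLAMA_MODEL", "llama3"),  # assumed default model
            "base_url": env.get("OLLAMA_BASE_URL", DEFAULT_OLLAMA_URL),
        }
    # Without Ollama, an OpenAI key is required.
    if env.get("OPENAI_API_KEY"):
        return {"provider": "openai"}
    raise RuntimeError(
        "ChatOpenAI requires OPENAI_API_KEY; set LANGCHAIN_USE_OLLAMA=true to use a local model."
    )
```

For example, `select_provider({"LANGCHAIN_USE_OLLAMA": "true"})` returns the `ollama` entry with the default model and URL. Passing the mapping explicitly, rather than reading `os.environ` directly, makes the logic easy to test without touching the real environment.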

In [13]:
from pathlib import Path
import json
import sys

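root = Path("..").resolve()  # project root, assuming the notebook lives one level below it
# Make the project root (for `src.*` and `evaluations.*`) and `src/` itself importable.
sys.path.insert(0, str(root))
sys.path.insert(0, str(root / "src"))

from src.qa_bot import QABot
from evaluations.base_evaluator import EvaluationInput

# Load the sample documents and ask a single smoke-test question.
bot = QABot(documents_path=root / "data" / "documents" / "sample_docs")
question = "How do you install the Python requests library?"
answer = bot.answer(question)
answer.response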

'According to Python Requests, To install the library, use `pip install requests` in your Python environment.'

## Build Evaluation Dataset

Generate predictions for the full test set so we can score the QA bot with an evaluation framework.

In [14]:
from evaluations.utils import load_dataset_from_files

questions_path = root / "data" / "test_questions.json"
ground_truth_path = root / "data" / "ground_truth.json"

questions_data = json.loads(questions_path.read_text(encoding="utf-8"))

# Answer every test question, keyed by question id.
predictions: dict[str, str] = {}
for item in questions_data:
    response = bot.answer(item["question"])
    predictions[item["id"]] = response.response

eval_dataset = list(
    load_dataset_from_files(
        questions_path=questions_path,
        ground_truth_path=ground_truth_path,
        predictions=predictions,
    )
)

f"Prepared {len(eval_dataset)} evaluation samples."

'Prepared 3 evaluation samples.'
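`load_dataset_from_files` is not shown in this notebook, but conceptually, building the dataset means joining questions, references, and predictions on the question `id`. The following is a minimal sketch of that join, assuming the ground-truth file is a list of objects with `id` and `answer` keys (the project's real schema may differ). `Sample` and `join_by_id` are hypothetical names, not the project's `EvaluationInput` or loader.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical record; the project's EvaluationInput may use different fields."""
    id: str
    question: str
    reference: str
    prediction: str


def join_by_id(
    questions: list[dict],
    ground_truth: list[dict],
    predictions: dict[str, str],
) -> list[Sample]:
    # Index reference answers by id (assumed "answer" key) for constant-time lookup.
    references = {item["id"]: item["answer"] for item in ground_truth}
    samples = []
    for item in questions:
        qid = item["id"]
        # Skip questions missing a reference or a prediction instead of failing.
        if qid not in references or qid not in predictions:
            continue
        samples.append(Sample(qid, item["question"], references[qid], predictions[qid]))
    return samples
```

Skipping incomplete rows keeps a partial run scorable. A stricter variant could raise instead, so that a mismatch between `test_questions.json` and `ground_truth.json` cannot silently shrink the sample count.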

## Evaluate with LangChain

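Run the LangChain evaluation runner. Before executing this cell, set either `OPENAI_API_KEY` or the Ollama settings listed above in `.env`. If neither is configured, the runner reports the problem in `details["error"]` with a `None` score, as the output below shows.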

In [17]:
from evaluations.langchain_eval_runner import LangChainEvalRunner

langchain_runner = LangChainEvalRunner(output_dir=root / "results")
try:
    langchain_result = langchain_runner.evaluate(eval_dataset)
    langchain_summary = {
        "framework": langchain_result.framework,
        "score": langchain_result.score,
        "details": langchain_result.details,
    }
except Exception as exc:
    # Keep the same keys as the success branch so summaries stay comparable.
    langchain_summary = {
        "framework": "langchain",
        "score": None,
        "error": str(exc),
    }

langchain_summary

{'framework': 'langchain',
 'score': None,
 'details': {'error': 'ChatOpenAI requires OPENAI_API_KEY; set LANGCHAIN_USE_OLLAMA=true to use a local model.',
  'provider': None}}
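Once runners for other frameworks produce summaries in the same shape, comparing them comes down to tabulating scores and surfacing failures. The following is a minimal, standard-library sketch. `format_comparison` is a hypothetical helper. It assumes each summary has a `framework` key plus either a numeric `score` or an error stored under `error` or `details["error"]`, matching the two shapes the LangChain cell can produce.

```python
def format_comparison(summaries: list[dict]) -> str:
    """Hypothetical helper: render framework scores as a plain-text table."""
    rows = []
    for summary in summaries:
        score = summary.get("score")
        details = summary.get("details") or {}
        # Errors may be top-level (raised exception) or inside details (runner-reported).
        error = summary.get("error") or details.get("error")
        if score is None:
            status = f"error: {error}" if error else "no score"
            rows.append((summary["framework"], "-", status))
        else:
            rows.append((summary["framework"], f"{score:.3f}", "ok"))
    # Pad the framework column to the widest name so the columns line up.
    width = max([len("framework")] + [len(name) for name, _, _ in rows])
    lines = [f"{'framework':<{width}}  score  status"]
    lines += [f"{name:<{width}}  {score:>5}  {status}" for name, score, status in rows]
    return "\n".join(lines)
```

Calling `print(format_comparison([langchain_summary]))` on the output above would list `langchain` with a `-` score and the missing-key message, making configuration failures visible next to any successful runs.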