# Context Metric Example

This notebook demonstrates how to use the **Context** metric from Fair Forge to evaluate how well AI assistant responses align with the provided context.

## Installation

First, install Fair Forge from the locally built wheel in `../../dist` (with the `context` extra), together with `langchain-groq` for the judge model.

In [None]:
import sys
!uv pip install --python {sys.executable} --force-reinstall "$(ls ../../dist/*.whl)[context]" langchain-groq -q
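The install cell builds its requirement from `$(ls ../../dist/*.whl)`, which only resolves cleanly when exactly one wheel sits in `../../dist`. Zero or several wheels produce a broken package specifier. The following is a minimal sketch of a guard that uses only `pathlib`. The helper name `find_single_wheel` is hypothetical and not part of Fair Forge.

```python
from pathlib import Path


def find_single_wheel(dist_dir: str = "../../dist") -> Path:
    """Return the only .whl file in dist_dir, raising if there are zero or several.

    Hypothetical helper: mirrors the `ls ../../dist/*.whl` lookup in the install
    cell, but fails loudly instead of passing a malformed requirement to uv.
    """
    wheels = sorted(Path(dist_dir).glob("*.whl"))
    if len(wheels) != 1:
        names = [w.name for w in wheels]
        raise RuntimeError(f"Expected exactly one wheel in {dist_dir}, found {names}")
    return wheels[0]
```

You could then run `wheel = find_single_wheel()` and install with `"{wheel}[context]"` in the same `!uv pip install` command. IPython substitutes `{...}` expressions in `!` lines the same way it does for `{sys.executable}`.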

## Setup

Import the required modules and configure your API key.

In [None]:
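import os
import sys

# Add the parent directory to the import path so the local `helpers` package can be found
sys.path.insert(0, os.path.dirname(os.getcwd()))

from fair_forge.metrics.context import Context
from helpers.retriever import LocalRetriever  # local retriever that supplies the Q&A dataset
from langchain_groq import ChatGroq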

In [None]:
import getpass

GROQ_API_KEY = getpass.getpass("Enter your Groq API key: ")
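If you run the notebook repeatedly, prompting for the key each time gets tedious. The following is a minimal sketch that first checks an environment variable and falls back to `getpass` only when the variable is unset. The helper name `get_api_key` is hypothetical. Reading `GROQ_API_KEY` from the environment is an assumption about how you store the key.

```python
import getpass
import os


def get_api_key(env_var: str = "GROQ_API_KEY") -> str:
    """Return the key from env_var if it is set, otherwise prompt without echoing.

    Hypothetical helper; the environment variable name is an assumption.
    """
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    # Fall back to an interactive, non-echoing prompt
    return getpass.getpass(f"Enter your {env_var}: ")
```

Call it as `GROQ_API_KEY = get_api_key()` in place of the `getpass` line. The key is still never echoed when typed.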

## Initialize the Judge Model

The Context metric uses an LLM as a judge to evaluate responses. You can use any LangChain-compatible chat model.

In [None]:
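judge_model = ChatGroq(
    model="openai/gpt-oss-120b",
    api_key=GROQ_API_KEY,
    temperature=0.0,  # minimize sampling randomness in the judge's scores
    reasoning_format="parsed",  # return the model's reasoning separately from its answer
)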

## Run the Context Metric

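The Context metric evaluates each Q&A interaction in your dataset and scores how well the assistant's response aligns with the provided context. `Context.run` receives the retriever class that loads the dataset (`LocalRetriever`) along with the judge model.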

In [None]:
metrics = Context.run(
    LocalRetriever,
    model=judge_model,
    use_structured_output=True,
    verbose=True,
)

## Analyze Results

Each result in `metrics` contains:
- `qa_id`: The identifier of the evaluated Q&A interaction
- `context_awareness`: A score from 0 to 1 indicating how well the response aligns with the context
- `context_insight`: The judge's explanation of the evaluation
- `context_thinkings`: The judge's chain-of-thought reasoning (if available)
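To try the analysis and export cells without spending judge calls, you can build a few stand-in records with the same field names. This is a hypothetical stub, not Fair Forge's own result class. The field types (a string `qa_id` and an optional string for `context_thinkings`) are assumptions. The sample values are made-up placeholders, not real judge output.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ContextMetricStub:
    """Hypothetical stand-in for a Context result; NOT a Fair Forge class."""

    qa_id: str  # assumed type
    context_awareness: float  # expected to lie in [0, 1]
    context_insight: str
    context_thinkings: Optional[str] = None  # assumed type

    def __post_init__(self) -> None:
        # Reject scores outside the documented 0-1 range
        if not 0.0 <= self.context_awareness <= 1.0:
            raise ValueError(f"context_awareness out of range: {self.context_awareness}")

    def model_dump(self, mode: str = "python") -> dict:
        # Mimics the pydantic-style call used by the export cell; `mode` is accepted and ignored
        return asdict(self)


# Placeholder records for offline experimentation only
sample_metrics = [
    ContextMetricStub("qa-1", 0.9, "Placeholder: answer is grounded in the context."),
    ContextMetricStub("qa-2", 0.4, "Placeholder: answer adds unsupported claims."),
]
```

Assigning `metrics = sample_metrics` lets the cells below run offline. `model_dump` is present only so the JSON export cell works unchanged.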

In [None]:
print(f"Total interactions evaluated: {len(metrics)}\n")

for metric in metrics:
    print(f"QA ID: {metric.qa_id}")
    print(f"Context Awareness Score: {metric.context_awareness}")
    print(f"Insight: {metric.context_insight}")
    print("-" * 50)
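When the dataset is large, it is often more useful to inspect the weakest responses first. The following is a minimal sketch that sorts any objects exposing a `context_awareness` attribute. The helper `lowest_scoring` is hypothetical.

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def lowest_scoring(results: Iterable[T], n: int = 3) -> List[T]:
    """Return up to n results with the lowest context_awareness, lowest first.

    Hypothetical helper; works with any object that has a `context_awareness` attribute.
    """
    return sorted(results, key=lambda r: r.context_awareness)[:n]
```

For example, `for m in lowest_scoring(metrics): print(m.qa_id, m.context_awareness, m.context_insight)`.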

## Calculate Average Score

In [None]:
# Guard against an empty result list to avoid ZeroDivisionError
avg_score = sum(m.context_awareness for m in metrics) / len(metrics) if metrics else float("nan")
print(f"Average Context Awareness: {avg_score:.2f}")
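A single mean can hide a skewed distribution. The following is a minimal sketch that uses the standard `statistics` module to also report the median, the range, and how many responses fall below a cut-off. The helper name `summarize_scores` and the default threshold of 0.5 are illustrative assumptions, not values defined by Fair Forge.

```python
import statistics
from typing import Iterable


def summarize_scores(results: Iterable, threshold: float = 0.5) -> dict:
    """Summarize context_awareness scores across results.

    Hypothetical helper; `threshold` is an arbitrary illustrative cut-off.
    """
    scores = [r.context_awareness for r in results]
    if not scores:
        raise ValueError("no results to summarize")
    return {
        "count": len(scores),
        "mean": statistics.fmean(scores),
        "median": statistics.median(scores),
        "min": min(scores),
        "max": max(scores),
        # Number of responses scoring strictly below the threshold
        "below_threshold": sum(s < threshold for s in scores),
    }
```

`summarize_scores(metrics)` returns a plain dict, so you can print it or store it alongside the exported results.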

## Export Results

You can export the results to JSON for further analysis.

In [None]:
import json

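# mode="json" converts values that are not JSON-native (e.g. datetimes) into serializable types
results = [metric.model_dump(mode="json") for metric in metrics]

with open("context_results.json", "w", encoding="utf-8") as f:
    json.dump(results, f, indent=2)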

print("Results exported to context_results.json")
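The JSON file keeps every field. If you also want a spreadsheet-friendly file, the following is a minimal sketch that uses the standard `csv` module to write a subset of columns from the `results` list of dicts built above. The helper name `export_csv`, the column choice, and the output filename are assumptions.

```python
import csv
from typing import Iterable, Mapping

# Assumed column selection; adjust to the fields you need
CSV_FIELDS = ["qa_id", "context_awareness", "context_insight"]


def export_csv(rows: Iterable[Mapping], path: str = "context_results.csv") -> int:
    """Write the CSV_FIELDS columns of each result dict to path; return the row count.

    Hypothetical helper; keys not listed in CSV_FIELDS are ignored.
    """
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=CSV_FIELDS, extrasaction="ignore")
        writer.writeheader()
        for row in rows:
            writer.writerow(row)
            count += 1
    return count
```

`export_csv(results)` writes `context_results.csv` to the current working directory. Fields not listed in `CSV_FIELDS`, such as `context_thinkings`, are skipped.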