# Regulatory Compliance Metric Example

This notebook demonstrates how to use the **Regulatory** metric from Fair Forge to evaluate whether AI assistant responses comply with a regulatory corpus (e.g., company policies, legal frameworks).

The metric uses:
1. **Embedding-based retrieval** to find relevant regulatory chunks
2. **Reranker model** to detect contradictions between responses and regulations
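
The retrieval stage can be pictured with a small, library-free sketch. It assumes cosine similarity over embedding vectors, a minimum-similarity filter, and a top-k cut; these are common choices, not a confirmed description of Fair Forge internals. The `cosine_similarity` and `retrieve` helpers and the 2-D toy vectors are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_vec: list[float],
    chunk_vecs: list[list[float]],
    top_k: int,
    similarity_threshold: float,
) -> list[tuple[int, float]]:
    """Return (chunk_index, similarity) pairs, best first (assumed behaviour)."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    # Drop chunks below the minimum similarity, then keep the best top_k
    kept = [(i, s) for i, s in scored if s >= similarity_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Toy 2-D "embeddings" standing in for real model output
query = [1.0, 0.0]
chunks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hits = retrieve(query, chunks, top_k=2, similarity_threshold=0.3)
print(hits)  # chunks 0 and 2 pass the threshold; chunk 1 is orthogonal to the query
```

In the real metric the vectors would come from the embedding model, and the surviving chunks would then be passed to the reranker.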

## Installation

First, install Fair Forge with the regulatory dependencies.

In [None]:
!pip install "alquimia-fair-forge[regulatory]" -q

## Setup

Import the required modules.

In [None]:
import json
from pathlib import Path

from fair_forge.connectors import LocalCorpusConnector
from fair_forge.core import Retriever
from fair_forge.metrics.regulatory import Regulatory
from fair_forge.schemas.common import Dataset

## Create a Custom Retriever

Subclass `Retriever` and implement `load_dataset` to load the conversation dataset from a JSON file in the local data directory.

In [None]:
class LocalRetriever(Retriever):
    """Load conversations from a local JSON file."""

    def load_dataset(self) -> list[Dataset]:
        data_path = Path("../data/dataset.json")
        with open(data_path, encoding="utf-8") as f:
            data = json.load(f)
        return [Dataset.model_validate(d) for d in data]

## Configure the Corpus Connector

Point the corpus connector to the directory containing regulatory markdown files.

In [None]:
corpus_dir = Path("../corpus")
corpus_connector = LocalCorpusConnector(corpus_dir)

# Verify documents load correctly
documents = corpus_connector.load_documents()
print(f"Loaded {len(documents)} regulatory document(s):")
for doc in documents:
    print(f"  - {doc.source}: {len(doc.text)} characters")
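
If the connector reports zero documents, it can help to check the directory independently of Fair Forge. The sketch below uses only `pathlib`. The `list_markdown_files` helper is hypothetical, and it assumes the corpus consists of `*.md` files at the top level of the directory, which may not match how `LocalCorpusConnector` actually discovers files.

```python
import tempfile
from pathlib import Path


def list_markdown_files(corpus_dir: Path) -> list[Path]:
    """Return the top-level *.md files in corpus_dir, sorted by name."""
    if not corpus_dir.is_dir():
        raise FileNotFoundError(f"Corpus directory not found: {corpus_dir}")
    return sorted(corpus_dir.glob("*.md"))


# Demonstrate on a throwaway directory so the example runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "privacy_policy.md").write_text(
        "# Privacy\nData is retained for 30 days.", encoding="utf-8"
    )
    (demo_dir / "notes.txt").write_text("not a markdown file", encoding="utf-8")
    found = [p.name for p in list_markdown_files(demo_dir)]

print(found)
```

Calling `list_markdown_files(Path("../corpus"))` in the notebook would show which files are candidates for loading.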

## Run the Regulatory Metric

Configure and run the metric. Key parameters:
- `embedding_model`: Model for semantic retrieval (default: Qwen3-Embedding-0.6B)
- `reranker_model`: Model for contradiction detection (default: Qwen3-Reranker-0.6B)
- `chunk_size`: Characters per chunk
- `chunk_overlap`: Characters shared between consecutive chunks
- `top_k`: Max chunks to retrieve per query
- `similarity_threshold`: Minimum similarity for retrieval
- `contradiction_threshold`: Score below which a chunk contradicts the response

In [None]:
metrics = Regulatory.run(
    LocalRetriever,
    corpus_connector=corpus_connector,
    embedding_model="Qwen/Qwen3-Embedding-0.6B",
    reranker_model="Qwen/Qwen3-Reranker-0.6B",
    chunk_size=500,
    chunk_overlap=50,
    top_k=5,
    similarity_threshold=0.3,
    contradiction_threshold=0.6,
    verbose=True,
)
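
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of character-based chunking with overlap. It assumes a plain sliding window, where each chunk starts `chunk_size - chunk_overlap` characters after the previous one. Fair Forge may split text differently (for example on sentence or heading boundaries), and the `chunk_text` helper is hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# 1200 characters with a repeating digit pattern so the overlap is visible
sample = "".join(str(i % 10) for i in range(1200))
pieces = chunk_text(sample, chunk_size=500, chunk_overlap=50)
print([len(p) for p in pieces])
```

With the values used in this notebook (500 and 50), the last 50 characters of one chunk reappear at the start of the next, so a rule that straddles a boundary is still seen whole in at least one chunk.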

## Analyze Results

Each metric contains:
- `compliance_score`: Score from 0 to 1 (ratio of supporting chunks)
- `verdict`: COMPLIANT, NON_COMPLIANT, or IRRELEVANT
- `supporting_chunks`: Number of chunks that support the response
- `contradicting_chunks`: Number of chunks that contradict the response
- `retrieved_chunks`: Detailed information about each retrieved chunk
- `insight`: Human-readable explanation

In [None]:
print(f"Total interactions evaluated: {len(metrics)}\n")

for metric in metrics:
    print(f"QA ID: {metric.qa_id}")
    # Truncate long queries to keep the report readable
    query_preview = (metric.query[:80] + "...") if len(metric.query) > 80 else metric.query
    print(f"Query: {query_preview}")
    print(f"Verdict: {metric.verdict}")
    print(f"Compliance Score: {metric.compliance_score:.2f}")
    print(f"Supporting: {metric.supporting_chunks}, Contradicting: {metric.contradicting_chunks}")
    print(f"Insight: {metric.insight}")
    print("-" * 60)
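
The fields above can be related to each other with a small sketch. It follows the two statements in this notebook: a chunk contradicts the response when its reranker score falls below `contradiction_threshold`, and `compliance_score` is the ratio of supporting chunks. The verdict rule is an assumption made for illustration (no retrieved chunks gives IRRELEVANT, any contradicting chunk gives NON_COMPLIANT), and the `summarize` helper is hypothetical. The library's actual logic may differ.

```python
def summarize(reranker_scores: list[float], contradiction_threshold: float) -> dict:
    """Derive counts, score, and verdict from per-chunk reranker scores (assumed rules)."""
    if not reranker_scores:
        # Assumption: nothing retrieved means the corpus has nothing to say
        return {
            "verdict": "IRRELEVANT",
            "compliance_score": 0.0,
            "supporting_chunks": 0,
            "contradicting_chunks": 0,
        }

    contradicting = sum(1 for s in reranker_scores if s < contradiction_threshold)
    supporting = len(reranker_scores) - contradicting
    # Assumption: a single contradicting chunk is enough to flag the response
    verdict = "NON_COMPLIANT" if contradicting else "COMPLIANT"
    return {
        "verdict": verdict,
        "compliance_score": supporting / len(reranker_scores),
        "supporting_chunks": supporting,
        "contradicting_chunks": contradicting,
    }


result = summarize([0.9, 0.7, 0.2, 0.8], contradiction_threshold=0.6)
print(result)
```

Under these assumptions, raising `contradiction_threshold` makes the metric stricter, because more chunks count as contradictions.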

## Examine Retrieved Chunks

View the regulatory chunks that were matched for each interaction.

In [None]:
for metric in metrics:
    print(f"\n=== QA ID: {metric.qa_id} ===")
    print(f"Response: {metric.assistant[:100]}...")
    print(f"\nRetrieved {len(metric.retrieved_chunks)} chunk(s):")

    for chunk in metric.retrieved_chunks:
        print(f"\n  [{chunk.verdict}] {chunk.source} (chunk #{chunk.chunk_index})")
        print(f"  Similarity: {chunk.similarity:.4f}, Reranker: {chunk.reranker_score:.4f}")
        print(f"  Preview: {chunk.text[:100]}...")

## Calculate Summary Statistics

Count the verdicts and compute the average compliance score across all evaluated interactions.

In [None]:
total = len(metrics)
compliant = sum(1 for m in metrics if m.verdict == "COMPLIANT")
non_compliant = sum(1 for m in metrics if m.verdict == "NON_COMPLIANT")
irrelevant = sum(1 for m in metrics if m.verdict == "IRRELEVANT")
# Guard against an empty result list to avoid division by zero
avg_score = sum(m.compliance_score for m in metrics) / total if total else 0.0


def pct(count: int) -> float:
    """Return count as a percentage of all interactions (0.0 if there are none)."""
    return 100 * count / total if total else 0.0


print("=== Summary ===")
print(f"Total interactions: {total}")
print(f"Compliant: {compliant} ({pct(compliant):.1f}%)")
print(f"Non-Compliant: {non_compliant} ({pct(non_compliant):.1f}%)")
print(f"Irrelevant: {irrelevant} ({pct(irrelevant):.1f}%)")
print(f"Average Compliance Score: {avg_score:.2f}")

## Export Results

Save the results to JSON for further analysis or reporting.

In [None]:
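# mode="json" converts non-JSON types (e.g. enums, datetimes) to serializable values
results = [metric.model_dump(mode="json") for metric in metrics]

output_path = Path("../data/regulatory_results.json")
output_path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if missing
with open(output_path, "w", encoding="utf-8") as f:
    json.dump(results, f, indent=2, ensure_ascii=False)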

print(f"Results saved to {output_path}")
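
For follow-up analysis, the exported file can be reloaded with the standard library alone. The sketch below writes a few made-up records to a temporary file and then filters them. It assumes each exported record carries the `qa_id`, `verdict`, and `compliance_score` fields used earlier in this notebook. The sample values are invented for illustration and are not real results.

```python
import json
import tempfile
from pathlib import Path

# Made-up records that mimic the assumed shape of the exported results
sample_results = [
    {"qa_id": "qa-1", "verdict": "COMPLIANT", "compliance_score": 1.0},
    {"qa_id": "qa-2", "verdict": "NON_COMPLIANT", "compliance_score": 0.25},
    {"qa_id": "qa-3", "verdict": "IRRELEVANT", "compliance_score": 0.0},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "regulatory_results.json"
    path.write_text(json.dumps(sample_results, indent=2), encoding="utf-8")
    # Reload exactly as a downstream report would
    loaded = json.loads(path.read_text(encoding="utf-8"))

# Keep only non-compliant interactions, lowest score first
flagged = sorted(
    (r for r in loaded if r["verdict"] == "NON_COMPLIANT"),
    key=lambda r: r["compliance_score"],
)
flagged_ids = [r["qa_id"] for r in flagged]
print(flagged_ids)
```

Pointing `path` at `../data/regulatory_results.json` instead of the temporary file would apply the same filter to the real export.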