# Day 20: Evaluation Suites for Language Models - Part 2

In this notebook, we'll focus on measuring and reducing hallucinations in language models, which is a critical aspect of model evaluation.

## Overview

Hallucinations in language models refer to generated content that is factually incorrect, internally inconsistent, or not supported by the provided context or real-world knowledge. Detecting and mitigating hallucinations is essential for building reliable AI systems.

In [None]:
# Import necessary libraries
import torch
import numpy as np
import pandas as pd
import json
import random
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
from transformers import AutoModelForCausalLM, AutoTokenizer

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)
random.seed(42)

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

## 1. Types of Hallucinations

Hallucinations can be categorized into several types:

1. **Factual Hallucinations**: Generating content that contradicts established facts
2. **Contextual Hallucinations**: Generating content not supported by the provided context
3. **Logical Hallucinations**: Generating content with internal contradictions or logical inconsistencies

## 2. Implementing a Hallucination Detector

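Let's implement a simple heuristic detector that flags potential hallucinations in model outputs using phrase matching and a small knowledge-base lookup. It scores surface cues rather than verifying facts, so treat its output as a rough signal.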

In [None]:
class HallucinationDetector:
    """A simple detector for identifying potential hallucinations in text."""
    
    def __init__(self):
        # Define patterns that often indicate hallucinations
        self.uncertainty_phrases = [
            "i believe", "i think", "probably", "likely", "might be", "could be",
            "possibly", "perhaps", "may have", "may be", "seems to be"
        ]
        
        self.factual_claim_patterns = [
            "in fact", "actually", "definitely", "certainly", "undoubtedly",
            "always", "never", "everyone knows", "it is well known", "clearly"
        ]
        
        self.contradiction_indicators = [
            "however", "but", "although", "nevertheless", "conversely",
            "on the other hand", "in contrast", "yet", "instead", "rather"
        ]
        
        # Knowledge base for fact-checking (very limited for demonstration)
        self.knowledge_base = {
            "earth": {"shape": "oblate spheroid", "satellite": "moon", "star": "sun"},
            "water": {"boiling point": "100°C", "freezing point": "0°C", "molecule": "H2O"},
            "human body": {"bones": "206", "blood type": "A, B, AB, O", "brain": "cerebrum, cerebellum, brainstem"},
            "python": {"creator": "Guido van Rossum", "first release": "1991", "type": "programming language"},
            "shakespeare": {"born": "1564", "died": "1616", "works": "hamlet, macbeth, romeo and juliet"}
        }
    
    def check_uncertainty(self, text):
        """Check if text contains uncertainty phrases."""
        text_lower = text.lower()
        found_phrases = [phrase for phrase in self.uncertainty_phrases if phrase in text_lower]
        return len(found_phrases) > 0, found_phrases
    
    def check_strong_claims(self, text):
        """Check if text contains strong factual claims."""
        text_lower = text.lower()
        found_phrases = [phrase for phrase in self.factual_claim_patterns if phrase in text_lower]
        return len(found_phrases) > 0, found_phrases
    
    def check_contradictions(self, text):
        """Check for potential contradictions in text."""
        text_lower = text.lower()
        found_phrases = [phrase for phrase in self.contradiction_indicators if phrase in text_lower]
        return len(found_phrases) > 0, found_phrases
    
    def check_against_knowledge_base(self, text):
        """Check text against knowledge base for factual accuracy."""
        text_lower = text.lower()
        potential_errors = []
        
        # Very simple fact checking (for demonstration only)
        for topic, facts in self.knowledge_base.items():
            if topic in text_lower:
                for attribute, value in facts.items():
                    # Check if attribute is mentioned
                    if attribute in text_lower:
                        # Check if the correct value is mentioned. Compare in lowercase:
                        # text_lower is lowercased, so values such as "100°C" or "H2O"
                        # would otherwise never match.
                        if value.lower() not in text_lower:
                            potential_errors.append(f"Potential error about {topic}'s {attribute}: expected '{value}'")
        
        return len(potential_errors) > 0, potential_errors
    
    def detect_hallucinations(self, text):
        """Detect potential hallucinations in text."""
        results = {
            "has_uncertainty": False,
            "uncertainty_phrases": [],
            "has_strong_claims": False,
            "strong_claim_phrases": [],
            "has_contradictions": False,
            "contradiction_phrases": [],
            "has_factual_errors": False,
            "potential_factual_errors": [],
            "hallucination_score": 0.0,
            "hallucination_risk": "low"
        }
        
        # Check for uncertainty
        results["has_uncertainty"], results["uncertainty_phrases"] = self.check_uncertainty(text)
        
        # Check for strong claims
        results["has_strong_claims"], results["strong_claim_phrases"] = self.check_strong_claims(text)
        
        # Check for contradictions
        results["has_contradictions"], results["contradiction_phrases"] = self.check_contradictions(text)
        
        # Check against knowledge base
        results["has_factual_errors"], results["potential_factual_errors"] = self.check_against_knowledge_base(text)
        
        # Calculate hallucination score (simple heuristic)
        score = 0.0
        if results["has_uncertainty"]:
            score += 0.2 * len(results["uncertainty_phrases"])
        if results["has_strong_claims"]:
            score += 0.3 * len(results["strong_claim_phrases"])
        if results["has_contradictions"]:
            score += 0.4 * len(results["contradiction_phrases"])
        if results["has_factual_errors"]:
            score += 0.5 * len(results["potential_factual_errors"])
        
        results["hallucination_score"] = min(1.0, score)  # Cap at 1.0
        
        # Determine risk level
        if results["hallucination_score"] < 0.3:
            results["hallucination_risk"] = "low"
        elif results["hallucination_score"] < 0.6:
            results["hallucination_risk"] = "medium"
        else:
            results["hallucination_risk"] = "high"
        
        return results
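
To make the scoring heuristic concrete, the sketch below recomputes the score and risk level from match counts alone. The function name `risk_from_counts` is hypothetical; the weights and thresholds are copied from `detect_hallucinations` above.

```python
def risk_from_counts(n_uncertainty=0, n_strong=0, n_contradiction=0, n_factual=0):
    """Recompute the detector's score and risk level from match counts (hypothetical helper)."""
    # Same per-item weights as detect_hallucinations: 0.2, 0.3, 0.4, 0.5
    score = 0.2 * n_uncertainty + 0.3 * n_strong + 0.4 * n_contradiction + 0.5 * n_factual
    score = min(1.0, score)  # cap at 1.0, as in the class
    if score < 0.3:
        risk = "low"
    elif score < 0.6:
        risk = "medium"
    else:
        risk = "high"
    return score, risk

print(risk_from_counts(n_uncertainty=1))                 # one hedging phrase
print(risk_from_counts(n_strong=1))                      # one strong-claim phrase
print(risk_from_counts(n_strong=2))                      # two strong-claim phrases
print(risk_from_counts(n_contradiction=2, n_factual=1))  # capped at 1.0
```

A single strong-claim phrase such as "definitely" is already enough to reach "medium", so an accurate sentence can still be flagged. The score reflects surface cues, not verified facts.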

## 3. Testing the Hallucination Detector

Let's test our hallucination detector on some example texts.

In [None]:
# Create hallucination detector
detector = HallucinationDetector()

# Test texts with varying levels of potential hallucination
test_texts = [
    "The Earth is an oblate spheroid and orbits around the Sun. It has one natural satellite, the Moon.",
    "I believe the Earth might be flat, although scientists claim it's round. In fact, many people throughout history have believed the Earth is flat.",
    "Water boils at 100°C at sea level, but it actually boils at 90°C on mountains. However, it might boil at different temperatures depending on altitude.",
    "Shakespeare was born in 1564 and died in 1616. He definitely wrote 154 sonnets and 37 plays, including Hamlet, Macbeth, and Romeo and Juliet.",
    "Python was created by Guido van Rossum in 1991. It's certainly the most popular programming language ever created, and everyone knows it's the easiest to learn."
]

# Test the detector
for i, text in enumerate(test_texts):
    print(f"\nText {i+1}:\n{text}\n")
    results = detector.detect_hallucinations(text)
    print(f"Hallucination Risk: {results['hallucination_risk']} (Score: {results['hallucination_score']:.2f})")
    
    if results["has_uncertainty"]:
        print(f"Uncertainty Phrases: {', '.join(results['uncertainty_phrases'])}")
    
    if results["has_strong_claims"]:
        print(f"Strong Claims: {', '.join(results['strong_claim_phrases'])}")
    
    if results["has_contradictions"]:
        print(f"Potential Contradictions: {', '.join(results['contradiction_phrases'])}")
    
    if results["has_factual_errors"]:
        print(f"Potential Factual Errors: {', '.join(results['potential_factual_errors'])}")
    
    print("-" * 50)
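
The `phrase in text_lower` checks in the detector match substrings, so a short indicator such as "but" also fires inside longer words like "attribute". A minimal sketch of whole-word matching with the standard `re` module follows; `find_phrases` is a hypothetical helper, not part of the class above.

```python
import re

def find_phrases(text, phrases):
    """Return the phrases that occur in text as whole words or whole phrases."""
    text_lower = text.lower()
    found = []
    for phrase in phrases:
        # \b anchors the match at word boundaries; re.escape guards special characters
        if re.search(r"\b" + re.escape(phrase) + r"\b", text_lower):
            found.append(phrase)
    return found

indicators = ["but", "yet", "rather"]
sentence = "Each attribute is stored in the knowledge base."

print([p for p in indicators if p in sentence.lower()])  # substring check: ['but']
print(find_phrases(sentence, indicators))                # whole-word check: []
```

Swapping a helper like this into the three `check_*` methods would remove one source of false positives without changing the rest of the scoring logic.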

## 4. Implementing a Retrieval-Augmented Generation (RAG) System

One effective way to reduce hallucinations is through Retrieval-Augmented Generation (RAG), which grounds model responses in retrieved information.

In [None]:
class SimpleRAG:
    """A simple Retrieval-Augmented Generation system."""
    
    def __init__(self, model, tokenizer, knowledge_base):
        self.model = model
        self.tokenizer = tokenizer
        self.knowledge_base = knowledge_base
    
    def retrieve_relevant_info(self, query):
        """Retrieve relevant information from knowledge base."""
        query_lower = query.lower()
        relevant_info = []
        
        # Simple keyword matching (in a real system, this would use embeddings or BM25)
        for topic, facts in self.knowledge_base.items():
            if topic in query_lower:
                topic_info = f"{topic.capitalize()}: "
                fact_strings = [f"{attr} is {val}" for attr, val in facts.items()]
                topic_info += ", ".join(fact_strings)
                relevant_info.append(topic_info)
        
        return "\n".join(relevant_info) if relevant_info else ""
    
    def generate_response(self, query, max_new_tokens=100):
        """Generate a response using RAG."""
        # Retrieve relevant information
        retrieved_info = self.retrieve_relevant_info(query)
        
        # Create augmented prompt
        if retrieved_info:
            augmented_prompt = f"""Based on the following information:\n\n{retrieved_info}\n\nPlease answer: {query}"""
        else:
            augmented_prompt = f"Please answer: {query}"
        
        # Generate response
        inputs = self.tokenizer(augmented_prompt, return_tensors="pt").to(device)
        
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,  # pass attention_mask along with input_ids
                max_new_tokens=max_new_tokens,
                do_sample=True,
                temperature=0.7,
                top_p=0.9,
                pad_token_id=self.tokenizer.pad_token_id
            )
        
        # Decode only the newly generated tokens rather than slicing the decoded string
        prompt_length = inputs["input_ids"].shape[1]
        response = self.tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
        
        return {
            "query": query,
            "retrieved_info": retrieved_info,
            "augmented_prompt": augmented_prompt,
            "response": response
        }
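
The keyword lookup in `retrieve_relevant_info` only fires when a topic name appears verbatim in the query. As a rough sketch of the BM25 alternative mentioned in the code comment, the following ranks entries by Okapi BM25 score using only the standard library. The helper names (`tokenize`, `bm25_rank`), the toy documents, and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_rank(query, documents, k1=1.5, b=0.75):
    """Rank document names by Okapi BM25 score for the query (illustrative sketch)."""
    tokenized = {name: tokenize(text) for name, text in documents.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term
    doc_freq = Counter(term for toks in tokenized.values() for term in set(toks))

    scores = {}
    for name, toks in tokenized.items():
        term_freq = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in term_freq:
                continue
            # Smoothed IDF keeps the weight positive even for common terms
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Length normalization: longer-than-average documents are penalized
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores[name] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy documents built from the same facts as the demo knowledge base
documents = {
    "water": "water boiling point is 100°C, freezing point is 0°C, molecule is H2O",
    "python": "python creator is Guido van Rossum, first release is 1991",
    "shakespeare": "shakespeare born 1564, died 1616, works hamlet, macbeth",
}
ranking = bm25_rank("At what temperature is the freezing point?", documents)
print(ranking[0][0])  # water
```

Unlike the keyword lookup, this query never mentions "water" by name, yet the entry still ranks first because it shares the terms "freezing" and "point".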

## 5. Testing the RAG System

Let's test our RAG system and compare it with standard generation.

In [None]:
# Load a model for demonstration
try:
    model_name = "gpt2"  # Using a small model for demonstration
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
    
    # Add padding token if it doesn't exist
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        model.config.pad_token_id = model.config.eos_token_id
    
    print(f"Model loaded: {model_name}")
    
    # Create RAG system
    rag = SimpleRAG(model, tokenizer, detector.knowledge_base)
    
    # Test queries
    test_queries = [
        "What shape is the Earth?",
        "Tell me about water's boiling and freezing points.",
        "When was Shakespeare born and what are some of his works?",
        "Who created Python and when?"
    ]
    
    # Standard generation function
    def standard_generate(query, max_new_tokens=100):
        inputs = tokenizer(f"Please answer: {query}", return_tensors="pt").to(device)
        
        with torch.no_grad():
            outputs = model.generate(
                **inputs,  # pass attention_mask along with input_ids
                max_new_tokens=max_new_tokens,
                do_sample=True,
                temperature=0.7,
                top_p=0.9,
                pad_token_id=tokenizer.pad_token_id
            )
        
        # Decode only the newly generated tokens rather than slicing the decoded string
        prompt_length = inputs["input_ids"].shape[1]
        response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
        
        return response
    
    # Compare standard generation vs. RAG
    for query in test_queries:
        print(f"\nQuery: {query}\n")
        
        # Standard generation
        std_response = standard_generate(query)
        print(f"Standard Response:\n{std_response}\n")
        std_results = detector.detect_hallucinations(std_response)
        print(f"Hallucination Risk: {std_results['hallucination_risk']} (Score: {std_results['hallucination_score']:.2f})\n")
        
        # RAG generation
        rag_result = rag.generate_response(query)
        print(f"Retrieved Info: {rag_result['retrieved_info']}\n")
        print(f"RAG Response:\n{rag_result['response']}\n")
        rag_results = detector.detect_hallucinations(rag_result['response'])
        print(f"Hallucination Risk: {rag_results['hallucination_risk']} (Score: {rag_results['hallucination_score']:.2f})")
        
        print("-" * 50)
    
except Exception as e:
    print(f"Error running RAG comparison: {e}")
    print("Skipping RAG demonstration.")
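
The heuristic detector scores surface phrasing, so it says little about whether a RAG response actually uses the retrieved facts. One rough, complementary check for contextual hallucinations is to measure how many retrieved values reappear in the response. The function `fact_coverage` below is a hypothetical helper, and the sample response is made up for illustration.

```python
def fact_coverage(response, facts):
    """Return the fraction of fact values that appear verbatim in the response."""
    if not facts:
        return 0.0
    response_lower = response.lower()
    # Case-insensitive substring match; paraphrased facts will be missed
    hits = [attr for attr, value in facts.items() if value.lower() in response_lower]
    return len(hits) / len(facts)

# Facts in the same shape as one knowledge-base entry
python_facts = {
    "creator": "Guido van Rossum",
    "first release": "1991",
    "type": "programming language",
}
# Made-up response text, not actual model output
sample_response = "Python was created by Guido van Rossum and first released in 1991."

print(f"{fact_coverage(sample_response, python_facts):.2f}")  # 0.67 (2 of 3 values found)
```

Comparing this number for the standard and RAG responses to the same query gives a more direct, if still crude, view of grounding than the phrase-based risk score.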

## 6. Conclusion

In this notebook, we've explored techniques for measuring and reducing hallucinations in language models:

1. We implemented a simple hallucination detector that can identify potential issues in model outputs
2. We created a basic Retrieval-Augmented Generation (RAG) system to ground model responses in factual information
3. We compared standard generation with RAG, scoring both with the heuristic detector, to illustrate how retrieval can ground responses and reduce hallucinations

These techniques form an important part of a comprehensive evaluation framework for language models. By systematically measuring and addressing hallucinations, we can build more reliable and trustworthy AI systems.

In practice, more sophisticated approaches would be used, including:

- Advanced retrieval methods using dense embeddings or hybrid search
- More comprehensive knowledge bases or external APIs
- LLM-based evaluators for more nuanced hallucination detection
- Self-consistency techniques that generate multiple responses and check for agreement
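
As a rough illustration of the self-consistency idea in the last bullet, the sketch below takes several answers sampled for the same query and reports how strongly they agree. The function name `agreement_rate`, the normalization step, and the sample answers are assumptions made for this example; in practice the answers would come from repeated sampling calls to the model.

```python
from collections import Counter

def agreement_rate(answers):
    """Return the most common normalized answer and the share of samples that match it."""
    if not answers:
        return None, 0.0
    # Normalize lightly so differences in case or trailing periods do not count as disagreement
    normalized = [a.strip().lower().rstrip(".") for a in answers]
    top_answer, count = Counter(normalized).most_common(1)[0]
    return top_answer, count / len(normalized)

# Hypothetical samples for "When was Python first released?"
samples = ["1991", "1991.", "1989", "1991", "1994"]
answer, rate = agreement_rate(samples)
print(answer, rate)  # 1991 0.6
```

Low agreement suggests the model is guessing. High agreement is weaker evidence, since a model can be consistently wrong.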

The key takeaway is that hallucination detection and mitigation should be an integral part of any language model evaluation suite.