# Detect Hallucinations
This notebook shows three different methods of detecting hallucinations:
  1) Asking an LLM to rate the output and list any potential hallucinations. This is more of a qualitative judgment.
  2) Using a model specifically trained to detect hallucinations. Here we use the [Vectara model](https://huggingface.co/vectara/hallucination_evaluation_model?) from HuggingFace.
  3) Contextual Embedding Similarity Analysis, which evaluates the semantic similarity between the source and generated texts using deep contextual embeddings.
  
This notebook is divided into four sections:
  1) Set up the environment.
  2) Set up the functions for detecting hallucinations.
  3) Test out the functions.
  4) Repeat the three steps above for Contextual Embedding Similarity Analysis.
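The cells below implement methods 2 and 3. For method 1 in the list above, a minimal sketch of the surrounding plumbing could look like the following. It assumes you send the prompt to an LLM client of your choice and that the model follows a JSON reply format we request in the prompt; `JUDGE_TEMPLATE`, `build_judge_prompt`, `parse_judge_reply`, and the 1 to 5 rating scale are hypothetical, and the canned reply stands in for a real model call.

```python
import json
import re

# Hypothetical prompt for method 1 (LLM-as-judge). The wording, the 1-5 scale,
# and the JSON reply format are assumptions, not a fixed API.
JUDGE_TEMPLATE = """You are checking a generated text against its source.
Source:
{source}

Generated:
{generation}

Rate how faithful the generated text is to the source on a scale of 1 (many
unsupported claims) to 5 (fully supported), and list every claim in the
generated text that the source does not support.
Reply with JSON only: {{"rating": <int>, "hallucinations": [<string>, ...]}}"""


def build_judge_prompt(source: str, generation: str) -> str:
    """Fill in the template; send the result to whichever LLM you use."""
    return JUDGE_TEMPLATE.format(source=source, generation=generation)


def parse_judge_reply(reply: str) -> dict:
    """Pull the JSON object out of the reply and validate the rating."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    data = json.loads(match.group(0))
    rating = int(data["rating"])
    if not 1 <= rating <= 5:
        raise ValueError(f"rating out of range: {rating}")
    return {"rating": rating, "hallucinations": list(data.get("hallucinations", []))}


prompt = build_judge_prompt("A man walks into a bar and buys a drink",
                            "A man swigs alcohol at a pub")
# Canned reply standing in for a real LLM call (made up for illustration).
canned = 'Sure: {"rating": 4, "hallucinations": ["the drink being alcohol is not stated"]}'
print(parse_judge_reply(canned))
```

Searching for the outermost braces tolerates chatty preambles such as "Sure:", which LLMs often add even when asked for JSON only.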

## 1) Set up the environment.

In [1]:
!pip install -U sentence-transformers

Collecting sentence-transformers
  Downloading sentence_transformers-2.7.0-py3-none-any.whl (171 kB)

In [2]:
from sentence_transformers import CrossEncoder

model = CrossEncoder('vectara/hallucination_evaluation_model')


## 2) Set up the functions for detecting hallucinations.

In [3]:
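def get_hallucination_score(source, generation):
    '''
    Return the Vectara model's factual-consistency score for `generation` given `source`.
    Scores range from 0 to 1; a score below 0.5 indicates a likely hallucination.
    Note that the model's context length is 512 tokens across both texts combined.
    '''
    # A single [source, generation] pair yields a single score.
    score = model.predict([source, generation])
    return score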
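Because the model sees at most 512 tokens across both texts, a long source can be truncated before it is scored. One common workaround, sketched here as an assumption rather than anything the Vectara model card prescribes, is to split the source into chunks, score the generation against each chunk, and keep the highest score. `split_into_chunks`, `chunked_hallucination_score`, and `toy_score` are hypothetical helpers; with the real model you would pass `get_hallucination_score` as `score_fn`.

```python
import re
from typing import Callable, List

Scorer = Callable[[str, str], float]  # (source, generation) -> consistency score


def split_into_chunks(source: str, sentences_per_chunk: int = 3) -> List[str]:
    """Naive split on '.', '!' or '?' followed by whitespace, then group sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", source.strip()) if s]
    return [" ".join(sentences[i:i + sentences_per_chunk])
            for i in range(0, len(sentences), sentences_per_chunk)]


def chunked_hallucination_score(source: str, generation: str, score_fn: Scorer,
                                sentences_per_chunk: int = 3) -> float:
    """Score the generation against every chunk and keep the best score.

    Heuristic: the generation counts as supported if any single chunk supports it.
    """
    chunks = split_into_chunks(source, sentences_per_chunk) or [source]
    return max(score_fn(chunk, generation) for chunk in chunks)


def toy_score(source: str, generation: str) -> float:
    """Hypothetical stand-in scorer: share of generation words that appear in source."""
    src_words = set(re.findall(r"\w+", source.lower()))
    gen_words = re.findall(r"\w+", generation.lower())
    return sum(w in src_words for w in gen_words) / len(gen_words)


long_source = "The shop opens at nine. It sells bread. The owner is Ana. Prices are low."
print(chunked_hallucination_score(long_source, "The owner is Ana", toy_score,
                                  sentences_per_chunk=2))  # 1.0
```

Taking the maximum is cheap but can miss a claim that is only supported by combining facts from two different chunks.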

## 3) Test out the functions.

In [4]:
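# A flag to run the examples. It defaults to off so that these functions can be imported by other scripts;
# `__name__` is '__main__' only when this code runs as the main program (including inside a notebook).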
RUN_EXAMPLES = False
if __name__ == '__main__':
    RUN_EXAMPLES = True
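The flag works because Python sets `__name__` to `'__main__'` only for the code being run directly (a script, or a notebook kernel) and to the module's own name when the file is imported. A quick way to confirm the flag stays off on import is to write the same three lines to a throwaway module in a temporary directory and import it; the module name `detect_hallucinations_demo` is hypothetical.

```python
import importlib.util
import pathlib
import tempfile
import textwrap

# The same guard as in the cell above, saved as a standalone module.
module_source = textwrap.dedent("""
    RUN_EXAMPLES = False
    if __name__ == '__main__':
        RUN_EXAMPLES = True
""")

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "detect_hallucinations_demo.py"
    path.write_text(module_source)
    spec = importlib.util.spec_from_file_location("detect_hallucinations_demo", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # inside the module, __name__ is the module name
    imported_flag = module.RUN_EXAMPLES

print(imported_flag)  # False
```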

In [5]:
if RUN_EXAMPLES:
    original_text = "A man walks into a bar and buys a drink"
    generated_text = "A man swigs alcohol at a pub"

    score = get_hallucination_score(original_text, generated_text)
    hallucinated = score < 0.5  # below 0.5 indicates a likely hallucination

    print("Original:  ", original_text)
    print("Generated: ", generated_text)
    if hallucinated:
        print(f"The generated text likely contains a hallucination. (score of {round(float(score), 3)})")
    else:
        print(f"The generated text likely does NOT contain a hallucination. (score of {round(float(score), 3)})")

Original:   A man walks into a bar and buys a drink
Generated:  A man swigs alcohol at a pub
The generated text likely does NOT contain a hallucination. (score of 0.534)
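A score of 0.534 sits just above the 0.5 cutoff, so a single hard threshold hides how borderline this case is. A minimal sketch, assuming a hypothetical "uncertain" band of ±0.1 around the threshold (the band width is an arbitrary choice, not something the Vectara model defines), maps a score to one of three labels; `verdict` is a hypothetical helper.

```python
def verdict(score: float, threshold: float = 0.5, margin: float = 0.1) -> str:
    """Map a consistency score (near 0 = hallucinated, near 1 = consistent) to a label.

    The `margin` band around `threshold` is an assumption for illustration.
    """
    if score < threshold - margin:
        return "likely hallucination"
    if score < threshold + margin:
        return "uncertain"
    return "likely consistent"


print(verdict(0.534))  # uncertain
print(verdict(0.95))   # likely consistent
```

`CrossEncoder.predict` also accepts a list of `[source, generation]` pairs and returns one score per pair, so many examples can be scored in one call and each score passed through `verdict`.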


## 4) Contextual Embedding Similarity Analysis

In [7]:
from transformers import AutoTokenizer, AutoModel
import torch
from scipy.spatial.distance import cosine

# Load an embedding model and its tokenizer. A separate name avoids overwriting
# the Vectara `model` used by get_hallucination_score above.
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
embed_model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')

In [9]:
def get_embedding(text):
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():  # inference only, so no gradients are tracked
        outputs = embed_model(**inputs)
    # Average pooling over tokens; a single text is not padded, so a plain mean is safe here
    embeddings = outputs.last_hidden_state.mean(dim=1)
    return embeddings.squeeze()  # 1D tensor for a single input text

def calculate_similarity(source_text, generated_text):
    source_embedding = get_embedding(source_text)
    generated_embedding = get_embedding(generated_text)
    # scipy's `cosine` returns a distance, so similarity = 1 - distance
    similarity = 1 - cosine(source_embedding.numpy(), generated_embedding.numpy())
    return similarity

def is_hallucination(source_text, generated_text, threshold=0.8):
    '''Flag a likely hallucination when the cosine similarity falls below `threshold`.'''
    similarity = calculate_similarity(source_text, generated_text)
    return similarity < threshold, similarity
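The pooling and similarity steps can be illustrated in plain Python on toy vectors. Averaging only over positions where the attention mask is 1 matters once several texts are padded to the same length in a batch, because padded positions still produce hidden states. `masked_mean_pool`, `cosine_similarity`, and the two-dimensional "hidden states" are hypothetical stand-ins for the real 384-dimensional model outputs.

```python
import math
from typing import List


def masked_mean_pool(token_vectors: List[List[float]], mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1, skipping padding."""
    kept = [vec for vec, keep in zip(token_vectors, mask) if keep]
    dim = len(token_vectors[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity; equals 1 - scipy.spatial.distance.cosine(a, b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy hidden states for three tokens; the last one is padding (mask 0).
hidden = [[1.0, 0.0], [3.0, 2.0], [0.0, 6.0]]
mask = [1, 1, 0]

pooled = masked_mean_pool(hidden, mask)
naive = masked_mean_pool(hidden, [1, 1, 1])  # plain mean, padding included
print(pooled)                                  # [2.0, 1.0]
print(naive)                                   # pulled toward the padding row
print(cosine_similarity(pooled, [-1.0, 2.0]))  # 0.0 (orthogonal vectors)
```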

In [12]:
# Testing the functions
source = "A man walks into a bar and buys a drink."
generated = "A man goes to a park and rides a bicycle."

hallucination_detected, similarity = is_hallucination(source, generated)
print(f"Hallucination Detected: {hallucination_detected}, Similarity Score: {similarity}")

Hallucination Detected: True, Similarity Score: 0.3324964940547943
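The 0.8 threshold in `is_hallucination` is a fixed choice. If you have a small set of source/generation pairs labeled by hand, one way to pick the cutoff, sketched here as an assumption, is to try each observed similarity as a cutoff and keep the one with the highest accuracy. `best_threshold` is a hypothetical helper, and the scores below are made up for illustration, not measured with all-MiniLM-L6-v2.

```python
from typing import List, Tuple


def best_threshold(scored: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Return (cutoff, accuracy) maximising accuracy on (similarity, is_hallucination) pairs.

    A pair is predicted as a hallucination when similarity < cutoff. Candidate
    cutoffs are the observed scores plus one value just above the maximum.
    """
    candidates = sorted({s for s, _ in scored}) + [max(s for s, _ in scored) + 1e-9]
    best = (candidates[0], -1.0)
    for cutoff in candidates:
        correct = sum((s < cutoff) == label for s, label in scored)
        accuracy = correct / len(scored)
        if accuracy > best[1]:
            best = (cutoff, accuracy)
    return best


# Made-up labeled similarities for illustration only.
toy = [(0.33, True), (0.45, True), (0.62, True), (0.70, False), (0.85, False), (0.91, False)]
print(best_threshold(toy))  # (0.7, 1.0)
```

Keep in mind that cosine similarity measures how close two texts are in overall meaning, so a generation that changes a single fact (a number or a negation) can still score above the cutoff.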
