# Detect Hallucinations
This notebook shows two different methods of detecting hallucinations:
  1) Asking an LLM to rate the output and list any potential hallucinations.  This is more of a qualitative judgment.
  2) Using an LLM specifically trained to detect hallucinations.  Here we use the [Vectara model](https://huggingface.co/vectara/hallucination_evaluation_model?) from HuggingFace.
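
As a rough illustration of the first method, the sketch below builds a rating prompt and parses a reply. The template wording, the 1 to 5 scale, the reply format, and the helper names `build_judge_prompt` and `parse_judge_reply` are all assumptions made for this example; sending the prompt to an LLM is left out because it depends on the provider used.

```python
import re

# Hypothetical prompt template: the wording and the 1-5 scale are assumptions.
JUDGE_TEMPLATE = """You are checking a generated text against its source.

Source:
{source}

Generated:
{generation}

Rate how well the generated text is supported by the source on a scale
from 1 (mostly unsupported) to 5 (fully supported), then list any claims
that the source does not support.

Answer in this format:
RATING: <1-5>
UNSUPPORTED:
- <claim>
"""


def build_judge_prompt(source, generation):
    """Fill the template with the source text and the generated text."""
    return JUDGE_TEMPLATE.format(source=source, generation=generation)


def parse_judge_reply(reply):
    """Extract the rating and the bulleted claims from a reply in the format above."""
    match = re.search(r"RATING:\s*([1-5])", reply)
    rating = int(match.group(1)) if match else None
    # Every line starting with "- " is treated as one unsupported claim.
    claims = [line[2:].strip() for line in reply.splitlines() if line.startswith("- ")]
    return rating, claims
```

A reply that does not follow the assumed format yields a rating of `None`, which a caller would need to handle, for example by retrying.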
  
This notebook is divided into three sections:
  1) Set up the environment.
  2) Set up the functions for detecting hallucinations.
  3) Test out the functions.

## 1) Set up the environment.

In [3]:
#!pip install -U sentence-transformers

Collecting sentence-transformers
  Downloading sentence-transformers-2.2.2.tar.gz (85 kB)
  Preparing metadata (setup.py) ... done
Collecting transformers<5.0.0,>=4.6.0 (from sentence-transformers)
  Downloading transformers-4.35.0-py3-none-any.whl.metadata (123 kB)
Collecting torch>=1.6.0 (from sentence-transformers)
  Downloading torch-2.1.0-cp310-cp310-manylinux1_x86_64.whl.metadata (25 kB)
Collecting torchvision (from sentence-transformers)
  Downloading torchvision-0.16.0-cp310-cp310-manylinux1_x86_64.whl.metadata (6.6 kB)
Collecting sentencepiece (from sentence-transformers)
  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

In [5]:
from sentence_transformers import CrossEncoder

model = CrossEncoder('vectara/hallucination_evaluation_model')

## 2) Set up the functions for detecting hallucinations.

In [6]:
def get_hallucination_score(source, generation):
    '''
    Score how well `generation` is supported by `source`.
    A score of less than 0.5 indicates a likely hallucination.
    Note that the context length of the model is 512 tokens across both documents.
    '''
    # Passing a single [source, generation] pair returns a single score.
    scores = model.predict([source, generation])
    return scores
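
The 512-token limit noted in the docstring means a long source cannot be scored in one call. One possible workaround, sketched below, is to split the source into windows and score the generation against each one. The helper names, the use of word count as a rough stand-in for token count, the default of 200 words, and the choice to keep the maximum score are all assumptions for illustration, not part of the Vectara model or of sentence-transformers.

```python
def split_into_windows(text, max_words=200):
    """Split text into consecutive windows of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def score_long_source(source, generation, score_fn, max_words=200):
    """Score the generation against each window of the source and keep the best score.

    Taking the maximum assumes the generation only needs support from one
    window. That is a simplification and will not suit generations whose
    claims are spread across the whole source.
    """
    # Fall back to a single empty window so max() always has one value.
    windows = split_into_windows(source, max_words) or [""]
    return max(score_fn(window, generation) for window in windows)
```

With the function defined above, a call might look like `score_long_source(long_text, generated_text, get_hallucination_score)`. A tokenizer-based length check would be more accurate than counting words.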

## 3) Test out the functions.

In [17]:
#A flag to run the tests - defaults to off so that these functions can be called by other scripts.
RUN_EXAMPLES = False
if __name__ == '__main__':
    RUN_EXAMPLES = True

In [18]:
if RUN_EXAMPLES:
    original_text = "A man walks into a bar and buys a drink"
    generated_text = "A man swigs alcohol at a pub"

    # Scores below 0.5 are treated as likely hallucinations.
    score = get_hallucination_score(original_text, generated_text)
    is_hallucination = score < 0.5

    print("Original:  ", original_text)
    print("Generated: ", generated_text)
    if is_hallucination:
        print("The generated text likely contains a hallucination. (score of %s)" % (round(score, 3)))
    else:
        print("The generated text likely does NOT contain a hallucination. (score of %s)" % (round(score, 3)))

Original:   A man walks into a bar and buys a drink
Generated:  A man swigs alcohol at a pub
The generated text likely does NOT contain a hallucination. (score of 0.534)
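
The example above scores one pair at a time. To check several pairs in one pass, a small wrapper along the following lines could apply the same 0.5 threshold to a batch. The name `flag_hallucinations` and the shape of the returned dictionaries are assumptions for this sketch; `predict_fn` is expected to take a list of `[source, generation]` pairs and return one score per pair, which is how `model.predict` is used when given a list of pairs.

```python
def flag_hallucinations(pairs, predict_fn, threshold=0.5):
    """Return one dict per (source, generation) pair with its score and a flag."""
    # predict_fn receives every pair at once and returns one score per pair.
    scores = predict_fn([list(pair) for pair in pairs])
    return [
        {
            "source": source,
            "generation": generation,
            "score": float(score),
            "is_hallucination": float(score) < threshold,
        }
        for (source, generation), score in zip(pairs, scores)
    ]
```

In the notebook this might be called as `flag_hallucinations(pairs, model.predict)`. Passing the scoring function as an argument also lets the wrapper be exercised with a stub, without loading the model.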
