# Artefactual Package Demo: Hallucination Detection

This notebook demonstrates the `artefactual` package for scoring LLM outputs, with a focus on hallucination detection using entropy-based methods.

We explore two examples:
1.  **General Knowledge Question**: Asking for the capital of France (expected high certainty/low entropy).
2.  **Hallucination Trigger**: Asking about the first author of our paper (Charles Moslonka) to observe how the model hallucinates biographical details (expected higher entropy/uncertainty).

We will use:
*   `vLLM` for efficient inference with the `Falcon3-10B-Instruct` model.
*   `WEPR` (Weighted EPR) scorer from the `artefactual` package to quantify the uncertainty of the generated sequences.
*   Visualizations to inspect token-level scores, highlighting potentially hallucinated segments.
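The entropy-based methods mentioned above start from how spread out the model's next-token distribution is at each generated position. The sketch below is a minimal, generic illustration (plain top-K Shannon entropy, not the `artefactual` WEPR implementation). It estimates the entropy at one position from the top-K logprobs that vLLM returns, renormalising them because K candidates do not cover the whole vocabulary. The helper name `topk_entropy` and the toy probabilities are illustrative assumptions.

```python
import math


def topk_entropy(logprobs: list[float]) -> float:
    """Shannon entropy (in nats) of a top-K distribution, renormalised to sum to 1.

    `logprobs` holds natural-log probabilities of the K most likely tokens at one
    position (hypothetical helper, for illustration only).
    """
    # Subtract the max before exponentiating for numerical stability
    m = max(logprobs)
    weights = [math.exp(lp - m) for lp in logprobs]
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0)


# A confident position: almost all probability mass on one token (toy values)
confident = [math.log(0.97), math.log(0.02), math.log(0.01)]
# An uncertain position: mass spread over several candidates (toy values)
uncertain = [math.log(0.3), math.log(0.25), math.log(0.25), math.log(0.2)]

print(f"confident: {topk_entropy(confident):.3f} nats")
print(f"uncertain: {topk_entropy(uncertain):.3f} nats")
```

A peaked distribution gives low entropy, while a flat one approaches `log(K)`. Truncating to K candidates ignores the tail mass, so this is only an approximation of the full-vocabulary entropy.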

```python
# Import modules
from pprint import pprint

from IPython.display import HTML, display
from vllm import LLM, SamplingParams

from artefactual.preprocessing import parse_model_outputs
from artefactual.scoring import WEPR
```

```python
MODEL_CHECKPOINT = "tiiuae/Falcon3-10B-Instruct"

llm = LLM(
    model=MODEL_CHECKPOINT,
    tensor_parallel_size=2,  # number of GPUs to shard across; adjust based on your hardware
)
```

```python
# A well-known fact (expected low uncertainty)
prompt1 = ["What is the capital city of France ? Please answer briefly."]
# A question likely to trigger hallucinated biographical details
prompt2 = ["Who is Charles Moslonka ? Where was he born ? Please answer in two sentences."]

# logprobs=15 asks vLLM to return the top-15 candidate logprobs at every generated position
sampling_params = SamplingParams(temperature=0.7, top_p=1.0, top_k=50, logprobs=15, max_tokens=2000)
print("Running inference with vLLM...")  # noqa: T201
outputs1 = llm.generate(prompt1, sampling_params)
outputs2 = llm.generate(prompt2, sampling_params)
```

```
Running inference with vLLM...
```




```python
completion_1 = outputs1[0].outputs[0]
completion_2 = outputs2[0].outputs[0]

# `logprobs` has one dict per generated position, mapping token IDs to Logprob
# objects; the sampled token is always a key, so its decoded text can be looked up.
list_of_sampled_tokens_1 = [
    position[tok_id].decoded_token
    for tok_id, position in zip(completion_1.token_ids, completion_1.logprobs)
]
list_of_sampled_tokens_2 = [
    position[tok_id].decoded_token
    for tok_id, position in zip(completion_2.token_ids, completion_2.logprobs)
]

print(f"Output 1 generated sequence :\n{completion_1.text}")  # noqa: T201
print("\n**** Output 1 Tokens and Logprobs ****\n")  # noqa: T201
for tok_id, token in zip(completion_1.token_ids, list_of_sampled_tokens_1):
    pprint(f"Token ID: {tok_id}, Decoded_token: {token}")

print("\n\n")  # noqa: T201
print(f"Output 2 generated sequence :\n{completion_2.text}")  # noqa: T201
print("\n\n")  # noqa: T201
print("\n\n**** Output 2 Tokens and Logprobs ****\n\n")  # noqa: T201
for tok_id, token in zip(completion_2.token_ids, list_of_sampled_tokens_2):
    pprint(f"Token ID: {tok_id}, Decoded_token: {token}")
```

```
Output 1 generated sequence :

<|assistant|>
Paris.

**** Output 1 Tokens and Logprobs ****

'Token ID: 12, Decoded_token: \n'
'Token ID: 2051, Decoded_token: <'
'Token ID: 2115, Decoded_token: |'
'Token ID: 91961, Decoded_token: assistant'
'Token ID: 100846, Decoded_token: |>'
'Token ID: 12, Decoded_token: \n'
'Token ID: 41308, Decoded_token: Paris'
'Token ID: 2037, Decoded_token: .'
'Token ID: 11, Decoded_token: <|endoftext|>'



Output 2 generated sequence :

<|assistant|>
Charles Moslonka is a South African writer and journalist. He was born in Cape Town.




**** Output 2 Tokens and Logprobs ****


'Token ID: 12, Decoded_token: \n'
'Token ID: 2051, Decoded_token: <'
'Token ID: 2115, Decoded_token: |'
'Token ID: 91961, Decoded_token: assistant'
'Token ID: 100846, Decoded_token: |>'
'Token ID: 12, Decoded_token: \n'
'Token ID: 33350, Decoded_token: Charles'
'Token ID: 20109, Decoded_token:  Mos'
'Token ID: 15465, Decoded_token: lon'
'Token ID: 7406, Decoded_token: ka'
'Token ID: 234
```
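In vLLM, `CompletionOutput.logprobs` is a list with one dict per generated position. Each dict maps token IDs to `Logprob` objects that carry `logprob`, `rank` and `decoded_token`. The sampled token is always included, which is why the cell above can index each dict with the sampled token ID.

The sketch below walks that structure without needing a GPU. It relies on the following assumptions:

* `FakeLogprob` is a hypothetical stand-in dataclass for vLLM's `Logprob`.
* `extract_positions` is a hypothetical helper.
* The token IDs and values are toy data, not taken from the run above.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for vLLM's Logprob; only these three attributes are used here
@dataclass
class FakeLogprob:
    logprob: float
    rank: Optional[int] = None
    decoded_token: Optional[str] = None


def extract_positions(token_ids, position_logprobs):
    """Return (decoded sampled token, its logprob, all returned logprobs sorted) per position."""
    rows = []
    for tok_id, candidates in zip(token_ids, position_logprobs):
        sampled = candidates[tok_id]  # the sampled token is always a key of the dict
        returned = sorted((c.logprob for c in candidates.values()), reverse=True)
        rows.append((sampled.decoded_token, sampled.logprob, returned))
    return rows


# Two toy positions shaped like completion.logprobs (IDs and values are made up)
token_ids = [10, 20]
position_logprobs = [
    {10: FakeLogprob(-0.01, 1, "Paris"), 11: FakeLogprob(-4.8, 2, "The")},
    {20: FakeLogprob(-0.05, 1, "."), 21: FakeLogprob(-3.1, 2, "!")},
]

rows = extract_positions(token_ids, position_logprobs)
for token, lp, returned in rows:
    print(f"{token!r}: sampled logprob={lp:.2f}, returned logprobs={returned}")
```

With real outputs, pass `outputs1[0].outputs[0].token_ids` and `outputs1[0].outputs[0].logprobs` instead of the toy lists.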

### Instantiate the WEPR scorer

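First, convert the raw vLLM outputs with the `parse_model_outputs` function. The scorer is instantiated with the same checkpoint name used for generation.

Then pass the parsed outputs to `scorer.compute()` to get one sequence-level score per output, and to `scorer.compute_token_scores()` to get one score per generated token.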

```python
# Instantiate the scorer with the same checkpoint name used for generation
scorer = WEPR(pretrained_model_name_or_path=MODEL_CHECKPOINT)

# First output: one sequence-level score, then one score per generated token
processed_prompts_1 = parse_model_outputs(outputs1)
wepr_scores_1 = scorer.compute(processed_prompts_1)
print(f"WEPR Sequence Score for the first output: {wepr_scores_1}")  # noqa: T201
print("\n")  # noqa: T201
wepr_token_scores_1 = scorer.compute_token_scores(processed_prompts_1)
print(f"WEPR Token-level Scores for the first output: {wepr_token_scores_1}")  # noqa: T201
print("\n")  # noqa: T201
print("*" * 40)  # noqa: T201
print("\n")  # noqa: T201

# Second output: same two calls
processed_prompts_2 = parse_model_outputs(outputs2)
wepr_scores_2 = scorer.compute(processed_prompts_2)
print(f"WEPR Sequence Score for the second output: {wepr_scores_2}")  # noqa: T201
print("\n")  # noqa: T201
wepr_token_scores_2 = scorer.compute_token_scores(processed_prompts_2)
print(f"WEPR Token-level Scores for the second output: {wepr_token_scores_2}")  # noqa: T201
```

```
WEPR Sequence Score for the first output: [0.22794435452266917]


WEPR Token-level Scores for the first output: [array([0.11841268, 0.08238266, 0.07749902, 0.07747021, 0.08158927,
       0.07747053, 0.0351684 , 0.11387737, 0.07800352], dtype=float32)]


****************************************


WEPR Sequence Score for the second output: [0.9678448302576981]


WEPR Token-level Scores for the second output: [array([0.21733956, 0.18688072, 0.07805948, 0.07747021, 0.08485898,
       0.07747161, 0.08701459, 0.08287204, 0.07750972, 0.07747956,
       0.22602727, 0.0991644 , 0.7714013 , 0.08257567, 0.9991934 ,
       0.46143848, 0.9989643 , 0.16862838, 0.09296815, 0.07872952,
       0.07794683, 0.09680568, 0.8333083 , 0.07828344, 0.35632887,
       0.10569014], dtype=float32)]
```
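In the second output, most token scores sit near the roughly 0.08 level seen throughout the first output, while a handful approach 1. A small hypothetical helper, `flag_positions` (not part of `artefactual`), can list the positions above the 0.7 cutoff that the coloring function below uses for red. The scores are copied from the printed output above.

```python
import numpy as np

# Token-level WEPR scores printed above for the second output (copied verbatim)
scores_2 = np.array([
    0.21733956, 0.18688072, 0.07805948, 0.07747021, 0.08485898,
    0.07747161, 0.08701459, 0.08287204, 0.07750972, 0.07747956,
    0.22602727, 0.0991644, 0.7714013, 0.08257567, 0.9991934,
    0.46143848, 0.9989643, 0.16862838, 0.09296815, 0.07872952,
    0.07794683, 0.09680568, 0.8333083, 0.07828344, 0.35632887,
    0.10569014,
], dtype=np.float32)


def flag_positions(scores, threshold=0.7):
    """Indices of tokens whose score exceeds `threshold` (hypothetical helper)."""
    return np.flatnonzero(scores > threshold).tolist()


flagged = flag_positions(scores_2)
print("positions above 0.7:", flagged)
print("max token score:", float(scores_2.max()))
```

Indexing `list_of_sampled_tokens_2` with these positions, e.g. `[list_of_sampled_tokens_2[i] for i in flagged]`, shows which sampled tokens they correspond to.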


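### Visualize the token-level scores

Each sampled token is highlighted with inline HTML according to its WEPR token score. Scores up to 0.35 are green, scores up to 0.7 are yellow, and scores above 0.7 are red.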

```python
def get_color(score):
    """Map a WEPR token score to a translucent (rgba) background color."""
    if score <= 0.35:
        return "rgba(0, 255, 0, 0.3)"    # green: low score
    if score <= 0.7:
        return "rgba(255, 255, 0, 0.3)"  # yellow: intermediate score
    return "rgba(255, 0, 0, 0.3)"        # red: high score, potentially hallucinated
```
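The two rendering cells below repeat the same loop, so a reusable version can help. The sketch here is a hypothetical helper, `render_token_scores`. It redefines `get_color` with the same thresholds so that it runs on its own. It also escapes each token with `html.escape`, because decoded tokens such as `<|endoftext|>` contain characters that HTML would otherwise try to interpret.

```python
import html


def get_color(score):
    """Map a token score to a translucent background color (same thresholds as above)."""
    if score <= 0.35:
        return "rgba(0, 255, 0, 0.3)"    # green: low score
    if score <= 0.7:
        return "rgba(255, 255, 0, 0.3)"  # yellow: intermediate score
    return "rgba(255, 0, 0, 0.3)"        # red: high score


def render_token_scores(tokens, scores):
    """Build an HTML snippet with one colored <span> per token (hypothetical helper)."""
    parts = ['<div style="font-family: monospace; font-size: 14px; line-height: 1.5;">']
    for token, score in zip(tokens, scores):
        # Escape first so tokens like "<|endoftext|>" show literally, then keep line breaks
        display_token = html.escape(token).replace("\n", "<br>")
        parts.append(
            f'<span style="background-color: {get_color(score)}; padding: 2px; '
            f'margin: 1px; border-radius: 3px;">{display_token}</span>'
        )
    parts.append("</div>")
    return "".join(parts)


snippet = render_token_scores(["Paris", ".", "<|endoftext|>"], [0.04, 0.11, 0.08])
print(snippet)
```

In the notebook, `display(HTML(render_token_scores(list_of_sampled_tokens_2, wepr_token_scores_2[0])))` would render the second output.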

```python
import html

html_content = '<div style="font-family: monospace; font-size: 14px; line-height: 1.5;">'
for token, score in zip(list_of_sampled_tokens_1, wepr_token_scores_1[0]):
    # Escape HTML-special characters (e.g. in "<|endoftext|>"), then keep line breaks
    display_token = html.escape(token).replace("\n", "<br>")
    color = get_color(score)
    html_content += (
        f'<span style="background-color: {color}; padding: 2px; margin: 1px; '
        f'border-radius: 3px;">{display_token}</span>'
    )
html_content += "</div>"

print(f"WEPR Sequence Score for the first output: {wepr_scores_1}")  # noqa: T201
display(HTML(html_content))
```

```
WEPR Sequence Score for the first output: [0.22794435452266917]
```


```python
import html

html_content = '<div style="font-family: monospace; font-size: 14px; line-height: 1.5;">'
for token, score in zip(list_of_sampled_tokens_2, wepr_token_scores_2[0]):
    # Escape HTML-special characters (e.g. in "<|endoftext|>"), then keep line breaks
    display_token = html.escape(token).replace("\n", "<br>")
    color = get_color(score)
    html_content += (
        f'<span style="background-color: {color}; padding: 2px; margin: 1px; '
        f'border-radius: 3px;">{display_token}</span>'
    )
html_content += "</div>"

print(f"WEPR Sequence Score for the second output: {wepr_scores_2}")  # noqa: T201
display(HTML(html_content))
```

```
WEPR Sequence Score for the second output: [0.9678448302576981]
```
