# Predictive Entropy for UQ of LLMs

Welcome to this tutorial on using predictive entropy with LLMs!
In this guide, you’ll learn:
- What predictive entropy is and how it helps quantify uncertainty in LLMs
- How predictive entropy integrates with LUQ (Language Model Uncertainty Quantification)

In [None]:
!pip install git+https://github.com/alexanderVNikitin/luq
!pip install -q transformers accelerate

In [None]:
import luq
from luq.models import HFLLMWrapper
from luq.methods import PredictiveEntropyEstimator
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch


## Step 1: Initialize an LLM

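First, load the LLM and wrap it with LUQ.
In this example, we’ll use the `meta-llama/Meta-Llama-3-8B-instruct` model. Downloading it requires a Hugging Face account that has been granted access to the model, hence the login step below.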

In [None]:
from huggingface_hub import notebook_login

notebook_login()

In [None]:
model_id = "meta-llama/Meta-Llama-3-8B-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

hf_wrapper = HFLLMWrapper(tokenizer=tokenizer, model=model)



## Step 2: Generate Samples from the LLM

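Here, we generate multiple samples from the LLM for a single prompt: "Choose a single random letter from the following options: a, b, c, or d. Only respond with the letter." Because the prompt asks for a random choice, the sampled answers should vary, which makes the resulting uncertainty easy to interpret.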

In [13]:
prompt = "Choose a single random letter from the following options: a, b, c, or d. Only respond with the letter."

llm = HFLLMWrapper(tokenizer=tokenizer, model=model)
samples = luq.models.generate_n_samples_and_answer(
    llm,
    prompt=prompt
)

## Step 3: Predictive Entropy
### Step 3a: Background

Let $\mathcal{T}$ be the token vocabulary, and let $S = (s_1, \dots, s_N) \in \mathcal{T}^N$ be a sequence of $N$ tokens, with $s_i \in \mathcal{T}$.
Then the probability of a sequence $s$ given an input $x$ is
\begin{equation}
p(S = s \mid x) = \prod\nolimits_{i=1}^{N} p(s_i \mid s_{<i}, x),
\end{equation}
where $s_{<i}$ denotes all tokens preceding $s_i$.
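As a small illustration of the factorization above, the sketch below computes a sequence probability from per-token conditional probabilities. It works in log space, since multiplying many small probabilities directly tends to underflow. The helper name `sequence_log_prob` and the numbers in `token_probs` are made up for this example and are not part of LUQ.

```python
import math

def sequence_log_prob(token_probs):
    """Sum of log p(s_i | s_<i, x) over the tokens of one sequence."""
    return sum(math.log(p) for p in token_probs)

# Hypothetical conditional probabilities for a 3-token output (illustrative values only).
token_probs = [0.5, 0.8, 0.9]

log_p = sequence_log_prob(token_probs)
seq_prob = math.exp(log_p)  # same value as the product 0.5 * 0.8 * 0.9
print(log_p, seq_prob)
```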


The predictive entropy for an input $x$ and a random output sequence $S$ is defined as
\begin{equation}
    U(x)=H(S \mid x)=-\sum\nolimits_{s} p(s \mid x) \log p(s \mid x),
\end{equation}
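where the summation is over all possible output sequences $s$. Since the number of possible sequences grows exponentially with their length, this sum cannot be computed exactly; in practice, the score is approximated using a finite number of sampled outputs.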
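One common way to approximate this quantity is a plain Monte Carlo average over $M$ sequences $s^{(1)}, \dots, s^{(M)}$ sampled from $p(\cdot \mid x)$:
\begin{equation}
U(x) \approx -\frac{1}{M} \sum\nolimits_{m=1}^{M} \log p(s^{(m)} \mid x).
\end{equation}
The sketch below checks this estimator on a toy distribution over four one-token answers, where the exact entropy is available for comparison. The distribution `answer_probs` and the function `mc_predictive_entropy` are assumptions made for illustration; the estimator implemented in LUQ may differ in its details.

```python
import math
import random

def mc_predictive_entropy(sample_log_probs):
    """Monte Carlo estimate: minus the mean log-probability of the sampled sequences."""
    return -sum(sample_log_probs) / len(sample_log_probs)

# Toy stand-in for an LLM: a hypothetical distribution over four one-token answers.
answer_probs = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}

# Draw samples from the toy distribution and record each sample's log-probability.
rng = random.Random(0)
answers = rng.choices(list(answer_probs), weights=list(answer_probs.values()), k=5000)
sample_log_probs = [math.log(answer_probs[a]) for a in answers]

estimate = mc_predictive_entropy(sample_log_probs)
exact = -sum(p * math.log(p) for p in answer_probs.values())
print(f"estimate={estimate:.3f}  exact={exact:.3f}")
```

With only a handful of samples the estimate is noisy, so the number of generated samples trades off compute against the stability of the score.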


### Step 3b: Apply Predictive Entropy in LUQ

Now we can estimate the predictive entropy of the LLM's response using the samples generated in the previous step.

In [None]:
pe_estimator = PredictiveEntropyEstimator()
print(pe_estimator.estimate_uncertainty(samples))
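To help interpret the printed score, it can be useful to compare it against two reference values for the four-letter prompt. The sketch below assumes natural logarithms, single-token answers, and a model that only ever outputs one of the four letters; the `entropy` helper is illustrative and not part of LUQ, and the value LUQ reports may follow a different convention or normalization.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Upper reference: the model picks each of the four letters equally often.
uniform = entropy([0.25, 0.25, 0.25, 0.25])

# Lower reference: the model always answers with the same letter.
peaked = entropy([1.0, 0.0, 0.0, 0.0])

print(uniform, peaked)
```

Under these assumptions, a score near `math.log(4)` suggests the model is choosing close to uniformly at random, while a score near zero suggests it almost always returns the same letter.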