# Generation of Reference Distributions ($f_{T}^{l}$)

This notebook implements the protocol described in Section 5 to generate the reference answers used to estimate the background temperature. We collect outputs from a set of idealized reference models $\mathcal{L}$ (SmolLM3-3B, Llama-3.2-3B-Instruct, Mistral-7B-Instruct-v0.3) over a grid of temperatures. These outputs form the empirical distributions $f_{T}^{l}, l \in \mathcal{L}$, which serve as the baselines to measure the background temperature of the system under test.

Note that the models and dataset used here serve only as an example; other models (run in a stable inference environment) and other datasets could be used.
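As a rough illustration of what an empirical distribution over sampled answers could look like, the sketch below counts how often each distinct token-id sequence occurs among the samples collected for one prompt. This is an assumption made for illustration only: the helper name `empirical_distribution` and the toy samples are hypothetical, and the actual definition of $f_{T}^{l}$ is the one given in Section 5.

```python
from collections import Counter


def empirical_distribution(answers):
    """Map each distinct answer (tuple of token ids) to its relative frequency."""
    counts = Counter(tuple(a) for a in answers)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {answer: n / total for answer, n in counts.items()}


# Toy samples standing in for the generations of one prompt (hypothetical token ids).
toy_answers = [[5, 9, 2], [5, 9, 2], [5, 9, 2], [5, 7, 2]]
dist = empirical_distribution(toy_answers)
print(dist)
```

On a perfectly deterministic system, samples drawn at temperature 0 would be expected to coincide, so all the mass would sit on a single sequence; spread across several sequences signals sampling noise or system-level non-determinism.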

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer, logging
from datasets import load_dataset
import numpy as np
from tqdm import tqdm
import json
import torch
device = "cuda"  # use "cuda" to run on GPU, or "cpu" to run on CPU

### Reference Models Setup
As defined in the experimental settings, we use locally executed models to ensure a stable inference environment. 
- Models: Llama-3.2-3B-Instruct, Mistral-7B-Instruct-v0.3, SmolLM3-3B.
- Stability: These models are run on local GPUs to minimize system-level non-determinism, serving as the "quasi-ideal" baseline described in Section 3.2.

In [None]:
reference_models = ["meta-llama/Llama-3.2-3B-Instruct",
                    "mistralai/Mistral-7B-Instruct-v0.3",
                    "HuggingFaceTB/SmolLM3-3B"]

In [None]:
def messages_prompt(model_name, prompt):
    """Build the chat messages for `prompt`, adapted to `model_name`."""
    messages = [{"role": "user", "content": prompt}]
    # SmolLM3 supports an extended thinking mode; the "/no_think" system flag turns it off.
    if model_name == "HuggingFaceTB/SmolLM3-3B":
        messages.insert(0, {"role": "system", "content": "/no_think"})
    # Llama-3.2-3B-Instruct, Mistral-7B-Instruct-v0.3 and any other model
    # only need the user turn.
    return messages


### Experiment Setup and Answer Generation

The following cells define the dataset ($\Pi^{30}$) and the sampling temperatures ($\Theta$). We then generate the answers with the reference models and save them.
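The temperature grid $\Theta$ can also be built from integer steps, which sidesteps the floating-point drift that non-integer steps may introduce. The sketch below is a pure-Python alternative under that assumption; the function name `temperature_grid` is hypothetical, and the grid boundaries are taken from the cell that follows.

```python
def temperature_grid():
    """Temperatures 0.00-0.19 (step 0.01), 0.20-0.50 (step 0.05), 0.6-1.0 (step 0.1)."""
    fine = [i / 100 for i in range(0, 20)]        # 0.00, 0.01, ..., 0.19
    medium = [i / 100 for i in range(20, 55, 5)]  # 0.20, 0.25, ..., 0.50
    coarse = [i / 10 for i in range(6, 11)]       # 0.6, 0.7, ..., 1.0
    return [round(t, 2) for t in fine + medium + coarse]


grid = temperature_grid()
print(len(grid), grid[0], grid[-1])
```

The grid is denser near 0, where small temperature changes are the ones of interest for estimating a background temperature.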

In [None]:
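# Temperature grid: step 0.01 on [0, 0.2), step 0.05 on [0.2, 0.5], step 0.1 on [0.6, 1]
temperatures = list(np.arange(0.0, 0.2, 0.01)) + list(np.arange(0.2, 0.5 + 0.05, 0.05)) + [0.6, 0.7, 0.8, 0.9, 1]
# Round to two decimals to remove floating-point noise introduced by np.arange
temperatures = [round(t, 2) for t in temperatures]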

In [None]:
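# First 30 questions of the TruthfulQA "generation" validation split
prompts = load_dataset("truthfulqa/truthful_qa", 'generation')['validation']['question'][:30]
N = 32          # number of answers sampled per (model, temperature, prompt)
tok_limit = 32  # maximum number of new tokens generated per answer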
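To get a feel for the cost of the run, the number of `generate` calls can be estimated from the settings above. The sketch below is simple arithmetic, assuming the temperature grid contains 32 values (20 + 7 + 5) and that every call succeeds; the variable names are hypothetical.

```python
n_models = 3         # reference models
n_temperatures = 32  # assumed size of the temperature grid (20 + 7 + 5)
n_prompts = 30       # questions taken from the dataset
n_samples = 32       # N, answers per (model, temperature, prompt)
max_new_tokens = 32  # tok_limit

total_calls = n_models * n_temperatures * n_prompts * n_samples
max_generated_tokens = total_calls * max_new_tokens  # upper bound, answers may stop early

print(total_calls)           # 92160
print(max_generated_tokens)  # 2949120
```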

In [None]:
logging.set_verbosity_error()
references = {}    # references[model_name][temperature][question] -> list of answers (token-id lists)

for model_name in reference_models:
    print(model_name)
    references[model_name] = {}
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
    for temperature in tqdm(temperatures):
        references[model_name][temperature] = {}
        for question in prompts:
            answers = []
            messages = messages_prompt(model_name, question)            
            text = tokenizer.apply_chat_template(
                messages,
                tokenize=False,
                add_generation_prompt=True,
            )
            model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
            L = len(model_inputs.input_ids[0])
            for i in range(N):
                try:
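                    # Some models do not accept temperature=0; sampling with top_k=1 is equivalent (greedy decoding).
                    # do_sample=True is passed explicitly so that temperature and top_k are applied
                    # even when a model's default generation config does not enable sampling.
                    if temperature == 0:
                        generated_ids = model.generate(**model_inputs, max_new_tokens=tok_limit, do_sample=True, top_k=1)
                    else:
                        generated_ids = model.generate(**model_inputs, max_new_tokens=tok_limit, do_sample=True, temperature=temperature)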
                    answers.append(generated_ids[0][L:].tolist())
                except Exception as e:
                    print(e)
            references[model_name][temperature][question] = answers
    del model
    del tokenizer
    torch.cuda.empty_cache()


In [None]:
with open("references.json", "w") as f:
    json.dump(references, f, indent=4)
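When the saved file is read back, note that JSON object keys are always strings, so the temperature keys come back as text (for example `"0.5"`). The sketch below shows one possible way to restore numeric keys; the helper name `restore_temperature_keys` and the toy data are hypothetical, and the round trip is done in memory rather than on the real `references.json`.

```python
import json


def restore_temperature_keys(raw):
    """Convert the temperature keys of a loaded references dict back to floats."""
    return {
        model: {float(t): per_prompt for t, per_prompt in by_temperature.items()}
        for model, by_temperature in raw.items()
    }


# Toy structure mirroring references[model][temperature][question] (hypothetical values).
toy = {"hypothetical/model": {0.0: {"q": [[1, 2, 3]]}, 0.5: {"q": [[1, 2, 4]]}}}

raw = json.loads(json.dumps(toy))        # same key coercion as json.dump / json.load
restored = restore_temperature_keys(raw)

print(list(raw["hypothetical/model"]))       # temperature keys as strings
print(list(restored["hypothetical/model"]))  # temperature keys as floats
```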