# Example of prior elicitation for a dataset

In [1]:
# import the necessary functions and classes
from llm_elicited_priors.utils import load_prompts
from llm_elicited_priors.gpt import GPTOutputs, get_llm_elicitation_for_dataset
from llm_elicited_priors.datasets import load_wine_quality

import numpy as np

In [2]:
# wrapper for language models
# see llm_elicited_priors.gpt for more details
CLIENT_CLASS = GPTOutputs
CLIENT_KWARGS = dict(
    temperature=0.1,
    model_id="gpt-3.5-turbo-0125",
    result_args=dict(
        response_format={"type": "json_object"},
    ),
)

In [3]:
# load the dataset, which includes the feature names,
# the target names, and the data itself
wine_quality = load_wine_quality()

In [4]:
# load the prompts for the system and user roles
system_roles = load_prompts("prompts/elicitation/system_roles_wine_quality.txt")
user_roles = load_prompts("prompts/elicitation/user_roles_wine_quality.txt")

In [5]:
# keep only the first two prompts of each role for this demonstration,
# giving 2 x 2 = 4 system/user prompt combinations
system_roles = system_roles[:2]
user_roles = user_roles[:2]
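
Keeping two prompts of each role gives 2 × 2 = 4 prompt combinations, which matches the "4 combinations" reported by the progress bar in the elicitation cell. The sketch below shows that enumeration. It assumes `get_llm_elicitation_for_dataset` pairs every system role with every user role and loops over system roles in the outer loop. The printed log switches to the second system-role wording after two combinations, which is consistent with this assumption, but the library code is not shown here. The prompt strings are placeholders.

```python
from itertools import product

# placeholder prompts standing in for the loaded system and user roles
system_roles = ["system prompt A", "system prompt B"]
user_roles = ["user prompt A", "user prompt B"]

# product() iterates over system roles in the outer loop, user roles in the inner loop
combinations = list(product(system_roles, user_roles))

print(len(combinations))  # 4
for i, (system_role, user_role) in enumerate(combinations, start=1):
    print(i, system_role, "|", user_role)
```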

In [6]:
# create the llm client
client = CLIENT_CLASS(**CLIENT_KWARGS)
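
The elicitation call in the next cell builds `target_map` with a dictionary comprehension that maps each target name to its integer label. The printed system prompt states the mapping 'bad quality' = 0 and 'good quality' = 1. The sketch below assumes, based on that prompt, that `wine_quality.target_names` lists the names in this order. Under that assumption, the comprehension evaluates as follows:

```python
# assumed order of wine_quality.target_names, inferred from the printed system prompt
target_names = ["bad quality", "good quality"]

# map each target name to its position in the list, i.e. its integer label
target_map = {name: label for label, name in enumerate(target_names)}
print(target_map)  # {'bad quality': 0, 'good quality': 1}
```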

In [7]:
#### elicit the priors for the dataset ####
expert_priors = get_llm_elicitation_for_dataset(
    # the language model client
    client=client,
    # the prompts
    system_roles=system_roles,
    user_roles=user_roles,
    # the dataset contains the feature names as an attribute
    feature_names=wine_quality.feature_names.tolist(),
    # map each target name (an attribute of the dataset) to its integer label
    target_map={name: label for label, name in enumerate(wine_quality.target_names)},
    # print the prompts before passing them to the language model
    verbose=True,
)

Getting priors for 4 combinations:   0%|          | 0/4 [00:00<?, ?it/s]

System role 
 --------- 
 
You are a simulator of a logistic regression predictive model 
for predicting wine quality from its physicochemical characteristics.
Here the inputs are physicochemical characteristics and the output is 
the probability of the wine being of good quality from physicochemical characteristics. 
Specifically, the targets are bad quality or good quality with mapping
'bad quality' = 0 and 'good quality' = 1.
With your best guess, you can provide the probabilities of 
the wine being of good quality for the given physicochemical characteristics.
User query 
 --------- 
 
I am a data scientist with a dataset and the task: predicting wine 
quality from its physicochemical characteristics. 
I would like to use your model to predict the quality of my samples.
I have a dataset that is made up of the following features:
['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar', 'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density', 'pH', 'sulpha

matched features:
fixed acidity: fixed acidity: [0, 0.5]
volatile acidity: volatile acidity: [-0.2, 0.3]
citric acid: citric acid: [0.3, 0.4]
residual sugar: residual sugar: [0.1, 0.2]
chlorides: chlorides: [-0.4, 0.3]
free sulfur dioxide: free sulfur dioxide: [0.1, 0.2]
total sulfur dioxide: total sulfur dioxide: [-0.1, 0.2]
density: density: [-0.3, 0.4]
pH: pH: [0, 0.3]
sulphates: sulphates: [0.2, 0.3]
alcohol: alcohol: [0.5, 0.4]


elicitation prior:
 [[ 0.   1. ]
 [ 0.   0.5]
 [-0.2  0.3]
 [ 0.3  0.4]
 [ 0.1  0.2]
 [-0.4  0.3]
 [ 0.1  0.2]
 [-0.1  0.2]
 [-0.3  0.4]
 [ 0.   0.3]
 [ 0.2  0.3]
 [ 0.5  0.4]]
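
Each "matched features" line appears to pair a dataset feature name with the name returned by the model, followed by a `[mean, standard deviation]` pair. The "elicitation prior" array has one more row than there are features. Its first row, `[0, 1]`, does not correspond to any matched feature and looks like a default standard-normal prior for the intercept. Under that reading, which is an assumption not confirmed by the library code shown here, the array can be rebuilt from the matched values of this first combination:

```python
import numpy as np

# [mean, std] pairs from the "matched features" output of the first combination,
# listed in the order of wine_quality.feature_names (dicts keep insertion order)
matched = {
    "fixed acidity": [0, 0.5],
    "volatile acidity": [-0.2, 0.3],
    "citric acid": [0.3, 0.4],
    "residual sugar": [0.1, 0.2],
    "chlorides": [-0.4, 0.3],
    "free sulfur dioxide": [0.1, 0.2],
    "total sulfur dioxide": [-0.1, 0.2],
    "density": [-0.3, 0.4],
    "pH": [0, 0.3],
    "sulphates": [0.2, 0.3],
    "alcohol": [0.5, 0.4],
}

# assumption: row 0 is an intercept prior with mean 0 and std 1
intercept_prior = [0.0, 1.0]
prior = np.array([intercept_prior] + list(matched.values()))

print(prior.shape)  # (12, 2)
means, stds = prior[:, 0], prior[:, 1]
```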
System role 
 --------- 
 
You are a simulator of a logistic regression predictive model 
for predicting wine quality from its physicochemical characteristics.
Here the inputs are physicochemical characteristics and the output is 
the probability of the wine being of good quality from physicochemical characteristics. 
Specifically, the targets are bad quality or good quality with

matched features:
fixed acidity: fixed acidity: [0.1, 0.05]
volatile acidity: volatile acidity: [-0.2, 0.08]
citric acid: citric acid: [0.15, 0.06]
residual sugar: residual sugar: [0.05, 0.03]
chlorides: chlorides: [-0.25, 0.1]
free sulfur dioxide: free sulfur dioxide: [0.1, 0.04]
total sulfur dioxide: total sulfur dioxide: [-0.15, 0.05]
density: density: [-0.1, 0.04]
pH: pH: [0.05, 0.03]
sulphates: sulphates: [0.2, 0.07]
alcohol: alcohol: [0.3, 0.1]


elicitation prior:
 [[ 0.    1.  ]
 [ 0.1   0.05]
 [-0.2   0.08]
 [ 0.15  0.06]
 [ 0.05  0.03]
 [-0.25  0.1 ]
 [ 0.1   0.04]
 [-0.15  0.05]
 [-0.1   0.04]
 [ 0.05  0.03]
 [ 0.2   0.07]
 [ 0.3   0.1 ]]
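
The second combination agrees with the first on the sign of every coefficient whose mean is non-zero in both. Its standard deviations, however, are much smaller (0.03 to 0.1 versus 0.2 to 0.5), so it expresses a far more confident prior. The sketch below reads each pair as a Gaussian N(mean, std²) on a logistic regression coefficient, which is an assumption about how the pairs are used downstream. One way to compare the two combinations is the prior probability that a coefficient is positive, Φ(mean / std). For `alcohol`:

```python
import math


def prob_positive(mean: float, std: float) -> float:
    """P(w > 0) for w ~ Normal(mean, std**2), via the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(mean / (std * math.sqrt(2.0))))


# alcohol [mean, std] from the first and second combinations above
print(round(prob_positive(0.5, 0.4), 3))  # 0.894 (first combination)
print(round(prob_positive(0.3, 0.1), 3))  # 0.999 (second combination)
```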
System role 
 --------- 
 
You function as a simulator for a logistic regression model aimed at 
predicting wine quality based on its physicochemical properties. The inputs 
to the model are these physicochemical properties, and the output is the 
likelihood of the wine being of high quality. The specific targets are 
bad quality or good 

matched features:
fixed acidity: fixed acidity: [0, 0.5]
volatile acidity: volatile acidity: [-0.2, 0.3]
citric acid: citric acid: [0.3, 0.4]
residual sugar: residual sugar: [0.1, 0.2]
chlorides: chlorides: [-0.4, 0.3]
free sulfur dioxide: free sulfur dioxide: [0.1, 0.2]
total sulfur dioxide: total sulfur dioxide: [-0.1, 0.2]
density: density: [-0.3, 0.4]
pH: pH: [0, 0.3]
sulphates: sulphates: [0.2, 0.3]
alcohol: alcohol: [0.5, 0.4]


elicitation prior:
 [[ 0.   1. ]
 [ 0.   0.5]
 [-0.2  0.3]
 [ 0.3  0.4]
 [ 0.1  0.2]
 [-0.4  0.3]
 [ 0.1  0.2]
 [-0.1  0.2]
 [-0.3  0.4]
 [ 0.   0.3]
 [ 0.2  0.3]
 [ 0.5  0.4]]
System role 
 --------- 
 
You function as a simulator for a logistic regression model aimed at 
predicting wine quality based on its physicochemical properties. The inputs 
to the model are these physicochemical properties, and the output is the 
likelihood of the wine being of high quality. The specific targets are 
bad quality or good quality, mapped as 'bad quality' = 0 and '

Getting priors for 4 combinations: 100%|██████████| 4/4 [00:15<00:00,  3.95s/it]



matched features:
fixed acidity: fixed acidity: [0.1, 0.05]
volatile acidity: volatile acidity: [-0.2, 0.1]
citric acid: citric acid: [0.15, 0.08]
residual sugar: residual sugar: [-0.1, 0.06]
chlorides: chlorides: [-0.25, 0.12]
free sulfur dioxide: free sulfur dioxide: [0.05, 0.03]
total sulfur dioxide: total sulfur dioxide: [-0.15, 0.07]
density: density: [-0.2, 0.1]
pH: pH: [0.1, 0.05]
sulphates: sulphates: [0.2, 0.1]
alcohol: alcohol: [0.3, 0.15]


elicitation prior:
 [[ 0.    1.  ]
 [ 0.1   0.05]
 [-0.2   0.1 ]
 [ 0.15  0.08]
 [-0.1   0.06]
 [-0.25  0.12]
 [ 0.05  0.03]
 [-0.15  0.07]
 [-0.2   0.1 ]
 [ 0.1   0.05]
 [ 0.2   0.1 ]
 [ 0.3   0.15]]
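
The system prompts describe a logistic regression setting. The sketch below assumes the rows are used as independent Gaussian priors on the intercept and the eleven feature coefficients; the exact downstream usage is an assumption here. Under that assumption, the log prior density of a weight vector is a sum of univariate normal log densities. A NumPy sketch using the fourth combination's array:

```python
import numpy as np

# fourth combination: rows are [mean, std]; row 0 is assumed to be the intercept
prior = np.array([
    [0.0, 1.0], [0.1, 0.05], [-0.2, 0.1], [0.15, 0.08], [-0.1, 0.06],
    [-0.25, 0.12], [0.05, 0.03], [-0.15, 0.07], [-0.2, 0.1], [0.1, 0.05],
    [0.2, 0.1], [0.3, 0.15],
])


def gaussian_log_prior(weights: np.ndarray, prior: np.ndarray) -> float:
    """Sum of log N(w_j | mean_j, std_j**2) over all coefficients."""
    means, stds = prior[:, 0], prior[:, 1]
    z = (weights - means) / stds
    return float(np.sum(-0.5 * z**2 - np.log(stds) - 0.5 * np.log(2.0 * np.pi)))


# the density is largest at the prior means; an all-zero weight vector scores lower
print(gaussian_log_prior(prior[:, 0], prior))
print(gaussian_log_prior(np.zeros(len(prior)), prior))
```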

In [8]:
print("Elicited priors:")
print(np.stack(expert_priors))

Elicited priors:
[[[ 0.    1.  ]
  [ 0.    0.5 ]
  [-0.2   0.3 ]
  [ 0.3   0.4 ]
  [ 0.1   0.2 ]
  [-0.4   0.3 ]
  [ 0.1   0.2 ]
  [-0.1   0.2 ]
  [-0.3   0.4 ]
  [ 0.    0.3 ]
  [ 0.2   0.3 ]
  [ 0.5   0.4 ]]

 [[ 0.    1.  ]
  [ 0.1   0.05]
  [-0.2   0.08]
  [ 0.15  0.06]
  [ 0.05  0.03]
  [-0.25  0.1 ]
  [ 0.1   0.04]
  [-0.15  0.05]
  [-0.1   0.04]
  [ 0.05  0.03]
  [ 0.2   0.07]
  [ 0.3   0.1 ]]

 [[ 0.    1.  ]
  [ 0.    0.5 ]
  [-0.2   0.3 ]
  [ 0.3   0.4 ]
  [ 0.1   0.2 ]
  [-0.4   0.3 ]
  [ 0.1   0.2 ]
  [-0.1   0.2 ]
  [-0.3   0.4 ]
  [ 0.    0.3 ]
  [ 0.2   0.3 ]
  [ 0.5   0.4 ]]

 [[ 0.    1.  ]
  [ 0.1   0.05]
  [-0.2   0.1 ]
  [ 0.15  0.08]
  [-0.1   0.06]
  [-0.25  0.12]
  [ 0.05  0.03]
  [-0.15  0.07]
  [-0.2   0.1 ]
  [ 0.1   0.05]
  [ 0.2   0.1 ]
  [ 0.3   0.15]]]
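
`np.stack` produces an array of shape `(4, 12, 2)`, with one `[mean, std]` table per prompt combination. The first and third tables are identical. This notebook does not show how the tables are combined downstream. The sketch below shows one simple option, which is an assumption and not necessarily what `llm_elicited_priors` does. It treats the four Gaussians for each coefficient as an equally weighted mixture and moment-matches that mixture with a single Gaussian:

```python
import numpy as np


def moment_match(priors: np.ndarray) -> np.ndarray:
    """Collapse an equal-weight Gaussian mixture per coefficient into one Gaussian.

    priors has shape (n_combinations, n_coefficients, 2) holding [mean, std] pairs.
    Returns an array of shape (n_coefficients, 2).
    """
    means, stds = priors[..., 0], priors[..., 1]
    pooled_mean = means.mean(axis=0)
    # law of total variance: mean of the variances plus variance of the means
    pooled_var = (stds**2).mean(axis=0) + means.var(axis=0)
    return np.stack([pooled_mean, np.sqrt(pooled_var)], axis=-1)


# alcohol [mean, std] from the four combinations, shape (4, 1, 2)
alcohol = np.array([[[0.5, 0.4]], [[0.3, 0.1]], [[0.5, 0.4]], [[0.3, 0.15]]])
print(moment_match(alcohol))
```

Applied to the full stacked array, `moment_match(np.stack(expert_priors))` would return a single `(12, 2)` table. Averaging only the means would understate the spread when the combinations disagree; the variance-of-means term accounts for that disagreement.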
