# Tutorial 2: Verbalized Distribution

There is growing evidence that verbalized distributions, where the model states a percentage for each answer option instead of picking a single one, can achieve high performance when asking LLMs multiple-choice questions. QSTN supports this option out of the box. We will show this with a simple example that asks a model to predict the 2024 US election.

## Setting up the Prompt

In [None]:
from qstn.prompt_builder import LLMPrompt
from qstn.utilities import placeholder

import pandas as pd

system_prompt = "You are an expert political analyst."

# We can add any state election we want to predict here.
elections_to_predict = [
    "2024 US Presidential Election",
    "2024 United States presidential election in Illinois",
]

# The placeholders automatically define at which point of the prompt the questions are asked.
formatted_tasks = [
    f"Please predict the outcome of the {election}. {placeholder.PROMPT_OPTIONS} {placeholder.PROMPT_AUTOMATIC_OUTPUT_INSTRUCTIONS} {placeholder.PROMPT_QUESTIONS}"
    for election in elections_to_predict
]

# If we want to ask multiple questions, we can define them here or load them from a CSV file.
questionnaire = pd.DataFrame(
    [{"questionnaire_item_id": 1, "question_content": "Percentage of each Candidate"}]
)

interviews: list[LLMPrompt] = []

# This creates a system prompt and a separate instruction for the model that is not part of the system prompt.
# We also set a seed for reproducibility.
for task, election in zip(formatted_tasks, elections_to_predict):
    interviews.append(
        LLMPrompt(
            questionnaire_source=questionnaire,
            questionnaire_name=election,
            system_prompt=system_prompt,
            prompt=task,
            seed=42,
        )
    )

## Using Verbalized Distribution

To get valid verbalized distribution output from our model, we need to do two things:

1. Define the Response Generation Method.


In [2]:
from qstn.inference.response_generation import JSONVerbalizedDistribution

# We can also adjust the automatic template to our liking.
# If we don't want to create an automatic template, we can simply leave it out of the prompt.
response_generation_method = JSONVerbalizedDistribution(
    output_template="Respond only in JSON format, where the keys are the names of the candidates and the values are the percentage of votes the candidate achieves.",
    output_index_only=False, # If we want to save tokens we can output only the index of our answer
)

2. Define the options the LLM should have when responding. For now, we choose five candidates who still had some chance of winning as of Llama's pretraining cutoff.

In [3]:
from qstn.prompt_builder import generate_likert_options

# Our five most likely candidates and how they are presented to the model
options = generate_likert_options(
    n=5,
    answer_texts=["Biden", "Trump", "Harris", "DeSantis", "Kennedy"],
    response_generation_method=response_generation_method,
    list_prompt_template="The candidates are {options}.", # Our automatic Option Prompt
)

Finally, we have to prepare the prompt with all the options that we defined:

In [4]:
for interview in interviews:
    interview.prepare_prompt(
        question_stem=f"Please predict the {placeholder.QUESTION_CONTENT} now. The percentage of each candidate should add up to 100%.",
        answer_options=options,
        randomized_item_order=True, # We can easily randomize the options
    )

Then we can look at the whole prompt:

In [5]:
system_prompt, prompt = interviews[0].get_prompt_for_questionnaire_type()

print(f"System Prompt: {system_prompt}")
print(f"Prompt: {prompt}")

System Prompt: You are an expert political analyst.
Prompt: Please predict the outcome of the 2024 US Presidential Election. The candidates are 1: Biden, 2: Trump, 3: Harris, 4: DeSantis, 5: Kennedy. Respond only in JSON format, where the keys are the names of the candidates and the values are the percentage of votes the candidate achieves. Please predict the Percentage of each Candidate now. The percentage of each candidate should add up to 100%.
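
To make the role of the placeholders more concrete, here is a minimal standalone sketch of how a template with placeholder tokens could be filled in to produce the prompt above. It does not use QSTN: the token strings (`{{OPTIONS}}` and so on) and the plain `str.replace` approach are assumptions for illustration only, and QSTN's actual placeholder values and substitution logic may differ.

```python
# Hypothetical placeholder tokens; the real ones live in qstn.utilities.placeholder.
OPTIONS = "{{OPTIONS}}"
INSTRUCTIONS = "{{INSTRUCTIONS}}"
QUESTIONS = "{{QUESTIONS}}"

# Template in the same shape as the tutorial's formatted task.
template = (
    "Please predict the outcome of the 2024 US Presidential Election. "
    f"{OPTIONS} {INSTRUCTIONS} {QUESTIONS}"
)

# Build the numbered option list, e.g. "1: Biden, 2: Trump, ...".
names = ["Biden", "Trump", "Harris", "DeSantis", "Kennedy"]
numbered = ", ".join(f"{i}: {name}" for i, name in enumerate(names, start=1))

# Text that replaces each token (copied from the tutorial's settings).
fills = {
    OPTIONS: f"The candidates are {numbered}.",
    INSTRUCTIONS: (
        "Respond only in JSON format, where the keys are the names of the "
        "candidates and the values are the percentage of votes the candidate achieves."
    ),
    QUESTIONS: (
        "Please predict the Percentage of each Candidate now. "
        "The percentage of each candidate should add up to 100%."
    ),
}

prompt = template
for token, text in fills.items():
    prompt = prompt.replace(token, text)

print(prompt)
```

Moving a token to a different position in the template changes where the corresponding text appears in the final prompt, which is the behavior the placeholders are meant to provide.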


And we can run inference:

In [None]:
from vllm import LLM

# First we create the model
model = LLM("meta-llama/Llama-3.2-3B-Instruct", max_model_len=1000)

In [None]:
from qstn.survey_manager import conduct_survey_single_item
# Second we run inference
results = conduct_survey_single_item(
    model,
    llm_prompts=interviews,
    max_tokens=500,
    seed=42,
)

## Parsing Output

We can now easily parse the output, since it is in JSON format.

In [8]:
from qstn import parser

parsed_response = parser.parse_json(results)
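
Small models do not always follow the output instructions, so it can be worth checking a raw response before trusting it. The following is a rough standalone sketch using only the standard library. The function `validate_distribution`, its `tolerance` parameter, and the example response strings are hypothetical, and this is not how `parser.parse_json` is implemented.

```python
import json

EXPECTED_CANDIDATES = ["Biden", "Trump", "Harris", "DeSantis", "Kennedy"]


def validate_distribution(
    text: str,
    expected: list[str] = EXPECTED_CANDIDATES,
    tolerance: float = 1.0,
) -> dict[str, float]:
    """Parse a JSON object of candidate -> percentage and check it sums to about 100."""
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    missing = [name for name in expected if name not in data]
    if missing:
        raise ValueError(f"missing candidates: {missing}")

    values = {name: float(data[name]) for name in expected}
    total = sum(values.values())
    if abs(total - 100.0) > tolerance:
        raise ValueError(f"percentages sum to {total}, expected about 100")
    return values


# Hypothetical raw model outputs for illustration.
good = '{"Biden": 25, "Trump": 40, "Harris": 0, "DeSantis": 30, "Kennedy": 5}'
bad = '{"Biden": 50, "Trump": 40, "Harris": 0, "DeSantis": 30, "Kennedy": 5}'

parsed = validate_distribution(good)
print(parsed)

try:
    validate_distribution(bad)
except ValueError as error:
    print(f"Rejected: {error}")
```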

We get one DataFrame for each of our interviews.

In [None]:
df = parsed_response[interviews[0]]
df2 = parsed_response[interviews[1]]

display(df)

|    |   questionnaire_item_id | question                                                                                                     |   1: Biden |   2: Trump |   3: Harris |   4: DeSantis |   5: Kennedy |
|---:|------------------------:|:-------------------------------------------------------------------------------------------------------------|-----------:|-----------:|------------:|--------------:|-------------:|
|  0 |                       1 | Please predict the Percentage of each Candidate now. The percentage of each candidate should add up to 100%. |         25 |         40 |           0 |            30 |            5 |
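
As a quick follow-up, the percentages in a row like this can be turned into probabilities and ranked. This is a minimal sketch on a plain dictionary whose values are copied from the row above. The variable names are illustrative, and the same steps could be applied to the DataFrame columns directly.

```python
# Percentages copied from the first result row above.
row = {"Biden": 25, "Trump": 40, "Harris": 0, "DeSantis": 30, "Kennedy": 5}

# Normalize by the actual total, so the shares sum to 1 even if the model is slightly off.
total = sum(row.values())
shares = {name: value / total for name, value in row.items()}

# Rank candidates from highest to lowest predicted share.
ranking = sorted(shares.items(), key=lambda item: item[1], reverse=True)
top_candidate, top_share = ranking[0]

print(f"Total percentage: {total}")
print(f"Predicted winner: {top_candidate} ({top_share:.0%})")
```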


We can also get both answers in a single combined DataFrame.

In [None]:
from qstn.utilities import create_one_dataframe

df_complete = create_one_dataframe(parsed_response)
display(df_complete)

|    | questionnaire_name                                   |   questionnaire_item_id | question                                                                                                     |   1: Biden |   2: Trump |   3: Harris |   4: DeSantis |   5: Kennedy |
|---:|:-----------------------------------------------------|------------------------:|:-------------------------------------------------------------------------------------------------------------|-----------:|-----------:|------------:|--------------:|-------------:|
|  0 | 2024 US Presidential Election                        |                       1 | Please predict the Percentage of each Candidate now. The percentage of each candidate should add up to 100%. |       25   |       40   |         0   |          30   |          5   |
|  1 | 2024 United States presidential election in Illinois |                       1 | Please predict the Percentage of each Candidate now. The percentage of each candidate should add up to