<a href="https://colab.research.google.com/github/Bwayne1966/Articles/blob/main/Verbalized_Sampling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ✍️ Verbalized Sampling: How to Unlock LLM Diversity with a Simple Prompt

### 🔗 Useful Links
[Arxiv Paper](https://arxiv.org/abs/2510.01171) &nbsp;&nbsp;&nbsp;&nbsp; [Blog Post](https://simonucl.notion.site/verbalized-sampling) &nbsp;&nbsp;&nbsp;&nbsp; [Github Page](https://github.com/CHATS-lab/verbalize-sampling)
&nbsp;&nbsp;&nbsp;&nbsp;
[Package](https://pypi.org/project/verbalized-sampling/)

### 🔗 Link to other notebooks

* **[Direct vs. Verbalized Sampling](https://colab.research.google.com/drive/1UDk4W5w6gF0dQ9Tpu0sPQethEht51GXL#offline=true&sandboxMode=true):** A head-to-head comparison showing a 2-3x diversity improvement in creative tasks while maintaining quality.
* **[Image Generation with VS](https://colab.research.google.com/drive/1J18VJRnrCjIb6sTivY-znb8C3JsLQCIz#offline=true&sandboxMode=true):** A visual comparison for text-to-image generation, showcasing creative diversity.
* **[Complete Framework Tutorial](https://colab.research.google.com/drive/1eC0nIUVC1kyANxxzhNib44qmPphdWy9o#offline=true&sandboxMode=true):** A step-by-step guide to using verbalized sampling, covering everything from API basics to advanced features.

## Introduction

This notebook demonstrates how a **simple prompting technique can boost an LLM's creativity by 2x**. Our method effectively mitigates "mode collapse", the tendency for models to generate very similar, boring responses.

Our running example will be the task of **Story Writing**, using the prompt: "**Write a 100-word story about a bear.**"

In [1]:
#@title Config: Put Your OpenAI API Key
OPENAI_API_KEY = ''

N = 5
USER_PROMPT = "Write a 100-word story about a bear."
DIVERSITY_TUNING_WEIGHT = 0.05

In [2]:
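import os
import re
import textwrap

from openai import OpenAI

# Use the key from the config cell when one is provided; otherwise keep any
# OPENAI_API_KEY already set in the environment instead of overwriting it with ''.
if OPENAI_API_KEY:
    os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

# Initialize the OpenAI client (it reads OPENAI_API_KEY from the environment)
client = OpenAI()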

#@title Utility Functions
from IPython.display import display, HTML, Markdown, clear_output
def display_story_comparison(direct_stories, vs_stories, title):
    """Display stories in a side-by-side comparison format"""
    display(Markdown(f"## {title}"))

    # Create HTML table for comparison
    html = "<table border='1' style='width:100%; border-collapse: collapse;'>"
    html += "<tr><th style='width:50%; padding:10px; background-color:#f0f0f0;'>Direct Prompting</th>"
    html += "<th style='width:50%; padding:10px; background-color:#e8f4f8;'>Verbalized Sampling (Ours)</th></tr>"

    max_stories = max(len(direct_stories), len(vs_stories))

    for i in range(max_stories):
        html += "<tr>"

        # Direct Prompting column
        html += "<td style='vertical-align:top; padding:10px;'>"
        if i < len(direct_stories):
            html += f"<strong>Story {i+1}:</strong><br>"
            html += f"<div style='font-family: serif; line-height: 1.4;'>{direct_stories[i]}</div>"
        html += "</td>"

        # Verbalized sampling column
        html += "<td style='vertical-align:top; padding:10px;'>"
        if i < len(vs_stories):
            html += f"<strong>Story {i+1}:</strong><br>"
            html += f"<div style='font-family: serif; line-height: 1.4;'>{vs_stories[i]}</div>"
        html += "</td>"

        html += "</tr>"

    html += "</table>"
    display(HTML(html))

def display_single_story_table(stories, title):
    """Display a single list of stories in a table format"""
    display(Markdown(f"## {title}"))

    # Create HTML table
    html = "<table border='1' style='width:100%; border-collapse: collapse;'>"
    html += "<tr><th style='width:100%; padding:10px; background-color:#f0f0f0;'>Generated Stories</th></tr>"

    for i, story in enumerate(stories):
        html += "<tr>"
        html += "<td style='vertical-align:top; padding:10px;'>"
        html += f"<strong>Story {i+1}:</strong><br>"
        html += f"<div style='font-family: serif; line-height: 1.4;'>{story}</div>"
        html += "</td>"
        html += "</tr>"

    html += "</table>"
    display(HTML(html))

---

## Traditional Method: Direct Prompting with High Temperature


First, the most common way to generate creative stories is to query LLMs **multiple times** at a **high temperature**.

Let's see how it works in practice.



In [3]:
direct_responses = []
for i in range(N):  # N is set in the config cell
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[
            {
                "role": "user",
                # direct prompting
                "content": USER_PROMPT
            }
        ],
        # high temperature
        temperature=1.0,
    )
    response_text = response.choices[0].message.content
    direct_responses.append(response_text)
    print(f"Generated story {i+1} of {N}: {response_text}")

clear_output()
display_single_story_table(direct_responses, "Direct Prompting")


**Takeaway**: We see that even with a high `temperature`, all stories from direct prompting share the same theme, the same core idea, and a similar beginning, showing that direct prompting often leads to repetitive outputs (mode collapse).
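
To put a rough number on how repetitive a batch of stories is, one option is the average pairwise Jaccard similarity of their vocabularies. This is a minimal sketch for illustration only: the metric and the helper names `word_set` and `mean_pairwise_jaccard` are assumptions made here, not the evaluation used in the paper, and the sample strings are hypothetical stand-ins for model outputs.

```python
import re
from itertools import combinations


def word_set(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def mean_pairwise_jaccard(stories):
    """Average Jaccard similarity over all story pairs (1.0 means identical vocabularies)."""
    pairs = list(combinations(stories, 2))
    if not pairs:
        return 0.0
    scores = []
    for a, b in pairs:
        words_a, words_b = word_set(a), word_set(b)
        union = words_a | words_b
        scores.append(len(words_a & words_b) / len(union) if union else 0.0)
    return sum(scores) / len(scores)


# Hypothetical stand-ins for model outputs
similar = ["A bear found honey in the forest.", "A bear found berries in the forest."]
varied = ["A bear found honey in the forest.", "Robots repaired the lunar station overnight."]
print(mean_pairwise_jaccard(similar), mean_pairwise_jaccard(varied))
```

In the notebook, calling `mean_pairwise_jaccard(direct_responses)` would give a single score for the batch; higher values suggest more overlap between stories. Word overlap is a crude proxy and will miss stories that share a plot but use different words.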

**So, are LLMs just not creative?**

## Our Solution: **Verbalized Sampling** (VS)

LLMs can, in fact, be highly creative when prompted correctly. We introduce **Verbalized Sampling (VS)**, a simple method that prompts the model to generate a *distribution* of responses with their *probabilities*.

Now, let's rerun the same example to see how VS can mitigate mode collapse and unleash the LLM's true creative potential.

In [None]:
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "system",
            # Verbalized Sampling (ours)
            # Adjacent string literals are concatenated, so the prompt carries no stray indentation
            "content": (
                "You are a helpful assistant. For each query, please generate a set of five possible responses, "
                "each within a separate <response> tag. Responses should each include a <text> and a numeric <probability>. "
                "Please sample at random from the full distribution."
            ),
        },
        {
            "role": "user",
            "content": USER_PROMPT,
        }
    ],
    temperature=1.0,
)

# Get the response content
response_content = response.choices[0].message.content

vs_standard_stories = re.findall(r'<text>(.*?)</text>', response_content, re.DOTALL)

display_single_story_table(vs_standard_stories, "Verbalized Sampling")
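
The cell above keeps only the `<text>` contents and discards the verbalized probabilities. If you also want the probabilities, a slightly more defensive parser could look like the sketch below. The function name `parse_vs_responses` and the sample string are hypothetical, and the sketch assumes the model follows the `<response>`/`<text>`/`<probability>` layout requested in the system prompt.

```python
import re


def parse_vs_responses(content):
    """Extract (text, probability) pairs from <response> blocks.

    The probability is None when the tag is missing or not a number.
    """
    results = []
    for block in re.findall(r"<response>(.*?)</response>", content, re.DOTALL):
        text_match = re.search(r"<text>(.*?)</text>", block, re.DOTALL)
        if text_match is None:
            continue  # skip blocks that contain no story text
        prob_match = re.search(r"<probability>(.*?)</probability>", block, re.DOTALL)
        probability = None
        if prob_match is not None:
            try:
                probability = float(prob_match.group(1).strip())
            except ValueError:
                pass  # leave as None if the model wrote something non-numeric
        results.append((text_match.group(1).strip(), probability))
    return results


# Hypothetical model output used only to exercise the parser
sample = """
<response><text>A bear learns to paint.</text><probability>0.12</probability></response>
<response><text>A bear opens a bakery.</text><probability>high</probability></response>
"""
parsed = parse_vs_responses(sample)
print(parsed)
```

With a real response, `parse_vs_responses(response_content)` would return one tuple per well-formed `<response>` block.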

**Takeaway:** The results are now much more creative and varied. As you can see, each story starts differently and is more diverse.

**But can we go further?**


## VS with **Tunable Diversity**

Yes! We can push for even greater creativity by introducing a **probability threshold**.

This technique asks the model to generate more unique, "long-tail" responses. This gives us an effective knob to **tune the diversity** of the final output.


In [None]:
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "system",
            # Verbalized Sampling (diversity tuning)
            "content": (
                "You are a helpful assistant. For each query, please generate a set of 5 possible responses, "
                "each within a separate <response> tag. Responses should each include a <text> and a numeric <probability>. "
                "Please sample at random from the tails of the distribution, "
                f"such that the probability of each response is less than {DIVERSITY_TUNING_WEIGHT}."
            ),
        },
        {
            "role": "user",
            "content": USER_PROMPT,
        }
    ],
    temperature=1.0,
)

# Get the response content
response_content = response.choices[0].message.content

vs_stories = re.findall(r'<text>(.*?)</text>', response_content, re.DOTALL)

display_single_story_table(vs_stories, "Verbalized Sampling (w/ Diversity Tuning)")

**Takeaway:** This enables an even higher level of creativity!

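For more interesting and creative stories, try **lowering the threshold (`DIVERSITY_TUNING_WEIGHT`, set to 0.05 in the config cell) to a smaller value such as 0.01**, which pushes sampling further into the tail of the distribution.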
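
To experiment with several thresholds without editing the prompt by hand, the system prompt can be assembled by a small helper. The sketch below is one possible way to do it: `build_vs_system_prompt`, its parameters `k` and `threshold`, and the validation rules are hypothetical, and only the prompt wording is taken from the cells in this notebook.

```python
def build_vs_system_prompt(k=5, threshold=None):
    """Return a Verbalized Sampling system prompt.

    threshold=None asks for samples from the full distribution; a value in
    (0, 1) asks for responses whose stated probability is below that value.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if threshold is not None and not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    prompt = (
        f"You are a helpful assistant. For each query, please generate a set of {k} possible responses, "
        "each within a separate <response> tag. "
        "Responses should each include a <text> and a numeric <probability>. "
    )
    if threshold is None:
        prompt += "Please sample at random from the full distribution."
    else:
        prompt += (
            "Please sample at random from the tails of the distribution, "
            f"such that the probability of each response is less than {threshold}."
        )
    return prompt


# Build one prompt per threshold to compare how the outputs change
for t in (0.10, 0.05, 0.01):
    print(build_vs_system_prompt(threshold=t))
```

Each generated prompt can be passed as the `system` message in the same `client.chat.completions.create` call used above.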

### Direct ⚔️ VS: A Side-by-Side Comparison

In this section, we present a side-by-side comparison to highlight the differences between the direct method and Verbalized Sampling.

In [None]:
display_story_comparison(direct_responses, vs_stories, "Story Generation Comparison")



---


# Key Benefits of Verbalized Sampling

📣 **A simple training-free prompt** to mitigate mode collapse

🎯 **Tunable Diversity**: Explores the "tails" of the distribution for tunable creativity.

📈 **Scales with Model Size**: Verbalized Sampling works even better on larger models without sacrificing quality.

✨ **Various Applications**: Effective for creative writing, social simulation, tweet and blog ideas, brainstorming, lesson plan ideas, etc.

### 💡 Practical tips for using VS
- **Use large or reasoning models**: VS works best with models like GPT-5, Claude 4, and Gemini 2.5 Pro.
- **Ask for longer outputs if length matters**: Because all candidates are generated in a single response, the LLM may shorten each one. Where possible, state the length you want explicitly.
- **Provide a JSON schema for reliability**: Models sometimes fail to follow the required format, which causes parsing errors. Providing a `json_schema` or using a `structured_output` feature enforces the output format.
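
As an illustration of the JSON schema tip, the sketch below defines a schema for a list of `{text, probability}` objects and a matching parser. The names `VS_SCHEMA` and `parse_vs_json` are hypothetical. The commented-out call assumes the Chat Completions `response_format` parameter with `"type": "json_schema"`, which depends on your SDK version and on the model supporting structured outputs, so treat it as a starting point and not a verified recipe.

```python
import json

# Hypothetical schema: one object holding a list of candidate responses
VS_SCHEMA = {
    "name": "verbalized_samples",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "responses": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "text": {"type": "string"},
                        "probability": {"type": "number"},
                    },
                    "required": ["text", "probability"],
                    "additionalProperties": False,
                },
            }
        },
        "required": ["responses"],
        "additionalProperties": False,
    },
}


def parse_vs_json(content):
    """Turn the JSON string returned by the model into (text, probability) pairs."""
    data = json.loads(content)
    return [(item["text"], float(item["probability"])) for item in data["responses"]]


# Assumed usage (not executed here; requires an API key and a supporting model):
# response = client.chat.completions.create(
#     model="gpt-4.1",
#     messages=[...],  # same system and user messages as in the VS cells
#     response_format={"type": "json_schema", "json_schema": VS_SCHEMA},
# )
# stories = parse_vs_json(response.choices[0].message.content)

# Hypothetical model output used only to exercise the parser
sample = '{"responses": [{"text": "A bear naps.", "probability": 0.2}]}'
print(parse_vs_json(sample))
```

When using a schema like this, the system prompt should ask for JSON fields in place of the `<response>` tags so the instructions and the enforced format agree.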



---


# Try it yourself!
**Here are copy-paste-ready prompts!**

### 1. For Chat Interface
Prefix your normal query with:
```
<instruction>
For each query, please generate a set of five possible responses, each within a separate <response> tag.
Responses should each include a <text> and a numeric <probability> in JSON format.
Please sample at random from the full distribution.
</instruction>
Write a 100-word story about a bear.
```
Alternatively, adjust the diversity level by changing the instruction to:
```
<instruction>
For each query, please generate a set of five possible responses, each within a separate <response> tag.
Responses should each include a <text> and a numeric <probability> in JSON format.
Please sample at random from the tails of the distribution, such that the probability of each response is less than 0.10.
</instruction>
Write a 100-word story about a bear.
```

### 2. For API Calls/Playgrounds:
*System Prompt*
```
You are a helpful assistant. For each query,
please generate a set of five possible responses, each within a separate <response> tag.
Responses should each include a <text> and a numeric <probability> in JSON format.
Please sample at random from the full distribution.
```
Alternatively, adjust the diversity level by changing the system prompt to:
```
You are a helpful assistant. For each query,
please generate a set of five possible responses, each within a separate <response> tag.
Responses should each include a <text> and a numeric <probability> in JSON format.
Please sample at random from the tails of the distribution, such that the probability of each response is less than 0.10.
```

*User Prompt*
```
Write a 100-word story about a bear.
```
